high_scalability high_scalability-2010 high_scalability-2010-759 knowledge-graph by maker-knowledge-mining
Introduction: Ivan Zuzak wrote a fascinating article on Real-time feed processing and filtering using Google App Engine to build Feed-buster, a service that inserts MediaRSS tags into feeds that don't have them. He talks about using polling and PubSubHubBub (real-time) to process FriendFeed feeds. Ivan is trying to devise a separate filtering service where: filtering services should be applied as close to the publisher as possible so notifications that nobody wants don’t waste network resource. processing services should be applied as close to the subscriber so that the original update may be transported through the network as a single notification for as long as possible. Besides being a generally interesting article, Ivan makes an insightful observation on the nature of using polling services in combination with metered Infrastructure/Platform services: Polling is bad because AppEngine applications have a fixed free daily quota for consumed resources, when the numbe
sentIndex sentText sentNum sentScore
1 Ivan Zuzak wrote a fascinating article on Real-time feed processing and filtering using Google App Engine to build Feed-buster, a service that inserts MediaRSS tags into feeds that don't have them. [sent-1, score-0.842]
2 He talks about using polling and PubSubHubBub (real-time) to process FriendFeed feeds. [sent-2, score-0.457]
3 Ivan is trying to devise a separate filtering service where: filtering services should be applied as close to the publisher as possible so notifications that nobody wants don’t waste network resource. [sent-3, score-1.026]
4 processing services should be applied as close to the subscriber so that the original update may be transported through the network as a single notification for as long as possible. [sent-4, score-0.632]
5 This fits directly in with the ideas in Cloud Programming Directly Feeds Cost Allocation Back into Software Design. [sent-6, score-0.171]
6 My general preference is to poll a distributed queue for work items. [sent-7, score-0.221]
7 It's robust and allows your system to control its own resource usage by determining when to poll. [sent-8, score-0.174]
8 Otherwise you can easily be overwhelmed by fast pushers. [sent-9, score-0.215]
9 Your budget is being overwhelmed by the polling requests. [sent-11, score-0.83]
10 And the more you try to approximate real-time with frequent polling requests, the more your budget is busted. [sent-12, score-0.812]
11 It's a cool example of how costs, algorithm, and platform choices all feed into and shape product architectures. [sent-13, score-0.358]
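The budget argument in sentences 9 and 10 can be made concrete with a little arithmetic. The sketch below is illustrative only: the quota of 100,000 requests per day, the feed count, and the update rate are made-up assumptions, not figures from Ivan's article or from App Engine's actual quotas, and the function names are hypothetical. It compares the requests consumed per day when every feed is polled on a fixed interval against a push model where a request is spent only when a feed actually updates.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polling_requests_per_day(num_feeds: int, poll_interval_s: int) -> int:
    """Requests spent per day when every feed is polled on a fixed interval."""
    return num_feeds * (SECONDS_PER_DAY // poll_interval_s)


def push_requests_per_day(num_feeds: int, updates_per_feed_per_day: int) -> int:
    """Requests spent per day when a notification arrives only on an update."""
    return num_feeds * updates_per_feed_per_day


def max_feeds_within_quota(daily_quota: int, poll_interval_s: int) -> int:
    """How many feeds can be polled at this interval before the quota runs out."""
    polls_per_feed = SECONDS_PER_DAY // poll_interval_s
    return daily_quota // polls_per_feed


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the shape of the trade-off.
    quota = 100_000
    feeds = 500

    print(polling_requests_per_day(feeds, poll_interval_s=60))        # poll every minute
    print(push_requests_per_day(feeds, updates_per_feed_per_day=20))  # notified on change
    print(max_feeds_within_quota(quota, poll_interval_s=60))
    print(max_feeds_within_quota(quota, poll_interval_s=600))
```

Under these assumed numbers, polling cost grows with how closely you try to approximate real-time, regardless of whether anything changed, while push cost grows only with the actual update rate. Shortening the poll interval by a factor of ten cuts the number of feeds that fit in the same quota by roughly the same factor.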
wordName wordTfidf (topN-words)
[('polling', 0.457), ('ivan', 0.334), ('feed', 0.215), ('overwhelmed', 0.215), ('feeds', 0.211), ('filtering', 0.164), ('budget', 0.158), ('ff', 0.149), ('incloud', 0.14), ('metered', 0.14), ('applied', 0.128), ('close', 0.122), ('daily', 0.119), ('friendfeed', 0.118), ('preference', 0.118), ('polls', 0.118), ('services', 0.118), ('quota', 0.116), ('transported', 0.113), ('approximate', 0.113), ('exhausted', 0.109), ('determining', 0.108), ('overwhelming', 0.106), ('poll', 0.103), ('service', 0.102), ('appengine', 0.102), ('observation', 0.097), ('tags', 0.096), ('consumed', 0.094), ('insightful', 0.094), ('directly', 0.092), ('notification', 0.09), ('frequent', 0.084), ('allocation', 0.084), ('article', 0.082), ('notifications', 0.08), ('waste', 0.079), ('fits', 0.079), ('shape', 0.078), ('wrote', 0.07), ('processed', 0.07), ('nobody', 0.069), ('robust', 0.066), ('fascinating', 0.066), ('choices', 0.065), ('otherwise', 0.062), ('algorithm', 0.062), ('generally', 0.061), ('original', 0.061), ('nature', 0.059)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 759 high scalability-2010-01-11-Strategy: Don't Use Polling for Real-time Feeds
Introduction: Ivan Zuzak wrote a fascinating article on Real-time feed processing and filtering using Google App Engine to build Feed-buster, a service that inserts MediaRSS tags into feeds that don't have them. He talks about using polling and PubSubHubBub (real-time) to process FriendFeed feeds. Ivan is trying to devise a separate filtering service where: filtering services should be applied as close to the publisher as possible so notifications that nobody wants don’t waste network resource. processing services should be applied as close to the subscriber so that the original update may be transported through the network as a single notification for as long as possible. Besides being a generally interesting article, Ivan makes an insightful observation on the nature of using polling services in combination with metered Infrastructure/Platform services: Polling is bad because AppEngine applications have a fixed free daily quota for consumed resources, when the numbe
2 0.18262731 205 high scalability-2008-01-10-Letting Clients Know What's Changed: Push Me or Pull Me?
Introduction: I had a false belief I thought I came here to stay We're all just visiting All just breaking like waves The oceans made me, but who came up with me? Push me, pull me, push me, or pull me out. So true, Pearl Jam (Push me Pull me lyrics), so true. I too have wondered how web clients should be notified of model changes. Should servers push events to clients or should clients pull events from servers? A topic worthy of its own song if ever there was one. To pull events the client simply starts a timer and makes a request to the server. This is polling. You can either pull a complete set of fresh data or get a list of changes. The server "knows" if anything you are interested in has changed and makes those changes available to you. Knowing what has changed can be relatively simple with a publish-subscribe type backend or you can get very complex with fine grained bit maps of attributes and keeping per client state on what a client still needs to see. Polling is heavy, man.
3 0.15146509 1538 high scalability-2013-10-28-Design Decisions for Scaling Your High Traffic Feeds
Introduction: Guest post by Thierry Schellenbach, Founder/CTO of Fashiolista.com, follow @tschellenbach on Twitter and Github. Fashiolista started out as a hobby project which we built on the side. We had absolutely no idea it would grow into one of the largest online fashion communities. The entire first version took about two weeks to develop and our feed implementation was dead simple. We’ve come a long way since then and I’d like to share our experience with scaling feed systems. Feeds are a core component of many large startups such as Pinterest, Instagram, Wanelo and Fashiolista. At Fashiolista the feed system powers the flat feed, aggregated feed and the notification system. This article will explain the troubles we ran into when scaling our feeds and the design decisions involved with building your own solution. Understanding the basics of how these feed systems work is essential as more and more applications rely on them. Furthermore we’ve open sourced Feedly, the Python m
4 0.14129221 1051 high scalability-2011-06-01-Why is your network so slow? Your switch should tell you.
Introduction: Who hasn't cursed their network for being slow while waiting for that annoying little hour glass of pain to release all its grains of sand? But what's really going on? Is your network really slow? PacketPushers Show 45 – Arista – EOS Network Software Architecture has a good explanation of what may be really at fault (paraphrased): Network operators get calls from application guys saying the network is slow, but the problem is usually dropped packets due to congestion. It's not usually latency, it's usually packet loss. Packet loss causes TCP to back off and retransmit, which causes applications to appear slow. Packet loss can be caused by a flakey transceiver, but the problem is usually network congestion. Somewhere on the network there's fan-in, a bottleneck develops, queues build up to a certain point, and when a queue overflows it drops packets. Often the first sign of this happening is application slowness. Queues get deeper and deeper because the network is getting more
5 0.1284631 138 high scalability-2007-10-30-Feedblendr Architecture - Using EC2 to Scale
Introduction: A man had a dream. His dream was to blend a bunch of RSS/Atom/RDF feeds into a single feed. The man is Beau Lebens of Feedville and like most dreamers he was a little short on coin. So he took refuge in the home of a cheap hosting provider and Beau realized his dream, creating FEEDblendr . But FEEDblendr chewed up so much CPU creating blended feeds that the cheap hosting provider ordered Beau to find another home. Where was Beau to go? He eventually found a new home in the virtual machine room of Amazon's EC2. This is the story of how Beau was finally able to create his one feeds safe within the cradle of affordable CPU cycles. Site: http://feedblendr.com/ The Platform EC2 (Fedora Core 6 Lite distro) S3 Apache PHP MySQL DynDNS (for round robin DNS) The Stats Beau is a developer with some sysadmin skills, not a web server admin, so a lot of learning was involved in creating FEEDblendr. FEEDblendr uses 2 EC2 instances. The same Amazon Instance (AMI) is
7 0.11163773 318 high scalability-2008-05-14-New Facebook Chat Feature Scales to 70 Million Users Using Erlang
8 0.11013148 1303 high scalability-2012-08-13-Ask HighScalability: Facing scaling issues with news feeds on Redis. Any advice?
9 0.10961482 7 high scalability-2007-07-12-FeedBurner Architecture
10 0.09888076 301 high scalability-2008-04-08-Google AppEngine - A First Look
11 0.098610625 1452 high scalability-2013-05-06-7 Not So Sexy Tips for Saving Money On Amazon
13 0.093926102 1048 high scalability-2011-05-27-Stuff The Internet Says On Scalability For May 27, 2011
14 0.093113616 343 high scalability-2008-06-09-Apple's iPhone to Use a Centralized Push Based Notification Architecture
15 0.087476611 1175 high scalability-2012-01-17-Paper: Feeding Frenzy: Selectively Materializing Users’ Event Feeds
16 0.085583344 38 high scalability-2007-07-30-Build an Infinitely Scalable Infrastructure for $100 Using Amazon Services
17 0.084015265 761 high scalability-2010-01-17-Applications Become Black Boxes Using Markets to Scale and Control Costs
18 0.08352226 1421 high scalability-2013-03-11-Low Level Scalability Solutions - The Conditioning Collection
19 0.082835503 1074 high scalability-2011-07-06-11 Common Web Use Cases Solved in Redis
20 0.082621865 1313 high scalability-2012-08-28-Making Hadoop Run Faster
topicId topicWeight
[(0, 0.111), (1, 0.047), (2, -0.008), (3, 0.019), (4, -0.024), (5, -0.054), (6, 0.024), (7, 0.025), (8, -0.024), (9, -0.009), (10, 0.02), (11, 0.042), (12, 0.009), (13, -0.049), (14, -0.021), (15, 0.003), (16, -0.046), (17, -0.02), (18, 0.037), (19, -0.036), (20, -0.008), (21, -0.037), (22, 0.021), (23, -0.02), (24, 0.045), (25, 0.021), (26, 0.042), (27, 0.033), (28, -0.022), (29, -0.01), (30, -0.001), (31, -0.033), (32, 0.046), (33, 0.015), (34, 0.024), (35, -0.044), (36, 0.042), (37, 0.007), (38, -0.056), (39, 0.001), (40, 0.015), (41, 0.025), (42, 0.015), (43, 0.024), (44, 0.04), (45, 0.017), (46, -0.075), (47, 0.021), (48, -0.063), (49, -0.054)]
simIndex simValue blogId blogTitle
same-blog 1 0.95764852 759 high scalability-2010-01-11-Strategy: Don't Use Polling for Real-time Feeds
Introduction: Ivan Zuzak wrote a fascinating article on Real-time feed processing and filtering using Google App Engine to build Feed-buster, a service that inserts MediaRSS tags into feeds that don't have them. He talks about using polling and PubSubHubBub (real-time) to process FriendFeed feeds. Ivan is trying to devise a separate filtering service where: filtering services should be applied as close to the publisher as possible so notifications that nobody wants don’t waste network resource. processing services should be applied as close to the subscriber so that the original update may be transported through the network as a single notification for as long as possible. Besides being a generally interesting article, Ivan makes an insightful observation on the nature of using polling services in combination with metered Infrastructure/Platform services: Polling is bad because AppEngine applications have a fixed free daily quota for consumed resources, when the numbe
2 0.64235401 343 high scalability-2008-06-09-Apple's iPhone to Use a Centralized Push Based Notification Architecture
Introduction: Update 2: Hank Williams says iPhone Background Processing: Not Fixed But Halfway There. Excellent analysis of all the reasons you need real background processing. Hey, you can't even build an alarm clock! Hard to believe some commenters say it's not so. Update: Josh Lowensohn of Webware tells us Why users should be scared of Apple's new notification system. A big item on the iPhone developer iWishlist has been background processing. If you can't write an app to poll for new data in the background how will you keep your even more important non-foreground app in sync? Live from the Apple developer conference we learn the solution is a centralized push based architecture. Here's the relevant MacRumorsLive transcript: Thanking the developers for their hard work. Now talking about how the #1 request has been background support. Apple wants to solve this problem. The wrong solution would be to allow for background processes -- bad for battery life and performance. Po
3 0.6391747 1267 high scalability-2012-06-18-The Clever Ways Chrome Hides Latency by Anticipating Your Every Need
Introduction: Ilya Grigorik has written another wonderful article lavishly detailing the extraordinary tactics Chrome employs to hide network latency from users: Chrome Networking: DNS Prefetch & TCP Preconnect. Ilya springs some surprising factoids on us, revealing how the web has slowed and super sized: The size of an average page has grown to 1059kB and is now composed of over 80 subresource requests. An average DNS lookup takes between 60 and 120ms. This creates 100-200ms of latency before a request can be sent because of the full round-trip (RTT) to perform the TCP handshake. Slow mobile experiences are largely due to the much higher RTT's (200-1000ms) on wireless networks. Reducing the number of outbound connections and the total byte size of your pages is the single best optimization you can make for mobile today. Chrome reduces apparent latency using a host of clever anticipatory mechanisms: Learns the network topology as you use it via a Predictor object that anticipate
4 0.63590407 678 high scalability-2009-08-09-Writing about cisco loadbalancer?
Introduction: Guys, At one of my jobs I have to administer a CISCO ACE (application control engine) hardware load-balancer. I don't particularly love this beast, but it's very very powerful. There appears to be little real-world info out there, so it could be interesting writing an article on that. But I don't have other HW LB's to compare it to and I don't want to rehash the product page. What would interest you in a 'product review' of a loadbalancer? No replies means it's not an interesting topic, so no article then ;-)
5 0.62291735 406 high scalability-2008-10-08-Strategy: Flickr - Do the Essential Work Up-front and Queue the Rest
Introduction: This strategy is stated perfectly by Flickr's Myles Grant: The Flickr engineering team is obsessed with making pages load as quickly as possible. To that end, we’re refactoring large amounts of our code to do only the essential work up front, and rely on our queuing system to do the rest. Flickr uses a queuing system to process 11 million tasks a day. Leslie Michael Orchard also does a great job explaining the queuing meme in his excellent post Queue everything and delight everyone. Asynchronous work queues are how you scalably solve problems that are too big to handle in real-time. The process: Identify the minimum feedback the client (UI, API) needs to know an operation succeeded. It's enough, for example, to update a client's view when posting a message to a microblogging service. The client probably isn't aware of all the other steps that happen when a message is added and doesn't really care when they happen as long as the obvious cases happen in an appropriate
6 0.62073642 293 high scalability-2008-03-31-Read HighScalability on Your Mobile Phone Using WidSets Widgets
7 0.61203891 205 high scalability-2008-01-10-Letting Clients Know What's Changed: Push Me or Pull Me?
8 0.60831755 1373 high scalability-2012-12-17-11 Uses For the Humble Presents Queue, er, Message Queue
9 0.60038984 491 high scalability-2009-01-13-Product: Gearman - Open Source Message Queuing System
10 0.59866136 431 high scalability-2008-10-27-Notify.me Architecture - Synchronicity Kills
11 0.59349614 301 high scalability-2008-04-08-Google AppEngine - A First Look
12 0.59092599 1074 high scalability-2011-07-06-11 Common Web Use Cases Solved in Redis
13 0.58809686 1001 high scalability-2011-03-09-Google and Netflix Strategy: Use Partial Responses to Reduce Request Sizes
14 0.58349389 209 high scalability-2008-01-12-Gandi.net, french registrar launches in granular server resources.
15 0.58105928 1048 high scalability-2011-05-27-Stuff The Internet Says On Scalability For May 27, 2011
16 0.57890558 960 high scalability-2010-12-20-Netflix: Use Less Chatty Protocols in the Cloud - Plus 26 Fixes
17 0.57384932 1303 high scalability-2012-08-13-Ask HighScalability: Facing scaling issues with news feeds on Redis. Any advice?
18 0.56762064 1421 high scalability-2013-03-11-Low Level Scalability Solutions - The Conditioning Collection
19 0.56544662 1519 high scalability-2013-09-18-If You're Programming a Cell Phone Like a Server You're Doing it Wrong
20 0.56449074 1055 high scalability-2011-06-08-Stuff to Watch from Google IO 2011
topicId topicWeight
[(1, 0.07), (2, 0.3), (56, 0.297), (61, 0.115), (79, 0.081), (94, 0.031)]
simIndex simValue blogId blogTitle
1 0.95551753 779 high scalability-2010-02-16-Seven Signs You May Need a NoSQL Database
Introduction: While exploring deep into some dusty old library stacks, I dug up Nostradamus' long lost NoSQL codex. What are the chances? Strangely, it also gave the plot to the next Dan Brown novel, but I left that out for reasons of sanity. About NoSQL, here is what Nosty (his friends call him Nosty) predicted are the signs you may need a NoSQL database... You noticed a lot of your database fields are really serialized complex objects in disguise. Why bother with a RDBMS at all then? Storing serialized objects in a relational database is like being on the pill while trying to get pregnant, a bit counter productive. Just use a schemaless database from the start. Using a standard query language has become too confining. You just want to be free. SQL is so easy, so convenient, and so standard, it's really not a challenge anymore. You need to be different. Then NoSQL is for you. Each has their own completely different query mechanism. Your toolbox only contains a hammer. Hammers wh
2 0.91993034 941 high scalability-2010-11-15-How Google's Instant Previews Reduces HTTP Requests
Introduction: In a strange case of synchronicity, Google just published Instant Previews: Under the hood, a very well written blog post by Matías Pelenur of the Instant Previews team, giving some fascinating inside details on how Google implemented Instant Previews. It's synchronicity because I had just posted Strategy: Biggest Performance Impact Is To Reduce The Number Of HTTP Requests and one of the major ideas behind the design of Instant Previews is to reduce the number of HTTP requests through a few well chosen tricks. Cosmic! Some of what Google does to reduce HTTP requests: Data URIs, which are base64 encodings of image data, are used instead of static images that are served from the server. This means the whole preview can be pieced together from image slices in one request as both the data and the image are returned in the same request. Google found that even though base64 encoding adds about 33% to the size of the image, tests showed that gzip-compressed data URIs are compara
same-blog 3 0.91903836 759 high scalability-2010-01-11-Strategy: Don't Use Polling for Real-time Feeds
Introduction: Ivan Zuzak wrote a fascinating article on Real-time feed processing and filtering using Google App Engine to build Feed-buster, a service that inserts MediaRSS tags into feeds that don't have them. He talks about using polling and PubSubHubBub (real-time) to process FriendFeed feeds. Ivan is trying to devise a separate filtering service where: filtering services should be applied as close to the publisher as possible so notifications that nobody wants don’t waste network resource. processing services should be applied as close to the subscriber so that the original update may be transported through the network as a single notification for as long as possible. Besides being a generally interesting article, Ivan makes an insightful observation on the nature of using polling services in combination with metered Infrastructure/Platform services: Polling is bad because AppEngine applications have a fixed free daily quota for consumed resources, when the numbe
4 0.90902746 1394 high scalability-2013-01-25-Stuff The Internet Says On Scalability For January 25, 2013
Introduction: Sorry, Stuff the Internet Says has been called on account of a power outage. Gods of rain and tree have interfered with thee. Instead, how about watching a little Python? (that's Monty, not the language)
5 0.8945601 67 high scalability-2007-08-17-What is the best hosting option?
Introduction: The question was extracted from: http://highscalability.com/plentyoffish-architecture#comment-126 For a startup like Markus's, what is the best hosting option (with room to grow later)? Host your own server or use an ISP co-location option? He still has to pay huge money for the bandwidth with that payload, right?
6 0.88241303 1022 high scalability-2011-04-13-Paper: NoSQL Databases - NoSQL Introduction and Overview
7 0.87722802 854 high scalability-2010-07-09-Hot Scalability Links for July 9, 2010
8 0.87291485 732 high scalability-2009-10-29-Digg - Looking to the Future with Cassandra
9 0.85388082 659 high scalability-2009-07-20-A Scalability Lament
10 0.85271573 446 high scalability-2008-11-18-Scalability Perspectives #2: Van Jacobson – Content-Centric Networking
11 0.83853942 815 high scalability-2010-04-27-Paper: Dapper, Google's Large-Scale Distributed Systems Tracing Infrastructure
12 0.81848288 1322 high scalability-2012-09-14-Stuff The Internet Says On Scalability For September 14, 2012
13 0.7807864 1565 high scalability-2013-12-16-22 Recommendations for Building Effective High Traffic Web Software
14 0.77336353 487 high scalability-2009-01-08-Paper: Sharding with Oracle Database
15 0.76852643 45 high scalability-2007-07-30-Product: SmarterStats
16 0.76247925 717 high scalability-2009-10-07-How to Avoid the Top 5 Scale-Out Pitfalls
17 0.76035738 461 high scalability-2008-12-05-Sprinkle - Provisioning Tool to Build Remote Servers
18 0.75002581 556 high scalability-2009-04-05-At Some Point the Cost of Servers Outweighs the Cost of Programmers
19 0.74816185 1183 high scalability-2012-01-30-37signals Still Happily Scaling on Moore RAM and SSDs
20 0.74761736 1077 high scalability-2011-07-11-ATMCash Exploits Virtualization for Security - Immutability and Reversion