YouTube Strategy: Adding Jitter isn't a Bug
Source: High Scalability, April 17, 2012
The adding jitter strategy was one of the most commented-on techniques from 7 Years Of YouTube Scalability Lessons In 30 Minutes on HackerNews, probably because it's one of those emergent phenomena that you really can't predict and that is shocking when you see it in real life. Here's the technique:

Add Entropy Back into Your System

If your system doesn't jitter then you get thundering herds. Distributed applications are really weather systems, and debugging them is about as deterministic as predicting the weather. Jitter introduces more randomness because, surprisingly, things tend to stack up.

Take cache expirations as an example. For a popular video YouTube caches things as best they can. The most popular video they might cache for 24 hours. If everything expires at one time, then every machine will calculate the expiration at the same time. This creates a thundering herd. By jittering you are saying: randomly expire between 18 and 30 hours. That prevents things from stacking up.
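As a minimal sketch of that idea (my own Python, not YouTube's actual code), the only change from a fixed TTL is drawing the expiration from a range. The cache client and key names below are hypothetical:

```python
import random

BASE_TTL = 24 * 60 * 60   # nominal TTL for the most popular items: 24 hours
JITTER = 0.25             # +/- 25% spread: 18 to 30 hours

def jittered_ttl(base=BASE_TTL, jitter=JITTER):
    """Return a TTL drawn uniformly from [base*(1-jitter), base*(1+jitter)].

    Every machine computes a different expiration, so a hot item that was
    cached everywhere at once no longer expires everywhere at once.
    """
    return int(random.uniform(base * (1 - jitter), base * (1 + jitter)))

# With a hypothetical memcached-style client:
# cache.set("video:12345:metadata", metadata, time=jittered_ttl())
```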
Systems have a tendency to self-synchronize as operations line up and try to destroy themselves. Each such synchronization actually removes entropy from the system, so you have to add some back in.
Comments from HackerNews really help to fill out the topic with more detail. There were some other examples of the thundering herd problem:

- Adblock Plus had a thundering herd problem: ad blocking lists check for updates every 5 days, so updates that had been scheduled for Saturday or Sunday would spill over to Monday.
- Facebook uses jitter a lot in their cache infrastructure.
- The Multicast DNS extension makes extensive use of randomness to reduce network collisions.
Jitter is used in all sorts of applications, from cron jobs to configuration management to memcache key expiration. Any time you have a lot of nodes that all need to do an operation at a specific time, you rely on jitter to keep resources from bottlenecking. As one commenter put it: "I frequently find myself adding a sleep for PID mod some appropriate constant to the beginning of big distributed batch jobs in order to keep shared resources (NFS, database, etc.) from getting hammered all at once."
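That PID trick is nearly a one-liner. A sketch in Python, where the 60-second spread is an arbitrary constant picked for illustration:

```python
import os
import time

# "Some appropriate constant": sized in practice to the number of workers
# and how fragile the shared resource (NFS, database, ...) is.
SPREAD_SECONDS = 60

# Each worker sleeps PID mod the constant -- cheap, dependency-free, and
# different per process -- so the shared resource isn't hit all at once.
time.sleep(os.getpid() % SPREAD_SECONDS)

# ... the real batch job starts here ...
```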
Not everyone adds jitter, though. Jeff Dean at Google says they prefer to have a known hit every once in a while, so they will have all the cron jobs go off at the same time rather than introducing jitter. The Linux kernel likewise tries to schedule timer events for the same deadline time, which allows the processor to sleep longer because the kernel doesn't need to wake up as often just to handle one or two timer events.
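To illustrate that opposite trade-off, here is a small sketch of deadline alignment (my own illustration, not the kernel's actual mechanism): round each timer's deadline up to a shared boundary so that nearby timers fire on a single wakeup.

```python
import math

ALIGN = 1.0  # shared boundary in seconds; coarser alignment means fewer wakeups

def aligned_deadline(now, delay, align=ALIGN):
    """Round a timer's deadline up to the next alignment boundary.

    Timers that would have expired within the same window all share one
    deadline, so the processor wakes once and services them as a batch --
    the inverse of jitter: a known periodic hit in exchange for longer sleeps.
    """
    return math.ceil((now + delay) / align) * align

# Timers requested at t=10.05 (+0.3s) and t=10.20 (+0.5s) both fire at t=11.0:
assert aligned_deadline(10.05, 0.3) == aligned_deadline(10.20, 0.5) == 11.0
```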
Related Articles

- Performance in the Cloud: Business Jitter is Bad
- 7 Years of YouTube Scalability Lessons in 30 Minutes
- Google: Taming the Long Latency Tail - When More Machines Equals Worse Results
- YouTube Architecture
- Strategy: Break Up the Memcache Dog Pile