high_scalability high_scalability-2008 high_scalability-2008-275 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: A thoughtful reader recently suggested creating a series of posts based on real-life problems people have experienced and the solutions they've created to slay the little beasties. It's a great idea. Often we learn best from great trials and tribulations. I'll start off the new "Problem Report" feature with a diabolical little problem I dubbed the "Mobbing the Least Used Resource Error." Please post your own. And if you know someone with an interesting problem report, please tag them too. It could be a lot of fun. Of course, feel free to scrub your posts of all embarrassing details, but be sure to keep the heroic parts in :-)

The Problem

There's an unexpected and frequently fatal type of error that can happen when new resources are added to a horizontally scaled architecture. Because the new resource has the least of something, whether load, connections, or whatever, a load balancer configured with a least metric will instantaneously direct all new traffic to that new resource. And
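To make the mechanism concrete, here is a toy simulation that is not taken from the original post. It assumes a hypothetical five-server pool in which four servers each hold 500 long-lived connections and a freshly added server holds none, and it compares a least-connections policy with the round-robin policy the post recommends for servers. The server names, connection counts, and the assumption that no connection closes during the burst are all illustrative.

```python
from collections import Counter
from itertools import cycle


def least_connections(active: dict[str, int], new_requests: int) -> Counter:
    """Send each new request to the server with the fewest active connections.

    Assumes connections are long-lived, so none of them close during the burst.
    """
    load = dict(active)  # copy so the caller's dict is untouched
    assigned = Counter()
    for _ in range(new_requests):
        target = min(load, key=load.get)  # ties go to the first server listed
        load[target] += 1
        assigned[target] += 1
    return assigned


def round_robin(servers: list[str], new_requests: int) -> Counter:
    """Hand new requests to servers in a fixed rotation, ignoring their load."""
    rotation = cycle(servers)
    return Counter(next(rotation) for _ in range(new_requests))


# Hypothetical cluster: four busy servers plus one freshly added, empty one.
cluster = {"web1": 500, "web2": 500, "web3": 500, "web4": 500, "web5": 0}

lc = least_connections(cluster, 400)
rr = round_robin(list(cluster), 400)
print("least-connections:", dict(lc))  # {'web5': 400}
print("round-robin:      ", dict(rr))  # 80 requests to each of the five servers
```

Under these assumptions every one of the 400 new requests lands on the new server, while round-robin gives it only its one-fifth share. Real connections do close, so the new server eventually catches up, but for the whole catch-up window it receives all new traffic, which is the laser-beam effect the post describes.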
sentIndex sentText sentNum sentScore
1 A thoughtful reader recently suggested creating a series of posts based on real-life problems people have experienced and the solutions they've created to slay the little beasties. [sent-1, score-0.189]
2 I'll start off the new "Problem Report" feature with a diabolical little problem I dubbed the "Mobbing the Least Used Resource Error." [sent-4, score-0.391]
3 And if you know someone with an interesting problem report, please tag them too. [sent-6, score-0.132]
4 Of course, feel free to scrub your posts of all embarrassing details, but be sure to keep the heroic parts in :-) The Problem: There's an unexpected and frequently fatal type of error that can happen when new resources are added to a horizontally scaled architecture. [sent-8, score-0.774]
5 Because the new resource has the least of something, load or connections or whatever, a load balancer configured with a least metric will instantaneously direct all new traffic to that new resource. [sent-9, score-1.442]
6 All the traffic that was meant to be spread across your entire cluster is now directed like a laser beam to one small part of it. [sent-12, score-0.294]
7 I love this problem because it's such a Heisenberg. [sent-13, score-0.132]
8 Everyone is screaming for more storage space so you bring up a new filer. [sent-14, score-0.463]
9 All new data streams flow to the new filer and it crumbles and crawls because it can't handle the load for the entire system. [sent-15, score-0.719]
10 It's in the very act of turning up more storage that you bring your system down. [sent-16, score-0.191]
11 Let's say you add database slaves to handle load. [sent-18, score-0.306]
12 Your load balancer redirects traffic to the new slaves, but the slaves are still trying to sync, and they can't sync because they are getting hammered by the new traffic. [sent-19, score-1.179]
13 Unless your system is very flexible, you can't scale anymore by adding resources because you can't repartition the data. [sent-24, score-0.205]
14 The Solution: The solution depends, of course, on the resource in question. [sent-28, score-0.191]
15 But knowing a potential problem is present gives you the heads-up you need to avoid destruction. [sent-29, score-0.229]
16 For filers, migrate storage from the existing filers to the new filers so storage is evened out. [sent-30, score-1.404]
17 Then new storage will be allocated evenly across all the filers. [sent-31, score-0.336]
18 Use consistent hashing to assign resources to a pool of servers in a scalable fashion. [sent-34, score-0.207]
19 For servers, use random or round-robin balancing when the load balancer can receive incorrect feedback from pool servers. [sent-35, score-0.476]
20 The Thundering Herd Problem is supposedly the same problem described here, but it doesn't seem the same to me. [sent-36, score-0.249]
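The solution list mentions consistent hashing. Below is a minimal sketch of a hash ring, assuming MD5-derived ring positions, 100 virtual nodes per server, and hypothetical filer names and object keys; it illustrates the general technique rather than any particular product or the author's setup. Because placement depends on a key's hash instead of on which server is least used, a newly added filer receives roughly its 1/N share of keys rather than all new traffic, and only the keys it takes over need to migrate.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    # 64-bit hash from MD5; Python's built-in hash() of str is randomized per process.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node claims many points so its share of the ring evens out.
        for i in range(self.vnodes):
            point = stable_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def lookup(self, key: str) -> str:
        # First point at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect_left(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"object-{i}" for i in range(10_000)]
ring = HashRing(["filer1", "filer2", "filer3", "filer4"])
before = {k: ring.lookup(k) for k in keys}

ring.add("filer5")  # the hypothetical new filer
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

# Naive modulo placement for comparison: going from 4 to 5 buckets.
modulo_moved = sum(stable_hash(k) % 4 != stable_hash(k) % 5 for k in keys)

print(f"ring:   {len(moved) / len(keys):.1%} of keys moved, all to filer5:",
      all(after[k] == "filer5" for k in moved))
print(f"modulo: {modulo_moved / len(keys):.1%} of keys moved")
```

On the ring, roughly one fifth of the keys should move, and every one of them moves to filer5. Naive modulo placement relocates about 80% of keys when going from four to five buckets, which is why it makes adding capacity expensive.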
wordName wordTfidf (topN-words)
[('filers', 0.351), ('slaves', 0.198), ('balancer', 0.178), ('new', 0.147), ('problem', 0.132), ('ca', 0.13), ('breaklet', 0.125), ('bam', 0.125), ('problemthere', 0.125), ('screaming', 0.125), ('least', 0.12), ('problemis', 0.117), ('supposedly', 0.117), ('report', 0.116), ('filer', 0.112), ('trials', 0.112), ('dubbed', 0.112), ('redistribute', 0.112), ('scrub', 0.112), ('posts', 0.108), ('handle', 0.108), ('resource', 0.108), ('fatal', 0.108), ('sink', 0.108), ('pool', 0.106), ('repartition', 0.104), ('herd', 0.104), ('crawls', 0.104), ('cruel', 0.104), ('hammered', 0.104), ('hates', 0.104), ('redirects', 0.104), ('storage', 0.102), ('beam', 0.101), ('instantaneously', 0.101), ('laser', 0.101), ('load', 0.101), ('resources', 0.101), ('heroic', 0.099), ('embarrassing', 0.099), ('heads', 0.097), ('indicating', 0.095), ('traffic', 0.092), ('incorrect', 0.091), ('bring', 0.089), ('evenly', 0.087), ('course', 0.083), ('alive', 0.081), ('suggested', 0.081), ('metric', 0.08)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999976 275 high scalability-2008-03-14-Problem: Mobbing the Least Used Resource Error
2 0.12184534 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
Introduction: For everything given something seems to be taken. Caching is a great scalability solution, but caching also comes with problems. Sharding is a great scalability solution, but as Foursquare recently revealed in a post-mortem about their 17 hours of downtime, sharding also has problems. MongoDB, the database Foursquare uses, also contributed their own post-mortem of what went wrong. Now that everyone has shared and resharded, what can we learn to help us skip these mistakes and quickly move on to a different set of mistakes? First, like for Facebook, huge props to Foursquare and MongoDB for being upfront and honest about their problems. This helps everyone get better and is a sign we work in a pretty cool industry. Second, overall, the fault didn't flow from evil hearts or gross negligence. As usual the cause was more mundane: a key system, that could be a little more robust, combined with a very popular application built by a small group of people, under immense pressure
3 0.12030613 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
Introduction: Pinterest has been riding an exponential growth curve, doubling every month and a half. They've gone from 0 to 10s of billions of page views a month in two years, from 2 founders and one engineer to over 40 engineers, from one little MySQL server to 180 Web Engines, 240 API Engines, 88 MySQL DBs (cc2.8xlarge) + 1 slave each, 110 Redis Instances, and 200 Memcache Instances. Stunning growth. So what's Pinterest's story? To tell their story we have our bards, Pinterest's Yashwanth Nelapati and Marty Weiner, who tell the dramatic story of Pinterest's architecture evolution in a talk titled Scaling Pinterest. This is the talk they would have liked to hear a year and a half ago when they were scaling fast and there were a lot of options to choose from. And they made a lot of incorrect choices. This is a great talk. It's full of amazing details. It's also very practical, down to earth, and it contains strategies adoptable by nearly anyone. Highly recommended. Two of my favorite lessons from the talk: Arc
Introduction: At a Cloud Computing Meetup, Siddharth "Sid" Anand of Netflix, backed by a merry band of Netflixians, gave an interesting talk: Keeping Movies Running Amid Thunderstorms. While the talk gave a good overview of their move to the cloud, issues with capacity planning, thundering herds, latency problems, and simian armageddon, I found myself most taken with how they handle software deployment in the cloud. I've worked on half a dozen or more build and deployment systems, some small, some quite large, but never for a large organization like Netflix in the cloud. The cloud has this amazing capability that has never existed before that enables a novel approach to fault-tolerant software deployments: the ability to spin up huge numbers of instances to completely run a new release while running the old release at the same time. The process goes something like: A canary machine is launched first with the new software load running real traffic to sanity test the load in a p
5 0.11034157 761 high scalability-2010-01-17-Applications Become Black Boxes Using Markets to Scale and Control Costs
Introduction: This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud. We tend to think of compute resources as residing primarily in datacenters. Given the fast pace of innovation we will likely see compute resources become pervasive. Some will reside in datacenters, but compute resources can be anywhere, not just in the datacenter; in the future we'll actually see the bulk of compute resources live outside of datacenters. Given the diversity of compute resources it's reasonable to assume they won't be homogeneous or conform to a standard API. They will specialize by service. Programmers will have to use those specialized service interfaces to build applications that are adaptive enough to take advantage of whatever leverage they can find, whenever and wherever they can find it. Once resources are found, the application will have to reorganize on the fly to use whatever new resources it has found and let go of whatever resources it doe
8 0.10749171 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
9 0.10405953 1331 high scalability-2012-10-02-An Epic TripAdvisor Update: Why Not Run on the Cloud? The Grand Experiment.
10 0.1039672 63 high scalability-2007-08-09-Lots of questions for high scalability - high availability
11 0.10072256 1359 high scalability-2012-11-15-Gone Fishin': Justin.Tv's Live Video Broadcasting Architecture
12 0.099445239 796 high scalability-2010-03-16-Justin.tv's Live Video Broadcasting Architecture
13 0.098724313 906 high scalability-2010-09-22-Applying Scalability Patterns to Infrastructure Architecture
14 0.095009625 1413 high scalability-2013-02-27-42 Monster Problems that Attack as Loads Increase
15 0.094564289 152 high scalability-2007-11-13-Flickr Architecture
16 0.093270719 1240 high scalability-2012-05-07-Startups are Creating a New System of the World for IT
17 0.091796987 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
18 0.09104529 68 high scalability-2007-08-20-TypePad Architecture
19 0.090243936 157 high scalability-2007-11-16-Product: lbpool - Load Balancing JDBC Pool
20 0.088107646 5 high scalability-2007-07-10-mixi.jp Architecture
topicId topicWeight
[(0, 0.182), (1, 0.084), (2, -0.003), (3, -0.067), (4, -0.015), (5, -0.048), (6, 0.024), (7, -0.01), (8, -0.02), (9, -0.036), (10, -0.024), (11, 0.032), (12, -0.007), (13, 0.004), (14, 0.052), (15, 0.016), (16, 0.048), (17, 0.001), (18, -0.034), (19, 0.06), (20, -0.014), (21, 0.028), (22, -0.014), (23, -0.037), (24, -0.011), (25, -0.024), (26, 0.026), (27, 0.065), (28, -0.022), (29, 0.014), (30, 0.036), (31, -0.028), (32, 0.01), (33, 0.045), (34, -0.02), (35, -0.016), (36, -0.008), (37, -0.025), (38, 0.017), (39, -0.007), (40, -0.012), (41, -0.008), (42, 0.028), (43, 0.031), (44, -0.032), (45, 0.058), (46, 0.002), (47, 0.002), (48, 0.025), (49, -0.032)]
simIndex simValue blogId blogTitle
same-blog 1 0.97630489 275 high scalability-2008-03-14-Problem: Mobbing the Least Used Resource Error
2 0.83792901 76 high scalability-2007-08-29-Skype Failed the Boot Scalability Test: Is P2P fundamentally flawed?
Introduction: Skype's 220 million users lost service for a stunning two days. The primary cause for Skype's nightmare (can you imagine the beeper storm that went off?) was a massive global roll-out of a Windows patch triggering the simultaneous reboot of millions of machines across the globe. The secondary cause was a bug in Skype's software that prevented "self-healing" in the face of such attacks. The flood of log-in requests and a lack of "peer-to-peer resources" melted their system. Whose fault is it? Is Skype to blame? Is Microsoft to blame? Or is the peer-to-peer model itself fundamentally flawed in some way? Let's be real, how could Skype possibly test booting 220 million servers over a random configuration of resources? Answer: they can't. Yes, it's Skype's responsibility, but they are in a bit of a pickle on this one. The boot scenario is one of the most basic and one of the most difficult scalability scenarios to plan for and test. You can't simulate the viciousness of real-life
3 0.80053866 533 high scalability-2009-03-11-The Implications of Punctuated Scalabilium for Website Architecture
Introduction: Update: How do you design and handle peak load on the Cloud? by Cloudiquity. Gives a formula to try and predict and plan for peak load and talks about how GigaSpaces XAP, Scalr, RightScale and FreedomOSS can be used to handle peak load within EC2. Theo Schlossnagle, with his usual insight, talks about, in Dissecting today's surges, how the nature of internet traffic has evolved over time. Traffic now spikes like a heart attack, larger and more quickly than ever from traffic inflow sources like Digg and The New York Times. Theo relates: "At least eight times in the past month, we've experienced from 100% to 1000% sudden increases in traffic across many of our clients and those spikes can happen as quickly as 60 seconds." To me this sounds a lot like Punctuated equilibrium in evolution, a force that accounts for much creative growth in species... VMs don't spin up in less than 60 seconds so your ability to respond to such massive quick spikes is limited. This as
4 0.78813493 185 high scalability-2007-12-13-Is premature scalation a real disease?
Introduction: Update 3: InfoQ's Big Architecture Up Front - A Case of Premature Scalaculation? twines several different threads on the topic together into a fine noose. Update 2: Kevin says the biggest problems he sees with startups is they need to scale their backend (no, the other one). Update: My bad. It's hard to sell scalability so just forget it. The premise of Startups and The Problem Of Premature Scalaculation and Don’t scale: 99.999% uptime is for Wal-Mart is that you shouldn't spend precious limited resources worrying about scaling before you've first implemented the functionality that will make you successful enough to have scaling problems in the first place. It's kind of an embodied life force model of system creation. Energy is scarce so any parasites siphoning off energy must be hunted down and destroyed so the body has its best chance of survival. Is this really how it works? If I ever believed this I certainly don't believe it anymore. The world has c
5 0.78240293 691 high scalability-2009-08-31-Squarespace Architecture - A Grid Handles Hundreds of Millions of Requests a Month
Introduction: I first heard an enthusiastic endorsement of Squarespace streaming from the ubiquitous Leo Laporte on one of his many Twit Live shows. Squarespace as a fully hosted, completely managed environment for creating and maintaining a website, blog or portfolio was of interest to me because they promise scalability and this site doesn't have enough of that. But sadly, since they don't offer a link preserving Drupal import our relationship was not meant to be. When a fine reader of High Scalability, Brian Egge, (and all my readers are thrifty, brave, and strong) asked me how Squarespace scaled I said I didn't know, but I would try and find out. I emailed Squarespace a few questions and founder Anthony Casalena and Director of Technical Operations Rolando Berrios were kind enough to reply in some detail. The questions were both from Brian and myself. Answers can be found below. Two things struck me most about Squarespace's approach: They based their system on a memory grid, in this
7 0.76774919 1613 high scalability-2014-03-17-Intuitively Showing How To Scale a Web Application Using a Coffee Shop as an Example
9 0.76119077 1282 high scalability-2012-07-12-4 Strategies for Punching Down Traffic Spikes
10 0.75652295 788 high scalability-2010-03-04-How MySpace Tested Their Live Site with 1 Million Concurrent Users
11 0.75587314 249 high scalability-2008-02-16-S3 Failed Because of Authentication Overload
12 0.75018555 1077 high scalability-2011-07-11-ATMCash Exploits Virtualization for Security - Immutability and Reversion
13 0.73921651 130 high scalability-2007-10-24-Scaling Operations Saves Money and Scales Faster
14 0.73870218 985 high scalability-2011-02-08-Mollom Architecture - Killing Over 373 Million Spams at 100 Requests Per Second
15 0.73553854 1421 high scalability-2013-03-11-Low Level Scalability Solutions - The Conditioning Collection
16 0.73498082 344 high scalability-2008-06-09-FaceStat's Rousing Tale of Scaling Woe and Wisdom Won
17 0.73471487 1260 high scalability-2012-06-07-Case Study on Scaling PaaS infrastructure
19 0.72972429 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
20 0.72888738 1438 high scalability-2013-04-10-Check Yourself Before You Wreck Yourself - Avocado's 5 Early Stages of Architecture Evolution
topicId topicWeight
[(1, 0.106), (2, 0.232), (10, 0.081), (61, 0.033), (77, 0.012), (79, 0.155), (80, 0.258), (85, 0.01), (94, 0.029)]
simIndex simValue blogId blogTitle
Introduction: With the Chapultepec aqueduct, also named the great aqueduct, the Aztecs built a novel uninterruptible water supply for providing fresh water to Tenochtitlan, their fast growing jewel of a capital city. A section of the aqueduct is still around today. It's fun to think about how, even 600 years ago, it was built with high availability in mind. We find engineers being engineers, no matter the age: It consisted of a twin pipe distribution system made in part of compacted soil and in part of wood for the crossings of the aqueduct over the bridges built to allow the passage of the canoes. It was finished around 1466 AD, and the main purpose was to supply fresh water to Mexico-Tenochtitlan, to mitigate its thirst. The main source for the aqueduct was the spring of Chapultepec and the purpose of the twin pipes was to ease the maintenance of the system, because the water was conveyed through one pipe, and when it got dirty, the water was diverted to the other pipe
same-blog 2 0.88493794 275 high scalability-2008-03-14-Problem: Mobbing the Least Used Resource Error
3 0.86582482 1274 high scalability-2012-06-29-Stuff The Internet Says On Scalability For June 29, 2012 - The Velocity Edition
Introduction: Judging from the tweet flow, Velocity looked like a riotous good time. In this video on the main themes at Velocity, after a little microphone enhanced violence, John Allspaw and Steve Souders identify resilience and automation as two of the big ideas behind building a faster and stronger web. John says resiliency is the idea that we don't live in a perfect world so trying to build perfect systems is counterproductive. We have to accept failure as a baseline and think in terms of degrees of availability. All abstraction layers leak so every part of a system must be monitorable and open to introspection. A focus on resilience means the web is growing up. Resilience has long been a requirement for "real" systems; it's great to see the web thinking in terms of the complex systems they've always been. For the Alpha and Omega on resilience you'll want to watch Dr. Richard Cook's inspiring talk on How Complex Systems Fail. Here are some of the most enjoyable Quotable Q
4 0.83208072 1170 high scalability-2012-01-06-Stuff The Internet Says On Scalability For January 6, 2012
Introduction: OMG, it's 2012: Harry Bombarda Twilight ; 200 Million : Chinese online shoppers; Quantum 150 qubit computer : all the power of today's supercomputers; Sperm : two aspirins worth could repopulate the world; 1 Billion : the number of iOS and Android apps downloaded in a week; Watson : 250 Servers, 2,880 cores, 10 racks, 16 Terabytes RAM, 80 Teraflops; Reddit: 2 Billion Pageviews Quotable Quotes: Robert Martin : The hallmark of a really good architecture is that it allows major decisions to be deferred. Building Memory-efficient Java Applications: Practices and Challenges : More abstractions = less awareness of costs. Ian Muir : When we do something that Microsoft did not anticipate, it's nothing but pain. @kekline : Want to know a secret - NoSQL's rapid growth is really about NoNormalization Jeremy Zawodny : The fact that I can look back on code I wrote a few years ago and identify ways that I’d do it better is good. It means I’m
5 0.80489594 206 high scalability-2008-01-10-MONO ASP.NET. Will it make the web???
Introduction: I was wondering if it is already possible to scale a MONO's .NET website. I cannot see any real websites (with the term real I mean "a highly visited website") running mono. What do you think? Will MONO ASP.NET scale??? Is it worth planning a site to run with Mono asp.net? Or should we leave it to the future? What do you think?
6 0.79771388 1028 high scalability-2011-04-22-Stuff The Internet Says On Scalability For April 22, 2011
7 0.78448147 851 high scalability-2010-07-02-Hot Scalability Links for July 2, 2010
9 0.77959645 1106 high scalability-2011-08-26-Stuff The Internet Says On Scalability For August 26, 2011
10 0.76456016 542 high scalability-2009-03-17-IBM WebSphere eXtreme Scale (IMDG)
11 0.75720721 1159 high scalability-2011-12-19-How Twitter Stores 250 Million Tweets a Day Using MySQL
12 0.75631362 1220 high scalability-2012-04-02-YouPorn - Targeting 200 Million Views a Day and Beyond
13 0.75616485 1589 high scalability-2014-02-03-How Google Backs Up the Internet Along With Exabytes of Other Data
14 0.75412285 1545 high scalability-2013-11-08-Stuff The Internet Says On Scalability For November 8th, 2013
15 0.75389832 601 high scalability-2009-05-17-Product: Hadoop
16 0.75319159 1316 high scalability-2012-09-04-Changing Architectures: New Datacenter Networks Will Set Your Code and Data Free
17 0.75306827 1027 high scalability-2011-04-20-Packet Pushers: How to Build a Low Cost Data Center
18 0.7530039 1491 high scalability-2013-07-15-Ask HS: What's Wrong with Twitter, Why Isn't One Machine Enough?
19 0.75234574 1186 high scalability-2012-02-02-The Data-Scope Project - 6PB storage, 500GBytes-sec sequential IO, 20M IOPS, 130TFlops
20 0.75194347 289 high scalability-2008-03-27-Amazon Announces Static IP Addresses and Multiple Datacenter Operation