high_scalability high_scalability-2009 high_scalability-2009-639 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Update 6: Some interesting changes from Twitter's Evan Weaver: everything in RAM now, database is a backup; peaks at 300 tweets/second; every tweet followed by average 126 people; vector cache of tweet IDs; row cache; fragment cache; page cache; keep separate caches; GC makes Ruby optimization resistant so went with Scala; Thrift and HTTP are used internally; 100s of internal requests for every external request; rewrote MQ but kept interface the same; 3 queues are used to load balance requests; extensive A/B testing for backwards compatibility; switched to C memcached client for speed; optimize critical path; faster to get the cached results from the network memory than recompute them locally. Update 5: Twitter on Scala. A Conversation with Steve Jenson, Alex Payne, and Robey Pointer by Bill Venners. A fascinating discussion of why Twitter moved to the Java JVM for their server infrastructure (long lived processes) and why they moved to Scala to program against it (high level langu
sentIndex sentText sentNum sentScore
1 A fascinating discussion of why Twitter moved to the Java JVM for their server infrastructure (long lived processes) and why they moved to Scala to program against it (high level language, static typing, functional). [sent-4, score-0.346]
2 Tells how Twitter changed their infrastructure to go from handling 3 requests to 139 requests a second. [sent-7, score-0.236]
3 They moved to a messaging model, asynchronous processing, 3 levels of cache, and moved their middleware to a mixture of C and Scala/JVM. [sent-8, score-0.284]
4 My uneducated guess is it's not a language or architecture problem, but more a problem of not being able to add hardware fast enough into their data center. [sent-13, score-0.205]
5 Update: Twitter releases Starling, a lightweight persistent queue server that speaks the MemCache protocol. [sent-15, score-0.146]
6 Early design decisions that worked well in the small melted under the crush of new users chirping tweets to all their friends. [sent-18, score-0.176]
7 Web darling Ruby on Rails was fingered early for the scaling problems, but Blaine Cook, Twitter's lead architect, held Ruby blameless: For us, it’s really about scaling horizontally - to that end, Rails and Ruby haven’t been stumbling blocks, compared to any other language or framework. [sent-19, score-0.416]
8 The performance boosts associated with a “faster” language would give us a 10-20% improvement, but thanks to architectural changes that Ruby and Rails happily accommodated, Twitter is 10000% faster than it was in January. [sent-20, score-0.154]
9 - For example, if getting a count is slow, you can memoize the count into memcache in a millisecond. [sent-49, score-0.172]
10 So rather than doing a query, a friend's status is updated in cache instead. [sent-52, score-0.12]
11 So they want to store critical attributes in a hash and lazy load the other attributes on access. [sent-56, score-0.14]
12 - Send message to invalidate friend's cache in the background instead of doing all individually, synchronously. [sent-64, score-0.186]
13 - Moved to Rinda, which is a shared queue that uses a tuplespace model, along the lines of Linda. [sent-68, score-0.142]
14 But the queues are not persistent, so the messages are lost on failure. [sent-69, score-0.167]
15 Deployment - They do a review and push out new mongrel servers. [sent-80, score-0.136]
16 - An internal server error is given to the user if their mongrel server is replaced. [sent-82, score-0.26]
17 A rolling blackout isn't used because the message queue state is in the mongrels and a rolling approach would cause all the queues in the remaining mongrels to fill up. [sent-84, score-0.589]
18 - The partition scheme will be based on time, not users, because most requests are very temporally local. [sent-94, score-0.118]
19 Use exception notifier and exception logger to get immediate notification of problems so you can address them right away. [sent-140, score-0.158]
20 - Trying to load 3000 friends at once into memory can bring a server down, but when there were only 4 friends it worked great. [sent-143, score-0.294]
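The caching and queueing sentences above (9, 10 and 12) can be sketched roughly as follows. This is a minimal illustration in Python, not Twitter's actual code: `FakeCache` is a hypothetical in-process stand-in for a memcached client, and `FOLLOWERS`, `follower_count`, `post_status` and `invalidation_worker` are names invented for this example. It shows a slow count being memoized in the cache, a status being written to the cache instead of re-queried, and friend-cache invalidation being handed to a background worker through a queue rather than done synchronously.

```python
import queue
import threading


class FakeCache:
    """Hypothetical in-process stand-in for a memcached client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


cache = FakeCache()
db_queries = []  # records every simulated slow database query

# Hypothetical follower graph used only for this sketch.
FOLLOWERS = {"alice": ["bob", "carol"]}


def slow_follower_count(user):
    """Simulates the slow COUNT query that memoization avoids repeating."""
    db_queries.append(user)
    return len(FOLLOWERS.get(user, []))


def follower_count(user):
    """Return the count from cache, computing and storing it on a miss."""
    key = f"follower_count:{user}"
    cached = cache.get(key)
    if cached is None:  # a cached 0 is still a hit
        cached = slow_follower_count(user)
        cache.set(key, cached)
    return cached


# Messages naming the cache keys that a background worker should invalidate.
invalidations = queue.Queue()


def post_status(user, text):
    """Update the status in cache and enqueue friend invalidations."""
    cache.set(f"status:{user}", text)
    for friend in FOLLOWERS.get(user, []):
        invalidations.put(f"timeline:{friend}")


def invalidation_worker():
    """Drain the queue in the background; a None message stops the worker."""
    while True:
        key = invalidations.get()
        try:
            if key is None:
                break
            cache.delete(key)
        finally:
            invalidations.task_done()
```

Starting `invalidation_worker` on a `threading.Thread` and calling `post_status("alice", "hello")` returns as soon as the messages are enqueued; the friends' cached timelines are dropped shortly afterwards by the worker. Note that an in-memory queue like this one loses its messages if the process dies, which is the same weakness sentence 14 attributes to the Rinda setup.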
wordName wordTfidf (topN-words)
[('twitter', 0.397), ('rails', 0.247), ('ruby', 0.192), ('moved', 0.142), ('mongrel', 0.136), ('blaine', 0.129), ('cache', 0.12), ('requests', 0.118), ('intwitter', 0.117), ('friends', 0.116), ('api', 0.114), ('mongrels', 0.105), ('queues', 0.103), ('ids', 0.101), ('pinpoint', 0.093), ('language', 0.092), ('friend', 0.089), ('queue', 0.084), ('exception', 0.079), ('scaling', 0.075), ('tweet', 0.071), ('attributes', 0.07), ('message', 0.066), ('messages', 0.064), ('small', 0.063), ('rolling', 0.063), ('server', 0.062), ('changes', 0.062), ('slave', 0.059), ('darling', 0.058), ('stumbling', 0.058), ('fingered', 0.058), ('accommodated', 0.058), ('tuplespace', 0.058), ('jenson', 0.058), ('advisers', 0.058), ('articlesfor', 0.058), ('chirping', 0.058), ('flaky', 0.058), ('hammers', 0.058), ('terrifying', 0.058), ('uneducated', 0.058), ('videoby', 0.058), ('violates', 0.058), ('getting', 0.058), ('processes', 0.058), ('count', 0.057), ('enough', 0.055), ('statistics', 0.055), ('melted', 0.055)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 639 high scalability-2009-06-27-Scaling Twitter: Making Twitter 10000 Percent Faster
Introduction: Update 6: Some interesting changes from Twitter's Evan Weaver: everything in RAM now, database is a backup; peaks at 300 tweets/second; every tweet followed by average 126 people; vector cache of tweet IDs; row cache; fragment cache; page cache; keep separate caches; GC makes Ruby optimization resistant so went with Scala; Thrift and HTTP are used internally; 100s of internal requests for every external request; rewrote MQ but kept interface the same; 3 queues are used to load balance requests; extensive A/B testing for backwards compatibility; switched to C memcached client for speed; optimize critical path; faster to get the cached results from the network memory than recompute them locally. Update 5: Twitter on Scala. A Conversation with Steve Jenson, Alex Payne, and Robey Pointer by Bill Venners. A fascinating discussion of why Twitter moved to the Java JVM for their server infrastructure (long lived processes) and why they moved to Scala to program against it (high level langu
2 0.34511995 837 high scalability-2010-06-07-Six Ways Twitter May Reach its Big Hairy Audacious Goal of One Billion Users
Introduction: Twitter has a big hairy audacious goal of reaching one billion users by 2013. Three forces stand against Twitter. The world will end in 2012 . But let's be optimistic and assume we'll make it. Next is Facebook. Currently Facebook is the user leader with over 400 million users . Will Facebook stumble or will they rocket to one billion users before Twitter? And lastly, there's Twitter's "low" starting point and "slow" growth rate. Twitter currently has 106 million registered users and adds about 300,000 new users a day. That doesn't add up to a billion in three years. Twitter needs to triple the number of registered users they add per day. How will Twitter reach its goal of over one billion users served? From recent infrastructure announcements and information gleaned at Chirp ( videos ) and other talks, it has become a little clearer how they hope to reach their billion user goal: 1) Make a Big Hairy Audacious Goal 2) Hire Lots of Quality People 3) Hug Developers and Users 4) D
3 0.26393092 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App
Introduction: Update: Jake in Does Django really scale better than Rails? thinks apps like FFS shouldn't need so much hardware to scale. In a short three months Friends for Sale (think Hot-or-Not with a market economy) grew to become a top 10 Facebook application handling 200 gorgeous requests per second and a stunning 300 million page views a month. They did all this using Ruby on Rails, two part time developers, a cluster of a dozen machines, and a fairly standard architecture. How did Friends for Sale scale to sell all those beautiful people? And how much do you think your friends are worth on the open market? Site: http://www.facebook.com/apps/application.php?id=7019261521 Information Sources Siqi Chen and Alexander Le, co-creators of Friends for Sale, answering my standard questionnaire. Virality on Facebook The Platform Ruby on Rails CentOS 5 (64 bit) Capistrano - update and restart application servers. Memcached MySQL Nginx Starling - distrib
4 0.22611322 556 high scalability-2009-04-05-At Some Point the Cost of Servers Outweighs the Cost of Programmers
Introduction: This is the intriguing quote by Bill Venners in an interview with Twitter's Alex Payne on Twitter's heretical switch from a pure Ruby stack to a Ruby on Rails stack on the front-end and JVM/Scala on the back-end: So performance was also one of the problems with JRuby, which I [Bill Venners] think helps explain better why they'd [Twitter] prefer Scala over Ruby or JRuby for some things. I have often heard Rubyists say that although Ruby is slower than Java, for many things it is plenty fast enough, and they are right. The logic goes further, saying that servers are cheap, and programmers expensive, so it makes sense to tradeoff some runtime performance for programmer productivity. And I think that's very often true too, but not always. If you have enough traffic, at some point the cost of servers outweighs the cost of programmers . I'm not sure whether Twitter is past that point, but they get a lot of traffic. And frankly this isn't an intrinsic tradeoff. Other dynamic languages
5 0.22339016 568 high scalability-2009-04-14-Designing a Scalable Twitter
Introduction: There were many talks recently about Twitter scalability and their specific choice of language such as Scala to address their existing Ruby based scalability. In this post I tried to provide a more methodical approach for handling Twitter scalability challenges that is centered around the right choice of architecture patterns rather than the language itself. The architecture patterns are given in a generic fashion that is not specific to Twitter itself and can serve anyone who is looking to build a scalable real time web application in the near future.
6 0.20513494 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
8 0.20294696 574 high scalability-2009-04-20-Some things about Memcached from a Twitter software developer
9 0.20027927 417 high scalability-2008-10-15-Outside.in Scales Up with Engine Yard and moving from PHP to Ruby on Rails
10 0.19535357 855 high scalability-2010-07-11-So, Why is Twitter Really Not Using Cassandra to Store Tweets?
11 0.19073671 1159 high scalability-2011-12-19-How Twitter Stores 250 Million Tweets a Day Using MySQL
12 0.18076238 1491 high scalability-2013-07-15-Ask HS: What's Wrong with Twitter, Why Isn't One Machine Enough?
13 0.18051362 544 high scalability-2009-03-18-QCon London 2009: Upgrading Twitter without service disruptions
14 0.17272016 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
15 0.16670495 780 high scalability-2010-02-19-Twitter’s Plan to Analyze 100 Billion Tweets
16 0.16538173 1251 high scalability-2012-05-24-Build your own twitter like real time analytics - a step by step guide
17 0.16273254 783 high scalability-2010-02-24-Hot Scalability Links for February 24, 2010
18 0.16041724 166 high scalability-2007-11-27-Solving the Client Side API Scalability Problem with a Little Game Theory
19 0.15973689 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
20 0.15637237 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
topicId topicWeight
[(0, 0.296), (1, 0.143), (2, -0.072), (3, -0.115), (4, 0.1), (5, -0.062), (6, -0.052), (7, 0.065), (8, -0.014), (9, -0.008), (10, 0.021), (11, 0.122), (12, 0.055), (13, 0.046), (14, -0.115), (15, -0.049), (16, 0.016), (17, -0.072), (18, -0.005), (19, -0.033), (20, -0.057), (21, -0.053), (22, 0.067), (23, 0.021), (24, 0.0), (25, 0.04), (26, 0.042), (27, -0.01), (28, -0.044), (29, 0.057), (30, 0.065), (31, -0.176), (32, -0.163), (33, 0.032), (34, -0.165), (35, -0.049), (36, -0.048), (37, 0.085), (38, -0.164), (39, -0.041), (40, 0.083), (41, -0.081), (42, -0.044), (43, 0.033), (44, -0.039), (45, -0.057), (46, -0.041), (47, -0.015), (48, 0.016), (49, -0.006)]
simIndex simValue blogId blogTitle
same-blog 1 0.97296393 639 high scalability-2009-06-27-Scaling Twitter: Making Twitter 10000 Percent Faster
Introduction: Update 6: Some interesting changes from Twitter's Evan Weaver: everything in RAM now, database is a backup; peaks at 300 tweets/second; every tweet followed by average 126 people; vector cache of tweet IDs; row cache; fragment cache; page cache; keep separate caches; GC makes Ruby optimization resistant so went with Scala; Thrift and HTTP are used internally; 100s of internal requests for every external request; rewrote MQ but kept interface the same; 3 queues are used to load balance requests; extensive A/B testing for backwards compatibility; switched to C memcached client for speed; optimize critical path; faster to get the cached results from the network memory than recompute them locally. Update 5: Twitter on Scala. A Conversation with Steve Jenson, Alex Payne, and Robey Pointer by Bill Venners. A fascinating discussion of why Twitter moved to the Java JVM for their server infrastructure (long lived processes) and why they moved to Scala to program against it (high level langu
Introduction: Toy solutions solving Twitter’s “problems” are a favorite scalability trope. Everybody has this idea that Twitter is easy. With a little architectural hand waving we have a scalable Twitter, just that simple. Well, it’s not that simple as Raffi Krikorian , VP of Engineering at Twitter, describes in his superb and very detailed presentation on Timelines at Scale . If you want to know how Twitter works - then start here. It happened gradually so you may have missed it, but Twitter has grown up. It started as a struggling three-tierish Ruby on Rails website to become a beautifully service driven core that we actually go to now to see if other services are down. Quite a change. Twitter now has 150M world wide active users, handles 300K QPS to generate timelines, and a firehose that churns out 22 MB/sec. 400 million tweets a day flow through the system and it can take up to 5 minutes for a tweet to flow from Lady Gaga’s fingers to her 31 million followers. A couple o
3 0.82649326 837 high scalability-2010-06-07-Six Ways Twitter May Reach its Big Hairy Audacious Goal of One Billion Users
Introduction: Twitter has a big hairy audacious goal of reaching one billion users by 2013. Three forces stand against Twitter. The world will end in 2012 . But let's be optimistic and assume we'll make it. Next is Facebook. Currently Facebook is the user leader with over 400 million users . Will Facebook stumble or will they rocket to one billion users before Twitter? And lastly, there's Twitter's "low" starting point and "slow" growth rate. Twitter currently has 106 million registered users and adds about 300,000 new users a day. That doesn't add up to a billion in three years. Twitter needs to triple the number of registered users they add per day. How will Twitter reach its goal of over one billion users served? From recent infrastructure announcements and information gleaned at Chirp ( videos ) and other talks, it has become a little clearer how they hope to reach their billion user goal: 1) Make a Big Hairy Audacious Goal 2) Hire Lots of Quality People 3) Hug Developers and Users 4) D
4 0.82249421 1491 high scalability-2013-07-15-Ask HS: What's Wrong with Twitter, Why Isn't One Machine Enough?
Introduction: Can anyone convincingly explain why properties sporting traffic statistics that may seem in line with the capabilities of a single big-iron machine need so many machines in their architecture? This is a common reaction to architecture profiles on High Scalability: I could do all that on a few machines so they must be doing something really stupid. Lo and behold this same reaction also occurred to the article The Architecture Twitter Uses to Deal with 150M Active Users . On Hacker News papsosouid voiced what a lot of people may have been thinking: I really question the current trend of creating big, complex, fragile architectures to "be able to scale". These numbers are a great example of why, the entire thing could run on a single server, in a very straight forward setup. When you are creating a cluster for scalability, and it has less CPU, RAM and IO than a single server, what are you gaining? They are only doing 6k writes a second for crying out loud. This is a s
5 0.82249296 556 high scalability-2009-04-05-At Some Point the Cost of Servers Outweighs the Cost of Programmers
Introduction: This is the intriguing quote by Bill Venners in an interview with Twitter's Alex Payne on Twitter's heretical switch from a pure Ruby stack to a Ruby on Rails stack on the front-end and JVM/Scala on the back-end: So performance was also one of the problems with JRuby, which I [Bill Venners] think helps explain better why they'd [Twitter] prefer Scala over Ruby or JRuby for some things. I have often heard Rubyists say that although Ruby is slower than Java, for many things it is plenty fast enough, and they are right. The logic goes further, saying that servers are cheap, and programmers expensive, so it makes sense to tradeoff some runtime performance for programmer productivity. And I think that's very often true too, but not always. If you have enough traffic, at some point the cost of servers outweighs the cost of programmers . I'm not sure whether Twitter is past that point, but they get a lot of traffic. And frankly this isn't an intrinsic tradeoff. Other dynamic languages
6 0.78538877 323 high scalability-2008-05-19-Twitter as a scalability case study
7 0.78243124 574 high scalability-2009-04-20-Some things about Memcached from a Twitter software developer
8 0.76721209 1159 high scalability-2011-12-19-How Twitter Stores 250 Million Tweets a Day Using MySQL
9 0.72647548 1148 high scalability-2011-11-29-DataSift Architecture: Realtime Datamining at 120,000 Tweets Per Second
10 0.72580343 568 high scalability-2009-04-14-Designing a Scalable Twitter
11 0.72397107 544 high scalability-2009-03-18-QCon London 2009: Upgrading Twitter without service disruptions
12 0.7104429 1251 high scalability-2012-05-24-Build your own twitter like real time analytics - a step by step guide
13 0.70953935 116 high scalability-2007-10-08-Lessons from Pownce - The Early Years
14 0.69980085 855 high scalability-2010-07-11-So, Why is Twitter Really Not Using Cassandra to Store Tweets?
15 0.6894505 780 high scalability-2010-02-19-Twitter’s Plan to Analyze 100 Billion Tweets
16 0.68648642 783 high scalability-2010-02-24-Hot Scalability Links for February 24, 2010
17 0.68636644 166 high scalability-2007-11-27-Solving the Client Side API Scalability Problem with a Little Game Theory
18 0.66686624 970 high scalability-2011-01-06-BankSimple Mini-Architecture - Using a Next Generation Toolchain
19 0.65538746 307 high scalability-2008-04-21-Using Google AppEngine for a Little Micro-Scalability
20 0.65153879 1538 high scalability-2013-10-28-Design Decisions for Scaling Your High Traffic Feeds
topicId topicWeight
[(1, 0.137), (2, 0.539), (10, 0.02), (26, 0.013), (30, 0.018), (47, 0.014), (56, 0.011), (61, 0.054), (79, 0.053), (85, 0.041), (89, 0.01), (94, 0.027)]
simIndex simValue blogId blogTitle
1 0.99654335 1190 high scalability-2012-02-10-Stuff The Internet Says On Scalability For February 10, 2012
Introduction: HighScalability Tested, Mother Approved: 12,233TPS : Twitter @ Super Bowl; 11 Million Slices : Dominos @ Super Bowl; 500K requests per second : S3; The great mobile money drain . Mobile: high resource costs, low revenue. Mobile traffic on Plenty of Fish is growing at 3% a month , rising to 3 Billion pageviews a month, 40% of signups are mobile, and all traffic will soon be 60-70% mobile. The problem: how do you make money on mobile? Time to chuck microprocessors for a networks of cells? How Networks of Biological Cells Solve Distributed Computing Problems : Computer scientists prove that networks of cells can compute as efficiently as networks of computers linked via the internet. We believe that there is a need for a network model, where nodes are by design below the computation and communication capabilities of Turing machines. Unrelated? GDrive at last and S3 Drops Storage Pricing . If you are StackOverflow and your data is overflowing , what do you do? Mo
2 0.99646622 967 high scalability-2011-01-03-Stuff The Internet Says On Scalability For January 3, 2010
Introduction: Submitted for your reading pleasure... Quotable Quotes @hofmanndavid : Performance and scalability anxiety makes developers want to catch the flying butterflies @tivrfoa : "Scalability solutions aren't magic. They involve partitioning, indexing and replication." Twitter engineer Alan Perlis: Fools ignore complexity; pragmatists suffer it; experts avoid it; geniuses remove it. CIO update: Post-mortem on the Skype outage . Interesting tale of a cascading collapse in complex, distributed, interactive systems. For more background see the highly illuminating Explaining Supernodes by Dan York. RethinkDB and SSD Databases. SSD was not a revolution by Kevin Burton. What’s really shocking to me, is that while SSD and flash storage is very exciting, it wasn’t as revolutionary in 2010 as I would have liked to have seen. The case for Datastore-Side-Scripting . Russell Sullivan predicts real-time web applications are going in the direction of being enti
same-blog 3 0.99588662 639 high scalability-2009-06-27-Scaling Twitter: Making Twitter 10000 Percent Faster
Introduction: Update 6: Some interesting changes from Twitter's Evan Weaver: everything in RAM now, database is a backup; peaks at 300 tweets/second; every tweet followed by average 126 people; vector cache of tweet IDs; row cache; fragment cache; page cache; keep separate caches; GC makes Ruby optimization resistant so went with Scala; Thrift and HTTP are used internally; 100s of internal requests for every external request; rewrote MQ but kept interface the same; 3 queues are used to load balance requests; extensive A/B testing for backwards compatibility; switched to C memcached client for speed; optimize critical path; faster to get the cached results from the network memory than recompute them locally. Update 5: Twitter on Scala. A Conversation with Steve Jenson, Alex Payne, and Robey Pointer by Bill Venners. A fascinating discussion of why Twitter moved to the Java JVM for their server infrastructure (long lived processes) and why they moved to Scala to program against it (high level langu
4 0.99538267 1199 high scalability-2012-02-27-Zen and the Art of Scaling - A Koan and Epigram Approach
Introduction: This is a guest post derived from an email conversation with Russell Sullivan , a computer architect and creator of Alchemy Database, A Hybrid RDBMS/NOSQL-Datastore. Russell (AKA Jak Sprats) has been pondering, considering, and implementing distributed databases for many years. In a recent email conversation he shared 44 of the lessons he has learned from developing the infrastructure for high performance / highly scalable systems. Some are well known, some are debatable, and some obviously result from a deep experience that is worth learning from: There are maybe 20 classic bottlenecks (CPU, NIC overload, memory fragmentation, disk seeks, swap, thread deadlock, packet loss, etc.), have a basic understanding of them, because each is a dark tunnel, and you need a specialised flashlight for each. The true style of scalable architecture is rooted in the nature of bottlenecks. I always talk about commutative operations on a sharded relational model, and I figure if that i
Introduction: This is a guest repost by Ron Pressler, the founder and CEO of Parallel Universe, a Y Combinator company building advanced middleware for real-time applications. Little's Law helps us determine the maximum request rate a server can handle. When we apply it, we find that the dominating factor limiting a server's capacity is not the hardware but the OS. Should we buy more hardware if software is the problem? If not, how can we remove that software limitation in a way that does not make the code much harder to write and understand? Many modern web applications are composed of multiple (often many) HTTP services (this is often called a micro-service architecture). This architecture has many advantages in terms of code reuse and maintainability, scalability and fault tolerance. In this post I'd like to examine one particular bottleneck in the approach, which hinders scalability as well as fault tolerance, and various ways to deal with it (I am using the term "scalability" very loosely in this post
6 0.99419576 551 high scalability-2009-03-30-Lavabit Architecture - Creating a Scalable Email Service
7 0.99410385 723 high scalability-2009-10-16-Paper: Scaling Online Social Networks without Pains
8 0.9936949 1283 high scalability-2012-07-13-Stuff The Internet Says On Scalability For July 13, 2012
9 0.99352264 1006 high scalability-2011-03-17-Are long VM instance spin-up times in the cloud costing you money?
10 0.99261981 205 high scalability-2008-01-10-Letting Clients Know What's Changed: Push Me or Pull Me?
11 0.99061298 1155 high scalability-2011-12-12-Netflix: Developing, Deploying, and Supporting Software According to the Way of the Cloud
12 0.99041015 662 high scalability-2009-07-27-Handle 700 Percent More Requests Using Squid and APC Cache
13 0.98725927 50 high scalability-2007-07-31-BerkeleyDB & other distributed high performance key-value databases
14 0.98719072 594 high scalability-2009-05-08-Eight Best Practices for Building Scalable Systems
15 0.98584813 910 high scalability-2010-09-30-Facebook and Site Failures Caused by Complex, Weakly Interacting, Layered Systems
16 0.98383981 1373 high scalability-2012-12-17-11 Uses For the Humble Presents Queue, er, Message Queue
17 0.98379916 1628 high scalability-2014-04-08-Microservices - Not a free lunch!
18 0.98345768 844 high scalability-2010-06-18-Paper: The Declarative Imperative: Experiences and Conjectures in Distributed Logic
19 0.98198575 1126 high scalability-2011-09-27-Use Instance Caches to Save Money: Latency == $$$
20 0.98171932 455 high scalability-2008-12-01-MySQL Database Scale-out and Replication for High Growth Businesses
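The Ron Pressler introduction in the list above says that Little's Law helps determine the maximum request rate a server can handle. As a rough worked example (the function name and the numbers are assumptions chosen for illustration, not figures from that post): Little's Law states L = λW, where L is the average number of requests in flight, λ is the arrival rate and W is the average time a request spends in the system. Rearranged, the sustainable rate is λ = L / W.

```python
def max_request_rate(max_concurrent_requests: int, mean_latency_seconds: float) -> float:
    """Little's Law rearranged: lambda = L / W.

    max_concurrent_requests is L, the number of requests the server can
    hold in flight at once (for example, its thread or connection limit).
    mean_latency_seconds is W, the average time one request takes.
    """
    if mean_latency_seconds <= 0:
        raise ValueError("mean latency must be positive")
    return max_concurrent_requests / mean_latency_seconds


# Assumed numbers: 500 requests in flight, 0.25 s average latency.
rate = max_request_rate(500, 0.25)
print(f"{rate:.0f} requests/second")
```

With these assumed values the limit is 2000 requests per second. Halving the latency or doubling the number of requests the server can hold concurrently each doubles that limit, which is why a cap on concurrency imposed by software can matter more than the hardware.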