high_scalability high_scalability-2009 high_scalability-2009-512 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Joe Stump, Lead Architect at Digg, gave this presentation at the Web 2.0 Expo. I couldn't find the actual presentation, but fortunately Kris Jordan took some great notes. That's how key moments in history are accidentally captured forever. Joe was also kind enough to respond to my email questions with a phone call. In this first part of the post, Joe shares some timeless wisdom that you may or may not have read before. I, of course, take some pains to extract all the wit from the original presentation in favor of simple rules. What really struck me, however, was how Joe thought MemcacheDB will be the biggest new kid on the block in scaling. MemcacheDB has been around for a little while and I've never thought of it in that way. We'll learn why Joe is so excited by MemcacheDB at the end of the post. Impressive Stats: 80th-100th largest site in the world; 26 million uniques a month; 30 million users. Uniques are only half that traffic. Traffic = unique web visitors + APIs + Digg
sentIndex sentText sentNum sentScore
1 2 billion requests a month; 13,000 requests a second, peaking at 27,000 requests a second. [sent-14, score-0.207]
2 Horizontal partitioning means storing a subset of rows on different machines. [sent-40, score-0.206]
3 Vertical partitioning means putting some columns in one table and some columns in another table. [sent-42, score-0.342]
4 Build a data access layer so partitioning is hidden behind an API. [sent-45, score-0.131]
5 With partitioning comes the CAP Theorem: you can only pick two of the following three: Strong Consistency, High Availability, Partition Tolerance. [sent-46, score-0.131]
6 Denormalization means data is copied in multiple objects and must be kept synchronized. [sent-48, score-0.15]
7 All three are then combined to return a single answer to the client. [sent-58, score-0.201]
8 So you have to: denormalize, avoid joins, avoid large scans across databases by partitioning, cache, add read slaves, and not use NFS. Run the numbers before you try to fix a problem to make sure things will actually work. [sent-65, score-0.303]
9 Cache changeable items in memcached. Cache rarely changed items in APC. [sent-70, score-0.212]
10 Eventually consistent means that writes to one partition will eventually make it to all the other partitions. [sent-78, score-0.236]
11 After a write, reads made one after another don't have to return the same value, as they could be handled by different partitions. [sent-79, score-0.252]
12 Trust that people have it handled and they'll take care of it. [sent-93, score-0.163]
13 At those write rates it's easy to see why Joe was so excited about MemcacheDB's ability to handle their digg deluge. [sent-127, score-0.505]
14 It conforms to the memcache protocol (not completely, see below), so any memcached client can connect to it. [sent-131, score-0.204]
15 Digg uses MemcacheDB to scale out the huge number of writes that happen when data is denormalized. [sent-137, score-0.187]
16 Denormalizing introduces redundancies because you are keeping copies of data in multiple records instead of just one copy in a nicely normalized table. [sent-140, score-0.145]
17 So denormalization means a lot more writes as data must be copied to all the records that contain a copy. [sent-141, score-0.43]
18 MemcacheDB has the performance, especially when you layer memcached's normal partitioning scheme on top. [sent-143, score-0.131]
19 Digg already uses memcache so it's a no-brainer to start using MemcacheDB. [sent-148, score-0.159]
20 So it's an evolutionary step for code and a revolutionary step for performance. [sent-156, score-0.334]
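The sentences above about hiding partitioning behind an API (sentence 4), horizontal partitioning (sentence 2), and the extra writes that denormalization causes (sentences 16 and 17) can be illustrated with a small sketch. This is not Digg's code: the class `PartitionedStore`, the function `record_digg`, the key formats, and the idea of copying each digg into follower records are all assumptions made up for illustration, and plain dictionaries stand in for storage servers such as MemcacheDB.

```python
import hashlib
from collections import defaultdict


class PartitionedStore:
    """Toy data access layer: callers never see which partition holds a key."""

    def __init__(self, partition_count):
        # Each dict stands in for one storage server (hypothetical; a real
        # deployment would hold client connections here instead).
        self.partitions = [dict() for _ in range(partition_count)]
        self.writes_per_partition = defaultdict(int)

    def _partition_for(self, key):
        # A stable hash keeps a key on the same partition across processes
        # (Python's built-in hash() is randomized per process for strings).
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.partitions)

    def put(self, key, value):
        index = self._partition_for(key)
        self.partitions[index][key] = value
        self.writes_per_partition[index] += 1

    def get(self, key):
        return self.partitions[self._partition_for(key)].get(key)


def record_digg(store, story_id, user_id, follower_ids):
    """Denormalized write: copy the digg into every record that shows it."""
    digg = {"story": story_id, "user": user_id}
    # One copy under the story, plus one copy per follower (assumed layout).
    keys = [f"story:{story_id}:digg:{user_id}"]
    keys += [
        f"user:{follower}:friend_digg:{story_id}:{user_id}"
        for follower in follower_ids
    ]
    for key in keys:
        store.put(key, digg)
    return len(keys)


store = PartitionedStore(partition_count=4)
writes = record_digg(store, story_id=42, user_id=7, follower_ids=[101, 102, 103])
print(writes)  # one logical action became 4 physical writes
print(store.get("story:42:digg:7"))
```

The point of the sketch is the shape of the trade: reads become single key lookups with no joins, while one user action turns into one write per copy, which is why write throughput becomes the thing to scale.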
wordName wordTfidf (topN-words)
[('memcachedb', 0.444), ('digg', 0.349), ('joe', 0.259), ('apc', 0.141), ('partitioning', 0.131), ('php', 0.129), ('stump', 0.125), ('denormalization', 0.109), ('writes', 0.099), ('diggs', 0.094), ('evolutionary', 0.091), ('handled', 0.09), ('trust', 0.089), ('return', 0.089), ('uses', 0.088), ('step', 0.086), ('excited', 0.083), ('memcached', 0.078), ('copied', 0.075), ('means', 0.075), ('normalized', 0.073), ('people', 0.073), ('write', 0.073), ('records', 0.072), ('memcache', 0.071), ('revolutionary', 0.071), ('tags', 0.07), ('requests', 0.069), ('columns', 0.068), ('presentation', 0.067), ('items', 0.067), ('kevin', 0.064), ('responsive', 0.064), ('eventually', 0.062), ('add', 0.058), ('avoid', 0.057), ('queuing', 0.057), ('combined', 0.056), ('gavethis', 0.055), ('globals', 0.055), ('architecturefor', 0.055), ('coders', 0.055), ('conforms', 0.055), ('storesan', 0.055), ('thecap', 0.055), ('database', 0.053), ('language', 0.052), ('presentationat', 0.051), ('kris', 0.051), ('adistributed', 0.051)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000005 512 high scalability-2009-02-14-Scaling Digg and Other Web Applications
2 0.37434012 554 high scalability-2009-04-04-Digg Architecture
Introduction: Update 4: Introducing Digg’s IDDB Infrastructure by Joe Stump. IDDB is a way to partition both indexes (e.g. integer sequences and unique character indexes) and actual tables across multiple storage servers (MySQL and MemcacheDB are currently supported with more to follow). Update 3: Scaling Digg and Other Web Applications. Update 2: How Digg Works and How Digg Really Works (wear ear plugs). Brought to you straight from Digg's blog. A very succinct explanation of the major elements of the Digg architecture while tracing a request through the system. I've updated this profile with the new information. Update: Digg now receives 230 million plus page views per month and 26 million unique visitors - traffic that necessitated major internal upgrades. Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. How does Digg handle billions of req
3 0.23107356 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
Introduction: Real-time social graphs (connectivity between people, places, and things). That's why scaling Facebook is hard, says Jeff Rothschild, Vice President of Technology at Facebook. Social networking sites like Facebook, Digg, and Twitter are simply harder than traditional websites to scale. Why is that? Why would social networking sites be any more difficult to scale than traditional web sites? Let's find out. Traditional websites are easier to scale than social networking sites for two reasons: They usually access only their own data and common cached data. Only 1-2% of users are active on the site at one time. Imagine a huge site like Yahoo. When you come to Yahoo they can get your profile record with one get and that's enough to build your view of the website for you. It's relatively straightforward to scale systems based around single records using distributed hashing schemes. And since only a few percent of the people are on the site at once it takes comparatively little
4 0.22324337 833 high scalability-2010-06-01-Sponsored Post: Get Your High Scalability Fix at Digg
Introduction: Get Your High Scalability Fix at Digg Interested in working on cutting-edge high-scale infrastructure at Digg? We're making a big investment in scaling and have committed to the NoSQL (Not only SQL) path with Cassandra. We're using other open-source infrastructure to help us scale including Hadoop, RabbitMQ, Zookeeper, Thrift, HDFS and Lucene. We're rewriting Digg from the ground up and we need amazing developers to join our world-class team. If you think you are up for the challenge, or you know someone who might be, take a look at our jobs page for more information.
5 0.1943974 799 high scalability-2010-03-23-Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL
Introduction: O'Reilly Radar's James Turner conducted a very informative interview with Joe Stump, current CTO of SimpleGeo and former lead architect at Digg, in which Joe makes some of his usually insightful comments on his experience using Cassandra vs MySQL. As Digg started out with a MySQL oriented architecture and has recently been moving full speed to Cassandra, his observations on some of their lessons learned and the motivation for the move are especially valuable. Here are some of the key takeaways you may find useful: Precompute on writes, make reads fast. This is an oldie as a scaling strategy, but it's valuable to see how SimpleGeo is applying it to their problem of finding entities within a certain geographical region. Using Cassandra they've built two clusters: one for indexes and one for records. The records cluster, as you might imagine, is a simple data lookup. The index cluster has a carefully constructed key for every lookup scenario. The indexes are computed on the wr
6 0.16664572 1356 high scalability-2012-11-07-Gone Fishin': 10 Ways to Take your Site from One to One Million Users by Kevin Rose
7 0.1624615 715 high scalability-2009-10-06-10 Ways to Take your Site from One to One Million Users by Kevin Rose
8 0.15820575 865 high scalability-2010-07-27-A Metric A$$-Ton of Joe Stump: The Cloud is Cheaper than Bare Metal
9 0.15603337 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
10 0.15072238 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
11 0.14783004 351 high scalability-2008-07-16-The Mother of All Database Normalization Debates on Coding Horror
12 0.14687432 858 high scalability-2010-07-13-Sponsored Post: VoltDB and Digg are Hiring
13 0.14441939 511 high scalability-2009-02-12-MySpace Architecture
14 0.14438559 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
15 0.14377916 687 high scalability-2009-08-24-How Google Serves Data from Multiple Datacenters
17 0.13966726 33 high scalability-2007-07-26-ThemBid Architecture
18 0.13789223 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
19 0.13744338 808 high scalability-2010-04-12-Poppen.de Architecture
20 0.13660386 796 high scalability-2010-03-16-Justin.tv's Live Video Broadcasting Architecture
topicId topicWeight
[(0, 0.265), (1, 0.147), (2, -0.044), (3, -0.103), (4, 0.07), (5, 0.044), (6, -0.048), (7, -0.045), (8, 0.022), (9, 0.003), (10, -0.006), (11, 0.05), (12, -0.041), (13, 0.025), (14, -0.0), (15, -0.043), (16, -0.047), (17, 0.035), (18, -0.007), (19, 0.037), (20, -0.001), (21, 0.073), (22, 0.049), (23, 0.027), (24, -0.012), (25, 0.014), (26, -0.008), (27, 0.043), (28, -0.004), (29, -0.119), (30, 0.088), (31, -0.048), (32, -0.04), (33, 0.02), (34, 0.032), (35, -0.033), (36, -0.026), (37, 0.019), (38, -0.015), (39, -0.026), (40, -0.07), (41, -0.063), (42, -0.061), (43, -0.03), (44, 0.022), (45, -0.074), (46, 0.031), (47, -0.016), (48, 0.006), (49, -0.047)]
simIndex simValue blogId blogTitle
same-blog 1 0.94827968 512 high scalability-2009-02-14-Scaling Digg and Other Web Applications
2 0.84178436 554 high scalability-2009-04-04-Digg Architecture
Introduction: Update 4: Introducing Digg’s IDDB Infrastructure by Joe Stump. IDDB is a way to partition both indexes (e.g. integer sequences and unique character indexes) and actual tables across multiple storage servers (MySQL and MemcacheDB are currently supported with more to follow). Update 3: Scaling Digg and Other Web Applications. Update 2: How Digg Works and How Digg Really Works (wear ear plugs). Brought to you straight from Digg's blog. A very succinct explanation of the major elements of the Digg architecture while tracing a request through the system. I've updated this profile with the new information. Update: Digg now receives 230 million plus page views per month and 26 million unique visitors - traffic that necessitated major internal upgrades. Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. How does Digg handle billions of req
3 0.75435442 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
Introduction: Real-time social graphs (connectivity between people, places, and things). That's why scaling Facebook is hard, says Jeff Rothschild, Vice President of Technology at Facebook. Social networking sites like Facebook, Digg, and Twitter are simply harder than traditional websites to scale. Why is that? Why would social networking sites be any more difficult to scale than traditional web sites? Let's find out. Traditional websites are easier to scale than social networking sites for two reasons: They usually access only their own data and common cached data. Only 1-2% of users are active on the site at one time. Imagine a huge site like Yahoo. When you come to Yahoo they can get your profile record with one get and that's enough to build your view of the website for you. It's relatively straightforward to scale systems based around single records using distributed hashing schemes. And since only a few percent of the people are on the site at once it takes comparatively little
4 0.74430048 1507 high scalability-2013-08-26-Reddit: Lessons Learned from Mistakes Made Scaling to 1 Billion Pageviews a Month
Introduction: Jeremy Edberg, the first paid employee at reddit, teaches us a lot about how to create a successful social site in a really good talk he gave at the RAMP conference. Watch it here at Scaling Reddit from 1 Million to 1 Billion–Pitfalls and Lessons. Jeremy uses a virtue and sin approach. Examples of the mistakes made in scaling reddit are shared and it turns out they did a lot of good stuff too. Somewhat of a shocker is that Jeremy is now a Reliability Architect at Netflix, so we get a little Netflix perspective thrown in for free. Some of the lessons that stood out most for me: Think of SSDs as cheap RAM, not expensive disk. When reddit moved from spinning disks to SSDs for the database the number of servers was reduced from 12 to 1 with a ton of headroom. SSDs are 4x more expensive but you get 16x the performance. Worth the cost. Give users a little bit of power, see what they do with it, and turn the good stuff into features. One of the biggest revelations
5 0.7355541 5 high scalability-2007-07-10-mixi.jp Architecture
Introduction: Mixi is a fast growing social networking site in Japan. They provide services like: diary, community, message, review, and photo album. Having a lot in common with LiveJournal they also developed many of the same approaches. Their write up on how they scaled their system is easily one of the best out there. Site: http://mixi.jp Information Sources mixi.jp - scaling out with open source Platform Linux Apache MySQL Perl Memcached Squid Shard What's Inside? They grew to approximately 4 million users in two years and add over 15,000 new users/day. Ranks 35th on Alexa and 3rd in Japan. More than 100 MySQL servers Add more than 10 servers/month Use non-persistent connections. Diary traffic is 85% read and 15% write. Message traffic is 75% read and 25% write. Ran into replication performance problems so they had to split the database. Considered splitting vertically by user or splitting horizontally by table type. The ende
6 0.71537089 799 high scalability-2010-03-23-Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL
7 0.71272391 808 high scalability-2010-04-12-Poppen.de Architecture
8 0.70557338 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
9 0.69937187 1345 high scalability-2012-10-22-Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale
10 0.6984297 106 high scalability-2007-10-02-Secrets to Fotolog's Scaling Success
11 0.69417191 1508 high scalability-2013-08-28-Sean Hull's 20 Biggest Bottlenecks that Reduce and Slow Down Scalability
12 0.68660659 936 high scalability-2010-11-09-Facebook Uses Non-Stored Procedures to Update Social Graphs
14 0.68123484 152 high scalability-2007-11-13-Flickr Architecture
15 0.68108624 828 high scalability-2010-05-17-7 Lessons Learned While Building Reddit to 270 Million Page Views a Month
16 0.68008512 344 high scalability-2008-06-09-FaceStat's Rousing Tale of Scaling Woe and Wisdom Won
17 0.67863679 1171 high scalability-2012-01-09-The Etsy Saga: From Silos to Happy to Billions of Pageviews a Month
18 0.67429751 33 high scalability-2007-07-26-ThemBid Architecture
19 0.67031288 1304 high scalability-2012-08-14-MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) and Familiar (SQL)
20 0.66926152 1080 high scalability-2011-07-15-Stuff The Internet Says On Scalability For July 15, 2011
topicId topicWeight
[(1, 0.142), (2, 0.272), (10, 0.025), (30, 0.039), (41, 0.091), (47, 0.015), (61, 0.073), (73, 0.014), (77, 0.016), (79, 0.113), (85, 0.04), (94, 0.053)]
simIndex simValue blogId blogTitle
1 0.9839474 1019 high scalability-2011-04-08-Stuff The Internet Says On Scalability For April 8, 2011
Introduction: Submitted for your reading pleasure on this tomato killing frosty morn... Now we really know why vampires feed on blood...they are elastically acquiring more compute power. Your Next Computer May Be Made of...Blood! It's those memristors again. Ancient vamps are really just giant super computers. Twitter now at 155 million tweets a day, up from 55 million a year ago. 10,000-core Linux supercomputer built in Amazon cloud By Jon Brodkin. The 10,000 cores were composed of 1,250 instances with eight cores each, as well as 8.75TB of RAM and 2PB disk space. The cluster ran for eight hours at a cost of $8,500. Quotable Quotes for $273 Alex: @davidklemke: Holy balls Windows Azure Tables is awesome. Man am I regretting not getting into this cloud stuff sooner, it's scalability heaven. @nik: The volume of tweets we are flowing into HBase is truly staggering #bigdata #datasift @wattersjames: One of the key points I mentioned before: Scalability is being abl
2 0.9741379 1454 high scalability-2013-05-08-Typesafe Interview: Scala + Akka is an IaaS for Your Process Architecture
Introduction: This is an email interview with Viktor Klang, Director of Engineering at Typesafe, on the Scala Futures model & Akka, both topics on which he is immensely passionate and knowledgeable. How do you structure your application? That’s the question I explored in the article Beyond Threads And Callbacks. An option I did not talk about, mostly because of my own ignorance, is a powerful stack you may not be all that familiar with: Scala and Akka. To remedy my oversight is our acting tour guide, Typesafe’s Viktor Klang, long time Scala hacker and Java enterprise systems architect. Viktor was very patient in answering my questions and was enthusiastic about sharing his knowledge. He’s a guy who definitely knows what he is talking about. I’ve implemented several Actor systems along with the messaging infrastructure, threading, async IO, service orchestration, failover, etc, so I’m innately skeptical about frameworks that remove control from the programmer at
Introduction: Joseph Smarr, former CTO of Plaxo (which explains why I recognized his picture), in I'm a technical lead on the Google+ team. Ask me anything, reveals the stack used for building Google+: Our stack is pretty standard fare for Google apps these days: we use Java servlets for our server code and JavaScript for the browser-side of the UI, largely built with the (open-source) Closure framework, including Closure's JavaScript compiler and template system. A couple nifty tricks we do: we use the HTML5 History API to maintain pretty-looking URLs even though it's an AJAX app (falling back on hash-fragments for older browsers); and we often render our Closure templates server-side so the page renders before any JavaScript is loaded, then the JavaScript finds the right DOM nodes and hooks up event handlers, etc. to make it responsive (as a result, if you're on a slow connection and you click on stuff really fast, you may notice a lag before it does anything, but luckily most people don't run
same-blog 4 0.96848375 512 high scalability-2009-02-14-Scaling Digg and Other Web Applications
5 0.96182197 1267 high scalability-2012-06-18-The Clever Ways Chrome Hides Latency by Anticipating Your Every Need
Introduction: Ilya Grigorik has written another wonderful article lavishly detailing the extraordinary tactics Chrome employs to hide network latency from users: Chrome Networking: DNS Prefetch & TCP Preconnect. Ilya springs some surprising factoids on us, revealing how the web has slowed and super sized: The size of an average page has grown to 1059kB and is now composed of over 80 subresource requests. An average DNS lookup takes between 60 and 120ms. This creates 100-200ms of latency before a request can be sent because of the full round-trip (RTT) to perform the TCP handshake. Slow mobile experiences are largely due to the much higher RTT's (200-1000ms) on wireless networks. Reducing the number of outbound connections and the total byte size of your pages is the single best optimization you can make for mobile today. Chrome reduces apparent latency using a host of clever anticipatory mechanisms: Learns the network topology as you use it via a Predictor object that anticipate
6 0.96145475 1345 high scalability-2012-10-22-Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale
7 0.96078831 1429 high scalability-2013-03-25-AppBackplane - A Framework for Supporting Multiple Application Architectures
8 0.95981961 1245 high scalability-2012-05-14-DynamoDB Talk Notes and the SSD Hot S3 Cold Pattern
10 0.95904148 274 high scalability-2008-03-12-YouTube Architecture
11 0.95881158 1007 high scalability-2011-03-18-Stuff The Internet Says On Scalability For March 18, 2011
12 0.95862961 1509 high scalability-2013-08-30-Stuff The Internet Says On Scalability For August 30, 2013
13 0.95818174 554 high scalability-2009-04-04-Digg Architecture
14 0.95757627 1602 high scalability-2014-02-26-The WhatsApp Architecture Facebook Bought For $19 Billion
15 0.9574548 674 high scalability-2009-08-07-The Canonical Cloud Architecture
16 0.95732582 350 high scalability-2008-07-15-ZooKeeper - A Reliable, Scalable Distributed Coordination System
17 0.95688379 1596 high scalability-2014-02-14-Stuff The Internet Says On Scalability For February 14th, 2014
18 0.9561578 1436 high scalability-2013-04-05-Stuff The Internet Says On Scalability For April 5, 2013
19 0.95598149 1171 high scalability-2012-01-09-The Etsy Saga: From Silos to Happy to Billions of Pageviews a Month
20 0.95561129 1637 high scalability-2014-04-25-Stuff The Internet Says On Scalability For April 25th, 2014