high_scalability high_scalability-2009 high_scalability-2009-554 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Update 4: Introducing Digg’s IDDB Infrastructure by Joe Stump. IDDB is a way to partition both indexes (e.g. integer sequences and unique character indexes) and actual tables across multiple storage servers (MySQL and MemcacheDB are currently supported with more to follow). Update 3: Scaling Digg and Other Web Applications. Update 2: How Digg Works and How Digg Really Works (wear ear plugs). Brought to you straight from Digg's blog, it gives a very succinct explanation of the major elements of the Digg architecture by tracing a request through the system. I've updated this profile with the new information. Update: Digg now receives 230 million plus page views per month and 26 million unique visitors, traffic that necessitated major internal upgrades. Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. How does Digg handle billions of req
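IDDB's internals are not described here, so the following is only a minimal sketch, assuming simple hash-based routing, of what partitioning both a unique string index and a table across several storage servers can look like. `HashPartitioner`, `PartitionedStore`, and the server names are hypothetical; in-memory dicts stand in for the MySQL or MemcacheDB servers.

```python
import hashlib
from typing import Dict, List, Optional, Union

Key = Union[int, str]


class HashPartitioner:
    """Hypothetical router: maps a key to one of a fixed list of storage servers."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self.servers = list(servers)

    def server_for(self, key: Key) -> str:
        # A stable digest (unlike Python's per-process salted hash() for str)
        # keeps the key-to-server mapping identical across processes.
        digest = hashlib.md5(str(key).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[bucket]


class PartitionedStore:
    """In-memory stand-in for partitioned tables plus a unique-string index.

    Each dict keyed by server name simulates the data living on that server.
    """

    def __init__(self, partitioner: HashPartitioner) -> None:
        self.partitioner = partitioner
        self.rows: Dict[str, Dict[int, dict]] = {s: {} for s in partitioner.servers}
        self.name_index: Dict[str, Dict[str, int]] = {s: {} for s in partitioner.servers}
        self._next_id = 1  # stand-in for a global integer sequence

    def insert_user(self, username: str, data: dict) -> int:
        # The unique index entry is partitioned by the username itself...
        index_shard = self.name_index[self.partitioner.server_for(username)]
        if username in index_shard:
            raise ValueError(f"duplicate username: {username}")
        user_id = self._next_id
        self._next_id += 1
        index_shard[username] = user_id
        # ...while the row itself is partitioned by its integer id.
        self.rows[self.partitioner.server_for(user_id)][user_id] = data
        return user_id

    def get_by_username(self, username: str) -> Optional[dict]:
        # Two hops: index shard gives the id, then the row shard gives the data.
        user_id = self.name_index[self.partitioner.server_for(username)].get(username)
        if user_id is None:
            return None
        return self.rows[self.partitioner.server_for(user_id)].get(user_id)
```

Plain modulo hashing moves most keys when a server is added; a production scheme might prefer consistent hashing or a lookup directory. Which strategy IDDB actually uses is not stated in this profile.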
sentIndex sentText sentNum sentScore
1 integer sequences and unique character indexes) and actual tables across multiple storage servers (MySQL and MemcacheDB are currently supported with more to follow). [sent-4, score-0.323]
2 Update: Digg now receives 230 million plus page views per month and 26 million unique visitors - traffic that necessitated major internal upgrades. [sent-10, score-0.671]
3 Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. [sent-11, score-0.596]
4 0 using the default MyISAM storage engine. Over 22 million users. [sent-18, score-0.17]
5 230 million plus page views per month; 26 million unique visitors per month; several billion page views per month. None of the scaling challenges they faced had anything to do with PHP. [sent-19, score-1.003]
6 Six specialized graph database servers to run the Recommendation Engine. [sent-23, score-0.294]
7 What's Inside: Specialized load balancer appliances monitor the application servers, handle failover, constantly adjust the cluster according to health, balance incoming requests, and cache JavaScript, CSS and images. [sent-25, score-0.12]
8 Application servers consist of: Apache+PHP, Memcached, Gearman and other daemons. [sent-28, score-0.134]
9 They had problems with their storage system telling them writes were on disk when they really weren't. [sent-44, score-0.126]
10 To lighten their database load they used the APC PHP accelerator MCache. [sent-48, score-0.148]
11 Memcached is used for caching and memcached servers seemed to be spread across their database and application servers. [sent-49, score-0.162]
12 A specialized daemon monitors connections and kills connections that have been open too long. [sent-50, score-0.132]
13 On a page's first load the PHP code is compiled, so any subsequent page loads are very fast. [sent-52, score-0.193]
14 A specialized Recommendation Engine service was built to act as their distributed graph database. [sent-55, score-0.132]
15 Recommendations didn't fit well with the relational model, so they made a specialized service. [sent-60, score-0.132]
16 At some point in their growth curve they were unable to grow by adding RAM, so they had to grow through architecture. [sent-64, score-0.144]
17 This is perhaps due to their large javascript libraries rather than their backend architecture. [sent-66, score-0.116]
18 Engineers often have a bunch of cool features they want to release, but those features can kill an infrastructure if that infrastructure doesn't grow along with the features. [sent-70, score-0.35]
19 You have to wonder: by limiting new features to match its infrastructure, might Digg lose ground to other, faster-moving social bookmarking services? [sent-73, score-0.212]
20 Perhaps if the infrastructure were more easily scaled, they could add features faster, which would help them compete better? [sent-74, score-0.139]
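The sentences on memcached only say it is used for caching. One common way to use memcached is cache-aside: read from the cache, and on a miss load from the database and populate the cache with a TTL. This is a minimal sketch of that general pattern, not Digg's confirmed code; `InMemoryCache` is a hypothetical stand-in for a real memcached client, and `cached_fetch` and the 300-second default TTL are assumptions.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for a memcached client: get/set/delete with a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # An expired entry behaves like a miss.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._items[key] = (value, self._clock() + ttl)

    def delete(self, key: str) -> None:
        # Writers can call this to invalidate a key after updating the database.
        self._items.pop(key, None)


def cached_fetch(
    cache: InMemoryCache, key: str, loader: Callable[[], Any], ttl: float = 300.0
) -> Any:
    """Cache-aside read: try the cache first, fall back to the loader on a miss."""
    value = cache.get(key)
    if value is None:
        value = loader()  # in a real deployment, e.g. a MySQL query
        cache.set(key, value, ttl)
    return value
```

Because `None` doubles as the miss signal, a loader that returns `None` is called on every read; a real system might cache a sentinel for "not found" rows instead.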
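The daemon that kills connections open too long is not described in detail. A minimal sketch, assuming it polls MySQL's `SHOW PROCESSLIST` and treats the `Time` column (seconds a thread has spent in its current state) as a proxy for "open too long", could pick victims and build `KILL` statements. `ProcessRow`, `connections_to_kill`, the protected-user set, and any threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional

# Assumed: accounts whose threads should never be killed (replication and
# event-scheduler threads show up under these names in SHOW PROCESSLIST).
DEFAULT_PROTECTED = frozenset({"system user", "event_scheduler"})


@dataclass(frozen=True)
class ProcessRow:
    """A few of the columns SHOW PROCESSLIST returns."""
    id: int
    user: str
    time: int  # seconds the thread has spent in its current state


def connections_to_kill(
    rows: Iterable[ProcessRow],
    max_seconds: int,
    protected_users: FrozenSet[str] = DEFAULT_PROTECTED,
    own_id: Optional[int] = None,
) -> List[int]:
    """Return ids of threads whose Time exceeds max_seconds (hypothetical policy)."""
    victims: List[int] = []
    for row in rows:
        if row.user in protected_users:
            continue
        if own_id is not None and row.id == own_id:
            continue  # never kill the reaper's own connection
        if row.time > max_seconds:
            victims.append(row.id)
    return victims


def kill_statements(ids: Iterable[int]) -> List[str]:
    # KILL <processlist_id> terminates that connection; int() ensures only a
    # numeric id is interpolated into the SQL text.
    return [f"KILL {int(i)}" for i in ids]
```

Keeping the selection logic free of database I/O makes the kill policy easy to test; the polling loop and the database driver calls are left out on purpose.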
wordName wordTfidf (topN-words)
[('digg', 0.541), ('myisam', 0.25), ('php', 0.22), ('page', 0.134), ('specialized', 0.132), ('icons', 0.125), ('views', 0.115), ('million', 0.11), ('recommendation', 0.099), ('innodb', 0.091), ('database', 0.089), ('faced', 0.083), ('careful', 0.08), ('features', 0.08), ('apache', 0.074), ('recommendations', 0.074), ('servers', 0.073), ('architecturean', 0.073), ('ear', 0.073), ('worksby', 0.073), ('articleslinkedin', 0.073), ('bookmarking', 0.073), ('plugs', 0.073), ('grow', 0.072), ('unique', 0.07), ('visitors', 0.069), ('unsuspecting', 0.068), ('writes', 0.066), ('statsstarted', 0.065), ('appearance', 0.065), ('infrastructureby', 0.065), ('mysql', 0.063), ('plus', 0.063), ('emphasizes', 0.063), ('learnedthe', 0.063), ('apc', 0.063), ('succinct', 0.063), ('balance', 0.061), ('consist', 0.061), ('sequences', 0.061), ('storage', 0.06), ('infrastructure', 0.059), ('fastcgi', 0.059), ('integer', 0.059), ('mogilefs', 0.059), ('famously', 0.059), ('memcachedb', 0.059), ('load', 0.059), ('perhaps', 0.058), ('javascript', 0.058)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999958 554 high scalability-2009-04-04-Digg Architecture
2 0.37434012 512 high scalability-2009-02-14-Scaling Digg and Other Web Applications
Introduction: Joe Stump, Lead Architect at Digg, gave this presentation at the Web 2.0 Expo. I couldn't find the actual presentation, but fortunately Kris Jordan took some great notes. That's how key moments in history are accidentally captured forever. Joe was also kind enough to respond to my email questions with a phone call. In this first part of the post Joe shares some timeless wisdom that you may or may not have read before. I of course take some pains to extract all the wit from the original presentation in favor of simple rules. What really struck me, however, was how Joe thought MemcacheDB will be the biggest new kid on the block in scaling. MemcacheDB has been around for a little while and I've never thought of it in that way. We'll learn why Joe is so excited by MemcacheDB at the end of the post. Impressive Stats: 80th-100th largest site in the world; 26 million uniques a month; 30 million users. Uniques are only half that traffic. Traffic = unique web visitors + APIs + Digg
3 0.36035022 833 high scalability-2010-06-01-Sponsored Post: Get Your High Scalability Fix at Digg
Introduction: Get Your High Scalability Fix at Digg. Interested in working on cutting-edge high-scale infrastructure at Digg? We're making a big investment in scaling and have committed to the NoSQL (Not only SQL) path with Cassandra. We're using other open-source infrastructure to help us scale including Hadoop, RabbitMQ, Zookeeper, Thrift, HDFS and Lucene. We're rewriting Digg from the ground up and we need amazing developers to join our world-class team. If you think you are up for the challenge, or you know someone who might be, take a look at our jobs page for more information.
4 0.21562031 621 high scalability-2009-06-06-Graph server
Introduction: I've seen it mentioned a few times that sites like Digg or LinkedIn use graph servers to hold their social graphs. But the only open source graph server of any sort I've found is http://neo4j.org/. Can anyone recommend an open source graph server? Thanks, Aaron
5 0.21268988 858 high scalability-2010-07-13-Sponsored Post: VoltDB and Digg are Hiring
Introduction: Who's Hiring? VoltDB is Hiring. Get Your High Scalability Fix at Digg. VoltDB Field/Community Engineer: VoltDB is attracting more and more users every day. If you have a strong technical background in SQL and Linux, are experienced with production database deployments, and have a passion for customers and community, you could be just the person we are looking for. Are you excited about the prospect of working with users to develop and deploy VoltDB applications, and about helping users participate in the thriving VoltDB community? If so, read on at their job page. Get Your High Scalability Fix at Digg: Interested in working on cutting-edge high-scale infrastructure at Digg? We're making a big investment in scaling and have committed to the NoSQL (Not only SQL) path with Cassandra. We're using other open-source infrastructure to help us scale including Hadoop, RabbitMQ, Zookeeper, Thrift, HDFS and Lucene. We're rewriting Digg from the ground up and we need
6 0.1982508 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
8 0.17018582 33 high scalability-2007-07-26-ThemBid Architecture
9 0.16469802 808 high scalability-2010-04-12-Poppen.de Architecture
10 0.16057068 638 high scalability-2009-06-26-PlentyOfFish Architecture
11 0.16047792 1361 high scalability-2012-11-22-Gone Fishin': PlentyOfFish Architecture
12 0.15984155 793 high scalability-2010-03-10-Saying Yes to NoSQL; Going Steady with Cassandra at Digg
13 0.14797616 1325 high scalability-2012-09-19-The 4 Building Blocks of Architecting Systems for Scale
14 0.14662163 799 high scalability-2010-03-23-Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL
15 0.14593129 1508 high scalability-2013-08-28-Sean Hull's 20 Biggest Bottlenecks that Reduce and Slow Down Scalability
16 0.14570591 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
17 0.14545371 715 high scalability-2009-10-06-10 Ways to Take your Site from One to One Million Users by Kevin Rose
18 0.14308086 1281 high scalability-2012-07-11-FictionPress: Publishing 6 Million Works of Fiction on the Web
19 0.14207564 1123 high scalability-2011-09-23-The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…
20 0.14193378 106 high scalability-2007-10-02-Secrets to Fotolog's Scaling Success
topicId topicWeight
[(0, 0.252), (1, 0.113), (2, -0.053), (3, -0.186), (4, 0.074), (5, 0.015), (6, -0.071), (7, -0.083), (8, 0.025), (9, 0.081), (10, -0.006), (11, 0.006), (12, -0.008), (13, -0.011), (14, -0.048), (15, -0.049), (16, -0.045), (17, 0.073), (18, -0.059), (19, 0.093), (20, 0.001), (21, 0.077), (22, -0.012), (23, -0.053), (24, 0.05), (25, 0.057), (26, -0.042), (27, 0.035), (28, -0.031), (29, -0.127), (30, 0.095), (31, -0.011), (32, -0.037), (33, 0.009), (34, 0.053), (35, -0.008), (36, -0.038), (37, 0.014), (38, -0.036), (39, -0.03), (40, -0.108), (41, -0.105), (42, -0.072), (43, -0.021), (44, 0.031), (45, -0.052), (46, 0.026), (47, 0.007), (48, 0.032), (49, -0.055)]
simIndex simValue blogId blogTitle
same-blog 1 0.94424605 554 high scalability-2009-04-04-Digg Architecture
2 0.79763001 512 high scalability-2009-02-14-Scaling Digg and Other Web Applications
3 0.70099151 1486 high scalability-2013-07-03-5 Rockin' Tips for Scaling PHP to 30,000 Concurrent Users Per Server
Introduction: Jonathan Block, CTO at RockThePost.com, a crowdfunding company, has written a nice set of tips for smaller sites on how to scale a service on EC2 using a small two-person development team. Their service has a typical small-scale structure: PHP's Zend Framework 2; two m1.medium instances for web servers; ELB to split the load; a master/slave MySQL database; Siege for load testing. The very sensible tips that can handle 30,000 concurrent users per web server: Use PHP's APC feature. APC is an opcode cache that is "really a requirement in order for a website to have a chance at performing well." Put everything that's not a .php request on a CDN. Don't serve static files from your web server. They put everything on S3 and use CloudFront as their CDN. Recent CloudFront problems have caused them to serve directly from S3. Don't make connections to other servers in your PHP code. Making connections to other servers blocks the server and slows down processing. Use the APC k
4 0.69594771 150 high scalability-2007-11-12-Slashdot Architecture - How the Old Man of the Internet Learned to Scale
Introduction: Slashdot effect: overwhelming unprepared sites with an avalanche of readers' clicks after being mentioned on Slashdot. Sure, we now have the "Digg effect" and other hot new stars, but Slashdot was the original. And like many stars from generations past, Slashdot plays the elder statesman's role with class, dignity, and restraint. Yet with millions and millions of users Slashdot is still box office gold and more than keeps up with the young'ins. And with age comes the wisdom of learning how to handle all those users. Just how does Slashdot scale and what can you learn by going old school? Site: http://slashdot.org Information Sources: Slashdot's Setup, Part 1 - Hardware; Slashdot's Setup, Part 2 - Software; History of Slashdot Part 3 - Going Corporate; The History of Slashdot Part 4 - Yesterday, Today, Tomorrow. The Platform: MySQL, Linux (CentOS/RHEL), Pound, Apache, Perl, Memcached, LVS. The Stats: Started building the system in 1999
5 0.69369286 808 high scalability-2010-04-12-Poppen.de Architecture
Introduction: This is a guest post by Alvaro Videla describing their architecture for Poppen.de, a popular German dating site. This site is very much NSFW, so be careful before clicking on the link. What I found most interesting is how they manage to successfully blend a little of the old with a little of the new, using technologies like Nginx, MySQL, CouchDB, Erlang, Memcached, RabbitMQ, PHP, Graphite, Red5, and Tsung. What is Poppen.de? Poppen.de (NSFW) is the top dating website in Germany, and while it may be a small site compared to giants like Flickr or Facebook, we believe it's a nice architecture to learn from if you are starting to get some scaling problems. The Stats: 2,000,000 users; 20,000 concurrent users; 300,000 private messages per day; 250,000 logins per day. We have a team of eleven developers, two designers and two sysadmins for this project. Business Model: The site works with a freemium model, where users can do for free things like: Search
7 0.67838442 7 high scalability-2007-07-12-FeedBurner Architecture
8 0.6778686 106 high scalability-2007-10-02-Secrets to Fotolog's Scaling Success
9 0.67558616 118 high scalability-2007-10-09-High Load on production Webservers after Sourcecode sync
10 0.67026323 203 high scalability-2008-01-07-How Ruby on Rails Survived a 550k Pageview Digging
11 0.66848618 437 high scalability-2008-11-03-How Sites are Scaling Up for the Election Night Crush
12 0.66304022 356 high scalability-2008-07-22-Scaling Bumper Sticker: A 1 Billion Page Per Month Facebook RoR App
13 0.6600346 884 high scalability-2010-08-23-6 Ways to Kill Your Servers - Learning How to Scale the Hard Way
14 0.65622085 261 high scalability-2008-02-25-Make Your Site Run 10 Times Faster
15 0.65277565 711 high scalability-2009-09-22-How Ravelry Scales to 10 Million Requests Using Rails
16 0.64944094 1220 high scalability-2012-04-02-YouPorn - Targeting 200 Million Views a Day and Beyond
17 0.6469022 519 high scalability-2009-02-23-Database Sharding at Netlog, with MySQL and PHP
18 0.64636046 1188 high scalability-2012-02-06-The Design of 99designs - A Clean Tens of Millions Pageviews Architecture
19 0.64194995 52 high scalability-2007-08-01-Product: Memcached
20 0.64174718 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App
topicId topicWeight
[(1, 0.114), (2, 0.245), (10, 0.026), (23, 0.134), (30, 0.046), (40, 0.018), (61, 0.1), (77, 0.012), (79, 0.123), (85, 0.065), (94, 0.048)]
simIndex simValue blogId blogTitle
1 0.95756614 1105 high scalability-2011-08-25-The Cloud and The Consumer: The Impact on Bandwidth and Broadband
Introduction: Cloud-based services for all things digital will either drive – or die by – bandwidth. Consumers, by definition, consume. In the realm of the Internet, they consume far more than they produce. Or so it’s been in the past. Broadband connectivity across all providers has long offered asymmetric network feeds because it mirrored reality: an HTTP request is significantly smaller than its corresponding response, and in general web-based activity is heavily biased toward fat download and thin upload speeds. The term “broadband” is really a misnomer, as it focuses only on the download speed and ignores the very narrowband of a typical consumer’s upload speed. Cloud computing, or to be more accurate, cloud-hosted services aimed at consumers may very well change the status quo by necessity. As providers continue to push the notion of storing all things digital “in the cloud”, network providers must consider the impact on them – and the satisfaction of their customer base with performa
2 0.9568969 654 high scalability-2009-07-09-No to SQL? Anti-database movement gains steam – My Take
Introduction: In this post I wrote my view on the anti-SQL database movement and where the alternative approach fits in: - SQL databases are not going away anytime soon. - The current "one size fits all" database thinking was and is wrong. - There is definitely a place for more specialized data management solutions alongside traditional SQL databases. In addition to the options that were mentioned in the original article, I pointed out the in-memory alternative approach and how it fits into the puzzle. I used a real-life scenario: a scalable social-network-based eCommerce site, where I outlined how the in-memory approach was the only way they could scale and meet their application performance and response time requirements.
3 0.95032501 979 high scalability-2011-01-27-Comet - An Example of the New Key-Code Databases
Introduction: Comet is an active distributed key-value store built at the University of Washington. The paper describing Comet is Comet: An active distributed key-value store; there are also slides, and an MP3 of a presentation given at OSDI '10. Here's a succinct overview of Comet: Today's cloud storage services, such as Amazon S3 or peer-to-peer DHTs, are highly inflexible and impose a variety of constraints on their clients: specific replication and consistency schemes, fixed data timeouts, limited logging, etc. We witnessed such inflexibility first-hand as part of our Vanish work, where we used a DHT to store encryption keys temporarily. To address this issue, we built Comet, an extensible storage service that allows clients to inject snippets of code that control their data's behavior inside the storage service. I found this paper quite interesting because it takes the initial steps of collocating code with a key-value store, which turns it into what might be called a key-code
4 0.94303709 1529 high scalability-2013-10-08-F1 and Spanner Holistically Compared
Introduction: This article, F1: A Distributed SQL Database That Scales by Srihari Srinivasan, is republished with permission from a blog you really should follow: Systems We Make - Curating Complex Distributed Systems. With both the F1 and Spanner papers out, it's now possible to understand their interplay a bit more holistically. So let's start by revisiting the key goals of both systems. Key Goals of F1’s design: System must be able to scale up by adding resources; Ability to re-shard and rebalance data without application changes; ACID consistency for transactions; Full SQL support, support for indexes. Spanner’s objectives: Main focus is on managing cross data center replicated data; Ability to re-shard and rebalance data; Automatically migrates data across machines. F1 – An overview: F1 is built on top of Spanner. Spanner offers support for features such as strong consistency through distributed transactions (2PC), global ordering based on timestam
same-blog 5 0.94208235 554 high scalability-2009-04-04-Digg Architecture
6 0.93935025 1559 high scalability-2013-12-06-Stuff The Internet Says On Scalability For December 6th, 2013
7 0.93587399 1109 high scalability-2011-09-02-Stuff The Internet Says On Scalability For September 2, 2011
9 0.93429732 7 high scalability-2007-07-12-FeedBurner Architecture
10 0.93422455 990 high scalability-2011-02-15-Wordnik - 10 million API Requests a Day on MongoDB and Scala
11 0.9173544 1020 high scalability-2011-04-12-Caching and Processing 2TB Mozilla Crash Reports in memory with Hazelcast
12 0.91562736 849 high scalability-2010-06-28-VoltDB Decapitates Six SQL Urban Myths and Delivers Internet Scale OLTP in the Process
13 0.91553754 960 high scalability-2010-12-20-Netflix: Use Less Chatty Protocols in the Cloud - Plus 26 Fixes
14 0.91362685 1460 high scalability-2013-05-17-Stuff The Internet Says On Scalability For May 17, 2013
15 0.91167253 589 high scalability-2009-05-05-Drop ACID and Think About Data
16 0.91127819 1600 high scalability-2014-02-21-Stuff The Internet Says On Scalability For February 21st, 2014
17 0.91039771 1509 high scalability-2013-08-30-Stuff The Internet Says On Scalability For August 30, 2013
18 0.90996766 1439 high scalability-2013-04-12-Stuff The Internet Says On Scalability For April 12, 2013
19 0.90972203 1112 high scalability-2011-09-07-What Google App Engine Price Changes Say About the Future of Web Architecture
20 0.90895051 1436 high scalability-2013-04-05-Stuff The Internet Says On Scalability For April 5, 2013