high_scalability high_scalability-2010 high_scalability-2010-908 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: The BBC's iPlayer site averages 8 million page views a day for 1.3 million users. Technical Architect Simon Frost describes how they scaled their site in Scaling the BBC iPlayer to handle demand: Use frameworks. Frameworks support component-based development, which makes team development convenient, but they can introduce delays that have to be minimized. Zend/PHP is used because it supports components and is easy to recruit for. MySQL is used for program metadata. CouchDB is used for key-value access for fast read/write of user-focused data. Prove architecture before building it. Eliminate guesswork by coming up with alternate architectures and creating prototypes to determine which option works best. Balance performance with factors like ease of development. Cache a lot. Data is cached in memcached for a few seconds to minutes. Short cache invalidation periods keep the data up to date for the users, but even these short periods make a huge difference in performance.
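The "cache a lot" strategy above, with expiry periods of a few seconds to minutes, can be sketched roughly as follows. This is a minimal illustration, not the BBC's implementation: a plain dictionary stands in for memcached, and the names `ShortTtlCache`, `get_or_load`, and `loader` are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ShortTtlCache:
    """In-process stand-in for memcached (an assumption for this sketch).

    Each value is stored with an expiry time and reloaded once it is stale.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, ttl_seconds: float,
                    loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            # Still fresh: serve from the cache and skip the backend.
            return entry[1]
        # Missing or expired: hit the backend once and cache the result.
        value = loader()
        self._store[key] = (now + ttl_seconds, value)
        return value
```

Even with a time-to-live of a few seconds, every request for the same key inside that window is served without touching the backend, which is where the performance gain described above would come from.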
sentIndex sentText sentNum sentScore
1 The BBC's iPlayer site averages 8 million page views a day for 1. [sent-1, score-0.424]
2 Technical Architect Simon Frost describes how they scaled their site in Scaling the BBC iPlayer to handle demand : Use frameworks . [sent-3, score-0.338]
3 Frameworks support component based development which makes it convenient for team development, but can introduce delays that have to be minimized. [sent-4, score-0.365]
4 Zend/PHP is used because it supports components and is easy to recruit for. [sent-5, score-0.271]
5 CouchDB is used for key-value access for fast read/write of user-focused data. [sent-7, score-0.116]
6 Eliminate guesswork by coming up with alternate architectures and create prototypes to determine which option works best. [sent-9, score-0.314]
7 Data is cached in memcached for a few seconds to minutes. [sent-12, score-0.215]
8 Short cache invalidation periods keep the data up to date for the users, but even these short periods make a huge difference in performance. [sent-13, score-0.937]
9 Much of the invalidation is time or action-based (e. [sent-16, score-0.225]
10 Break the page into personalised and standard components . [sent-19, score-0.296]
11 A common main page is created so that it can be cached separately from personalized data. [sent-20, score-0.695]
12 Varnish's flexible caching policies are used to cache these elements. [sent-23, score-0.393]
13 User favorite lists are cached for as little as a few minutes. [sent-24, score-0.377]
14 Pages are served out of two data centers for high availability. [sent-28, score-0.072]
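Sentences 10 and 11 above describe splitting a page into a common part and a personalised part so that each can be cached separately. A rough sketch of that idea is shown here, assuming a plain dictionary as the fragment cache and leaving expiry out for brevity; `render_page`, `render_common`, `render_personal`, and the key names are hypothetical.

```python
from typing import Callable, Dict


def render_page(user_id: str,
                fragment_cache: Dict[str, str],
                render_common: Callable[[], str],
                render_personal: Callable[[str], str]) -> str:
    """Assemble a page from a shared fragment and a per-user fragment."""
    # The common fragment is stored under one key shared by every user.
    common = fragment_cache.get("page:common")
    if common is None:
        common = render_common()
        fragment_cache["page:common"] = common

    # The personalised fragment is stored under a per-user key.
    personal_key = f"page:user:{user_id}"
    personal = fragment_cache.get(personal_key)
    if personal is None:
        personal = render_personal(user_id)
        fragment_cache[personal_key] = personal

    return common + personal
```

Because the expensive common fragment is rendered once and reused for all users, only the small personalised fragment has to be produced per user.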
wordName wordTfidf (topN-words)
[('iplayer', 0.31), ('bbc', 0.276), ('personalized', 0.238), ('invalidation', 0.225), ('cached', 0.215), ('periods', 0.202), ('varnish', 0.185), ('personalised', 0.165), ('favourite', 0.155), ('recruit', 0.155), ('frost', 0.155), ('smoother', 0.148), ('inscaling', 0.143), ('page', 0.131), ('alternate', 0.121), ('prototypes', 0.121), ('averages', 0.119), ('simon', 0.117), ('used', 0.116), ('cache', 0.113), ('separately', 0.111), ('short', 0.106), ('viewing', 0.105), ('convenient', 0.105), ('site', 0.103), ('delays', 0.103), ('elements', 0.092), ('factors', 0.091), ('date', 0.089), ('policies', 0.087), ('html', 0.085), ('loaded', 0.084), ('favorite', 0.083), ('couchdb', 0.081), ('ease', 0.081), ('introduce', 0.08), ('scaled', 0.079), ('lists', 0.079), ('describes', 0.078), ('frameworks', 0.078), ('development', 0.077), ('horizontal', 0.077), ('caching', 0.077), ('focused', 0.075), ('adds', 0.074), ('served', 0.072), ('eliminate', 0.072), ('determine', 0.072), ('million', 0.071), ('pool', 0.07)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 908 high scalability-2010-09-28-6 Strategies for Scaling BBC iPlayer
2 0.20209405 447 high scalability-2008-11-19-High Definition Video Delivery on the Web?
Introduction: How would you architect and implement an SD and HD internet video delivery system such as the BBC iPlayer or Recast Digital's RDV1 . What do you need to consider on top of the Lessons Learned section in the YouTube Architecture post? How is it possible to compete with the big players like Google? Can you just use a CDN and scale efficiently? Would Amazon's cloud services be a viable platform for high-definition video streaming?
3 0.17406692 903 high scalability-2010-09-17-Hot Scalability Links For Sep 17, 2010
Introduction: Disqus - Scaling the Worlds Largest Django App. Interesting overview of a commenting system with 75 million comments and 250 million visitors. Lots of good details on how they partition their database, testing, continuous integration, feature switches, caching, delayed signals, and more. Things I learnt tracking a billion events in 24 hours : Know your host, Scaling isn't just servers, My servers need to talk to me more, Kill switches for users, What you don't know is the problem, Don't mix server roles, Know your most important users outside of your site. Tweets of Gold: georgebarnett : I read High Scalability for useful articles about large scaling. Sadly though, nothing useful ever shows up. #NoLongerBothering northscale : wow that is fast! :) RT @cgoldberg: was just running > 100k ops/sec against my 2-node #Membase cluster... zazooom #nosql turbofunctor : The root of many (horizontal) scalability problems is an application level access to a writab
4 0.16380179 1321 high scalability-2012-09-12-Using Varnish for Paywalls: Moving Logic to the Edge
Introduction: This is a guest post from Per Buer , founder and CEO of Varnish Software , provider of Varnish Cache, an open source web application accelerator freely available at varnish-cache.org . Varnish powers a lot of really big websites worldwide. We at Varnish Software are all about speed. Varnish Cache is built for speed. It executes its policy code more or less a thousand times faster than your typical Java or PHP based application servers, mostly due to the fact that the configuration is compiled into system call free machine code. System calls require expensive context switches, stall the CPU and wreak havoc in the CPU cache so avoiding them makes the code fly. There are strong limitations on what kind of logic you can move into Varnish Cache, but the logic that you do move there will run very fast. An example is using Varnish for access control to serve access controlled content from the caching edge layer. The Varnish Paywall Who gets to access your content? In a tradi
5 0.15241177 703 high scalability-2009-09-12-How Google Taught Me to Cache and Cash-In
Introduction: A user named Apathy, commenting on how Reddit scales some of its features, shares some advice he learned while working at Google and other major companies. To be fair, I [Apathy] was working at Google at the time, and every job I held between 1995 and 2005 involved at least one of the largest websites on the planet. I didn't come up with any of these ideas, just watched other smart people I worked with who knew what they were doing and found (or wrote) tools that did the same things. But the theme is always the same: Cache everything you can and store the rest in some sort of database (not necessarily relational and not necessarily centralized). Cache everything that doesn't change rapidly. Most of the time you don't have to hit the database for anything other than checking whether the users' new message count has transitioned from 0 to (1 or more). Cache everything--templates, user message status, the front page components--and hit the database once a minute or so to update the fr
6 0.14424789 996 high scalability-2011-02-28-A Practical Guide to Varnish - Why Varnish Matters
7 0.14212985 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
9 0.12086374 821 high scalability-2010-05-03-MocoSpace Architecture - 3 Billion Mobile Page Views a Month
11 0.11284345 638 high scalability-2009-06-26-PlentyOfFish Architecture
12 0.11255335 1361 high scalability-2012-11-22-Gone Fishin': PlentyOfFish Architecture
13 0.10828049 673 high scalability-2009-08-07-Strategy: Break Up the Memcache Dog Pile
14 0.10685629 511 high scalability-2009-02-12-MySpace Architecture
15 0.10510608 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
16 0.10419327 274 high scalability-2008-03-12-YouTube Architecture
17 0.10368844 495 high scalability-2009-01-17-Intro to Caching,Caching algorithms and caching frameworks part 1
18 0.10080893 359 high scalability-2008-07-29-Ehcache - A Java Distributed Cache
19 0.099451721 808 high scalability-2010-04-12-Poppen.de Architecture
20 0.09344919 602 high scalability-2009-05-17-Scaling Django Web Apps by Mike Malone
topicId topicWeight
[(0, 0.152), (1, 0.067), (2, -0.027), (3, -0.155), (4, 0.032), (5, -0.008), (6, -0.04), (7, -0.002), (8, -0.013), (9, 0.036), (10, -0.02), (11, -0.04), (12, 0.018), (13, 0.073), (14, -0.072), (15, -0.021), (16, -0.04), (17, -0.047), (18, 0.026), (19, -0.052), (20, -0.044), (21, 0.037), (22, 0.039), (23, 0.005), (24, -0.057), (25, -0.003), (26, -0.043), (27, 0.091), (28, -0.017), (29, -0.02), (30, -0.042), (31, 0.044), (32, -0.054), (33, -0.029), (34, -0.005), (35, 0.045), (36, -0.034), (37, 0.023), (38, 0.053), (39, 0.036), (40, -0.02), (41, -0.01), (42, -0.074), (43, 0.046), (44, 0.03), (45, -0.033), (46, -0.018), (47, -0.004), (48, -0.022), (49, 0.007)]
simIndex simValue blogId blogTitle
same-blog 1 0.97344947 908 high scalability-2010-09-28-6 Strategies for Scaling BBC iPlayer
2 0.87064016 703 high scalability-2009-09-12-How Google Taught Me to Cache and Cash-In
Introduction: A user named Apathy, commenting on how Reddit scales some of its features, shares some advice he learned while working at Google and other major companies. To be fair, I [Apathy] was working at Google at the time, and every job I held between 1995 and 2005 involved at least one of the largest websites on the planet. I didn't come up with any of these ideas, just watched other smart people I worked with who knew what they were doing and found (or wrote) tools that did the same things. But the theme is always the same: Cache everything you can and store the rest in some sort of database (not necessarily relational and not necessarily centralized). Cache everything that doesn't change rapidly. Most of the time you don't have to hit the database for anything other than checking whether the users' new message count has transitioned from 0 to (1 or more). Cache everything--templates, user message status, the front page components--and hit the database once a minute or so to update the fr
3 0.79722315 52 high scalability-2007-08-01-Product: Memcached
Introduction: Memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. Danga Interactive developed memcached to enhance the speed of LiveJournal.com, a site which was already doing 20 million+ dynamic page views per day for 1 million users with a bunch of webservers and a bunch of database servers. memcached dropped the database load to almost nothing, yielding faster page load times for users, better resource utilization, and faster access to the databases on a memcache miss.
4 0.76375026 436 high scalability-2008-11-02-Strategy: How to Manage Sessions Using Memcached
Introduction: Dormando shows an enlightened middle way for storing sessions in cache and the database. Sessions are a perfect cache candidate because they are transient, smallish, and since they are usually accessed on every page access removing all that load from the database is a good thing. But as Dormando points out session caches have problems. If you remove expiration times from the cache and you run out of memory then no more logins. If a cache server fails or needs to be upgraded then you just logged out a bunch of potentially angry users. The middle ground Dormando proposes is using both the cache and the database: Reads : read from the cache first, then the database. Typical cache logic. Writes : write to memcached every time, write to the database every N seconds (assuming the data has changed). There's a small chance of data loss, but you've still greatly reduced the database load while providing reliability. Nice solution.
5 0.76048678 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
Introduction: The primero recommendation for speeding up a website is almost always to add cache and more cache. And after that add a little more cache just in case. Memcached is almost always given as the recommended cache to use. What we don't often hear is how to effectively use a cache in our own products. MySQL hosted two excellent webinars (referenced below) on the subject of how to deploy and use memcached. The star of the show, other than MySQL of course, is Farhan Mashraqi of Fotolog. You may recall we did an earlier article on Fotolog in Secrets to Fotolog's Scaling Success , which was one of my personal favorites. Fotolog, as they themselves point out, is probably the largest site nobody has ever heard of, pulling in more page views than even Flickr. Fotolog has 51 instances of memcached on 21 servers with 175G in use and 254G available. As a large successful photo-blogging site they have very demanding performance and scaling requirements. To meet those requirements they've developed a
7 0.75258881 673 high scalability-2009-08-07-Strategy: Break Up the Memcache Dog Pile
8 0.73880696 495 high scalability-2009-01-17-Intro to Caching,Caching algorithms and caching frameworks part 1
9 0.72659725 571 high scalability-2009-04-15-Using HTTP cache headers effectively
10 0.72506738 1321 high scalability-2012-09-12-Using Varnish for Paywalls: Moving Logic to the Edge
11 0.71561819 996 high scalability-2011-02-28-A Practical Guide to Varnish - Why Varnish Matters
12 0.71388739 911 high scalability-2010-09-30-More Troubles with Caching
13 0.71285701 1346 high scalability-2012-10-24-Saving Cash Using Less Cache - 90% Savings in the Caching Tier
14 0.71104282 662 high scalability-2009-07-27-Handle 700 Percent More Requests Using Squid and APC Cache
15 0.70719129 359 high scalability-2008-07-29-Ehcache - A Java Distributed Cache
16 0.70492214 998 high scalability-2011-03-03-Stack Overflow Architecture Update - Now at 95 Million Page Views a Month
17 0.70292777 836 high scalability-2010-06-04-Strategy: Cache Larger Chunks - Cache Hit Rate is a Bad Indicator
18 0.69716907 1220 high scalability-2012-04-02-YouPorn - Targeting 200 Million Views a Day and Beyond
19 0.69569588 136 high scalability-2007-10-28-Scaling Early Stage Startups
20 0.68995214 800 high scalability-2010-03-26-Strategy: Caching 404s Saved the Onion 66% on Server Time
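The session strategy summarized in entry 436 of the list above (read from the cache first and then the database; write to the cache every time and to the database at most every N seconds when the data has changed) could look roughly like this. It is a sketch under stated assumptions: two dictionaries stand in for memcached and the database, and `SessionStore`, `flush_interval`, and the method names are hypothetical.

```python
import time
from typing import Callable, Dict, Optional


class SessionStore:
    """Cache-first reads with throttled database writes (illustrative only)."""

    def __init__(self, flush_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cache: Dict[str, dict] = {}     # stand-in for memcached
        self.database: Dict[str, dict] = {}  # stand-in for the database
        self._flush_interval = flush_interval
        self._clock = clock
        self._last_flush: Dict[str, float] = {}

    def read(self, session_id: str) -> Optional[dict]:
        # Reads: try the cache first, then fall back to the database.
        data = self.cache.get(session_id)
        if data is None:
            data = self.database.get(session_id)
            if data is not None:
                self.cache[session_id] = dict(data)  # repopulate the cache
        return dict(data) if data is not None else None

    def write(self, session_id: str, data: dict) -> None:
        # Writes: always update the cache.
        self.cache[session_id] = dict(data)
        now = self._clock()
        last = self._last_flush.get(session_id)
        changed = self.database.get(session_id) != data
        due = last is None or now - last >= self._flush_interval
        # Persist only when the data changed and the interval has elapsed.
        if changed and due:
            self.database[session_id] = dict(data)
            self._last_flush[session_id] = now
```

Writes that land between flushes exist only in the cache, which is the small chance of data loss the entry mentions. In return, the database sees at most one write per session per interval.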
topicId topicWeight
[(1, 0.148), (2, 0.236), (11, 0.246), (30, 0.056), (61, 0.07), (79, 0.023), (85, 0.019), (94, 0.101)]
simIndex simValue blogId blogTitle
1 0.92451662 25 high scalability-2007-07-25-Paper: Designing Disaster Tolerant High Availability Clusters
Introduction: A very detailed (339 pages) paper on how to use HP products to create a highly available cluster. It's somewhat dated and obviously concentrates on HP products, but it is still good information. Table of contents: 1. Disaster Tolerance and Recovery in a Serviceguard Cluster 2. Building an Extended Distance Cluster Using ServiceGuard 3. Designing a Metropolitan Cluster 4. Designing a Continental Cluster 5. Building Disaster-Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP 6. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF 7. Cascading Failover in a Continental Cluster Evaluating the Need for Disaster Tolerance What is a Disaster Tolerant Architecture? Types of Disaster Tolerant Clusters Extended Distance Clusters Metropolitan Cluster Continental Cluster Continental Cluster With Cascading Failover Disaster Tolerant Architecture Guidelines Protecting Nodes through Geographic Dispersion Protecting Data th
2 0.89325154 668 high scalability-2009-08-01-15 Scalability and Performance Best Practices
Introduction: These are from Laura Thomson of OmniTi : Profile early, profile often. Pick a profiling tool and learn it in and out. Dev-ops cooperation is essential. The most critical difference in organizations that handle crises well. Test on production data. Code behavior (especially performance) is often data driven. Track and trend. Understanding your historical performance characteristics is essential for spotting emerging problems. Assumptions will burn you. Systems are complex and often break in unexpected ways. Decouple. Isolate performance failures. Cache. Caching is the core of most optimizations. Federate. Data federation is taking a single data set and spreading it across multiple database/application servers. Replicate. Replication is making synchronized copies of data available in more than one place. Avoid straining hard-to-scale resources. Some resources are inherently hard to scale: ‘Uncacheable’ data, Data with a very high read+write rate
3 0.88292032 134 high scalability-2007-10-26-Paper: Wikipedia's Site Internals, Configuration, Code Examples and Management Issues
Introduction: Wikipedia and Wikimedia have some of the best, most complete real-world documentation on how to build highly scalable systems. This paper by Domas Mituzas covers a lot of details about how Wikipedia works, including: an overview of the different packages used (Linux, PowerDNS, LVS, Squid, lighttpd, Apache, PHP5, Lucene, Mono, Memcached), how they use their CDN, how caching works, how they profile their code, how they store their media, how they structure their database access, how they handle search, how they handle load balancing and administration. All with real code examples and examples of configuration files. This is a really useful resource. Related Articles Wikimedia Architecture Domas Mituzas' Blog
same-blog 4 0.8594346 908 high scalability-2010-09-28-6 Strategies for Scaling BBC iPlayer
Introduction: Snooze is an open-source, scalable, autonomic, and energy-efficient virtual machine (VM) management framework for private clouds. Like other VM management frameworks such as Nimbus, OpenNebula, Eucalyptus, and OpenStack, it allows compute infrastructures to be built from virtualized resources. In particular, once it is installed and configured, users can submit and control the life-cycle of a large number of VMs. However, contrary to existing frameworks, for scalability and fault tolerance Snooze employs a self-organizing and healing (based on Apache ZooKeeper) hierarchical architecture. Moreover, it performs distributed VM management and is designed to be energy efficient. Therefore, it implements features to monitor and estimate VM resource (CPU, memory, network Rx, network Tx) demands, detect and resolve overload/underload situations, perform dynamic VM consolidation through live migration, and finally power management to save energy. Last but not least, it integrates a g
6 0.8151378 771 high scalability-2010-02-04-Hot Scalability Links for February 4, 2010
7 0.80688626 1055 high scalability-2011-06-08-Stuff to Watch from Google IO 2011
8 0.80523568 136 high scalability-2007-10-28-Scaling Early Stage Startups
9 0.80057859 1236 high scalability-2012-04-30-Masstree - Much Faster than MongoDB, VoltDB, Redis, and Competitive with Memcached
10 0.7997461 105 high scalability-2007-10-01-Statistics Logging Scalability
11 0.79406035 457 high scalability-2008-12-01-Sun FireTM X4540 Server as Backup Server for Zmanda's Amanda Enterprise 2.6 Software
12 0.78400648 5 high scalability-2007-07-10-mixi.jp Architecture
13 0.77945864 699 high scalability-2009-09-10-How to handle so many socket connection
14 0.76814014 942 high scalability-2010-11-15-Strategy: Biggest Performance Impact is to Reduce the Number of HTTP Requests
15 0.76786751 303 high scalability-2008-04-18-Scaling Mania at MySQL Conference 2008
16 0.75221044 1220 high scalability-2012-04-02-YouPorn - Targeting 200 Million Views a Day and Beyond
17 0.75142372 356 high scalability-2008-07-22-Scaling Bumper Sticker: A 1 Billion Page Per Month Facebook RoR App
18 0.74978632 72 high scalability-2007-08-22-Wikimedia architecture
19 0.74935079 1123 high scalability-2011-09-23-The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…
20 0.74899393 256 high scalability-2008-02-21-Tracking usage of public resources - throttling accesses per hour