high_scalability high_scalability-2007 high_scalability-2007-5 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Mixi is a fast-growing social networking site in Japan. They provide services like: diary, community, message, review, and photo album. Having a lot in common with LiveJournal, they also developed many of the same approaches. Their write-up on how they scaled their system is easily one of the best out there. Site: http://mixi.jp Information Sources mixi.jp - scaling out with open source Platform Linux Apache MySQL Perl Memcached Squid Shard What's Inside? They grew to approximately 4 million users in two years and add over 15,000 new users/day. Ranks 35th on Alexa and 3rd in Japan. More than 100 MySQL servers. Add more than 10 servers/month. Use non-persistent connections. Diary traffic is 85% read and 15% write. Message traffic is 75% read and 25% write. Ran into replication performance problems so they had to split the database. Considered splitting vertically by user or splitting horizontally by table type. The ende
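The post reports that mixi ended up partitioning by table type and by user, with a partitioning key deciding which database holds a given piece of data. A minimal sketch of that kind of routing follows. The host names, the group size, and the `shard_for` helper are illustrative assumptions, not details from mixi's write-up.

```python
# Hypothetical shard map: (table type, user-group index) -> database host.
# Host names and the group size are illustrative assumptions only.
USERS_PER_GROUP = 1000

SHARD_MAP = {
    ("message", 0): "msg-db-01",
    ("message", 1): "msg-db-02",
    ("diary", 0): "diary-db-01",
    ("diary", 1): "diary-db-02",
}


def shard_for(table_type: str, user_id: int) -> str:
    """Return the database host holding `table_type` rows for `user_id`.

    The partitioning key is the pair (table type, user group), so all
    messages for one group of users land on the same database.
    """
    group = user_id // USERS_PER_GROUP
    try:
        return SHARD_MAP[(table_type, group)]
    except KeyError:
        raise LookupError(
            f"no shard configured for {table_type!r}, user group {group}"
        ) from None
```

Keeping the map as explicit data rather than a formula is one possible design choice: it lets a group be reassigned to another host by changing a single entry, at the cost of maintaining the map.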
sentIndex sentText sentNum sentScore
1 Mixi is a fast-growing social networking site in Japan. [sent-1, score-0.116]
2 They grew to approximately 4 million users in two years and add over 15,000 new users/day. [sent-8, score-0.395]
3 Considered splitting vertically by user or splitting horizontally by table type. [sent-14, score-0.665]
4 They ended up partitioning by table type and user. [sent-15, score-0.619]
5 So all the messages for a group of users would be assigned to a particular database. [sent-16, score-0.327]
6 Partitioning key is used to decide in which database data should be stored. [sent-17, score-0.151]
7 Stores more than 8 TB of images with about 23 GB added per day. [sent-19, score-0.334]
8 MySQL is only used to store metadata about the images, not the images themselves. [sent-20, score-0.419]
9 Images are either frequently accessed or rarely accessed. [sent-21, score-0.484]
10 Frequently accessed images are cached using Squid on multiple machines. [sent-22, score-0.557]
11 Rarely accessed images are served from the file system. [sent-23, score-0.557]
12 Lessons Learned When using dynamic partitioning it's difficult to pick keys and algorithms for where data should be stored. [sent-25, score-0.527]
13 Once you partition data you can no longer do joins and you have to open a lot of connections to different databases to merge the data back together. [sent-26, score-0.452]
14 It's hard to add new hosts and rearrange data when you partition. [sent-27, score-0.404]
15 For example, let's say your partitioning algorithm stores all the messages for users 1-N on host 1. [sent-28, score-0.703]
16 Now let's say host 1 becomes overburdened and you want to repartition users across more hosts. [sent-29, score-0.621]
17 By using distributed memory caching they rarely hit the DB and their average page load time is about . [sent-31, score-0.298]
18 You will often have to develop strategies based on the type of content. [sent-34, score-0.148]
19 For example, images will be treated differently than short text posts. [sent-35, score-0.268]
20 Social networking sites are very time oriented, so it might be useful to partition data by time as well as user and type. [sent-36, score-0.329]
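The repartitioning example in sentences 15 and 16 (users 1-N on host 1, then host 1 becomes overburdened) can be made concrete with a small sketch. It assumes a simple range-based layout. The `host_for` and `users_to_move` helpers and the host names are hypothetical, chosen only to show how much data a split forces you to relocate.

```python
import bisect


def host_for(user_id, ranges):
    """Look up the host for `user_id`.

    `ranges` is a list of (inclusive upper bound, host) pairs sorted by bound.
    """
    bounds = [upper for upper, _ in ranges]
    index = bisect.bisect_left(bounds, user_id)
    if index == len(ranges):
        raise LookupError(f"user {user_id} is outside every configured range")
    return ranges[index][1]


def users_to_move(user_ids, old_ranges, new_ranges):
    """Return the users whose data must be copied to a different host."""
    return [
        user_id
        for user_id in user_ids
        if host_for(user_id, old_ranges) != host_for(user_id, new_ranges)
    ]


# Before: host1 stores users 1-1000, host2 stores users 1001-2000.
before = [(1000, "host1"), (2000, "host2")]
# After: host1 is overburdened, so its upper half moves to a new host3.
after = [(500, "host1"), (1000, "host3"), (2000, "host2")]

moved = users_to_move(range(1, 2001), before, after)
print(len(moved), moved[0], moved[-1])  # 500 501 1000
```

In this sketch, splitting one host means rewriting the range table and physically moving half of that host's users, which is the rearranging cost the lesson warns about.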
wordName wordTfidf (topN-words)
[('images', 0.334), ('partitioning', 0.265), ('accessed', 0.223), ('rarely', 0.183), ('splitting', 0.177), ('diary', 0.166), ('overburdened', 0.166), ('rearrange', 0.156), ('type', 0.148), ('gb', 0.144), ('partition', 0.139), ('repartition', 0.138), ('alexa', 0.135), ('livejournal', 0.126), ('messages', 0.121), ('host', 0.119), ('networking', 0.116), ('caching', 0.115), ('vertically', 0.114), ('table', 0.114), ('difficult', 0.113), ('users', 0.112), ('squid', 0.108), ('treated', 0.105), ('approximately', 0.099), ('photo', 0.098), ('grew', 0.095), ('assigned', 0.094), ('profit', 0.094), ('ended', 0.092), ('review', 0.091), ('differently', 0.089), ('add', 0.089), ('merge', 0.088), ('say', 0.086), ('tb', 0.086), ('hosts', 0.085), ('metadata', 0.085), ('horizontally', 0.083), ('traffic', 0.081), ('scaled', 0.079), ('associated', 0.079), ('frequently', 0.078), ('joins', 0.077), ('decide', 0.077), ('oriented', 0.077), ('let', 0.076), ('pick', 0.075), ('data', 0.074), ('text', 0.074)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 5 high scalability-2007-07-10-mixi.jp Architecture
Introduction: Mixi is a fast-growing social networking site in Japan. They provide services like: diary, community, message, review, and photo album. Having a lot in common with LiveJournal, they also developed many of the same approaches. Their write-up on how they scaled their system is easily one of the best out there. Site: http://mixi.jp Information Sources mixi.jp - scaling out with open source Platform Linux Apache MySQL Perl Memcached Squid Shard What's Inside? They grew to approximately 4 million users in two years and add over 15,000 new users/day. Ranks 35th on Alexa and 3rd in Japan. More than 100 MySQL servers. Add more than 10 servers/month. Use non-persistent connections. Diary traffic is 85% read and 15% write. Message traffic is 75% read and 25% write. Ran into replication performance problems so they had to split the database. Considered splitting vertically by user or splitting horizontally by table type. The ende
2 0.20525019 319 high scalability-2008-05-14-Scaling an image upload service
Introduction: Hi, First of all I want to say that this is an extremely interesting and informative website. I have enjoyed reading the various posts on how the big sites scale to meet the needs of their customers. The service we are developing is a webcam service. The client application sends images to the server via HTTP POST and they are saved in a folder specified by the user's ID. When a new image is sent to the server it will overwrite the current image. Users can then view the images via our web server. Ideally we want the images to upload as quickly as possible and allow users to view them as quickly as possible. Would I be correct to assume that when the number of uploading clients exceeds the capability of the server the only way to scale is to add more hardware? Also I assume that using HTTP accelerator caches will not speed up viewing the images as the new images will invalidate the cache. I appreciate any input on the subject.
3 0.18157046 72 high scalability-2007-08-22-Wikimedia architecture
Introduction: Wikimedia is the platform on which Wikipedia, Wiktionary, and the other seven wiki dwarfs are built. This document is just excellent for the student trying to scale the heights of giant websites. It is full of details and innovative ideas that have been proven on some of the most used websites on the internet. Site: http://wikimedia.org/ Information Sources Wikimedia architecture http://meta.wikimedia.org/wiki/Wikimedia_servers scale-out vs scale-up in the from Oracle to MySQL blog. Platform Apache Linux MySQL PHP Squid LVS Lucene for Search Memcached for Distributed Object Cache Lighttpd Image Server The Stats 8 million articles spread over hundreds of language projects (English, Dutch, ...) 10th busiest site in the world (source: Alexa) Exponential growth: doubling every 4-6 months in terms of visitors / traffic / servers 30 000 HTTP requests/s during peak-time 3 Gbit/s of data traffic 3 data centers: Tampa, A
4 0.18123859 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
Introduction: Real-time social graphs (connectivity between people, places, and things). That's why scaling Facebook is hard, says Jeff Rothschild, Vice President of Technology at Facebook. Social networking sites like Facebook, Digg, and Twitter are simply harder than traditional websites to scale. Why is that? Why would social networking sites be any more difficult to scale than traditional web sites? Let's find out. Traditional websites are easier to scale than social networking sites for two reasons: They usually access only their own data and common cached data. Only 1-2% of users are active on the site at one time. Imagine a huge site like Yahoo. When you come to Yahoo they can get your profile record with one get and that's enough to build your view of the website for you. It's relatively straightforward to scale systems based around single records using distributed hashing schemes. And since only a few percent of the people are on the site at once it takes comparatively little
5 0.17481293 106 high scalability-2007-10-02-Secrets to Fotolog's Scaling Success
Introduction: Fotolog, a social blogging site centered around photos, grew from about 300 thousand users in 2004 to over 11 million users in 2007. Though they initially experienced the inevitable pains of rapid growth, they overcame their problems and now manage over 300 million photos and 800,000 new photos are added each day. Generating all that fabulous content are 20 million unique monthly visitors and a volunteer army of 30,000 new users each day. They did so well a very impressed suitor bought them out for a cool $90 million. That's scale meets success by anyone's standards. How did they do it? Site: http://www.fotolog.com Information Sources Scaling the World's Largest Photo Blogging Community Congrats to Fotolog on $90mm sale to Hi-Media Fotolog overtaking Flickr? Fotolog Hits 11 Million Members and 300 Million Photos Posted Site of the Week: Fotolog.com by PC Magazine CEO John Borthwick's Blog. DBA Frank Mash's Blog Fotolog, lessons learnt by John B
6 0.163634 18 high scalability-2007-07-16-Paper: MySQL Scale-Out by application partitioning
7 0.15751454 511 high scalability-2009-02-12-MySpace Architecture
8 0.15079604 1268 high scalability-2012-06-20-Ask HighScalability: How do I organize millions of images?
10 0.14885084 808 high scalability-2010-04-12-Poppen.de Architecture
11 0.14347808 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App
12 0.14214481 291 high scalability-2008-03-29-20 New Rules for Faster Web Pages
13 0.14087579 1361 high scalability-2012-11-22-Gone Fishin': PlentyOfFish Architecture
14 0.14041325 638 high scalability-2009-06-26-PlentyOfFish Architecture
15 0.13273953 152 high scalability-2007-11-13-Flickr Architecture
16 0.13173673 297 high scalability-2008-04-05-Skype Plans for PostgreSQL to Scale to 1 Billion Users
17 0.13124685 274 high scalability-2008-03-12-YouTube Architecture
18 0.1293674 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
19 0.12864642 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
20 0.12693086 261 high scalability-2008-02-25-Make Your Site Run 10 Times Faster
topicId topicWeight
[(0, 0.202), (1, 0.134), (2, -0.051), (3, -0.167), (4, 0.036), (5, 0.046), (6, -0.017), (7, -0.057), (8, -0.0), (9, 0.034), (10, 0.015), (11, 0.016), (12, -0.007), (13, 0.05), (14, 0.008), (15, 0.047), (16, -0.055), (17, 0.004), (18, 0.046), (19, 0.036), (20, -0.024), (21, 0.08), (22, -0.029), (23, -0.008), (24, -0.015), (25, -0.02), (26, 0.007), (27, 0.01), (28, 0.016), (29, -0.02), (30, 0.071), (31, -0.044), (32, -0.033), (33, 0.015), (34, 0.006), (35, -0.045), (36, 0.007), (37, 0.045), (38, 0.066), (39, -0.015), (40, 0.004), (41, -0.009), (42, 0.025), (43, -0.009), (44, -0.057), (45, 0.016), (46, 0.019), (47, -0.01), (48, -0.076), (49, 0.076)]
simIndex simValue blogId blogTitle
same-blog 1 0.95907038 5 high scalability-2007-07-10-mixi.jp Architecture
Introduction: Mixi is a fast-growing social networking site in Japan. They provide services like: diary, community, message, review, and photo album. Having a lot in common with LiveJournal, they also developed many of the same approaches. Their write-up on how they scaled their system is easily one of the best out there. Site: http://mixi.jp Information Sources mixi.jp - scaling out with open source Platform Linux Apache MySQL Perl Memcached Squid Shard What's Inside? They grew to approximately 4 million users in two years and add over 15,000 new users/day. Ranks 35th on Alexa and 3rd in Japan. More than 100 MySQL servers. Add more than 10 servers/month. Use non-persistent connections. Diary traffic is 85% read and 15% write. Message traffic is 75% read and 25% write. Ran into replication performance problems so they had to split the database. Considered splitting vertically by user or splitting horizontally by table type. The ende
2 0.77803528 391 high scalability-2008-09-23-The 7 Stages of Scaling Web Apps
Introduction: By John Engates, CTO, Rackspace. Good presentation of the stages a typical successful website goes through: Stage 1 - The Beginning: Simple architecture, low complexity, no redundancy. Firewall, load balancer, a pair of web servers, database server, and internal storage. Stage 2 - More of the same, just bigger. Stage 3 - The Pain Begins: publicity hits. Use reverse proxy, cache static content, load balancers, more databases, re-coding. Stage 4 - The Pain Intensifies: caching with memcached, writes overload and replication takes too long, start database partitioning, shared storage makes sense for content, significant re-architecting for DB. Stage 5 - This Really Hurts!: rethink entire application, partition on geography, user ID, etc., create user clusters, using hashing scheme for locating which user belongs to which cluster. Stage 6 - Getting a little less painful: scalable application and database architecture, acceptable performance, starting to add new features again, op
3 0.77492523 106 high scalability-2007-10-02-Secrets to Fotolog's Scaling Success
Introduction: Fotolog, a social blogging site centered around photos, grew from about 300 thousand users in 2004 to over 11 million users in 2007. Though they initially experienced the inevitable pains of rapid growth, they overcame their problems and now manage over 300 million photos and 800,000 new photos are added each day. Generating all that fabulous content are 20 million unique monthly visitors and a volunteer army of 30,000 new users each day. They did so well a very impressed suitor bought them out for a cool $90 million. That's scale meets success by anyone's standards. How did they do it? Site: http://www.fotolog.com Information Sources Scaling the World's Largest Photo Blogging Community Congrats to Fotolog on $90mm sale to Hi-Media Fotolog overtaking Flickr? Fotolog Hits 11 Million Members and 300 Million Photos Posted Site of the Week: Fotolog.com by PC Magazine CEO John Borthwick's Blog. DBA Frank Mash's Blog Fotolog, lessons learnt by John B
4 0.76939559 511 high scalability-2009-02-12-MySpace Architecture
Introduction: Update: Presentation: Behind the Scenes at MySpace.com. Dan Farino, Chief Systems Architect at MySpace shares details of some of MySpace's cool internal operations tools. MySpace.com is one of the fastest growing sites on the Internet with 65 million subscribers and 260,000 new users registering each day. Often criticized for poor performance, MySpace has had to tackle scalability issues few other sites have faced. How did they do it? Site: http://myspace.com Information Sources Presentation: Behind the Scenes at MySpace.com Inside MySpace.com Platform ASP.NET 2.0 Windows IIS SQL Server What's Inside? 300 million users. Pushes 100 gigabits/second to the internet. 10Gb/sec is HTML content. 4,500+ web servers Windows 2003/IIS 6.0/ASP.NET. 1,200+ cache servers running 64-bit Windows 2003. 16GB of objects cached in RAM. 500+ database servers running 64-bit Windows and SQL Server 2005. MySpace processes 1.5 Billion page views per day and
5 0.76718915 297 high scalability-2008-04-05-Skype Plans for PostgreSQL to Scale to 1 Billion Users
Introduction: Skype uses PostgreSQL as their backend database. PostgreSQL doesn't get enough run in the database world so I was excited to see how PostgreSQL is used "as the main DB for most of [Skype's] business needs." Their approach is to use a traditional stored procedure interface for accessing data and on top of that layer proxy servers which hash SQL requests to a set of database servers that actually carry out queries. The result is a horizontally partitioned system that they think will scale to handle 1 billion users. Skype's goal is an architecture that can handle 1 billion plus users. This level of scale isn't practically solvable with one really big computer, so our masked superhero horizontal scaling comes to the rescue. Hardware is dual or quad Opterons with SCSI RAID. Followed common database progression: Start with one DB. Add new databases partitioned by functionality. Replicate read-mostly data for better read access. Then horizontally partition data across multiple nod
6 0.75469035 512 high scalability-2009-02-14-Scaling Digg and Other Web Applications
7 0.7532804 68 high scalability-2007-08-20-TypePad Architecture
8 0.75113595 72 high scalability-2007-08-22-Wikimedia architecture
9 0.7417497 152 high scalability-2007-11-13-Flickr Architecture
10 0.72215796 821 high scalability-2010-05-03-MocoSpace Architecture - 3 Billion Mobile Page Views a Month
11 0.72185057 136 high scalability-2007-10-28-Scaling Early Stage Startups
12 0.71766472 437 high scalability-2008-11-03-How Sites are Scaling Up for the Election Night Crush
13 0.71462762 554 high scalability-2009-04-04-Digg Architecture
14 0.71380246 261 high scalability-2008-02-25-Make Your Site Run 10 Times Faster
15 0.71140242 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
16 0.71133685 1205 high scalability-2012-03-07-Scale Indefinitely on S3 With These Secrets of the S3 Masters
17 0.71071595 481 high scalability-2009-01-02-Strategy: Understanding Your Data Leads to the Best Scalability Solutions
18 0.70867598 473 high scalability-2008-12-20-Second Life Architecture - The Grid
19 0.70385045 150 high scalability-2007-11-12-Slashdot Architecture - How the Old Man of the Internet Learned to Scale
20 0.70376807 602 high scalability-2009-05-17-Scaling Django Web Apps by Mike Malone
topicId topicWeight
[(1, 0.16), (2, 0.278), (10, 0.049), (11, 0.111), (30, 0.077), (47, 0.039), (61, 0.066), (77, 0.01), (79, 0.064), (94, 0.061)]
simIndex simValue blogId blogTitle
1 0.97078288 908 high scalability-2010-09-28-6 Strategies for Scaling BBC iPlayer
Introduction: The BBC's iPlayer site averages 8 million page views a day for 1.3 million users. Technical Architect Simon Frost describes how they scaled their site in Scaling the BBC iPlayer to handle demand: Use frameworks. Frameworks support component based development which makes it convenient for team development, but can introduce delays that have to be minimized. Zend/PHP is used because it supports components and is easy to recruit for. MySQL is used for program metadata. CouchDB is used for key-value access for fast read/write of user-focused data. Prove architecture before building it. Eliminate guesswork by coming up with alternate architectures and create prototypes to determine which option works best. Balance performance with factors like ease of development. Cache a lot. Data is cached in memcached for a few seconds to minutes. Short cache invalidation periods keep the data up to date for the users, but even these short periods make a huge difference in performance.
2 0.96329439 136 high scalability-2007-10-28-Scaling Early Stage Startups
Introduction: Mark Maunder of No VC Required --who advocates not taking VC money lest you be turned into a frog instead of the prince (or princess) you were dreaming of--has an excellent slide deck on how to scale an early stage startup. His blog also has some good SEO tips and a very spooky widget showing the geographical location of his readers. Perfect for Halloween! What are Mark's otherworldly scaling strategies for startups? Site: http://novcrequired.com/ Information Sources Slides from Seattle Tech Startup Talk. Scaling Early Stage Startups blog post by Mark Maunder. The Platform Linux An ISAM type data store. Perl Httperf is used for benchmarking. Websitepulse.com is used for perf monitoring. The Architecture Performance matters because being slow could cost you 20% of your revenue. The UIE guys disagree saying this ain't necessarily so. They explain their reasoning in Usability Tools Podcast: The Truth About Page Download Time. The idea i
same-blog 3 0.958103 5 high scalability-2007-07-10-mixi.jp Architecture
Introduction: Mixi is a fast-growing social networking site in Japan. They provide services like: diary, community, message, review, and photo album. Having a lot in common with LiveJournal, they also developed many of the same approaches. Their write-up on how they scaled their system is easily one of the best out there. Site: http://mixi.jp Information Sources mixi.jp - scaling out with open source Platform Linux Apache MySQL Perl Memcached Squid Shard What's Inside? They grew to approximately 4 million users in two years and add over 15,000 new users/day. Ranks 35th on Alexa and 3rd in Japan. More than 100 MySQL servers. Add more than 10 servers/month. Use non-persistent connections. Diary traffic is 85% read and 15% write. Message traffic is 75% read and 25% write. Ran into replication performance problems so they had to split the database. Considered splitting vertically by user or splitting horizontally by table type. The ende
Introduction: Snooze is an open-source, scalable, autonomic, and energy-efficient virtual machine (VM) management framework for private clouds. Similarly to other VM management frameworks such as Nimbus, OpenNebula, Eucalyptus, and OpenStack, it allows building compute infrastructures from virtualized resources. In particular, once it is installed and configured, users can submit and control the life-cycle of a large number of VMs. However, contrary to existing frameworks, for scalability and fault tolerance Snooze employs a self-organizing and healing (based on Apache ZooKeeper) hierarchical architecture. Moreover, it performs distributed VM management and is designed to be energy efficient. Therefore, it implements features to monitor and estimate VM resource (CPU, memory, network Rx, network Tx) demands, detect and resolve overload/underload situations, perform dynamic VM consolidation through live migration, and finally power management to save energy. Last but not least, it integrates a g
5 0.94617116 771 high scalability-2010-02-04-Hot Scalability Links for February 4, 2010
Introduction: Lots of cool stuff happening this week... Voldemort gets rebalancing. It's one thing to shard data to scale, it's a completely different level of functionality to manage those shards intelligently. Voldemort has stepped up by adding advanced rebalancing functionality: Dynamic addition of new nodes to the cluster; Deletion of nodes from cluster; Load balancing of data inside a cluster. Microsoft Finally Opens Azure for Business. Out of the blue Microsoft opens up their platform as a service service. Good to have more competition and we'll keep an eye out for experience reports. New details on LinkedIn architecture by Greg Linden. LinkedIn appears to only use caching minimally, preferring to spend their efforts and machine resources on making sure they can recompute computations quickly than on hiding poor performance behind caching layers. The end of SQL and relational databases? by David Intersimone. For new projects, I believe, we have genuine non-relational a
6 0.94271678 256 high scalability-2008-02-21-Tracking usage of public resources - throttling accesses per hour
7 0.9375301 105 high scalability-2007-10-01-Statistics Logging Scalability
9 0.9326781 1321 high scalability-2012-09-12-Using Varnish for Paywalls: Moving Logic to the Edge
10 0.93195158 1055 high scalability-2011-06-08-Stuff to Watch from Google IO 2011
11 0.93182248 942 high scalability-2010-11-15-Strategy: Biggest Performance Impact is to Reduce the Number of HTTP Requests
12 0.93151999 72 high scalability-2007-08-22-Wikimedia architecture
13 0.93060803 134 high scalability-2007-10-26-Paper: Wikipedia's Site Internals, Configuration, Code Examples and Management Issues
14 0.92903405 1076 high scalability-2011-07-08-Stuff The Internet Says On Scalability For July 8, 2011
15 0.92846328 25 high scalability-2007-07-25-Paper: Designing Disaster Tolerant High Availability Clusters
16 0.92676324 668 high scalability-2009-08-01-15 Scalability and Performance Best Practices
17 0.9252705 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
18 0.92443252 821 high scalability-2010-05-03-MocoSpace Architecture - 3 Billion Mobile Page Views a Month
19 0.92419678 1364 high scalability-2012-11-29-Performance data for LevelDB, Berkley DB and BangDB for Random Operations
20 0.92416102 77 high scalability-2007-08-30-Log Everything All the Time