high_scalability high_scalability-2007 high_scalability-2007-3 knowledge-graph by maker-knowledge-mining

3 high scalability-2007-07-09-LiveJournal Architecture

meta info for this blog

Source: html

Introduction: A fascinating and detailed story of how LiveJournal evolved their system to scale. LiveJournal was an early player in the free blog service race and faced issues from quickly adding a large number of users. Blog posts come fast and furious which causes a lot of writes and writes are particularly hard to scale. Understanding how LiveJournal faced their scaling problems will help any aspiring website builder.

Site: http://www.livejournal.com/

Information Sources: LiveJournal - Behind The Scenes Scaling Storytime; Google Video; Tokyo Video; 2005 version

Platform: Linux, MySQL, Perl, Memcached, MogileFS, Apache

What's Inside? Scaling from 1, 2, and 4 hosts to cluster of servers. Avoid single points of failure. Using MySQL replication only takes you so far. Becoming IO bound kills scaling. Spread out writes and reads for more parallelism. You can't keep adding read slaves and scale. Shard storage approach, using DRBD, for maxim

Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A fascinating and detailed story of how LiveJournal evolved their system to scale. [sent-1, score-0.157]

2 LiveJournal was an early player in the free blog service race and faced issues from quickly adding a large number of users. [sent-2, score-0.684]

3 Blog posts come fast and furious which causes a lot of writes and writes are particularly hard to scale. [sent-3, score-0.648]

4 Understanding how LiveJournal faced their scaling problems will help any aspiring website builder. [sent-4, score-0.423]

5 Scaling from 1, 2, and 4 hosts to cluster of servers. [sent-8, score-0.081]

6 Spread out writes and reads for more parallelism. [sent-12, score-0.143]

7 Shard storage approach, using DRBD, for maximal throughput. [sent-14, score-0.131]

8 MogileFS, a distributed file system, for parallelism. [sent-19, score-0.212]

9 TheSchwartz and Gearman for distributed job queuing to do more work in parallel. [sent-20, score-0.259]

10 Lessons Learned: Don't be afraid to write your own software to solve your own problems. [sent-22, score-0.076]

11 LiveJournal has provided incredible value to the community through their efforts. [sent-23, score-0.085]

12 Sites can evolve from small 1, 2 machine setups to larger systems as they learn about their users and what their system really needs to do. [sent-24, score-0.21]

13 Remove choke points by caching, load balancing, sharding, clustering file systems, and making use of more disk spindles. [sent-26, score-0.502]

14 You can't just keep adding more and more read slaves and expect to scale. [sent-28, score-0.313]

15 Low level issues like which OS event notification mechanism to use, file system and disk interactions, threading and event models, and connection types, matter at scale. [sent-29, score-0.732]

16 Large sites eventually turn to a distributed queuing and scheduling mechanism to distribute large work loads across a grid. [sent-30, score-0.549]
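
The read-slave point in sentence 14 can be illustrated with a back-of-the-envelope model. This is a minimal sketch, not LiveJournal's own analysis: it assumes every server has a fixed I/O budget and that each slave must replay every write from the master. The function names and all numbers are hypothetical.

```python
def read_capacity(num_slaves: int, server_iops: float, write_iops: float) -> float:
    """Total reads/sec across all slaves of one replicated master.

    Assumption: each slave replays every write, so only
    (server_iops - write_iops) is left on each slave for reads.
    """
    per_slave = max(server_iops - write_iops, 0.0)
    return num_slaves * per_slave


def sharded_read_capacity(num_shards: int, server_iops: float, write_iops: float) -> float:
    """Total reads/sec when the write load is split evenly across shards.

    Assumption: each shard only sees write_iops / num_shards writes.
    """
    per_shard = max(server_iops - write_iops / num_shards, 0.0)
    return num_shards * per_shard


if __name__ == "__main__":
    # Hypothetical load: 1000 IOPS per server, 900 writes/sec site-wide.
    for n in (1, 2, 4, 8):
        print(n, read_capacity(n, 1000, 900), sharded_read_capacity(n, 1000, 900))
```

Under these assumptions a write-heavy site gains very little read capacity from each extra slave, because most of each slave's I/O goes to replaying the same writes, while splitting the writes across shards frees most of each machine for reads.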

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('livejournal', 0.478), ('faced', 0.18), ('slaves', 0.167), ('queuing', 0.163), ('aspiring', 0.157), ('storytimegoogle', 0.157), ('versionplatformlinuxmysqlperlmemcachedmogilefsapachewhat', 0.157), ('videotokyo', 0.157), ('furious', 0.148), ('learneddo', 0.148), ('adding', 0.146), ('writes', 0.143), ('mechanism', 0.141), ('maximal', 0.131), ('connection', 0.128), ('setups', 0.125), ('drbd', 0.122), ('choke', 0.119), ('file', 0.116), ('gearman', 0.112), ('scenes', 0.11), ('points', 0.107), ('allocate', 0.102), ('blog', 0.102), ('kills', 0.098), ('distributed', 0.096), ('interactions', 0.095), ('notification', 0.095), ('race', 0.092), ('threading', 0.092), ('evolved', 0.088), ('hashing', 0.087), ('scaling', 0.086), ('incredible', 0.085), ('evolve', 0.085), ('player', 0.085), ('hosts', 0.081), ('disk', 0.081), ('bound', 0.08), ('issues', 0.079), ('clustering', 0.079), ('shards', 0.078), ('afraid', 0.076), ('scheduling', 0.076), ('causes', 0.075), ('distribute', 0.073), ('persistent', 0.072), ('particularly', 0.071), ('fascinating', 0.069), ('posts', 0.068)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 3 high scalability-2007-07-09-LiveJournal Architecture

2 0.89987427 1352 high scalability-2012-10-31-Gone Fishin': LiveJournal Architecture

Introduction: This was the first architecture profile on HighScalability. IMHO LiveJournal was really the start of the openness on how to build stuff at scale, setting the whole industry off with an excellent role model. They wrote about their architecture, they open sourced their tools, they showed that success wasn't based on keeping secrets, and they set forth principles still followed by our rather amazing industry. No other industry is so open and cooperative, with their eyes cast so far forward, intent on building cool stuff. When all around seems dark it would be good to keep this little bit of light in mind... A fascinating and detailed story of how LiveJournal evolved their system to scale. LiveJournal was an early player in the free blog service race and faced issues from quickly adding a large number of users. Blog posts come fast and furious which causes a lot of writes and writes are particularly hard to scale. Understanding how LiveJournal faced their scaling problems will help any

3 0.16002063 192 high scalability-2007-12-25-IBMer Says LAMP Can't Scale

Introduction: A very entertaining and somewhat educational article on IBM Poopheads say LAMP Users Need to "grow up". The physical three tier architecture turns out to be the root of all evil and shared nothing architectures bring simplicity and light. In the comments Simon Willison makes an insightful comment on why fine grained caching works for personalized pages and proxies don't: Great post, but I have to disagree with you on the finely grained caching part. If you look at big LAMP deployments such as Flickr, LiveJournal and Facebook the common technology component that enables them to scale is memcached - a tool for finely grained caching. That's not to say that they aren't doing shared-nothing, it's just that memcached is critical for helping the database layer scale. LiveJournal serves around 50% of its page views "permission controlled" (friends only) so an HTTP proxy on the front end isn't the right solution - but memcached reduces their database hits by 90%.

4 0.11884084 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?

Introduction: For everything given something seems to be taken. Caching is a great scalability solution, but caching also comes with problems. Sharding is a great scalability solution, but as Foursquare recently revealed in a post-mortem about their 17 hours of downtime, sharding also has problems. MongoDB, the database Foursquare uses, also contributed their post-mortem of what went wrong too. Now that everyone has shared and resharded, what can we learn to help us skip these mistakes and quickly move on to a different set of mistakes? First, like for Facebook, huge props to Foursquare and MongoDB for being upfront and honest about their problems. This helps everyone get better and is a sign we work in a pretty cool industry. Second, overall, the fault didn't flow from evil hearts or gross negligence. As usual the cause was more mundane: a key system, that could be a little more robust, combined with a very popular application built by a small group of people, under immense pressure

5 0.11456065 5 high scalability-2007-07-10-mixi.jp Architecture

Introduction: Mixi is a fast growing social networking site in Japan. They provide services like: diary, community, message, review, and photo album. Having a lot in common with LiveJournal they also developed many of the same approaches. Their write-up on how they scaled their system is easily one of the best out there. Site: http://mixi.jp Information Sources mixi.jp - scaling out with open source Platform Linux Apache MySQL Perl Memcached Squid Shard What's Inside? They grew to approximately 4 million users in two years and add over 15,000 new users/day. Ranks 35th on Alexa and 3rd in Japan. More than 100 MySQL servers Add more than 10 servers/month Use non-persistent connections. Diary traffic is 85% read and 15% write. Message traffic is 75% read and 25% write. Ran into replication performance problems so they had to split the database. Considered splitting vertically by user or splitting horizontally by table type. The ende

6 0.11098199 178 high scalability-2007-12-10-1 Master, N Slaves

7 0.11035916 491 high scalability-2009-01-13-Product: Gearman - Open Source Message Queuing System

8 0.10595024 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years

9 0.10553543 274 high scalability-2008-03-12-YouTube Architecture

10 0.1037671 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard

11 0.097985357 406 high scalability-2008-10-08-Strategy: Flickr - Do the Essential Work Up-front and Queue the Rest

12 0.095072702 196 high scalability-2007-12-30-MySQL clustering strategies and comparisions

13 0.094453268 554 high scalability-2009-04-04-Digg Architecture

14 0.094008774 769 high scalability-2010-02-02-Scale out your identity management

15 0.092299901 927 high scalability-2010-10-26-Marrying memcached and NoSQL

16 0.089132451 829 high scalability-2010-05-20-Strategy: Scale Writes to 734 Million Records Per Day Using Time Partitioning

17 0.088413246 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?

18 0.088131741 23 high scalability-2007-07-24-Major Websites Down: Or Why You Want to Run in Two or More Data Centers.

19 0.08791355 1646 high scalability-2014-05-12-4 Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO

20 0.086000301 1421 high scalability-2013-03-11-Low Level Scalability Solutions - The Conditioning Collection
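
The memcached remark quoted in entry 3 of the list above (finely grained caching cutting database hits) corresponds to what is commonly called the cache-aside pattern. The following is a minimal sketch under stated assumptions: a plain dict stands in for memcached, and `load_from_db` is a hypothetical callable standing in for a MySQL query. It is not LiveJournal's code.

```python
from typing import Callable, Dict


class CacheAside:
    """Read-through helper: check the cache first, fall back to the database."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._cache: Dict[str, str] = {}   # stand-in for a memcached client
        self._load = load_from_db          # hypothetical database loader
        self.db_hits = 0                   # counts how often the database is queried

    def get(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]        # cache hit: no database work
        self.db_hits += 1                  # cache miss: query the database once
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after a write so the next read reloads fresh data."""
        self._cache.pop(key, None)


if __name__ == "__main__":
    cache = CacheAside(lambda key: f"row for {key}")
    for _ in range(10):
        cache.get("user:42")
    print(cache.db_hits)
```

Because the cache is keyed per object rather than per page, it still helps when pages are personalized or permission controlled, which is the case the comment makes against a front-end HTTP proxy.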

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.161), (1, 0.071), (2, -0.05), (3, -0.061), (4, 0.01), (5, 0.065), (6, 0.003), (7, -0.046), (8, -0.036), (9, 0.008), (10, -0.006), (11, 0.012), (12, 0.021), (13, 0.029), (14, 0.033), (15, 0.009), (16, -0.0), (17, -0.005), (18, -0.01), (19, 0.076), (20, 0.037), (21, 0.065), (22, -0.078), (23, 0.014), (24, -0.077), (25, 0.029), (26, 0.15), (27, 0.103), (28, -0.083), (29, 0.009), (30, 0.112), (31, -0.094), (32, 0.13), (33, 0.005), (34, 0.028), (35, -0.106), (36, -0.007), (37, -0.093), (38, 0.069), (39, -0.19), (40, 0.001), (41, -0.016), (42, 0.051), (43, -0.026), (44, 0.051), (45, -0.056), (46, 0.024), (47, -0.07), (48, 0.055), (49, 0.02)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96531695 3 high scalability-2007-07-09-LiveJournal Architecture

2 0.95067388 1352 high scalability-2012-10-31-Gone Fishin': LiveJournal Architecture

3 0.58722705 243 high scalability-2008-02-07-clusteradmin.blogspot.com - blog about building and administering clusters

Introduction: A blog about cluster administration. Written by a System Administrator working at HPC (High Performance Computing) data-center, mostly dealing with PC clusters (100s of servers), SMP machines and distributed installations. The blog concentrates on software/configuration/installation management systems, load balancers, monitoring and other cluster-related solutions.

4 0.57325828 257 high scalability-2008-02-22-Kevin's Great Adventures in SSDland

Introduction: Update: Final Thoughts on SSD and MySQL AKA Battleship Spinn3r. Tips on how to make your database 10x faster using solid state drives. Potential exists for 100x speedup. Solid-state drives (SSDs) are the holy grail of storage. The promise of RAM speeds and hard disk like persistence have for years driven us crazy with power user lust, but they've stayed tantalizingly just out of reach. Always too expensive, too small, and oddly too slow. Has that changed? Can you now miraculously have your cake and eat it too? Can you now have it both ways? Is balancing work with family life now as easy as tripping over a terabyte drive? In a pioneering series of blog articles Kevin Burton conducts original research on next generation SSD drives in real world configurations. For an experience report on his great adventure you can turn to: Could SSD Mean a Rise in MyISAM Usage?, Serverbeach, MySQL and Mtron SSDs, Prediction: SSD Blades in 2008, Zeus IOPS - Another High

5 0.57183206 566 high scalability-2009-04-13-High Performance Web Pages – Real World Examples: Netflix Case Study

Introduction: This read will provide you with information about how Netflix deals with high load on their movie rental website. It was written by Bill Scott in the fall of 2008. Read or download the PDF file here

6 0.54432237 769 high scalability-2010-02-02-Scale out your identity management

7 0.54414672 710 high scalability-2009-09-20-PaxosLease: Diskless Paxos for Leases

8 0.54081023 898 high scalability-2010-09-09-6 Scalability Lessons

9 0.53585601 528 high scalability-2009-03-06-Product: Lightcloud - Key-Value Database

10 0.53158474 157 high scalability-2007-11-16-Product: lbpool - Load Balancing JDBC Pool

11 0.52540815 1325 high scalability-2012-09-19-The 4 Building Blocks of Architecting Systems for Scale

12 0.51583433 595 high scalability-2009-05-08-Publish-subscribe model does not scale?

13 0.50880742 88 high scalability-2007-09-10-Blog: Scalable Web Architectures by Royans Tharakan

14 0.50642353 345 high scalability-2008-06-11-Pyshards aspires to build sharding toolkit for Python

15 0.49537209 491 high scalability-2009-01-13-Product: Gearman - Open Source Message Queuing System

16 0.49522305 690 high scalability-2009-08-31-Scaling MySQL on Amazon Web Services

17 0.49183443 275 high scalability-2008-03-14-Problem: Mobbing the Least Used Resource Error

18 0.49015725 1611 high scalability-2014-03-12-Paper: Scalable Eventually Consistent Counters over Unreliable Networks

19 0.48571566 514 high scalability-2009-02-18-Numbers Everyone Should Know

20 0.47905287 215 high scalability-2008-01-16-Strategy: Asynchronous Queued Virus Scanning

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.163), (2, 0.25), (10, 0.046), (12, 0.288), (40, 0.019), (61, 0.014), (79, 0.064), (94, 0.063)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.90356696 278 high scalability-2008-03-16-Product: GlusterFS

Introduction: Adapted from their website: GlusterFS is a clustered file-system capable of scaling to several peta-bytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system. Storage bricks can be made of any commodity hardware (such as an x86-64 server with SATA-II RAID and Infiniband HBA). Cluster file systems are still not mature for the enterprise market. They are too complex to deploy and maintain though they are extremely scalable and cheap. Can be entirely built out of commodity OS and hardware. GlusterFS hopes to solve this problem. GlusterFS achieved 35 GBps read throughput. The GlusterFS Aggregated I/O Benchmark was performed on a 64-brick clustered storage system over 10 Gbps Infiniband interconnect. A cluster of 220 clients pounded the storage system with multiple dd (disk-dump) instances, each reading / writing a 1 GB file with 1MB block size. GlusterFS was configured with unify translator and round-robin scheduler

2 0.88868654 82 high scalability-2007-09-06-Why doesn't anyone use j2ee?

Introduction: From a reader: "Was reading through your very interesting/useful site. Most of the architectures are non j2ee. Does that mean that there aren't enough websites that are scalable (with youtube like userbase) built with j2ee tech? Would like to know if there are any and their architecture as well." eBay uses Java, but in a very pragmatic way. They use servlets, an application server, the JDK, and they do the rest themselves. They skip JSP, entity beans, and JMS. When you need to scale putting all your eggs in one basket is a risky strategy. Why use JSP when you can do better? Why use entity beans when you can do better? Use servlets because they are a very effective way of handling http requests. Use Java because it is fast, runs everywhere, and has a boat load of libraries you can use to build your custom system. Probably the major reason J2EE is absentee is simply LAMP. LAMP is just so incredibly functional for most 2-tier shared nothing site

same-blog 3 0.87333345 3 high scalability-2007-07-09-LiveJournal Architecture

4 0.84990859 1352 high scalability-2012-10-31-Gone Fishin': LiveJournal Architecture

5 0.82587492 285 high scalability-2008-03-19-Serving JavaScript Fast

Introduction: Cal Henderson writes at thinkvitamin.com: "With our so-called 'Web 2.0' applications and their rich content and interaction, we expect our applications to increasingly make use of CSS and JavaScript. To make sure these applications are nice and snappy to use, we need to optimize the size and nature of content required to render the page, making sure we're delivering the optimum experience. In practice, this means a combination of making our content as small and fast to download as possible, while avoiding unnecessarily refetching unmodified resources." A lot of good comments too.

6 0.82490045 161 high scalability-2007-11-20-Product: SmartFrog a Distributed Configuration and Deployment Framework

7 0.78481674 886 high scalability-2010-08-24-21 Quality Screencasts on Scaling Rails

8 0.76553673 209 high scalability-2008-01-12-Gandi.net, french registrar launches in granular server resources.

9 0.72931498 77 high scalability-2007-08-30-Log Everything All the Time

10 0.72685146 1124 high scalability-2011-09-26-17 Techniques Used to Scale Turntable.fm and Labmeeting to Millions of Users

11 0.72342819 378 high scalability-2008-09-03-Some Facebook Secrets to Better Operations

12 0.72239971 1368 high scalability-2012-12-07-Stuff The Internet Says On Scalability For December 7, 2012

13 0.72232109 122 high scalability-2007-10-14-Product: The Spread Toolkit

14 0.72205418 1616 high scalability-2014-03-20-Paper: Log-structured Memory for DRAM-based Storage - High Memory Utilization Plus High Performance

15 0.72156757 729 high scalability-2009-10-28-And the winner is: MySQL or Memcached or Tokyo Tyrant?

16 0.72151995 1231 high scalability-2012-04-20-Stuff The Internet Says On Scalability For April 20, 2012

17 0.71995652 303 high scalability-2008-04-18-Scaling Mania at MySQL Conference 2008

18 0.71932387 1646 high scalability-2014-05-12-4 Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO

19 0.71795106 415 high scalability-2008-10-15-Need help with your Hadoop deployment? This company may help!

20 0.71787149 985 high scalability-2011-02-08-Mollom Architecture - Killing Over 373 Million Spams at 100 Requests Per Second