high_scalability high_scalability-2009 high_scalability-2009-519 knowledge-graph by maker-knowledge-mining

519 high scalability-2009-02-23-Database Sharding at Netlog, with MySQL and PHP


meta infos for this blog

Source: html

Introduction: Jurriaan Persyn is a Lead Web Developer at Netlog, a social portal site that gets 50 million unique visitors and 5+ billion page views per month. In this paper Jurriaan goes into a lot of excellent nuts and bolts details about how they used sharding to scale their system. If you are pondering sharding as a solution to your scaling problems you'll want to read this paper. As the paper is quite well organized there's no reason to write a summary, but I especially liked this part from the conclusion: If you can do with simpler solutions (better hardware, more hardware, server tweaking and tuning, vertical partitioning, sql query optimization, ...) that require less development cost, why invest lots of effort in sharding? On the other hand, when your visitor statistics really start blowing through the roof, it is a good direction to go. After all, it worked for us.


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Jurriaan Persyn is a Lead Web Developer at Netlog, a social portal site that gets 50 million unique visitors and 5+ billion page views per month. [sent-1, score-0.699]

2 In this paper Jurriaan goes into a lot of excellent nuts and bolts details about how they used sharding to scale their system. [sent-2, score-0.965]

3 If you are pondering sharding as a solution to your scaling problems you'll want to read this paper. [sent-3, score-0.435]

4 As the paper is quite well organized there's no reason to write a summary, but I especially liked this part from the conclusion: If you can do with simpler solutions (better hardware, more hardware, server tweaking and tuning, vertical partitioning, sql query optimization, . [sent-4, score-1.226]

5 ) that require less development cost, why invest lots of effort in sharding? [sent-7, score-0.423]

6 On the other hand, when your visitor statistics really start blowing through the roof, it is a good direction to go. [sent-8, score-0.623]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('jurriaan', 0.513), ('sharding', 0.27), ('netlog', 0.233), ('roof', 0.209), ('blowing', 0.177), ('bolts', 0.174), ('nuts', 0.171), ('pondering', 0.165), ('portal', 0.153), ('tweaking', 0.153), ('visitor', 0.151), ('paper', 0.144), ('organized', 0.139), ('invest', 0.133), ('liked', 0.127), ('direction', 0.126), ('vertical', 0.125), ('conclusion', 0.125), ('simpler', 0.117), ('visitors', 0.11), ('statistics', 0.11), ('tuning', 0.107), ('hand', 0.104), ('optimization', 0.102), ('hardware', 0.101), ('effort', 0.094), ('summary', 0.093), ('partitioning', 0.093), ('views', 0.093), ('worked', 0.085), ('lead', 0.083), ('reason', 0.081), ('unique', 0.075), ('gets', 0.074), ('require', 0.073), ('especially', 0.072), ('developer', 0.07), ('social', 0.07), ('details', 0.07), ('goes', 0.068), ('excellent', 0.068), ('solutions', 0.067), ('quite', 0.067), ('sql', 0.067), ('query', 0.067), ('lots', 0.066), ('billion', 0.063), ('page', 0.061), ('start', 0.059), ('less', 0.057)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 519 high scalability-2009-02-23-Database Sharding at Netlog, with MySQL and PHP

2 0.13610075 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard

Introduction: Update 4: Why you don’t want to shard. by Morgon on the MySQL Performance Blog. Optimize everything else first, and then if performance still isn’t good enough, it’s time to take a very bitter medicine. Update 3: Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes by Dare Obasanjo. Excellent discussion of why and when you would choose a sharding architecture, how to shard, and problems with sharding. Update 2: Mr. Moore gets to punt on sharding by Alan Rimm-Kaufman of 37signals. Insightful article on design tradeoffs and the evils of premature optimization. With more memory, more CPU, and new tech like SSD, problems can be avoided before more exotic architectures like sharding are needed. Add features not infrastructure. Jeremy Zawodny says he's wrong wrong wrong. we're running multi-core CPUs at slower clock speeds. Moore won't save you. Update: Dan Pritchett shares some excellent Sharding Lessons : Size Your Shards, Use Math on Shard C

3 0.11878652 37 high scalability-2007-07-28-Product: Web Log Storming

Introduction: Web Log Storming is an interactive, desktop-based Web Log Analyzer for Windows. The whole new concept of log analysis makes it clearly different from any other web log analyzer. Browse through statistics to get into details - down to individual visitor's session. Check individual visitor behavior pattern and how it fits into your desired scenario. Web Log Storming does far more than just generate common reports - it displays detailed web site statistics with interactive graphs and reports. Very complete detailed log analysis of activity from every visitor to your web site is only a mouse-click away. In other words, analyze your web logs like never before! It's easy to track sessions, hits, page views, downloads, or whatever metric is most important to each user. You can look at referring pages and see which search engines and keywords were used to bring visitors to the site. Web site behavior, from the top entry and exit pages, to the paths that users follow, can be analyzed. You

4 0.10564769 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App

Introduction: Update: Jake in Does Django really scale better than Rails? thinks apps like FFS shouldn't need so much hardware to scale. In a short three months Friends for Sale (think Hot-or-Not with a market economy) grew to become a top 10 Facebook application handling 200 gorgeous requests per second and a stunning 300 million page views a month. They did all this using Ruby on Rails, two part time developers, a cluster of a dozen machines, and a fairly standard architecture. How did Friends for Sale scale to sell all those beautiful people? And how much do you think your friends are worth on the open market?  Site: http://www.facebook.com/apps/application.php?id=7019261521 Information Sources Siqi Chen and Alexander Le, co-creators of Friends for Sale, answering my standard questionnaire. Virality on Facebook The Platform Ruby on Rails CentOS 5 (64 bit) Capistrano - update and restart application servers. Memcached MySQL Nginx Starling - distrib

5 0.097081766 570 high scalability-2009-04-15-Implementing large scale web analytics

Introduction: Does anyone know of any articles or papers that discuss the nuts and bolts of how web analytics is implemented at organizations with large volumes of web traffic and a critical business need to analyze that data - e.g. places like Amazon.com, eBay, and Google? Just as a fun project I'm planning to build my own web log analysis app that can effectively index and query large volumes of web log data (i.e. TB range). But first I'd like to learn more about how it's done in the organizations whose lifeblood depends on this stuff. Even just a high level architectural overview of their approaches would be nice to have.

6 0.093479507 89 high scalability-2007-09-10-Is there a difference between partitioning and federation and sharding?

7 0.09237197 61 high scalability-2007-08-07-What qps should we design for in making a MySpace like site?

8 0.091146924 638 high scalability-2009-06-26-PlentyOfFish Architecture

9 0.090141758 1361 high scalability-2012-11-22-Gone Fishin': PlentyOfFish Architecture

10 0.08708296 511 high scalability-2009-02-12-MySpace Architecture

11 0.086959168 10 high scalability-2007-07-15-Book: Building Scalable Web Sites

12 0.085207172 274 high scalability-2008-03-12-YouTube Architecture

13 0.082496636 1094 high scalability-2011-08-08-Tagged Architecture - Scaling to 100 Million Users, 1000 Servers, and 5 Billion Page Views

14 0.073787697 1032 high scalability-2011-05-02-Stack Overflow Makes Slow Pages 100x Faster by Simple SQL Tuning

15 0.073745921 348 high scalability-2008-07-09-Federation at Flickr: Doing Billions of Queries Per Day

16 0.072960027 105 high scalability-2007-10-01-Statistics Logging Scalability

17 0.071694851 1123 high scalability-2011-09-23-The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…

18 0.071290001 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O

19 0.070906691 1228 high scalability-2012-04-16-Instagram Architecture Update: What’s new with Instagram?

20 0.070320807 1085 high scalability-2011-07-25-Is NoSQL a Premature Optimization that's Worse than Death? Or the Lady Gaga of the Database World?


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.113), (1, 0.063), (2, -0.0), (3, -0.05), (4, 0.052), (5, 0.007), (6, -0.053), (7, -0.015), (8, 0.021), (9, 0.027), (10, -0.001), (11, -0.023), (12, -0.034), (13, 0.026), (14, -0.021), (15, 0.027), (16, -0.005), (17, -0.003), (18, -0.004), (19, 0.041), (20, 0.019), (21, 0.014), (22, -0.055), (23, -0.016), (24, 0.045), (25, -0.018), (26, -0.051), (27, -0.028), (28, 0.002), (29, 0.032), (30, 0.051), (31, -0.026), (32, 0.034), (33, 0.013), (34, 0.014), (35, -0.013), (36, -0.035), (37, 0.012), (38, -0.006), (39, 0.028), (40, -0.025), (41, 0.038), (42, -0.009), (43, 0.01), (44, 0.029), (45, 0.004), (46, -0.001), (47, -0.003), (48, 0.028), (49, 0.013)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9736352 519 high scalability-2009-02-23-Database Sharding at Netlog, with MySQL and PHP

2 0.67815036 106 high scalability-2007-10-02-Secrets to Fotolog's Scaling Success

Introduction: Fotolog, a social blogging site centered around photos, grew from about 300 thousand users in 2004 to over 11 million users in 2007. Though they initially experienced the inevitable pains of rapid growth, they overcame their problems and now manage over 300 million photos and 800,000 new photos are added each day. Generating all that fabulous content are 20 million unique monthly visitors and a volunteer army of 30,000 new users each day. They did so well a very impressed suitor bought them out for a cool $90 million. That's scale meets success by anyone's standards. How did they do it? Site: http://www.fotolog.com Information Sources Scaling the World's Largest Photo Blogging Community Congrats to Fotolog on $90mm sale to Hi-Media Fotolog overtaking Flickr? Fotolog Hits 11 Million Members and 300 Million Photos Posted Site of the Week: Fotolog.com by PC Magazine CEO John Borthwick's Blog . DBA Frank Mash's Blog Fotolog, lessons learnt by John B

3 0.65675986 1228 high scalability-2012-04-16-Instagram Architecture Update: What’s new with Instagram?

Introduction: The fascination over Instagram continues and fortunately we have several new streams of information to feed the insanity. So consider this article an update to The Instagram Architecture Facebook Bought For A Cool Billion Dollars , based primarily on Scaling Instagram , a slide deck for an AirBnB tech talk given by Instagram co-founder, Mike Krieger. Several other information sources, listed at the bottom of the article, were also used. Unfortunately we just have a slide deck, so the connective tissue of the talk is missing, but it’s still very interesting, in the same spirit of wisdom presentations we often see after developers come up for air after spending significant time spent in the trenches. If you expect to dive deep into the technological details and find a billion reasons why Instagram was acquired, you will be disappointed. That magic can be found in the emotional investment in the relationship between all of the users and the product, not in the bits about h

4 0.64930689 554 high scalability-2009-04-04-Digg Architecture

Introduction: Update 4: Introducing Digg’s IDDB Infrastructure by Joe Stump. IDDB is a way to partition both indexes (e.g. integer sequences and unique character indexes) and actual tables across multiple storage servers (MySQL and MemcacheDB are currently supported with more to follow). Update 3: Scaling Digg and Other Web Applications . Update 2: How Digg Works and How Digg Really Works (wear ear plugs). Brought to you straight from Digg's blog. A very succinct explanation of the major elements of the Digg architecture while tracing a request through the system. I've updated this profile with the new information. Update: Digg now receives 230 million plus page views per month and 26 million unique visitors - traffic that necessitated major internal upgrades . Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. How does Digg handle billions of req

5 0.64902306 1094 high scalability-2011-08-08-Tagged Architecture - Scaling to 100 Million Users, 1000 Servers, and 5 Billion Page Views

Introduction: This is a guest post by Johann Schleier-Smith , CTO & co-founder, Tagged. Five snapshots on how Tagged scaled to more than 1,000 servers Since 2004, Tagged has grown from a tiny social experiment to one of the largest social networks, delivering five billion pages per month to many millions of members who visit to meet and socialize with new people. One step at a time, this evolution forced us to evolve our architecture, eventually arriving at an enormously capable platform . V1: PHP webapp, 100k users, 15 servers, 2004 Tagged was born in the rapid-prototyping culture of an incubator that usually launched two new concepts each year in search of the big winner. LAMP was the natural choice for this style of work, which emphasized flexibility and quick turnaround at a time when Java development was mostly oriented towards development at large enterprises, Python attracted too few programmers, and Perl brought the wrong sort. Also, we knew that Yahoo was

6 0.64885253 1067 high scalability-2011-06-24-Stuff The Internet Says On Scalability For June 24, 2011

7 0.64665073 1361 high scalability-2012-11-22-Gone Fishin': PlentyOfFish Architecture

8 0.64621007 638 high scalability-2009-06-26-PlentyOfFish Architecture

9 0.64440584 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App

10 0.64391255 1089 high scalability-2011-07-29-Stuff The Internet Says On Scalability For July 29, 2011

11 0.64002937 771 high scalability-2010-02-04-Hot Scalability Links for February 4, 2010

12 0.63484186 671 high scalability-2009-08-05-Stack Overflow Architecture

13 0.63365853 7 high scalability-2007-07-12-FeedBurner Architecture

14 0.62931913 133 high scalability-2007-10-26-How Gravatar scales on WordPress.com hardware

15 0.62698567 811 high scalability-2010-04-16-Hot Scalability Links for April 16, 2010

16 0.62598675 1057 high scalability-2011-06-10-Stuff The Internet Says On Scalability For June 10, 2011

17 0.62483317 152 high scalability-2007-11-13-Flickr Architecture

18 0.62131399 1047 high scalability-2011-05-25-Stuff to Watch from Surge 2010

19 0.62053692 1024 high scalability-2011-04-15-Stuff The Internet Says On Scalability For April 15, 2011

20 0.61863065 903 high scalability-2010-09-17-Hot Scalability Links For Sep 17, 2010


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.161), (2, 0.269), (9, 0.218), (43, 0.021), (61, 0.114), (85, 0.097)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.88549727 604 high scalability-2009-05-20-Paper: Flux: An Adaptive Partitioning Operator for Continuous Query Systems

Introduction: At the core of the new real-time web, which is really really old, are continuous queries. I like how this paper proposed to handle dynamic demand and dynamic resource availability by making the underlying system adaptable, which seems like a very cloudy kind of thing to do. Abstract: The long-running nature of continuous queries poses new scalability challenges for dataflow processing. CQ systems execute pipelined dataflows that may be shared across multiple queries. The scalability of these dataflows is limited by their constituent, stateful operators – e.g. windowed joins or grouping operators. To scale such operators, a natural solution is to partition them across a shared-nothing platform. But in the CQ context, traditional, static techniques for partitioned parallelism can exhibit detrimental imbalances as workload and runtime conditions evolve. Longrunning CQ dataflows must continue to function robustly in the face of these imbalances. To address this challenge, we introduce

same-blog 2 0.87103629 519 high scalability-2009-02-23-Database Sharding at Netlog, with MySQL and PHP

3 0.86549926 577 high scalability-2009-04-22-Gear6 Web cache - the hardware solution for working with Memcache

Introduction: The Gear6 Web Cache hybrid DRAM-flash memory architecture allows for 5-10 times more memcache memory per unit of rack space than DRAM-only configurations, and cuts memory costs by 50%. Other software enhancements include a slab allocator that is more efficient than traditional memcache implementations due to its fine-grained bucket sizing. Gear6 Web Cache also supports object sizes greater than 1 megabyte and manages evictions based on the cost of replacing objects, depending on the size and frequency of object access. It intelligently places cache instances across DRAM and flash, taking into account their different characteristics, while at the same time monitoring their health and detecting and de-allocating faulty or failing memory. Gear6 Web Cache is a Memcached protocol compliant solution that scales and accelerates web applications, reduces memory footprint, enhances availability and implements comprehensive Memcached management features. Designed to work with all popular memcac

4 0.84879827 1462 high scalability-2013-05-22-Strategy: Stop Using Linked-Lists

Introduction: What data structure is more sacred than the linked list? If we get rid of it what silly interview questions would we use instead? But not using linked-lists is exactly what Aater Suleman recommends in Should you ever use Linked-Lists? In The Secret To 10 Million Concurrent Connections one of the important strategies is not scribbling data all over memory via pointers because following pointers increases cache misses which reduces performance . And there’s nothing more iconic of pointers than the linked list. Here are Aater's reasons to be anti-linked-list: They reduce the benefit of out-of-order execution. They throw off hardware prefetching. They reduce DRAM and TLB locality. They cannot leverage SIMD. They are harder to send to GPUs. He also demolishes the pros of linked-lists, finding arrays a better option in almost every case. Good discussion in the comment section as not everyone agrees. Patrick Wyatt details how a linked-list threading bug repeated

5 0.84733349 1580 high scalability-2014-01-15-Vedis - An Embedded Implementation of Redis Supporting Terabyte Sized Databases

Introduction: I don't know about you, but when I first learned about Redis my initial thought was wow, why hasn't anyone done this before? My next thought was why put this functionality in a separate process? Why not just embed it in your own server code and skip the network path completely? Especially in a Service Oriented Architecture there's no need for an extra hop or extra software installation and configuration. Now you can embed Redis-like code directly into your server with Vedis - an embeddable datastore C library built with over 70 commands similar in concept to Redis but without the networking layer since Vedis runs in the same process as the host application. It's transactional, cross platform, thread safe, key-value, supports terabyte sized databases, has a GPL-like license (which isn't great for commercial apps), and supports an on-disk as well as in-memory datastore. More about Vedis: Unlike most other datastores (i.e. memcache, Redis), Vedis does not have a separate server

6 0.83561254 942 high scalability-2010-11-15-Strategy: Biggest Performance Impact is to Reduce the Number of HTTP Requests

7 0.83507884 18 high scalability-2007-07-16-Paper: MySQL Scale-Out by application partitioning

8 0.80981576 52 high scalability-2007-08-01-Product: Memcached

9 0.8086108 1250 high scalability-2012-05-23-Averages, web performance data, and how your analytics product is lying to you

10 0.80470783 1473 high scalability-2013-06-10-The 10 Deadly Sins Against Scalability

11 0.80321336 579 high scalability-2009-04-24-Heroku - Simultaneously Develop and Deploy Automatically Scalable Rails Applications in the Cloud

12 0.80301428 1123 high scalability-2011-09-23-The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…

13 0.80294663 754 high scalability-2009-12-22-Incremental deployment

14 0.80190516 149 high scalability-2007-11-12-Scaling Using Cache Farms and Read Pooling

15 0.80169028 240 high scalability-2008-02-05-Handling of Session for a site running from more than 1 data center

16 0.80132651 998 high scalability-2011-03-03-Stack Overflow Architecture Update - Now at 95 Million Page Views a Month

17 0.80056286 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App

18 0.80016619 821 high scalability-2010-05-03-MocoSpace Architecture - 3 Billion Mobile Page Views a Month

19 0.79941827 1401 high scalability-2013-02-06-Super Bowl Advertisers Ready for the Traffic? Nope..It's Lights Out.

20 0.79919994 950 high scalability-2010-11-30-NoCAP – Part III – GigaSpaces clustering explained..