high_scalability high_scalability-2008 high_scalability-2008-383 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Hello everyone, I'm designing a website/widget that my business partner and I expect to serve millions of hits daily. As such we must shard our database (and we're designing with shards in mind right from the beginning). However, the one thing I haven't been able to figure out from Googling is the best hardware to go with for shards. I'm using exclusively InnoDB tables. We'll (eventually) be running 3 groups of database servers: a) Session servers for PHP sessions. These will have a very high write volume. b) ID servers. These will match a couple primary indices (such as user ID) to a given shard. These will have an intense read load, plus a moderate amount of writes. c) Shard servers. These will hold the bulk of the data. These will have a high read load and a lowish write load. Group A is done as a database instead of using memcached so users aren't logged out if a memcached server goes down. As the write load is high, a pair of high performance master-master serv
sentIndex sentText sentNum sentScore
1 Hello everyone, I'm designing a website/widget that my business partner and I expect to serve millions of hits daily. [sent-1, score-0.193]
2 As such we must shard our database (and we're designing with shards in mind right from the beginning). [sent-2, score-0.49]
3 However, the one thing I haven't been able to figure out from Googling is the best hardware to go with for shards. [sent-3, score-0.094]
4 We'll (eventually) be running 3 groups of database servers: a) Session servers for PHP sessions. [sent-5, score-0.269]
5 These will match a couple primary indices (such as user ID) to a given shard. [sent-8, score-0.108]
6 These will have an intense read load, plus a moderate amount of writes. [sent-9, score-0.218]
7 These will have a high read load and a lowish write load. [sent-12, score-0.306]
8 Group A is done as a database instead of using memcached so users aren't logged out if a memcached server goes down. [sent-13, score-0.508]
9 As the write load is high, a pair of high performance master-master servers seems obvious. [sent-14, score-0.434]
10 What's the ideal hardware setup for machines with this role? [sent-15, score-0.471]
11 Should I bother with RAID > 0 if I have a live backup on the other master? [sent-17, score-0.092]
12 I hear 4 cores is optimal for InnoDB -- recommendations? [sent-18, score-0.302]
13 Again, it looks like maxed RAM is recommended here. [sent-20, score-0.312]
14 Should I think about slaves to a master-master setup? [sent-25, score-0.077]
15 It seems to me these machines can be of any capacity because the data they hold is easily spread between shards. [sent-27, score-0.393]
16 What is the query-per-second per dollar sweet spot when it comes to cores and number of disks? [sent-28, score-0.345]
17 Should I beef these machines up, or stick with low end hardware? [sent-29, score-0.37]
18 I have some other thoughts on system setup, too. [sent-31, score-0.157]
19 Keep in mind that I can recycle machines used in Groups A & B in Group C as time goes on. [sent-35, score-0.49]
20 Anyway, I'd love to benefit from the expertise of the forum. [sent-36, score-0.137]
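The ID servers described in the post (Group B) map a primary key such as a user ID to the shard that holds that user's data. The following is only a minimal sketch of that lookup idea, not the poster's implementation: the `ShardDirectory` class, the host names, and the least-populated placement rule are all hypothetical, and a plain in-memory dictionary stands in for what would really be an InnoDB table on the ID servers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ShardDirectory:
    """Hypothetical in-memory stand-in for the ID servers (Group B)."""

    shard_hosts: List[str]                                     # one entry per shard server (Group C)
    assignments: Dict[int, int] = field(default_factory=dict)  # user ID -> shard index

    def assign(self, user_id: int) -> int:
        """Write path: place a new user on the least-populated shard."""
        if user_id in self.assignments:
            return self.assignments[user_id]
        counts = [0] * len(self.shard_hosts)
        for shard in self.assignments.values():
            counts[shard] += 1
        shard = counts.index(min(counts))  # assumed placement rule, not from the post
        self.assignments[user_id] = shard
        return shard

    def lookup(self, user_id: int) -> str:
        """Read path: resolve a user ID to the host holding that user's data."""
        return self.shard_hosts[self.assignments[user_id]]


# Hypothetical host names, for illustration only.
directory = ShardDirectory(shard_hosts=["shard-0.example", "shard-1.example"])
for uid in (101, 102, 103):
    directory.assign(uid)
print(directory.lookup(102))
```

Lookups here are far more frequent than assignments, which matches the "intense read load, plus a moderate amount of writes" the post expects for this group. Because the mapping is stored explicitly rather than derived from a hash, a user can later be moved to a different shard by updating one row.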
wordName wordTfidf (topN-words)
[('group', 0.232), ('setup', 0.222), ('maxed', 0.22), ('disks', 0.186), ('shard', 0.185), ('innodb', 0.18), ('cores', 0.165), ('thoughts', 0.157), ('machines', 0.155), ('hold', 0.15), ('recycle', 0.145), ('googling', 0.145), ('hear', 0.137), ('okay', 0.136), ('beef', 0.136), ('raid', 0.135), ('lowish', 0.13), ('memcached', 0.124), ('ram', 0.118), ('moderate', 0.112), ('designing', 0.111), ('evolves', 0.11), ('exclusively', 0.108), ('indices', 0.108), ('id', 0.107), ('intense', 0.106), ('mind', 0.106), ('hello', 0.103), ('php', 0.098), ('write', 0.096), ('hardware', 0.094), ('dollar', 0.094), ('recommended', 0.092), ('bother', 0.092), ('seems', 0.088), ('database', 0.088), ('logged', 0.088), ('pair', 0.087), ('sweet', 0.086), ('max', 0.086), ('stored', 0.085), ('goes', 0.084), ('servers', 0.083), ('partner', 0.082), ('remain', 0.082), ('high', 0.08), ('stick', 0.079), ('budget', 0.077), ('slaves', 0.077), ('variable', 0.077)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999952 383 high scalability-2008-09-10-Shard servers -- go big or small?
2 0.19569561 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard
Introduction: Update 4: Why you don’t want to shard. by Morgon on the MySQL Performance Blog. Optimize everything else first, and then if performance still isn’t good enough, it’s time to take a very bitter medicine. Update 3: Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes by Dare Obasanjo. Excellent discussion of why and when you would choose a sharding architecture, how to shard, and problems with sharding. Update 2: Mr. Moore gets to punt on sharding by Alan Rimm-Kaufman of 37signals. Insightful article on design tradeoffs and the evils of premature optimization. With more memory, more CPU, and new tech like SSD, problems can be avoided before more exotic architectures like sharding are needed. Add features not infrastructure. Jeremy Zawodny says he's wrong wrong wrong. we're running multi-core CPUs at slower clock speeds. Moore won't save you. Update: Dan Pritchett shares some excellent Sharding Lessons : Size Your Shards, Use Math on Shard C
3 0.16424613 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
Introduction: For everything given something seems to be taken. Caching is a great scalability solution, but caching also comes with problems. Sharding is a great scalability solution, but as Foursquare recently revealed in a post-mortem about their 17 hours of downtime, sharding also has problems. MongoDB, the database Foursquare uses, also contributed their post-mortem of what went wrong too. Now that everyone has shared and resharded, what can we learn to help us skip these mistakes and quickly move on to a different set of mistakes? First, like for Facebook, huge props to Foursquare and MongoDB for being upfront and honest about their problems. This helps everyone get better and is a sign we work in a pretty cool industry. Second, overall, the fault didn't flow from evil hearts or gross negligence. As usual the cause was more mundane: a key system, that could be a little more robust, combined with a very popular application built by a small group of people, under immense pressure
4 0.16239734 152 high scalability-2007-11-13-Flickr Architecture
Introduction: Update: Flickr hits 2 Billion photos served. That's a lot of hamburgers. Flickr is both my favorite bird and the web's leading photo sharing site. Flickr has an amazing challenge, they must handle a vast sea of ever expanding new content, ever increasing legions of users, and a constant stream of new features, all while providing excellent performance. How do they do it? Site: http://www.flickr.com Information Sources Flickr and PHP (an early document) Capacity Planning for LAMP Federation at Flickr: Doing Billions of Queries a Day by Dathan Pattishall. Building Scalable Web Sites by Cal Henderson from Flickr. Database War Stories #3: Flickr by Tim O'Reilly Cal Henderson's Talks . A lot of useful PowerPoint presentations. Platform PHP MySQL Shards Memcached for a caching layer. Squid in reverse-proxy for html and images. Linux (RedHat) Smarty for templating Perl PEAR for XML and Email parsing ImageMagick, for ima
5 0.14258049 1346 high scalability-2012-10-24-Saving Cash Using Less Cache - 90% Savings in the Caching Tier
Introduction: In a paper delivered at HotCloud '12 by a group from CMU and Intel Labs, Saving Cash by Using Less Cache (slides, pdf), they show it may be possible to use less DRAM under low load conditions to save on operational costs. There are some issues with this idea, but in a give me more cache era, it could be an interesting source of cost savings for your product. Caching is used to: Reduce load on the database. Reduce latency. Problem: RAM in the cloud is quite expensive. A third of costs can come from the caching tier. Solution: Shrink your cache when the load is lower. Their work shows when the load drops below a certain point you can throw away 50% of your cache while still maintaining performance. A few popular items often account for most of your hits, implying you can remove the cache for the long tail. Use two tiers of servers, the Retiring Group, which is the group of servers you want to get rid of. The Primary Group is the group of servers you
7 0.1398648 1279 high scalability-2012-07-09-Data Replication in NoSQL Databases
8 0.13539915 1114 high scalability-2011-09-13-Must see: 5 Steps to Scaling MongoDB (Or Any DB) in 8 Minutes
9 0.1313477 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
10 0.12904818 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
11 0.12577139 72 high scalability-2007-08-22-Wikimedia architecture
12 0.12384976 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
13 0.11726923 150 high scalability-2007-11-12-Slashdot Architecture - How the Old Man of the Internet Learned to Scale
14 0.11358235 358 high scalability-2008-07-26-Sharding the Hibernate Way
15 0.10975605 1331 high scalability-2012-10-02-An Epic TripAdvisor Update: Why Not Run on the Cloud? The Grand Experiment.
16 0.10912194 554 high scalability-2009-04-04-Digg Architecture
17 0.10715243 729 high scalability-2009-10-28-And the winner is: MySQL or Memcached or Tokyo Tyrant?
18 0.10537252 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App
19 0.10530359 367 high scalability-2008-08-17-Strategy: Drop Memcached, Add More MySQL Servers
20 0.10507163 1065 high scalability-2011-06-21-Running TPC-C on MySQL-RDS
topicId topicWeight
[(0, 0.183), (1, 0.088), (2, -0.053), (3, -0.12), (4, 0.01), (5, 0.054), (6, 0.024), (7, -0.064), (8, 0.033), (9, -0.037), (10, -0.03), (11, -0.038), (12, -0.029), (13, 0.092), (14, 0.007), (15, 0.028), (16, -0.049), (17, 0.017), (18, -0.046), (19, 0.103), (20, -0.016), (21, 0.003), (22, -0.054), (23, -0.012), (24, -0.053), (25, 0.004), (26, 0.067), (27, -0.055), (28, -0.056), (29, 0.043), (30, 0.016), (31, 0.016), (32, 0.042), (33, -0.001), (34, 0.023), (35, -0.027), (36, 0.068), (37, 0.016), (38, -0.049), (39, -0.012), (40, -0.043), (41, 0.08), (42, -0.063), (43, -0.047), (44, 0.007), (45, -0.015), (46, 0.065), (47, -0.017), (48, 0.02), (49, -0.041)]
simIndex simValue blogId blogTitle
same-blog 1 0.9542141 383 high scalability-2008-09-10-Shard servers -- go big or small?
2 0.79154754 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard
Introduction: Update 4: Why you don’t want to shard. by Morgon on the MySQL Performance Blog. Optimize everything else first, and then if performance still isn’t good enough, it’s time to take a very bitter medicine. Update 3: Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes by Dare Obasanjo. Excellent discussion of why and when you would choose a sharding architecture, how to shard, and problems with sharding. Update 2: Mr. Moore gets to punt on sharding by Alan Rimm-Kaufman of 37signals. Insightful article on design tradeoffs and the evils of premature optimization. With more memory, more CPU, and new tech like SSD, problems can be avoided before more exotic architectures like sharding are needed. Add features not infrastructure. Jeremy Zawodny says he's wrong wrong wrong. we're running multi-core CPUs at slower clock speeds. Moore won't save you. Update: Dan Pritchett shares some excellent Sharding Lessons : Size Your Shards, Use Math on Shard C
3 0.78655636 152 high scalability-2007-11-13-Flickr Architecture
Introduction: Update: Flickr hits 2 Billion photos served. That's a lot of hamburgers. Flickr is both my favorite bird and the web's leading photo sharing site. Flickr has an amazing challenge, they must handle a vast sea of ever expanding new content, ever increasing legions of users, and a constant stream of new features, all while providing excellent performance. How do they do it? Site: http://www.flickr.com Information Sources Flickr and PHP (an early document) Capacity Planning for LAMP Federation at Flickr: Doing Billions of Queries a Day by Dathan Pattishall. Building Scalable Web Sites by Cal Henderson from Flickr. Database War Stories #3: Flickr by Tim O'Reilly Cal Henderson's Talks . A lot of useful PowerPoint presentations. Platform PHP MySQL Shards Memcached for a caching layer. Squid in reverse-proxy for html and images. Linux (RedHat) Smarty for templating Perl PEAR for XML and Email parsing ImageMagick, for ima
4 0.70409542 847 high scalability-2010-06-23-Product: dbShards - Share Nothing. Shard Everything.
Introduction: I met the CodeFutures folks, makers of dbShards, at Gluecon. They occupy an interesting niche in the database space, somewhere between NoSQL, which jettisons everything SQL, and high end analytics platforms that completely rewrite the backend while keeping a SQL facade. High concept: I think of dbShards as a sort of commercial OLTP mashup of features from HSCALE (partitioning) + MySQL Proxy (transparent intermediate layer) + Memcached (client side sharding) + Gigaspaces (parallel query) + MySQL (transactions). You may find dbShards interesting if you are looking to keep SQL, need scale out writes and reads, need out of the box parallel query capabilities, and would prefer to use a standard platform like MySQL as a base. To learn more about dbShards I asked Cory Isaacson (CEO and CTO) a few devastatingly difficult questions (not really). Who are you, what is dbShards, and what problem was dbShards created to solve? I’m Cory Isaacson, CEO/CTO of CodeFutures Corp
5 0.69191682 1114 high scalability-2011-09-13-Must see: 5 Steps to Scaling MongoDB (Or Any DB) in 8 Minutes
Introduction: Jared Rosoff concisely, effectively, entertainingly, and convincingly gives an 8 minute MongoDB tutorial on scaling MongoDB at Scale Out Camp. The ideas aren't just limited to MongoDB, they work for most any database: Optimize your queries; Know your working set size; Tune your file system; Choose the right disks; Shard. Here's an explanation of all 5 strategies: Optimize your queries. Computer science works. Complexity analysis works. A btree search is faster than a table scan. So analyze your queries. Use explain to see what your query is doing. If it is saying it's using a cursor then it's doing a table scan. That's slow. Look at the number of documents it looks at to satisfy a query. Look at how long it takes. Fix: add indexes. It doesn't matter if you are running on 1 or 100 servers. Know your working set size. Sticking memcache in front of your database is silly. You have lots of RAM, use it. Embed your cache in the database, which is how MongoDB works. Working set
6 0.69065195 1183 high scalability-2012-01-30-37signals Still Happily Scaling on Moore RAM and SSDs
7 0.68603069 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
8 0.68454969 72 high scalability-2007-08-22-Wikimedia architecture
9 0.67527658 857 high scalability-2010-07-13-DbShards Part Deux - The Internals
10 0.67400879 345 high scalability-2008-06-11-Pyshards aspires to build sharding toolkit for Python
11 0.66891718 476 high scalability-2008-12-28-How to Organize a Database Table’s Keys for Scalability
12 0.66724122 142 high scalability-2007-11-05-Strategy: Diagonal Scaling - Don't Forget to Scale Out AND Up
13 0.66304994 1279 high scalability-2012-07-09-Data Replication in NoSQL Databases
14 0.66278142 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
15 0.65462154 358 high scalability-2008-07-26-Sharding the Hibernate Way
16 0.65320486 157 high scalability-2007-11-16-Product: lbpool - Load Balancing JDBC Pool
17 0.64711857 150 high scalability-2007-11-12-Slashdot Architecture - How the Old Man of the Internet Learned to Scale
18 0.64538854 302 high scalability-2008-04-10-Mysql scalability and failover...
19 0.64371455 248 high scalability-2008-02-13-What's your scalability plan?
20 0.6384263 1606 high scalability-2014-03-05-10 Things You Should Know About Running MongoDB at Scale
topicId topicWeight
[(1, 0.143), (2, 0.161), (10, 0.108), (30, 0.014), (40, 0.013), (47, 0.014), (61, 0.206), (66, 0.122), (79, 0.083), (85, 0.025), (94, 0.023)]
simIndex simValue blogId blogTitle
same-blog 1 0.9366073 383 high scalability-2008-09-10-Shard servers -- go big or small?
2 0.9054215 1031 high scalability-2011-04-28-PaaS on OpenStack - Run Applications on Any Cloud, Any Time Using Any Thing
Introduction: Yesterday, I had a session during the OpenStack Summit where I tried to present a more general view on how we should be thinking about PaaS in the context of OpenStack. The key takeaway: The main goal of PaaS is to drive productivity into the process by which we can deliver new applications. Most of the existing PaaS solutions take a fairly extreme approach with their abstraction of the underlying infrastructure and therefore fit a fairly small number of extremely simple applications and thus miss the real promise of PaaS. Amazon's Elastic Beanstalk took a more bottom up approach giving us a better set of tradeoffs between the abstraction and control which makes it more broadly applicable to a larger set of applications. The fact that OpenStack is open source allows us to think differently on the things we can do at the platform layer. We can create a tighter integration between the PaaS and IaaS layers and thus come up with a better set of tradeoffs into the way we drive
3 0.90050519 150 high scalability-2007-11-12-Slashdot Architecture - How the Old Man of the Internet Learned to Scale
Introduction: Slashdot effect: overwhelming unprepared sites with an avalanche of readers' clicks after being mentioned on Slashdot. Sure, we now have the "Digg effect" and other hot new stars, but Slashdot was the original. And like many stars from generations past, Slashdot plays the elder statesman's role with class, dignity, and restraint. Yet with millions and millions of users Slashdot is still box office gold and more than keeps up with the young'ins. And with age comes the wisdom of learning how to handle all those users. Just how does Slashdot scale and what can you learn by going old school? Site: http://slashdot.org Information Sources Slashdot's Setup, Part 1- Hardware Slashdot's Setup, Part 2- Software History of Slashdot Part 3- Going Corporate The History of Slashdot Part 4 - Yesterday, Today, Tomorrow The Platform MySQL Linux (CentOS/RHEL) Pound Apache Perl Memcached LVS The Stats Started building the system in 1999
4 0.89620942 181 high scalability-2007-12-11-Hosting and CDN for startup video sharing site
Introduction: This question is for all the gurus here. Please help this novice. I am starting a video sharing site like YouTube in India. I want to offer the best quality possible, at minimum cost. Nothing new about it, right? :). I have done some research on the dedicated hosting services and CDN services available and I have some basic knowledge on these. Following are my requirements: 1) My budget is $500 to $1000 per month for hosting (including CDN if and as applicable). 2) I will need around 500GB of storage and 1TB per month of bandwidth in first 2-3 months and then about 10TB of storage and 5TB per month of bandwidth. And more ... depending on how big it gets (I can afford more when it gets big) 3) 90% of my viewers are in India. Other 10% are in US and UK. Based on the above, could you please answer my following questions? 1) Can I go with just a good dedicated server to start with and get a CDN service later on when the site gets big? Or do you think it's wise
5 0.89560902 1411 high scalability-2013-02-22-Stuff The Internet Says On Scalability For February 22, 2013
Introduction: Hey, it's HighScalability time: Quotable Quotes: @p337er : I have committed some truly horrendous crimes against scalability today. @ErrataRob : doubling performance doesn't double scalability. @rsingel : In 2008 when Yahoo.com linked out, I had a Wired story get 1M visitors in an hour from their homepage. @philiph : Lets solve this scalability problem with a queuing system @jaykreps : Transferring data across data centers? Read this page and go tune your TCP buffer sizes... @gwestr : In which the node community showers schadenfreude upon the rails community for "scalability is not my problem" architectures @pbailis : Makes sense, though I think there's a tradeoff re: coordination and scalability (always homogeneous vs dynamically heterogenous) @pembleton : To summarize Yoav's philosophy: we started as quick as we can and then we accelerated #operationgrandma in #reversim @surfichris : “We chose Heroku because we be
6 0.89519644 1089 high scalability-2011-07-29-Stuff The Internet Says On Scalability For July 29, 2011
7 0.89414942 775 high scalability-2010-02-10-ElasticSearch - Open Source, Distributed, RESTful Search Engine
8 0.89193457 1289 high scalability-2012-07-23-State of the CDN: More Traffic, Stable Prices, More Products, Profits - Not So Much
9 0.89023358 1184 high scalability-2012-01-31-Performance in the Cloud: Business Jitter is Bad
10 0.89003432 1191 high scalability-2012-02-13-Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter
12 0.88999915 1461 high scalability-2013-05-20-The Tumblr Architecture Yahoo Bought for a Cool Billion Dollars
13 0.8899405 100 high scalability-2007-09-26-Use a CDN to Instantly Improve Your Website's Performance by 20% or More
14 0.88868839 475 high scalability-2008-12-22-SLAs in the SaaS space
15 0.88368982 1189 high scalability-2012-02-07-Hypertable Routs HBase in Performance Test -- HBase Overwhelmed by Garbage Collection
16 0.88114709 931 high scalability-2010-10-28-Notes from A NOSQL Evening in Palo Alto
17 0.88079059 337 high scalability-2008-05-31-memcached and Storage of Friend list
18 0.87574261 142 high scalability-2007-11-05-Strategy: Diagonal Scaling - Don't Forget to Scale Out AND Up
19 0.87467486 856 high scalability-2010-07-12-Creating Scalable Digital Libraries
20 0.87363136 375 high scalability-2008-09-01-A Scalability checklist?