high_scalability high_scalability-2013 high_scalability-2013-1514 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: This is a guest post by Zardosht Kasheff, Software Developer at Tokutek, a storage engine company that delivers 21st-Century capabilities to the leading open source data management platforms. As software developers, we value abstraction. The simpler the API, the more attractive it becomes. Arguably, MongoDB’s greatest strengths are its elegant API and its agility, which let developers simply code. But when MongoDB runs into scalability problems on big data, developers need to peek underneath the covers to understand the underlying issues and how to fix them. Without understanding, one may end up with an inefficient solution that costs time and money. For example, one may shard prematurely, increasing hardware and management costs, when a simpler replication setup would do. Or, one may increase the size of a replica set when upgrading to SSDs would suffice. This article shows how to reason about some big data scalability problems in an effort to find efficient solut
sentIndex sentText sentNum sentScore
1 For example, one may shard prematurely, increasing hardware and management costs, when a simpler replication setup would do. [sent-7, score-0.533]
2 Or, one may increase the size of a replica set when upgrading to SSDs would suffice. [sent-8, score-0.492]
3 A range query used to retrieve 100 documents may induce 1 I/O, whereas 100 point queries to retrieve 100 documents may induce 100 I/Os. [sent-38, score-1.791]
4 Using an index on a collection, this does a combination of range queries and point queries. [sent-47, score-0.63]
5 If not, then a combination of range queries and point queries is used. [sent-49, score-0.919]
6 One can reduce an application's I/O by avoiding individual point queries to retrieve each document. [sent-74, score-0.715]
7 To do this, we use covering or clustering indexes, which filter the documents the query analyzes and can report results using range queries. [sent-75, score-0.727]
8 If you have an OLTP application and your queries are essentially point queries (because they retrieve very few documents), then even with proper indexes, you may still have an I/O bottleneck. [sent-77, score-1.033]
9 Also, additional indexes increase the cost of insertions, as each insertion must keep the indexes up to date as well, but write-optimized databases mitigate that cost. [sent-79, score-0.567]
10 Updates and deletes are tricky in that they are a combination of queries and inserts. [sent-83, score-0.461]
11 Read scaling via replication: Read scaling with replication is effective for applications where queries are the bottleneck. [sent-110, score-0.435]
12 If inserts, updates, or deletes are your bottleneck, then replication may not be very effective, because the write work is duplicated on all servers that are added to the replica set. [sent-112, score-0.628]
13 Sharding: Sharding partitions your data across different replica sets based on a shard key. [sent-114, score-0.664]
14 Different replica sets in the cluster are responsible for ranges of values in the shard key space. [sent-115, score-0.719]
15 So, an application’s write throughput is increased by spreading the write workload across separate replica sets in a cluster. [sent-116, score-0.595]
16 By partitioning the data by ranges in the shard key space, queries that use the shard key can effectively do range queries on a few shards, making such queries very efficient. [sent-118, score-1.9]
17 If one makes the shard key a hash, then all range queries must run on all shards in the cluster, but point queries on the shard key run on single shards. [sent-119, score-1.684]
18 A good shard key has the following properties: ● Many (if not all) of your queries use the shard key. [sent-129, score-0.987]
19 ● The shard key should do a good job of distributing writes to different replica sets in the cluster. [sent-132, score-0.726]
20 If all writes are directed to the same replica set in the cluster, then that replica set becomes a bottleneck for writes, just as it was in a non-sharded setup. [sent-133, score-0.615]
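The trade-off between range-partitioned and hashed shard keys described in sentences 16 and 17 can be illustrated with a small routing sketch. This is a toy model, not MongoDB's actual router: the four-shard cluster, the split points, and the function names are all assumptions made up for illustration.

```python
import bisect
import hashlib

# Hypothetical 4-shard cluster. The split points are illustrative only:
# shard 0 owns keys < 250, shard 1 owns 250..499, shard 2 owns 500..749,
# shard 3 owns keys >= 750.
SPLIT_POINTS = [250, 500, 750]
NUM_SHARDS = len(SPLIT_POINTS) + 1


def range_shard(key):
    """Return the shard that owns `key` under range partitioning."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def hash_shard(key):
    """Return the shard that owns `key` under hash partitioning."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shards_for_range_query(lo, hi, hashed):
    """Return the set of shards to consult for keys in [lo, hi]."""
    if hashed:
        # Hashing scatters adjacent keys, so every shard may hold a match.
        return set(range(NUM_SHARDS))
    # Range partitioning keeps adjacent keys together on contiguous shards.
    return set(range(range_shard(lo), range_shard(hi) + 1))


def shards_for_point_query(key, hashed):
    """Return the single shard to consult for one shard-key value."""
    return {hash_shard(key) if hashed else range_shard(key)}
```

In this model a narrow range query on a range-partitioned key touches one or two shards, while the same query on a hashed key fans out to all four. A point query on the shard key goes to exactly one shard in both schemes.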
wordName wordTfidf (topN-words)
[('queries', 0.289), ('shard', 0.277), ('replica', 0.226), ('range', 0.18), ('induces', 0.165), ('query', 0.157), ('retrieve', 0.152), ('documents', 0.14), ('oltp', 0.136), ('solutions', 0.13), ('induce', 0.129), ('indexes', 0.118), ('may', 0.116), ('write', 0.111), ('updates', 0.109), ('cost', 0.108), ('ssds', 0.103), ('deletes', 0.102), ('inducing', 0.1), ('understanding', 0.096), ('application', 0.096), ('point', 0.091), ('arguably', 0.09), ('bottlenecks', 0.089), ('solution', 0.088), ('causing', 0.087), ('bottleneck', 0.087), ('document', 0.087), ('data', 0.086), ('increase', 0.083), ('underneath', 0.079), ('basic', 0.078), ('peek', 0.076), ('writes', 0.076), ('sets', 0.075), ('replication', 0.073), ('improving', 0.072), ('workload', 0.072), ('key', 0.072), ('must', 0.07), ('combination', 0.07), ('databases', 0.07), ('big', 0.069), ('ranges', 0.069), ('discussing', 0.069), ('ram', 0.068), ('addressing', 0.067), ('elegant', 0.067), ('one', 0.067), ('software', 0.065)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
2 0.28983939 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
Introduction: For everything given something seems to be taken. Caching is a great scalability solution, but caching also comes with problems. Sharding is a great scalability solution, but as Foursquare recently revealed in a post-mortem about their 17 hours of downtime, sharding also has problems. MongoDB, the database Foursquare uses, also contributed their post-mortem of what went wrong. Now that everyone has shared and resharded, what can we learn to help us skip these mistakes and quickly move on to a different set of mistakes? First, like for Facebook, huge props to Foursquare and MongoDB for being upfront and honest about their problems. This helps everyone get better and is a sign we work in a pretty cool industry. Second, overall, the fault didn't flow from evil hearts or gross negligence. As usual the cause was more mundane: a key system, that could be a little more robust, combined with a very popular application built by a small group of people, under immense pressure
3 0.24869357 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard
Introduction: Update 4: Why you don’t want to shard. by Morgon on the MySQL Performance Blog. Optimize everything else first, and then if performance still isn’t good enough, it’s time to take a very bitter medicine. Update 3: Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes by Dare Obasanjo. Excellent discussion of why and when you would choose a sharding architecture, how to shard, and problems with sharding. Update 2: Mr. Moore gets to punt on sharding by Alan Rimm-Kaufman of 37signals. Insightful article on design tradeoffs and the evils of premature optimization. With more memory, more CPU, and new tech like SSD, problems can be avoided before more exotic architectures like sharding are needed. Add features not infrastructure. Jeremy Zawodny says he's wrong wrong wrong. we're running multi-core CPUs at slower clock speeds. Moore won't save you. Update: Dan Pritchett shares some excellent Sharding Lessons : Size Your Shards, Use Math on Shard C
4 0.20677851 152 high scalability-2007-11-13-Flickr Architecture
Introduction: Update: Flickr hits 2 Billion photos served. That's a lot of hamburgers. Flickr is both my favorite bird and the web's leading photo sharing site. Flickr has an amazing challenge, they must handle a vast sea of ever expanding new content, ever increasing legions of users, and a constant stream of new features, all while providing excellent performance. How do they do it? Site: http://www.flickr.com Information Sources Flickr and PHP (an early document) Capacity Planning for LAMP Federation at Flickr: Doing Billions of Queries a Day by Dathan Pattishall. Building Scalable Web Sites by Cal Henderson from Flickr. Database War Stories #3: Flickr by Tim O'Reilly Cal Henderson's Talks . A lot of useful PowerPoint presentations. Platform PHP MySQL Shards Memcached for a caching layer. Squid in reverse-proxy for html and images. Linux (RedHat) Smarty for templating Perl PEAR for XML and Email parsing ImageMagick, for ima
5 0.19934641 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
Introduction: Pinterest has been riding an exponential growth curve, doubling every month and a half. They've gone from 0 to 10s of billions of page views a month in two years, from 2 founders and one engineer to over 40 engineers, from one little MySQL server to 180 Web Engines, 240 API Engines, 88 MySQL DBs (cc2.8xlarge) + 1 slave each, 110 Redis Instances, and 200 Memcache Instances. Stunning growth. So what's Pinterest's story? To tell their story we have our bards, Pinterest's Yashwanth Nelapati and Marty Weiner, who tell the dramatic story of Pinterest's architecture evolution in a talk titled Scaling Pinterest. This is the talk they would have liked to hear a year and a half ago when they were scaling fast and there were a lot of options to choose from. And they made a lot of incorrect choices. This is a great talk. It's full of amazing details. It's also very practical, down to earth, and it contains strategies adoptable by nearly anyone. Highly recommended. Two of my favorite lessons from the talk: Arc
6 0.19478971 1606 high scalability-2014-03-05-10 Things You Should Know About Running MongoDB at Scale
7 0.18928622 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
8 0.18553403 1114 high scalability-2011-09-13-Must see: 5 Steps to Scaling MongoDB (Or Any DB) in 8 Minutes
9 0.18037623 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
10 0.17408593 589 high scalability-2009-05-05-Drop ACID and Think About Data
11 0.17108992 358 high scalability-2008-07-26-Sharding the Hibernate Way
12 0.16911031 775 high scalability-2010-02-10-ElasticSearch - Open Source, Distributed, RESTful Search Engine
13 0.16333798 1337 high scalability-2012-10-10-Antirez: You Need to Think in Terms of Organizing Your Data for Fetching
14 0.16297844 946 high scalability-2010-11-22-Strategy: Google Sends Canary Requests into the Data Mine
15 0.15466256 847 high scalability-2010-06-23-Product: dbShards - Share Nothing. Shard Everything.
16 0.15231353 1291 high scalability-2012-07-25-Vertical Scaling Ascendant - How are SSDs Changing Architectures?
17 0.14711326 327 high scalability-2008-05-27-How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale
18 0.14659441 1089 high scalability-2011-07-29-Stuff The Internet Says On Scalability For July 29, 2011
19 0.14654337 476 high scalability-2008-12-28-How to Organize a Database Table’s Keys for Scalability
topicId topicWeight
[(0, 0.288), (1, 0.151), (2, -0.037), (3, -0.013), (4, 0.041), (5, 0.172), (6, 0.057), (7, -0.055), (8, 0.01), (9, -0.055), (10, 0.016), (11, 0.021), (12, -0.104), (13, 0.082), (14, 0.025), (15, 0.043), (16, -0.073), (17, -0.016), (18, 0.019), (19, 0.078), (20, 0.0), (21, -0.012), (22, 0.015), (23, 0.01), (24, -0.022), (25, -0.0), (26, -0.073), (27, -0.112), (28, 0.005), (29, 0.144), (30, 0.045), (31, 0.008), (32, 0.074), (33, 0.057), (34, 0.015), (35, 0.035), (36, 0.036), (37, -0.023), (38, -0.084), (39, -0.044), (40, -0.093), (41, 0.062), (42, 0.001), (43, 0.019), (44, 0.023), (45, -0.017), (46, -0.028), (47, -0.01), (48, 0.027), (49, -0.019)]
simIndex simValue blogId blogTitle
same-blog 1 0.9686681 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
2 0.85074341 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard
Introduction: Update 4: Why you don’t want to shard. by Morgon on the MySQL Performance Blog. Optimize everything else first, and then if performance still isn’t good enough, it’s time to take a very bitter medicine. Update 3: Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes by Dare Obasanjo. Excellent discussion of why and when you would choose a sharding architecture, how to shard, and problems with sharding. Update 2: Mr. Moore gets to punt on sharding by Alan Rimm-Kaufman of 37signals. Insightful article on design tradeoffs and the evils of premature optimization. With more memory, more CPU, and new tech like SSD, problems can be avoided before more exotic architectures like sharding are needed. Add features not infrastructure. Jeremy Zawodny says he's wrong wrong wrong. we're running multi-core CPUs at slower clock speeds. Moore won't save you. Update: Dan Pritchett shares some excellent Sharding Lessons : Size Your Shards, Use Math on Shard C
3 0.8487398 847 high scalability-2010-06-23-Product: dbShards - Share Nothing. Shard Everything.
Introduction: I met the CodeFutures folks, makers of dbShards , at Gluecon . They occupy an interesting niche in the database space, somewhere between NoSQL , which jettisons everything SQL, and high end analytics platforms that completely rewrite the backend while keeping a SQL facade. High concept: I think of dbShards as a sort of commercial OLTP mashup of features from HSCALE (partitioning) + MySQL Proxy (transparent intermediate layer) + Memcached (client side sharding) + Gigaspaces (parallel query) + MySQL (transactions). You may find dbShards interesting if you are looking to keep SQL, need scale out writes and reads, need out of the box parallel query capabilities, and would prefer to use a standard platform like MySQL as a base. To learn more about dbShards I asked Cory Isaacson (CEO and CTO) a few devastatingly difficult questions (not really). Who are you, what is dbShards, and what problem was dbShards created to solve? I’m Cory Isaacson, CEO/CTO of CodeFutures Corp
4 0.83523262 476 high scalability-2008-12-28-How to Organize a Database Table’s Keys for Scalability
Introduction: The key (no pun intended) to understanding how to organize your dataset’s data is to think of each shard not as an individual database, but as one large singular database. Just as in a normal single server database setup where you have a unique key for each row within a table, each row key within each individual shard must be unique to the whole dataset partitioned across all shards. There are a few different ways we can accomplish uniqueness of row keys across a shard cluster. Each has its pro’s and con’s and the one chosen should be specific to the problems you’re trying to solve.
5 0.83232856 358 high scalability-2008-07-26-Sharding the Hibernate Way
Introduction: Update : A very nice JavaWorld podcast interview with Google engineer Max Ross on Hibernate Shards . Max defines Hibernate Shards (horizontal partitioning), how it works (pretty well), virtual shards (don't ask), what they need to do in the future (query, replication, operational tools), and how it relates to Google AppEngine (not much). To scale you are supposed to partition your data. Sounds good, but how do you do it? When you actually sit down to work out all the details it’s not that easy. Hibernate Shards to the rescue! Hibernate shards is: an extension to the core Hibernate product that adds facilities for horizontal partitioning. If you know the core Hibernate API you know the shards API. No learning curve at all. Here is what a few members of the core group had to say about the Hibernate Shards open source project. Although there are some limitations, from the sound of it they are doing useful stuff in the right way and it’s very much worth looking at, especially if you us
6 0.82001674 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
7 0.81216985 1606 high scalability-2014-03-05-10 Things You Should Know About Running MongoDB at Scale
8 0.79684341 1304 high scalability-2012-08-14-MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) and Familiar (SQL)
9 0.79597515 281 high scalability-2008-03-18-Database Design 101
10 0.77571577 857 high scalability-2010-07-13-DbShards Part Deux - The Internals
11 0.77082378 152 high scalability-2007-11-13-Flickr Architecture
12 0.76957327 351 high scalability-2008-07-16-The Mother of All Database Normalization Debates on Coding Horror
13 0.76204705 1114 high scalability-2011-09-13-Must see: 5 Steps to Scaling MongoDB (Or Any DB) in 8 Minutes
14 0.74695241 345 high scalability-2008-06-11-Pyshards aspires to build sharding toolkit for Python
15 0.74683982 65 high scalability-2007-08-16-Scaling Secret #2: Denormalizing Your Way to Speed and Profit
16 0.74614269 986 high scalability-2011-02-10-Database Isolation Levels And Their Effects on Performance and Scalability
17 0.74172705 933 high scalability-2010-11-01-Hot Trend: Move Behavior to Data for a New Interactive Application Architecture
18 0.74085879 1281 high scalability-2012-07-11-FictionPress: Publishing 6 Million Works of Fiction on the Web
19 0.74021554 587 high scalability-2009-05-01-FastBit: An Efficient Compressed Bitmap Index Technology
20 0.73884016 1337 high scalability-2012-10-10-Antirez: You Need to Think in Terms of Organizing Your Data for Fetching
topicId topicWeight
[(1, 0.175), (2, 0.21), (10, 0.076), (27, 0.011), (30, 0.033), (40, 0.019), (47, 0.011), (61, 0.115), (63, 0.067), (79, 0.119), (85, 0.012), (94, 0.06)]
simIndex simValue blogId blogTitle
1 0.98567963 999 high scalability-2011-03-04-Stuff The Internet Says On Scalability For March 4, 2011
Introduction: Submitted for your reading pleasure on this beautifully blue and sunny Friday... @Werner : Each day #AWS adds enough computing muscle to power one whole Amazon.com circa 2000, when it was a $2.8 billion business http://wv.ly/gMr8LQ Building servers to rule in hell. Datacenters spend a lot of energy on cooling down processors. Why can't they operate at higher temperatures? This is the proposition addressed by James Hamilton in Exploring the Limits of Datacenter Temprature and Datacenter Knowledge in What’s Next? Hotter Servers with ‘Gas Pedals’ . Quotable Quotes for 200 Watson: @jreichhold : One thing working at Twitter teaches me daily is that all scale is relative. What seemed impossible last year is now the daily case. @dannycast0nguay : If you’re concerned about scalability, any algorithm that forces you to run agreement will eventually become your bottleneck.—Werner Vogels @rael : No shortcut ever goes undetected by scale. @sr
2 0.97598445 1329 high scalability-2012-09-26-WordPress.com Serves 70,000 req-sec and over 15 Gbit-sec of Traffic using NGINX
Introduction: This is a guest post by Barry Abrahamson, Chief Systems Wrangler at Automattic, and Nginx's cofounder Andrew Alexeev. WordPress.com serves more than 33 million sites attracting over 339 million people and 3.4 billion pages each month. Since April 2008, WordPress.com has experienced about 4.4 times growth in page views. WordPress.com VIP hosts many popular sites including CNN’s Political Ticker, NFL, Time Inc’s The Page, People Magazine’s Style Watch, corporate blogs for Flickr and KROQ, and many more. Automattic operates two thousand servers in twelve globally distributed data centers. WordPress.com customer data is instantly replicated between different locations to provide an extremely reliable and fast web experience for hundreds of millions of visitors. Problem WordPress.com, which began in 2005, started on shared hosting, much like all of the WordPress.org sites. It was soon moved to a single dedicated server and then to two servers. In late 2005, WordPress.com
3 0.97241366 940 high scalability-2010-11-12-Stuff the Internet Says on Scalability For November 12th, 2010
Introduction: Google – A Study In Scalability And A Little Systems Horse Sense . A nice summary by Krishna Sankar of a version of Jeff Dean's classic talk on Google Scalability given to Stanford's EE380 class . Quotable Quotes: @jkalucki : Getting just 100 servers to work together for the first time is so ridiculously complicated. Horizontal scaling doesn't scale. @simeons : Yahoo's scalability is drivem by lots of asynchronous processing. "You learn to love it." -- @rstata Yahoo's CTO The Economics of the Cloud: Dissecting a Must-Read White Paper by Bernard Golden. I love the depiction of the unseen and unfelt forces that nevertheless organize everything around them: After a brief introduction, the authors lay out a central thesis: despite initial concerns about shortcomings in new technology offerings, "historically, underlying economics have a much stronger impact on the direction and speed of disruptions, as technological challenges are resolved or overcome thro
4 0.97092193 711 high scalability-2009-09-22-How Ravelry Scales to 10 Million Requests Using Rails
Introduction: Tim Bray has a wonderful interview with Casey Forbes, creator of Ravelry, a Ruby on Rails site supporting a 400,000+ strong community of dedicated knitters and crocheters. Casey and his small team have done great things with Ravelry. It is a very focused site that provides a lot of value for users. And users absolutely adore the site. That's obvious from their enthusiastic comments and rocket fast adoption of Ravelry. Ten years ago a site like Ravelry would have been a multi-million dollar operation. Today Casey is the sole engineer for Ravelry and to run it takes only a few people. He was able to code it in 4 months working nights and weekends. Take a look down below at all the technologies used to make Ravelry and you'll see how it is constructed almost completely from free off-the-shelf software that Casey has stitched together into a complete system. There's an amazing amount of leverage in today's ecosystem when you combine all the quality tools, languages, storage, bandwidth
same-blog 5 0.96966362 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
6 0.96960121 1180 high scalability-2012-01-24-The State of NoSQL in 2012
7 0.96816933 853 high scalability-2010-07-08-Cloud AWS Infrastructure vs. Physical Infrastructure
8 0.96524978 1302 high scalability-2012-08-10-Stuff The Internet Says On Scalability For August 10, 2012
9 0.96327972 195 high scalability-2007-12-28-Amazon's EC2: Pay as You Grow Could Cut Your Costs in Half
10 0.9629609 776 high scalability-2010-02-12-Hot Scalability Links for February 12, 2010
11 0.96256852 1344 high scalability-2012-10-19-Stuff The Internet Says On Scalability For October 19, 2012
12 0.96232766 576 high scalability-2009-04-21-What CDN would you recommend?
13 0.96151114 1626 high scalability-2014-04-04-Stuff The Internet Says On Scalability For April 4th, 2014
14 0.96094072 1028 high scalability-2011-04-22-Stuff The Internet Says On Scalability For April 22, 2011
15 0.96062022 1137 high scalability-2011-11-04-Stuff The Internet Says On Scalability For November 4, 2011
16 0.96052283 1189 high scalability-2012-02-07-Hypertable Routs HBase in Performance Test -- HBase Overwhelmed by Garbage Collection
17 0.96028489 1649 high scalability-2014-05-16-Stuff The Internet Says On Scalability For May 16th, 2014
18 0.95999664 1448 high scalability-2013-04-29-AWS v GCE Face-off and Why Innovation Needs Lower Cost Infrastructures
19 0.95996261 301 high scalability-2008-04-08-Google AppEngine - A First Look
20 0.95983535 1093 high scalability-2011-08-05-Stuff The Internet Says On Scalability For August 5, 2011