high_scalability high_scalability-2011 high_scalability-2011-1065 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I recently came across TPC-C benchmark results run on MySQL-based RDS databases. You can see them here . I think the results shed light on many questions concerning MySQL scalability in general and RDS scalability in particular. (For disclosure, I'm working for ScaleBase, where we are running an internal scale-out TPC-C benchmark these days and will publish results soon). TPC-C TPC-C is a standard database benchmark, used to measure databases. The database vendors invest big bucks in running this test and showing off whose database is faster and scales better. It is a write-intensive test, so it doesn’t necessarily reflect the behavior of the database in your application. But it does give some very important insights into what you can expect from your database under heavy load. The Benchmark Process First of all, I have some comments about the benchmark method itself. Generally - the benchmarks were run in an orderly fashion and in a rather methodological way – which i
sentIndex sentText sentNum sentScore
1 I recently came across TPC-C benchmark results run on MySQL-based RDS databases. [sent-1, score-0.641]
2 (For disclosure, I'm working for ScaleBase, where we are running an internal scale-out TPC-C benchmark these days and will publish results soon). [sent-4, score-0.559]
3 The database vendors invest big bucks in running this test and showing off whose database is faster and scales better. [sent-6, score-0.521]
4 The Benchmark Process First of all, I have some comments about the benchmark method itself. [sent-9, score-0.41]
5 The benchmark generator client was tpcc-mysql, an open-source implementation provided by Percona. [sent-11, score-0.473]
6 The TPC-C benchmark contains 5 types of transactions (“new order”, “delivery”, “stock level”, etc. [sent-14, score-0.495]
7 The benchmark focused on throughput, not latency. [sent-16, score-0.41]
8 That’s usually a small database, and I would be interested in seeing how the benchmark ranks with 1000 warehouses (around 100 GB) or even more. [sent-21, score-0.583]
9 Results Analysis The benchmark results are surprising. [sent-24, score-0.559]
10 With hardly any dependency on the database size, MySQL reaches its optimal throughput at around 64 concurrent users. [sent-25, score-0.76]
11 The sweet spot is around the XL machine, which reaches a throughput of around 7000 tpm. [sent-29, score-0.44]
12 We saw that optimal throughput is achieved with around 64 concurrent sessions on the database. [sent-40, score-0.508]
13 While with 1 user the throughput is 1,000 transactions per user, with 256 users it drops to 1 transaction per user! [sent-42, score-0.501]
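The collapse described above is easy to state as arithmetic. A minimal sketch, using the illustrative numbers from the text (the function name and figures here are for illustration only, not part of the benchmark tooling):

```python
def per_user_throughput(total_tpm: float, users: int) -> float:
    """Aggregate transactions per minute divided across concurrent users."""
    return total_tpm / users

# With 1 user, the engine delivers ~1,000 tpm, all of it to that one user.
single = per_user_throughput(1000, 1)

# Past the ~64-session sweet spot, contention eats the aggregate throughput;
# at 256 users the benchmark saw roughly 1 transaction per minute per user.
contended = per_user_throughput(256, 256)
```

So aggregate throughput not only stops scaling past the sweet spot — it falls, and the per-user share falls a thousandfold.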
14 For each query, the database needs to parse it, optimize it, find an execution plan, execute it, and manage transaction logs, transaction isolation, and row-level locks. [sent-47, score-0.745]
15 A simple update command needs an execution plan to get the qualifying rows to update, and must then read those rows and lock each and every one. [sent-49, score-0.563]
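The per-statement work described above can be sketched in a few lines. This is a toy model of the steps (plan, scan, row-level lock, write), not MySQL's actual executor; all names here are illustrative:

```python
def execute_update(table, predicate, new_value, lock_table):
    """Toy UPDATE: plan -> find qualifying rows, then lock and modify each."""
    # "Execution plan": determine which rows qualify for the update.
    plan = [row_id for row_id, row in table.items() if predicate(row)]
    for row_id in plan:
        lock_table.add(row_id)   # row-level lock taken before the write
        table[row_id] = new_value
    return len(plan)

table = {1: "a", 2: "b", 3: "a"}
locks = set()
changed = execute_update(table, lambda v: v == "a", "z", locks)
```

Even in this stripped-down form, every qualifying row costs a lock — which is exactly the work that multiplies as concurrency rises.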
16 This means the user’s result must be a snapshot of the data as it was when the query started. [sent-57, score-0.416]
17 Rather, the database should go and find the “old snapshot” of the row, meaning the way the row looked at the beginning of the query. [sent-59, score-0.464]
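The "old snapshot" lookup above is the essence of multi-version concurrency control. A minimal sketch, assuming a simplified version-chain model (MySQL's InnoDB actually reconstructs old versions from undo logs; this class and its names are illustrative):

```python
class VersionedRow:
    """Row that keeps its committed versions so old snapshots stay readable."""

    def __init__(self):
        self.versions = []  # list of (commit_ts, value), ascending by ts

    def write(self, ts, value):
        self.versions.append((ts, value))

    def read_as_of(self, ts):
        """Return the row as it looked at query-start time ts."""
        snapshot = None
        for commit_ts, value in self.versions:
            if commit_ts <= ts:  # only versions committed before the query
                snapshot = value
        return snapshot

row = VersionedRow()
row.write(ts=1, value=100)
row.write(ts=5, value=200)              # update committed mid-query
as_of_start = row.read_as_of(ts=3)      # query started at ts=3, sees 100
```

Walking back through versions is not free, and the cost grows with the number of concurrent writers producing new versions.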
18 When session or user concurrency goes up, the load inside the database engine increases exponentially. [sent-64, score-0.41]
19 There are a lot of possible solutions to this problem – adding a caching layer is a must to decrease the number of database hits, and any other measure that reduces the number of hits on the database (like NoSQL solutions) is welcome. [sent-69, score-0.721]
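The caching layer mentioned above usually follows the cache-aside pattern: check the cache first, and only fall through to the database on a miss. A minimal sketch (the names and in-process dict here are illustrative, standing in for a real cache like memcached):

```python
db_hits = 0  # count how many requests actually reach the database

def db_query(key):
    """Stand-in for a real database round trip."""
    global db_hits
    db_hits += 1
    return f"row-for-{key}"

cache = {}

def cached_query(key):
    if key not in cache:              # miss: one real database hit
        cache[key] = db_query(key)
    return cache[key]                 # hit: served from memory

for _ in range(100):                  # 100 requests for the same key...
    cached_query("user:42")           # ...cost the database only 1 hit
```

Every request absorbed by the cache is one fewer session fighting for locks and snapshots inside the engine.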
20 Instead of 128 concurrent users or even 256 concurrent users (which according to the TPC-C benchmark bring the worst results), we’ll have 10 databases with 26 users on each, and each database can reach 64 users (up to 640 concurrent users in total). [sent-72, score-1.682]
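The scale-out idea above amounts to a routing function that spreads sessions across shards so no single MySQL instance exceeds its ~64-session sweet spot. A minimal sketch, assuming simple modulo routing on a numeric user id (illustrative only — production sharding layers use consistent hashing or range maps):

```python
from collections import Counter

NUM_SHARDS = 10

def shard_for(user_id: int) -> int:
    """Route a session to one of the shards (simple modulo routing)."""
    return user_id % NUM_SHARDS

# 256 concurrent users spread across the 10 databases:
load = Counter(shard_for(uid) for uid in range(256))
max_sessions = max(load.values())   # heaviest shard sees ~26 sessions
```

Each shard stays far below the 64-session knee of the throughput curve, so the combined system keeps scaling where a single instance would have collapsed.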
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 1065 high scalability-2011-06-21-Running TPC-C on MySQL-RDS
2 0.24372731 986 high scalability-2011-02-10-Database Isolation Levels And Their Effects on Performance and Scalability
Introduction: Some of us are not aware of the tremendous job databases perform, particularly their efforts to maintain the Isolation aspect of ACID. For example, some people believe that transactions are only related to data manipulation and not to queries, which is an incorrect assumption. Transaction Isolation is all about queries, and the consistency and completeness of the data retrieved by queries. This is how it works: Isolation gives the querying user the feeling that he owns the database. It does not matter that hundreds or thousands of concurrent users work with the same database and the same schema (or even the same data). These other users can generate new data, modify existing data or perform any other action. The querying user must be able to get a complete, consistent picture of the data, unaffected by other users’ actions. Let’s take the following scenario, which is based on an Orders table that has 1,000,000 rows, with a disk size of 20 GB: 8:00: UserA started a query “SELECT
3 0.17672785 465 high scalability-2008-12-14-Scaling MySQL on a 256-way T5440 server using Solaris ZFS and Java 1.7
Introduction: How to scale MySQL on a 32 core system with 256 threads? Diagonal scalability in a box. An impressive benchmark that achieved more than 79,000 SQL queries per second on a single 4 RU server! Is this real? If so what is the role of good old horizontal scalability? The goals of the benchmark: Reach a high throughput of SQL queries on a 256-way Sun SPARC Enterprise T5440 Do it 21st century style i.e. with MySQL and ZFS , not 20th century style i.e. with OraSybInf... and VxFS Do it with minimal tuning, i.e. as close as possible to out-of-the-box
4 0.16290081 623 high scalability-2009-06-10-Dealing with multi-partition transactions in a distributed KV solution
Introduction: I've been getting asked about this a lot lately so I figured I'd just blog about it. Products like WebSphere eXtreme Scale work by taking a dataset, partitioning it using a key and then assigning those partitions to a number of JVMs. Each partition usually has a primary and a replica. These 'shards' are assigned to JVMs. A transactional application typically interacts with the data on a single partition at a time. This means the transaction is executed in a single JVM. A server box will be able to do M of those transactions per second and it scales because N boxes does MN (M multiplied by N) transactions per second. Increase N, you get more transactions per second. Availability is very good because a transaction only depends on 1 of the N servers that are currently online. Any of the other (N-1) servers can go down or fail with no impact on the transaction. So, single partition transactions can scale indefinitely from a throughput point of view, offer very consistent response times and
5 0.15186086 1529 high scalability-2013-10-08-F1 and Spanner Holistically Compared
Introduction: This article, F1: A Distributed SQL Database That Scales by Srihari Srinivasan , is republished with permission from a blog you really should follow: Systems We Make - Curating Complex Distributed Systems. With both the F1 and Spanner papers out, it's now possible to understand their interplay a bit holistically. So let's start by revisiting the key goals of both systems. Key Goals of F1′s design System must be able to scale up by adding resources Ability to re-shard and rebalance data without application changes ACID consistency for transactions Full SQL support, support for indexes Spanner’s objectives Main focus is on managing cross data center replicated data Ability to re-shard and rebalance data Automatically migrates data across machines F1 – An overview F1 is built on top of Spanner. Spanner offers support for features such as – strong consistency through distributed transactions (2PC), global ordering based on timestam
7 0.1433481 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
8 0.13864812 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
9 0.13653816 961 high scalability-2010-12-21-SQL + NoSQL = Yes !
10 0.13371271 1233 high scalability-2012-04-25-The Anatomy of Search Technology: blekko’s NoSQL database
11 0.13042358 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
12 0.12944852 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
13 0.12879309 936 high scalability-2010-11-09-Facebook Uses Non-Stored Procedures to Update Social Graphs
14 0.12809142 589 high scalability-2009-05-05-Drop ACID and Think About Data
17 0.12656012 1508 high scalability-2013-08-28-Sean Hull's 20 Biggest Bottlenecks that Reduce and Slow Down Scalability
18 0.12519681 304 high scalability-2008-04-19-How to build a real-time analytics system?
19 0.12408769 1459 high scalability-2013-05-16-Paper: Warp: Multi-Key Transactions for Key-Value Stores
20 0.12303169 831 high scalability-2010-05-26-End-To-End Performance Study of Cloud Services
simIndex simValue blogId blogTitle
same-blog 1 0.96536952 1065 high scalability-2011-06-21-Running TPC-C on MySQL-RDS
2 0.84854871 986 high scalability-2011-02-10-Database Isolation Levels And Their Effects on Performance and Scalability
3 0.83769923 1304 high scalability-2012-08-14-MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) and Familiar (SQL)
Introduction: This is an interview with MemSQL cofounders Eric Frenkiel and Nikita Shamgunov , in which they try to answer critics by going into more depth about their technology. MemSQL ruffled a few feathers with their claim of being the fastest database in the world. According to their benchmarks MemSQL can execute 200K TPS on an EC2 Quadruple Extra Large and on a 64 core machine they can push 1.2 million transactions a second. Benchmarks are always a dark mirror, so make of them what you will, but the target market for MemSQL is clear: projects looking for something both fast and familiar. Fast as in a novel design using a combination of technologies like MVCC , code generation, lock-free data structures , skip lists , and in-memory execution . Familiar as in SQL and nothing but SQL. The only interface to MemSQL is SQL. It’s right to point out MemSQL gets a boost by being a first release. Only a limited subset of SQL is supported, neither rep
Introduction: What do you get when you take a SQL database and start a new implementation from scratch, taking advantage of the latest research and modern hardware? Mike Stonebraker , the sword wielding Johnny Appleseed of the database world, hopes you get something like his new database, VoltDB : a pure SQL, pure ACID, pure OLTP, shared nothing, sharded, scalable, lockless, open source, in-memory DBMS, purpose-built for running hundreds of thousands of transactions a second. VoltDB claims to be 100 times faster than MySQL, up to 13 times faster than Cassandra , and 45 times faster than Oracle, with near-linear scaling. Will VoltDB kill off the new NoSQL upstarts? Will VoltDB cause a mass extinction of ancient databases? Probably no and no to both questions, but it's a product with a definite point-of-view and is worth a look as the transaction component in your system. But will it be right for you? Let's see... I first heard the details about VoltDB at Gluecon , where Mr. Stonebraker pres
5 0.75873619 1233 high scalability-2012-04-25-The Anatomy of Search Technology: blekko’s NoSQL database
Introduction: This is a guest post ( part 2 , part 3 ) by Greg Lindahl, CTO of blekko, the spam free search engine that had over 3.5 million unique visitors in March. Greg Lindahl was Founder and Distinguished Engineer at PathScale, at which he was the architect of the InfiniPath low-latency InfiniBand HCA, used to build tightly-coupled supercomputing clusters. Imagine that you're crazy enough to think about building a search engine. It's a huge task: the minimum index size needed to answer most queries is a few billion webpages. Crawling and indexing a few billion webpages requires a cluster with several petabytes of usable disk -- that's several thousand 1 terabyte disks -- and produces an index that's about 100 terabytes in size. Serving query results quickly involves having most of the index in RAM or on solid state (flash) disk. If you can buy a server with 100 gigabytes of RAM for about $3,000, that's 1,000 servers at a capital cost of $3 million, plus about $1 million per year of serve
6 0.75106663 623 high scalability-2009-06-10-Dealing with multi-partition transactions in a distributed KV solution
7 0.74679816 1281 high scalability-2012-07-11-FictionPress: Publishing 6 Million Works of Fiction on the Web
8 0.74461013 281 high scalability-2008-03-18-Database Design 101
9 0.73847985 222 high scalability-2008-01-25-Application Database and DAL Architecture
10 0.73816144 306 high scalability-2008-04-21-The Search for the Source of Data - How SimpleDB Differs from a RDBMS
11 0.72935748 1032 high scalability-2011-05-02-Stack Overflow Makes Slow Pages 100x Faster by Simple SQL Tuning
12 0.7269206 961 high scalability-2010-12-21-SQL + NoSQL = Yes !
13 0.72674114 1205 high scalability-2012-03-07-Scale Indefinitely on S3 With These Secrets of the S3 Masters
14 0.718458 1529 high scalability-2013-10-08-F1 and Spanner Holistically Compared
15 0.71567696 1096 high scalability-2011-08-10-LevelDB - Fast and Lightweight Key-Value Database From the Authors of MapReduce and BigTable
16 0.7099368 1364 high scalability-2012-11-29-Performance data for LevelDB, Berkley DB and BangDB for Random Operations
17 0.70937622 587 high scalability-2009-05-01-FastBit: An Efficient Compressed Bitmap Index Technology
18 0.70874792 351 high scalability-2008-07-16-The Mother of All Database Normalization Debates on Coding Horror
19 0.70784473 1508 high scalability-2013-08-28-Sean Hull's 20 Biggest Bottlenecks that Reduce and Slow Down Scalability
20 0.70484787 1553 high scalability-2013-11-25-How To Make an Infinitely Scalable Relational Database Management System (RDBMS)
simIndex simValue blogId blogTitle
1 0.96453887 1397 high scalability-2013-02-01-Stuff The Internet Says On Scalability For February 1, 2013
Introduction: Hey, it's HighScalability time: 400 milliseconds : modern private wire for Natural Gas ; 620,000: size of a mobile phone botnet ; 1 million cores: jet noise modeling supercomputer ; 2.2 petabytes per gram: DNA with storage density ; 1 billion: Google's one quarter infrastructure spend Quotable Quotes: @StartupLJackson : "Good thing we spent 6mo on scalability pre-launch. The thing went hyper-viral day one, just as planned." - nobody ever. @dysinger : WELCOME TO DEVOPS ADVENTURE! YOU ARE STANDING INSIDE AWS. NEARBY IS AN ANGRY ELB. THERE ARE SOME SSH KEYS ON THE GROUND. ~@levie : Well cloud computing was fun while it lasted; now we can just store all our data in DNA strands. @mtnygard : Debates about “Tech X > tech T” aren’t religious wars. They’re also economic wars. @alfonsol : I do not believe your per load number of socks requires scalability. @mikeolson : Latency breeds contempt. @DrQz : GitHub looks to me like pow
Introduction: For years a war has been fought in the software architecture trenches between the ideal of decentralized services and the power and practicality of centralized services. Centralized architectures, at least at the management and control plane level, are winning. And Google not only agrees, they are enthusiastic adopters of this model, even in places you don't think it should work. Here's an excerpt from Google Lifts Veil On “Andromeda” Virtual Networking , an excellent article by Timothy Morgan, that includes a money quote from Amin Vahdat , distinguished engineer and technical lead for networking at Google: Like many of the massive services that Google has created, the Andromeda network has centralized control. By the way, so did the Google File System and the MapReduce scheduler that gave rise to Hadoop when it was mimicked, so did the BigTable NoSQL data store that has spawned a number of quasi-clones, and even the B4 WAN and the Spanner distributed file system that have yet
3 0.96105456 68 high scalability-2007-08-20-TypePad Architecture
Introduction: TypePad is considered the largest paid blogging service in the world. After experiencing problems because of their meteoric growth, they eventually transitioned to an architecture patterned after their sister company, LiveJournal. Site: http://www.typepad.com/ The Platform MySQL Memcached Perl MogileFS Apache Linux The Stats As of 2005 TypePad sends 250mbps of traffic using multiple network pipes for 3TB of traffic a day. They were growing by 10-20% each month. I was unable to find more recent statistics. The Architecture Original Architecture: - Single server running Linux, Apache, Postgres, Perl, mod_perl - Storage was NFS on a filer. A Devastating Crash Caused a New Direction - A RAID controller failed and spewed data across all RAID disks. - The database was corrupted and the backups were corrupted. - Their redundant filers suffered from "split brain" syndrome. They moved to a LiveJournal-type architecture, which isn't surprising
4 0.95866716 964 high scalability-2010-12-28-Netflix: Continually Test by Failing Servers with Chaos Monkey
Introduction: In 5 Lessons We’ve Learned Using AWS , Netflix's John Ciancutti says the best way to avoid failure is to fail constantly . In the cloud it's expected instances can fail at any time, so you always have to be prepared. In the real world we prepare by running drills. Remember all those exciting fire drills? It's not just fire drills of course. The military, football teams, fire fighters, beach rescue, virtually any entity that must react quickly and efficiently to disaster hones their responsiveness by running drills. Netflix aggressively moves this strategy into the cloud by randomly failing servers using a tool they built called Chaos Monkey. The idea is: If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage. They respond to failures by degrading service , but they always respond: If the recommendations system is down they'll show popular titles instead. If the s
same-blog 5 0.95631593 1065 high scalability-2011-06-21-Running TPC-C on MySQL-RDS
6 0.94263268 274 high scalability-2008-03-12-YouTube Architecture
8 0.94101709 1345 high scalability-2012-10-22-Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale
9 0.93972468 936 high scalability-2010-11-09-Facebook Uses Non-Stored Procedures to Update Social Graphs
10 0.93937719 661 high scalability-2009-07-25-Latency is Everywhere and it Costs You Sales - How to Crush it
11 0.93771023 350 high scalability-2008-07-15-ZooKeeper - A Reliable, Scalable Distributed Coordination System
12 0.93689245 849 high scalability-2010-06-28-VoltDB Decapitates Six SQL Urban Myths and Delivers Internet Scale OLTP in the Process
13 0.93645501 1245 high scalability-2012-05-14-DynamoDB Talk Notes and the SSD Hot S3 Cold Pattern
14 0.93628573 1509 high scalability-2013-08-30-Stuff The Internet Says On Scalability For August 30, 2013
15 0.93624783 825 high scalability-2010-05-10-Sify.com Architecture - A Portal at 3900 Requests Per Second
16 0.9358207 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
17 0.93538034 512 high scalability-2009-02-14-Scaling Digg and Other Web Applications
18 0.9349218 1077 high scalability-2011-07-11-ATMCash Exploits Virtualization for Security - Immutability and Reversion
19 0.93485606 1209 high scalability-2012-03-14-The Azure Outage: Time Is a SPOF, Leap Day Doubly So
20 0.93449581 1215 high scalability-2012-03-26-7 Years of YouTube Scalability Lessons in 30 Minutes