High Scalability (2009-09-16): Paper: A practical scalable distributed B-tree
Introduction: We've seen a lot of NoSQL action lately built around distributed hash tables. Btrees are getting jealous. Btrees, once the king of the database world, want their throne back. Paul Buchheit surfaced a paper, A practical scalable distributed B-tree by Marcos K. Aguilera and Wojciech Golab, that might help spark a revolution. From the Abstract: We propose a new algorithm for a practical, fault tolerant, and scalable B-tree distributed over a set of servers. Our algorithm supports practical features not present in prior work: transactions that allow atomic execution of multiple operations over multiple B-trees, online migration of B-tree nodes between servers, and dynamic addition and removal of servers. Moreover, our algorithm is conceptually simple: we use transactions to manipulate B-tree nodes so that clients need not use complicated concurrency and locking protocols used in prior work. To execute these transactions quickly, we rely on three techniques:
(1) We use optimistic concurrency control, so that B-tree nodes are not locked during transaction execution, only during commit. (2) We keep replicas of inner nodes at clients; these replicas are lazy, and hence lightweight, and they are very helpful to reduce client-server communication while traversing the B-tree. (3) We replicate version numbers of inner nodes across servers, so that clients can validate their transactions efficiently, without creating bottlenecks at the root node and other upper levels in the tree.
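To make the combination of techniques concrete, here is a minimal single-process sketch, not the paper's actual protocol, of a client that traverses using a lazy local replica of nodes and validates the versions it observed before it is allowed to commit. All class and method names are made up for illustration.

```python
# Toy sketch of optimistic, version-validated B-tree transactions.
# "Server" stands in for a B-tree server; "Client" keeps a lazy replica.

class Server:
    """Maps node_id -> [version, payload]."""
    def __init__(self):
        self.nodes = {}

    def read(self, node_id):
        version, payload = self.nodes[node_id]
        return version, payload

    def validate(self, read_set):
        """Optimistic check: every (node_id, version) read must still be current."""
        return all(self.nodes[nid][0] == ver for nid, ver in read_set.items())

    def write(self, node_id, payload):
        version = self.nodes.get(node_id, [0, None])[0]
        self.nodes[node_id] = [version + 1, payload]     # bump version on write


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}       # lazy replica: fetched on demand, may go stale
        self.read_set = {}    # versions observed by the current transaction

    def read(self, node_id):
        if node_id not in self.cache:                    # lazy: fetch only if missing
            self.cache[node_id] = self.server.read(node_id)
        version, payload = self.cache[node_id]
        self.read_set[node_id] = version
        return payload

    def commit(self, writes):
        """No locks during execution: validate read versions, then apply writes."""
        if not self.server.validate(self.read_set):
            self.cache.clear()                           # replica was stale; retry later
            self.read_set.clear()
            return False
        for node_id, payload in writes.items():
            self.server.write(node_id, payload)
        self.read_set.clear()
        return True
```

A transaction that read a node whose version has since changed fails validation and commits nothing, which is the whole point of deferring locking to commit time.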
Distributed hash tables are scalable because records are easily distributed across a cluster, which gives the golden ability to perform many writes in parallel. But a lot of the time you want to iterate through records, or search records, in sorted order. Sorted could mean time-stamp order, for example, or last-name order. Access to data in sorted order is what B-trees are for.
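The contrast is easy to see in a few lines. A hash table answers point lookups but has no useful iteration order, while keys kept in sorted order support range scans ("everything between K1 and K2"). The keys below are made up for illustration.

```python
# Point lookup vs. range scan: why sorted access matters.
import bisect

records = {"2009-03-10": "paxos post",
           "2009-08-08": "pnuts post",
           "2009-09-16": "btree paper"}

# Point lookup: what hash-based stores are great at.
assert records["2009-08-08"] == "pnuts post"

# Range scan by timestamp: requires the keys in sorted order,
# which a B-tree maintains for free but a hash table does not.
keys = sorted(records)
lo = bisect.bisect_left(keys, "2009-04-01")
hi = bisect.bisect_right(keys, "2009-09-01")
assert keys[lo:hi] == ["2009-08-08"]
```

In a distributed hash table the `sorted(records)` step has no cheap equivalent, which is exactly the gap this paper addresses.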
But we simply haven't seen distributed B-tree systems develop. Instead, you would have to use some sort of map-reduce mechanism to efficiently scan all the records, or you would have to maintain the information in some other way. This paper points the way to doing some really cool things at a system level:

- It's distributed, so it can scale dynamically in size and handle writes in parallel.
- It supports adding and dropping servers dynamically, which is an essential requirement for architectures based on elastic cloud infrastructures.
- Data can be migrated to other nodes, which is essential for maintenance.
- Multiple records can be involved in transactions, which is essential for the complex data manipulations that happen in real systems. This is accomplished via a version-number mechanism that looks something like MVCC.
- Optimistic concurrency, that is, the ability to change data without explicit locking, makes the job a lot easier for programmers.
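The last two points can be sketched together: an atomic transaction spanning records in two different trees, committed all-or-nothing by checking version numbers first, with no locks held while the client does its work. This is a hypothetical illustration of the MVCC-like idea, not the paper's algorithm; the table and key names are invented.

```python
# Atomic multi-record commit across two "B-trees" via version validation.
trees = {
    "users":  {"alice": [1, {"balance": 10}]},   # key -> [version, value]
    "orders": {"o-1":   [1, {"owner": "alice"}]},
}

def commit(reads, writes):
    """reads: {(tree, key): version seen}; writes: {(tree, key): new value}.
    Apply every write only if every read version is still current."""
    for (tree, key), version in reads.items():
        if trees[tree][key][0] != version:
            return False                          # conflict: abort, write nothing
    for (tree, key), value in writes.items():
        current = trees[tree].get(key, [0, None])[0]
        trees[tree][key] = [current + 1, value]   # bump version, MVCC-style
    return True

# A transfer-style transaction touching both trees atomically.
reads = {("users", "alice"): 1, ("orders", "o-1"): 1}
writes = {("users", "alice"): {"balance": 5},
          ("orders", "o-1"): {"owner": "alice", "paid": True}}
assert commit(reads, writes) is True

# Replaying it with the now-stale versions fails cleanly.
assert commit(reads, writes) is False
```

Either every record in the transaction moves to its new version or none does, which is what makes multi-record operations safe without explicit locking on the client side.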