High Scalability (2009-09-16): Paper: A practical scalable distributed B-tree
Introduction: We've seen a lot of NoSQL action lately built around distributed hash tables. Btrees are getting jealous. Btrees, once the king of the database world, want their throne back. Paul Buchheit surfaced a paper, A practical scalable distributed B-tree by Marcos K. Aguilera and Wojciech Golab, that might help spark a revolution. From the Abstract: We propose a new algorithm for a practical, fault tolerant, and scalable B-tree distributed over a set of servers. Our algorithm supports practical features not present in prior work: transactions that allow atomic execution of multiple operations over multiple B-trees, online migration of B-tree nodes between servers, and dynamic addition and removal of servers. Moreover, our algorithm is conceptually simple: we use transactions to manipulate B-tree nodes so that clients need not use complicated concurrency and locking protocols used in prior work. To execute these transactions quickly, we rely on three techniques:
(1) We use optimistic concurrency control, so that B-tree nodes are not locked during transaction execution, only during commit. (2) We keep replicas of inner nodes at clients; these replicas are lazy, and hence lightweight, and they are very helpful to reduce client-server communication while traversing the B-tree. (3) We replicate version numbers of inner nodes across servers, so that clients can validate their transactions efficiently, without creating bottlenecks at the root node and other upper levels in the tree.
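To make the combination of techniques concrete, here is a minimal single-process sketch, not the paper's actual protocol, of a client that traverses using a lazy local replica of nodes and validates the versions it observed before it is allowed to commit. All class and method names are made up for illustration.

```python
# Toy sketch of optimistic, version-validated B-tree transactions.
# "Server" stands in for a B-tree server; "Client" keeps a lazy replica.

class Server:
    """Maps node_id -> [version, payload]."""
    def __init__(self):
        self.nodes = {}

    def read(self, node_id):
        version, payload = self.nodes[node_id]
        return version, payload

    def validate(self, read_set):
        """Optimistic check: every (node_id, version) read must still be current."""
        return all(self.nodes[nid][0] == ver for nid, ver in read_set.items())

    def write(self, node_id, payload):
        version = self.nodes.get(node_id, [0, None])[0]
        self.nodes[node_id] = [version + 1, payload]     # bump version on write


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}       # lazy replica: fetched on demand, may go stale
        self.read_set = {}    # versions observed by the current transaction

    def read(self, node_id):
        if node_id not in self.cache:                    # lazy: fetch only if missing
            self.cache[node_id] = self.server.read(node_id)
        version, payload = self.cache[node_id]
        self.read_set[node_id] = version
        return payload

    def commit(self, writes):
        """No locks during execution: validate read versions, then apply writes."""
        if not self.server.validate(self.read_set):
            self.cache.clear()                           # replica was stale; retry later
            self.read_set.clear()
            return False
        for node_id, payload in writes.items():
            self.server.write(node_id, payload)
        self.read_set.clear()
        return True
```

A transaction that read a node whose version has since changed fails validation and commits nothing, which is the whole point of deferring locking to commit time.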
Distributed hash tables are scalable because records are easily distributed across a cluster, which gives the golden ability to perform many writes in parallel. But a lot of the time you want to iterate through records, or search records, in sorted order. Sorted could mean time-stamp order, for example, or last-name order. Access to data in sorted order is what B-trees are for.
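The contrast is easy to see in a few lines. A hash table answers point lookups but has no useful iteration order, while keys kept in sorted order support range scans ("everything between K1 and K2"). The keys below are made up for illustration.

```python
# Point lookup vs. range scan: why sorted access matters.
import bisect

records = {"2009-03-10": "paxos post",
           "2009-08-08": "pnuts post",
           "2009-09-16": "btree paper"}

# Point lookup: what hash-based stores are great at.
assert records["2009-08-08"] == "pnuts post"

# Range scan by timestamp: requires the keys in sorted order,
# which a B-tree maintains for free but a hash table does not.
keys = sorted(records)
lo = bisect.bisect_left(keys, "2009-04-01")
hi = bisect.bisect_right(keys, "2009-09-01")
assert keys[lo:hi] == ["2009-08-08"]
```

In a distributed hash table the `sorted(records)` step has no cheap equivalent, which is exactly the gap this paper addresses.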
But we simply haven't seen distributed B-tree systems develop. Instead, you would have to use some sort of map-reduce mechanism to efficiently scan all the records, or you would have to maintain the information in some other way. This paper points the way to doing some really cool things at a system level:

- It's distributed, so it can scale dynamically in size and handle writes in parallel.
- It supports adding and dropping servers dynamically, which is an essential requirement for architectures based on elastic cloud infrastructures.
- Data can be migrated to other nodes, which is essential for maintenance.
- Multiple records can be involved in transactions, which is essential for the complex data manipulations that happen in real systems. This is accomplished via a version-number mechanism that looks something like MVCC.
- Optimistic concurrency, that is, the ability to change data without explicit locking, makes the job a lot easier for programmers.
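The last two points can be sketched together: an atomic transaction spanning records in two different trees, committed all-or-nothing by checking version numbers first, with no locks held while the client does its work. This is a hypothetical illustration of the MVCC-like idea, not the paper's algorithm; the table and key names are invented.

```python
# Atomic multi-record commit across two "B-trees" via version validation.
trees = {
    "users":  {"alice": [1, {"balance": 10}]},   # key -> [version, value]
    "orders": {"o-1":   [1, {"owner": "alice"}]},
}

def commit(reads, writes):
    """reads: {(tree, key): version seen}; writes: {(tree, key): new value}.
    Apply every write only if every read version is still current."""
    for (tree, key), version in reads.items():
        if trees[tree][key][0] != version:
            return False                          # conflict: abort, write nothing
    for (tree, key), value in writes.items():
        current = trees[tree].get(key, [0, None])[0]
        trees[tree][key] = [current + 1, value]   # bump version, MVCC-style
    return True

# A transfer-style transaction touching both trees atomically.
reads = {("users", "alice"): 1, ("orders", "o-1"): 1}
writes = {("users", "alice"): {"balance": 5},
          ("orders", "o-1"): {"owner": "alice", "paid": True}}
assert commit(reads, writes) is True

# Replaying it with the now-stale versions fails cleanly.
assert commit(reads, writes) is False
```

Either every record in the transaction moves to its new version or none does, which is what makes multi-record operations safe without explicit locking on the client side.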