Paper: CRDTs: Consistency without concurrency control (High Scalability, December 23, 2010)
Introduction: For a great Christmas read, forget The Night Before Christmas, the heartwarming poem Clement Moore wrote for his children that created the modern idea of Santa Claus we all know and anticipate each Christmas Eve. Instead, curl up with some potent eggnog (nog being any drink made with rum) and read CRDTs: Consistency without concurrency control by Mihai Letia, Nuno Preguiça, and Marc Shapiro, which talks about CRDTs (Commutative Replicated Data Types): data types whose operations commute when they are concurrent.

From the introduction, which also serves as a nice, concise overview of distributed consistency issues:

Shared read-only data is easy to scale by using well-understood replication techniques. However, sharing mutable data at a large scale is a difficult problem, because of the CAP impossibility result [5]. Two approaches dominate in practice. One ensures scalability by giving up consistency guarantees, for instance using the Last-Writer-Wins (LWW) approach [7]. The alternative guarantees consistency by serialising all updates, which does not scale beyond a small cluster [12].
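The Last-Writer-Wins approach mentioned above can be sketched as a register that tags each write with a timestamp and, on merge, keeps only the write with the largest tag. This is a minimal hypothetical illustration (not code from the paper); the class and names are invented for the sketch:

```python
# Minimal sketch of a Last-Writer-Wins (LWW) register. Each replica
# tags writes with a (timestamp, replica_id) pair; merge keeps the
# largest tag, so concurrent writes are resolved by silently
# discarding all but one -- this is how LWW trades consistency
# guarantees for scalability.

class LWWRegister:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.value = None
        self.tag = (0, replica_id)  # replica_id breaks timestamp ties

    def write(self, value, timestamp):
        self.tag = (timestamp, self.replica_id)
        self.value = value

    def merge(self, other):
        # Keep whichever write carries the larger tag; the "losing"
        # concurrent write is simply lost.
        if other.tag > self.tag:
            self.tag, self.value = other.tag, other.value

a, b = LWWRegister("a"), LWWRegister("b")
a.write("x", 1)
b.write("y", 1)          # concurrent with a's write (same timestamp)
a.merge(b); b.merge(a)   # the replicas converge...
assert a.value == b.value == "y"   # ...but the write of "x" is discarded
```

Note that convergence here is bought by dropping one of the concurrent updates, which is exactly the loss of consistency guarantees the paper contrasts CRDTs against.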
Optimistic replication allows replicas to diverge, eventually resolving conflicts either by LWW-like methods or by serialisation [11]. If concurrent updates to some datum commute, and all of its replicas execute all updates in causal order, then the replicas converge. The CRDT approach ensures that there are no conflicts, hence no need for consensus-based concurrency control. This new research direction is promising as it ensures consistency in the large scale at a low cost, at least for some applications.
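The convergence claim above is easy to demonstrate with the simplest commutative operation there is. In this hypothetical sketch (not from the paper), two replicas of a counter apply the same increments in different orders and still reach the same state:

```python
# Why commuting updates converge: increments to a counter commute,
# so replicas that apply the same set of operations in different
# delivery orders end up in the same state. Hypothetical sketch,
# not code from the paper.

class OpCounter:
    def __init__(self):
        self.value = 0

    def apply(self, delta):
        self.value += delta  # addition commutes and is associative

r1, r2 = OpCounter(), OpCounter()
ops = [+1, +5, -2]

for op in ops:            # replica 1 applies the ops in one order
    r1.apply(op)
for op in reversed(ops):  # replica 2 receives them in another order
    r2.apply(op)

assert r1.value == r2.value == 4  # same operation set => same state
```

No conflict resolution or consensus round is needed; the data type's algebra does the work.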
A simple example is a set to which elements are only ever added, since concurrent additions commute; a delete-element operation can be emulated by adding "deleted" elements to a second set.
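The "second set" trick is commonly known as a two-phase set. The sketch below is a hypothetical illustration of the idea, not the paper's code: both adds and deletes become set insertions, and merging replicas is just set union, which commutes:

```python
# Two-phase set: removal is emulated by adding the element to a
# tombstone set. Adds and removes are then both set insertions, and
# merge is set union -- commutative, associative, and idempotent --
# so replicas can merge in any order. Hypothetical sketch.

class TwoPhaseSet:
    def __init__(self):
        self.added = set()
        self.removed = set()  # the second, "deleted" set (tombstones)

    def add(self, e):
        self.added.add(e)

    def remove(self, e):
        self.removed.add(e)   # nothing is ever physically deleted

    def contains(self, e):
        return e in self.added and e not in self.removed

    def merge(self, other):
        self.added |= other.added
        self.removed |= other.removed

a, b = TwoPhaseSet(), TwoPhaseSet()
a.add("x")
b.add("x"); b.remove("x")   # another replica adds then deletes "x"
a.merge(b); b.merge(a)
assert not a.contains("x") and not b.contains("x")
```

The price of this construction is that a removed element can never be re-added, and tombstones accumulate; that is part of why the paper moves on to more interesting designs.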
A more interesting example is WOOT, a CRDT for concurrent editing [9], pioneering but inefficient, and its successor Logoot [13]. As an existence proof of a non-trivial, useful, practical and efficient CRDT, we exhibit one that implements an ordered set with insert-at-position and delete operations. It is called Treedoc, because sequence elements are identified compactly using a naming tree, and because its first use was concurrent document editing [10]. Its design presents original solutions to scalability issues, namely restructuring the tree without violating commutativity, supporting very large and variable numbers of writable replicas, and leveraging the data structure to ensure causal ordering without vector clocks.
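The core of the naming-tree idea can be shown in a toy form. This is a deliberately simplified sketch of Treedoc-style position identifiers, not the paper's algorithm: a position is a path of bits in a binary tree, document order is the infix traversal of that tree, and a fresh position can always be minted between any two existing ones:

```python
# Toy sketch of tree-path position identifiers (a hypothetical
# simplification of Treedoc's naming tree). A position is a tuple of
# bits; appending 0.5 as a sentinel makes infix (document) order
# coincide with plain tuple comparison: left child < node < right child.

def key(path):
    return path + (0.5,)

def between(p, q):
    """Mint a fresh path strictly between adjacent positions p < q."""
    if q[:len(p) + 1] == p + (1,):   # q lies in p's right subtree,
        return q + (0,)              # so go left under q instead
    return p + (1,)                  # otherwise go right under p

# Build a two-element document, then insert a new element between them.
a, b = (0,), (1,)
x = between(a, b)                    # -> (0, 1)
doc = sorted([a, b, x], key=key)
assert doc == [a, x, b]              # identifiers sort into document order
```

Real Treedoc additionally attaches per-replica disambiguators so that concurrent inserts at the same position commute, and it rebalances the tree when paths grow long, which is where the paper's original contributions lie.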
While the advantages of commutativity are well documented, we are the first (to our knowledge) to address the design of CRDTs. In future work, we plan to explore what other interesting CRDTs may exist, and what are the theoretical and practical requirements for CRDTs.
Related Articles

Replication: Optimistic approaches by Yasushi Saito and Marc Shapiro.
Designing a commutative replicated data type by Marc Shapiro and Nuno Preguiça.