high_scalability high_scalability-2008 high_scalability-2008-280 knowledge-graph by maker-knowledge-mining

280 high scalability-2008-03-17-Paper: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web


meta info for this blog

Source: html

Introduction: Consistent hashing is one of those ideas that really puts the science in computer science and reminds us why all those really smart people spend years slaving over algorithms. Consistent hashing is "a scheme that provides hash table functionality in a way that the addition or removal of one slot does not significantly change the mapping of keys to slots" and was originally a way of distributing requests among a changing population of web servers. My first reaction to the idea was "wow, that's really smart" and I sadly realized I would never come up with something so elegant. I then immediately saw applications for it everywhere. And consistent hashing is used everywhere: distributed hash tables, overlay networks, P2P, IM, caching, and CDNs. Here's the abstract from the original paper and after the abstract are some links to a few very good articles with accessible explanations of consistent hashing and its applications in the real world. Abstract: We describe a family of caching


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Consistent hashing is one of those ideas that really puts the science in computer science and reminds us why all those really smart people spend years slaving over algorithms. [sent-1, score-0.687]

2 Consistent hashing is "a scheme that provides hash table functionality in a way that the addition or removal of one slot does not significantly change the mapping of keys to slots" and was originally a way of distributing requests among a changing population of web servers. [sent-2, score-0.842]

3 My first reaction to the idea was "wow, that's really smart" and I sadly realized I would never come up with something so elegant. [sent-3, score-0.137]

4 And consistent hashing is used everywhere: distributed hash tables, overlay networks, P2P, IM, caching, and CDNs. [sent-5, score-1.209]

5 Here's the abstract from the original paper and after the abstract are some links to a few very good articles with accessible explanations of consistent hashing and its applications in the real world. [sent-6, score-1.11]

6 Abstract: We describe a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network. [sent-7, score-1.139]

7 Our protocols are particularly designed for use with very large networks such as the Internet, where delays caused by hot spots can be severe, and where it is not feasible for every server to have complete information about the current state of the entire network. [sent-8, score-0.791]

8 The protocols are easy to implement using existing network protocols such as TCP/IP, and require very little overhead. [sent-9, score-0.634]

9 The protocols work with local control, make efficient use of existing resources, and scale gracefully as the network grows. [sent-10, score-0.38]

10 Our caching protocols are based on a special kind of hashing that we call consistent hashing. [sent-11, score-1.275]

11 Roughly speaking, a consistent hash function is one which changes minimally as the range of the function changes. [sent-12, score-0.889]

12 Through the development of good consistent hash functions, we are able to develop caching protocols which do not require users to have a current or even consistent view of the network. [sent-13, score-1.537]

13 We believe that consistent hash functions may eventually prove to be useful in other applications such as distributed name servers and/or quorum systems. [sent-14, score-1.025]

14 Other excellent resources for learning more about consistent hashing: the original paper, Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web, and Consistent Hashing by Tom White. [sent-15, score-0.812]

15 A good explanation and some actual Java code as an implementation. [sent-16, score-0.076]

16 Another good explanation with an emphasis on useful applications: load distribution on failure, load tuning by capacity, method for bringing servers on line, redundant caching to protect the database in case of failure. [sent-18, score-0.353]
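One of those applications, load tuning by capacity, is often implemented with virtual nodes: each server gets a number of points on the ring proportional to its capacity, so larger servers own a proportionally larger share of keys. A sketch under that assumption (the names and the 40-points-per-weight constant are illustrative):

```python
import bisect
import hashlib

def _hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(capacities: dict) -> list:
    # capacities maps server -> relative weight; heavier servers get more
    # virtual points on the ring and therefore a larger share of keys.
    ring = []
    for server, weight in capacities.items():
        for i in range(weight * 40):  # 40 virtual points per unit of weight
            bisect.insort(ring, (_hash(f"{server}#{i}"), server))
    return ring

def lookup(ring: list, key: str) -> str:
    i = bisect.bisect(ring, (_hash(key), ""))
    return ring[i % len(ring)][1]

ring = build_ring({"big": 3, "small": 1})  # "big" has 3x the capacity
counts = {"big": 0, "small": 0}
for i in range(10_000):
    counts[lookup(ring, f"obj-{i}")] += 1
print(counts)  # load splits roughly 3:1 in favor of the larger server
```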

17 Distributed Hash Tables : an infrastructure that can be used to build more complex services, such as distributed file systems, peer-to-peer file sharing and content distribution systems, cooperative web caching, multicast, anycast, domain name services, and instant messaging. [sent-19, score-0.362]

18 Notable distributed networks that use DHTs include BitTorrent (with extensions), eDonkey network, YaCy, and the Coral Content Distribution Network. [sent-20, score-0.201]

19 It allows a distributed set of participants to agree on a single node as a rendezvous point for a given key, without any central coordination. [sent-22, score-0.166]
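That rendezvous property is worth making concrete: any two participants who know the same membership set compute the same owner for a key, with no coordinator involved, even if they learned the members in different orders. An illustrative sketch, not code from any real DHT:

```python
import bisect
import hashlib

def owner(members, key: str) -> str:
    # Each participant builds the ring locally from its membership view;
    # sorting makes the result independent of the order members were learned.
    h = lambda s: int(hashlib.md5(s.encode()).hexdigest(), 16)
    ring = sorted((h(m), m) for m in members)
    i = bisect.bisect(ring, (h(key), ""))
    return ring[i % len(ring)][1]

peer1_view = ["n1", "n2", "n3", "n4"]
peer2_view = ["n4", "n2", "n1", "n3"]  # same members, different order
# Both peers independently agree on the rendezvous node for any key.
assert owner(peer1_view, "some-key") == owner(peer2_view, "some-key")
```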

20 Dynamo, Amazon's database, uses consistent hashing. [sent-23, score-0.415]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('consistent', 0.415), ('hashing', 0.397), ('protocols', 0.317), ('hash', 0.244), ('spots', 0.188), ('hashingby', 0.179), ('caching', 0.146), ('distribution', 0.131), ('tom', 0.129), ('hot', 0.11), ('networks', 0.105), ('family', 0.104), ('distributed', 0.096), ('coral', 0.09), ('abstract', 0.088), ('relieving', 0.084), ('minimally', 0.08), ('slot', 0.08), ('science', 0.078), ('anycast', 0.077), ('dhts', 0.077), ('sadly', 0.077), ('explanation', 0.076), ('function', 0.075), ('occurrence', 0.073), ('slots', 0.073), ('functions', 0.071), ('feasible', 0.071), ('quorum', 0.071), ('severe', 0.071), ('toolbox', 0.071), ('participants', 0.07), ('wow', 0.07), ('tables', 0.069), ('name', 0.068), ('bittorrent', 0.067), ('cooperative', 0.067), ('reminds', 0.067), ('smart', 0.067), ('multicast', 0.064), ('extensions', 0.063), ('gracefully', 0.063), ('notable', 0.063), ('explanations', 0.062), ('removal', 0.061), ('reaction', 0.06), ('applications', 0.06), ('population', 0.06), ('im', 0.059), ('overlay', 0.057)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 280 high scalability-2008-03-17-Paper: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web


2 0.35645604 892 high scalability-2010-09-02-Distributed Hashing Algorithms by Example: Consistent Hashing

Introduction: Consistent Hashing is a specific implementation of hashing that is well suited for many of today’s web-scale load balancing problems. Specifically, it can be seen in use in various caching solutions like Memcached and is applicable to NoSQL solutions as well. Consistent Hashing is used particularly because it provides a solution for the typical “hashcode mod n” method of distributing keys across a series of servers. It does this by allowing servers to be added or removed without significantly upsetting the distribution of keys, nor does it require that all keys be rehashed to accommodate the change in the number of servers. You can read the full story here.
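The "hashcode mod n" problem mentioned there is easy to demonstrate: with modular placement, growing the pool from n to n+1 servers remaps nearly all keys, which is exactly the churn consistent hashing avoids. A small illustrative sketch:

```python
import hashlib

def mod_n_server(key: str, n: int) -> int:
    # The naive placement scheme: hash the key, take it mod the pool size.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"user-{i}" for i in range(10_000)]
before = {k: mod_n_server(k, 4) for k in keys}  # 4 servers...
after = {k: mod_n_server(k, 5) for k in keys}   # ...then one is added
moved = sum(1 for k in keys if before[k] != after[k])
# Roughly n/(n+1) of the keys (about 80% here) land on a different server;
# consistent hashing would move only about 1/(n+1) of them.
print(f"{moved / len(keys):.0%} of keys remapped")
```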

3 0.16569948 589 high scalability-2009-05-05-Drop ACID and Think About Data

Introduction: The abstract for the talk given by Bob Ippolito, co-founder and CTO of Mochi Media, Inc: Building large systems on top of a traditional single-master RDBMS data storage layer is no longer good enough. This talk explores the landscape of new technologies available today to augment your data layer to improve performance and reliability. Is your application a good fit for caches, bloom filters, bitmap indexes, column stores, distributed key/value stores, or document databases? Learn how they work (in theory and practice) and decide for yourself. Bob does an excellent job highlighting different products and the key concepts to understand when pondering the wide variety of new database offerings. It's unlikely you'll be able to say oh, this is the database for me after watching the presentation, but you will be much better informed on your options. And I imagine slightly confused as to what to do :-) An interesting observation in the talk is that the more robust products are internal

4 0.16185749 889 high scalability-2010-08-30-Pomegranate - Storing Billions and Billions of Tiny Little Files

Introduction: Pomegranate is a novel distributed file system built over distributed tabular storage that acts an awful lot like a NoSQL system. It's targeted at increasing the performance of tiny object access in order to support applications like online photo and micro-blog services, which require high concurrency, high throughput, and low latency. Their tests seem to indicate it works: We have demonstrated that a file system over tabular storage performs well for highly concurrent access. In our test cluster, we observed linearly increasing performance of more than 100,000 aggregate read and write requests served per second (RPS). Rather than sitting atop the file system like almost every other K-V store, Pomegranate is baked into the file system. The idea is that the file system API is common to every platform so it wouldn't require a separate API to use. Every application could use it out of the box. The features of Pomegranate are: It handles billions of small files efficiently, even in on

5 0.15151589 19 high scalability-2007-07-16-Paper: Replication Under Scalable Hashing

Introduction: Replication Under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution From the abstract: Typical algorithms for decentralized data distribution work best in a system that is fully built before it is first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (Replication Under Scalable Hashing), that maps replicated objects to a scalable collection of storage servers or disks. RUSH algorithms distribute objects to servers according to user-specified server weighting. While all RUSH variants support addition of servers to the system, different variants have different characteristics with respect to lookup time in petabyte-scale systems, performance with mirroring (as opposed to redundancy codes), and storage server removal. All RUSH variants redistribute as few objects as possible when new servers are added or existing servers

6 0.12605447 1205 high scalability-2012-03-07-Scale Indefinitely on S3 With These Secrets of the S3 Masters

7 0.11994835 651 high scalability-2009-07-02-Product: Project Voldemort - A Distributed Database

8 0.11945918 1334 high scalability-2012-10-04-Stuff The Internet Says On Scalability For October 5, 2012

9 0.11042538 315 high scalability-2008-05-05-HSCALE - Handling 200 Million Transactions Per Month Using Transparent Partitioning With MySQL Proxy

10 0.10830554 1142 high scalability-2011-11-14-Using Gossip Protocols for Failure Detection, Monitoring, Messaging and Other Good Things

11 0.10012133 1146 high scalability-2011-11-23-Paper: Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS

12 0.099210858 787 high scalability-2010-03-03-Hot Scalability Links for March 3, 2010

13 0.099203609 1063 high scalability-2011-06-17-Stuff The Internet Says On Scalability For June 17, 2011

14 0.098510452 883 high scalability-2010-08-20-Hot Scalability Links For Aug 20, 2010

15 0.097852699 468 high scalability-2008-12-17-Ringo - Distributed key-value storage for immutable data

16 0.096480846 973 high scalability-2011-01-14-Stuff The Internet Says On Scalability For January 14, 2011

17 0.096388973 1017 high scalability-2011-04-06-Netflix: Run Consistency Checkers All the time to Fixup Transactions

18 0.095913641 1022 high scalability-2011-04-13-Paper: NoSQL Databases - NoSQL Introduction and Overview

19 0.093729451 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?

20 0.093654372 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.16), (1, 0.072), (2, -0.021), (3, 0.024), (4, 0.004), (5, 0.052), (6, 0.021), (7, -0.015), (8, -0.079), (9, 0.055), (10, 0.022), (11, 0.018), (12, -0.083), (13, -0.03), (14, 0.006), (15, 0.026), (16, 0.046), (17, 0.01), (18, 0.022), (19, -0.088), (20, -0.001), (21, 0.113), (22, -0.029), (23, 0.061), (24, -0.066), (25, -0.031), (26, 0.038), (27, 0.029), (28, 0.016), (29, -0.051), (30, -0.02), (31, -0.03), (32, -0.026), (33, -0.007), (34, -0.017), (35, -0.05), (36, -0.01), (37, -0.017), (38, 0.034), (39, -0.044), (40, 0.003), (41, 0.03), (42, 0.04), (43, -0.022), (44, -0.048), (45, 0.062), (46, -0.017), (47, 0.044), (48, 0.008), (49, 0.062)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96579409 280 high scalability-2008-03-17-Paper: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web


2 0.72891897 705 high scalability-2009-09-16-Paper: A practical scalable distributed B-tree

Introduction: We've seen a lot of NoSQL action lately built around distributed hash tables. Btrees are getting jealous. Btrees, once the king of the database world, want their throne back. Paul Buchheit surfaced a paper: A practical scalable distributed B-tree by Marcos K. Aguilera and Wojciech Golab, that might help spark a revolution. From the Abstract: We propose a new algorithm for a practical, fault tolerant, and scalable B-tree distributed over a set of servers. Our algorithm supports practical features not present in prior work: transactions that allow atomic execution of multiple operations over multiple B-trees, online migration of B-tree nodes between servers, and dynamic addition and removal of servers. Moreover, our algorithm is conceptually simple: we use transactions to manipulate B-tree nodes so that clients need not use complicated concurrency and locking protocols used in prior work. To execute these transactions quickly, we rely on three techniques: (1) We use optimistic

3 0.72736925 529 high scalability-2009-03-10-Paper: Consensus Protocols: Paxos

Introduction: Update:  Barbara Liskov’s Turing Award, and Byzantine Fault Tolerance . Henry Robinson has created an excellent series of articles on consensus protocols. We already covered his 2 Phase Commit article and he also has a 3 Phase Commit article showing how to handle 2PC under single node failures. But that is not enough! 3PC works well under node failures, but fails for network failures. So another consensus mechanism is needed that handles both network and node failures. And that's Paxos . Paxos correctly handles both types of failures, but it does this by becoming inaccessible if too many components fail. This is the "liveness" property of protocols. Paxos waits until the faults are fixed. Read queries can be handled, but updates will be blocked until the protocol thinks it can make forward progress. The liveness of Paxos is primarily dependent on network stability. In a distributed heterogeneous environment you are at risk of losing the ability to make updates. Users hate t

4 0.72622234 1611 high scalability-2014-03-12-Paper: Scalable Eventually Consistent Counters over Unreliable Networks

Introduction: Counting at scale in a distributed environment is surprisingly hard. And it's a subject we've covered before in various ways: Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory, How to update video views count effectively?, Numbers Everyone Should Know (sharded counters). Kellabyte (which is an excellent blog) in Scalable Eventually Consistent Counters talks about how the Cassandra counter implementation scores well on the scalability and high availability front, but in so doing has an "over and under counting problem in partitioned environments." Which is often fine. But if you want more accuracy there's a PN-counter, a CRDT (convergent replicated data type) that stores "all the changes made to a counter on each node rather than storing and modifying a single value so that you can merge all the values into the proper final value. Of course the trade-off here is additional storage and processing but there are ways to optimize this."
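The PN-counter described there is small enough to sketch. This is an illustrative implementation of the general CRDT idea, not Cassandra's code: each replica tracks per-node increment (P) and decrement (N) totals, merges by elementwise max, and reads the value as sum(P) - sum(N).

```python
class PNCounter:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.p = {}  # node -> total increments originated at that node
        self.n = {}  # node -> total decrements originated at that node

    def incr(self, amount: int = 1) -> None:
        self.p[self.node_id] = self.p.get(self.node_id, 0) + amount

    def decr(self, amount: int = 1) -> None:
        self.n[self.node_id] = self.n.get(self.node_id, 0) + amount

    def merge(self, other: "PNCounter") -> None:
        # Elementwise max is commutative, associative, and idempotent,
        # so replicas converge no matter how merges are ordered or repeated.
        for node, v in other.p.items():
            self.p[node] = max(self.p.get(node, 0), v)
        for node, v in other.n.items():
            self.n[node] = max(self.n.get(node, 0), v)

    def value(self) -> int:
        return sum(self.p.values()) - sum(self.n.values())

a, b = PNCounter("a"), PNCounter("b")
a.incr(5); b.incr(3); b.decr(1)
a.merge(b); b.merge(a)  # replicas exchange state in either order
assert a.value() == b.value() == 7
```

Because the merge is idempotent, gossiping the same state twice does no harm, which is what makes the counter safe in partitioned environments.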

5 0.70429397 676 high scalability-2009-08-08-Yahoo!'s PNUTS Database: Too Hot, Too Cold or Just Right?

Introduction: So far every massively scalable database is a bundle of compromises. For some the weak guarantees of Amazon's eventual consistency model are too cold. For many the strong guarantees of standard RDBMS distributed transactions are too hot. Google App Engine tries to get it just right with entity groups. Yahoo! is also trying to get it just right by offering per-record timeline consistency, which hopes to serve up a heaping bowl of rich database functionality and low latency at massive scale: We describe PNUTS [Platform for Nimble Universal Table Storage], a massively parallel and geographically distributed database system for Yahoo!’s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees. It is a hosted, centrally managed, and geographically distributed service, and utilizes automated load-balancing and failover to redu

6 0.68580782 1142 high scalability-2011-11-14-Using Gossip Protocols for Failure Detection, Monitoring, Messaging and Other Good Things

7 0.67406124 1146 high scalability-2011-11-23-Paper: Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS

8 0.67396069 892 high scalability-2010-09-02-Distributed Hashing Algorithms by Example: Consistent Hashing

9 0.67113018 890 high scalability-2010-09-01-Paper: The Case for Determinism in Database Systems

10 0.66477799 1629 high scalability-2014-04-10-Paper: Scalable Atomic Visibility with RAMP Transactions - Scale Linearly to 100 Servers

11 0.66437292 889 high scalability-2010-08-30-Pomegranate - Storing Billions and Billions of Tiny Little Files

12 0.66350585 1463 high scalability-2013-05-23-Paper: Calvin: Fast Distributed Transactions for Partitioned Database Systems

13 0.66233027 122 high scalability-2007-10-14-Product: The Spread Toolkit

14 0.66214222 1374 high scalability-2012-12-18-Georeplication: When Bad Things Happen to Good Systems

15 0.65741324 507 high scalability-2009-02-03-Paper: Optimistic Replication

16 0.65507042 1459 high scalability-2013-05-16-Paper: Warp: Multi-Key Transactions for Key-Value Stores

17 0.6531055 979 high scalability-2011-01-27-Comet - An Example of the New Key-Code Databases

18 0.64269823 1236 high scalability-2012-04-30-Masstree - Much Faster than MongoDB, VoltDB, Redis, and Competitive with Memcached

19 0.63539743 983 high scalability-2011-02-02-Piccolo - Building Distributed Programs that are 11x Faster than Hadoop

20 0.63367015 497 high scalability-2009-01-19-Papers: Readings in Distributed Systems


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.184), (2, 0.202), (10, 0.015), (40, 0.315), (47, 0.011), (56, 0.013), (61, 0.109), (79, 0.051)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.94167167 860 high scalability-2010-07-17-Hot Scalability Links for July 17, 2010

Introduction: And by hot I also mean temperature. Summer has arrived. It's sizzling here in Silicon Valley. Thank you air conditioning! Scale the web by appointing a Crawler Czar? Tom Foremski has the idea that Google should open up their index so sites wouldn't have to endure the constant pounding by ravenous crawler bots. Don MacAskill of SmugMug estimates 50% of our web server CPU resources are spent serving crawlers. What a waste. How this would all work with real-time feeds, paid  feeds (Twitter, movies, ...), etc. is unknown, but does it make sense for all that money to be spent on extracting the same data over and over again? Tweets of Gold: jamesurquhart : Key to applications is architecture. Key for infrastructure supporting archs is configurability. Configurability==features . tjake :   People who choose their datastore based oh hearsay and not their own evaluation are doomed . b6n : No global lock ever goes unpunished. MichaelSurt

2 0.93582821 1419 high scalability-2013-03-07-It's a VM Wasteland - A Near Optimal Packing of VMs to Machines Reduces TCO by 22%

Introduction: In Algorithm Design for Performance Aware VM Consolidation we learn some shocking facts (gambling in Casablanca?): Average server utilization in many data centers is low, estimated between 5% and 15%. This is wasteful because an idle server often consumes more than 50% of peak power. Surely that's just for old style datacenters? Nope. In Google data centers, workloads that are consolidated use only 50% of the processor cores. Every other processor core is left unused simply to ensure that performance does not degrade. It's a VM wasteland. The goal is to reduce waste by packing VMs onto machines without hurting performance or wasting resources. The idea is to select VMs that interfere the least with each other and place them together on the same server. It's an NP-Complete problem, but this paper describes a practical method that performs provably close to the optimal. Interestingly they can optimize for performance or power efficiency, so you can use different algorithm
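The consolidation task described above is, at its core, bin packing. The paper's algorithm also models interference between VMs, but the classic first-fit-decreasing heuristic conveys the flavor of near-optimal packing; this sketch and its numbers are illustrative, not the paper's method:

```python
def first_fit_decreasing(demands, capacity):
    # Sort VM demands largest-first, then place each on the first machine
    # with enough remaining capacity, opening a new machine when none fits.
    machines = []  # each machine is a list of the demands packed onto it
    free = []      # remaining capacity per machine
    for d in sorted(demands, reverse=True):
        for i, f in enumerate(free):
            if d <= f:
                machines[i].append(d)
                free[i] -= d
                break
        else:  # no existing machine fits: open a new one
            machines.append([d])
            free.append(capacity - d)
    return machines

vms = [0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5]  # CPU demand as fraction of a host
packing = first_fit_decreasing(vms, 1.0)
print(packing)
```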

3 0.93432933 27 high scalability-2007-07-25-Product: 3 PAR REMOTE COPY

Introduction: 3PAR Remote Copy is a uniquely simple and efficient replication technology that allows customers to protect and share any application data affordably. Built upon 3PAR Thin Copy technology, Remote Copy lowers the total cost of storage by addressing the cost and complexity of remote replication. Common Uses of 3PAR Remote Copy: Affordable Disaster Recovery: Mirror data cost-effectively across town or across the world. Centralized Archive: Replicate data from multiple 3PAR InServs located in multiple data centers to a centralized data archive location. Resilient Pod Architecture: Mutually replicate tier 1 or 2 data to tier 3 capacity between two InServs (application pods). Remote Data Access: Replicate data to a remote location for sharing of data with remote users.

4 0.92486399 482 high scalability-2009-01-04-Alternative Memcache Usage: A Highly Scalable, Highly Available, In-Memory Shard Index

Introduction: While working with Memcache the other night, it dawned on me that its usage as a distributed caching mechanism was really just one of many ways to use it. That there are in fact many alternative usages one could find for Memcache, if they could just realize what Memcache really is at its core – a simple distributed hash-table – is an important point worthy of further discussion. To be clear, when I say “simple”, by no means am I implying that Memcache’s implementation is simple, just that the ideas behind it are such. Think about that for a minute. What else could we use a simple distributed hash-table for, besides caching? How about using it as an alternative to the traditional shard lookup method we used in our Master Index Lookup scalability strategy, discussed previously here.

5 0.91756505 402 high scalability-2008-10-05-Paper: Scalability Design Patterns

Introduction: I have introduced pattern languages in my earlier post on The Pattern Bible for Distributed Computing. Achieving the highest possible scalability is a complex combination of many factors. This PLoP 2007 paper presents a pattern language that can be used to make a system highly scalable. The Scalability Pattern Language introduced by Kanwardeep Singh Ahluwalia includes patterns to: Introduce Scalability, Optimize Algorithm, Add Hardware, Add Parallelism, Add Intra-Process Parallelism, Add Inter-Process Parallelism, Add Hybrid Parallelism, Optimize Decentralization, Control Shared Resources, Automate Scalability

6 0.91219944 1466 high scalability-2013-05-29-Amazon: Creating a Customer Utopia One Culture Hack at a Time

same-blog 7 0.90128672 280 high scalability-2008-03-17-Paper: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web

8 0.90104252 1414 high scalability-2013-03-01-Stuff The Internet Says On Scalability For February 29, 2013

9 0.90029657 848 high scalability-2010-06-25-Hot Scalability Links for June 25, 2010

10 0.88411289 778 high scalability-2010-02-15-The Amazing Collective Compute Power of the Ambient Cloud

11 0.87958544 330 high scalability-2008-05-27-Should Twitter be an All-You-Can-Eat Buffet or a Vending Machine?

12 0.8769471 1471 high scalability-2013-06-06-Paper: Memory Barriers: a Hardware View for Software Hackers

13 0.86945575 300 high scalability-2008-04-07-Scalr - Open Source Auto-scaling Hosting on Amazon EC2

14 0.86464608 338 high scalability-2008-06-02-Total Cost of Ownership for different web development frameworks

15 0.85037494 757 high scalability-2010-01-04-11 Strategies to Rock Your Startup’s Scalability in 2010

16 0.83297688 768 high scalability-2010-02-01-What Will Kill the Cloud?

17 0.8265512 879 high scalability-2010-08-12-Think of Latency as a Pseudo-permanent Network Partition

18 0.81167001 1492 high scalability-2013-07-17-How do you create a 100th Monkey software development culture?

19 0.80384868 97 high scalability-2007-09-18-Session management in highly scalable web sites

20 0.77835107 564 high scalability-2009-04-10-counting # of views, calculating most-least viewed