high_scalability high_scalability-2007 high_scalability-2007-19 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Replication Under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution. From the abstract: Typical algorithms for decentralized data distribution work best in a system that is fully built before it is first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (Replication Under Scalable Hashing), that maps replicated objects to a scalable collection of storage servers or disks. RUSH algorithms distribute objects to servers according to user-specified server weighting. While all RUSH variants support addition of servers to the system, different variants have different characteristics with respect to lookup time in petabyte-scale systems, performance with mirroring (as opposed to redundancy codes), and storage server removal. All RUSH variants redistribute as few objects as possible when new servers are added or existing servers
sentIndex sentText sentNum sentScore
1 We have developed a family of decentralized algorithms, RUSH (Replication Under Scalable Hashing), that maps replicated objects to a scalable collection of storage servers or disks. [sent-2, score-1.205]
2 RUSH algorithms distribute objects to servers according to user-specified server weighting. [sent-3, score-0.791]
3 While all RUSH variants support addition of servers to the system, different variants have different characteristics with respect to lookup time in petabyte-scale systems, performance with mirroring (as opposed to redundancy codes), and storage server removal. [sent-4, score-1.69]
4 All RUSH variants redistribute as few objects as possible when new servers are added or existing servers are removed, and all variants guarantee that no two replicas of a particular object are ever placed on the same server. [sent-5, score-1.834]
5 Because there is no central directory, clients can compute data locations in parallel, allowing thousands of clients to access objects on thousands of servers simultaneously. [sent-6, score-1.019]
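The properties listed in the sentences above (weighted placement, distinct servers per replica, minimal movement when a server is added, and lookup without a central directory) can be illustrated with a small sketch. Note that this is not the RUSH algorithm itself, whose details are not given here; it swaps in weighted rendezvous (highest-random-weight) hashing, a different and simpler technique that happens to share these properties. The names `unit_hash` and `place`, the server names, and the weights are all hypothetical.

```python
import hashlib
import math


def unit_hash(key: str, server: str) -> float:
    """Map a (server, key) pair to a float strictly inside (0, 1)."""
    digest = hashlib.sha256(f"{server}:{key}".encode()).digest()
    n = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (n + 0.5) / 2**52


def place(key: str, weights: dict[str, float], replicas: int) -> list[str]:
    """Return `replicas` distinct servers for `key`.

    Assumption: every weight is positive. Each client can run this
    locally from the same weight table, so no directory is consulted.
    """
    if replicas > len(weights):
        raise ValueError("need at least one server per replica")

    def score(server: str) -> float:
        # Weighted rendezvous score: larger weights tend to score higher.
        return -weights[server] / math.log(unit_hash(key, server))

    # Taking the top-scoring servers guarantees the replicas are distinct.
    return sorted(weights, key=score, reverse=True)[:replicas]


if __name__ == "__main__":
    servers = {"s1": 1.0, "s2": 1.0, "s3": 2.0}
    print(place("object-42", servers, replicas=2))
```

Because a server's score for a given key does not depend on which other servers exist, adding a server can only pull replicas onto the new server; it never shuffles replicas among the existing ones.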
wordName wordTfidf (topN-words)
[('rush', 0.435), ('variants', 0.427), ('objects', 0.266), ('decentralized', 0.229), ('family', 0.218), ('hashing', 0.207), ('algorithms', 0.192), ('thousandsof', 0.187), ('imbalance', 0.176), ('mirroring', 0.142), ('clients', 0.14), ('opposed', 0.135), ('object', 0.135), ('codes', 0.127), ('replication', 0.109), ('servers', 0.107), ('placed', 0.103), ('scalable', 0.102), ('removing', 0.102), ('removed', 0.1), ('lookup', 0.098), ('characteristics', 0.096), ('replicas', 0.096), ('locations', 0.094), ('abstract', 0.091), ('according', 0.089), ('redundancy', 0.087), ('distribute', 0.087), ('device', 0.079), ('replicated', 0.078), ('collection', 0.074), ('developed', 0.07), ('either', 0.067), ('typical', 0.065), ('particular', 0.064), ('storage', 0.061), ('addition', 0.06), ('compute', 0.06), ('fully', 0.058), ('adding', 0.058), ('thousands', 0.058), ('existing', 0.058), ('components', 0.057), ('data', 0.056), ('results', 0.054), ('server', 0.05), ('possible', 0.044), ('built', 0.042), ('access', 0.04), ('best', 0.038)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 19 high scalability-2007-07-16-Paper: Replication Under Scalable Hashing
2 0.18595898 973 high scalability-2011-01-14-Stuff The Internet Says On Scalability For January 14, 2011
Introduction: Submitted for your reading pleasure... On the new year Twitter set a record with 6,939 Tweets Per Second (TPS). Cool video visualizing New Year's Eve Tweet data across the world. Marko Rodriguez in Memoirs of a Graph Addict: Despair to Redemption tells a stirring tale of how graph programming saved the world from certain destruction by realizing Aristotle's dream of a eudaimonia-driven society. Could a relational database do that? The tools of the revolution can be found at tinkerprop.com , which describes a database-agnostic stack for working with property graphs. They include Blueprints - a property graph model interface; Pipes - a dataflow network using process graphs; Gremlin - a graph-based programming language; Rexster - a RESTful graph shell. The never-ending battle of good versus evil has nothing on programmers arguing about bracket policies or sync vs async programming models. In this node.js thread, I love async, but I can't code like this , the batt
Introduction: Consistent hashing is one of those ideas that really puts the science in computer science and reminds us why all those really smart people spend years slaving over algorithms. Consistent hashing is "a scheme that provides hash table functionality in a way that the addition or removal of one slot does not significantly change the mapping of keys to slots" and was originally a way of distributing requests among a changing population of web servers. My first reaction to the idea was "wow, that's really smart" and I sadly realized I would never come up with something so elegant. I then immediately saw applications for it everywhere. And consistent hashing is used everywhere: distributed hash tables, overlay networks, P2P, IM, caching, and CDNs. Here's the abstract from the original paper and after the abstract are some links to a few very good articles with accessible explanations of consistent hashing and its applications in the real world. Abstract: We describe a family of caching
4 0.12974919 892 high scalability-2010-09-02-Distributed Hashing Algorithms by Example: Consistent Hashing
Introduction: Consistent Hashing is a specific implementation of hashing that is well suited for many of today’s web-scale load balancing problems. Specifically, it can be seen in use in various caching solutions like Memcached and is applicable to NoSQL solutions as well. Consistent Hashing is used particularly because it addresses the weakness of the typical “hashcode mod n” method of distributing keys across a series of servers. It does this by allowing servers to be added or removed without significantly upsetting the distribution of keys, and it does not require that all keys be rehashed to accommodate the change in the number of servers. You can read the full story here .
5 0.12842001 125 high scalability-2007-10-18-another approach to replication
Introduction: File replication based on erasure codes can reduce the total size of replicas by a factor of two or more.
6 0.12310652 373 high scalability-2008-08-29-Product: ScaleOut StateServer is Memcached on Steroids
7 0.11482569 542 high scalability-2009-03-17-IBM WebSphere eXtreme Scale (IMDG)
8 0.10573222 831 high scalability-2010-05-26-End-To-End Performance Study of Cloud Services
9 0.090702131 1198 high scalability-2012-02-24-Stuff The Internet Says On Scalability For February 24, 2012
10 0.085939139 364 high scalability-2008-08-14-Product: Terracotta - Open Source Network-Attached Memory
11 0.085733779 1413 high scalability-2013-02-27-42 Monster Problems that Attack as Loads Increase
12 0.08379437 784 high scalability-2010-02-25-Paper: High Performance Scalable Data Stores
14 0.079406552 1483 high scalability-2013-06-27-Paper: XORing Elephants: Novel Erasure Codes for Big Data
15 0.076937392 589 high scalability-2009-05-05-Drop ACID and Think About Data
16 0.076216698 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
17 0.070600569 619 high scalability-2009-06-05-HotPads Shows the True Cost of Hosting on Amazon
18 0.070397094 1521 high scalability-2013-09-23-Salesforce Architecture - How they Handle 1.3 Billion Transactions a Day
19 0.069349721 1599 high scalability-2014-02-19-Planetary-Scale Computing Architectures for Electronic Trading and How Algorithms Shape Our World
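The consistent hashing entries in the list above claim that, unlike “hashcode mod n”, servers can be added without remapping most keys. A minimal sketch of that contrast follows. The `HashRing` class, the `mod_n_lookup` helper, the virtual-node count, and the server names are all illustrative assumptions, not taken from any of the linked articles or from a particular library.

```python
import bisect
import hashlib


def h(text: str) -> int:
    """Hash a string to a 64-bit integer position."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def mod_n_lookup(key: str, servers: list[str]) -> str:
    """The naive scheme: the server index is hash(key) mod n."""
    return servers[h(key) % len(servers)]


class HashRing:
    """Consistent hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, servers: list[str], vnodes: int = 64) -> None:
        # Each server owns `vnodes` points scattered around the ring.
        self._points = sorted(
            (h(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._hashes = [p for p, _ in self._points]

    def lookup(self, key: str) -> str:
        # A key belongs to the first point clockwise from its hash,
        # wrapping around at the end of the ring.
        i = bisect.bisect_right(self._hashes, h(key)) % len(self._points)
        return self._points[i][1]


if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(2000)]
    old, new = ["a", "b", "c", "d"], ["a", "b", "c", "d", "e"]
    moved_mod = sum(mod_n_lookup(k, old) != mod_n_lookup(k, new) for k in keys)
    ring_old, ring_new = HashRing(old), HashRing(new)
    moved_ring = sum(ring_old.lookup(k) != ring_new.lookup(k) for k in keys)
    print("mod n moved:", moved_mod / len(keys))
    print("ring moved: ", moved_ring / len(keys))
```

On the ring, a key changes owner only when one of the new server's points lands between the key and its previous owner, so every key that moves goes to the new server. With mod n, changing n alters the remainder for most keys.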
topicId topicWeight
[(0, 0.101), (1, 0.045), (2, 0.016), (3, -0.031), (4, -0.023), (5, 0.076), (6, 0.049), (7, -0.04), (8, -0.056), (9, 0.025), (10, 0.008), (11, -0.004), (12, -0.023), (13, -0.003), (14, 0.005), (15, 0.035), (16, 0.01), (17, 0.019), (18, 0.02), (19, -0.014), (20, -0.015), (21, 0.049), (22, -0.015), (23, -0.018), (24, -0.036), (25, -0.055), (26, 0.043), (27, 0.01), (28, 0.007), (29, -0.008), (30, -0.021), (31, -0.01), (32, -0.007), (33, -0.016), (34, -0.038), (35, -0.031), (36, 0.049), (37, -0.021), (38, 0.006), (39, -0.006), (40, 0.039), (41, -0.007), (42, 0.029), (43, 0.004), (44, -0.067), (45, 0.0), (46, -0.014), (47, 0.027), (48, 0.048), (49, 0.008)]
simIndex simValue blogId blogTitle
same-blog 1 0.91426748 19 high scalability-2007-07-16-Paper: Replication Under Scalable Hashing
Introduction: Consistent hashing is one of those ideas that really puts the science in computer science and reminds us why all those really smart people spend years slaving over algorithms. Consistent hashing is "a scheme that provides hash table functionality in a way that the addition or removal of one slot does not significantly change the mapping of keys to slots" and was originally a way of distributing requests among a changing population of web servers. My first reaction to the idea was "wow, that's really smart" and I sadly realized I would never come up with something so elegant. I then immediately saw applications for it everywhere. And consistent hashing is used everywhere: distributed hash tables, overlay networks, P2P, IM, caching, and CDNs. Here's the abstract from the original paper and after the abstract are some links to a few very good articles with accessible explanations of consistent hashing and its applications in the real world. Abstract: We describe a family of caching
3 0.71334356 373 high scalability-2008-08-29-Product: ScaleOut StateServer is Memcached on Steroids
Introduction: ScaleOut StateServer is an in-memory distributed cache across a server farm or compute grid. Unlike products from middleware vendors, StateServer aims to be a very good data cache; it doesn't try to handle job scheduling as well. StateServer is what you might get when you take Memcached and merge in all the value-added distributed caching features you've ever dreamed of. True, Memcached is free and ScaleOut StateServer is very far from free, but for those looking for a satisfying out-of-the-box experience, StateServer may be just the caching solution you are looking for. Yes, "solution" is one of those "oh my God I'm going to pay through the nose" indicator words, but it really applies here. Memcached is a framework whereas StateServer has already prepackaged most features you would need to add through your own programming efforts. Why use a distributed cache? Because it combines the holy quadrinity of computing: better performance, linear scalability, high availability, and fast applica
4 0.71236402 889 high scalability-2010-08-30-Pomegranate - Storing Billions and Billions of Tiny Little Files
Introduction: Pomegranate is a novel distributed file system built over distributed tabular storage that acts an awful lot like a NoSQL system. It's targeted at increasing the performance of tiny object access in order to support applications like online photo and micro-blog services, which require high concurrency, high throughput, and low latency. Their tests seem to indicate it works: We have demonstrated that a file system over tabular storage performs well for highly concurrent access. In our test cluster, we observed a linear increase to more than 100,000 aggregate read and write requests served per second ( RPS ). Rather than sitting atop the file system like almost every other K-V store, Pomegranate is baked into the file system. The idea is that the file system API is common to every platform so it wouldn't require a separate API to use. Every application could use it out of the box. The features of Pomegranate are: It handles billions of small files efficiently, even in on
5 0.70865864 705 high scalability-2009-09-16-Paper: A practical scalable distributed B-tree
Introduction: We've seen a lot of NoSQL action lately built around distributed hash tables. Btrees are getting jealous. Btrees, once the king of the database world, want their throne back. Paul Buchheit surfaced a paper: A practical scalable distributed B-tree by Marcos K. Aguilera and Wojciech Golab, that might help spark a revolution. From the Abstract: We propose a new algorithm for a practical, fault tolerant, and scalable B-tree distributed over a set of servers. Our algorithm supports practical features not present in prior work: transactions that allow atomic execution of multiple operations over multiple B-trees, online migration of B-tree nodes between servers, and dynamic addition and removal of servers. Moreover, our algorithm is conceptually simple: we use transactions to manipulate B-tree nodes so that clients need not use complicated concurrency and locking protocols used in prior work. To execute these transactions quickly, we rely on three techniques: (1) We use optimistic
6 0.70571357 1463 high scalability-2013-05-23-Paper: Calvin: Fast Distributed Transactions for Partitioned Database Systems
7 0.67090279 892 high scalability-2010-09-02-Distributed Hashing Algorithms by Example: Consistent Hashing
8 0.65840679 103 high scalability-2007-09-28-Kosmos File System (KFS) is a New High End Google File System Option
9 0.65038007 696 high scalability-2009-09-07-Product: Infinispan - Open Source Data Grid
10 0.64676964 542 high scalability-2009-03-17-IBM WebSphere eXtreme Scale (IMDG)
11 0.64512289 1236 high scalability-2012-04-30-Masstree - Much Faster than MongoDB, VoltDB, Redis, and Competitive with Memcached
13 0.63698518 528 high scalability-2009-03-06-Product: Lightcloud - Key-Value Database
14 0.63309449 979 high scalability-2011-01-27-Comet - An Example of the New Key-Code Databases
15 0.63287985 125 high scalability-2007-10-18-another approach to replication
16 0.62875694 1299 high scalability-2012-08-06-Paper: High-Performance Concurrency Control Mechanisms for Main-Memory Databases
17 0.62870198 368 high scalability-2008-08-17-Wuala - P2P Online Storage Cloud
18 0.62505889 1138 high scalability-2011-11-07-10 Core Architecture Pattern Variations for Achieving Scalability
19 0.61957651 122 high scalability-2007-10-14-Product: The Spread Toolkit
20 0.6186378 983 high scalability-2011-02-02-Piccolo - Building Distributed Programs that are 11x Faster than Hadoop
topicId topicWeight
[(1, 0.095), (2, 0.191), (10, 0.051), (40, 0.118), (79, 0.085), (81, 0.226), (94, 0.1)]
simIndex simValue blogId blogTitle
same-blog 1 0.85208338 19 high scalability-2007-07-16-Paper: Replication Under Scalable Hashing
2 0.810215 540 high scalability-2009-03-16-Cisco and Sun to Compete for Unified Computing?
Introduction: A recent InfoWorld article claims that "With Cisco expected to enter the blade market and Sun expected to offer networking equipment, things could get interesting awfully fast." How does this affect your infrastructure strategy and decisions? Would you consider building scalable web applications on the Cisco Unified Computing System? Or would you consider building a router out of a server using OpenSolaris and Project Crossbow, as the article suggests? Will any of these initiatives change the way we build scalable web infrastructure, or are these just attempts to sell these systems? What do you think?
3 0.7317124 773 high scalability-2010-02-06-GEO-aware traffic load balancing and caching at CNBC.com
Introduction: CNBC, like many large web sites, relied on a CDN for content delivery. Recently, we started looking to see if we could improve this model. Our criteria were: - improve response time - have better control over traffic (real time reporting, change management and alerting) - better utilize internal datacenters and their infrastructure - shield users from any troubles at the origin infrastructure - cost out After researching the market, we turned to two vendors: Dyn (Dynamic Network Services) and aiScaler . We have had about a year's worth of experience with aiScaler (search for "CNBC" to see my previous post ), but Dyn was a new vendor for us. We started building our relationship at the Velocity conference in the summer of 2009. Dyn has recently started offering a geo-aware DNS load balancing solution, using Anycast and the distributed nature of their DNS presence to enable a key component of what we were trying to achieve: steer users to geographically closest origin
4 0.72926813 1471 high scalability-2013-06-06-Paper: Memory Barriers: a Hardware View for Software Hackers
Introduction: It's not often you get so enthusiastic a recommendation for a paper as Sergio Bossa gives Memory Barriers: a Hardware View for Software Hackers : If you only want to read one piece about CPU architecture, cache coherency and memory barriers, make it this one. It is a clear and well written article. It even has a quiz. What's it about? So what possessed CPU designers to cause them to inflict memory barriers on poor unsuspecting SMP software designers? In short, because reordering memory references allows much better performance, and so memory barriers are needed to force ordering in things like synchronization primitives whose correct operation depends on ordered memory references. Getting a more detailed answer to this question requires a good understanding of how CPU caches work, and especially what is required to make caches really work well. The following sections: present the structure of a cache, describe how cache-coherency protocols ensure that CPUs agree on t
Introduction: In Algorithm Design for Performance Aware VM Consolidation we learn some shocking facts (gambling in Casablanca?): Average server utilization in many data centers is low, estimated between 5% and 15%. This is wasteful because an idle server often consumes more than 50% of peak power. Surely that's just for old style datacenters? Nope. In Google data centers, workloads that are consolidated use only 50% of the processor cores. Every other processor core is left unused simply to ensure that performance does not degrade. It's a VM wasteland. The goal is to reduce waste by packing VMs onto machines without hurting performance or wasting resources. The idea is to select VMs that interfere the least with each other and place them together on the same server. It's an NP-complete problem, but this paper describes a practical method that performs provably close to the optimal. Interestingly they can optimize for performance or power efficiency, so you can use different algorithm
6 0.72118139 879 high scalability-2010-08-12-Think of Latency as a Pseudo-permanent Network Partition
7 0.71711648 1375 high scalability-2012-12-21-Stuff The Internet Says On Scalability For December 21, 2012
8 0.71157855 330 high scalability-2008-05-27-Should Twitter be an All-You-Can-Eat Buffet or a Vending Machine?
9 0.71146584 778 high scalability-2010-02-15-The Amazing Collective Compute Power of the Ambient Cloud
10 0.71030372 1492 high scalability-2013-07-17-How do you create a 100th Monkey software development culture?
11 0.70834756 482 high scalability-2009-01-04-Alternative Memcache Usage: A Highly Scalable, Highly Available, In-Memory Shard Index
12 0.70555848 985 high scalability-2011-02-08-Mollom Architecture - Killing Over 373 Million Spams at 100 Requests Per Second
13 0.70143199 1516 high scalability-2013-09-13-Stuff The Internet Says On Scalability For September 13, 2013
14 0.70027411 768 high scalability-2010-02-01-What Will Kill the Cloud?
15 0.69283772 1174 high scalability-2012-01-13-Stuff The Internet Says On Scalability For January 13, 2012
16 0.68987238 1124 high scalability-2011-09-26-17 Techniques Used to Scale Turntable.fm and Labmeeting to Millions of Users
17 0.6892634 1223 high scalability-2012-04-06-Stuff The Internet Says On Scalability For April 6, 2012
18 0.68867081 757 high scalability-2010-01-04-11 Strategies to Rock Your Startup’s Scalability in 2010
20 0.68760437 1466 high scalability-2013-05-29-Amazon: Creating a Customer Utopia One Culture Hack at a Time