high_scalability high_scalability-2010 high_scalability-2010-892 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Consistent Hashing is a specific implementation of hashing that is well suited to many of today’s web-scale load balancing problems. Specifically, it is used in various caching solutions like Memcached and is applicable to NoSQL solutions as well. Consistent Hashing is used particularly because it solves a problem with the typical “hashcode mod n” method of distributing keys across a series of servers, where changing n remaps most keys. It does this by allowing servers to be added or removed without significantly upsetting the distribution of keys, and without requiring that all keys be rehashed to accommodate the change in the number of servers. You can read the full story here.
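To make the contrast with “hashcode mod n” concrete, here is a minimal Python sketch of a hash ring. The MD5-based `stable_hash`, the 100 virtual nodes per server, and all server and key names are illustrative assumptions, not details taken from the article.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Map a string to a 64-bit integer with MD5 (stable across runs, unlike hash())."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Minimal consistent-hash ring; each server owns `vnodes` points on the ring."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            point = stable_hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        for i in range(self.vnodes):
            point = stable_hash(f"{server}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def lookup(self, key):
        # First ring point at or clockwise of the key's hash, wrapping past the end.
        idx = bisect.bisect_left(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def mod_n(key, servers):
    """The "hashcode mod n" baseline."""
    return servers[stable_hash(key) % len(servers)]


# Grow a four-server cluster to five and count how many keys change server.
keys = [f"user:{i}" for i in range(10_000)]
servers = ["cache-a", "cache-b", "cache-c", "cache-d"]
ring = ConsistentHashRing(servers)
before_ring = {k: ring.lookup(k) for k in keys}
before_mod = {k: mod_n(k, servers) for k in keys}

ring.add("cache-e")
after_ring = {k: ring.lookup(k) for k in keys}
moved_ring = sum(before_ring[k] != after_ring[k] for k in keys) / len(keys)
moved_mod = sum(before_mod[k] != mod_n(k, servers + ["cache-e"]) for k in keys) / len(keys)
print(f"consistent hashing moved {moved_ring:.0%}, mod n moved {moved_mod:.0%}")
```

With mod n, going from 4 to 5 servers keeps a key in place only when its hash gives the same remainder mod 4 and mod 5, so roughly four fifths of keys move. On the ring, roughly one fifth move, and every one of them moves onto the new server. The virtual nodes only smooth out how evenly the ring is divided; the sketch does not handle an empty ring.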
sentIndex sentText sentNum sentScore
1 Consistent Hashing is a specific implementation of hashing that is well suited for many of today’s web-scale load balancing problems. [sent-1, score-1.102]
2 Specifically, it can be seen in use in various caching solutions like Memcached and is applicable to NoSQL solutions as well. [sent-2, score-0.843]
3 Consistent Hashing is used particularly because it provides a solution for the typical “hashcode mod n” method of distributing keys across a series of servers. [sent-3, score-1.504]
4 It does this by allowing servers to be added or removed without significantly upsetting the distribution of keys, and it does not require that all keys be rehashed to accommodate the change in the number of servers. [sent-4, score-1.476]
wordName wordTfidf (topN-words)
[('hashing', 0.485), ('keys', 0.386), ('storehere', 0.274), ('mod', 0.238), ('consistent', 0.208), ('accommodate', 0.197), ('suited', 0.197), ('applicable', 0.192), ('distributing', 0.173), ('solutions', 0.169), ('removed', 0.156), ('method', 0.139), ('specifically', 0.135), ('particularly', 0.132), ('significantly', 0.12), ('allowing', 0.116), ('distribution', 0.107), ('series', 0.105), ('typical', 0.102), ('seen', 0.099), ('balancing', 0.097), ('specific', 0.096), ('various', 0.095), ('implementation', 0.094), ('added', 0.093), ('require', 0.092), ('today', 0.085), ('memcached', 0.084), ('nosql', 0.079), ('provides', 0.072), ('caching', 0.068), ('full', 0.068), ('change', 0.065), ('solution', 0.064), ('read', 0.054), ('number', 0.054), ('across', 0.052), ('well', 0.049), ('without', 0.048), ('load', 0.047), ('servers', 0.042), ('used', 0.041), ('many', 0.037), ('use', 0.026), ('like', 0.025)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 892 high scalability-2010-09-02-Distributed Hashing Algorithms by Example: Consistent Hashing
Introduction: Consistent Hashing is a specific implementation of hashing that is well suited to many of today’s web-scale load balancing problems. Specifically, it is used in various caching solutions like Memcached and is applicable to NoSQL solutions as well. Consistent Hashing is used particularly because it solves a problem with the typical “hashcode mod n” method of distributing keys across a series of servers, where changing n remaps most keys. It does this by allowing servers to be added or removed without significantly upsetting the distribution of keys, and without requiring that all keys be rehashed to accommodate the change in the number of servers. You can read the full story here.
Introduction: Consistent hashing is one of those ideas that really puts the science in computer science and reminds us why all those really smart people spend years slaving over algorithms. Consistent hashing is "a scheme that provides hash table functionality in a way that the addition or removal of one slot does not significantly change the mapping of keys to slots" and was originally a way of distributing requests among a changing population of web servers. My first reaction to the idea was "wow, that's really smart" and I sadly realized I would never come up with something so elegant. I then immediately saw applications for it everywhere. And consistent hashing is used everywhere: distributed hash tables, overlay networks, P2P, IM, caching, and CDNs. Here's the abstract from the original paper and after the abstract are some links to a few very good articles with accessible explanations of consistent hashing and its applications in the real world. Abstract: We describe a family of caching
3 0.15178747 1205 high scalability-2012-03-07-Scale Indefinitely on S3 With These Secrets of the S3 Masters
Introduction: In a great article, Amazon S3 Performance Tips & Tricks, Doug Grismore, Director of Storage Operations for AWS, has outed the secret arcana normally reserved for Premium Developer Support customers on how to really use S3: Size matters. Workloads with less than 50-100 total requests per second don't require any special effort. Customers that routinely perform thousands of requests per second need a plan. Automated partitioning. Automated systems scale S3 horizontally by continuously splitting data into partitions based on high request rates and the number of keys in a partition (which leads to slow lookups). Lessons you've learned with sharding may also apply to S3. Avoid hot spots. Like most sharding schemes, you want to avoid hot spots by the smart selection of key names. S3 objects are stored in buckets. Each object is identified using a key. Keys are kept in sorted order. Keys in S3 are partitioned by prefix. Objects that sort together are stored together, s
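Because keys are partitioned by prefix and objects that sort together are stored together, sequential names (timestamps, incrementing IDs) pile onto one partition. One way to pick smarter key names, sketched here under assumptions, is to prepend a few characters of a hash. `spread_key` and `prefix_len` are hypothetical names, and this reflects the 2012-era advice: AWS later (2018) raised per-prefix request rates and retired its guidance to randomize prefixes.

```python
import hashlib


def spread_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hex digest so keys that would sort together get different prefixes.

    Hypothetical helper; prefix_len=4 is an arbitrary illustrative choice.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


# Sequential, date-ordered names would all share one prefix and sort together.
raw_keys = [f"logs/2012-03-07/{i:06d}.gz" for i in range(1000)]
spread = [spread_key(k) for k in raw_keys]
first_chars = {k[0] for k in spread}
print(sorted(first_chars))
```

The trade-off: a listing by the original date prefix no longer returns these objects together, so the application has to recompute the hash (or keep an index) to find them.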
4 0.12974919 19 high scalability-2007-07-16-Paper: Replication Under Scalable Hashing
Introduction: Replication Under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution From the abstract: Typical algorithms for decentralized data distribution work best in a system that is fully built before it is first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (Replication Under Scalable Hashing), that maps replicated objects to a scalable collection of storage servers or disks. RUSH algorithms distribute objects to servers according to user-specified server weighting. While all RUSH variants support addition of servers to the system, different variants have different characteristics with respect to lookup time in petabyte-scale systems, performance with mirroring (as opposed to redundancy codes), and storage server removal. All RUSH variants redistribute as few objects as possible when new servers are added or existing servers
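RUSH itself is not reproduced here. As a much simpler stand-in that shares the two properties the abstract highlights (user-specified server weights, and minimal movement when a server is added), this sketch uses weighted rendezvous (highest-random-weight) hashing, which is a different technique. The server names and weights are illustrative assumptions.

```python
import hashlib
import math


def _unit_interval(key: str, server: str) -> float:
    """Deterministic pseudo-random value strictly inside (0, 1) for a (key, server) pair."""
    digest = hashlib.sha256(f"{key}|{server}".encode("utf-8")).digest()
    bits = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (bits + 0.5) / 2**52


def place(key: str, weights: dict) -> str:
    """Weighted rendezvous hashing: pick the server with the highest -w / ln(u) score."""
    return max(weights, key=lambda s: -weights[s] / math.log(_unit_interval(key, s)))


keys = [f"obj-{i}" for i in range(5000)]
weights = {"disk-a": 1.0, "disk-b": 1.0, "disk-c": 2.0}  # disk-c should get about half
before = {k: place(k, weights) for k in keys}

grown = {**weights, "disk-d": 1.0}  # add a server
after = {k: place(k, grown) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "objects moved, all to", {after[k] for k in moved})
```

Since -ln(u) is exponentially distributed, each server wins with probability proportional to its weight. Adding a server only adds one candidate to each key's `max`, so an object either stays put or moves to the new server.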
5 0.11779543 542 high scalability-2009-03-17-IBM WebSphere eXtreme Scale (IMDG)
Introduction: IBM WebSphere eXtreme Scale is IBM's in-memory data grid (IMDG) product. It can be used as a key-value store that partitions the keys (using a form of consistent hashing) over a set of servers so that each server is responsible for a subset of the keys. It automatically handles replication, which can be either synchronous or asynchronous, and handles advanced placement so that replicas can be placed in different physical zones from the primary. Think buildings, racks, floors, data centers. It is fully elastic in that servers can be added and removed and it automatically redistributes the partition primaries and backups. It can be scaled from one server to hundreds if not thousands of JVMs in a single grid. Each additional server provides more CPU, memory capacity and network and it scales linearly with grid growth. It also has a key-graph mode where a graph of objects can be associated with a single key and it allows fine grained modification of that
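The partition-plus-zone idea can be sketched in a few lines. This does not reflect how WebSphere eXtreme Scale actually places partitions; it assumes a hypothetical fixed partition count, round-robin primaries, and a backup chosen from a different zone. `NUM_PARTITIONS`, `partition_of`, `place_partitions`, and the JVM and rack names are all illustrative.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 64  # hypothetical; fixed for the grid's lifetime


def partition_of(key: str) -> int:
    """Keys map to a fixed partition, so adding servers never changes this step."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big") % NUM_PARTITIONS


def place_partitions(zones: dict) -> dict:
    """zones: server -> zone. Returns partition -> (primary, backup in another zone)."""
    servers = sorted(zones)
    placement = {}
    for p in range(NUM_PARTITIONS):
        primary = servers[p % len(servers)]  # round-robin primaries
        others = [s for s in servers if zones[s] != zones[primary]]
        backup = others[p % len(others)] if others else None  # no other zone: no backup
        placement[p] = (primary, backup)
    return placement


zones = {"jvm-1": "rack-a", "jvm-2": "rack-a", "jvm-3": "rack-b", "jvm-4": "rack-b"}
placement = place_partitions(zones)
print(placement[partition_of("customer:42")])
print(Counter(primary for primary, _ in placement.values()))
```

Only the partition-to-server table changes when servers join or leave; the key-to-partition hash stays fixed. Recomputing round-robin placement moves more partitions than strictly necessary, which a production grid would try to avoid.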
6 0.11436669 651 high scalability-2009-07-02-Product: Project Voldemort - A Distributed Database
7 0.11409633 589 high scalability-2009-05-05-Drop ACID and Think About Data
8 0.10783824 728 high scalability-2009-10-26-Facebook's Memcached Multiget Hole: More machines != More Capacity
9 0.10489152 117 high scalability-2007-10-08-Paper: Understanding and Building High Availability-Load Balanced Clusters
10 0.10426407 872 high scalability-2010-08-05-Pairing NoSQL and Relational Data Storage: MySQL with MongoDB
11 0.10349065 877 high scalability-2010-08-12-Designing Web Applications for Scalability
12 0.095048591 971 high scalability-2011-01-10-Riak's Bitcask - A Log-Structured Hash Table for Fast Key-Value Data
13 0.093805961 468 high scalability-2008-12-17-Ringo - Distributed key-value storage for immutable data
14 0.091723882 906 high scalability-2010-09-22-Applying Scalability Patterns to Infrastructure Architecture
15 0.089663416 1236 high scalability-2012-04-30-Masstree - Much Faster than MongoDB, VoltDB, Redis, and Competitive with Memcached
16 0.089261554 514 high scalability-2009-02-18-Numbers Everyone Should Know
17 0.086745106 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
18 0.084913224 509 high scalability-2009-02-05-Product: HAProxy - The Reliable, High Performance TCP-HTTP Load Balancer
19 0.081957713 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
20 0.081307411 427 high scalability-2008-10-22-Server load balancing architectures, Part 2: Application-level load balancing
topicId topicWeight
[(0, 0.092), (1, 0.04), (2, -0.011), (3, -0.037), (4, 0.013), (5, 0.088), (6, 0.01), (7, -0.063), (8, -0.057), (9, 0.015), (10, 0.016), (11, 0.022), (12, -0.066), (13, 0.009), (14, -0.015), (15, 0.014), (16, 0.067), (17, -0.003), (18, 0.022), (19, -0.041), (20, -0.011), (21, 0.055), (22, 0.018), (23, 0.014), (24, -0.007), (25, -0.033), (26, 0.0), (27, 0.008), (28, -0.025), (29, -0.026), (30, 0.035), (31, -0.023), (32, 0.022), (33, -0.01), (34, -0.042), (35, -0.034), (36, 0.007), (37, -0.037), (38, 0.037), (39, -0.029), (40, 0.024), (41, 0.083), (42, 0.029), (43, -0.004), (44, 0.003), (45, 0.061), (46, 0.018), (47, 0.009), (48, 0.07), (49, 0.03)]
simIndex simValue blogId blogTitle
same-blog 1 0.97069728 892 high scalability-2010-09-02-Distributed Hashing Algorithms by Example: Consistent Hashing
3 0.61479664 1611 high scalability-2014-03-12-Paper: Scalable Eventually Consistent Counters over Unreliable Networks
Introduction: Counting at scale in a distributed environment is surprisingly hard. And it's a subject we've covered before in various ways: Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory, How to update video views count effectively?, Numbers Everyone Should Know (sharded counters). Kellabyte (which is an excellent blog) in Scalable Eventually Consistent Counters talks about how the Cassandra counter implementation scores well on the scalability and high availability front, but in so doing has "over and under counting problem in partitioned environments." Which is often fine. But if you want more accuracy there's a PN-counter, which is a CRDT (convergent replicated data type) where "all the changes made to a counter on each node rather than storing and modifying a single value so that you can merge all the values into the proper final value. Of course the trade-off here is additional storage and processing but there are ways to optimize this."
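A PN-counter keeps, per node, a running total of increments and a separate total of decrements; merging takes the element-wise maximum, so merges can be repeated or reordered without over- or under-counting. Here is a minimal state-based sketch in Python; it is not Cassandra's or Kellabyte's implementation, and the class and node names are illustrative.

```python
from collections import defaultdict


class PNCounter:
    """State-based PN-counter: per-node increment and decrement tallies, merged by max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.p = defaultdict(int)  # node -> total increments made on that node
        self.n = defaultdict(int)  # node -> total decrements made on that node

    def increment(self, amount=1):
        self.p[self.node_id] += amount  # amount assumed non-negative

    def decrement(self, amount=1):
        self.n[self.node_id] += amount  # amount assumed non-negative

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent.
        for node, count in other.p.items():
            self.p[node] = max(self.p[node], count)
        for node, count in other.n.items():
            self.n[node] = max(self.n[node], count)


a = PNCounter("node-a")
b = PNCounter("node-b")
a.increment(5)
b.increment(3)
b.decrement(1)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # prints 7 7
```

The extra storage the quote mentions is visible here: each replica carries two entries per node instead of a single integer.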
4 0.61341876 1138 high scalability-2011-11-07-10 Core Architecture Pattern Variations for Achieving Scalability
Introduction: Srinath Perera has put together a strong list of architecture patterns based on three meta patterns: distribution, caching, and asynchronous processing. He contends these three are the primal patterns and the following patterns are but different combinations: LB (Load Balancers) + Shared nothing Units. Units that do not share anything with each other fronted with a load balancer that routes incoming messages to a unit based on some criteria. LB + Stateless Nodes + Scalable Storage. Several stateless nodes talking to a scalable storage, and a load balancer distributes load among the nodes. Peer to Peer Architectures (Distributed Hash Table (DHT) and Content Addressable Networks (CAN)). Algorithm for scaling up logarithmically. Distributed Queues. Queue implementation (FIFO delivery) implemented as a network service. Publish/Subscribe Paradigm. Network publish subscribe brokers that route messages to each other. Gossip and Nature-inspired Architectures. Each
5 0.60203356 1236 high scalability-2012-04-30-Masstree - Much Faster than MongoDB, VoltDB, Redis, and Competitive with Memcached
Introduction: The EuroSys 2012 system conference has an excellent live blog summary of their talks for: Day 1, Day 2, Day 3 (thanks Henry at the Paper Trail blog). Summaries for each of the accepted papers are here. One of the more interesting papers from a NoSQL perspective was Cache Craftiness for Fast Multicore Key-Value Storage, a wonderfully detailed description of the low level techniques used to implement Masstree: A storage system specialized for key-value data in which all data fits in memory, but must persist across server restarts. It supports arbitrary, variable-length keys. It allows range queries over those keys: clients can traverse subsets of the database, or the whole database, in sorted order by key. On a 16-core machine Masstree achieves six to ten million operations per second on parts A–C of the Yahoo! Cloud Serving Benchmark, more than 30× as fast as VoltDB [5] or MongoDB [2]. If you are looking for innovative detailed high performance design, t
6 0.59589857 1325 high scalability-2012-09-19-The 4 Building Blocks of Architecting Systems for Scale
9 0.57266891 19 high scalability-2007-07-16-Paper: Replication Under Scalable Hashing
10 0.55878776 507 high scalability-2009-02-03-Paper: Optimistic Replication
11 0.55854708 696 high scalability-2009-09-07-Product: Infinispan - Open Source Data Grid
12 0.55595607 602 high scalability-2009-05-17-Scaling Django Web Apps by Mike Malone
13 0.55491436 149 high scalability-2007-11-12-Scaling Using Cache Farms and Read Pooling
14 0.54673755 963 high scalability-2010-12-23-Paper: CRDTs: Consistency without concurrency control
15 0.5466457 1260 high scalability-2012-06-07-Case Study on Scaling PaaS infrastructure
16 0.54038757 577 high scalability-2009-04-22-Gear6 Web cache - the hardware solution for working with Memcache
17 0.53797966 391 high scalability-2008-09-23-The 7 Stages of Scaling Web Apps
18 0.53197962 373 high scalability-2008-08-29-Product: ScaleOut StateServer is Memcached on Steroids
19 0.52614772 784 high scalability-2010-02-25-Paper: High Performance Scalable Data Stores
20 0.5251044 589 high scalability-2009-05-05-Drop ACID and Think About Data
topicId topicWeight
[(1, 0.179), (2, 0.236), (31, 0.187), (40, 0.106), (61, 0.123), (94, 0.022)]
simIndex simValue blogId blogTitle
1 0.94134015 207 high scalability-2008-01-10-Sharding with Cookie-Based Session Storage
Introduction: In a recent project, I utilized RoR's cookie-based session storage to shard geographically distinct user groups. My technique for doing so was unique and, although it was a premature optimization, it is nonetheless an idea worth exploring.
same-blog 2 0.93784899 892 high scalability-2010-09-02-Distributed Hashing Algorithms by Example: Consistent Hashing
3 0.91107839 615 high scalability-2009-06-01-HotPads on AWS
Introduction: HotPads abandoned its managed hosting in December and took the leap over to EC2 and its siblings. The presentation has a lot of detail on costs and other things to watch out for, so if you're currently planning your "cloud" architecture, you'll find some of this really helpful.
4 0.88149387 1651 high scalability-2014-05-20-It's Networking. In Space! Or How E.T. Will Phone Home.
Introduction: What will the version of the Internet that follows us to the stars look like? Yes, people are really thinking seriously about this sort of thing. Specifically the InterPlanetary Networking Special Interest Group (IPNSIG). Ansible-like faster-than-light communication it isn't. There's no magical warp drive. Nor is it a network of telepaths acting as a 'verse-spanning telegraph system. It's more mundane than that. And in many ways more interesting, as it's sort of like the old Internet on steroids, the one that was based on UUCP and dial-up connections, but over vastly longer distances and with much longer delays: The Interplanetary Internet (based on IPN, also called InterPlaNet) is a conceived computer network in space, consisting of a set of network nodes which can communicate with each other.[1][2] Communication would be greatly delayed by the great interplanetary distances, so the IPN needs a new set of protocols and technology that are tolerant to large delays and
6 0.82982332 785 high scalability-2010-02-26-MySQL and Memcached: End of an Era?
7 0.82921946 368 high scalability-2008-08-17-Wuala - P2P Online Storage Cloud
9 0.82732654 379 high scalability-2008-09-04-Database question for upcoming project
10 0.82352209 482 high scalability-2009-01-04-Alternative Memcache Usage: A Highly Scalable, Highly Available, In-Memory Shard Index
11 0.82074922 1255 high scalability-2012-06-01-Stuff The Internet Says On Scalability For June 1, 2012
12 0.81984073 97 high scalability-2007-09-18-Session management in highly scalable web sites
13 0.81970042 564 high scalability-2009-04-10-counting # of views, calculating most-least viewed
14 0.81825519 292 high scalability-2008-03-30-Scaling Out MySQL
15 0.81751031 757 high scalability-2010-01-04-11 Strategies to Rock Your Startup’s Scalability in 2010
16 0.81317163 537 high scalability-2009-03-12-QCon London 2009: Database projects to watch closely
17 0.81311929 407 high scalability-2008-10-10-The Art of Capacity Planning: Scaling Web Resources
18 0.81218231 1414 high scalability-2013-03-01-Stuff The Internet Says On Scalability For February 29, 2013
19 0.81070852 848 high scalability-2010-06-25-Hot Scalability Links for June 25, 2010
20 0.81008327 64 high scalability-2007-08-10-How do we make a large real-time search engine?