Introduction: A good article from manageability.com summarizes the design patterns in Pat Helland's amazing paper Life beyond Distributed Transactions: an Apostate's Opinion. Entities are uniquely identified - each entity, which represents disjoint data (i.e. no overlap of data between entities), should have a unique key. Multiple disjoint scopes of transactional serializability - in other words, there are these 'entities', and you cannot perform atomic transactions across them. At-Least-Once messaging - that is, an application must tolerate message retries and out-of-order arrival of messages. Messages are addressed to entities - that is, one can't abstract away from the business logic the existence of the unique keys for addressing entities. Addressing, however, is independent of location. Entities manage conversational state per party - that is, to ensure idempotency an entity needs to remember that a message has been previously processed. Furthermore, in a world with
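The at-least-once and conversational-state patterns above can be sketched in a few lines. This is a minimal illustration, not code from Helland's paper or the manageability.com article: the names Message, AccountEntity and deliver, and the idea of a per-sender message id, are all assumptions made for the example. The entity remembers, per party, which message ids it has already applied, so a retry is ignored; the update itself is a commutative addition, so out-of-order arrival does not change the outcome.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    """Hypothetical message shape; none of these fields come from the source."""
    entity_key: str  # unique key of the target entity (says nothing about location)
    sender: str      # the party whose conversation this message belongs to
    msg_id: int      # per-sender id, used only to detect retries
    amount: int      # payload: an amount to add to the balance


@dataclass
class AccountEntity:
    """One entity, i.e. one scope of serializability, identified by its key."""
    key: str
    balance: int = 0
    # Conversational state: for each party, the message ids already applied.
    seen: dict = field(default_factory=dict)

    def handle(self, msg: Message) -> bool:
        """Apply msg at most once; return False when it is a retry."""
        applied = self.seen.setdefault(msg.sender, set())
        if msg.msg_id in applied:
            return False
        self.balance += msg.amount  # commutative, so arrival order is irrelevant
        applied.add(msg.msg_id)
        return True


def deliver(entities: dict, msg: Message) -> bool:
    """Route by entity key only; the dict stands in for a location lookup."""
    if msg.entity_key not in entities:
        entities[msg.entity_key] = AccountEntity(msg.entity_key)
    return entities[msg.entity_key].handle(msg)
```

Delivering message 2 before message 1 and then retrying message 1 leaves the balance at the sum of the two amounts. A real system would also have to persist the seen-set together with the balance inside the entity's own transaction, which this in-memory sketch leaves out.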
sentIndex sentText sentNum sentScore
1 Entities are uniquely identified - each entity which represents disjoint data (i. [sent-3, score-0.522]
2 no overlap of data between entities) should have a unique key. [sent-5, score-0.205]
3 Multiple disjoint scopes of transactional serializability - in other words, there are these 'entities', and you cannot perform atomic transactions across them. [sent-6, score-0.55]
4 At-Least-Once messaging - that is, an application must tolerate message retries and out-of-order arrival of messages. [sent-7, score-0.408]
5 Messages are addressed to entities - that is, one can't abstract away from the business logic the existence of the unique keys for addressing entities. [sent-8, score-0.513]
6 Entities manage conversational state per party - that is, to ensure idempotency an entity needs to remember that a message has been previously processed. [sent-10, score-0.397]
7 Furthermore, in a world without atomic transactions, outcomes need to be 'negotiated' using some kind of workflow capability. [sent-11, score-0.347]
8 Alternate indexes cannot reside within a single scope of serializability - that is, one can't assume the indices or references to entities can be updated atomically. [sent-12, score-0.805]
9 There is the potential that these indices may become out of sync. [sent-13, score-0.207]
10 Messaging between Entities is Tentative - that is, entities need to accept some level of uncertainty, and messages that are sent are requests for commitment and may possibly be cancelled. [sent-14, score-0.526]
11 The article then compares these principles with the design principles used to develop S3: Decentralization: Use fully decentralized techniques to remove scaling bottlenecks and single points of failure. [sent-15, score-0.519]
12 Asynchrony: The system makes progress under all circumstances. [sent-16, score-0.162]
13 Autonomy: The system is designed such that individual components can make decisions based on local information. [sent-17, score-0.31]
14 Local responsibility: Each individual component is responsible for achieving its consistency; this is never the burden of its peers. [sent-18, score-0.263]
15 Controlled concurrency: Operations are designed such that no or limited concurrency control is required. [sent-19, score-0.105]
16 Failure tolerant: The system considers the failure of components to be a normal mode of operation, and continues operation with no or minimal interruption. [sent-20, score-0.562]
17 Controlled parallelism: Abstractions used in the system are of such granularity that parallelism can be used to improve performance and robustness of recovery or the introduction of new nodes. [sent-21, score-0.57]
18 Decompose into small well-understood building blocks: Do not try to provide a single service that does everything for every one, but instead build small components that can be used as building blocks for other services. [sent-22, score-0.341]
19 Symmetry: Nodes in the system are identical in terms of functionality, and require no or minimal node-specific configuration to function. [sent-23, score-0.321]
20 Simplicity: The system should be made as simple as possible (- but no simpler). [sent-24, score-0.085]
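The tentative-messaging pattern in sentence 10 (a message is a request for commitment that may later be cancelled) can be illustrated with a reserve / confirm / cancel exchange. This is a rough sketch under assumptions: the InventoryEntity class, its method names and the request ids are hypothetical and appear nowhere in the source.

```python
class InventoryEntity:
    """Hypothetical entity that owns the stock count for one item."""

    def __init__(self, key, stock):
        self.key = key
        self.available = stock
        self.pending = {}  # request id -> quantity held tentatively

    def reserve(self, request_id, qty):
        """Tentative request: hold stock without committing to the sale."""
        if request_id in self.pending:
            return True  # a retry of a request that is already held
        if qty > self.available:
            return False  # cannot promise this quantity
        self.available -= qty
        self.pending[request_id] = qty
        return True

    def confirm(self, request_id):
        """Commit: the held stock leaves the entity for good."""
        self.pending.pop(request_id, None)

    def cancel(self, request_id):
        """Cancel: return the held stock; a repeated cancel is a no-op."""
        self.available += self.pending.pop(request_id, 0)
```

Both confirm and cancel tolerate retries, which fits at-least-once delivery. The sketch does not remember requests that were already confirmed or cancelled, so a late retry of reserve for such a request would hold stock again; closing that gap needs the per-party conversational state described in sentence 6.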
wordName wordTfidf (topN-words)
[('entities', 0.331), ('disjoint', 0.277), ('indices', 0.207), ('atomic', 0.155), ('entity', 0.153), ('minimal', 0.144), ('apostate', 0.139), ('conversational', 0.139), ('blocks', 0.135), ('parallelism', 0.134), ('principles', 0.131), ('helland', 0.13), ('components', 0.128), ('transactions', 0.118), ('overlap', 0.116), ('decentralization', 0.116), ('pat', 0.113), ('arrival', 0.113), ('outcomes', 0.108), ('uncertainty', 0.105), ('summarizing', 0.105), ('message', 0.105), ('concurrency', 0.105), ('operation', 0.103), ('considers', 0.102), ('robustness', 0.099), ('furthermore', 0.099), ('individual', 0.097), ('retries', 0.097), ('granularity', 0.096), ('reside', 0.094), ('compares', 0.094), ('addressing', 0.093), ('existence', 0.093), ('tolerate', 0.093), ('represents', 0.092), ('identical', 0.092), ('commitment', 0.09), ('references', 0.089), ('unique', 0.089), ('abstractions', 0.087), ('burden', 0.086), ('decentralized', 0.085), ('system', 0.085), ('workflow', 0.084), ('scope', 0.084), ('achieving', 0.08), ('tolerant', 0.08), ('used', 0.078), ('progress', 0.077)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999982 958 high scalability-2010-12-16-7 Design Patterns for Almost-infinite Scalability
2 0.18850997 327 high scalability-2008-05-27-How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale
Introduction: Update 3 : ReadWriteWeb says Google App Engine Announces New Pricing Plans, APIs, Open Access . Pricing is specified but I'm not sure what to make of it yet. An image manipulation library is added (thus the need to pay for more CPU :-) and memcached support has been added. Memcached will help resolve the can't write for every read problem that pops up when keeping counters. Update 2 : onGWT.com threw a GAE load party and a lot of people came. The results at Load test : Google App Engine = 1, Community = 0 . GAE handled a peak of 35 requests/second and a sustained 10 requests/second. Some think performance was good, others not so good. My GMT watch broke and I was late to arrive. Maybe next time. Also added a few new design rules from the post. Update : Added a few new rules gleaned from the GAE Meetup : Design By Explicit Cost Model and Puts are Precious. How do you structure your database using a distributed hash table like BigTable ? The answer isn't what you might expect. If
3 0.17561093 279 high scalability-2008-03-17-Microsoft's New Database Cloud Ready to Rumble with Amazon
Introduction: Update: Zdnet says Ozzie signals Microsoft’s surrender to the cloud . CD ROMs are to the internet as the internet is to the cloud and Microsoft aims to scratch and claw its way into this paradigm shift as well. The gloves are off. The tag line for Microsoft's new SQL Server Data Service is Your Data, Any Place, Any Time . Thems fighten' words. Microsoft is itch'n for a fight! Who will be Amazon's second? The service description: SQL Server Data Services (SSDS) are highly scalable, on-demand data storage and query processing utility services. Built on robust SQL Server database and Windows Server technologies, these services provide high availability, security and support standards-based web interfaces for easy programming and quick provisioning. Sounds like a fast uppercut aimed squarely at SimpleDB's jaw. As a developer what do you need to know? Highly available and highly scalable. Targeted at applications that can tolerate high internet latencies. Pr
4 0.12751362 683 high scalability-2009-08-18-Hardware Architecture Example (geographical level mapping of servers)
Introduction: I have put down my thoughts in the architecture discussed in the blog. Although I have done substantial research to understand how things should work before deciding on this architecture, I will need a large amount of input from everyone to reach an architecture decision. The hardware entities considered while designing it are: 1. Master Web Server, which will map different users to web servers placed in different geographical locations (preferably storing a mapping table in RAM). 2. Web Servers 3. Application Servers 4. Master Database Servers (to implement entity-wise lookup sharding) 5. Slave Database Servers. I would really appreciate good input on using cloud computing and how to go about it, against or in addition to the given architecture. I would also like to know people's views on when to decide to use cloud computing techniques. Looking forward to input from the community.
5 0.11564993 301 high scalability-2008-04-08-Google AppEngine - A First Look
Introduction: I haven't developed an AppEngine application yet, I'm just taking a look around their documentation and seeing what stands out for me. It's not the much speculated super cluster VM . AppEngine is solidly grounded in code and structure. It reminds me a little of the guy who ran a website out of S3 with a splash of Heroku thrown in as a chaser. The idea is clearly to take advantage of our massive multi-core future by creating a shared nothing infrastructure based firmly on a core set of infinitely scalable database, storage and CPU services. Don't forget Google also has a few other services to leverage: email, login, blogs, video, search, ads, metrics, and apps. A shared nothing request is a simple beast. By its very nature shared nothing architectures must be composed of services which are themselves already scalable and Google is signing up to supply that scalable infrastructure. Google has been busy creating a platform of out-of-the-box scalable services to build
6 0.11284351 1240 high scalability-2012-05-07-Startups are Creating a New System of the World for IT
8 0.099393032 972 high scalability-2011-01-11-Google Megastore - 3 Billion Writes and 20 Billion Read Transactions Daily
9 0.098432526 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
10 0.094461732 1600 high scalability-2014-02-21-Stuff The Internet Says On Scalability For February 21st, 2014
11 0.09081921 612 high scalability-2009-05-31-Parallel Programming for real-world
12 0.090152346 514 high scalability-2009-02-18-Numbers Everyone Should Know
13 0.089524679 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud
15 0.088134393 96 high scalability-2007-09-18-Amazon Architecture
16 0.087081507 1421 high scalability-2013-03-11-Low Level Scalability Solutions - The Conditioning Collection
17 0.086488076 1447 high scalability-2013-04-26-Stuff The Internet Says On Scalability For April 26, 2013
18 0.086278223 653 high scalability-2009-07-08-Servers Component - How to choice and build perfect server
19 0.08603029 1166 high scalability-2011-12-30-Stuff The Internet Says On Scalability For December 30, 2011
20 0.085908979 687 high scalability-2009-08-24-How Google Serves Data from Multiple Datacenters
topicId topicWeight
[(0, 0.151), (1, 0.077), (2, -0.001), (3, 0.038), (4, 0.003), (5, 0.049), (6, 0.047), (7, -0.011), (8, -0.069), (9, -0.008), (10, 0.019), (11, 0.082), (12, -0.052), (13, -0.015), (14, 0.029), (15, 0.006), (16, -0.009), (17, -0.013), (18, 0.036), (19, -0.014), (20, 0.061), (21, 0.008), (22, -0.033), (23, -0.032), (24, -0.022), (25, -0.041), (26, 0.021), (27, 0.003), (28, 0.041), (29, 0.022), (30, -0.034), (31, 0.007), (32, 0.017), (33, -0.01), (34, -0.035), (35, -0.076), (36, 0.05), (37, -0.004), (38, 0.024), (39, -0.031), (40, -0.046), (41, 0.024), (42, -0.066), (43, 0.006), (44, 0.013), (45, -0.023), (46, -0.002), (47, 0.001), (48, -0.021), (49, 0.017)]
simIndex simValue blogId blogTitle
same-blog 1 0.942496 958 high scalability-2010-12-16-7 Design Patterns for Almost-infinite Scalability
2 0.74098361 983 high scalability-2011-02-02-Piccolo - Building Distributed Programs that are 11x Faster than Hadoop
Introduction: Piccolo (not this or this ) is a system for distributed computing, Piccolo is a new data-centric programming model for writing parallel in-memory applications in data centers . Unlike existing data-flow models, Piccolo allows computation running on different machines to share distributed, mutable state via a key-value table interface. Traditional data-centric models (such as Hadoop) which present the user a single object at a time to operate on, Piccolo exposes a global table interface which is available to all parts of the computation simultaneously. This allows users to specify programs in an intuitive manner very similar to that of writing programs for a single machine. Using an in-memory key-value store is a very different approach from the canonical map-reduce, which is based on using distributed file systems. The results are impressive: Experiments have shown that Piccolo is fast and provides excellent scaling for many applications. The performance of PageRank and
3 0.72572947 1544 high scalability-2013-11-07-Paper: Tempest: Scalable Time-Critical Web Services Platform
Introduction: An interesting and different implementation approach: Tempest: Scalable Time-Critical Web Services Platform : Tempest is a new framework for developing time-critical web services. Tempest enables developers to build scalable, fault-tolerant services that can then be automatically replicated and deployed across clusters of computing nodes. The platform automatically adapts to load fluctuations, reacts when components fail, and ensures consistency between replicas by repairing when inconsistencies do occur. Tempest relies on a family of epidemic protocols and on Ricochet, a reliable time critical multicast protocol with probabilistic guarantees. Tempest is built around a novel storage abstraction called the TempestCollection in which application developers store the state of a service. Our platform handles the replication of this state across clones of the service, persistence, and failure handling. To minimize the need for specialized knowledge on the part of the application deve
4 0.72368133 507 high scalability-2009-02-03-Paper: Optimistic Replication
Introduction: To scale in the large you have to partition. Data has to be spread around, replicated, and kept consistent (keeping replicas sufficiently similar to one another despite operations being submitted independently at different sites). The result is a highly available, well performing, and scalable system. Partitioning is required, but it's a pain to do efficiently and correctly. Until Quantum teleportation becomes a reality how data is kept consistent across a bewildering number of failure scenarios is a key design decision. This excellent paper by Yasushi Saito and Marc Shapiro takes us on a wild ride (OK, maybe not so wild) of different approaches to achieving consistency. What's cool about this paper is they go over some real systems that we are familiar with and cover how they work: DNS (single-master, state-transfer), Usenet (multi-master), PDAs (multi-master, state-transfer, manual or application-specific conflict resolution), Bayou (multi-master, operation-transfer, epidemic
5 0.72342074 1299 high scalability-2012-08-06-Paper: High-Performance Concurrency Control Mechanisms for Main-Memory Databases
Introduction: If you stayed up all night watching the life reaffirming Curiosity landing on Mars , then this paper, High-Performance Concurrency Control Mechanisms for Main-Memory Databases , has nothing to do with that at all, but it is an excellent look at how to use optimistic MVCC schemes to reduce lock overhead on in-memory datastructures: A database system optimized for in-memory storage can support much higher transaction rates than current systems. However, standard concurrency control methods used today do not scale to the high transaction rates achievable by such systems. In this paper we introduce two efficient concurrency control methods specifically designed for main-memory databases. Both use multiversioning to isolate read-only transactions from updates but differ in how atomicity is ensured: one is optimistic and one is pessimistic. To avoid expensive context switching, transactions never block during normal processing but they may have to wait before commit to ensure corr
6 0.70740831 357 high scalability-2008-07-26-Google's Paxos Made Live – An Engineering Perspective
7 0.70372945 705 high scalability-2009-09-16-Paper: A practical scalable distributed B-tree
8 0.69430888 972 high scalability-2011-01-11-Google Megastore - 3 Billion Writes and 20 Billion Read Transactions Daily
9 0.68674308 510 high scalability-2009-02-09-Paper: Consensus Protocols: Two-Phase Commit
10 0.67944652 1431 high scalability-2013-03-29-Stuff The Internet Says On Scalability For March 29, 2013
12 0.67776966 1219 high scalability-2012-03-30-Stuff The Internet Says On Scalability For March 30, 2012
13 0.67744559 1276 high scalability-2012-07-04-Top Features of a Scalable Database
14 0.6771419 844 high scalability-2010-06-18-Paper: The Declarative Imperative: Experiences and Conjectures in Distributed Logic
15 0.67682898 963 high scalability-2010-12-23-Paper: CRDTs: Consistency without concurrency control
16 0.6757943 676 high scalability-2009-08-08-Yahoo!'s PNUTS Database: Too Hot, Too Cold or Just Right?
17 0.67527062 890 high scalability-2010-09-01-Paper: The Case for Determinism in Database Systems
18 0.67410278 1553 high scalability-2013-11-25-How To Make an Infinitely Scalable Relational Database Management System (RDBMS)
19 0.67225403 1017 high scalability-2011-04-06-Netflix: Run Consistency Checkers All the time to Fixup Transactions
20 0.66884404 1446 high scalability-2013-04-25-Paper: Making reliable distributed systems in the presence of software errors
topicId topicWeight
[(1, 0.139), (2, 0.204), (10, 0.039), (24, 0.175), (30, 0.013), (47, 0.07), (61, 0.043), (73, 0.027), (79, 0.133), (85, 0.038), (94, 0.032)]
simIndex simValue blogId blogTitle
1 0.95613039 385 high scalability-2008-09-16-Product: Func - Fedora Unified Network Controller
Introduction: Func is used to manage a large network using bash or Python scripts. It targets easy and simple remote scripting and one-off tasks over SSH by creating a secure (SSL certifications) XMLRPC API for communication. Any kind of application can be written on top of it. Other configuration management tools specialize in mass configuration. They say here's what the machine should look like and keep it that way. Func allows you to program your cluster. If you've ever tried to securely remote script a gang of machines using SSH keys you know what a total nightmare that can be. Some example commands: Using the command line: func "*.example.org" call yumcmd update Using the Python API: import func.overlord.client as fc client = fc.Client("*.example.org;*.example.com") client.yumcmd.update() client.service.start("acme-server") print client.hardware.info() Func may certainly overlap in functionality with other tools like Puppet and cfengine, but as programmers we always need more than one
2 0.92645073 1458 high scalability-2013-05-15-Lesson from Airbnb: Give Yourself Permission to Experiment with Non-scalable Changes
Introduction: If you are stuck drowning in too much data and too many options and are dazzled by all the possibilities of code, here's a helpful bit of advice from Airbnb's rags to riches origin story : it's okay to do things that don’t scale . A corollary is the idea of paying attention to and learning from what your users are actually doing and let that lead you without that annoying voice in your head second guessing you, yelling but that will never scale! Worry about building something good, then worry about making it scale. In Airbnb's case they noticed people weren't booking rooms because the pictures sucked. So they flew to New York and shot some beautiful images. This is a very non-scalable and non-technical solution. Yet it was the turning point for Airbnb and sparked their climb out of the "trough of sorrow." Previously they had been limited by the Silicon Valley idea that every feature had to be scalable. Not every solution can be found behind a computer screen. For the full
same-blog 3 0.91613352 958 high scalability-2010-12-16-7 Design Patterns for Almost-infinite Scalability
4 0.91512227 1091 high scalability-2011-08-02-How Will DIDO Wireless Networking Change Everything?
Introduction: A conjunction of a few new technologies may trigger disruptive changes in the future. This observation was prompted by a talk Steve Perlman gave at the Columbia Engineering School: Benjamin Button, Cloud Everything and Why Shannon's Law Isn't . In it he covers a set of technologies that at first may seem unrelated, but turn out to be deeply related after all, culminating in a realization of the long talked about vision of an application utility, where all applications are hosted and run out of the cloud. First Perlman talks about the realistic human rendering technology developed at Rearden, his research incubator company. This technology was developed over many years and is the secret behind the wonderful effects found in movies like Benjamin Button . It is now being used in many other films, and promises to revolutionize film making, possibly even replacing actors with computers, in real-time. The next invention at Rearden is OnLive , a cloud based gaming technology for pla
5 0.89607531 1015 high scalability-2011-04-01-Stuff The Internet Says On Scalability For April 1, 2011
Introduction: Submitted for your reading pleasure, no foolin'... Quotable Quotes: @zateriosystems : thinking about scalability?, are you OK to double your capacity in one week?, a startup should be ready...ready to jump. @sklacy : Maybe what I should have said is "Design for scalability, deploy without it." @MikeHale : Scalability is customer 2000 having the same experience as customer 1 #sqlsat67 @LusciousPear : The meaning of #NoSQL is shut up @deobrat : The biggest bottleneck to scalability are ignorant developers. Most don't even try saving extra CPU cycles or memory bytes :( @w_westendorp : . @ijansch : Cloud computing is like outsourcing your scalability problems @edyavno : Billy Newport essentially just affirmed the theme I've been propagating: "Distributed Caching is the enterprise NoSQL" #strangeloop #nosql @monkchips : HP CEO Leo Apotheker says "relational databases are becoming less and less relevant to the future stack" 12,000 Requests per secon
6 0.89081109 1058 high scalability-2011-06-13-Automation on AWS with Ruby and Puppet
7 0.88286978 738 high scalability-2009-11-06-Product: Resque - GitHub's Distrubuted Job Queue
8 0.88116717 60 high scalability-2007-08-07-Can you profit from the coming Content Delivery Network wars?
9 0.87276232 396 high scalability-2008-09-26-Lucasfilm: The Real Magic is in the Data Center
10 0.87090439 663 high scalability-2009-07-28-37signals Architecture
11 0.84931701 1537 high scalability-2013-10-25-Stuff The Internet Says On Scalability For October 25th, 2013
12 0.84907871 1530 high scalability-2013-10-11-Stuff The Internet Says On Scalability For October 11th, 2013
13 0.8489449 1302 high scalability-2012-08-10-Stuff The Internet Says On Scalability For August 10, 2012
14 0.84685105 676 high scalability-2009-08-08-Yahoo!'s PNUTS Database: Too Hot, Too Cold or Just Right?
15 0.84151691 852 high scalability-2010-07-07-Strategy: Recompute Instead of Remember Big Data
16 0.840693 327 high scalability-2008-05-27-How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale
17 0.83775008 761 high scalability-2010-01-17-Applications Become Black Boxes Using Markets to Scale and Control Costs
18 0.83692473 716 high scalability-2009-10-06-Building a Unique Data Warehouse
19 0.8369087 1166 high scalability-2011-12-30-Stuff The Internet Says On Scalability For December 30, 2011
20 0.83643538 274 high scalability-2008-03-12-YouTube Architecture