high_scalability high_scalability-2009 high_scalability-2009-594 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Wille Faler has created an excellent list of best practices for building scalable and high-performance systems. Here's a short summary of his points: Offload the database - Avoid hitting the database, and avoid opening transactions or connections unless you absolutely need to use them. What a difference a cache makes - For read-heavy applications, caching is the easiest way to offload the database. Cache as coarse-grained objects as possible - Coarse-grained objects save CPU and time by requiring fewer reads to assemble objects. Don’t store transient state permanently - Is it really necessary to store your transient data in the database? Location, Location - put things close to where they are supposed to be delivered. Constrain concurrent access to limited resources - it's quicker to let a single thread do the work and finish than to flood finite resources with 200 client threads. Staged, asynchronous processing - separate a process using asynchronicity int
sentIndex sentText sentNum sentScore
1 Here's a short summary of his points: Offload the database - Avoid hitting the database, and avoid opening transactions or connections unless you absolutely need to use them. [sent-2, score-0.924]
2 What a difference a cache makes - For read-heavy applications, caching is the easiest way to offload the database. [sent-3, score-0.582]
3 Cache as coarse-grained objects as possible - Coarse-grained objects save CPU and time by requiring fewer reads to assemble objects. [sent-4, score-0.829]
4 Don’t store transient state permanently - Is it really necessary to store your transient data in the database? [sent-5, score-0.86]
5 Location, Location - put things close to where they are supposed to be delivered. [sent-6, score-0.207]
6 Constrain concurrent access to limited resources - it's quicker to let a single thread do the work and finish than to flood finite resources with 200 client threads. [sent-7, score-0.952]
7 Staged, asynchronous processing - separate a process using asynchronicity into separate steps mediated by queues and executed by a limited number of workers in each step. [sent-8, score-1.305]
8 Minimize network chatter - Avoid remote communication if you can as it's slower and less reliable than local computation. [sent-9, score-0.412]
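Points 6 and 7 in the list above (constrained concurrency and staged, asynchronous processing) can be sketched together. The following is a minimal illustration, not Wille Faler's implementation: the names `run_stage`, `stop_stage`, `run_pipeline`, the two-stage parse/store split, and the queue size of 100 are all assumptions made for the example. Each stage reads from a bounded queue and is served by a small, fixed number of worker threads, so a burst of input waits in a queue instead of flooding the resource behind the stage.

```python
import queue
import threading


def run_stage(worker_count, handler, inbox, outbox):
    """Start a fixed number of workers that move items from inbox to outbox."""
    def worker():
        while True:
            item = inbox.get()
            if item is None:  # sentinel: this worker should stop
                break
            result = handler(item)
            if outbox is not None:
                outbox.put(result)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    return threads


def stop_stage(threads, inbox):
    """Send one sentinel per worker, then wait for every worker to exit."""
    for _ in threads:
        inbox.put(None)
    for t in threads:
        t.join()


def run_pipeline(items, parse, store, parse_workers=2, store_workers=1):
    """Hypothetical two-stage pipeline: parse, then store, linked by queues."""
    parse_q = queue.Queue(maxsize=100)  # bounded: producers block when full
    store_q = queue.Queue(maxsize=100)
    results = []
    results_lock = threading.Lock()

    def store_and_record(item):
        value = store(item)
        with results_lock:
            results.append(value)

    parsers = run_stage(parse_workers, parse, parse_q, store_q)
    storers = run_stage(store_workers, store_and_record, store_q, None)

    for item in items:
        parse_q.put(item)

    stop_stage(parsers, parse_q)  # every parsed item is now in store_q
    stop_stage(storers, store_q)
    return results
```

With `store_workers=1`, only one thread ever touches the (assumed) scarce resource behind `store`, no matter how many items are submitted. Results may arrive in a different order than the input because the parse workers run concurrently.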
wordName wordTfidf (topN-words)
[('transient', 0.285), ('offload', 0.273), ('avoid', 0.224), ('faler', 0.215), ('hascreated', 0.215), ('wille', 0.215), ('asynchronicity', 0.202), ('flooding', 0.185), ('assemble', 0.175), ('finite', 0.167), ('mediated', 0.16), ('limited', 0.156), ('objects', 0.153), ('quicker', 0.148), ('finish', 0.139), ('grained', 0.138), ('easiest', 0.135), ('opening', 0.133), ('separate', 0.132), ('workers', 0.129), ('supposed', 0.119), ('hitting', 0.118), ('absolutely', 0.114), ('executed', 0.113), ('requiring', 0.109), ('unless', 0.105), ('steps', 0.103), ('store', 0.103), ('fewer', 0.1), ('slower', 0.096), ('queues', 0.095), ('location', 0.092), ('computation', 0.092), ('remote', 0.091), ('difference', 0.089), ('close', 0.088), ('summary', 0.086), ('heavy', 0.085), ('necessary', 0.084), ('asynchronous', 0.083), ('concurrent', 0.083), ('communication', 0.083), ('connections', 0.075), ('thread', 0.074), ('reliable', 0.074), ('points', 0.073), ('reads', 0.071), ('short', 0.069), ('local', 0.068), ('save', 0.068)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 594 high scalability-2009-05-08-Eight Best Practices for Building Scalable Systems
2 0.090451866 1413 high scalability-2013-02-27-42 Monster Problems that Attack as Loads Increase
Introduction: For solutions take a look at: 7 Life Saving Scalability Defenses Against Load Monster Attacks. This is a look at all the bad things that can happen to your carefully crafted program as loads increase: all hell breaks loose. Sure, you can scale out or scale up, but you can also choose to program better. Make your system handle larger loads. This saves money because fewer boxes are needed and it will make the entire application more reliable and have better response times. And it can be quite satisfying as a programmer. Large Number Of Objects: We usually get into scaling problems when the number of objects gets larger. Clearly resource usage of all types is stressed as the number of objects grows. Continuous Failures Makes An Infinite Event Stream: During large network failure scenarios there is never time for the system to recover. We are in a continual state of stress. Lots of High Priority Work: For example, rerouting is a high priority activity. If there is a large amount
3 0.088075705 960 high scalability-2010-12-20-Netflix: Use Less Chatty Protocols in the Cloud - Plus 26 Fixes
Introduction: In 5 Lessons We’ve Learned Using AWS, Netflix's John Ciancutti says one of the big lessons they've learned is to create less chatty protocols: In the Netflix data centers, we have a high capacity, super fast, highly reliable network. This has afforded us the luxury of designing around chatty APIs to remote systems. AWS networking has more variable latency. We’ve had to be much more structured about “over the wire” interactions, even as we’ve transitioned to a more highly distributed architecture. There's not a lot of advice out there on how to create protocols. Combine that with a rush to the cloud and you have a perfect storm for chatty applications crushing application performance. Netflix is far from the first to be surprised by the less than stellar networks inside AWS. A chatty protocol is one where a client makes a series of requests to a server and the client must wait on each reply before sending the next request. On a LAN this can work great. LANs are typically
4 0.086015522 1633 high scalability-2014-04-16-Six Lessons Learned the Hard Way About Scaling a Million User System
Introduction: Ever come to a point where you feel you've learned enough to share your experiences in the hopes of helping others traveling the same road? That's what Martin Kleppmann has done in a lovingly written Six things I wish we had known about scaling, an article well worth your time. It's not advice about scaling a Twitter, but of building a million user system, which is the sweet spot for a lot of projects. His conclusion rings true: Building scalable systems is not all sexy roflscale fun. It’s a lot of plumbing and yak shaving. A lot of hacking together tools that really ought to exist already, but all the open source solutions out there are too bad (and yours ends up bad too, but at least it solves your particular problem). Here's a gloss on the six lessons (plus a bonus lesson): Realistic load testing is hard. Testing a large distributed system is not like a scientific experiment that can be conducted under ideal conditions. This is hard for the scientific minded to acce
5 0.085835159 1456 high scalability-2013-05-13-The Secret to 10 Million Concurrent Connections -The Kernel is the Problem, Not the Solution
Introduction: Now that we have the C10K concurrent connection problem licked, how do we level up and support 10 million concurrent connections? Impossible you say. Nope, systems right now are delivering 10 million concurrent connections using techniques that are as radical as they may be unfamiliar. To learn how it’s done we turn to Robert Graham, CEO of Errata Security, and his absolutely fantastic talk at Shmoocon 2013 called C10M Defending The Internet At Scale. Robert has a brilliant way of framing the problem that I’ve never heard of before. He starts with a little bit of history, relating how Unix wasn’t originally designed to be a general server OS, it was designed to be a control system for a telephone network. It was the telephone network that actually transported the data so there was a clean separation between the control plane and the data plane. The problem is we now use Unix servers as part of the data plane, which we shouldn’t do at all. If we were des
6 0.084153458 436 high scalability-2008-11-02-Strategy: How to Manage Sessions Using Memcached
7 0.082397699 738 high scalability-2009-11-06-Product: Resque - GitHub's Distrubuted Job Queue
8 0.080960676 717 high scalability-2009-10-07-How to Avoid the Top 5 Scale-Out Pitfalls
9 0.079587802 1193 high scalability-2012-02-16-A Short on the Pinterest Stack for Handling 3+ Million Users
10 0.077339053 511 high scalability-2009-02-12-MySpace Architecture
11 0.07680472 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
12 0.075932294 1421 high scalability-2013-03-11-Low Level Scalability Solutions - The Conditioning Collection
13 0.075863726 327 high scalability-2008-05-27-How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale
14 0.075799003 1302 high scalability-2012-08-10-Stuff The Internet Says On Scalability For August 10, 2012
15 0.075302154 589 high scalability-2009-05-05-Drop ACID and Think About Data
16 0.074549355 1346 high scalability-2012-10-24-Saving Cash Using Less Cache - 90% Savings in the Caching Tier
17 0.07292068 898 high scalability-2010-09-09-6 Scalability Lessons
18 0.072106384 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud
19 0.072066739 359 high scalability-2008-07-29-Ehcache - A Java Distributed Cache
topicId topicWeight
[(0, 0.114), (1, 0.085), (2, -0.015), (3, -0.026), (4, -0.005), (5, 0.054), (6, 0.04), (7, 0.021), (8, -0.083), (9, -0.037), (10, -0.002), (11, 0.001), (12, -0.018), (13, 0.016), (14, -0.024), (15, -0.028), (16, -0.006), (17, -0.009), (18, 0.01), (19, -0.013), (20, -0.026), (21, -0.0), (22, 0.033), (23, 0.033), (24, -0.015), (25, -0.015), (26, 0.053), (27, 0.056), (28, 0.039), (29, -0.021), (30, -0.001), (31, -0.005), (32, -0.0), (33, -0.013), (34, -0.011), (35, -0.042), (36, -0.022), (37, -0.005), (38, 0.026), (39, 0.002), (40, 0.01), (41, -0.035), (42, -0.022), (43, 0.011), (44, 0.041), (45, 0.01), (46, -0.048), (47, 0.008), (48, -0.037), (49, 0.002)]
simIndex simValue blogId blogTitle
same-blog 1 0.96419239 594 high scalability-2009-05-08-Eight Best Practices for Building Scalable Systems
2 0.70697057 673 high scalability-2009-08-07-Strategy: Break Up the Memcache Dog Pile
Introduction: Update: Asynchronous HTTP cache validations. A proposed HTTP caching extension: if your application can afford to show slightly out of date content, then stale-while-revalidate can guarantee that the user will always be served directly from the cache, hence guaranteeing a consistent response-time user-experience. Caching is like aspirin for headaches. Head hurts: pop a 'sprin. Slow site: add caching. Facebook must have a lot of headaches because they popped 805 memcached servers between 10,000 web servers and 1,800 MySQL servers and they reportedly have a 99% cache hit rate. But what's the best way for you to cache for your application? It's a remarkably complex and rich topic. Alexey Kovyrin talks about one common caching problem called the Dog Pile Effect in Dog-pile Effect and How to Avoid it with Ruby on Rails. Glenn Franxman also has a Django solution in MintCache. Data is usually cached because it's too expensive to calculate for every hit. Maybe it's a gnarly S
3 0.70539981 174 high scalability-2007-12-05-Product: Tugela Cache
Introduction: Tugela Cache is a cache system like memcached, but instead of storing data just in RAM, it stores data in the file system using a b-tree. You trade latency in order to have a very large cache. It's useful for sites that have caching requirements that exceed their available memory. It uses the same wire protocol as memcached so it can be dropped in without a hassle. From the website: As large MediaWiki deployments may gain performance using Memcached, at some level cost of RAM to store all objects becomes too high. In order to balance resource usage and make more use of our Apache server disks, Tugela, the distributed cached on-disk hash database, has arrived. Tugela Cache is derived from Memcached. Much of the code remains the same, but notably, these changes: Internal slab allocator replaced by BerkeleyDB B-Tree database. Expiry policy management moved to external program tugela-expire. Much statistics code made obsolete. An interesting point brought up in the comme
4 0.69117504 1421 high scalability-2013-03-11-Low Level Scalability Solutions - The Conditioning Collection
Introduction: We talked about 42 Monster Problems That Attack As Loads Increase. And in The Aggregation Collection we talked about the value of prioritizing work and making smart queues as a way of absorbing and not reflecting traffic spikes. Now we move on to our next batch of strategies where the theme is conditioning, which is the idea of shaping and controlling flows of work within your application... Use Resources Proportional To a Fixed Limit: This is probably the most important rule for achieving scalability within an application. What it means: Find the resource that has a fixed limit that you know you can support. For example, a guarantee to handle a certain number of objects in memory. So if we always use resources proportional to the number of objects it is likely we can prevent resource exhaustion. Devise ways of tying what you need to do to the individual resources. Some examples: Keep a list of purchase orders with line items over $20 (or whatever). Do not keep
5 0.68903375 1633 high scalability-2014-04-16-Six Lessons Learned the Hard Way About Scaling a Million User System
Introduction: Ever come to a point where you feel you've learned enough to share your experiences in the hopes of helping others traveling the same road? That's what Martin Kleppmann has done in a lovingly written Six things I wish we had known about scaling, an article well worth your time. It's not advice about scaling a Twitter, but of building a million user system, which is the sweet spot for a lot of projects. His conclusion rings true: Building scalable systems is not all sexy roflscale fun. It’s a lot of plumbing and yak shaving. A lot of hacking together tools that really ought to exist already, but all the open source solutions out there are too bad (and yours ends up bad too, but at least it solves your particular problem). Here's a gloss on the six lessons (plus a bonus lesson): Realistic load testing is hard. Testing a large distributed system is not like a scientific experiment that can be conducted under ideal conditions. This is hard for the scientific minded to acce
6 0.68252635 1229 high scalability-2012-04-17-YouTube Strategy: Adding Jitter isn't a Bug
7 0.68173474 602 high scalability-2009-05-17-Scaling Django Web Apps by Mike Malone
8 0.68028289 1246 high scalability-2012-05-16-Big List of 20 Common Bottlenecks
9 0.67474115 1413 high scalability-2013-02-27-42 Monster Problems that Attack as Loads Increase
10 0.67428434 359 high scalability-2008-07-29-Ehcache - A Java Distributed Cache
11 0.67249113 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
12 0.6633482 1236 high scalability-2012-04-30-Masstree - Much Faster than MongoDB, VoltDB, Redis, and Competitive with Memcached
13 0.66318923 1429 high scalability-2013-03-25-AppBackplane - A Framework for Supporting Multiple Application Architectures
14 0.66218823 1462 high scalability-2013-05-22-Strategy: Stop Using Linked-Lists
15 0.65899694 1425 high scalability-2013-03-18-Beyond Threads and Callbacks - Application Architecture Pros and Cons
16 0.65105909 910 high scalability-2010-09-30-Facebook and Site Failures Caused by Complex, Weakly Interacting, Layered Systems
17 0.64938575 1373 high scalability-2012-12-17-11 Uses For the Humble Presents Queue, er, Message Queue
18 0.64920908 1620 high scalability-2014-03-27-Strategy: Cache Stored Procedure Results
19 0.64472234 684 high scalability-2009-08-18-Real World Web: Performance & Scalability
20 0.64413452 1142 high scalability-2011-11-14-Using Gossip Protocols for Failure Detection, Monitoring, Messaging and Other Good Things
topicId topicWeight
[(1, 0.051), (2, 0.676), (10, 0.017), (42, 0.062), (61, 0.033), (79, 0.028), (85, 0.016), (94, 0.012)]
simIndex simValue blogId blogTitle
same-blog 1 0.99654555 594 high scalability-2009-05-08-Eight Best Practices for Building Scalable Systems
2 0.99455851 878 high scalability-2010-08-12-Strategy: Terminate SSL Connections in Hardware and Reduce Server Count by 40%
Introduction: This is an interesting tidbit from near the end of the Packet Pushers podcast Show 15 – Saving the Web With Dinky Putt Putt Firewalls. The conversation was about how SSL connections need to terminate before they can be processed by a WAF (Web Application Firewall), which inspects HTTP for security problems like SQL injection and cross-site scripting exploits. Much was made that if programmers did their job better these appliances wouldn't be necessary, but I digress. To terminate SSL most shops run SSL connections into Intel-based Linux boxes running Apache. This setup is convenient for developers, but it's not optimized for SSL, so it's slow and costly. Much of the capacity of these servers is unnecessarily consumed processing SSL. Load balancers on the other hand have crypto cards that terminate SSL very efficiently in hardware. Efficiently enough that if you are willing to get rid of the general purpose Linux boxes and use your big iron load balancers, your server count c
3 0.99412024 911 high scalability-2010-09-30-More Troubles with Caching
Introduction: As a tasty pairing with Facebook And Site Failures Caused By Complex, Weakly Interacting, Layered Systems, is another excellent tale of caching gone wrong by Peter Zaitsev, in an exciting twin billing: Cache Miss Storm and More on dangers of the caches. This is a fascinating case where the cause turned out to be a software upgrade that ran long because it had to be rolled back. During the long recovery time many of the cache entries timed out. When the database came back, slam, all the clients queried the database to repopulate the cache and bad things happened to the database. The solution was equally interesting: So the immediate solution to bring the system up was surprisingly simple. We just had to get traffic on the system in stages allowing Memcache to be warmed up. There were no code which would allow to do it on application side so we did it on MySQL side instead. “SET GLOBAL max_connections=20” to limit number of connections to MySQL and so let application to err when i
4 0.99298233 836 high scalability-2010-06-04-Strategy: Cache Larger Chunks - Cache Hit Rate is a Bad Indicator
Introduction: Isn't the secret to fast, scalable websites to cache everything? Caching, if not the secret sauce of many a website, is at least a popular condiment. But not so fast says Peter Zaitsev in Beyond great cache hit ratio. The point Peter makes is that we read about websites like Amazon and Facebook that can literally make hundreds of calls to satisfy a user request. Even if you have an awesome cache hit ratio, pages can still be slow because making and processing all those requests takes time. The solution is to remove requests altogether. You do this by caching larger blocks so you have to make fewer requests. The post has a lot of good advice worth reading: 1) Make non cacheable blocks as small as possible, 2) Maximize amount of uses of the cache item, 3) Control invalidation, 4) Multi-Get.
5 0.99273366 223 high scalability-2008-01-25-Google: Introduction to Distributed System Design
Introduction: Update: Google added videos on Cluster Computing and MapReduce. There are five lectures: Introduction, MapReduce, Distributed File Systems, Clustering Algorithms, and Graph Algorithms. Advanced website design depends on deep distributed system design knowledge. Where do you get this knowledge? Try Google. They have a whole Code for Educators program with tutorials and lectures on AJAX programming, distributed systems, and web security. Looks pretty nice.
6 0.992625 436 high scalability-2008-11-02-Strategy: How to Manage Sessions Using Memcached
7 0.99024278 56 high scalability-2007-08-03-Running Hadoop MapReduce on Amazon EC2 and Amazon S3
8 0.99024278 565 high scalability-2009-04-13-Benchmark for keeping data in browser in AJAX projects
9 0.98952788 205 high scalability-2008-01-10-Letting Clients Know What's Changed: Push Me or Pull Me?
10 0.98937494 455 high scalability-2008-12-01-MySQL Database Scale-out and Replication for High Growth Businesses
11 0.98766935 1155 high scalability-2011-12-12-Netflix: Developing, Deploying, and Supporting Software According to the Way of the Cloud
12 0.98090148 967 high scalability-2011-01-03-Stuff The Internet Says On Scalability For January 3, 2010
13 0.97817367 50 high scalability-2007-07-31-BerkeleyDB & other distributed high performance key-value databases
14 0.97650409 723 high scalability-2009-10-16-Paper: Scaling Online Social Networks without Pains
15 0.97274601 844 high scalability-2010-06-18-Paper: The Declarative Imperative: Experiences and Conjectures in Distributed Logic
16 0.97254103 1199 high scalability-2012-02-27-Zen and the Art of Scaling - A Koan and Epigram Approach
17 0.97160053 1190 high scalability-2012-02-10-Stuff The Internet Says On Scalability For February 10, 2012
18 0.97142261 1591 high scalability-2014-02-05-Little’s Law, Scalability and Fault Tolerance: The OS is your bottleneck. What you can do?
19 0.96685731 1006 high scalability-2011-03-17-Are long VM instance spin-up times in the cloud costing you money?
20 0.96599591 417 high scalability-2008-10-15-Outside.in Scales Up with Engine Yard and moving from PHP to Ruby on Rails
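Several of the posts listed above (the dog-pile effect, the cache miss storm in More Troubles with Caching) describe the same failure: many clients miss the cache at once and all hit the database to rebuild the same entry. One common guard, sketched below, is to let only one caller per key run the expensive load while the others wait for its result. This is an illustration under stated assumptions, not the fix used in any of those posts: the class name `SingleFlightCache` and the `loader` callback are hypothetical, and the sketch deliberately omits expiry and never discards its per-key locks.

```python
import threading


class SingleFlightCache:
    """In-memory cache where concurrent misses on a key trigger one load."""

    def __init__(self, loader):
        self._loader = loader          # hypothetical expensive lookup, e.g. a DB query
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects the two dictionaries

    def get(self, key):
        with self._guard:
            if key in self._values:
                return self._values[key]
            # One lock per key, so loads for different keys do not block each other.
            lock = self._key_locks.setdefault(key, threading.Lock())

        with lock:
            # Re-check: another thread may have filled the entry while we waited.
            with self._guard:
                if key in self._values:
                    return self._values[key]
            value = self._loader(key)
            with self._guard:
                self._values[key] = value
            return value
```

The design choice here is to serialize only the callers that want the same missing key. A production cache would also need expiry, error handling for a failing loader, and cleanup of the lock table.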