high_scalability high_scalability-2009 high_scalability-2009-673 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Update: Asynchronous HTTP cache validations. A proposed HTTP caching extension: if your application can afford to show slightly out-of-date content, then stale-while-revalidate can guarantee that the user will always be served directly from the cache, hence guaranteeing a consistent response-time user experience. Caching is like aspirin for headaches. Head hurts: pop a 'sprin. Slow site: add caching. Facebook must have a lot of headaches because they popped 805 memcached servers between 10,000 web servers and 1,800 MySQL servers and they reportedly have a 99% cache hit rate. But what's the best way for you to cache for your application? It's a remarkably complex and rich topic. Alexey Kovyrin talks about one common caching problem called the Dog Pile Effect in Dog-pile Effect and How to Avoid it with Ruby on Rails. Glenn Franxman also has a Django solution in MintCache. Data is usually cached because it's too expensive to calculate for every hit. Maybe it's a gnarly S
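The stale-while-revalidate extension mentioned in the update (published as RFC 5861) is expressed as a Cache-Control directive, for example `Cache-Control: max-age=600, stale-while-revalidate=30`. Here's a minimal sketch of the decision a cache makes with it, assuming ages in whole seconds and ignoring every other directive; `parse_cache_control`, `classify`, and `CacheState` are hypothetical names, not any library's API:

```python
from enum import Enum


class CacheState(Enum):
    FRESH = "fresh"                          # serve the cached copy as-is
    STALE_REVALIDATE = "stale-revalidate"    # serve cached copy, refresh in background
    EXPIRED = "expired"                      # too old: fetch from origin before answering


def parse_cache_control(header: str) -> dict:
    """Collect the numeric directives from a Cache-Control header value."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if value.isdigit():
            directives[name.lower()] = int(value)
    return directives


def classify(age: int, header: str) -> CacheState:
    """Decide what to do with a cached response that is `age` seconds old."""
    directives = parse_cache_control(header)
    max_age = directives.get("max-age", 0)
    swr = directives.get("stale-while-revalidate", 0)
    if age < max_age:
        return CacheState.FRESH
    if age < max_age + swr:
        return CacheState.STALE_REVALIDATE
    return CacheState.EXPIRED
```

In the middle state the user still gets the cached copy immediately and only a background request pays the cost of revalidation, which is where the consistent response time comes from.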
sentIndex sentText sentNum sentScore
1 Facebook must have a lot of headaches because they popped 805 memcached servers between 10,000 web servers and 1,800 MySQL servers and they reportedly have a 99% cache hit rate. [sent-6, score-0.433]
2 But what's the best way for you to cache for your application? [sent-7, score-0.266]
3 Maybe it's a gnarly SQL query you want to avoid and a little stale data is OK. [sent-12, score-0.638]
4 Or maybe you have the temerity to write to your database and cause its cache to flush so database caching isn't sufficient at a certain level of scale. [sent-14, score-0.412]
5 Data freshness requires a refrigeration truck or an expiry time on your cache entry that causes stats to be periodically recalculated. [sent-21, score-0.744]
6 Now, what happens when your cached data expires and 1,000 requests simultaneously try to recalculate the expensive-to-calculate data? [sent-22, score-0.501]
7 And since a read-recalculate-write sequence against memcached is not atomic, it's possible for stale data to be cached and then served. [sent-24, score-1.247]
8 No Expire Solution: If cache items never expire, then there can never be a recalculation storm. [sent-27, score-0.515]
9 This approach can also be used to pre-warm the cache so a newly brought-up system doesn't peg the database. [sent-31, score-0.352]
10 Memcached can still evict your cache item when it starts running out of memory. [sent-33, score-0.543]
11 It uses an LRU (least recently used) policy, so your cache item may not be around when a program needs it, which means the program will have to go without, use a local cache, or recalculate. [sent-34, score-0.47]
12 This approach also doesn't work well for item-specific caching. [sent-36, score-0.29]
13 It works for globally calculated items like top N posts, but it doesn't really make sense to periodically cache items for user data when the user isn't even active. [sent-37, score-0.651]
14 Stale Date Solution: This solution introduces a stale date in addition to the expiration date. [sent-39, score-0.741]
15 For example, set the item to time out in 24 hours, but the embedded stale timeout might be five minutes in the future. [sent-42, score-0.582]
16 On a get from the cache, check whether the stale timeout has expired; if it has, immediately push the stale time into the future and re-store the data as is. [sent-43, score-1.276]
17 Fetch data from the DB and update the cache with the latest value. [sent-45, score-0.419]
18 Alexey describes a different two-key approach: create two keys in memcached, a MAIN key with an expiration time a bit longer than normal plus a STALE key that expires earlier. [sent-46, score-0.67]
19 If the STALE key has expired, recalculate and set the STALE key again. [sent-48, score-1.076]
20 I dislike embedding metadata with data, so I like Alexey's approach a bit better, even though it doubles the key space. [sent-49, score-0.536]
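Alexey's two-key approach can be sketched roughly like this. It's an illustration under assumptions, not his Rails code: `FakeMemcache` is a hypothetical in-memory stand-in with memcached-style `get`/`set`/`add` and TTLs, and using `add` on the STALE key (it only stores when the key is absent) is a choice made here so that only one caller recalculates when the stale window closes while everyone else keeps serving MAIN.

```python
import time


class FakeMemcache:
    """Hypothetical in-memory stand-in for a memcached client (get/set/add with TTLs)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}          # key -> (value, expires_at)
        self._clock = clock

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # drop expired entries on read
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def add(self, key, value, ttl):
        """Store only if the key is missing or expired; report whether we stored it."""
        if self.get(key) is not None:
            return False
        self.set(key, value, ttl)
        return True


def get_two_key(cache, key, recalculate, ttl, grace):
    """MAIN lives ttl + grace seconds; STALE expires after ttl and marks refresh time."""
    main_key, stale_key = key + ":main", key + ":stale"
    value = cache.get(main_key)
    if value is None:
        # Cold start or MAIN was evicted: nothing to serve, so recalculate now.
        value = recalculate()
        cache.set(main_key, value, ttl + grace)
        cache.set(stale_key, 1, ttl)
        return value
    if cache.add(stale_key, 1, ttl):
        # STALE had expired and this caller won the add: it alone refreshes MAIN.
        value = recalculate()
        cache.set(main_key, value, ttl + grace)
    # Everyone else keeps serving the slightly stale MAIN value.
    return value
```

The cold-start branch still lets concurrent callers pile on, and since memcached can evict MAIN under memory pressure, an eviction puts you back in that branch.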
wordName wordTfidf (topN-words)
[('stale', 0.486), ('cache', 0.266), ('item', 0.204), ('timeout', 0.189), ('alexey', 0.188), ('glenn', 0.16), ('expire', 0.16), ('date', 0.156), ('expiry', 0.137), ('periodically', 0.131), ('recalculate', 0.126), ('expired', 0.122), ('expires', 0.118), ('key', 0.104), ('memcached', 0.104), ('expiration', 0.099), ('cached', 0.095), ('items', 0.089), ('calculate', 0.086), ('approach', 0.086), ('window', 0.082), ('maybe', 0.078), ('update', 0.077), ('data', 0.076), ('avoid', 0.076), ('batalion', 0.073), ('evict', 0.073), ('freshness', 0.073), ('kovyrin', 0.073), ('memcachedby', 0.073), ('refrigeration', 0.073), ('thedog', 0.073), ('bit', 0.072), ('describes', 0.069), ('rails', 0.068), ('dogs', 0.068), ('effect', 0.068), ('caching', 0.068), ('defeats', 0.065), ('stats', 0.064), ('headaches', 0.063), ('hurts', 0.063), ('refreshed', 0.063), ('craig', 0.061), ('embedding', 0.061), ('dislike', 0.061), ('piling', 0.061), ('closes', 0.061), ('ruby', 0.06), ('suppose', 0.059)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999976 673 high scalability-2009-08-07-Strategy: Break Up the Memcache Dog Pile
2 0.24467488 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
Introduction: The number one recommendation for speeding up a website is almost always to add cache and more cache. And after that add a little more cache just in case. Memcached is almost always given as the recommended cache to use. What we don't often hear is how to effectively use a cache in our own products. MySQL hosted two excellent webinars (referenced below) on the subject of how to deploy and use memcached. The star of the show, other than MySQL of course, is Farhan Mashraqi of Fotolog. You may recall we did an earlier article on Fotolog in Secrets to Fotolog's Scaling Success, which was one of my personal favorites. Fotolog, as they themselves point out, is probably the largest site nobody has ever heard of, pulling in more page views than even Flickr. Fotolog has 51 instances of memcached on 21 servers with 175G in use and 254G available. As a large successful photo-blogging site they have very demanding performance and scaling requirements. To meet those requirements they've developed a
3 0.15412895 436 high scalability-2008-11-02-Strategy: How to Manage Sessions Using Memcached
Introduction: Dormando shows an enlightened middle way for storing sessions in cache and the database. Sessions are a perfect cache candidate because they are transient and smallish, and since they are usually accessed on every page view, removing all that load from the database is a good thing. But as Dormando points out, session caches have problems. If you remove expiration times from the cache and you run out of memory, then no more logins. If a cache server fails or needs to be upgraded, then you just logged out a bunch of potentially angry users. The middle ground Dormando proposes is using both the cache and the database: Reads: read from the cache first, then the database. Typical cache logic. Writes: write to memcached every time, write to the database every N seconds (assuming the data has changed). There's a small chance of data loss, but you've still greatly reduced the database load while providing reliability. Nice solution.
4 0.14318492 467 high scalability-2008-12-16-[ANN] New Open Source Cache System
Introduction: The SHOP.COM Cache System is now available at http://code.google.com/p/sccache/ The SHOP.COM Cache System is an object cache system that... * is an in-process cache and external, shared Cache * is horizontally scalable * stores cached objects to disk * supports associative keys * is non-transactional * can have any size key and any size data * does auto-GC based on TTL * is container and platform neutral It was built in-house at SHOP.COM (by me) and has powered our website for years. We are open-sourcing it in the hope that it will be useful to others and to get some help in its maintenance. This is our first open source attempt and we'd appreciate any help and comments.
5 0.14230587 495 high scalability-2009-01-17-Intro to Caching,Caching algorithms and caching frameworks part 1
Introduction: Informative and well-organized post on caching. Talks about: Why do we need cache?, What is Cache?, Cache Hit, Cache Miss, Storage Cost, Retrieval Cost, Invalidation, Replacement Policy, Optimal Replacement Policy, Caching Algorithms, Least Frequently Used (LFU), Least Recently Used (LRU), Least Recently Used 2 (LRU2), Two Queues, Adaptive Replacement Cache (ARC), Most Recently Used (MRU), First in First out (FIFO), Distributed caching, Measuring Cache.
6 0.13435002 703 high scalability-2009-09-12-How Google Taught Me to Cache and Cash-In
7 0.12893751 639 high scalability-2009-06-27-Scaling Twitter: Making Twitter 10000 Percent Faster
8 0.12876286 1135 high scalability-2011-10-31-15 Ways to Make Your Application Feel More Responsive under Google App Engine
10 0.12518957 836 high scalability-2010-06-04-Strategy: Cache Larger Chunks - Cache Hit Rate is a Bad Indicator
11 0.12328488 174 high scalability-2007-12-05-Product: Tugela Cache
12 0.12228229 1633 high scalability-2014-04-16-Six Lessons Learned the Hard Way About Scaling a Million User System
13 0.12107718 1017 high scalability-2011-04-06-Netflix: Run Consistency Checkers All the time to Fixup Transactions
14 0.11964408 911 high scalability-2010-09-30-More Troubles with Caching
15 0.11956121 1346 high scalability-2012-10-24-Saving Cash Using Less Cache - 90% Savings in the Caching Tier
16 0.11914101 367 high scalability-2008-08-17-Strategy: Drop Memcached, Add More MySQL Servers
17 0.11894046 729 high scalability-2009-10-28-And the winner is: MySQL or Memcached or Tokyo Tyrant?
18 0.11599802 373 high scalability-2008-08-29-Product: ScaleOut StateServer is Memcached on Steroids
19 0.11369102 602 high scalability-2009-05-17-Scaling Django Web Apps by Mike Malone
20 0.11366396 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
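Dormando's session scheme from the list above (read cache then database; write cache always, database every N seconds if changed) can be sketched like this. Plain dicts stand in for memcached and the sessions table, and `SessionStore`, `flush_interval`, and the injectable `clock` are hypothetical names for illustration only:

```python
import time


class SessionStore:
    """Sketch of the cache-plus-database session scheme; dicts stand in for memcached and the DB."""

    def __init__(self, cache, db, flush_interval, clock=time.monotonic):
        self.cache = cache                  # hypothetical stand-in for memcached
        self.db = db                        # hypothetical stand-in for the sessions table
        self.flush_interval = flush_interval
        self.clock = clock
        self._last_flush = {}               # session id -> time of last DB write

    def read(self, sid):
        """Read from the cache first, then fall back to the database."""
        session = self.cache.get(sid)
        if session is None:
            session = self.db.get(sid)
            if session is not None:
                self.cache[sid] = dict(session)   # repopulate the cache
        return session

    def write(self, sid, session):
        """Write to the cache every time; write to the DB at most every flush_interval seconds."""
        self.cache[sid] = dict(session)
        now = self.clock()
        changed = self.db.get(sid) != session     # compare against what the DB last got
        last = self._last_flush.get(sid)
        if changed and (last is None or now - last >= self.flush_interval):
            self.db[sid] = dict(session)
            self._last_flush[sid] = now
```

The gap between flushes is the "small chance of data loss": a change that has only reached the cache disappears if that cache node dies first. This sketch only flushes on a write, so a real version would probably also want a periodic sweep for sessions that stop changing before their interval elapses.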
topicId topicWeight
[(0, 0.174), (1, 0.114), (2, -0.052), (3, -0.127), (4, 0.012), (5, 0.069), (6, 0.021), (7, -0.001), (8, -0.051), (9, -0.038), (10, -0.009), (11, -0.003), (12, -0.027), (13, 0.092), (14, -0.072), (15, -0.047), (16, -0.037), (17, -0.058), (18, 0.033), (19, -0.062), (20, -0.084), (21, 0.022), (22, 0.113), (23, 0.096), (24, -0.026), (25, 0.044), (26, 0.03), (27, 0.101), (28, -0.039), (29, -0.027), (30, -0.044), (31, 0.009), (32, -0.034), (33, 0.018), (34, -0.06), (35, 0.037), (36, 0.028), (37, 0.008), (38, 0.063), (39, 0.02), (40, 0.012), (41, 0.015), (42, -0.07), (43, -0.009), (44, 0.016), (45, -0.017), (46, -0.057), (47, 0.006), (48, -0.009), (49, -0.054)]
simIndex simValue blogId blogTitle
same-blog 1 0.97566324 673 high scalability-2009-08-07-Strategy: Break Up the Memcache Dog Pile
2 0.89431024 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
3 0.88520151 436 high scalability-2008-11-02-Strategy: How to Manage Sessions Using Memcached
4 0.86040169 836 high scalability-2010-06-04-Strategy: Cache Larger Chunks - Cache Hit Rate is a Bad Indicator
Introduction: Isn't the secret to fast, scalable websites to cache everything? Caching, if not the secret sauce of many a website, is at least a popular condiment. But not so fast, says Peter Zaitsev in Beyond great cache hit ratio. The point Peter makes is that we read about websites like Amazon and Facebook that can literally make hundreds of calls to satisfy a user request. Even if you have an awesome cache hit ratio, pages can still be slow because making and processing all those requests takes time. The solution is to remove requests altogether. You do this by caching larger blocks so you have to make fewer requests. The post has a lot of good advice worth reading: 1) Make non-cacheable blocks as small as possible, 2) Maximize the number of uses of each cache item, 3) Control invalidation, 4) Multi-Get.
5 0.85421723 495 high scalability-2009-01-17-Intro to Caching,Caching algorithms and caching frameworks part 1
6 0.84962815 174 high scalability-2007-12-05-Product: Tugela Cache
7 0.84726238 1346 high scalability-2012-10-24-Saving Cash Using Less Cache - 90% Savings in the Caching Tier
8 0.83514792 911 high scalability-2010-09-30-More Troubles with Caching
9 0.83326143 467 high scalability-2008-12-16-[ANN] New Open Source Cache System
10 0.83293629 359 high scalability-2008-07-29-Ehcache - A Java Distributed Cache
11 0.80325615 1633 high scalability-2014-04-16-Six Lessons Learned the Hard Way About Scaling a Million User System
12 0.79554147 1321 high scalability-2012-09-12-Using Varnish for Paywalls: Moving Logic to the Edge
13 0.78605491 703 high scalability-2009-09-12-How Google Taught Me to Cache and Cash-In
14 0.77876377 247 high scalability-2008-02-12-We want to cache a lot :) How do we go about it ?
16 0.75608498 1620 high scalability-2014-03-27-Strategy: Cache Stored Procedure Results
17 0.75011992 908 high scalability-2010-09-28-6 Strategies for Scaling BBC iPlayer
18 0.74803054 927 high scalability-2010-10-26-Marrying memcached and NoSQL
19 0.7451117 996 high scalability-2011-02-28-A Practical Guide to Varnish - Why Varnish Matters
20 0.74123299 367 high scalability-2008-08-17-Strategy: Drop Memcached, Add More MySQL Servers
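Peter Zaitsev's point in the list above, that a great hit ratio doesn't help if a page still makes hundreds of cache calls, is easy to see by counting round trips. A sketch under assumptions: `CountingCache` is a hypothetical client, and "one method call equals one round trip" is a simplification (the memcached text protocol does accept several keys in one `get`, but client method names differ between libraries):

```python
class CountingCache:
    """Hypothetical cache client that counts network round trips."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def get_multi(self, keys):
        self.round_trips += 1               # one request for the whole batch
        return {k: self.data[k] for k in keys if k in self.data}

    def set(self, key, value):
        self.round_trips += 1
        self.data[key] = value


def fetch_one_by_one(cache, keys):
    return [cache.get(k) for k in keys]


def fetch_batched(cache, keys):
    found = cache.get_multi(keys)
    return [found.get(k) for k in keys]


def fetch_page_chunk(cache, page_key, keys, build):
    """Cache the assembled block under one key so a hit costs a single request."""
    page = cache.get(page_key)
    if page is None:
        page = build(fetch_batched(cache, keys))
        cache.set(page_key, page)
    return page
```

For a page built from 50 fragments, fetching one by one costs 50 round trips even at a 100% hit rate, a multi-get costs 1, and caching the assembled chunk costs 3 on a miss (get, multi-get, set) and 1 on every hit after that.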
topicId topicWeight
[(1, 0.105), (2, 0.266), (10, 0.028), (30, 0.066), (38, 0.038), (40, 0.021), (77, 0.012), (79, 0.099), (85, 0.032), (90, 0.207), (94, 0.039)]
simIndex simValue blogId blogTitle
same-blog 1 0.88577455 673 high scalability-2009-08-07-Strategy: Break Up the Memcache Dog Pile
2 0.88236606 1546 high scalability-2013-11-11-Ask HS: What is a good OLAP database choice with node.js?
Introduction: This question was asked over email and I thought a larger audience might want to take a whack at it. With a business associate, I am trying to develop financial software that handles financial reports of listed companies. We managed to create this database with all the data necessary to do financial analysis. My associate is a Business Intelligence specialist so he is keen to use OLAP databases like Microsoft Analysis Services or Jedox Palo, which enable in-memory calculations and very fast aggregation, slicing and dicing of data or write-backs. At the same time I did an online course (MOOC) from Stanford CS184 called Startup Engineering which promoted/talked a lot about JavaScript and especially node.js as the language of the future for servers. As I am keen to use open-source technologies (would be keen to avoid MS SSAS) for the development of a website to access this financial data, and there are so many choices for databases out there (Postgres, MongoDB, MySQL etc..but d
3 0.87596995 188 high scalability-2007-12-19-How can I learn to scale my project?
Introduction: This is a question asked on the ycombinator list and there are some good responses. I gave a quick response, but I particularly like neilk's knock-it-out-of-the-park insightful answer: Read Cal Henderson's book. (I'd add in Theo's book and Release It! too) The center of your design should be the data store, not a process. You transition the data store from state to state, securely and reliably, in small increments. Avoid globals and session state. The more "pure" your function is, the easier it will be to cache or partition. Don't make your data store too smart. Calculations and renderings should happen in a separate, asynchronous process. The data store should be able to handle lots of concurrent connections. Minimize locking. (Read about optimistic locking). Protect your algorithm from the implementation of the data store, with a helper class or module or whatever. But don't (DO NOT) try to build a framework for any conceivable query. Just the ones your algorithm needs. V
4 0.84821719 344 high scalability-2008-06-09-FaceStat's Rousing Tale of Scaling Woe and Wisdom Won
Introduction: Lukas Biewald shares a fascinating slam-by-slam recounting of how his FaceStat (upload your picture and be judged by the masses) site was battered by a link on Yahoo's main page that caused an almost instantaneous 650,000 page view jump on their site. Yahoo spends considerable effort making sure its own properties can handle the truly massive flow from the main page. Turning the Great Eye of the Internet towards an unsuspecting newborn site must be quite the diaper-ready experience. Theo Schlossnagle eerily prophesied such events in The Implications of Punctuated Scalabilium for Website Architecture: massive, unexpected and sudden traffic spikes will become more common as a fickle internet seeks ever for new entertainments (my summary). Exactly FaceStat's situation. This is also one of our first exposures to an application written on Merb, a popular Ruby on Rails competitor. For those who think Ruby is the problem, their architecture now serves 100 times the original load
5 0.84113216 1404 high scalability-2013-02-11-At Scale Even Little Wins Pay Off Big - Google and Facebook Examples
Introduction: There's a popular line of thought that says don't waste time on optimization because developing features is more important than saving money. True, you can always add resources, but at some point, especially in a more mature part of a product lifecycle: performance equals $$$. Two great examples of this evolution come from Facebook and Google. The upshot is that when you spend time and money on optimizing your tool chain you can get huge wins in performance, control, and costs. Certainly, don’t bother if you are just starting, but at some point you may want to switch to big development efforts in improving efficiency. Facebook and HipHop The Facebook example is quite well known: HipHop , a static PHP compiler released in 2010, after two years of development. PHP because Facebook implements their web tier in PHP . They've now developed a dynamic compiler, HipHop VM , using techniques like JIT, side exits, HipHop bytecode, type prediction, and parallel tracelet l
6 0.83495432 1204 high scalability-2012-03-06-Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
7 0.83013701 1380 high scalability-2013-01-02-Why Pinterest Uses the Cloud Instead of Going Solo - To Be Or Not To Be
8 0.81312513 1374 high scalability-2012-12-18-Georeplication: When Bad Things Happen to Good Systems
9 0.80942315 1124 high scalability-2011-09-26-17 Techniques Used to Scale Turntable.fm and Labmeeting to Millions of Users
10 0.80941784 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
11 0.80834788 1077 high scalability-2011-07-11-ATMCash Exploits Virtualization for Security - Immutability and Reversion
12 0.80776161 464 high scalability-2008-12-13-Strategy: Facebook Tweaks to Handle 6 Time as Many Memcached Requests
13 0.80619407 1456 high scalability-2013-05-13-The Secret to 10 Million Concurrent Connections -The Kernel is the Problem, Not the Solution
14 0.80487859 5 high scalability-2007-07-10-mixi.jp Architecture
15 0.80480713 936 high scalability-2010-11-09-Facebook Uses Non-Stored Procedures to Update Social Graphs
17 0.80254537 636 high scalability-2009-06-23-Learn How to Exploit Multiple Cores for Better Performance and Scalability
18 0.80178487 935 high scalability-2010-11-05-Hot Scalability Links For November 5th, 2010
19 0.80125272 274 high scalability-2008-03-12-YouTube Architecture
20 0.80120122 910 high scalability-2010-09-30-Facebook and Site Failures Caused by Complex, Weakly Interacting, Layered Systems