high_scalability high_scalability-2007 high_scalability-2007-174 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Tugela Cache is a cache system like memcached, but instead of storing data just in RAM, it stores data in the file system using a b-tree. You trade latency in order to have a very large cache. It's useful for sites that have caching requirements that exceed their available memory. It uses the same wire protocol as memcached so it can be dropped in without a hassle. From the website: As large MediaWiki deployments may gain performance using Memcached, at some level cost of RAM to store all objects becomes too high. In order to balance resource usage and make more use of our Apache server disks, Tugela, the distributed cached on-disk hash database, has arrived. Tugela Cache is derived from Memcached. Much of the code remains the same, but notably, these changes: Internal slab allocator replaced by BerkeleyDB B-Tree database. Expiry policy management moved to external program tugela-expire. Much statistics code made obsolete. An interesting point brought up in the comme
sentIndex sentText sentNum sentScore
1 Tugela Cache is a cache system like memcached, but instead of storing data just in RAM, it stores data in the file system using a b-tree. [sent-1, score-0.358]
2 You trade latency in order to have a very large cache. [sent-2, score-0.213]
3 It's useful for sites that have caching requirements that exceed their available memory. [sent-3, score-0.137]
4 It uses the same wire protocol as memcached so it can be dropped in without a hassle. [sent-4, score-0.404]
5 From the website: As large MediaWiki deployments may gain performance using Memcached, at some level cost of RAM to store all objects becomes too high. [sent-5, score-0.259]
6 In order to balance resource usage and make more use of our Apache server disks, Tugela, the distributed cached on-disk hash database, has arrived. [sent-6, score-0.355]
7 Much of the code remains the same, but notably, these changes: Internal slab allocator replaced by BerkeleyDB B-Tree database. [sent-8, score-0.503]
8 Expiry policy management moved to external program tugela-expire. Much statistics code made obsolete. [sent-9, score-0.419]
9 An interesting point brought up in the comments is using memcached with a cache size larger than physical RAM and letting the OS swap, versus using a b-tree to access data on disk. [sent-10, score-1.173]
10 Nginx seems to use the "let the OS swap" approach to good effect. [sent-11, score-0.089]
11 It would be interesting to see which approach works better. [sent-12, score-0.2]
12 For an idea of how an in-process cache and a disk-based cache hierarchy can work together, take a look at Kevin Burton's IDEA: Hierarchy of caches for high performance AND high capacity memcached. [sent-13, score-0.859]
13 There's also an interesting variation called Memcachedb which is said to be "a better and simplified Tugela." [sent-14, score-0.448]
14 It's more of a persistence mechanism than a cache. [sent-15, score-0.181]
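Sentence 12 above mentions an in-process cache working together with a disk-based cache. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not Tugela's code or Kevin Burton's design: the class names `DiskCache` and `TieredCache` are hypothetical, and SQLite (whose tables are stored as B-trees) stands in for the BerkeleyDB B-Tree database that Tugela is said to use.

```python
import os
import sqlite3
import tempfile
from collections import OrderedDict


class DiskCache:
    """Disk tier (hypothetical): one SQLite table, a stand-in for BerkeleyDB."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (k TEXT PRIMARY KEY, v BLOB)"
        )

    def get(self, key):
        row = self.conn.execute(
            "SELECT v FROM cache WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def set(self, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (k, v) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()


class TieredCache:
    """Small in-process LRU in front of the larger, slower disk tier."""

    def __init__(self, disk, max_items=2):
        self.disk = disk
        self.max_items = max_items
        self.memory = OrderedDict()

    def _remember(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.max_items:
            self.memory.popitem(last=False)  # evict the least recently used key

    def get(self, key):
        if key in self.memory:  # fast path: served from RAM
            self.memory.move_to_end(key)
            return self.memory[key]
        value = self.disk.get(key)  # slow path: pay disk latency
        if value is not None:
            self._remember(key, value)  # promote into the memory tier
        return value

    def set(self, key, value):
        self.disk.set(key, value)  # write through so the disk tier holds everything
        self._remember(key, value)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        disk = DiskCache(os.path.join(tmp, "cache.db"))
        cache = TieredCache(disk, max_items=2)
        for i in range(3):
            cache.set(f"page:{i}", f"body {i}".encode())
        evicted = "page:0" not in cache.memory  # pushed out of RAM by newer keys
        value = cache.get("page:0")  # still retrievable from disk
        promoted = "page:0" in cache.memory  # and back in RAM after the read
        disk.conn.close()
        return evicted, value, promoted
```

The memory tier here is deliberately tiny so that eviction is visible; in practice it would be sized to the hot working set, with the disk tier absorbing the rest at higher latency, which is the trade described in sentence 2.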
wordName wordTfidf (topN-words)
[('tugela', 0.419), ('swap', 0.242), ('hierarchy', 0.231), ('cache', 0.217), ('berkeleydb', 0.19), ('notably', 0.17), ('mediawiki', 0.164), ('ram', 0.155), ('allocator', 0.151), ('slab', 0.144), ('variation', 0.144), ('exceed', 0.137), ('os', 0.13), ('expiration', 0.129), ('derived', 0.122), ('simplified', 0.12), ('wire', 0.119), ('kevin', 0.112), ('interesting', 0.111), ('remains', 0.109), ('memcached', 0.109), ('trade', 0.107), ('order', 0.106), ('policy', 0.103), ('deployments', 0.101), ('replaced', 0.099), ('dropped', 0.097), ('versus', 0.096), ('persistence', 0.096), ('brought', 0.091), ('statistics', 0.09), ('approach', 0.089), ('let', 0.087), ('gain', 0.087), ('hash', 0.086), ('mechanism', 0.085), ('caches', 0.085), ('cached', 0.083), ('enables', 0.083), ('disks', 0.081), ('external', 0.081), ('balance', 0.08), ('protocol', 0.079), ('comments', 0.078), ('moved', 0.077), ('said', 0.073), ('using', 0.071), ('internal', 0.071), ('stores', 0.07), ('program', 0.068)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 174 high scalability-2007-12-05-Product: Tugela Cache
2 0.20407937 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
Introduction: The primero recommendation for speeding up a website is almost always to add cache and more cache. And after that add a little more cache just in case. Memcached is almost always given as the recommended cache to use. What we don't often hear is how to effectively use a cache in our own products. MySQL hosted two excellent webinars (referenced below) on the subject of how to deploy and use memcached. The star of the show, other than MySQL of course, is Farhan Mashraqi of Fotolog. You may recall we did an earlier article on Fotolog in Secrets to Fotolog's Scaling Success , which was one of my personal favorites. Fotolog, as they themselves point out, is probably the largest site nobody has ever heard of, pulling in more page views than even Flickr. Fotolog has 51 instances of memcached on 21 servers with 175G in use and 254G available. As a large successful photo-blogging site they have very demanding performance and scaling requirements. To meet those requirements they've developed a
3 0.19000301 50 high scalability-2007-07-31-BerkeleyDB & other distributed high performance key-value databases
Introduction: I currently use BerkeleyDB as an embedded database http://www.oracle.com/database/berkeley-db/ a decision which was initially brought on by learning that Google used BerkeleyDB for their universal sign-on feature. Lustre looks impressive, but their white paper shows speeds of 800 files created per second, as a good number. However, BerkeleyDB on my mac mini does 200,000 row creations per second, and can be used as a distributed file system. I'm having I/O scalability issues with BerkeleyDB on one machine, and about to implement their distributed replication feature (and go multi-machine), which in effect makes it work like a distributed file system, but with local access speeds. That's why I was looking at Lustre. The key feature difference between BerkeleyDB and Lustre is that BerkeleyDB has a complete copy of all the data on each computer, making it not a viable solution for massive sized database applications. However, if you have < 1TB (ie, one disk) of total pos
4 0.18089318 467 high scalability-2008-12-16-[ANN] New Open Source Cache System
Introduction: The SHOP.COM Cache System is now available at http://code.google.com/p/sccache/ The SHOP.COM Cache System is an object cache system that... * is an in-process cache and external, shared Cache * is horizontally scalable * stores cached objects to disk * supports associative keys * is non-transactional * can have any size key and any size data * does auto-GC based on TTL * is container and platform neutral It was built in-house at SHOP.COM (by me) and has powered our website for years. We are open-sourcing it in the hope that it will be useful to others and to get some help in its maintenance. This is our first open source attempt and we'd appreciate any help and comments.
5 0.14499697 577 high scalability-2009-04-22-Gear6 Web cache - the hardware solution for working with Memcache
Introduction: The Gear6 Web Cache hybrid DRAM-flash memory architecture allows for 5-10 times more memcache memory per unit of rack space than DRAM-only configurations, and cuts memory costs by 50%. Other software enhancements include a slab allocator that is more efficient than traditional memcache implementations due to its fine-grained bucket sizing. Gear6 Web Cache also supports object sizes greater than 1 megabyte and manages evictions based on the cost of replacing objects, depending on the size and frequency of object access. It intelligently places cache instances across DRAM and flash, taking into account their different characteristics, while at the same time monitoring their health and detecting and deallocating faulty or failing memory. Gear6 Web Cache is a Memcached protocol compliant solution that scales and accelerates web applications, reduces memory footprint, enhances availability and implements comprehensive Memcached management features. Designed to work with all popular memcac
6 0.14117044 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
7 0.13084863 436 high scalability-2008-11-02-Strategy: How to Manage Sessions Using Memcached
8 0.12992617 373 high scalability-2008-08-29-Product: ScaleOut StateServer is Memcached on Steroids
9 0.12328488 673 high scalability-2009-08-07-Strategy: Break Up the Memcache Dog Pile
10 0.12182589 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
11 0.11726651 1633 high scalability-2014-04-16-Six Lessons Learned the Hard Way About Scaling a Million User System
12 0.11287148 495 high scalability-2009-01-17-Intro to Caching,Caching algorithms and caching frameworks part 1
13 0.11164515 729 high scalability-2009-10-28-And the winner is: MySQL or Memcached or Tokyo Tyrant?
15 0.10796355 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
16 0.1052089 248 high scalability-2008-02-13-What's your scalability plan?
17 0.1051881 1346 high scalability-2012-10-24-Saving Cash Using Less Cache - 90% Savings in the Caching Tier
18 0.10511205 367 high scalability-2008-08-17-Strategy: Drop Memcached, Add More MySQL Servers
19 0.10476514 1246 high scalability-2012-05-16-Big List of 20 Common Bottlenecks
20 0.10325227 274 high scalability-2008-03-12-YouTube Architecture
topicId topicWeight
[(0, 0.162), (1, 0.097), (2, -0.042), (3, -0.076), (4, -0.02), (5, 0.09), (6, 0.04), (7, -0.002), (8, -0.058), (9, 0.004), (10, -0.015), (11, -0.068), (12, 0.0), (13, 0.11), (14, -0.076), (15, -0.052), (16, -0.03), (17, -0.014), (18, 0.0), (19, 0.008), (20, -0.082), (21, 0.074), (22, 0.054), (23, 0.103), (24, -0.05), (25, -0.005), (26, 0.068), (27, 0.034), (28, -0.085), (29, -0.046), (30, -0.038), (31, 0.038), (32, -0.008), (33, -0.024), (34, -0.054), (35, 0.022), (36, -0.016), (37, 0.016), (38, 0.032), (39, 0.017), (40, 0.033), (41, 0.023), (42, -0.018), (43, 0.0), (44, 0.011), (45, -0.008), (46, -0.058), (47, -0.009), (48, -0.004), (49, -0.015)]
simIndex simValue blogId blogTitle
same-blog 1 0.97857761 174 high scalability-2007-12-05-Product: Tugela Cache
2 0.89850867 467 high scalability-2008-12-16-[ANN] New Open Source Cache System
Introduction: The SHOP.COM Cache System is now available at http://code.google.com/p/sccache/ The SHOP.COM Cache System is an object cache system that... * is an in-process cache and external, shared Cache * is horizontally scalable * stores cached objects to disk * supports associative keys * is non-transactional * can have any size key and any size data * does auto-GC based on TTL * is container and platform neutral It was built in-house at SHOP.COM (by me) and has powered our website for years. We are open-sourcing it in the hope that it will be useful to others and to get some help in its maintenance. This is our first open source attempt and we'd appreciate any help and comments.
3 0.8583076 436 high scalability-2008-11-02-Strategy: How to Manage Sessions Using Memcached
Introduction: Dormando shows an enlightened middle way for storing sessions in cache and the database. Sessions are a perfect cache candidate because they are transient and smallish, and since they are usually accessed on every page view, removing all that load from the database is a good thing. But as Dormando points out, session caches have problems. If you remove expiration times from the cache and you run out of memory, then no more logins. If a cache server fails or needs to be upgraded, then you just logged out a bunch of potentially angry users. The middle ground Dormando proposes is using both the cache and the database: Reads: read from the cache first, then the database. Typical cache logic. Writes: write to memcached every time, write to the database every N seconds (assuming the data has changed). There's a small chance of data loss, but you've still greatly reduced the database load while providing reliability. Nice solution.
4 0.85462755 673 high scalability-2009-08-07-Strategy: Break Up the Memcache Dog Pile
Introduction: Update: Asynchronous HTTP cache validations . A proposed HTTP caching extension: if your application can afford to show slightly out of date content, then stale-while-revalidate can guarantee that the user will always be served directly from the cache, hence guaranteeing a consistent response-time user-experience. Caching is like aspirin for headaches. Head hurts: pop a 'sprin. Slow site: add caching. Facebook must have a lot of headaches because they popped 805 memcached servers between 10,000 web servers and 1,800 MySQL servers and they reportedly have a 99% cache hit rate. But what's the best way for you to cache for your application? It's a remarkably complex and rich topic. Alexey Kovyrin talks about one common caching problem called the Dog Pile Effect in Dog-pile Effect and How to Avoid it with Ruby on Rails . Glenn Franxman also has a Django solution in MintCache . Data is usually cached because it's too expensive to calculate for every hit. Maybe it's a gnarly S
5 0.8526594 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
Introduction: The primero recommendation for speeding up a website is almost always to add cache and more cache. And after that add a little more cache just in case. Memcached is almost always given as the recommended cache to use. What we don't often hear is how to effectively use a cache in our own products. MySQL hosted two excellent webinars (referenced below) on the subject of how to deploy and use memcached. The star of the show, other than MySQL of course, is Farhan Mashraqi of Fotolog. You may recall we did an earlier article on Fotolog in Secrets to Fotolog's Scaling Success , which was one of my personal favorites. Fotolog, as they themselves point out, is probably the largest site nobody has ever heard of, pulling in more page views than even Flickr. Fotolog has 51 instances of memcached on 21 servers with 175G in use and 254G available. As a large successful photo-blogging site they have very demanding performance and scaling requirements. To meet those requirements they've developed a
6 0.84915233 495 high scalability-2009-01-17-Intro to Caching,Caching algorithms and caching frameworks part 1
7 0.845213 359 high scalability-2008-07-29-Ehcache - A Java Distributed Cache
8 0.83797842 1346 high scalability-2012-10-24-Saving Cash Using Less Cache - 90% Savings in the Caching Tier
9 0.79756647 836 high scalability-2010-06-04-Strategy: Cache Larger Chunks - Cache Hit Rate is a Bad Indicator
10 0.79269451 1633 high scalability-2014-04-16-Six Lessons Learned the Hard Way About Scaling a Million User System
11 0.77229267 367 high scalability-2008-08-17-Strategy: Drop Memcached, Add More MySQL Servers
12 0.76337749 696 high scalability-2009-09-07-Product: Infinispan - Open Source Data Grid
13 0.76211625 911 high scalability-2010-09-30-More Troubles with Caching
14 0.76150393 577 high scalability-2009-04-22-Gear6 Web cache - the hardware solution for working with Memcache
15 0.75909841 248 high scalability-2008-02-13-What's your scalability plan?
16 0.75496846 1246 high scalability-2012-05-16-Big List of 20 Common Bottlenecks
17 0.75002897 373 high scalability-2008-08-29-Product: ScaleOut StateServer is Memcached on Steroids
18 0.74911535 1462 high scalability-2013-05-22-Strategy: Stop Using Linked-Lists
19 0.74102366 1467 high scalability-2013-05-30-Google Finds NUMA Up to 20% Slower for Gmail and Websearch
topicId topicWeight
[(1, 0.155), (2, 0.313), (5, 0.26), (30, 0.046), (79, 0.119)]
simIndex simValue blogId blogTitle
1 0.93612885 534 high scalability-2009-03-12-Google TechTalk: Amdahl's Law in the Multicore Era
Introduction: Over the last several decades computer architects have been phenomenally successful turning the transistor bounty provided by Moore's Law into chips with ever increasing single-threaded performance. During many of these successful years, however, many researchers paid scant attention to multiprocessor work. Now as vendors turn to multicore chips, researchers are reacting with more papers on multi-threaded systems. While this is good, we are concerned that further work on single-thread performance will be squashed. To help understand future high-level trade-offs, we develop a corollary to Amdahl's Law for multicore chips [Hill & Marty, IEEE Computer 2008]. It models fixed chip resources for alternative designs that use symmetric cores, asymmetric cores, or dynamic techniques that allow cores to work together on sequential execution. Our results encourage multicore designers to view performance of the entire chip rather than focus on core efficiencies. Moreover, we observe that obtai
2 0.90793753 1273 high scalability-2012-06-27-Paper: Logic and Lattices for Distributed Programming
Introduction: Neil Conway from Berkeley CS is giving an advanced level talk at a meetup today in San Francisco on a new paper: Logic and Lattices for Distributed Programming - extending set logic to support CRDT-style lattices. The description of the meetup is probably the clearest introduction to the paper: Developers are increasingly choosing datastores that sacrifice strong consistency guarantees in exchange for improved performance and availability. Unfortunately, writing reliable distributed programs without the benefit of strong consistency can be very challenging. In this talk, I'll discuss work from our group at UC Berkeley that aims to make it easier to write distributed programs without relying on strong consistency. Bloom is a declarative programming language for distributed computing, while CALM is an analysis technique that identifies programs that are guaranteed to be eventually consistent. I'll then discuss our recent work on extending CALM to support a broader range of
3 0.90245533 899 high scalability-2010-09-09-How did Google Instant become Faster with 5-7X More Results Pages?
Introduction: We don't have a lot of details on how Google pulled off their technically very impressive Google Instant release, but in Google Instant behind the scenes , they did share some interesting facts: Google was serving more than a billion searches per day. With Google Instant they served 5-7X more results pages than previously. Typical search results were returned in less than a quarter of second. A team of 50+ worked on the project for an extended period of time. Although Google is associated with muscular data centers, they just didn't throw more server capacity at the problem, they worked smarter too. What were their general strategies? Increase backend server capacity. Add new caches to handle high request rates while keeping results fresh while the web is continuously crawled and re-indexed. Add User-state data to the back-ends to keep track of the results pages already shown to a given user, preventing the same results from being re-fetched repeatedly. Optim
4 0.90184784 341 high scalability-2008-06-06-GigaOm Structure 08 Conference on June 25th in San Francisco
Introduction: If you just can't get enough high scalability talk you might want to take a look at GigaOm's Structure 08 conference. The slate of speakers looks appropriately interesting and San Francisco is truly magical this time of year. High Scalability readers even get a price break if you use the HIGHSCALE discount code! I'll be on vacation so I won't see you there, but it looks like a good time. For a nice change of pace consider visiting MoMA next door. Here's a blurb on the conference: A reminder to our readers about Structure 08, GigaOm's upcoming conference dedicated to web infrastructure. In addition to keynotes from leaders like Jim Crowe, chairman and CEO of Level 3 Communications and Werner Vogels, CTO of Amazon, the event will feature workshops from Google App Engine, Microsoft and a special workshop from Fenwick and West who will cover how to raise money for an infrastructure start up. Learn from the gurus at Amazon, Google, Microsoft, Sun, VMWare and more about what the future
5 0.87984776 1523 high scalability-2013-09-27-Stuff The Internet Says On Scalability For September 27, 2013
Introduction: Hey, it's HighScalability time: ( The WINLAB at Rutgers, with software defined radios tied into GENI. ) 384 cores & 32TB of RAM : Oracle's SPARC M6 Quotable Quotes: @jennyinc : 2003: "I replaced you with a set of very small shell scripts." 2013: "I replaced your scripts with a six-figure enterprise DevOps platform." @tomdale : OH: “Redis is so fast, why don’t we replace RAM with Redis?” @petrillic : OH "Promises/futures are the one-night stands of architectural constructs" nice #strangeloop @TwitterEng : "Java and Scala let Twitter readily share and modify its enormous codebase across a team of hundreds of developers." Lots of juicy numbers revealed at Structure:Europe : Netflix streams 114,000 years of video every month; Custom build Netflix boxes for its content-delivery network that contain between 100 and 150 terabytes of stor
same-blog 6 0.87644964 174 high scalability-2007-12-05-Product: Tugela Cache
7 0.86643398 1487 high scalability-2013-07-05-Stuff The Internet Says On Scalability For July 5, 2013
8 0.86613739 153 high scalability-2007-11-13-Friendster Lost Lead Because of a Failure to Scale
9 0.86334944 1247 high scalability-2012-05-18-Stuff The Internet Says On Scalability For May 18, 2012
11 0.85395586 1113 high scalability-2011-09-09-Stuff The Internet Says On Scalability For September 9, 2011
12 0.84795833 1424 high scalability-2013-03-15-Stuff The Internet Says On Scalability For March 15, 2013
13 0.82914716 485 high scalability-2009-01-05-Messaging is not just for investment banks
14 0.82087237 595 high scalability-2009-05-08-Publish-subscribe model does not scale?
15 0.80317104 126 high scalability-2007-10-20-Should you build your next website using 3tera's grid OS?
16 0.80105758 343 high scalability-2008-06-09-Apple's iPhone to Use a Centralized Push Based Notification Architecture
17 0.80095011 1369 high scalability-2012-12-10-Switch your databases to Flash storage. Now. Or you're doing it wrong.
18 0.80061716 1077 high scalability-2011-07-11-ATMCash Exploits Virtualization for Security - Immutability and Reversion
19 0.79843682 673 high scalability-2009-08-07-Strategy: Break Up the Memcache Dog Pile
20 0.79828167 696 high scalability-2009-09-07-Product: Infinispan - Open Source Data Grid