high_scalability high_scalability-2007 high_scalability-2007-104 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: SmugMug's CEO & Chief Geek Don MacAskill smugly (hard to resist) gushes over finally finding, after a long and arduous quest, their "best bang-for-the-buck storage array." It's the Dell MD3000. His in-depth explanation of why he prefers the MD3000 should help anyone with their own painful storage deliberations. His key points are: The price is right; DAS via SAS, 15 spindles at 15K rpm each, 512MB of mirrored battery-backed write cache; You can disable read caching; You can disable read-ahead prefetching; The stripe sizes are configurable up to 512KB; The controller ignores host-based flush commands by default; They support an ‘Enhanced JBOD’ mode. His reasoning for the desirability of each option is astute, and he even gives you the configuration options for carrying it out. This is not your average CEO. Don also speculates that a three-tier system using flash (system RAM + flash storage + RAID disks) is a possible future direction. Unfortunately, flash
sentIndex sentText sentNum sentScore
1 SmugMug's CEO & Chief Geek Don MacAskill smugly (hard to resist) gushes over finally finding, after a long and arduous quest, their "best bang-for-the-buck storage array." [sent-1, score-0.377]
2 His in-depth explanation of why he prefers the MD3000 should help anyone with their own painful storage deliberations. [sent-3, score-0.53]
3 His reasoning for the desirability of each option is astute, and he even gives you the configuration options for carrying it out. [sent-5, score-0.412]
4 Don also speculates that a three-tier system using flash (system RAM + flash storage + RAID disks) is a possible future direction. [sent-7, score-0.909]
5 Unfortunately, flash may not be the dream solution it has been thought to be. [sent-8, score-0.369]
6 StorageMojo talks about this in Flash vs disk at DISKCON 2007. [sent-9, score-0.146]
wordName wordTfidf (topN-words)
[('disable', 0.301), ('flash', 0.267), ('arduous', 0.194), ('speculates', 0.194), ('breakthe', 0.182), ('das', 0.174), ('prefetching', 0.174), ('jbod', 0.168), ('macaskill', 0.168), ('spindles', 0.162), ('resist', 0.162), ('ignores', 0.158), ('sas', 0.158), ('enhanced', 0.154), ('geek', 0.154), ('prefers', 0.154), ('smugmug', 0.154), ('stripe', 0.154), ('flush', 0.151), ('mirrored', 0.151), ('storagemojo', 0.142), ('carrying', 0.138), ('quest', 0.131), ('painful', 0.129), ('reasoning', 0.122), ('rpm', 0.12), ('chief', 0.117), ('configurable', 0.113), ('ceo', 0.11), ('controller', 0.108), ('unfortunately', 0.103), ('dream', 0.102), ('commands', 0.098), ('sizes', 0.096), ('storage', 0.096), ('default', 0.095), ('raid', 0.091), ('finally', 0.087), ('tier', 0.085), ('finding', 0.083), ('disks', 0.083), ('explanation', 0.082), ('option', 0.078), ('price', 0.076), ('talks', 0.075), ('options', 0.074), ('vs', 0.071), ('anyone', 0.069), ('average', 0.067), ('points', 0.066)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 104 high scalability-2007-10-01-SmugMug Found their Perfect Storage Array
Introduction: SmugMug's CEO & Chief Geek Don MacAskill smugly (hard to resist) gushes over finally finding, after a long and arduous quest, their "best bang-for-the-buck storage array." It's the Dell MD3000. His in-depth explanation of why he prefers the MD3000 should help anyone with their own painful storage deliberations. His key points are: The price is right; DAS via SAS, 15 spindles at 15K rpm each, 512MB of mirrored battery-backed write cache; You can disable read caching; You can disable read-ahead prefetching; The stripe sizes are configurable up to 512KB; The controller ignores host-based flush commands by default; They support an ‘Enhanced JBOD’ mode. His reasoning for the desirability of each option is astute, and he even gives you the configuration options for carrying it out. This is not your average CEO. Don also speculates that a three-tier system using flash (system RAM + flash storage + RAID disks) is a possible future direction. Unfortunately, flash
2 0.14540535 1369 high scalability-2012-12-10-Switch your databases to Flash storage. Now. Or you're doing it wrong.
Introduction: This is a guest post by Brian Bulkowski, CTO and co-founder of Aerospike, a leading clustered NoSQL database, who has worked in the area of high performance commodity systems since 1989. Why flash rules for databases The economics of flash memory are staggering. If you’re not using SSD, you are doing it wrong. Not quite true, but close. Some small applications fit entirely in memory – less than 100GB – great for in-memory solutions. There’s a place for rotational drives (HDD) in massive streaming analytics and petabytes of data. But for the vast space between, flash has become the only sensible option. For example, the Samsung 840 costs $180 for 250GB. This drive is rated by the manufacturer at 96,000 random 4K read IOPS, and 61,000 random 4K write IOPS. The Samsung 840 is not alone at this price performance. A 300GB Intel 320 is $450. An OCZ Vertex 4 256GB is $235, with the Intel being rated as slowest, but our internal testing showing
3 0.10778351 1114 high scalability-2011-09-13-Must see: 5 Steps to Scaling MongoDB (Or Any DB) in 8 Minutes
Introduction: Jared Rosoff concisely, effectively, entertainingly, and convincingly gives an 8-minute MongoDB tutorial on scaling MongoDB at Scale Out Camp. The ideas aren't just limited to MongoDB, they work for most any database: Optimize your queries; Know your working set size; Tune your file system; Choose the right disks; Shard. Here's an explanation of all 5 strategies: Optimize your queries. Computer science works. Complexity analysis works. A btree search is faster than a table scan. So analyze your queries. Use explain to see what your query is doing. If it is saying it's using a cursor then it's doing a table scan. That's slow. Look at the number of documents it looks at to satisfy a query. Look at how long it takes. Fix: add indexes. It doesn't matter if you are running on 1 or 100 servers. Know your working set size. Sticking memcache in front of your database is silly. You have lots of RAM, use it. Embed your cache in the database, which is how MongoDB works. Working set
4 0.10593037 144 high scalability-2007-11-07-What CDN would you recommend?
Introduction: Hi all, I run a site that after a complete redesign has gotten a lot more traffic. The site provides free flash games, so the biggest traffic share goes to serving flash files (from about 100K and up to several megabytes in size each.) I currently host the entire site on a hosting provider that has no traffic limits. But since they are very cheap (yet have served me very well all the time with at least 99.9% uptime), I don't trust them to allow me to continue consuming more and more bandwidth. I just guess I'm going to reach some internal limit they have one day, so I'm looking into moving all the flash content over to a content delivery network of some sort. Some recent traffic stats: August: 12 GB September: 22 GB October: 55 GB November: Currently 2.3 GB per day on average, but it's rising. I've been looking into Amazon S3, but have not decided on anything yet. So therefore I'm asking if there are any other providers I should consider, that operates withi
5 0.090951577 1207 high scalability-2012-03-12-Google: Taming the Long Latency Tail - When More Machines Equals Worse Results
Introduction: Likewise the current belief that, in the case of artificial machines the very large and the very small are equally feasible and lasting is a manifest error. Thus, for example, a small obelisk or column or other solid figure can certainly be laid down or set up without danger of breaking, while the large ones will go to pieces under the slightest provocation, and that purely on account of their own weight. -- Galileo Galileo observed how things broke if they were naively scaled up. Interestingly, Google noticed a similar pattern when building larger software systems using the same techniques used to build smaller systems. Luiz André Barroso , Distinguished Engineer at Google, talks about this fundamental property of scaling systems in his fascinating talk, Warehouse-Scale Computing: Entering the Teenage Decade . Google found the larger the scale the greater the impact of latency variability. When a request is implemented by work done in parallel, as is common with today's service
6 0.089598425 1279 high scalability-2012-07-09-Data Replication in NoSQL Databases
7 0.081737444 182 high scalability-2007-12-12-Oracle Can Do Read-Write Splitting Too
8 0.080652617 1508 high scalability-2013-08-28-Sean Hull's 20 Biggest Bottlenecks that Reduce and Slow Down Scalability
9 0.077188589 273 high scalability-2008-03-09-Best Practices for Speeding Up Your Web Site
10 0.069681197 786 high scalability-2010-03-02-Using the Ambient Cloud as an Application Runtime
11 0.066782229 239 high scalability-2008-02-04-Streaming Video on Amazon EC2?
12 0.066700846 674 high scalability-2009-08-07-The Canonical Cloud Architecture
13 0.06526868 1186 high scalability-2012-02-02-The Data-Scope Project - 6PB storage, 500GBytes-sec sequential IO, 20M IOPS, 130TFlops
14 0.064816758 1637 high scalability-2014-04-25-Stuff The Internet Says On Scalability For April 25th, 2014
15 0.063395083 656 high scalability-2009-07-16-Scalable Web Architectures and Application State
16 0.059251741 823 high scalability-2010-05-05-How will memristors change everything?
17 0.057947829 1521 high scalability-2013-09-23-Salesforce Architecture - How they Handle 1.3 Billion Transactions a Day
18 0.057942353 1129 high scalability-2011-09-30-Stuff The Internet Says On Scalability For September 30, 2011
19 0.057499494 194 high scalability-2007-12-26-Golden rule of web caching
20 0.05470277 1389 high scalability-2013-01-18-Stuff The Internet Says On Scalability For January 18, 2013
topicId topicWeight
[(0, 0.077), (1, 0.046), (2, -0.011), (3, -0.001), (4, -0.002), (5, 0.026), (6, 0.024), (7, -0.001), (8, 0.006), (9, 0.01), (10, 0.006), (11, -0.077), (12, 0.0), (13, 0.05), (14, -0.024), (15, 0.015), (16, -0.011), (17, 0.036), (18, -0.046), (19, 0.003), (20, -0.033), (21, 0.032), (22, -0.006), (23, 0.033), (24, -0.057), (25, -0.008), (26, 0.011), (27, -0.042), (28, -0.04), (29, -0.006), (30, -0.018), (31, -0.015), (32, 0.048), (33, -0.017), (34, -0.023), (35, 0.009), (36, 0.048), (37, 0.002), (38, 0.016), (39, -0.021), (40, 0.002), (41, -0.015), (42, 0.013), (43, -0.003), (44, -0.029), (45, 0.001), (46, -0.015), (47, -0.008), (48, 0.014), (49, -0.018)]
simIndex simValue blogId blogTitle
same-blog 1 0.96119666 104 high scalability-2007-10-01-SmugMug Found their Perfect Storage Array
Introduction: SmugMug's CEO & Chief Geek Don MacAskill smugly (hard to resist) gushes over finally finding, after a long and arduous quest, their "best bang-for-the-buck storage array." It's the Dell MD3000. His in-depth explanation of why he prefers the MD3000 should help anyone with their own painful storage deliberations. His key points are: The price is right; DAS via SAS, 15 spindles at 15K rpm each, 512MB of mirrored battery-backed write cache; You can disable read caching; You can disable read-ahead prefetching; The stripe sizes are configurable up to 512KB; The controller ignores host-based flush commands by default; They support an ‘Enhanced JBOD’ mode. His reasoning for the desirability of each option is astute, and he even gives you the configuration options for carrying it out. This is not your average CEO. Don also speculates that a three-tier system using flash (system RAM + flash storage + RAID disks) is a possible future direction. Unfortunately, flash
2 0.69677591 1066 high scalability-2011-06-22-It's the Fraking IOPS - 1 SSD is 44,000 IOPS, Hard Drive is 180
Introduction: Planning your next buildout and thinking SSDs are still far in the future? Still too expensive, too low density. Hard disks are cheap, familiar, and store lots of stuff. In this short and entertaining video Wikia's Artur Bergman wants to change your mind about SSDs. SSDs are for today, get with the math already. Here's Artur's logic: Wikia is all SSD in production. The new Wikia file servers have a theoretical read rate of ~10GB/sec sequential, 6GB/sec random and 1.2 million IOPS. If you can't do math or love the past, you love spinning rust. If you are awesome you love SSDs. SSDs are cheaper than drives using the most relevant metric: $/GB/IOPS. 1 SSD is 44,000 IOPS and one hard drive is 180 IOPS. Need 1 SSD instead of 50 hard drives. With 8 million files there's a 9 minute fsck. Full backup in 12 minutes (X-25M based). 4 GB/sec random read average latency 1 msec. 2.2 GB/sec random write average latency 1 msec. 50TBs of SSDs in one machine for $80,000. With the densi
3 0.68907911 1369 high scalability-2012-12-10-Switch your databases to Flash storage. Now. Or you're doing it wrong.
Introduction: This is a guest post by Brian Bulkowski, CTO and co-founder of Aerospike, a leading clustered NoSQL database, who has worked in the area of high performance commodity systems since 1989. Why flash rules for databases The economics of flash memory are staggering. If you’re not using SSD, you are doing it wrong. Not quite true, but close. Some small applications fit entirely in memory – less than 100GB – great for in-memory solutions. There’s a place for rotational drives (HDD) in massive streaming analytics and petabytes of data. But for the vast space between, flash has become the only sensible option. For example, the Samsung 840 costs $180 for 250GB. This drive is rated by the manufacturer at 96,000 random 4K read IOPS, and 61,000 random 4K write IOPS. The Samsung 840 is not alone at this price performance. A 300GB Intel 320 is $450. An OCZ Vertex 4 256GB is $235, with the Intel being rated as slowest, but our internal testing showing
4 0.68779033 971 high scalability-2011-01-10-Riak's Bitcask - A Log-Structured Hash Table for Fast Key-Value Data
Introduction: How would you implement a key-value storage system if you were starting from scratch? The approach Basho settled on with Bitcask, their new backend for Riak, is an interesting combination of using RAM to store a hash map of file pointers to values and a log-structured file system for efficient writes. In this excellent Changelog interview, some folks from Basho describe Bitcask in more detail. The essential Bitcask: Keys are stored in memory for fast lookups. All keys must fit in RAM. Writes are append-only, which means writes are strictly sequential and do not require seeking. Writes are write-through. Every time a value is updated the data file on disk is appended and the in-memory key index is updated with the file pointer. Read queries are satisfied with O(1) random disk seeks. Latency is very predictable if all keys fit in memory because there's no random seeking around through a file. For reads, the file system cache in the kernel is used instead of writing a c
5 0.67899978 1114 high scalability-2011-09-13-Must see: 5 Steps to Scaling MongoDB (Or Any DB) in 8 Minutes
Introduction: Jared Rosoff concisely, effectively, entertainingly, and convincingly gives an 8-minute MongoDB tutorial on scaling MongoDB at Scale Out Camp. The ideas aren't just limited to MongoDB, they work for most any database: Optimize your queries; Know your working set size; Tune your file system; Choose the right disks; Shard. Here's an explanation of all 5 strategies: Optimize your queries. Computer science works. Complexity analysis works. A btree search is faster than a table scan. So analyze your queries. Use explain to see what your query is doing. If it is saying it's using a cursor then it's doing a table scan. That's slow. Look at the number of documents it looks at to satisfy a query. Look at how long it takes. Fix: add indexes. It doesn't matter if you are running on 1 or 100 servers. Know your working set size. Sticking memcache in front of your database is silly. You have lots of RAM, use it. Embed your cache in the database, which is how MongoDB works. Working set
6 0.67383051 1511 high scalability-2013-09-04-Wide Fast SATA: the Recipe for Hot Performance
7 0.67162073 1318 high scalability-2012-09-07-Stuff The Internet Says On Scalability For September 7, 2012
9 0.65590113 1475 high scalability-2013-06-13-Busting 4 Modern Hardware Myths - Are Memory, HDDs, and SSDs Really Random Access?
10 0.64917499 1279 high scalability-2012-07-09-Data Replication in NoSQL Databases
11 0.64639479 1291 high scalability-2012-07-25-Vertical Scaling Ascendant - How are SSDs Changing Architectures?
13 0.64116764 1384 high scalability-2013-01-09-The Story of How Turning Disk Into a Service Lead to a Deluge of Density
14 0.63882208 1407 high scalability-2013-02-15-Stuff The Internet Says On Scalability For February 15, 2013
15 0.62296641 174 high scalability-2007-12-05-Product: Tugela Cache
16 0.61291611 1129 high scalability-2011-09-30-Stuff The Internet Says On Scalability For September 30, 2011
17 0.61283922 1183 high scalability-2012-01-30-37signals Still Happily Scaling on Moore RAM and SSDs
19 0.61045337 1442 high scalability-2013-04-17-Tachyon - Fault Tolerant Distributed File System with 300 Times Higher Throughput than HDFS
20 0.60346866 1246 high scalability-2012-05-16-Big List of 20 Common Bottlenecks
topicId topicWeight
[(1, 0.061), (2, 0.168), (6, 0.351), (10, 0.12), (30, 0.062), (40, 0.054), (61, 0.029), (79, 0.048)]
simIndex simValue blogId blogTitle
same-blog 1 0.86578238 104 high scalability-2007-10-01-SmugMug Found their Perfect Storage Array
Introduction: SmugMug's CEO & Chief Geek Don MacAskill smugly (hard to resist) gushes over finally finding, after a long and arduous quest, their "best bang-for-the-buck storage array." It's the Dell MD3000. His in-depth explanation of why he prefers the MD3000 should help anyone with their own painful storage deliberations. His key points are: The price is right; DAS via SAS, 15 spindles at 15K rpm each, 512MB of mirrored battery-backed write cache; You can disable read caching; You can disable read-ahead prefetching; The stripe sizes are configurable up to 512KB; The controller ignores host-based flush commands by default; They support an ‘Enhanced JBOD’ mode. His reasoning for the desirability of each option is astute, and he even gives you the configuration options for carrying it out. This is not your average CEO. Don also speculates that a three-tier system using flash (system RAM + flash storage + RAID disks) is a possible future direction. Unfortunately, flash
2 0.85012436 710 high scalability-2009-09-20-PaxosLease: Diskless Paxos for Leases
Introduction: PaxosLease is a distributed algorithm for lease negotiation. It is based on Paxos, but does not require disk writes or clock synchrony. PaxosLease is used for master lease negotiation in the open-source Keyspace replicated key-value store.
3 0.8283239 93 high scalability-2007-09-16-What software runs on this site?
Introduction: It's pretty slick! olla
4 0.76464826 832 high scalability-2010-05-31-Scalable federated security with Kerberos
Introduction: In my last post, I outlined considerations that need to be taken into account when choosing between a centralized and federated security model. So, how do we implement the chosen model? Based on a real-world case study, I will outline a Kerberos architecture that enables cutting-edge collaborative research through federated sharing of resources. Read more on BigDataMatters.com
5 0.71395606 529 high scalability-2009-03-10-Paper: Consensus Protocols: Paxos
Introduction: Update: Barbara Liskov’s Turing Award, and Byzantine Fault Tolerance. Henry Robinson has created an excellent series of articles on consensus protocols. We already covered his 2 Phase Commit article and he also has a 3 Phase Commit article showing how to handle 2PC under single node failures. But that is not enough! 3PC works well under node failures, but fails for network failures. So another consensus mechanism is needed that handles both network and node failures. And that's Paxos. Paxos correctly handles both types of failures, but it does this by becoming inaccessible if too many components fail. This is the "liveness" property of protocols. Paxos waits until the faults are fixed. Read queries can be handled, but updates will be blocked until the protocol thinks it can make forward progress. The liveness of Paxos is primarily dependent on network stability. In a distributed heterogeneous environment you are at risk of losing the ability to make updates. Users hate t
6 0.64135963 213 high scalability-2008-01-15-Does Sun Buying MySQL Change Your Scaling Strategy?
7 0.63548476 794 high scalability-2010-03-11-What would you like to ask Justin.tv?
8 0.57762462 1423 high scalability-2013-03-13-Iron.io Moved From Ruby to Go: 28 Servers Cut and Colossal Clusterf**ks Prevented
9 0.55315453 243 high scalability-2008-02-07-clusteradmin.blogspot.com - blog about building and administering clusters
10 0.53675145 1553 high scalability-2013-11-25-How To Make an Infinitely Scalable Relational Database Management System (RDBMS)
11 0.53670216 1036 high scalability-2011-05-06-Stuff The Internet Says On Scalability For May 6th, 2011
12 0.52481759 792 high scalability-2010-03-10-How FarmVille Scales - The Follow-up
13 0.5233326 1331 high scalability-2012-10-02-An Epic TripAdvisor Update: Why Not Run on the Cloud? The Grand Experiment.
14 0.52313417 1389 high scalability-2013-01-18-Stuff The Internet Says On Scalability For January 18, 2013
15 0.5200004 1371 high scalability-2012-12-12-Pinterest Cut Costs from $54 to $20 Per Hour by Automatically Shutting Down Systems
16 0.51979953 1631 high scalability-2014-04-14-How do you even do anything without using EBS?
17 0.51637262 1585 high scalability-2014-01-24-Stuff The Internet Says On Scalability For January 24th, 2014
18 0.5120874 157 high scalability-2007-11-16-Product: lbpool - Load Balancing JDBC Pool
19 0.51191318 1425 high scalability-2013-03-18-Beyond Threads and Callbacks - Application Architecture Pros and Cons