high_scalability high_scalability-2007 high_scalability-2007-125 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: File replication based on erasure codes can reduce the total size of replicas by two times or more.
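The "two times or more" figure can be checked with simple arithmetic. The sketch below is illustrative only and does not come from the post: the function names are hypothetical, and the (10 data + 4 parity) layout is an assumed example configuration. It compares storage overhead for full replicas against a (k, m) erasure code, then demonstrates the simplest erasure code, two data blocks plus one XOR parity block, rebuilding a lost block.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Bytes stored per byte of user data when keeping `copies` full replicas."""
    return Fraction(copies)


def erasure_overhead(k: int, m: int) -> Fraction:
    """Bytes stored per byte of user data with k data blocks and m parity blocks."""
    return Fraction(k + m, k)


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


# Simplest erasure code (assumed example): k=2 data blocks, m=1 XOR parity block.
data = b"erasure codes!"  # 14 bytes, so it splits evenly into two blocks
half = len(data) // 2
block_a, block_b = data[:half], data[half:]
parity = xor_blocks(block_a, block_b)

# Pretend block_a is lost: rebuild it from the two surviving blocks.
recovered_a = xor_blocks(parity, block_b)
restored = recovered_a + block_b

# Triple replication versus an assumed (10 data + 4 parity) layout.
savings = replication_overhead(3) / erasure_overhead(10, 4)

print("restored intact:", restored == data)
print("XOR parity overhead:", float(erasure_overhead(2, 1)))
print("3 replicas vs (10, 4) code:", float(savings))
```

Under these assumptions the XOR layout stores 1.5 bytes per byte of data and survives one lost block, where mirroring needs 2 bytes for the same guarantee. The (10, 4) layout stores 1.4 bytes per byte against 3 for triple replication, a reduction of a little over two times.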
sentIndex sentText sentNum sentScore
1 File replication based on erasure codes can reduce the total size of replicas by two times or more. [sent-1, score-2.655]
wordName wordTfidf (topN-words)
[('erasure', 0.554), ('codes', 0.506), ('replicas', 0.38), ('total', 0.264), ('reduce', 0.227), ('replication', 0.217), ('size', 0.216), ('file', 0.183), ('times', 0.168), ('based', 0.123)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 125 high scalability-2007-10-18-another approach to replication
Introduction: File replication based on erasure codes can reduce the total size of replicas by two times or more.
2 0.41954526 1483 high scalability-2013-06-27-Paper: XORing Elephants: Novel Erasure Codes for Big Data
Introduction: Erasure codes are one of those seemingly magical mathematical creations that, with the developments described in the paper XORing Elephants: Novel Erasure Codes for Big Data, are set to replace triple replication as the data storage protection mechanism of choice. The result, says Robin Harris (StorageMojo) in an excellent article, Facebook’s advanced erasure codes: "WebCos will be able to store massive amounts of data more efficiently than ever before. Bad news: so will anyone else." Robin says that with cheap disks triple replication made sense and was economical. With ever bigger BigData the overhead has become costly. But erasure codes have always suffered from unacceptably long repair times. This paper describes new Locally Repairable Codes (LRCs) that are efficiently repairable in disk I/O and bandwidth requirements: These systems are now designed to survive the loss of up to four storage elements – disks, servers, nodes or even entire data centers – without losing
3 0.12842001 19 high scalability-2007-07-16-Paper: Replication Under Scalable Hashing
Introduction: Replication Under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution. From the abstract: Typical algorithms for decentralized data distribution work best in a system that is fully built before it is first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (Replication Under Scalable Hashing), that maps replicated objects to a scalable collection of storage servers or disks. RUSH algorithms distribute objects to servers according to user-specified server weighting. While all RUSH variants support addition of servers to the system, different variants have different characteristics with respect to lookup time in petabyte-scale systems, performance with mirroring (as opposed to redundancy codes), and storage server removal. All RUSH variants redistribute as few objects as possible when new servers are added or existing servers
4 0.10464962 1330 high scalability-2012-09-28-Stuff The Internet Says On Scalability For September 28, 2012
Introduction: It's HighScalability Time: Quotable Quotes: @dbasch : The world is full of "scalability engineers" who would die from an orgasm if their software ever saw 10,000 requests in a day. @mtnygard : “Scaling issues are always expressed as a queue backing up somewhere.” —@moonpolysoft #strangeloop @rbranson : If your data fits in main memory, you're doing it wrong. #strangeloop @peakscale : Using schemaless DBs an "overreaction" & "confuses the poor impl. of schemas with the value that schemas provide" @adrianco : GM: Performance analysis is complicated by your brain thinking LINEARLY about a computer system that is NONLINEAR. @littleidea : it's better to have infinite scalability and not need it, than to need infinite scalability and not have it Looks like Google is on the right track with their language understanding efforts. How hierarchical is language use : In this paper, we review evidence from the recen
5 0.090946764 1318 high scalability-2012-09-07-Stuff The Internet Says On Scalability For September 7, 2012
Introduction: It's HighScalability Time: Quotable Quotes: Where did all the supercomputers go? Inside Intel. @Jacattell : I love the smell of high scalability in the morning :-) @nkohari : Post on HN about GitHub scalability. Top comment? “…someone wasted valuable time making the dashboard look so pretty” Evolution of SoundCloud’s Architecture: The way we develop SoundCloud is to identify the points of scale then isolate and optimize the read and write paths individually, in anticipation of the next magnitude of growth. How We Build Our 60-Node (Almost) Distributed Web Crawler. Semantics3 crawls 1-3 million pages a day at a cost of ~$3 a day (excluding storage costs) using micro-instances, Gearman, redis, perl, chef, and capistrano. Werner Vogels continues his 50 Shades of Programming book club with Back-to-Basics Weekend Reading - Granularity of locks. Highlight is a touching remembrance of Jim Gray. Speaking of locks and stories, the MySQL Per
6 0.087383762 368 high scalability-2008-08-17-Wuala - P2P Online Storage Cloud
7 0.085579649 507 high scalability-2009-02-03-Paper: Optimistic Replication
8 0.080286279 1407 high scalability-2013-02-15-Stuff The Internet Says On Scalability For February 15, 2013
9 0.074097738 963 high scalability-2010-12-23-Paper: CRDTs: Consistency without concurrency control
10 0.073705047 254 high scalability-2008-02-19-Hadoop Getting Closer to 1.0 Release
11 0.068657666 786 high scalability-2010-03-02-Using the Ambient Cloud as an Application Runtime
12 0.068481527 676 high scalability-2009-08-08-Yahoo!'s PNUTS Database: Too Hot, Too Cold or Just Right?
13 0.067195132 1633 high scalability-2014-04-16-Six Lessons Learned the Hard Way About Scaling a Million User System
14 0.066994503 1345 high scalability-2012-10-22-Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale
15 0.066669323 1278 high scalability-2012-07-06-Stuff The Internet Says On Scalability For July 6, 2012
16 0.065511279 1045 high scalability-2011-05-20-Stuff The Internet Says On Scalability For May 20, 2011
17 0.064180441 1041 high scalability-2011-05-15-Building a Database remote availability site
18 0.063712686 889 high scalability-2010-08-30-Pomegranate - Storing Billions and Billions of Tiny Little Files
19 0.062305558 1233 high scalability-2012-04-25-The Anatomy of Search Technology: blekko’s NoSQL database
20 0.061856207 1311 high scalability-2012-08-24-Stuff The Internet Says On Scalability For August 24, 2012
topicId topicWeight
[(0, 0.044), (1, 0.042), (2, -0.016), (3, -0.001), (4, -0.019), (5, 0.056), (6, 0.035), (7, -0.006), (8, -0.015), (9, 0.008), (10, 0.012), (11, -0.034), (12, 0.002), (13, -0.016), (14, 0.002), (15, 0.062), (16, -0.028), (17, 0.006), (18, -0.012), (19, 0.013), (20, 0.009), (21, 0.058), (22, -0.013), (23, 0.042), (24, -0.063), (25, -0.015), (26, 0.009), (27, -0.006), (28, -0.027), (29, -0.043), (30, -0.026), (31, -0.004), (32, -0.012), (33, 0.001), (34, -0.039), (35, 0.02), (36, 0.05), (37, -0.004), (38, -0.052), (39, -0.024), (40, 0.011), (41, -0.043), (42, 0.007), (43, -0.021), (44, -0.058), (45, 0.017), (46, 0.005), (47, 0.046), (48, 0.021), (49, 0.019)]
simIndex simValue blogId blogTitle
same-blog 1 0.98076105 125 high scalability-2007-10-18-another approach to replication
Introduction: File replication based on erasure codes can reduce the total size of replicas by two times or more.
2 0.67595702 1483 high scalability-2013-06-27-Paper: XORing Elephants: Novel Erasure Codes for Big Data
Introduction: Erasure codes are one of those seemingly magical mathematical creations that, with the developments described in the paper XORing Elephants: Novel Erasure Codes for Big Data, are set to replace triple replication as the data storage protection mechanism of choice. The result, says Robin Harris (StorageMojo) in an excellent article, Facebook’s advanced erasure codes: "WebCos will be able to store massive amounts of data more efficiently than ever before. Bad news: so will anyone else." Robin says that with cheap disks triple replication made sense and was economical. With ever bigger BigData the overhead has become costly. But erasure codes have always suffered from unacceptably long repair times. This paper describes new Locally Repairable Codes (LRCs) that are efficiently repairable in disk I/O and bandwidth requirements: These systems are now designed to survive the loss of up to four storage elements – disks, servers, nodes or even entire data centers – without losing
3 0.61197388 889 high scalability-2010-08-30-Pomegranate - Storing Billions and Billions of Tiny Little Files
Introduction: Pomegranate is a novel distributed file system built over distributed tabular storage that acts an awful lot like a NoSQL system. It's targeted at increasing the performance of tiny object access in order to support applications like online photo and micro-blog services, which require high concurrency, high throughput, and low latency. Their tests seem to indicate it works: We have demonstrated that a file system over tabular storage performs well for highly concurrent access. In our test cluster, we observed linearly increased more than 100,000 aggregate read and write requests served per second (RPS). Rather than sitting atop the file system like almost every other K-V store, Pomegranate is baked into the file system. The idea is that the file system API is common to every platform so it wouldn't require a separate API to use. Every application could use it out of the box. The features of Pomegranate are: It handles billions of small files efficiently, even in on
4 0.60136098 53 high scalability-2007-08-01-Product: MogileFS
Introduction: MogileFS is an open source distributed filesystem. Its properties and features include: Application level, No single point of failure, Automatic file replication, Better than RAID, Flat Namespace, Shared-Nothing, No RAID required, Local filesystem agnostic.
5 0.5770368 104 high scalability-2007-10-01-SmugMug Found their Perfect Storage Array
Introduction: SmugMug's CEO & Chief Geek Don MacAskill smugly (hard to resist) gushes over finally finding, after a long and arduous quest, their "best bang-for-the-buck storage array." It's the Dell MD3000. His in-depth explanation of why he prefers the MD3000 should help anyone with their own painful storage deliberations. His key points are: The price is right; DAS via SAS, 15 spindles at 15K rpm each, 512MB of mirrored battery-backed write cache; You can disable read caching; You can disable read-ahead prefetching; The stripe sizes are configurable up to 512KB; The controller ignores host-based flush commands by default; They support an ‘Enhanced JBOD’ mode. His reasoning for the desirability of each option is astute and he even gives you the configuration options for carrying out the configuration. This is not your average CEO. Don also speculates that a three tier system using flash (system RAM + flash storage + RAID disks) is a possible future direction. Unfortunately, flash
6 0.56594574 368 high scalability-2008-08-17-Wuala - P2P Online Storage Cloud
7 0.56266546 103 high scalability-2007-09-28-Kosmos File System (KFS) is a New High End Google File System Option
8 0.54902351 971 high scalability-2011-01-10-Riak's Bitcask - A Log-Structured Hash Table for Fast Key-Value Data
9 0.54391211 19 high scalability-2007-07-16-Paper: Replication Under Scalable Hashing
10 0.52667439 278 high scalability-2008-03-16-Product: GlusterFS
11 0.52292132 756 high scalability-2009-12-30-Terrastore - Scalable, elastic, consistent document store.
12 0.52222812 50 high scalability-2007-07-31-BerkeleyDB & other distributed high performance key-value databases
13 0.50644296 1442 high scalability-2013-04-17-Tachyon - Fault Tolerant Distributed File System with 300 Times Higher Throughput than HDFS
14 0.49624699 1279 high scalability-2012-07-09-Data Replication in NoSQL Databases
15 0.49167389 20 high scalability-2007-07-16-Paper: The Clustered Storage Revolution
16 0.48728409 1236 high scalability-2012-04-30-Masstree - Much Faster than MongoDB, VoltDB, Redis, and Competitive with Memcached
17 0.4812791 705 high scalability-2009-09-16-Paper: A practical scalable distributed B-tree
18 0.47804493 1463 high scalability-2013-05-23-Paper: Calvin: Fast Distributed Transactions for Partitioned Database Systems
19 0.47797063 566 high scalability-2009-04-13-High Performance Web Pages – Real World Examples: Netflix Case Study
topicId topicWeight
[(2, 0.183), (73, 0.562)]
simIndex simValue blogId blogTitle
1 0.88672334 217 high scalability-2008-01-17-Load Balancing of web server traffic
Introduction: How to detect congestion occurrence in the network? Parameters of the load balancer?
same-blog 2 0.7700296 125 high scalability-2007-10-18-another approach to replication
Introduction: File replication based on erasure codes can reduce the total size of replicas by two times or more.
3 0.70741308 333 high scalability-2008-05-28-Webinar: Designing and Implementing Scalable Applications with Memcached and MySQL
Introduction: The following technical Webinar could be of interest to the community. WHO: Farhan "Frank" Mashraqi, Director of Business Operations and Technical Strategy, Fotolog Inc; Monty Taylor, Senior Consultant, Sun Microsystems; Jimmy Guerrero, Sr Product Marketing Manager, Sun Microsystems - Database Group. WHAT: Designing and Implementing Scalable Applications with Memcached and MySQL web presentation. WHEN: Thursday, May 29, 2008, 10:00 am PST, 1:00 pm EST, 18:00 GMT. The presentation will be approximately 45 minutes long followed by Q&A. Check out the details here!
Introduction: It's time to do something a little different and for me that doesn't mean cutting off my hair and joining a monastery, nor does it mean buying a cherry red convertible (yet), it means doing a webinar! On December 14th, 2:00 PM - 3:00 PM EST, I'll be hosting What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications . The webinar is sponsored by VoltDB, but it will be completely vendor independent, as that's the only honor preserving and technically accurate way of doing these things. The webinar will run about 60 minutes, with 40 minutes of speechifying and 20 minutes for questions. The hashtag for the event on Twitter will be SQLNoSQL . I'll be monitoring that hashtag if you have any suggestions for the webinar or if you would like to ask questions during the webinar. The motivation for me to do the webinar was a talk I had with another audience member at the NoSQL Evening in Palo Alto . He said he came from a Java background and was confused ab
6 0.54850364 471 high scalability-2008-12-19-Gigaspaces curbs latency outliers with Java Real Time
7 0.5387876 1587 high scalability-2014-01-29-10 Things Bitly Should Have Monitored
8 0.49381578 1196 high scalability-2012-02-20-Berkeley DB Architecture - NoSQL Before NoSQL was Cool
9 0.47010213 1175 high scalability-2012-01-17-Paper: Feeding Frenzy: Selectively Materializing Users’ Event Feeds
10 0.42534661 795 high scalability-2010-03-16-1 Billion Reasons Why Adobe Chose HBase
11 0.416035 284 high scalability-2008-03-19-RAD Lab is Creating a Datacenter Operating System
12 0.39045292 33 high scalability-2007-07-26-ThemBid Architecture
13 0.38494593 980 high scalability-2011-01-28-Stuff The Internet Says On Scalability For January 28, 2011
14 0.36942622 192 high scalability-2007-12-25-IBMer Says LAMP Can't Scale
15 0.35783347 9 high scalability-2007-07-15-Blog: Occam’s Razor by Avinash Kaushik
16 0.34586582 986 high scalability-2011-02-10-Database Isolation Levels And Their Effects on Performance and Scalability
17 0.33784211 1642 high scalability-2014-05-02-Stuff The Internet Says On Scalability For May 2nd, 2014
18 0.33249477 1313 high scalability-2012-08-28-Making Hadoop Run Faster
19 0.30893159 56 high scalability-2007-08-03-Running Hadoop MapReduce on Amazon EC2 and Amazon S3
20 0.30893159 565 high scalability-2009-04-13-Benchmark for keeping data in browser in AJAX projects