high_scalability high_scalability-2010 high_scalability-2010-795 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Cosmin Lehene wrote two excellent articles on Adobe's experiences with HBase: Why we’re using HBase: Part 1 and Why we’re using HBase: Part 2. Adobe needed a generic, real-time, structured data storage and processing system that could handle any data volume, with access times under 50ms, with no downtime and no data loss. The articles go into great detail about their experiences with HBase and their evaluation process, providing a "well reasoned impartial use case from a commercial user". They cover failure handling, availability, write performance, read performance, random reads, sequential scans, and consistency. One of the knocks against HBase has been its complexity, as it has many parts that need installation and configuration. All is not lost according to the Adobe team: HBase is more complex than other systems (you need Hadoop and Zookeeper, and cluster machines have multiple roles). We believe that for HBase this is not accidental complexity, and that the argument that “HBase is not a good choice because it is complex” is irrelevant.
sentIndex sentText sentNum sentScore
1 Cosmin Lehene wrote two excellent articles on Adobe's experiences with HBase: Why we’re using HBase: Part 1 and Why we’re using HBase: Part 2. [sent-1, score-0.23]
2 Adobe needed a generic, real-time, structured data storage and processing system that could handle any data volume, with access times under 50ms, with no downtime and no data loss. [sent-2, score-0.226]
3 The article goes into great detail about their experiences with HBase and their evaluation process, providing a "well reasoned impartial use case from a commercial user". [sent-3, score-0.707]
4 It talks about failure handling, availability, write performance, read performance, random reads, sequential scans, and consistency. [sent-4, score-0.197]
5 One of the knocks against HBase has been its complexity, as it has many parts that need installation and configuration. [sent-5, score-0.219]
6 All is not lost according to the Adobe team: HBase is more complex than other systems (you need Hadoop and Zookeeper, and cluster machines have multiple roles). [sent-6, score-0.329]
7 We believe that for HBase, this is not accidental complexity and that the argument that “HBase is not a good choice because it is complex” is irrelevant. [sent-7, score-0.345]
8 Relying on decoupled components plays nice with the Unix philosophy: do one thing and do it well. [sent-9, score-0.206]
9 Distributed storage is delegated to HDFS, as is distributed processing; cluster state goes to Zookeeper (a client-side sketch of this delegation follows the list). [sent-10, score-0.327]
10 All these systems are developed and tested separately, and are good at what they do. [sent-11, score-0.075]
11 More than that, this allows you to scale your cluster on separate vectors. [sent-12, score-0.188]
12 This is not optimal, but it allows for incremental investment in spindles, CPU, or RAM. [sent-13, score-0.157]
13 Highly recommended, especially if you need some sort of balance to the recent gush of Cassandra articles. [sent-15, score-0.125]
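The delegation described in sentences 9 and 10 is visible even at the client level: an HBase client is configured with only the Zookeeper quorum and discovers the master and region servers from the cluster state kept there. Below is a minimal sketch using the standard HBase Java client API (HBase 1.0+); the host names and the "events" table are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        // The client is pointed only at Zookeeper; region locations, the
        // master address, and other cluster state are discovered from there.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("events"))) {
            // Row-level write...
            Put put = new Put(Bytes.toBytes("row-0001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("hello"));
            table.put(put);

            // ...and random read by key, the sub-50ms access pattern Adobe needed.
            Result result = table.get(new Get(Bytes.toBytes("row-0001")));
            System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("d"), Bytes.toBytes("payload"))));
        }
    }
}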
wordName wordTfidf (topN-words)
[('hbase', 0.573), ('adobe', 0.373), ('accidental', 0.153), ('cosmin', 0.153), ('impartial', 0.153), ('knocks', 0.137), ('spindles', 0.128), ('outweigh', 0.128), ('delegated', 0.128), ('reasoned', 0.128), ('scans', 0.121), ('decoupled', 0.121), ('experiences', 0.119), ('articles', 0.111), ('cluster', 0.11), ('complexity', 0.108), ('separately', 0.103), ('recommended', 0.097), ('unix', 0.094), ('philosophy', 0.092), ('roles', 0.091), ('relying', 0.091), ('hdfs', 0.089), ('goes', 0.089), ('zookeeper', 0.086), ('plays', 0.085), ('argument', 0.084), ('installation', 0.082), ('commercial', 0.081), ('downtime', 0.079), ('incremental', 0.079), ('processing', 0.078), ('allows', 0.078), ('sequential', 0.077), ('tested', 0.075), ('complex', 0.074), ('according', 0.073), ('lost', 0.072), ('wrote', 0.072), ('evaluation', 0.07), ('structured', 0.069), ('volume', 0.067), ('detail', 0.067), ('advantages', 0.066), ('balance', 0.065), ('part', 0.064), ('optimal', 0.061), ('random', 0.061), ('recent', 0.06), ('talks', 0.059)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 795 high scalability-2010-03-16-1 Billion Reasons Why Adobe Chose HBase
Introduction: You may have read somewhere that Facebook has introduced a new Social Inbox integrating email, IM, SMS, text messages, and on-site Facebook messages. All in all they need to store over 135 billion messages a month. Where do they store all that stuff? Facebook's Kannan Muthukkaruppan gives the surprise answer in The Underlying Technology of Messages: HBase. HBase beat out MySQL, Cassandra, and a few others. Why a surprise? Facebook created Cassandra and it was purpose-built for an inbox type application, but they found Cassandra's eventual consistency model wasn't a good match for their new real-time Messages product. Facebook also has an extensive MySQL infrastructure, but they found performance suffered as data sets and indexes grew larger. And they could have built their own, but they chose HBase. HBase is a scale-out table store supporting very high rates of row-level updates over massive amounts of data. Exactly what is needed for a Messaging system. HBase is also a colu
Introduction: This is a guest post by Doug Judd, original creator of Hypertable and the CEO of Hypertable, Inc. Hypertable delivers 2X better throughput in most tests -- HBase fails 41 and 167 billion record insert tests, overwhelmed by garbage collection -- Both systems deliver similar results for random read uniform test. We recently conducted a test comparing the performance of Hypertable (@hypertable) version 0.9.5.5 to that of HBase (@HBase) version 0.90.4 (CDH3u2) running Zookeeper 3.3.4. In this post, we summarize the results and offer explanations for the discrepancies. For the full test report, see Hypertable vs. HBase II. Introduction Hypertable and HBase are both open source, scalable databases modeled after Google's proprietary Bigtable database. The primary difference between the two systems is that Hypertable is written in C++, while HBase is written in Java. We modeled this test after the one described in section 7 of the Bigtable paper and tuned both systems fo
4 0.2078957 1109 high scalability-2011-09-02-Stuff The Internet Says On Scalability For September 2, 2011
Introduction: Scale the modern way / No brush / No lather / No rub-in / Big tube 35 cents - Drug stores / HighScalability: 8868 Tweets per second during VMAs; Facebook: 250 million photos uploaded each day; Earth: 7 Billion People Strong. Potent quotables: @kevinweil: Wow, 8868 Tweets per second last night during the #VMAs. And that's just the writes -- imagine how many reads we were doing! @tristanbergh: #NoSQL isn't cool, it's a working kludge of existing architectures, bowing to the current tech limits, not transcending them. @krishnan: I would love to switch the backend infra to Amazon anytime but our top 20 customers will not allow us. @ianozsvald: Learning about all the horrible things that happen when you don't plan (@socialtiesapp) for scalability. Trying to be creative now... After a particularly difficult Jeopardy match, Watson asked IBM to make him a new cognitive chip so he could conti
5 0.19230427 650 high scalability-2009-07-02-Product: Hbase
Introduction: Update 3: Presentation from the NoSQL Conference: slides, video. Update 2: Jim Wilson helps with Understanding HBase and BigTable by explaining them from a "conceptual standpoint." Update: InfoQ interview: HBase Leads Discuss Hadoop, BigTable and Distributed Databases. "MapReduce (both Google's and Hadoop's) is ideal for processing huge amounts of data with sizes that would not fit in a traditional database. Neither is appropriate for transaction/single request processing." HBase is the open source answer to BigTable, Google's highly scalable distributed database. It is built on top of Hadoop (product), which implements functionality similar to Google's GFS and Map/Reduce systems. Both Google's GFS and Hadoop's HDFS provide a mechanism to reliably store large amounts of data. However, there is not really a mechanism for organizing the data and accessing only the parts that are of interest to a particular application. Bigtable (and HBase) provide a means for
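That last point, organizing data so an application touches only the parts it cares about, maps onto HBase's sorted-row-key model: a scan over a key range reads one contiguous slice of the table rather than the whole thing. A sketch, assuming the HBase 2.x Java client and a made-up "entityId#" row-key scheme:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class EntityScanSketch {
    // Print every row key in one entity's slice of a sorted HBase table.
    // Rows are stored sorted by key, so prefixing keys with an entity id
    // ("user123#...") keeps one entity's rows physically contiguous.
    static void scanEntity(Table table, String entityId) throws IOException {
        Scan scan = new Scan()
            .withStartRow(Bytes.toBytes(entityId + "#"))  // first possible key
            .withStopRow(Bytes.toBytes(entityId + "$"));  // '$' sorts just after '#'
        try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result row : scanner) {
                System.out.println(Bytes.toString(row.getRow()));
            }
        }
    }
}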
6 0.15722568 1000 high scalability-2011-03-08-Medialets Architecture - Defeating the Daunting Mobile Device Data Deluge
7 0.15258594 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
8 0.14138062 1020 high scalability-2011-04-12-Caching and Processing 2TB Mozilla Crash Reports in memory with Hazelcast
9 0.13550764 1191 high scalability-2012-02-13-Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter
11 0.13399251 1461 high scalability-2013-05-20-The Tumblr Architecture Yahoo Bought for a Cool Billion Dollars
12 0.12839848 1297 high scalability-2012-08-03-Stuff The Internet Says On Scalability For August 3, 2012
13 0.12723155 1151 high scalability-2011-12-05-Stuff The Internet Says On Scalability For December 5, 2011
15 0.11412726 1375 high scalability-2012-12-21-Stuff The Internet Says On Scalability For December 21, 2012
16 0.11258804 1262 high scalability-2012-06-11-Monday Fun: Seven Databases in Song
17 0.10077517 666 high scalability-2009-07-30-Learn How to Think at Scale
18 0.1002063 350 high scalability-2008-07-15-ZooKeeper - A Reliable, Scalable Distributed Coordination System
19 0.098288015 848 high scalability-2010-06-25-Hot Scalability Links for June 25, 2010
20 0.09669055 1499 high scalability-2013-08-09-Stuff The Internet Says On Scalability For August 9, 2013
topicId topicWeight
[(0, 0.124), (1, 0.079), (2, -0.004), (3, 0.042), (4, 0.031), (5, 0.081), (6, 0.02), (7, -0.013), (8, 0.072), (9, 0.052), (10, 0.043), (11, 0.05), (12, 0.067), (13, -0.084), (14, -0.029), (15, 0.06), (16, 0.001), (17, -0.081), (18, -0.088), (19, -0.041), (20, -0.011), (21, 0.054), (22, -0.031), (23, -0.042), (24, -0.026), (25, -0.041), (26, 0.083), (27, 0.017), (28, -0.046), (29, -0.014), (30, -0.002), (31, 0.101), (32, 0.087), (33, -0.057), (34, 0.015), (35, 0.077), (36, 0.002), (37, 0.049), (38, 0.006), (39, -0.012), (40, 0.016), (41, 0.025), (42, 0.031), (43, 0.054), (44, -0.015), (45, 0.058), (46, 0.003), (47, 0.015), (48, 0.019), (49, -0.021)]
simIndex simValue blogId blogTitle
same-blog 1 0.94741893 795 high scalability-2010-03-16-1 Billion Reasons Why Adobe Chose HBase
3 0.73672104 647 high scalability-2009-07-02-Hypertable is a New BigTable Clone that Runs on HDFS or KFS
Introduction: Update 3: Presentation from the NoSQL conference: slides, video 1, video 2. Update 2: The folks at Hypertable would like you to know that Hypertable is now officially sponsored by Baidu, China’s Leading Search Engine. As a sponsor of Hypertable, Baidu has committed an industrious team of engineers, numerous servers, and support resources to improve the quality and development of the open source technology. Update: InfoQ interview on Hypertable Lead Discusses Hadoop and Distributed Databases. Hypertable differs from HBase in that it is a higher performance implementation of Bigtable. Skrentablog gives the heads up on Hypertable, Zvents' open-source BigTable clone. It's written in C++ and can run on top of either HDFS or KFS. Performance looks encouraging at 28M rows of data inserted at a per-node write rate of 7mb/sec.
5 0.63847792 1020 high scalability-2011-04-12-Caching and Processing 2TB Mozilla Crash Reports in memory with Hazelcast
Introduction: Mozilla processes TBs of Firefox crash reports daily using HBase, Hadoop, Python and the Thrift protocol. The project is called Socorro, a system for collecting, processing, and displaying crash reports from clients. Today the Socorro application stores about 2.6 million crash reports per day. During peak traffic, it receives about 2.5K crashes per minute. In this article we are going to demonstrate a proof of concept showing how Mozilla could integrate Hazelcast into Socorro and achieve caching and processing of 2TB of crash reports with a 50 node Hazelcast cluster. The video for the demo is available here. Currently, Socorro has pythonic collectors, processors, and middleware that communicate with HBase via the Thrift protocol. One of the biggest limitations of the current architecture is that it is very sensitive to latency or outages on the HBase side. If the collectors cannot store an item in HBase then they will store it on local disk and it will not be accessible to th
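To make the proof of concept concrete, here is a rough sketch of the Hazelcast side (not Socorro's actual code; the map name and key are invented): each node that starts an instance joins the cluster, and entries put into a distributed map are partitioned across all members, so collectors could park crash reports there when HBase is slow or down.

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import java.util.Map;

public class CrashCacheSketch {
    public static void main(String[] args) {
        // Starting an instance joins (or forms) the Hazelcast cluster;
        // the map's entries are partitioned across all member nodes.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        Map<String, byte[]> crashes = hz.getMap("crash-reports"); // hypothetical map name
        crashes.put("crash-id-2a7f", new byte[]{ /* serialized report bytes */ });
        byte[] report = crashes.get("crash-id-2a7f");
        System.out.println("cached bytes: " + (report == null ? 0 : report.length));
    }
}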
6 0.62873191 650 high scalability-2009-07-02-Product: Hbase
7 0.6163137 1242 high scalability-2012-05-09-Cell Architectures
8 0.61229724 649 high scalability-2009-07-02-Product: Facebook's Cassandra - A Massive Distributed Store
9 0.6004945 1109 high scalability-2011-09-02-Stuff The Internet Says On Scalability For September 2, 2011
10 0.5927397 1461 high scalability-2013-05-20-The Tumblr Architecture Yahoo Bought for a Cool Billion Dollars
11 0.5920608 1191 high scalability-2012-02-13-Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter
13 0.5869509 1297 high scalability-2012-08-03-Stuff The Internet Says On Scalability For August 3, 2012
14 0.58409852 1265 high scalability-2012-06-15-Stuff The Internet Says On Scalability For June 15, 2012
15 0.57687175 1042 high scalability-2011-05-17-Facebook: An Example Canonical Architecture for Scaling Billions of Messages
16 0.57163972 254 high scalability-2008-02-19-Hadoop Getting Closer to 1.0 Release
17 0.56479001 601 high scalability-2009-05-17-Product: Hadoop
18 0.55913758 651 high scalability-2009-07-02-Product: Project Voldemort - A Distributed Database
19 0.55518615 1076 high scalability-2011-07-08-Stuff The Internet Says On Scalability For July 8, 2011
20 0.55241919 732 high scalability-2009-10-29-Digg - Looking to the Future with Cassandra
topicId topicWeight
[(1, 0.067), (2, 0.226), (10, 0.083), (30, 0.032), (61, 0.143), (73, 0.244), (79, 0.095)]
simIndex simValue blogId blogTitle
1 0.93538195 125 high scalability-2007-10-18-another approach to replication
Introduction: File replication based on erasure codes can cut total replica storage by a factor of two or more.
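The arithmetic behind that claim, under illustrative parameters: 3-way replication stores 3x the logical data, while a (k data + m parity) erasure code stores (k+m)/k, so k=10, m=4 gives 1.4x raw usage while still tolerating the loss of any 4 chunks. A tiny sketch:

// Raw-to-logical storage ratio: replication vs. a (k data + m parity) erasure code.
public class StorageOverhead {
    public static void main(String[] args) {
        int replicas = 3;           // classic 3-way replication
        int k = 10, m = 4;          // illustrative erasure-code parameters
        double replicationRatio = replicas;          // 3.0x
        double erasureRatio = (k + m) / (double) k;  // 1.4x
        System.out.printf("replication: %.1fx, erasure (%d,%d): %.2fx, savings: %.1fx%n",
            replicationRatio, k, m, erasureRatio, replicationRatio / erasureRatio);
    }
}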
Introduction: It's time to do something a little different, and for me that doesn't mean cutting off my hair and joining a monastery, nor does it mean buying a cherry red convertible (yet); it means doing a webinar! On December 14th, 2:00 PM - 3:00 PM EST, I'll be hosting What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications. The webinar is sponsored by VoltDB, but it will be completely vendor independent, as that's the only honor-preserving and technically accurate way of doing these things. The webinar will run about 60 minutes, with 40 minutes of speechifying and 20 minutes for questions. The hashtag for the event on Twitter will be SQLNoSQL. I'll be monitoring that hashtag if you have any suggestions for the webinar or if you would like to ask questions during the webinar. The motivation for me to do the webinar was a talk I had with another audience member at the NoSQL Evening in Palo Alto. He said he came from a Java background and was confused ab
4 0.8792671 1587 high scalability-2014-01-29-10 Things Bitly Should Have Monitored
Introduction: Monitor, monitor, monitor. That's the advice every startup gives once they reach a certain size. But can you ever monitor enough? If you are Bitly and everyone will complain when you are down, probably not. Here are 10 Things We Forgot to Monitor from Bitly, along with good stories and copious amounts of code snippets. Well worth reading, especially after you've already started monitoring the lower-hanging fruit. An interesting revelation from the article is that: We run bitly split across two data centers, one is a managed environment with DELL hardware, and the second is Amazon EC2. Fork Rate. A strange configuration issue caused processes to be created at a rate of several hundred a second rather than the expected 1-10/second. Flow control packets. A network configuration that honors flow control packets and isn't configured to disable them can temporarily cause dropped traffic. Swap In/Out Rate. Measure the right thing. It's the rate memory is swapped
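To make the fork-rate item concrete: on Linux the cumulative fork count since boot is the "processes" field of /proc/stat, so a monitor can sample it twice and divide by the interval. A minimal sketch (the /proc format is standard Linux; the 10-second window and alert threshold are made up):

import java.nio.file.Files;
import java.nio.file.Paths;

public class ForkRateSketch {
    // Cumulative number of forks since boot, from the "processes" line of /proc/stat.
    static long totalForks() throws Exception {
        for (String line : Files.readAllLines(Paths.get("/proc/stat"))) {
            if (line.startsWith("processes ")) {
                return Long.parseLong(line.trim().split("\\s+")[1]);
            }
        }
        throw new IllegalStateException("no 'processes' line in /proc/stat");
    }

    public static void main(String[] args) throws Exception {
        long before = totalForks();
        Thread.sleep(10_000);                      // 10-second sampling window
        double perSecond = (totalForks() - before) / 10.0;
        System.out.printf("fork rate: %.1f/sec%n", perSecond);
        if (perSecond > 100) {                     // hypothetical alert threshold
            System.err.println("fork rate way above the expected 1-10/sec");
        }
    }
}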
same-blog 5 0.87094474 795 high scalability-2010-03-16-1 Billion Reasons Why Adobe Chose HBase
6 0.85046476 217 high scalability-2008-01-17-Load Balancing of web server traffic
7 0.84504205 1175 high scalability-2012-01-17-Paper: Feeding Frenzy: Selectively Materializing Users’ Event Feeds
8 0.83685952 1196 high scalability-2012-02-20-Berkeley DB Architecture - NoSQL Before NoSQL was Cool
9 0.82122618 980 high scalability-2011-01-28-Stuff The Internet Says On Scalability For January 28, 2011
10 0.81973439 33 high scalability-2007-07-26-ThemBid Architecture
11 0.79674029 1642 high scalability-2014-05-02-Stuff The Internet Says On Scalability For May 2nd, 2014
12 0.78965819 471 high scalability-2008-12-19-Gigaspaces curbs latency outliers with Java Real Time
13 0.77936643 709 high scalability-2009-09-19-Space Based Programming in .NET
14 0.77465892 181 high scalability-2007-12-11-Hosting and CDN for startup video sharing site
15 0.76964957 986 high scalability-2011-02-10-Database Isolation Levels And Their Effects on Performance and Scalability
16 0.76521957 1313 high scalability-2012-08-28-Making Hadoop Run Faster
17 0.76442689 1337 high scalability-2012-10-10-Antirez: You Need to Think in Terms of Organizing Your Data for Fetching
18 0.76196533 284 high scalability-2008-03-19-RAD Lab is Creating a Datacenter Operating System
19 0.759278 1183 high scalability-2012-01-30-37signals Still Happily Scaling on Moore RAM and SSDs
20 0.75785011 1291 high scalability-2012-07-25-Vertical Scaling Ascendant - How are SSDs Changing Architectures?