high_scalability high_scalability-2010 high_scalability-2010-795 knowledge-graph by maker-knowledge-mining

795 high scalability-2010-03-16-1 Billion Reasons Why Adobe Chose HBase


meta info for this blog

Source: html

Introduction: Cosmin Lehene wrote two excellent articles on Adobe's experiences with HBase: Why we’re using HBase: Part 1 and Why we’re using HBase: Part 2. Adobe needed a generic, real-time, structured data storage and processing system that could handle any data volume, with access times under 50ms, with no downtime and no data loss. The article goes into great detail about their experiences with HBase and their evaluation process, providing a "well reasoned impartial use case from a commercial user". It talks about failure handling, availability, write performance, read performance, random reads, sequential scans, and consistency. One of the knocks against HBase has been its complexity, as it has many parts that need installation and configuration. All is not lost according to the Adobe team: HBase is more complex than other systems (you need Hadoop, Zookeeper, cluster machines have multiple roles). We believe that for HBase, this is not accidental complexity and that the argu


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Cosmin Lehene wrote two excellent articles on Adobe's experiences with HBase: Why we’re using HBase: Part 1 and Why we’re using HBase: Part 2. [sent-1, score-0.23]

2 Adobe needed a generic, real-time, structured data storage and processing system that could handle any data volume, with access times under 50ms, with no downtime and no data loss. [sent-2, score-0.226]

3 The article goes into great detail about their experiences with HBase and their evaluation process, providing a "well reasoned impartial use case from a commercial user". [sent-3, score-0.707]

4 It talks about failure handling, availability, write performance, read performance, random reads, sequential scans, and consistency. [sent-4, score-0.197]

5 One of the knocks against HBase has been its complexity, as it has many parts that need installation and configuration. [sent-5, score-0.219]

6 All is not lost according to the Adobe team: HBase is more complex than other systems (you need Hadoop, Zookeeper, cluster machines have multiple roles). [sent-6, score-0.329]

7 We believe that for HBase, this is not accidental complexity and that the argument that “HBase is not a good choice because it is complex” is irrelevant. [sent-7, score-0.345]

8 Relying on decoupled components plays nice with the Unix philosophy: do one thing and do it well. [sent-9, score-0.206]

9 Distributed storage is delegated to HDFS, as is distributed processing; cluster state goes to Zookeeper. [sent-10, score-0.327]

10 All these systems are developed and tested separately, and are good at what they do. [sent-11, score-0.075]

11 More than that, this allows you to scale your cluster on separate vectors. [sent-12, score-0.188]

12 This is not optimal, but it allows for incremental investment in either spindles, CPU, or RAM. [sent-13, score-0.157]

13 Highly recommended, especially if you need some sort of balance to the recent gush of Cassandra articles. [sent-15, score-0.125]
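The sentScore values in this list come from a tfidf-style extractive summarizer, but the exact pipeline is not shown on this page. Below is a minimal sketch of one plausible scoring scheme, assuming scikit-learn and a handful of placeholder sentences rather than the real corpus; it will not reproduce the numbers above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder sentences standing in for the blog introduction (assumption:
# the real sentence set and weighting scheme are not given on this page).
sentences = [
    "Cosmin Lehene wrote two excellent articles on Adobe's experiences with HBase.",
    "Adobe needed a real-time, structured data storage and processing system.",
    "Distributed storage is delegated to HDFS; cluster state goes to Zookeeper.",
    "Highly recommended, especially as a balance to the recent Cassandra articles.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(sentences)  # one row per sentence

# Score each sentence by the mean tf-idf weight of its matched terms,
# then rank; the top-ranked sentences form the extractive summary.
term_sums = np.asarray(tfidf.sum(axis=1)).ravel()
term_counts = np.asarray((tfidf != 0).sum(axis=1)).ravel()
scores = term_sums / term_counts

for score, sent in sorted(zip(scores, sentences), reverse=True):
    print(round(float(score), 3), sent)
```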


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('hbase', 0.573), ('adobe', 0.373), ('accidental', 0.153), ('cosmin', 0.153), ('impartial', 0.153), ('knocks', 0.137), ('spindles', 0.128), ('outweigh', 0.128), ('delegated', 0.128), ('reasoned', 0.128), ('scans', 0.121), ('decoupled', 0.121), ('experiences', 0.119), ('articles', 0.111), ('cluster', 0.11), ('complexity', 0.108), ('separately', 0.103), ('recommended', 0.097), ('unix', 0.094), ('philosophy', 0.092), ('roles', 0.091), ('relying', 0.091), ('hdfs', 0.089), ('goes', 0.089), ('zookeeper', 0.086), ('plays', 0.085), ('argument', 0.084), ('installation', 0.082), ('commercial', 0.081), ('downtime', 0.079), ('incremental', 0.079), ('processing', 0.078), ('allows', 0.078), ('sequential', 0.077), ('tested', 0.075), ('complex', 0.074), ('according', 0.073), ('lost', 0.072), ('wrote', 0.072), ('evaluation', 0.07), ('structured', 0.069), ('volume', 0.067), ('detail', 0.067), ('advantages', 0.066), ('balance', 0.065), ('part', 0.064), ('optimal', 0.061), ('random', 0.061), ('recent', 0.06), ('talks', 0.059)]
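These term weights, and the simValue scores in the list below, are consistent with a tfidf vector-space model plus cosine similarity over the blog corpus. Here is a minimal sketch of how such numbers could be produced, assuming scikit-learn and placeholder documents; it is not the pipeline that generated the figures on this page.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder documents standing in for the blog corpus (assumption: the
# real post texts are not included on this page).
docs = [
    "Adobe HBase Hadoop Zookeeper HDFS cluster complexity decoupled components",
    "Facebook HBase messages Cassandra MySQL real-time inbox billions",
    "Hypertable HBase Bigtable garbage collection performance test Java",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(docs)

# Top-weighted terms for the first document, analogous to the
# wordName/wordTfidf list above.
terms = vectorizer.get_feature_names_out()
weights = matrix[0].toarray().ravel()
print(sorted(zip(terms, weights), key=lambda tw: -tw[1])[:5])

# Pairwise cosine similarity, analogous to the simValue column below.
print(cosine_similarity(matrix).round(3))
```

Because TfidfVectorizer L2-normalizes each document vector by default, cosine similarity stays in [0, 1] and a document scores roughly 1.0 against itself, which is why the same-blog entry below sits at ~1.0 (the small float excess is rounding).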

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 795 high scalability-2010-03-16-1 Billion Reasons Why Adobe Chose HBase

Introduction: Cosmin Lehene wrote two excellent articles on Adobe's experiences with HBase: Why we’re using HBase: Part 1 and Why we’re using HBase: Part 2. Adobe needed a generic, real-time, structured data storage and processing system that could handle any data volume, with access times under 50ms, with no downtime and no data loss. The article goes into great detail about their experiences with HBase and their evaluation process, providing a "well reasoned impartial use case from a commercial user". It talks about failure handling, availability, write performance, read performance, random reads, sequential scans, and consistency. One of the knocks against HBase has been its complexity, as it has many parts that need installation and configuration. All is not lost according to the Adobe team: HBase is more complex than other systems (you need Hadoop, Zookeeper, cluster machines have multiple roles). We believe that for HBase, this is not accidental complexity and that the argu

2 0.38925591 943 high scalability-2010-11-16-Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month

Introduction: You may have read somewhere that Facebook has introduced a new Social Inbox  integrating email, IM, SMS,  text messages, on-site Facebook messages. All-in-all they need to store over 135 billion messages a month. Where do they store all that stuff? Facebook's Kannan Muthukkaruppan gives the surprise answer in The Underlying Technology of Messages : HBase . HBase beat out MySQL, Cassandra, and a few others. Why a surprise? Facebook created Cassandra and it was purpose built for an inbox type application, but they found Cassandra's eventual consistency model wasn't a good match for their new real-time Messages product. Facebook also has an extensive MySQL infrastructure , but they found performance suffered as data set and indexes grew larger. And they could have built their own, but they chose HBase. HBase is a scaleout table store supporting very high rates of row-level updates over massive amounts of data . Exactly what is needed for a Messaging system. HBase is also a colu

3 0.28570682 1189 high scalability-2012-02-07-Hypertable Routs HBase in Performance Test -- HBase Overwhelmed by Garbage Collection

Introduction: This is a guest post by Doug Judd , original creator of Hypertable and the CEO of Hypertable, Inc. Hypertable delivers 2X better throughput in most tests -- HBase fails 41 and 167 billion record insert tests, overwhelmed by garbage collection -- Both systems deliver similar results for random read uniform test We recently conducted a test comparing the performance of Hypertable ( @hypertable ) version 0.9.5.5 to that of HBase ( @HBase ) version 0.90.4 (CDH3u2) running Zookeeper 3.3.4.  In this post, we summarize the results and offer explanations for the discrepancies. For the full test report, see Hypertable vs. HBase II . Introduction Hypertable and HBase are both open source, scalable databases modeled after Google's proprietary Bigtable database.  The primary difference between the two systems is that Hypertable is written in C++, while HBase is written in Java.  We modeled this test after the one described in section 7 of the Bigtable paper and tuned both systems fo

4 0.2078957 1109 high scalability-2011-09-02-Stuff The Internet Says On Scalability For September 2, 2011

Introduction: Scale the modern way / No brush / No lather / No rub-in / Big tube 35 cents - Drug stores / HighScalability: 8868 Tweets per second during VMAs ; Facebook: 250 million photos uploaded each day ; Earth: 7 Billion People Strong Potent quotables: @kevinweil : Wow, 8868 Tweets per second last night during the #VMAs. And that's just the writes -- imagine how many reads we were doing! @tristanbergh : #NoSQL isn't cool, it's a working kludge of existing architectures, bowing to the current tech limits, not transcending them @krishnan : I would love to switch the backend infra to Amazon anytime but our top 20 customers will not allow us  @ianozsvald : Learning about all the horrible things that happen when you don't plan (@socialtiesapp) for scalability. Trying to be creative now... After a particularly difficult Jeopardy match, Watson asked IBM to make him a new  cognitive chip  so he could conti

5 0.19230427 650 high scalability-2009-07-02-Product: Hbase

Introduction: Update 3: Presentation from the NoSQL Conference : slides , video . Update 2: Jim Wilson helps with the Understanding HBase and BigTable by explaining them from a "conceptual standpoint." Update: InfoQ interview: HBase Leads Discuss Hadoop, BigTable and Distributed Databases . "MapReduce (both Google's and Hadoop's) is ideal for processing huge amounts of data with sizes that would not fit in a traditional database. Neither is appropriate for transaction/single request processing." Hbase is the open source answer to BigTable, Google's highly scalable distributed database. It is built on top of Hadoop ( product ), which implements functionality similar to Google's GFS and Map/Reduce systems.  Both Google's GFS and Hadoop's HDFS provide a mechanism to reliably store large amounts of data. However, there is not really a mechanism for organizing the data and accessing only the parts that are of interest to a particular application. Bigtable (and Hbase) provide a means for

6 0.15722568 1000 high scalability-2011-03-08-Medialets Architecture - Defeating the Daunting Mobile Device Data Deluge

7 0.15258594 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day

8 0.14138062 1020 high scalability-2011-04-12-Caching and Processing 2TB Mozilla Crash Reports in memory with Hazelcast

9 0.13550764 1191 high scalability-2012-02-13-Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter

10 0.13550764 1360 high scalability-2012-11-19-Gone Fishin': Tumblr Architecture - 15 Billion Page Views A Month And Harder To Scale Than Twitter

11 0.13399251 1461 high scalability-2013-05-20-The Tumblr Architecture Yahoo Bought for a Cool Billion Dollars

12 0.12839848 1297 high scalability-2012-08-03-Stuff The Internet Says On Scalability For August 3, 2012

13 0.12723155 1151 high scalability-2011-12-05-Stuff The Internet Says On Scalability For December 5, 2011

14 0.12708786 1586 high scalability-2014-01-28-How Next Big Sound Tracks Over a Trillion Song Plays, Likes, and More Using a Version Control System for Hadoop Data

15 0.11412726 1375 high scalability-2012-12-21-Stuff The Internet Says On Scalability For December 21, 2012

16 0.11258804 1262 high scalability-2012-06-11-Monday Fun: Seven Databases in Song

17 0.10077517 666 high scalability-2009-07-30-Learn How to Think at Scale

18 0.1002063 350 high scalability-2008-07-15-ZooKeeper - A Reliable, Scalable Distributed Coordination System

19 0.098288015 848 high scalability-2010-06-25-Hot Scalability Links for June 25, 2010

20 0.09669055 1499 high scalability-2013-08-09-Stuff The Internet Says On Scalability For August 9, 2013


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.124), (1, 0.079), (2, -0.004), (3, 0.042), (4, 0.031), (5, 0.081), (6, 0.02), (7, -0.013), (8, 0.072), (9, 0.052), (10, 0.043), (11, 0.05), (12, 0.067), (13, -0.084), (14, -0.029), (15, 0.06), (16, 0.001), (17, -0.081), (18, -0.088), (19, -0.041), (20, -0.011), (21, 0.054), (22, -0.031), (23, -0.042), (24, -0.026), (25, -0.041), (26, 0.083), (27, 0.017), (28, -0.046), (29, -0.014), (30, -0.002), (31, 0.101), (32, 0.087), (33, -0.057), (34, 0.015), (35, 0.077), (36, 0.002), (37, 0.049), (38, 0.006), (39, -0.012), (40, 0.016), (41, 0.025), (42, 0.031), (43, 0.054), (44, -0.015), (45, 0.058), (46, 0.003), (47, 0.015), (48, 0.019), (49, -0.021)]
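The (topicId, topicWeight) pairs above are the blog's coordinates in a latent semantic indexing (LSI) space, i.e. a low-rank projection of the term-document matrix. Below is a minimal sketch, assuming gensim and placeholder documents rather than the real corpus and model.

```python
from gensim import corpora, models, similarities

# Placeholder tokenized posts (assumption: the real corpus and the trained
# LSI model behind the topicWeight values above are not included here).
texts = [
    "adobe hbase hadoop zookeeper hdfs cluster complexity".split(),
    "facebook hbase messages cassandra mysql inbox".split(),
    "hypertable hbase bigtable performance java garbage collection".split(),
]

dictionary = corpora.Dictionary(texts)
bow_corpus = [dictionary.doc2bow(text) for text in texts]

# Project the bag-of-words corpus into a low-rank LSI space; each document
# becomes a dense list of (topicId, topicWeight) pairs.
lsi = models.LsiModel(bow_corpus, id2word=dictionary, num_topics=2)
print(lsi[bow_corpus[0]])

# Cosine similarity in LSI space, analogous to the simValue column below.
index = similarities.MatrixSimilarity(lsi[bow_corpus])
print(list(index[lsi[bow_corpus[0]]]))
```

Unlike raw tfidf weights, LSI coordinates can be negative, which is why several of the topicWeight values above are below zero.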

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94741893 795 high scalability-2010-03-16-1 Billion Reasons Why Adobe Chose HBase

Introduction: Cosmin Lehene wrote two excellent articles on Adobe's experiences with HBase: Why we’re using HBase: Part 1 and Why we’re using HBase: Part 2. Adobe needed a generic, real-time, structured data storage and processing system that could handle any data volume, with access times under 50ms, with no downtime and no data loss. The article goes into great detail about their experiences with HBase and their evaluation process, providing a "well reasoned impartial use case from a commercial user". It talks about failure handling, availability, write performance, read performance, random reads, sequential scans, and consistency. One of the knocks against HBase has been its complexity, as it has many parts that need installation and configuration. All is not lost according to the Adobe team: HBase is more complex than other systems (you need Hadoop, Zookeeper, cluster machines have multiple roles). We believe that for HBase, this is not accidental complexity and that the argu

2 0.75280648 943 high scalability-2010-11-16-Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month

Introduction: You may have read somewhere that Facebook has introduced a new Social Inbox  integrating email, IM, SMS,  text messages, on-site Facebook messages. All-in-all they need to store over 135 billion messages a month. Where do they store all that stuff? Facebook's Kannan Muthukkaruppan gives the surprise answer in The Underlying Technology of Messages : HBase . HBase beat out MySQL, Cassandra, and a few others. Why a surprise? Facebook created Cassandra and it was purpose built for an inbox type application, but they found Cassandra's eventual consistency model wasn't a good match for their new real-time Messages product. Facebook also has an extensive MySQL infrastructure , but they found performance suffered as data set and indexes grew larger. And they could have built their own, but they chose HBase. HBase is a scaleout table store supporting very high rates of row-level updates over massive amounts of data . Exactly what is needed for a Messaging system. HBase is also a colu

3 0.73672104 647 high scalability-2009-07-02-Hypertable is a New BigTable Clone that Runs on HDFS or KFS

Introduction: Update 3 : Presentation from the NoSQL conference : slides , video 1 , video 2 . Update 2 : The folks at Hypertable would like you to know that Hypertable is now officially sponsored by Baidu , China’s Leading Search Engine. As a sponsor of Hypertable, Baidu has committed an industrious team of engineers, numerous servers, and support resources to improve the quality and development of the open source technology. Update : InfoQ interview on Hypertable Lead Discusses Hadoop and Distributed Databases . Hypertable differs from HBase in that it is a higher performance implementation of Bigtable. Skrentablog gives the heads up on Hypertable , Zvents' open-source BigTable clone. It's written in C++ and can run on top of either HDFS or KFS. Performance looks encouraging at 28M rows of data inserted at a per-node write rate of 7mb/sec .

4 0.7017957 1189 high scalability-2012-02-07-Hypertable Routs HBase in Performance Test -- HBase Overwhelmed by Garbage Collection

Introduction: This is a guest post by Doug Judd , original creator of Hypertable and the CEO of Hypertable, Inc. Hypertable delivers 2X better throughput in most tests -- HBase fails 41 and 167 billion record insert tests, overwhelmed by garbage collection -- Both systems deliver similar results for random read uniform test We recently conducted a test comparing the performance of Hypertable ( @hypertable ) version 0.9.5.5 to that of HBase ( @HBase ) version 0.90.4 (CDH3u2) running Zookeeper 3.3.4.  In this post, we summarize the results and offer explanations for the discrepancies. For the full test report, see Hypertable vs. HBase II . Introduction Hypertable and HBase are both open source, scalable databases modeled after Google's proprietary Bigtable database.  The primary difference between the two systems is that Hypertable is written in C++, while HBase is written in Java.  We modeled this test after the one described in section 7 of the Bigtable paper and tuned both systems fo

5 0.63847792 1020 high scalability-2011-04-12-Caching and Processing 2TB Mozilla Crash Reports in memory with Hazelcast

Introduction: Mozilla processes TB's of Firefox crash reports daily using HBase, Hadoop, Python and Thrift protocol. The project is called Socorro , a system for collecting, processing, and displaying crash reports from clients. Today the Socorro application stores about 2.6 million crash reports per day. During peak traffic, it receives about 2.5K crashes per minute.  In this article we are going to demonstrate a proof of concept showing how Mozilla could integrate Hazelcast into Socorro and achieve caching and processing 2TB of crash reports with 50 node Hazelcast cluster. The video for the demo is available here .   Currently, Socorro has pythonic collectors, processors, and middleware that communicate with HBase via the Thrift protocol. One of the biggest limitations of the current architecture is that it is very sensitive to latency or outages on the HBase side. If the collectors cannot store an item in HBase then they will store it on local disk and it will not be accessible to th

6 0.62873191 650 high scalability-2009-07-02-Product: Hbase

7 0.6163137 1242 high scalability-2012-05-09-Cell Architectures

8 0.61229724 649 high scalability-2009-07-02-Product: Facebook's Cassandra - A Massive Distributed Store

9 0.6004945 1109 high scalability-2011-09-02-Stuff The Internet Says On Scalability For September 2, 2011

10 0.5927397 1461 high scalability-2013-05-20-The Tumblr Architecture Yahoo Bought for a Cool Billion Dollars

11 0.5920608 1191 high scalability-2012-02-13-Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter

12 0.5920608 1360 high scalability-2012-11-19-Gone Fishin': Tumblr Architecture - 15 Billion Page Views A Month And Harder To Scale Than Twitter

13 0.5869509 1297 high scalability-2012-08-03-Stuff The Internet Says On Scalability For August 3, 2012

14 0.58409852 1265 high scalability-2012-06-15-Stuff The Internet Says On Scalability For June 15, 2012

15 0.57687175 1042 high scalability-2011-05-17-Facebook: An Example Canonical Architecture for Scaling Billions of Messages

16 0.57163972 254 high scalability-2008-02-19-Hadoop Getting Closer to 1.0 Release

17 0.56479001 601 high scalability-2009-05-17-Product: Hadoop

18 0.55913758 651 high scalability-2009-07-02-Product: Project Voldemort - A Distributed Database

19 0.55518615 1076 high scalability-2011-07-08-Stuff The Internet Says On Scalability For July 8, 2011

20 0.55241919 732 high scalability-2009-10-29-Digg - Looking to the Future with Cassandra


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.067), (2, 0.226), (10, 0.083), (30, 0.032), (61, 0.143), (73, 0.244), (79, 0.095)]
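The sparse (topicId, topicWeight) pairs above are a topic distribution from an LDA model; presumably only topics above a small probability cutoff are listed, which is why most topic ids are missing and the weights sum to a bit less than one. Below is a minimal sketch, assuming gensim and placeholder documents rather than the real corpus and model.

```python
from gensim import corpora, models

# Placeholder tokenized posts (assumption: the real corpus and trained LDA
# model behind the topicWeight values above are not included here).
texts = [
    "adobe hbase hadoop zookeeper hdfs cluster complexity".split(),
    "facebook hbase messages cassandra mysql inbox".split(),
    "webinar sql nosql scalable web applications voltdb".split(),
]

dictionary = corpora.Dictionary(texts)
bow_corpus = [dictionary.doc2bow(text) for text in texts]

# Fit a small LDA model; get_document_topics returns the sparse
# (topicId, topicWeight) distribution for one document, like the list above.
lda = models.LdaModel(bow_corpus, id2word=dictionary, num_topics=5,
                      passes=10, random_state=42)
print(lda.get_document_topics(bow_corpus[0], minimum_probability=0.02))
```

LDA topic weights are probabilities, so the full distribution sums to one; dropping low-probability topics leaves the slightly-under-one total seen above.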

similar blogs list:

simIndex simValue blogId blogTitle

1 0.93538195 125 high scalability-2007-10-18-another approach to replication

Introduction: File replication based on erasure codes can reduce total replica size by 2x or more.

2 0.9303962 945 high scalability-2010-11-18-Announcing My Webinar on December 14th: What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications

Introduction: It's time to do something a little different and for me that doesn't mean cutting off my hair and joining a monastery, nor does it mean buying a cherry red convertible (yet), it means doing a webinar! On December 14th, 2:00 PM - 3:00 PM EST, I'll be hosting  What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications . The webinar is sponsored by VoltDB, but it will be completely vendor independent, as that's the only honor preserving and technically accurate way of doing these things. The webinar will run about 60 minutes, with 40 minutes of speechifying and 20 minutes for questions. The hashtag for the event on Twitter will be SQLNoSQL . I'll be monitoring that hashtag if you have any suggestions for the webinar or if you would like to ask questions during the webinar.  The motivation for me to do the webinar was a talk I had with another audience member at the NoSQL Evening in Palo Alto . He said he came from a Java background and was confused ab

3 0.93039501 957 high scalability-2010-12-13-Still Time to Attend My Webinar Tomorrow: What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications

Introduction: It's time to do something a little different and for me that doesn't mean cutting off my hair and joining a monastery, nor does it mean buying a cherry red convertible (yet), it means doing a webinar! On December 14th, 2:00 PM - 3:00 PM EST, I'll be hosting  What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications . The webinar is sponsored by VoltDB, but it will be completely vendor independent, as that's the only honor preserving and technically accurate way of doing these things. The webinar will run about 60 minutes, with 40 minutes of speechifying and 20 minutes for questions. The hashtag for the event on Twitter will be SQLNoSQL . I'll be monitoring that hashtag if you have any suggestions for the webinar or if you would like to ask questions during the webinar.  The motivation for me to do the webinar was a talk I had with another audience member at the NoSQL Evening in Palo Alto . He said he came from a Java background and was confused ab

4 0.8792671 1587 high scalability-2014-01-29-10 Things Bitly Should Have Monitored

Introduction: Monitor, monitor, monitor. That's the advice every startup gives once they reach a certain size. But can you ever monitor enough? If you are Bitly and everyone will complain when you are down, probably not. Here are  10 Things We Forgot to Monitor  from Bitly, along with good stories and copious amounts of code snippets. Well worth reading, especially after you've already started monitoring the lower hanging fruit. An interesting revelation from the article is that: We run bitly split across two data centers, one is a managed environment with DELL hardware, and the second is Amazon EC2.   Fork Rate . A strange configuration issue caused processes to be created at a rate of several hundred a second rather than the expected 1-10/second.  Flow control packets .  A network configuration that honors flow control packets and isn’t configured to disable them, can temporarily cause dropped traffic. Swap In/Out Rate . Measure the right thing. It's the rate memory is swapped

same-blog 5 0.87094474 795 high scalability-2010-03-16-1 Billion Reasons Why Adobe Chose HBase

Introduction: Cosmin Lehene wrote two excellent articles on Adobe's experiences with HBase: Why we’re using HBase: Part 1 and Why we’re using HBase: Part 2. Adobe needed a generic, real-time, structured data storage and processing system that could handle any data volume, with access times under 50ms, with no downtime and no data loss. The article goes into great detail about their experiences with HBase and their evaluation process, providing a "well reasoned impartial use case from a commercial user". It talks about failure handling, availability, write performance, read performance, random reads, sequential scans, and consistency. One of the knocks against HBase has been its complexity, as it has many parts that need installation and configuration. All is not lost according to the Adobe team: HBase is more complex than other systems (you need Hadoop, Zookeeper, cluster machines have multiple roles). We believe that for HBase, this is not accidental complexity and that the argu

6 0.85046476 217 high scalability-2008-01-17-Load Balancing of web server traffic

7 0.84504205 1175 high scalability-2012-01-17-Paper: Feeding Frenzy: Selectively Materializing Users’ Event Feeds

8 0.83685952 1196 high scalability-2012-02-20-Berkeley DB Architecture - NoSQL Before NoSQL was Cool

9 0.82122618 980 high scalability-2011-01-28-Stuff The Internet Says On Scalability For January 28, 2011

10 0.81973439 33 high scalability-2007-07-26-ThemBid Architecture

11 0.79674029 1642 high scalability-2014-05-02-Stuff The Internet Says On Scalability For May 2nd, 2014

12 0.78965819 471 high scalability-2008-12-19-Gigaspaces curbs latency outliers with Java Real Time

13 0.77936643 709 high scalability-2009-09-19-Space Based Programming in .NET

14 0.77465892 181 high scalability-2007-12-11-Hosting and CDN for startup video sharing site

15 0.76964957 986 high scalability-2011-02-10-Database Isolation Levels And Their Effects on Performance and Scalability

16 0.76521957 1313 high scalability-2012-08-28-Making Hadoop Run Faster

17 0.76442689 1337 high scalability-2012-10-10-Antirez: You Need to Think in Terms of Organizing Your Data for Fetching

18 0.76196533 284 high scalability-2008-03-19-RAD Lab is Creating a Datacenter Operating System

19 0.759278 1183 high scalability-2012-01-30-37signals Still Happily Scaling on Moore RAM and SSDs

20 0.75785011 1291 high scalability-2012-07-25-Vertical Scaling Ascendant - How are SSDs Changing Architectures?