high_scalability high_scalability-2009 high_scalability-2009-658 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: This article is not about Mariah Carey or her song. It's about storage systems and databases. First, let's describe what I mean by "odds": in my social network, I found that 93% of mainstream developers sanctify the database, or at least treat it as the ultimate, superhero, undefeatable solution to any data persistence challenge. Personally, I think this problem comes from education, and I think some companies are also partly responsible. To start fixing this thinking, we should all agree on the following points: Every challenge has its own solutions, so whatever you want to save or persist, there are always many options. For example, Web search engines such as Google, Kngine, Yahoo, and Bing don't serve search from a database at all; instead, they use indexes (index files) for better performance. Databases in general, whatever the vendor, are slow compared with other solutions such as key-value stores, index files, and DHTs. Databases currently employ the relational data model
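The claim that search engines rely on index files rather than a database can be made concrete with an inverted index, which maps each term to the documents that contain it. The sketch below is a minimal in-memory version; the function names and sample documents are hypothetical, and it is not how Google, Kngine, Yahoo, or Bing actually build their indexes.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sorted posting lists make intersection and on-disk serialization straightforward.
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "key value store",
    2: "column store for analytics",
    3: "relational database store",
}
index = build_inverted_index(docs)
print(search(index, "store"))         # [1, 2, 3]
print(search(index, "column store"))  # [2]
```

A real index file would serialize these posting lists to disk, usually compressed; this sketch keeps everything in memory to show only the lookup structure.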
sentIndex sentText sentNum sentScore
1 First, let's describe what I mean by "odds": in my social network, I found that 93% of mainstream developers sanctify the database, or at least treat it as the ultimate, superhero, undefeatable solution to any data persistence challenge. [sent-3, score-0.411]
2 To start fixing this thinking, we should all agree on the following points: every challenge has its own solutions, so whatever you want to save or persist, there are always many options. [sent-5, score-0.181]
3 For example, Web search engines such as Google, Kngine, Yahoo, and Bing don't serve search from a database at all; instead, they use indexes (index files) for better performance. [sent-6, score-0.391]
4 Databases in general, whatever the vendor, are slow compared with other solutions such as key-value stores, index files, and DHTs (distributed hash tables). [sent-7, score-0.176]
5 Databases currently employ the relational or object-relational data model, so don't convince yourself to store non-relational data in a relational store such as a database. [sent-8, score-0.753]
6 Database system architecture hasn't changed much in the last 30 years; it has many limitations and falls short on performance and scalability. [sent-9, score-0.24]
7 If you don't believe me, check out these papers: "The End of an Architectural Era (It's Time for a Complete Rewrite)" and "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks". I hope you agree with the points above. [sent-10, score-0.159]
8 On the other hand, there are many scenarios where you should use a database, such as customer databases, address books, and ERP systems. [sent-13, score-0.174]
9 If you're starting to agree with me, you probably want to ask: what can we use beside, or instead of, databases? [sent-15, score-0.408]
10 Many tools are built around the CAP theorem and the BASE model instead of the ACID model. [sent-16, score-0.079]
11 The CAP theorem, on the other hand, is about: Consistency: every read sees the latest write, so your data is correct all the time. [sent-24, score-0.307]
12 Availability: you can read and write your data all the time. [sent-26, score-0.281]
13 Partition tolerance: if the network splits or one or more nodes fail, the system keeps working and becomes consistent again once the nodes come back online. [sent-27, score-0.372]
14 Everyone who builds big applications builds them with CAP's trade-offs in mind. [sent-30, score-0.226]
15 For example, an in-memory or on-disk caching system never needs all of a database's features. [sent-32, score-0.132]
16 Today there are many column-oriented and key-value-oriented systems. [sent-34, score-0.388]
17 But first, let's describe column-oriented systems: a column-oriented DBMS is a database management system that stores its content by column rather than by row. [sent-35, score-0.811]
18 This has advantages for databases such as data warehouses and library catalogues, where aggregates are computed over large numbers of similar data items. [sent-36, score-0.522]
19 This approach is contrasted with row-oriented databases and with correlation databases, which use a value-based storage structure. [sent-37, score-0.363]
20 Distributed key-value stores: Voldemort, Dynomite, Redis. Distributed column stores (Bigtable-like systems): Cassandra, HBase, Hypertable. Something a little different: CouchDB. Resource: Yahoo! [sent-39, score-0.358]
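The partition-tolerance point above (the system keeps working and becomes consistent once nodes come back online) describes eventual consistency. One simple way to illustrate the reconciliation step is a last-write-wins merge between two replicas that kept accepting writes during a partition. This is a hypothetical sketch: the class and function names are invented, and it assumes timestamps that are comparable across replicas. Real stores resolve conflicts in different ways (timestamps, vector clocks, application-level merges), and last-write-wins silently discards the losing concurrent update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # assumption: clocks are comparable across replicas


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> Versioned

    def put(self, key, value, timestamp):
        """Accept a write locally, even if other replicas are unreachable."""
        current = self.data.get(key)
        # Last write wins; comparing (timestamp, value) breaks ties deterministically.
        if current is None or (timestamp, value) > (current.timestamp, current.value):
            self.data[key] = Versioned(value, timestamp)

    def get(self, key):
        item = self.data.get(key)
        return None if item is None else item.value


def anti_entropy(a, b):
    """After the partition heals, exchange entries so both replicas converge."""
    for key in set(a.data) | set(b.data):
        for src, dst in ((a, b), (b, a)):
            item = src.data.get(key)
            if item is not None:
                dst.put(key, item.value, item.timestamp)


r1, r2 = Replica("r1"), Replica("r2")
# While partitioned, each side accepts a different write for the same key.
r1.put("cart:42", "book", timestamp=1)
r2.put("cart:42", "book,pen", timestamp=2)
r1.put("profile:7", "alice", timestamp=3)
print(r1.get("cart:42"), r2.get("cart:42"))  # book book,pen  (replicas disagree)
anti_entropy(r1, r2)
print(r1.get("cart:42"), r2.get("cart:42"))  # book,pen book,pen
```

During the partition both replicas stay available but disagree; consistency is restored only when they reconcile, which is the trade-off CAP describes.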
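To make the row-versus-column distinction concrete, the sketch below holds the same hypothetical sales records in both layouts and computes aggregates. In the column layout an aggregate scans one list instead of visiting every field of every row, which is why column stores suit data-warehouse workloads. It is an in-memory illustration only; real column stores add compression, an on-disk layout, and vectorized execution.

```python
# The same hypothetical sales records, first in a row-oriented layout.
rows = [
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 75},
    {"id": 3, "region": "EU", "amount": 30},
]


def to_columns(records):
    """Pivot row dicts into a dict of column lists (assumes non-empty, uniform rows)."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


columns = to_columns(rows)

# Row-oriented aggregate: every whole record is visited to read one field.
row_total = sum(record["amount"] for record in rows)

# Column-oriented aggregate: only the "amount" column is scanned.
column_total = sum(columns["amount"])

# Filtered aggregate over two columns, still without touching "id".
eu_total = sum(a for r, a in zip(columns["region"], columns["amount"]) if r == "EU")

print(row_total, column_total, eu_total)  # 225 225 150
```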
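Several of the stores listed above (Voldemort, Dynomite, Cassandra) follow Amazon's Dynamo design and spread keys across machines with consistent hashing, the same idea behind the DHTs mentioned earlier. The sketch below is a simplified ring, assuming MD5 as the position hash and 100 virtual nodes per server; the class and node names are hypothetical, and this is not the exact partitioner of any of those products.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Simplified consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes  # virtual nodes per server smooth out the key spread
        self._ring = []       # sorted hash positions on the ring
        self._owner = {}      # hash position -> owning node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # MD5 is used only for an even spread of positions, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            bisect.insort(self._ring, pos)
            self._owner[pos] = node

    def remove_node(self, node):
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            self._ring.remove(pos)
            del self._owner[pos]

    def node_for(self, key):
        """Return the node owning the first ring position clockwise from the key."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        pos = self._hash(key)
        idx = bisect.bisect(self._ring, pos) % len(self._ring)
        return self._owner[self._ring[idx]]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove_node("node-b")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys previously owned by node-b move; keys on node-a and node-c stay put.
print(len(moved), all(before[k] == "node-b" for k in moved))
```

Removing a node moves only the keys it owned, which lets such stores add or lose machines without reshuffling all data; a plain modulo hash (hash(key) % N) would remap most keys whenever N changes.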
wordName wordTfidf (topN-words)
[('cap', 0.268), ('column', 0.238), ('acid', 0.232), ('kngine', 0.203), ('describe', 0.163), ('agreed', 0.159), ('database', 0.158), ('relation', 0.154), ('oriented', 0.15), ('yahoo', 0.145), ('system', 0.132), ('model', 0.121), ('stores', 0.12), ('builds', 0.113), ('fails', 0.108), ('contrasted', 0.108), ('erp', 0.108), ('correctdurability', 0.108), ('databaseisolation', 0.108), ('getthe', 0.108), ('nothingconsistency', 0.108), ('resisters', 0.108), ('odds', 0.102), ('superhero', 0.102), ('databases', 0.101), ('whatever', 0.1), ('url', 0.099), ('hand', 0.097), ('write', 0.096), ('dryad', 0.093), ('beside', 0.093), ('data', 0.089), ('serially', 0.088), ('warehouses', 0.084), ('bing', 0.084), ('aggregates', 0.082), ('index', 0.082), ('challenge', 0.081), ('pretend', 0.081), ('file', 0.08), ('convince', 0.079), ('atomicity', 0.079), ('instead', 0.079), ('mainstream', 0.078), ('correlation', 0.077), ('computed', 0.077), ('education', 0.077), ('use', 0.077), ('storing', 0.076), ('let', 0.075)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 658 high scalability-2009-07-17-Against all the odds
2 0.38211784 589 high scalability-2009-05-05-Drop ACID and Think About Data
Introduction: The abstract for the talk given by Bob Ippolito, co-founder and CTO of Mochi Media, Inc: Building large systems on top of a traditional single-master RDBMS data storage layer is no longer good enough. This talk explores the landscape of new technologies available today to augment your data layer to improve performance and reliability. Is your application a good fit for caches, bloom filters, bitmap indexes, column stores, distributed key/value stores, or document databases? Learn how they work (in theory and practice) and decide for yourself. Bob does an excellent job highlighting different products and the key concepts to understand when pondering the wide variety of new database offerings. It's unlikely you'll be able to say oh, this is the database for me after watching the presentation, but you will be much better informed on your options. And I imagine slightly confused as to what to do :-) An interesting observation in the talk is that the more robust products are internal
3 0.17387931 890 high scalability-2010-09-01-Paper: The Case for Determinism in Database Systems
Introduction: Can you have your ACID cake and eat your distributed database too? Yes, explains Daniel Abadi, Assistant Professor of Computer Science at Yale University, in an epic post, The problems with ACID, and how to fix them without going NoSQL, coauthored with Alexander Thomson, on their paper The Case for Determinism in Database Systems. We've already seen VoltDB offer the best of both worlds; this sounds like a completely different approach. The solution, they propose, is: ...an architecture and execution model that avoids deadlock, copes with failures without aborting transactions, and achieves high concurrency. The paper contains full details, but the basic idea is to use ordered locking coupled with optimistic lock location prediction, while exploiting deterministic systems' nice replication properties in the case of failures. The problem they are trying to solve is: In our opinion, the NoSQL decision to give up on ACID is the lazy solution to these scala
4 0.17307708 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
Introduction: It's a truism that we should choose the right tool for the job. Everyone says that. And who can disagree? The problem is that this is not helpful advice without being able to answer more specific questions like: What jobs are the tools good at? Will they work on jobs like mine? Is it worth the risk to try something new when all my people know something else and we have a deadline to meet? How can I make all the tools work together? In the NoSQL space this kind of real-world data is still a bit vague. When asked, vendors tend to give very general answers like NoSQL is good for BigData or key-value access. What does that mean for the developer in the trenches faced with the task of solving a specific problem when there are a dozen confusing choices and no obvious winner? Not a lot. It's often hard to take that next step and imagine how their specific problems could be solved in a way that's worth taking the trouble and risk. Let's change that. What problems are you using NoSQL to sol
5 0.15782312 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
Introduction: We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk based databases like SimpleDB and BigTable are complicated beasts, typical last gasp products of any aging technology before a change. The next era is the age of Memory and Cloud which will allow for new players to succeed. The tipping point will be soon. Let's take a short trip down web architecture lane: It's 1993: Yahoo runs on FreeBSD, Apache, Perl scripts and a SQL database It's 1995: Scale-up the database. It's 1998: LAMP It's 1999: Stateless + Load Balanced + Database + SAN It's 2001: In-memory data-grid. It's 2003: Add a caching layer. It's 2004: Add scale-out and partitioning. It's 2005: Add asynchronous job scheduling and maybe a distributed file system. It's 2007: Move it all into the cloud. It's 2008: C
6 0.15334269 1064 high scalability-2011-06-20-35+ Use Cases for Choosing Your Next NoSQL Database
7 0.15163195 630 high scalability-2009-06-14-kngine 'Knowledge Engine' milestone 2
9 0.1470719 950 high scalability-2010-11-30-NoCAP – Part III – GigaSpaces clustering explained..
10 0.14203614 96 high scalability-2007-09-18-Amazon Architecture
11 0.14015603 327 high scalability-2008-05-27-How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale
12 0.13836232 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
13 0.13426526 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
14 0.1330016 259 high scalability-2008-02-25-Any Suggestions for the Architecture Template?
15 0.1330016 260 high scalability-2008-02-25-Architecture Template Advice Needed
16 0.13256411 746 high scalability-2009-11-26-Kngine Snippet Search New Indexing Technology
17 0.13254772 342 high scalability-2008-06-08-Search fast in million rows
18 0.12932983 787 high scalability-2010-03-03-Hot Scalability Links for March 3, 2010
19 0.12891707 448 high scalability-2008-11-22-Google Architecture
20 0.12345831 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
topicId topicWeight
[(0, 0.243), (1, 0.135), (2, -0.012), (3, 0.029), (4, 0.068), (5, 0.139), (6, -0.038), (7, -0.066), (8, 0.016), (9, 0.026), (10, 0.035), (11, 0.016), (12, -0.12), (13, -0.029), (14, 0.082), (15, -0.005), (16, -0.001), (17, -0.004), (18, 0.04), (19, -0.027), (20, 0.042), (21, 0.024), (22, 0.051), (23, 0.004), (24, -0.02), (25, -0.075), (26, 0.02), (27, 0.023), (28, -0.009), (29, 0.03), (30, -0.031), (31, 0.04), (32, 0.012), (33, 0.054), (34, 0.014), (35, -0.022), (36, -0.051), (37, 0.039), (38, 0.011), (39, -0.027), (40, 0.025), (41, -0.04), (42, 0.012), (43, 0.031), (44, -0.089), (45, 0.032), (46, 0.039), (47, 0.001), (48, 0.011), (49, 0.049)]
simIndex simValue blogId blogTitle
same-blog 1 0.9514879 658 high scalability-2009-07-17-Against all the odds
2 0.85717916 589 high scalability-2009-05-05-Drop ACID and Think About Data
3 0.80930728 676 high scalability-2009-08-08-Yahoo!'s PNUTS Database: Too Hot, Too Cold or Just Right?
Introduction: So far every massively scalable database is a bundle of compromises. For some the weak guarantees of Amazon's eventual consistency model are too cold. For many the strong guarantees of standard RDBMS distributed transactions are too hot. Google App Engine tries to get it just right with entity groups. Yahoo! is also trying to get it just right by offering per-record timeline consistency, which hopes to serve up a heaping bowl of rich database functionality and low latency at massive scale: We describe PNUTS [Platform for Nimble Universal Table Storage], a massively parallel and geographically distributed database system for Yahoo!’s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees. It is a hosted, centrally managed, and geographically distributed service, and utilizes automated load-balancing and failover to redu
4 0.79758853 1092 high scalability-2011-08-04-Jim Starkey is Creating a Brave New World by Rethinking Databases for the Cloud
Introduction: Jim Starkey, founder of NuoDB, in this thread on the Cloud Computing group, delivers a masterful post on why he thinks the relational model is the best overall compromise amongst the different options, why NewSQL can free itself from the limitations of legacy SQL architectures, and how this creates a brave new lock-free world.... I'll [Jim Starkey] go into more detail later in the post for those who care, but the executive summary goes like this: Network latency is relatively high and human attention span is relatively low. So human facing computer systems have to perform their work in a small number of trips between the client and the database server. But the human condition leads inexorably to data complexity. There are really only two strategies to manage this problem. One is to use coarse granularity storage, glombing together related data into a single blob and letting intelligence on the client make sense of it. The other is storing fine granularity data on the s
5 0.79620409 799 high scalability-2010-03-23-Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL
Introduction: O'Reilly Radar's James Turner conducted a very informative interview with Joe Stump, current CTO of SimpleGeo and former lead architect at Digg, in which Joe makes some of his usually insightful comments on his experience using Cassandra vs MySQL. As Digg started out with a MySQL-oriented architecture and has recently been moving full speed to Cassandra, his observations on some of their lessons learned and the motivation for the move are especially valuable. Here are some of the key takeaways you may find useful: Precompute on writes, make reads fast. This is an oldie as a scaling strategy, but it's valuable to see how SimpleGeo is applying it to their problem of finding entities within a certain geographical region. Using Cassandra they've built two clusters: one for indexes and one for records. The records cluster, as you might imagine, is a simple data lookup. The index cluster has a carefully constructed key for every lookup scenario. The indexes are computed on the wr
6 0.78162587 1062 high scalability-2011-06-15-101 Questions to Ask When Considering a NoSQL Database
7 0.7778095 733 high scalability-2009-10-29-Paper: No Relation: The Mixed Blessings of Non-Relational Databases
10 0.7688601 1017 high scalability-2011-04-06-Netflix: Run Consistency Checkers All the time to Fixup Transactions
11 0.76855618 1064 high scalability-2011-06-20-35+ Use Cases for Choosing Your Next NoSQL Database
12 0.76388353 1553 high scalability-2013-11-25-How To Make an Infinitely Scalable Relational Database Management System (RDBMS)
13 0.76362997 705 high scalability-2009-09-16-Paper: A practical scalable distributed B-tree
14 0.7623567 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
15 0.76105088 65 high scalability-2007-08-16-Scaling Secret #2: Denormalizing Your Way to Speed and Profit
16 0.75798941 784 high scalability-2010-02-25-Paper: High Performance Scalable Data Stores
17 0.74817729 741 high scalability-2009-11-16-Building Scalable Systems Using Data as a Composite Material
18 0.7471326 1327 high scalability-2012-09-21-Stuff The Internet Says On Scalability For September 21, 2012
19 0.74594069 1345 high scalability-2012-10-22-Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale
20 0.74096048 890 high scalability-2010-09-01-Paper: The Case for Determinism in Database Systems
topicId topicWeight
[(0, 0.122), (1, 0.137), (2, 0.257), (8, 0.016), (10, 0.026), (61, 0.121), (77, 0.023), (79, 0.155), (85, 0.049), (94, 0.015)]
simIndex simValue blogId blogTitle
1 0.96009225 1057 high scalability-2011-06-10-Stuff The Internet Says On Scalability For June 10, 2011
Introduction: Submitted for your scaling pleasure: Achievements: Every day, Amazon Web Services adds enough new capacity to support all of Amazon.com’s global infrastructure through the company’s first 5 years, when it was $2.7 billion annual revenue. From Cloud Computing Is Driving Infrastructure Innovation by James Hamilton. Where is all that money being spent? Facilities, servers, power, and popcorn. Evernote hits 10 million users. StackExchange hits 1 million users. No lawsuits expected in either case. Neural waves of brain. The brain's waves drive computation, sort of, in a 5 million core, 9 Hz computer. Scaling, Scaling, Scaled: textPlus Turns Two, Hits 10 Billion Messages Sent Milestone. Quotes of a quotable essence: robinduckett : FACT: You are not a web developer if you need third party services which handle scalability so you can "focus on the programming". Twitter’s Bain : Facebook May Have More Scale, We Have More Engagement she
2 0.95985132 1443 high scalability-2013-04-19-Stuff The Internet Says On Scalability For April 19, 2013
Introduction: Hey, it's HighScalability time: ( Ukrainian daredevil scaling buildings) Two Trillion Objects, 1.1 Million Requests / Second : S3; 1.4TB/s : Titan supercomputer has world’s fastest storage; four billion hours : Netflix streaming in last 3 months; $1.2B : Google's Q1 infrastructure spend Quotable Quotes: Google : We'll track EVERY task on EVERY data center server Stacey Higginbotham : All in all in the last five years the world has gained 54 Tbps of new capacity. @seveas : Scalability 103: Hardware sucks. Software sucks. Everything *will* break, prepare for failure of any component of your system. bloodredsun : The long and short of it is that Cassandra is a fantastic system for write heavy situations. What it is not good at are read heavy situations where deterministic low latency is required, which is pretty much what the pinterest guys were dealing with. @viktorklang : "The e-mail message could not be delivered because the user's
3 0.95797002 1534 high scalability-2013-10-18-Stuff The Internet Says On Scalability For October 18th, 2013
Introduction: Hey, it's HighScalability time: Test your sense of scale. Is this image of something microscopic or macroscopic? Find out . $3.5 million : Per Episode Cost of Breaking Bad Quotable Quotes: @GammaCounter : "There are 400 billion trees in the Amazon River basin, close to the number of stars in the Milky Way galaxy." @rbranson : Virtualization has near-zero overhead, unless the VM spends most of it's time copying between RAM and network… like memcached or haproxy. @HackerNewsOnion : Programming is 1% inspiration, 99% trying to get your environment working. @aneel : "roundtrips, not bandwidth, is now often the bottleneck for most applications" @jamesurquhart : Not to mention the fact that auto-scaling should happen above IaaS layer. Think multi-cloud. Sheref Mansy : A machine keeps sort of chugging away, without worrying about its environment. But a living system has to. V.D. Veksler : it just came to my attention that Javascri
same-blog 4 0.94911379 658 high scalability-2009-07-17-Against all the odds
5 0.94764197 1037 high scalability-2011-05-10-Viddler Architecture - 7 Million Embeds a Day and 1500 Req-Sec Peak
Introduction: Viddler is in the high quality Video as a Service business for a customer who wants to pay a fixed cost, be done with it, and just have it work. Similar to Blip and Ooyala, more focussed on business than YouTube. They serve thousands of business customers, including high traffic websites like FailBlog, Engadget, and Gawker. Viddler is a good case to learn from because they are a small company trying to provide a challenging service in a crowded field. We are catching them just as they are transitioning from a startup that began in one direction, as a YouTube competitor, and pivoted into a slightly larger company focussed on paying business customers. Transition is the key word for Viddler: transitioning from a free YouTube clone to a high quality paid service. Transitioning from a few colo sites that didn't work well to a new higher quality datacenter. Transitioning from an architecture that was typical of a startup to one that features redundancy, high availability, and automation. Tr
6 0.93668693 1020 high scalability-2011-04-12-Caching and Processing 2TB Mozilla Crash Reports in memory with Hazelcast
7 0.9364934 589 high scalability-2009-05-05-Drop ACID and Think About Data
8 0.93363297 1131 high scalability-2011-10-24-StackExchange Architecture Updates - Running Smoothly, Amazon 4x More Expensive
9 0.93331337 602 high scalability-2009-05-17-Scaling Django Web Apps by Mike Malone
10 0.93307251 687 high scalability-2009-08-24-How Google Serves Data from Multiple Datacenters
11 0.93295038 1259 high scalability-2012-06-07-3 Secrets to Lightning Fast Mobile Design at Instagram
12 0.93175656 1460 high scalability-2013-05-17-Stuff The Internet Says On Scalability For May 17, 2013
13 0.93156433 671 high scalability-2009-08-05-Stack Overflow Architecture
14 0.93090433 1600 high scalability-2014-02-21-Stuff The Internet Says On Scalability For February 21st, 2014
15 0.93033791 1428 high scalability-2013-03-22-Stuff The Internet Says On Scalability For March 22, 2013
16 0.93027824 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
17 0.93023676 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard
19 0.92926902 1112 high scalability-2011-09-07-What Google App Engine Price Changes Say About the Future of Web Architecture
20 0.92903912 1177 high scalability-2012-01-19-Is it time to get rid of the Linux OS model in the cloud?