high_scalability high_scalability-2010 high_scalability-2010-805 knowledge-graph by maker-knowledge-mining

805 high scalability-2010-04-06-Strategy: Make it Really Fast vs Do the Work Up Front


meta info for this blog

Source: html

Introduction: In Cool spatial algos with Neo4j: Part 1 - Routing with A* in Ruby, Peter Neubauer not only does a fantastic job explaining a complicated routing algorithm using the graph database Neo4j, but he surfaces an interesting architectural conundrum: make it really fast so the work can be done on the reads, or do all the work on the writes so the reads are really fast. The money quote pointing out the competing options is: [Being] able to do these calculations in sub-second speeds on graphs of millions of roads and waypoints makes it possible in many cases to abandon the normal approach of precomputing indexes with K/V stores and be able to put routing into the critical path with the possibility to adapt to the live conditions and build highly personalized and dynamic spatial services. The poster boy for the precompute strategy is SimpleGeo, a startup that is building a "scaling infrastructure for geodata." Their strategy for handling geodata is to use Cassandra and build two clusters: one for indexes and one for records.
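To make the "do the work up front" side of the conundrum concrete, here is a minimal sketch in which index keys are computed on every write so that reads become plain key-value gets. The grid-cell key scheme, the in-memory dicts standing in for the index and record clusters, and the sample data are all assumptions for illustration, not SimpleGeo's actual Cassandra schema.

# A minimal sketch (assumptions throughout) of the precompute-on-write strategy:
# a made-up lat/lon grid cell acts as the "carefully constructed key", and plain
# dicts stand in for the Cassandra index and record clusters.
from collections import defaultdict

GRID = 0.01  # assumed cell size, roughly 1 km at mid latitudes

def cell_key(lat, lon):
    # one key per lookup scenario; here the scenario is "records near a point"
    return f"{int(lat / GRID)}:{int(lon / GRID)}"

index_cluster = defaultdict(list)  # precomputed indexes, built on the write path
record_cluster = {}                # simple record store

def write(record_id, lat, lon, payload):
    record_cluster[record_id] = payload
    index_cluster[cell_key(lat, lon)].append(record_id)  # the work happens here

def read_nearby(lat, lon):
    # the read path is a single key lookup, with no computation in the critical path
    return [record_cluster[rid] for rid in index_cluster[cell_key(lat, lon)]]

write("place-1", 37.7749, -122.4194, {"name": "a cafe"})
print(read_nearby(37.7749, -122.4194))

A real system would also query neighboring cells and distribute the keys across nodes, but the shape of the trade-off is the same: writes pay so that reads stay cheap.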


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The poster boy for the precompute strategy is SimpleGeo, a startup that is building a "scaling infrastructure for geodata." [sent-3, score-0.247]

2 " Their strategy for handling geodata is to use Cassandra and build two clusters: one for indexes and one for records. [sent-4, score-0.22]

3 The index cluster has a carefully constructed key for every lookup scenario. [sent-6, score-0.405]

4 The indexes are computed on the write, so reads are very fast. [sent-7, score-0.324]

5 What I think Peter is saying is that because a graph database represents the problem in such a natural way, and graph navigation is so fast, it becomes possible to run even large, complex queries in real time. [sent-10, score-0.979]

6 Before you answer, let's first ponder: is the graph database solution really solving the same problem as SimpleGeo is solving? [sent-13, score-0.604]

7 In this configuration the graph database is serving more like a specialized analytics database. [sent-15, score-0.477]

8 What SimpleGeo wanted is a system that supports very high write loads, is an operational no-brainer, is highly available, and performs well. [sent-16, score-0.187]

9 A graph database constrained to a single node simply can't compete in this space. [sent-19, score-0.548]

10 Using a lot of up-front smarts they've created a system that can handle very high write loads and performs amazingly well for reads. [sent-21, score-0.304]

11 Consider a system handling very high write loads while sequentially scanning very large tables to satisfy requests. [sent-25, score-0.293]

12 The backplane of that box will be saturated in a heartbeat. [sent-26, score-0.174]

13 SSD doesn't make large problems small and doesn't make complex queries over large datasets magically solvable either. [sent-27, score-0.272]

14 SimpleGeo could keep graph databases as their index nodes. [sent-29, score-0.544]

15 There's a lag between their data cluster and their index cluster anyway, so the index cluster could be a graph cluster. [sent-30, score-1.135]

16 Or perhaps they could keep a graph cluster in parallel for ad hoc queries. [sent-31, score-0.774]

17 What I'm not sure about is how a graph larger than RAM will perform, or how it will perform in a replicated, load-balanced situation within the same datacenter and then across multiple datacenters. [sent-32, score-0.585]

18 Mapping all queries to a key-value lookup is a very robust approach. [sent-33, score-0.217]

19 Given all the hubbub, it's interesting that a graph database, and not a relational database, turns out to be the speed demon. [sent-35, score-0.664]

20 It almost makes one think that in the same way there are optimal human body types for each sport (top-level sprinters are muscular with a higher proportion of fast-twitch fibers, for example), there may be optimal data structures for solving specific problems. [sent-36, score-0.56]
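Sentence 5 in the list above is the heart of the "make it really fast" side: if graph navigation is cheap enough, the route can be computed per request, on the read path. A toy A* sketch (invented graph, coordinates, and straight-line heuristic; not the Neo4j/Ruby code from the article) shows what that read-path work looks like:

# A toy version of the compute-on-read strategy: run A* per request over an
# in-memory road graph. Everything here (graph, costs, coordinates) is invented
# for illustration only.
import heapq, math

coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (2, 1)}   # waypoint positions
roads = {"A": [("B", 1.0)], "B": [("C", 1.0), ("D", 1.6)],
         "C": [("D", 1.0)], "D": []}                             # directed edges with costs

def straight_line(n, goal):
    # admissible heuristic: straight-line distance never overestimates road cost
    (x1, y1), (x2, y2) = coords[n], coords[goal]
    return math.hypot(x2 - x1, y2 - y1)

def astar(start, goal):
    frontier = [(straight_line(start, goal), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt, cost in roads[node]:
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier,
                               (new_g + straight_line(nxt, goal), new_g, nxt, path + [nxt]))
    return None, float("inf")

print(astar("A", "D"))  # -> (['A', 'B', 'D'], 2.6)

Because nothing is precomputed, the edge costs could be swapped for live traffic conditions on every request, which is exactly the flexibility the quote in the introduction is after.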
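Sentences 11-13 in the list above argue that brute-force scanning can't be rescued by faster hardware. A back-of-envelope calculation makes the point; all numbers are assumptions chosen only to show the orders of magnitude:

# Assumed numbers: a 500 GB table, one fast SSD, a modest read load.
table_bytes    = 500 * 10**9   # 500 GB table
ssd_read_bps   = 2 * 10**9     # ~2 GB/s sequential read from one SSD
requests_per_s = 100           # modest read load

seconds_per_scan = table_bytes / ssd_read_bps        # time for one full scan
bandwidth_needed = table_bytes * requests_per_s      # bytes/s if every request scans
print(seconds_per_scan, bandwidth_needed / ssd_read_bps)  # 250.0 and 25000.0

Even with generous hardware assumptions, each scan takes hundreds of seconds and the aggregate bandwidth is tens of thousands of times what one box can move. That is why both camps avoid per-request scans: one precomputes on the write, the other keeps the data in a form (a graph in RAM) that can be navigated rather than scanned.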


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('graph', 0.391), ('simplegeo', 0.291), ('index', 0.153), ('hoc', 0.148), ('cluster', 0.146), ('spatial', 0.146), ('reads', 0.133), ('loads', 0.13), ('ssd', 0.127), ('solving', 0.127), ('indexes', 0.119), ('routing', 0.117), ('queries', 0.111), ('lookup', 0.106), ('geodata', 0.101), ('conundrum', 0.101), ('fibers', 0.101), ('hubbub', 0.101), ('ponder', 0.101), ('surfaces', 0.101), ('twitch', 0.101), ('perform', 0.097), ('sport', 0.095), ('touse', 0.095), ('boys', 0.09), ('roads', 0.09), ('saturated', 0.09), ('write', 0.09), ('ad', 0.089), ('suggesting', 0.087), ('gis', 0.087), ('muscular', 0.087), ('database', 0.086), ('solvable', 0.084), ('backplane', 0.084), ('smarts', 0.084), ('geo', 0.084), ('articlesscaling', 0.082), ('optimal', 0.081), ('poster', 0.08), ('abandon', 0.08), ('pointing', 0.078), ('magically', 0.077), ('precompute', 0.077), ('stump', 0.077), ('proportion', 0.075), ('personalized', 0.073), ('sequentially', 0.073), ('computed', 0.072), ('constrained', 0.071)]
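The simValue numbers in the lists below are typically produced by taking the cosine similarity between tf-idf weight vectors like the one above. The sketch below is a generic illustration of that computation, not the exact pipeline used by the mining tool; the second vector's weights are invented.

import math

def cosine(a, b):
    # a, b: dicts mapping word -> tf-idf weight, like the (word, weight) list above
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

this_post  = {"graph": 0.391, "simplegeo": 0.291, "index": 0.153}  # weights taken from the list above
other_post = {"graph": 0.44, "neo4j": 0.30, "index": 0.10}         # invented for illustration
print(round(cosine(this_post, other_post), 3))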

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 805 high scalability-2010-04-06-Strategy: Make it Really Fast vs Do the Work Up Front

Introduction: In Cool spatial algos with Neo4j: Part 1 - Routing with A* in Ruby, Peter Neubauer not only does a fantastic job explaining a complicated routing algorithm using the graph database Neo4j, but he surfaces an interesting architectural conundrum: make it really fast so the work can be done on the reads, or do all the work on the writes so the reads are really fast. The money quote pointing out the competing options is: [Being] able to do these calculations in sub-second speeds on graphs of millions of roads and waypoints makes it possible in many cases to abandon the normal approach of precomputing indexes with K/V stores and be able to put routing into the critical path with the possibility to adapt to the live conditions and build highly personalized and dynamic spatial services. The poster boy for the precompute strategy is SimpleGeo, a startup that is building a "scaling infrastructure for geodata." Their strategy for handling geodata is to use Cassandra and build two clusters: one for indexes and one for records.

2 0.29352927 628 high scalability-2009-06-13-Neo4j - a Graph Database that Kicks Buttox

Introduction: Update: Social networks in the database: using a graph database. A nice post on representing, traversing, and performing other common social network operations using a graph database. If you are Digg or LinkedIn you can build your own speedy graph database to represent your complex social network relationships. For those of more modest means, Neo4j, a graph database, is a good alternative. A graph is a collection of nodes (things) and edges (relationships) that connect pairs of nodes. Slap properties (key-value pairs) on nodes and relationships and you have a surprisingly powerful way to represent most anything you can think of. In a graph database "relationships are first-class citizens. They connect two nodes and both nodes and relationships can hold an arbitrary amount of key-value pairs. So you can look at a graph database as a key-value store, with full support for relationships." A graph looks something like: For more lovely examples take a look at the Graph Image Gal

3 0.24238093 799 high scalability-2010-03-23-Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL

Introduction: O'Reilly Radar's James Turner conducted a very informative interview with Joe Stump, current CTO of SimpleGeo and former lead architect at Digg, in which Joe makes some of his usually insightful comments on his experience using Cassandra vs MySQL. As Digg started out with a MySQL-oriented architecture and has recently been moving full speed to Cassandra, his observations on some of their lessons learned and the motivation for the move are especially valuable. Here are some of the key takeaways you'll find useful: Precompute on writes, make reads fast. This is an oldie as a scaling strategy, but it's valuable to see how SimpleGeo is applying it to their problem of finding entities within a certain geographical region. Using Cassandra they've built two clusters: one for indexes and one for records. The records cluster, as you might imagine, is a simple data lookup. The index cluster has a carefully constructed key for every lookup scenario. The indexes are computed on the write, so reads are very fast.

4 0.24142624 801 high scalability-2010-03-30-Running Large Graph Algorithms - Evaluation of Current State-of-the-Art and Lessons Learned

Introduction: On the surface nothing appears more different than soft data and hard raw materials like iron. Then isn’t it ironic, in the Alanis Morissette sense, that in this Age of Information, great wealth still lies hidden deep beneath piles of stuff? It's so strange how directly digging for dollars in data parallels the great wealth producing models of the Industrial Revolution. The piles of stuff is the Internet. It takes lots of prospecting to find the right stuff. Mighty web crawling machines tirelessly collect stuff, bringing it into their huge maws, then depositing load after load into rack after rack of distributed file system machines. Then armies of still other machines take this stuff and strip out the valuable raw materials, which, in the Information Age, are endless bytes of raw data. Link clicks, likes, page views, content, head lines, searches, inbound links, outbound links, search clicks, hashtags, friends, purchases: anything and everything you do on the Internet is a valu

5 0.24019064 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management

Introduction: Relational databases, document databases, and distributed hash tables get most of the hype these days, but there's another option: graph databases. Back to the future it seems. Here's a really interesting paper by Marko A. Rodriguez introducing the graph model and its extension to representing the world wide web of data. Modern-day open source and commercial graph databases can store on the order of 1 billion relationships with some databases reaching the 10 billion mark. These developments are making the graph database practical for applications that require large-scale knowledge structures. Moreover, with the Web of Data standards set forth by the Linked Data community, it is possible to interlink graph databases across the web into a giant global knowledge structure. This talk will discuss graph databases, their underlying data model, their querying mechanisms, and the benefits of the graph data structure for modeling and analysis.

6 0.21709955 621 high scalability-2009-06-06-Graph server

7 0.21629845 1406 high scalability-2013-02-14-When all the Program's a Graph - Prismatic's Plumbing Library

8 0.19296141 1088 high scalability-2011-07-27-Making Hadoop 1000x Faster for Graph Problems

9 0.17071907 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?

10 0.15916947 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010

11 0.15079576 1064 high scalability-2011-06-20-35+ Use Cases for Choosing Your Next NoSQL Database

12 0.1495547 1136 high scalability-2011-11-03-Paper: G2 : A Graph Processing System for Diagnosing Distributed Systems

13 0.14862254 1285 high scalability-2012-07-18-Disks Ain't Dead Yet: GraphChi - a disk-based large-scale graph computation

14 0.14230844 766 high scalability-2010-01-26-Product: HyperGraphDB - A Graph Database

15 0.13685113 993 high scalability-2011-02-22-Is Node.js Becoming a Part of the Stack? SimpleGeo Says Yes.

16 0.13557501 64 high scalability-2007-08-10-How do we make a large real-time search engine?

17 0.12859856 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O

18 0.12459908 1293 high scalability-2012-07-30-Prismatic Architecture - Using Machine Learning on Social Networks to Figure Out What You Should Read on the Web

19 0.11975721 842 high scalability-2010-06-16-Hot Scalability Links for June 16, 2010

20 0.11851771 797 high scalability-2010-03-19-Hot Scalability Links for March 19, 2010


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.196), (1, 0.126), (2, -0.042), (3, 0.028), (4, 0.031), (5, 0.12), (6, -0.046), (7, -0.017), (8, 0.047), (9, 0.048), (10, 0.093), (11, 0.007), (12, -0.095), (13, -0.065), (14, 0.011), (15, -0.009), (16, -0.032), (17, 0.193), (18, 0.052), (19, 0.157), (20, -0.133), (21, -0.069), (22, -0.032), (23, -0.076), (24, -0.062), (25, 0.088), (26, -0.019), (27, 0.058), (28, 0.025), (29, -0.003), (30, -0.075), (31, -0.008), (32, -0.005), (33, -0.013), (34, -0.009), (35, 0.115), (36, -0.018), (37, -0.069), (38, -0.027), (39, -0.044), (40, -0.02), (41, -0.003), (42, 0.008), (43, 0.022), (44, -0.006), (45, 0.005), (46, 0.025), (47, -0.057), (48, 0.033), (49, -0.05)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96640152 805 high scalability-2010-04-06-Strategy: Make it Really Fast vs Do the Work Up Front

Introduction: In Cool spatial algos with Neo4j: Part 1 - Routing with A* in Ruby, Peter Neubauer not only does a fantastic job explaining a complicated routing algorithm using the graph database Neo4j, but he surfaces an interesting architectural conundrum: make it really fast so the work can be done on the reads, or do all the work on the writes so the reads are really fast. The money quote pointing out the competing options is: [Being] able to do these calculations in sub-second speeds on graphs of millions of roads and waypoints makes it possible in many cases to abandon the normal approach of precomputing indexes with K/V stores and be able to put routing into the critical path with the possibility to adapt to the live conditions and build highly personalized and dynamic spatial services. The poster boy for the precompute strategy is SimpleGeo, a startup that is building a "scaling infrastructure for geodata." Their strategy for handling geodata is to use Cassandra and build two clusters: one for indexes and one for records.

2 0.89827847 628 high scalability-2009-06-13-Neo4j - a Graph Database that Kicks Buttox

Introduction: Update: Social networks in the database: using a graph database. A nice post on representing, traversing, and performing other common social network operations using a graph database. If you are Digg or LinkedIn you can build your own speedy graph database to represent your complex social network relationships. For those of more modest means, Neo4j, a graph database, is a good alternative. A graph is a collection of nodes (things) and edges (relationships) that connect pairs of nodes. Slap properties (key-value pairs) on nodes and relationships and you have a surprisingly powerful way to represent most anything you can think of. In a graph database "relationships are first-class citizens. They connect two nodes and both nodes and relationships can hold an arbitrary amount of key-value pairs. So you can look at a graph database as a key-value store, with full support for relationships." A graph looks something like: For more lovely examples take a look at the Graph Image Gal

3 0.8859486 1406 high scalability-2013-02-14-When all the Program's a Graph - Prismatic's Plumbing Library

Introduction: At some point as a programmer you might have the insight/fear that all programming is just doing stuff to other stuff. Then you may observe after coding the same stuff over again that stuff in a program often takes the form of interacting patterns of flows. Then you may think hey, a program isn't only useful for coding datastructures, but a program is a kind of datastructure, and that with a meta-level jump you could program a program in terms of flows over data and flows over other flows. That's the kind of stuff Prismatic is making available in the Graph extension to their plumbing package (code examples), which is described in an excellent post: Graph: Abstractions for Structured Computation. You may remember Prismatic from a previous profile we did on HighScalability: Prismatic Architecture - Using Machine Learning On Social Networks To Figure Out What You Should Read On The Web. We learned how Prismatic, an interest-driven content suggestion service, builds programs in

4 0.88469076 1285 high scalability-2012-07-18-Disks Ain't Dead Yet: GraphChi - a disk-based large-scale graph computation

Introduction: GraphChi uses a Parallel Sliding Windows method which can: process a graph with mutable edge values efficiently from disk, with only a small number of non-sequential disk accesses, while supporting the asynchronous model of computation. The result is that graphs with billions of edges can be processed on just a single machine. It uses a vertex-centric computation model similar to Pregel, which supports iterative algorithms as opposed to the batch style of MapReduce. Streaming graph updates are supported. About GraphChi, Carlos Guestrin, codirector of Carnegie Mellon's Select Lab, says: A Mac Mini running GraphChi can analyze Twitter's social graph from 2010—which contains 40 million users and 1.2 billion connections—in 59 minutes. "The previous published result on this problem took 400 minutes using a cluster of about 1,000 computers." Related Articles: Aapo Kyrola Home Page; Your Laptop Can Now Analyze Big Data by JOHN PAVLUS; Example Applications Runn

5 0.86981153 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management

Introduction: Relational databases, document databases, and distributed hash tables get most of the hype these days, but there's another option: graph databases. Back to the future it seems. Here's a really interesting paper by Marko A. Rodriguez introducing the graph model and its extension to representing the world wide web of data. Modern-day open source and commercial graph databases can store on the order of 1 billion relationships with some databases reaching the 10 billion mark. These developments are making the graph database practical for applications that require large-scale knowledge structures. Moreover, with the Web of Data standards set forth by the Linked Data community, it is possible to interlink graph databases across the web into a giant global knowledge structure. This talk will discuss graph databases, their underlying data model, their querying mechanisms, and the benefits of the graph data structure for modeling and analysis.

6 0.86331463 766 high scalability-2010-01-26-Product: HyperGraphDB - A Graph Database

7 0.83200502 1136 high scalability-2011-11-03-Paper: G2 : A Graph Processing System for Diagnosing Distributed Systems

8 0.83009911 801 high scalability-2010-03-30-Running Large Graph Algorithms - Evaluation of Current State-of-the-Art and Lessons Learned

9 0.82191926 155 high scalability-2007-11-15-Video: Dryad: A general-purpose distributed execution platform

10 0.78318119 631 high scalability-2009-06-15-Large-scale Graph Computing at Google

11 0.74710643 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010

12 0.74642938 1088 high scalability-2011-07-27-Making Hadoop 1000x Faster for Graph Problems

13 0.72969377 621 high scalability-2009-06-06-Graph server

14 0.68876612 842 high scalability-2010-06-16-Hot Scalability Links for June 16, 2010

15 0.67804837 722 high scalability-2009-10-15-Hot Scalability Links for Oct 15 2009

16 0.67468846 58 high scalability-2007-08-04-Product: Cacti

17 0.64708859 1530 high scalability-2013-10-11-Stuff The Internet Says On Scalability For October 11th, 2013

18 0.63869965 1283 high scalability-2012-07-13-Stuff The Internet Says On Scalability For July 13, 2012

19 0.63277292 1365 high scalability-2012-11-30-Stuff The Internet Says On Scalability For November 30, 2012

20 0.62773329 1263 high scalability-2012-06-13-Why My Soap Film is Better than Your Hadoop Cluster


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.187), (2, 0.235), (10, 0.029), (25, 0.159), (30, 0.011), (47, 0.027), (61, 0.102), (77, 0.026), (79, 0.078), (85, 0.049), (94, 0.024)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.94313884 992 high scalability-2011-02-18-Stuff The Internet Says On Scalability For February 18, 2011

Introduction: Submitted for your reading pleasure on this cold and rainy Friday... Quotable Quotes: CarryMillsap: You can't hardware yourself out of a performance problem you softwared yourself into. @juokaz: schema-less databases doesn't mean data should have no structure. Scalability Porn: 3 Months To The First Million Users, Just 6 Weeks To The Second Million For Instagram. Study by the USC Annenberg School for Communication & Journalism estimates: in 2007, humankind was able to store 2.9 × 10^20 optimally compressed bytes, communicate almost 2 × 10^21 bytes, and carry out 6.4 × 10^18 instructions per second on general-purpose computers. Hadoop has hit a scalability limit at a whopping 4,000 machines and they are looking to create the next-generation architecture. Their target is clusters of 10,000 machines and 200,000 cores. The fundamental idea of the re-architecture is to divide the two major functions of the Job Tracker, resource management and job sc

2 0.93837768 346 high scalability-2008-06-28-ID generation schemes

Introduction: Hi, Generating unique ids is a common requirement in many projects. Generally, this responsibility is given to the database layer, by using sequences or some other technique. This is a problem for horizontal scalability. What are the Guid generation schemes generally used in highly scalable web sites? I have seen Java's SecureRandom class used to generate Guids. What are the other methods generally used? Thanks Unmesh

3 0.93178368 987 high scalability-2011-02-10-Dispelling the New SSL Myth

Introduction: Warning, this post is a bit vendor FUDy, but SSL is an important topic and it does bring up some issues worth arguing about. Hacker News has a good discussion of the article. Adam Langley started it all with his article Overclocking SSL and has made a rebuttal to the F5 article in Still not computationally expensive. My car is eight years old this year. It has less than 30,000 miles on it. Yes, you heard that right, less than 30,000 miles. I don’t drive my car very often because, well, my commute is a short trip down two flights of stairs. I don’t need to go very far when I do drive; it’s only ten miles or so round trip to the grocery store. So from my perspective, gas isn’t really very expensive. I may use a tank of gas a month, which works out to … well, it’s really not even worth mentioning the cost. But for someone who commutes every day – especially someone who commutes a long distance every day – gas is expensive. It’s a significant expense every month for them and th

4 0.93142456 253 high scalability-2008-02-19-Building a email communication system

Introduction: hi, the website I work for is looking to build an email system that can handle a fair few emails (up to a hundred thousand a day). These comprise emails like registration emails, newsletters, lots of user-triggered emails and overnight emails. At present we queue them in SQL and feed them into an SMTP server on one of our web servers when the queue drops below a certain level. This has caused our mail system to crash as well as hammer our DB server (shared!!!). We have got an architecture of what we want to build but thought there might be something we could buy off the shelf that allowed us to keep templated emails, lists of recipients, schedule sends etc and report on it. We can't find anything. What do big websites like amazon etc use, or people a little smaller but who still send loads of mail (flickr, ebuyer, or other ecommerce sites)? Cheers tarqs

5 0.93086427 412 high scalability-2008-10-14-Sun N1 Grid Engine Software and the Tokyo Institute of Technology Super Computer Grid

Introduction: One of the world's leading technical institutes, the Tokyo Institute of Technology (Tokyo Tech), created the fastest supercomputer in Asia, and one of the largest outside of the United States. Using Sun x64 servers and data servers deployed in a grid architecture, Tokyo Tech built a cost-effective, flexible supercomputer that meets the demands of compute- and data-intensive applications. Built in just 35 days, the TSUBAME grid includes hundreds of systems incorporating thousands of processor cores and terabytes of memory, and delivers 47.38 trillion floating-point operations per second (TeraFLOPS) of sustained LINPACK benchmark performance and 1.1 petabyte of storage to users running common off-the-shelf applications. Based on the deployment architecture, the grid is expected to reach 100 TeraFLOPS in the future. This Sun BluePrints article provides an overview of the Tokyo Tech grid, named TSUBAME. The third in a series of Sun BluePrints articles on the TSUBAME grid, this document pro

same-blog 6 0.92649138 805 high scalability-2010-04-06-Strategy: Make it Really Fast vs Do the Work Up Front

7 0.91963565 203 high scalability-2008-01-07-How Ruby on Rails Survived a 550k Pageview Digging

8 0.91845727 419 high scalability-2008-10-15-The Tokyo Institute of Technology Supercomputer Grid: Architecture and Performance Overview

9 0.91683948 1607 high scalability-2014-03-07-Stuff The Internet Says On Scalability For March 7th, 2014

10 0.9122712 870 high scalability-2010-08-02-7 Scaling Strategies Facebook Used to Grow to 500 Million Users

11 0.9093048 1214 high scalability-2012-03-23-Stuff The Internet Says On Scalability For March 23, 2012

12 0.90286148 770 high scalability-2010-02-03-NoSQL Means Never Having to Store Blobs Again

13 0.90204591 671 high scalability-2009-08-05-Stack Overflow Architecture

14 0.89574921 808 high scalability-2010-04-12-Poppen.de Architecture

15 0.89498276 1087 high scalability-2011-07-26-Web 2.0 Killed the Middleware Star

16 0.89468729 696 high scalability-2009-09-07-Product: Infinispan - Open Source Data Grid

17 0.89384639 998 high scalability-2011-03-03-Stack Overflow Architecture Update - Now at 95 Million Page Views a Month

18 0.89368391 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App

19 0.89362341 1586 high scalability-2014-01-28-How Next Big Sound Tracks Over a Trillion Song Plays, Likes, and More Using a Version Control System for Hadoop Data

20 0.89300752 579 high scalability-2009-04-24-Heroku - Simultaneously Develop and Deploy Automatically Scalable Rails Applications in the Cloud