high_scalability high_scalability-2009 high_scalability-2009-631 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: To continue the graph theme, Google has got into the act and released information on Pregel. Pregel does not appear to be a new type of potato chip. Pregel is instead a scalable infrastructure... ...to mine a wide range of graphs. In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges' states, and mutate the graph's topology. Currently, Pregel scales to billions of vertices and edges, but this limit will keep expanding. Pregel's applicability is harder to quantify, but so far we haven't come across a type of graph or a practical graph computing problem which is not solvable with Pregel. It computes over large graphs much faster than alternatives, and the application programming interface is easy to use. Implementing PageRank, for example, takes only about 15 lines of code. Developers
sentIndex sentText sentNum sentScore
1 To continue the graph theme, Google has got into the act and released information on Pregel. [sent-1, score-0.433]
2 Pregel does not appear to be a new type of potato chip. [sent-2, score-0.34]
3 In Pregel, programs are expressed as a sequence of iterations. [sent-10, score-0.231]
4 In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges' states, and mutate the graph's topology. [sent-11, score-0.921]
5 Currently, Pregel scales to billions of vertices and edges, but this limit will keep expanding. [sent-12, score-0.444]
6 Pregel's applicability is harder to quantify, but so far we haven't come across a type of graph or a practical graph computing problem which is not solvable with Pregel. [sent-13, score-0.788]
7 It computes over large graphs much faster than alternatives, and the application programming interface is easy to use. [sent-14, score-0.229]
8 Implementing PageRank, for example, takes only about 15 lines of code. [sent-15, score-0.054]
9 Developers of dozens of Pregel applications within Google have found that "thinking like a vertex," which is the essence of programming in Pregel, is intuitive. [sent-16, score-0.226]
10 Pregel does not appear to be publicly available, so it's not clear what the purpose of the announcement could be. [sent-17, score-0.394]
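The iteration model in sentences 3 and 4 above, together with the PageRank remark in sentence 8, can be illustrated with a small single-process sketch. Since Pregel itself does not appear to be publicly available, nothing here is Google's API: `run_supersteps`, `pagerank_compute`, the damping factor of 0.85 and the three-vertex graph are all hypothetical names and values chosen for illustration. The sketch only shows the idea that each vertex reads the messages sent to it in the previous iteration, updates its own value, and sends messages along its outgoing edges.

```python
DAMPING = 0.85  # assumed damping factor, not taken from the source


def pagerank_compute(value, out_edges, messages, superstep, num_vertices):
    """Hypothetical per-vertex function: returns (new_value, messages_to_send)."""
    if superstep >= 1:
        # Combine the rank shares received from the previous iteration.
        value = (1 - DAMPING) / num_vertices + DAMPING * sum(messages)
    # Split the current value evenly across outgoing edges.
    share = value / len(out_edges) if out_edges else 0.0
    return value, [(target, share) for target in out_edges]


def run_supersteps(graph, compute, num_supersteps):
    """Run `compute` on every vertex once per iteration (a toy, in-memory loop)."""
    n = len(graph)
    values = {v: 1.0 / n for v in graph}
    inbox = {v: [] for v in graph}
    for superstep in range(num_supersteps):
        outbox = {v: [] for v in graph}
        for v, out_edges in graph.items():
            # Each vertex sees only its own state and its own inbox.
            values[v], sent = compute(values[v], out_edges, inbox[v], superstep, n)
            for target, message in sent:
                outbox[target].append(message)
        # Messages sent now are delivered in the next iteration.
        inbox = outbox
    return values


# Illustrative graph: adjacency lists keyed by vertex id.
graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
ranks = run_supersteps(graph, pagerank_compute, 50)
print(ranks)
```

Keeping the inbox and outbox separate is what makes the per-vertex calls independent of one another within an iteration, which is the property the description above relies on for distributing the work. This toy version ignores vertices without outgoing edges, edge state, and topology mutation.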
wordName wordTfidf (topN-words)
[('pregel', 0.664), ('vertices', 0.335), ('vertex', 0.21), ('graph', 0.182), ('iteration', 0.166), ('edges', 0.165), ('appear', 0.141), ('potato', 0.122), ('applicability', 0.122), ('mutate', 0.116), ('computes', 0.112), ('solvable', 0.108), ('announcement', 0.1), ('pagerank', 0.098), ('expressed', 0.097), ('publicly', 0.095), ('outgoing', 0.095), ('messages', 0.095), ('quantify', 0.093), ('essence', 0.084), ('mine', 0.083), ('extension', 0.082), ('gmail', 0.079), ('sequence', 0.079), ('type', 0.077), ('alternatives', 0.077), ('dozens', 0.074), ('states', 0.073), ('independently', 0.072), ('theme', 0.072), ('modify', 0.069), ('programming', 0.068), ('harder', 0.062), ('act', 0.061), ('receive', 0.06), ('released', 0.06), ('continue', 0.058), ('purpose', 0.058), ('billions', 0.057), ('previous', 0.056), ('practical', 0.055), ('programs', 0.055), ('google', 0.055), ('implementing', 0.054), ('lines', 0.054), ('sent', 0.053), ('limit', 0.052), ('graphs', 0.049), ('wide', 0.047), ('range', 0.047)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 631 high scalability-2009-06-15-Large-scale Graph Computing at Google
2 0.14444059 1136 high scalability-2011-11-03-Paper: G2 : A Graph Processing System for Diagnosing Distributed Systems
Introduction: One of the problems in building distributed systems is figuring out what the heck is going on. Usually endless streams of log files are consulted like ancients using entrails to divine the will of the Gods. To rise above these ancient practices we must rise to another level of abstraction and that's the approach described in a Microsoft research paper: G2: A Graph Processing System for Diagnosing Distributed Systems, which uses execution graphs that model runtime events and their correlations in distributed systems. The problem with these schemes is viewing applications, written by programmers in low level code, as execution graphs. But we're heading in this direction in any case. To program a warehouse or an internet sized computer we'll have to write at higher levels of abstraction so code can be executed transparently at runtime on these giant distributed computers. There are many advantages to this approach, fault diagnosis and performance monitoring are just two of the wins
3 0.14425801 860 high scalability-2010-07-17-Hot Scalability Links for July 17, 2010
Introduction: And by hot I also mean temperature. Summer has arrived. It's sizzling here in Silicon Valley. Thank you air conditioning! Scale the web by appointing a Crawler Czar? Tom Foremski has the idea that Google should open up their index so sites wouldn't have to endure the constant pounding by ravenous crawler bots. Don MacAskill of SmugMug estimates 50% of our web server CPU resources are spent serving crawlers. What a waste. How this would all work with real-time feeds, paid feeds (Twitter, movies, ...), etc. is unknown, but does it make sense for all that money to be spent on extracting the same data over and over again? Tweets of Gold: jamesurquhart: Key to applications is architecture. Key for infrastructure supporting archs is configurability. Configurability==features. tjake: People who choose their datastore based on hearsay and not their own evaluation are doomed. b6n: No global lock ever goes unpunished. MichaelSurt
4 0.14207318 1198 high scalability-2012-02-24-Stuff The Internet Says On Scalability For February 24, 2012
Introduction: This is not your father's HighScalability: 13,000 times the world’s GDP: Cost of the Death Star. Quotable quotes: @chrissalzman: Scalability is the enemy of right now. @resatsch: I like our IT team: "We used Redis before Youporn did it" @virtual_bill: Mixing flash and spinning disk to balance cost is like strapping a rocket to a turtle. @jaksprats: HDDs got slower at random access as they got bigger, cuz disk seeks stayed almost the same, similar phenomenon w/ Flash. Priam, king of Troy, begat a daughter, Cassandra, and Netflix, king of true distributed Amazon infrastructure, begat a co-processor for Cassandra, Priam, used for Backup and recovery, Bootstrapping, Centralized configuration management, and RESTful monitoring and metrics. This is why Troy was never actually destroyed, it was simply backed up in situ to another region. Evernote is everfaithful to SQL because SQL gives it all the ACID it needs to keep its billion Note
Introduction: On the surface nothing appears more different than soft data and hard raw materials like iron. Then isn’t it ironic, in the Alanis Morissette sense, that in this Age of Information, great wealth still lies hidden deep beneath piles of stuff? It's so strange how directly digging for dollars in data parallels the great wealth producing models of the Industrial Revolution. The piles of stuff is the Internet. It takes lots of prospecting to find the right stuff. Mighty web crawling machines tirelessly collect stuff, bringing it into their huge maws, then depositing load after load into rack after rack of distributed file system machines. Then armies of still other machines take this stuff and strip out the valuable raw materials, which in the Information Age, are endless bytes of raw data. Link clicks, likes, page views, content, headlines, searches, inbound links, outbound links, search clicks, hashtags, friends, purchases: anything and everything you do on the Internet is a valu
6 0.12904878 628 high scalability-2009-06-13-Neo4j - a Graph Database that Kicks Buttox
7 0.1280596 1195 high scalability-2012-02-17-Stuff The Internet Says On Scalability For February 17, 2012
8 0.12180328 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management
9 0.11737522 1406 high scalability-2013-02-14-When all the Program's a Graph - Prismatic's Plumbing Library
10 0.10689414 1163 high scalability-2011-12-23-Stuff The Internet Says On Scalability For December 23, 2011
11 0.10586238 621 high scalability-2009-06-06-Graph server
12 0.10211203 1530 high scalability-2013-10-11-Stuff The Internet Says On Scalability For October 11th, 2013
13 0.098471008 1285 high scalability-2012-07-18-Disks Ain't Dead Yet: GraphChi - a disk-based large-scale graph computation
14 0.093342878 155 high scalability-2007-11-15-Video: Dryad: A general-purpose distributed execution platform
15 0.092756845 1088 high scalability-2011-07-27-Making Hadoop 1000x Faster for Graph Problems
16 0.089421749 805 high scalability-2010-04-06-Strategy: Make it Really Fast vs Do the Work Up Front
17 0.084584087 802 high scalability-2010-04-01-Hot Scalability Links for April 1, 2010
18 0.082480237 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010
19 0.075031884 936 high scalability-2010-11-09-Facebook Uses Non-Stored Procedures to Update Social Graphs
20 0.068424433 1607 high scalability-2014-03-07-Stuff The Internet Says On Scalability For March 7th, 2014
topicId topicWeight
[(0, 0.067), (1, 0.038), (2, 0.01), (3, 0.052), (4, 0.018), (5, 0.026), (6, -0.025), (7, 0.024), (8, 0.003), (9, 0.06), (10, 0.053), (11, 0.006), (12, -0.012), (13, -0.062), (14, 0.003), (15, -0.05), (16, -0.031), (17, 0.109), (18, 0.079), (19, 0.066), (20, -0.078), (21, -0.041), (22, -0.022), (23, -0.048), (24, -0.014), (25, 0.072), (26, 0.002), (27, 0.048), (28, 0.059), (29, -0.015), (30, -0.026), (31, -0.071), (32, 0.014), (33, -0.023), (34, -0.024), (35, 0.044), (36, 0.005), (37, -0.044), (38, 0.014), (39, 0.019), (40, 0.018), (41, 0.004), (42, 0.006), (43, -0.003), (44, -0.027), (45, -0.006), (46, -0.004), (47, 0.021), (48, 0.002), (49, -0.026)]
simIndex simValue blogId blogTitle
same-blog 1 0.98365158 631 high scalability-2009-06-15-Large-scale Graph Computing at Google
2 0.87260056 1406 high scalability-2013-02-14-When all the Program's a Graph - Prismatic's Plumbing Library
Introduction: At some point as a programmer you might have the insight/fear that all programming is just doing stuff to other stuff. Then you may observe after coding the same stuff over again that stuff in a program often takes the form of interacting patterns of flows. Then you may think hey, a program isn't only useful for coding datastructures, but a program is a kind of datastructure and that with a meta level jump you could program a program in terms of flows over data and flows over other flows. That's the kind of stuff Prismatic is making available in the Graph extension to their plumbing package (code examples), which is described in an excellent post: Graph: Abstractions for Structured Computation. You may remember Prismatic from the previous profile we did on HighScalability: Prismatic Architecture - Using Machine Learning On Social Networks To Figure Out What You Should Read On The Web. We learned how Prismatic, an interest driven content suggestion service, builds programs in
3 0.86203152 1285 high scalability-2012-07-18-Disks Ain't Dead Yet: GraphChi - a disk-based large-scale graph computation
Introduction: GraphChi uses a Parallel Sliding Windows method which can: process a graph with mutable edge values efficiently from disk, with only a small number of non-sequential disk accesses, while supporting the asynchronous model of computation. The result is graphs with billions of edges can be processed on just a single machine. It uses a vertex-centric computation model similar to Pregel, which supports iterative algorithms as opposed to the batch style of MapReduce. Streaming graph updates are supported. About GraphChi, Carlos Guestrin, codirector of Carnegie Mellon's Select Lab, says: A Mac Mini running GraphChi can analyze Twitter's social graph from 2010, which contains 40 million users and 1.2 billion connections, in 59 minutes. "The previous published result on this problem took 400 minutes using a cluster of about 1,000 computers." Related Articles Aapo Kyrola Home Page Your Laptop Can Now Analyze Big Data by JOHN PAVLUS Example Applications Runn
4 0.8362211 1136 high scalability-2011-11-03-Paper: G2 : A Graph Processing System for Diagnosing Distributed Systems
Introduction: One of the problems in building distributed systems is figuring out what the heck is going on. Usually endless streams of log files are consulted like ancients using entrails to divine the will of the Gods. To rise above these ancient practices we must rise to another level of abstraction and that's the approach described in a Microsoft research paper: G2: A Graph Processing System for Diagnosing Distributed Systems, which uses execution graphs that model runtime events and their correlations in distributed systems. The problem with these schemes is viewing applications, written by programmers in low level code, as execution graphs. But we're heading in this direction in any case. To program a warehouse or an internet sized computer we'll have to write at higher levels of abstraction so code can be executed transparently at runtime on these giant distributed computers. There are many advantages to this approach, fault diagnosis and performance monitoring are just two of the wins
5 0.80083257 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management
Introduction: Relational databases, document databases, and distributed hash tables get most of the hype these days, but there's another option: graph databases. Back to the future it seems. Here's a really interesting paper by Marko A. Rodriguez introducing the graph model and its extension to representing the world wide web of data. Modern day open source and commercial graph databases can store on the order of 1 billion relationships with some databases reaching the 10 billion mark. These developments are making the graph database practical for applications that require large-scale knowledge structures. Moreover, with the Web of Data standards set forth by the Linked Data community, it is possible to interlink graph databases across the web into a giant global knowledge structure. This talk will discuss graph databases, their underlying data model, their querying mechanisms, and the benefits of the graph data structure for modeling and analysis.
6 0.78795165 628 high scalability-2009-06-13-Neo4j - a Graph Database that Kicks Buttox
7 0.73355037 621 high scalability-2009-06-06-Graph server
8 0.72032398 155 high scalability-2007-11-15-Video: Dryad: A general-purpose distributed execution platform
9 0.71557325 766 high scalability-2010-01-26-Product: HyperGraphDB - A Graph Database
10 0.6913625 805 high scalability-2010-04-06-Strategy: Make it Really Fast vs Do the Work Up Front
11 0.66108203 801 high scalability-2010-03-30-Running Large Graph Algorithms - Evaluation of Current State-of-the-Art and Lessons Learned
12 0.6519618 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010
13 0.61856943 1088 high scalability-2011-07-27-Making Hadoop 1000x Faster for Graph Problems
14 0.61069351 842 high scalability-2010-06-16-Hot Scalability Links for June 16, 2010
15 0.56978983 722 high scalability-2009-10-15-Hot Scalability Links for Oct 15 2009
16 0.56878144 1512 high scalability-2013-09-05-Paper: MillWheel: Fault-Tolerant Stream Processing at Internet Scale
17 0.53928035 973 high scalability-2011-01-14-Stuff The Internet Says On Scalability For January 14, 2011
18 0.53392279 1195 high scalability-2012-02-17-Stuff The Internet Says On Scalability For February 17, 2012
19 0.51782304 223 high scalability-2008-01-25-Google: Introduction to Distributed System Design
20 0.51386106 58 high scalability-2007-08-04-Product: Cacti
topicId topicWeight
[(1, 0.162), (2, 0.179), (17, 0.355), (30, 0.069), (61, 0.075), (94, 0.02)]
simIndex simValue blogId blogTitle
same-blog 1 0.91419363 631 high scalability-2009-06-15-Large-scale Graph Computing at Google
2 0.91063637 543 high scalability-2009-03-17-Sun to Announce Open Cloud APIs at CommunityOne
Introduction: One of the key items Sun will be talking about in today's cloud computing announcement (at 9AM EST/6AM PST) will be Sun's opening of the APIs that we'll use for the Sun Cloud. We're making these available so that those who are interested will be able to review and comment on these APIs. Continuing our commitment to openness, we're making these APIs available via the Creative Commons Version 3.0 license. ...
3 0.86941564 506 high scalability-2009-02-03-10 More Rules for Even Faster Websites
Introduction: Update: How-To Minimize Load Time for Fast User Experiences. Shows how to analyze the bottlenecks preventing websites and blogs from loading quickly and how to resolve them. 80-90% of the end-user response time is spent on the frontend, so it makes sense to concentrate efforts there before heroically rewriting the backend. Take a shower before buying a Porsche, if you know what I mean. Steve Souders, author of High Performance Websites and Yslow, has ten more best practices to speed up your website: Split the initial payload; Load scripts without blocking; Don’t scatter scripts; Split dominant content domains; Make static content cookie-free; Reduce cookie weight; Minify CSS; Optimize images; Use iframes sparingly; To www or not to www. Sadly, according to String Theory, there are only 26.7 rules left, so get them while they're still in our dimension. Here are slides on the first few rules. Love the speeding dog slide. That's exactly what my dog looks like trav
4 0.81370819 1467 high scalability-2013-05-30-Google Finds NUMA Up to 20% Slower for Gmail and Websearch
Introduction: When you have a large population of servers you have both the opportunity and the incentive to perform interesting studies. Authors from Google and the University of California in Optimizing Google’s Warehouse Scale Computers: The NUMA Experience conducted such a study, taking a look at how jobs run on clusters of machines using a NUMA architecture. Since NUMA is common on server class machines it's a topic of general interest for those looking to maximize machine utilization across clusters. Some of the results are surprising: The methodology of how to attribute such fine performance variations to NUMA effects within such a complex system is perhaps more interesting than the results themselves. Well worth reading just for that story. The performance swing due to NUMA is up to 15% on AMD Barcelona for Gmail backend and 20% on Intel Westmere for Web-search frontend. Memory locality is not always King. Because of the interaction between NUMA and cache sharing/contention it
5 0.77420932 1225 high scalability-2012-04-09-Why My Slime Mold is Better than Your Hadoop Cluster
Introduction: Update: Organism without a brain creates external memories for navigation shows slime mold is even cooler than originally thought, storing a record of where it's been using slime: The authors conclude, the slime isn't just the mold's calling card. Instead, it's a way of marking the environment so that the organism can sense where it's been, and not expend effort on searches that won't pay off. Although the situation isn't an exact parallel, the authors make a comparison to the pheromone trails used by ants. In After Life: The Strange Science Of Decay there’s a truly incredible sequence of gorgeously shot video showing how creeping slime mold solves mazes and performs other amazing feats of computation. Take a look at what simple one-celled organisms can do: The whole video is really well done and shockingly revelatory. It’s the story of decay, how atoms created during the Big Bang and through countless supernova explosions are continually rearranged an
6 0.76926935 956 high scalability-2010-12-08-How To Get Experience Working With Large Datasets
7 0.75094688 465 high scalability-2008-12-14-Scaling MySQL on a 256-way T5440 server using Solaris ZFS and Java 1.7
8 0.74760163 1393 high scalability-2013-01-24-NoSQL Parody: say No! No! and No!
9 0.73295462 869 high scalability-2010-07-30-Hot Scalability Links for July 30, 2010
10 0.70779455 199 high scalability-2008-01-01-S3 for image storing
11 0.69250733 1584 high scalability-2014-01-22-How would you build the next Internet? Loons, Drones, Copters, Satellites, or Something Else?
12 0.65592748 427 high scalability-2008-10-22-Server load balancing architectures, Part 2: Application-level load balancing
13 0.64224184 507 high scalability-2009-02-03-Paper: Optimistic Replication
14 0.63110775 877 high scalability-2010-08-12-Designing Web Applications for Scalability
15 0.63080001 1333 high scalability-2012-10-04-LinkedIn Moved from Rails to Node: 27 Servers Cut and Up to 20x Faster
16 0.62465447 467 high scalability-2008-12-16-[ANN] New Open Source Cache System
17 0.60604465 765 high scalability-2010-01-25-Let's Welcome our Neo-Feudal Overlords
18 0.59995335 623 high scalability-2009-06-10-Dealing with multi-partition transactions in a distributed KV solution
19 0.57885021 963 high scalability-2010-12-23-Paper: CRDTs: Consistency without concurrency control
20 0.57760406 611 high scalability-2009-05-31-Need help on Site loading & database optimization - URGENT