high_scalability high_scalability-2010 high_scalability-2010-827 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Lots of good ones this week... Scalability, Availability & Stability Patterns. Jonas Boner has 197 slides covering a very wide range of scalability topics. One-stop scalability shopping. Horizontal Scalability via Transient, Shardable, and Share-Nothing Resources. Heroku's Adam Wiggins shares what they've learned about scaling based on their experiences building a cloud platform and the hundreds of apps running on it. He describes the next generation architecture he thinks all software should follow in the future. Scalability of the Hadoop Distributed File System. Konstantin V. Shvachko writes a great post analyzing whether the limitations imposed on a distributed file system by the single-node namespace server architecture can support 100,000 clients and petabytes of files. Cassandra by Example. Eric Evans created a nice Cassandra tutorial that uses building a Twitter clone as its example. Many people want to see more data modeling examples. Here you are. UpSizeR: Synthet
sentIndex sentText sentNum sentScore
1 Jonas Boner has 197 slides covering a very wide range of scalability topics. [sent-5, score-0.076]
2 Heroku's Adam Wiggins shares what they've learned about scaling based on their experiences building a cloud platform and the hundreds of apps running on it. [sent-8, score-0.183]
3 He describes the next generation architecture he thinks all software should follow in the future. [sent-9, score-0.131]
4 Shvachko writes a great post analyzing whether the limitations imposed on a distributed file system by the single-node namespace server architecture can support 100,000 clients and petabytes of files. [sent-12, score-0.217]
5 Eric Evans created a nice Cassandra tutorial that uses building a Twitter clone as its example. [sent-14, score-0.177]
6 Learn from academic literature about how the MapReduce parallel model and the Hadoop implementation are used to solve algorithmic problems. [sent-27, score-0.623]
7 But personally, I don’t think Mongo’s ability to scale out has gotten enough attention. [sent-29, score-0.095]
8 Ayende Rahien - I believe that while you can shard a graph database, it places a very low limit on the type of graph walking queries you can make. [sent-31, score-0.687]
9 David Shoemaker & Jamie Turner show how to make a scalable Flickr Killr using Python. [sent-33, score-0.089]
10 Using caching and optimization techniques to improve performance of the Ensembl website. [sent-34, score-0.121]
11 Solutions included optimization of the Apache web server, introduction of caching technologies and widespread implementation of AJAX code. [sent-35, score-0.419]
12 Today, we're going to show you how easy it is to use the Facebook Graph API to mash up data from Facebook with data in a locally hosted graph database! [sent-43, score-0.511]
wordName wordTfidf (topN-words)
[('graph', 0.292), ('academic', 0.191), ('agile', 0.14), ('hadoop', 0.139), ('vadim', 0.138), ('turner', 0.138), ('jamie', 0.138), ('jonas', 0.138), ('kristian', 0.138), ('describes', 0.131), ('mash', 0.13), ('potato', 0.13), ('wiggins', 0.13), ('widespread', 0.13), ('shardable', 0.124), ('evans', 0.124), ('rahien', 0.124), ('optimization', 0.121), ('assist', 0.119), ('clark', 0.119), ('facebook', 0.112), ('literature', 0.11), ('imposed', 0.11), ('namespace', 0.107), ('toadvertisea', 0.105), ('usfor', 0.105), ('walking', 0.103), ('scaling', 0.101), ('dns', 0.1), ('methodology', 0.1), ('gotten', 0.095), ('clone', 0.095), ('mongo', 0.095), ('algorithmic', 0.094), ('personally', 0.093), ('pleasecontact', 0.093), ('transient', 0.092), ('adam', 0.091), ('asset', 0.09), ('show', 0.089), ('implementation', 0.089), ('wan', 0.088), ('procedure', 0.084), ('eric', 0.083), ('tutorial', 0.083), ('building', 0.082), ('checks', 0.08), ('included', 0.079), ('api', 0.077), ('slides', 0.076)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010
2 0.2113574 628 high scalability-2009-06-13-Neo4j - a Graph Database that Kicks Buttox
Introduction: Update: Social networks in the database: using a graph database. A nice post on representing, traversing, and performing other common social network operations using a graph database. If you are Digg or LinkedIn you can build your own speedy graph database to represent your complex social network relationships. For those of more modest means, Neo4j, a graph database, is a good alternative. A graph is a collection of nodes (things) and edges (relationships) that connect pairs of nodes. Slap properties (key-value pairs) on nodes and relationships and you have a surprisingly powerful way to represent most anything you can think of. In a graph database "relationships are first-class citizens. They connect two nodes and both nodes and relationships can hold an arbitrary amount of key-value pairs. So you can look at a graph database as a key-value store, with full support for relationships." A graph looks something like: For more lovely examples take a look at the Graph Image Gal
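The property-graph model described in this introduction (nodes and relationships that both carry key-value pairs) can be sketched in a few lines of Java. This is a toy in-memory illustration only, not Neo4j's API: the class `PropertyGraph`, its methods, and the `KNOWS` relationship type are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory property graph. This is NOT Neo4j's API; every name here is illustrative.
final class PropertyGraph {
    // A node ("thing") carrying arbitrary key-value properties.
    static final class Node {
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();
    }

    // A relationship is first-class: it has a type, two endpoints, and its own properties.
    static final class Relationship {
        final String type;
        final Node start;
        final Node end;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(String type, Node start, Node end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    private PropertyGraph() {}

    // Create a node with a "name" property (the property key is an assumption of this sketch).
    static Node node(String name) {
        Node n = new Node();
        n.properties.put("name", name);
        return n;
    }

    // Connect two nodes with a typed, directed relationship.
    static Relationship relate(Node start, String type, Node end) {
        Relationship r = new Relationship(type, start, end);
        start.outgoing.add(r);
        return r;
    }

    // Names of the nodes reached by following one outgoing relationship of the given type.
    static List<String> neighbours(Node start, String type) {
        List<String> names = new ArrayList<>();
        for (Relationship r : start.outgoing) {
            if (r.type.equals(type)) {
                names.add((String) r.end.properties.get("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Node alice = node("Alice");
        Node bob = node("Bob");
        relate(alice, "KNOWS", bob).properties.put("since", 2009);
        System.out.println(neighbours(alice, "KNOWS")); // [Bob]
    }
}
```

Keeping the relationship as its own object, rather than a bare pointer between nodes, is what lets it hold properties such as `since`.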
3 0.20322275 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management
Introduction: Relational databases, document databases, and distributed hash tables get most of the hype these days, but there's another option: graph databases. Back to the future it seems. Here's a really interesting paper by Marko A. Rodriguez introducing the graph model and its extension to representing the world wide web of data. Modern-day open source and commercial graph databases can store on the order of 1 billion relationships with some databases reaching the 10 billion mark. These developments are making the graph database practical for applications that require large-scale knowledge structures. Moreover, with the Web of Data standards set forth by the Linked Data community, it is possible to interlink graph databases across the web into a giant global knowledge structure. This talk will discuss graph databases, their underlying data model, their querying mechanisms, and the benefits of the graph data structure for modeling and analysis.
4 0.19904131 1088 high scalability-2011-07-27-Making Hadoop 1000x Faster for Graph Problems
Introduction: Dr. Daniel Abadi, author of the DBMS Musings blog and cofounder of Hadapt, which offers a product improving Hadoop performance by 50x on relational data, is now taking his talents to graph data in Hadoop's tremendous inefficiency on graph data management (and how to avoid it), which shares the secrets of getting Hadoop to perform 1000x better on graph data. TL;DR: Analysing graph data is at the heart of important data mining problems. Hadoop is the tool of choice for many of these problems. Hadoop-style MapReduce works best on key-value processing, not graph processing, and can be well over a factor of 1000 less efficient than it needs to be. Hadoop inefficiency has consequences in the real world. Inefficiencies on graph data problems like improving power utilization, minimizing carbon emissions, and improving product designs lead to a lot of value being left on the table in the form of negative environmental consequences, increased server costs, increased data center spa
Introduction: On the surface nothing appears more different than soft data and hard raw materials like iron. Then isn’t it ironic, in the Alanis Morissette sense, that in this Age of Information, great wealth still lies hidden deep beneath piles of stuff? It's so strange how directly digging for dollars in data parallels the great wealth-producing models of the Industrial Revolution. The pile of stuff is the Internet. It takes lots of prospecting to find the right stuff. Mighty web crawling machines tirelessly collect stuff, bringing it into their huge maws, then depositing load after load into rack after rack of distributed file system machines. Then armies of still other machines take this stuff and strip out the valuable raw materials, which in the Information Age, are endless bytes of raw data. Link clicks, likes, page views, content, headlines, searches, inbound links, outbound links, search clicks, hashtags, friends, purchases: anything and everything you do on the Internet is a valu
6 0.17108238 621 high scalability-2009-06-06-Graph server
7 0.15916947 805 high scalability-2010-04-06-Strategy: Make it Really Fast vs Do the Work Up Front
8 0.15439363 1406 high scalability-2013-02-14-When all the Program's a Graph - Prismatic's Plumbing Library
9 0.14791004 819 high scalability-2010-04-30-Hot Scalability Links for April 30, 2010
10 0.13714923 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
11 0.13678445 797 high scalability-2010-03-19-Hot Scalability Links for March 19, 2010
12 0.13243799 806 high scalability-2010-04-08-Hot Scalability Links for April 8, 2010
13 0.11965884 1064 high scalability-2011-06-20-35+ Use Cases for Choosing Your Next NoSQL Database
14 0.11671118 1530 high scalability-2013-10-11-Stuff The Internet Says On Scalability For October 11th, 2013
15 0.113905 766 high scalability-2010-01-26-Product: HyperGraphDB - A Graph Database
16 0.11364112 601 high scalability-2009-05-17-Product: Hadoop
17 0.11353302 842 high scalability-2010-06-16-Hot Scalability Links for June 16, 2010
18 0.11340734 811 high scalability-2010-04-16-Hot Scalability Links for April 16, 2010
19 0.11325686 722 high scalability-2009-10-15-Hot Scalability Links for Oct 15 2009
topicId topicWeight
[(0, 0.221), (1, 0.07), (2, 0.018), (3, 0.031), (4, 0.063), (5, 0.085), (6, -0.083), (7, -0.029), (8, 0.071), (9, 0.142), (10, 0.075), (11, -0.019), (12, 0.005), (13, -0.096), (14, -0.047), (15, -0.032), (16, 0.024), (17, 0.155), (18, 0.016), (19, 0.107), (20, -0.081), (21, 0.048), (22, 0.012), (23, -0.07), (24, -0.022), (25, 0.102), (26, 0.034), (27, 0.022), (28, 0.068), (29, 0.019), (30, -0.04), (31, -0.015), (32, 0.029), (33, -0.031), (34, -0.024), (35, 0.064), (36, -0.021), (37, -0.046), (38, -0.021), (39, -0.034), (40, -0.024), (41, -0.005), (42, 0.018), (43, -0.043), (44, 0.002), (45, -0.003), (46, -0.028), (47, 0.067), (48, -0.015), (49, -0.033)]
simIndex simValue blogId blogTitle
same-blog 1 0.96170884 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010
2 0.79945368 628 high scalability-2009-06-13-Neo4j - a Graph Database that Kicks Buttox
3 0.78754276 1406 high scalability-2013-02-14-When all the Program's a Graph - Prismatic's Plumbing Library
Introduction: At some point as a programmer you might have the insight/fear that all programming is just doing stuff to other stuff. Then you may observe after coding the same stuff over again that stuff in a program often takes the form of interacting patterns of flows. Then you may think hey, a program isn't only useful for coding data structures, but a program is a kind of data structure and that with a meta level jump you could program a program in terms of flows over data and flow over other flows. That's the kind of stuff Prismatic is making available in the Graph extension to their plumbing package (code examples), which is described in an excellent post: Graph: Abstractions for Structured Computation. You may remember Prismatic from a previous profile we did on HighScalability: Prismatic Architecture - Using Machine Learning On Social Networks To Figure Out What You Should Read On The Web. We learned how Prismatic, an interest driven content suggestion service, builds programs in
4 0.78043985 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management
5 0.77749717 1285 high scalability-2012-07-18-Disks Ain't Dead Yet: GraphChi - a disk-based large-scale graph computation
Introduction: GraphChi uses a Parallel Sliding Windows method which can: process a graph with mutable edge values efficiently from disk, with only a small number of non-sequential disk accesses, while supporting the asynchronous model of computation. The result is that graphs with billions of edges can be processed on just a single machine. It uses a vertex-centric computation model similar to Pregel, which supports iterative algorithms as opposed to the batch style of MapReduce. Streaming graph updates are supported. About GraphChi, Carlos Guestrin, codirector of Carnegie Mellon's Select Lab, says: A Mac Mini running GraphChi can analyze Twitter's social graph from 2010, which contains 40 million users and 1.2 billion connections, in 59 minutes. "The previous published result on this problem took 400 minutes using a cluster of about 1,000 computers." Related Articles: Aapo Kyrola Home Page; Your Laptop Can Now Analyze Big Data by John Pavlus; Example Applications Runn
6 0.77583003 1088 high scalability-2011-07-27-Making Hadoop 1000x Faster for Graph Problems
8 0.75788283 631 high scalability-2009-06-15-Large-scale Graph Computing at Google
9 0.75289577 766 high scalability-2010-01-26-Product: HyperGraphDB - A Graph Database
10 0.75274897 1136 high scalability-2011-11-03-Paper: G2 : A Graph Processing System for Diagnosing Distributed Systems
11 0.75203812 805 high scalability-2010-04-06-Strategy: Make it Really Fast vs Do the Work Up Front
12 0.73743039 842 high scalability-2010-06-16-Hot Scalability Links for June 16, 2010
13 0.71912616 621 high scalability-2009-06-06-Graph server
14 0.69904494 155 high scalability-2007-11-15-Video: Dryad: A general-purpose distributed execution platform
15 0.69101095 722 high scalability-2009-10-15-Hot Scalability Links for Oct 15 2009
16 0.68063051 973 high scalability-2011-01-14-Stuff The Internet Says On Scalability For January 14, 2011
17 0.67056513 860 high scalability-2010-07-17-Hot Scalability Links for July 17, 2010
18 0.65288627 1530 high scalability-2013-10-11-Stuff The Internet Says On Scalability For October 11th, 2013
19 0.64963394 797 high scalability-2010-03-19-Hot Scalability Links for March 19, 2010
20 0.64192408 1385 high scalability-2013-01-11-Stuff The Internet Says On Scalability For January 11, 2013
topicId topicWeight
[(1, 0.153), (2, 0.209), (10, 0.016), (56, 0.025), (61, 0.07), (79, 0.052), (94, 0.403)]
simIndex simValue blogId blogTitle
1 0.99369287 115 high scalability-2007-10-07-Using ThreadLocal to pass context information around in web applications
Introduction: Hi, In Java web servers, each HTTP request is handled by a thread from a thread pool, so a thread is assigned to the servlet handling the request. It is tempting (and very convenient) to keep context information in a ThreadLocal variable. I recently had a requirement where we needed to assign the logged-in user id and a timestamp to requests sent to web services. Because we already had the code in place, it was extremely difficult to change the method signatures to pass the user id everywhere. The solution I thought of is class ReferenceIdGenerator { public static void setReferenceId(String login) { threadLocal.set(login + System.currentTimeMillis()); } public static String getReferenceId() { return threadLocal.get(); } private static ThreadLocal<String> threadLocal = new ThreadLocal<>(); } class MyServlet { void service(.....) { HttpSession session = request.getSession(false); String userId = (String) session.getAttribute("userId"); ReferenceIdGenerator.setReferenceId(userId
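The excerpt above is cut off before it shows how the value is cleaned up, so the following is a minimal, self-contained sketch of the same ThreadLocal idea rather than the author's code. The names `RequestContext` and `RequestContextDemo` are hypothetical, and a single-thread executor stands in for the servlet container's thread pool. One caveat worth making explicit: because pooled threads are reused, a value that is set but never removed can be seen by a later, unrelated request.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical holder for a per-request reference id (all names are illustrative).
final class RequestContext {
    private static final ThreadLocal<String> REFERENCE_ID = new ThreadLocal<>();

    private RequestContext() {}

    // Store "login-timestamp" for the current thread only.
    static void set(String login) {
        REFERENCE_ID.set(login + "-" + System.currentTimeMillis());
    }

    static String get() {
        return REFERENCE_ID.get();
    }

    // Pooled threads are reused, so the value must be removed when the request ends.
    static void clear() {
        REFERENCE_ID.remove();
    }
}

final class RequestContextDemo {
    // Stands in for a servlet's service() method: set, use, and always clear.
    static String handle(String login) {
        RequestContext.set(login);
        try {
            // Code deeper in the call stack can read the id without extra parameters.
            return RequestContext.get();
        } finally {
            RequestContext.clear();
        }
    }

    public static void main(String[] args) throws Exception {
        // One worker thread, so both tasks run on the same (reused) thread.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Callable<String> request = () -> handle("alice");
            System.out.println(pool.submit(request).get());

            // After clear(), the reused thread sees no leftover value.
            Callable<String> leftover = RequestContext::get;
            System.out.println(pool.submit(leftover).get()); // null
        } finally {
            pool.shutdown();
        }
    }
}
```

Wrapping the work in `try`/`finally` is the design choice that matters here: it guarantees `remove()` runs even when the handler throws.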
2 0.98731399 605 high scalability-2009-05-22-Distributed content system with bandwidth balancing
Introduction: I am looking for a way to distribute files over servers in different physical locations. My main concern is that I have bandwidth limitations at each location, and wish to spread the bandwidth load evenly. At the moment I just have 1:1 copies of the files on all servers, and have the application pick a random server to serve the file as a temporary fix. It's a small video streaming service. I want to spoon-feed the stream to the client with a maximum bandwidth output, and support seek. At present I use PHP to limit the network stream, and read the file at a given offset sent as a GET parameter from the player for seek. It's pseudo-streaming, but it works. I have been looking at MogileFS, which would solve the storage part. With MogileFS I can make use of my current PHP solution as it supports lighttpd and Apache (with mod_rewrite or similar). However, I don't see how I can apply MogileFS to check for bandwidth % usage. Any recommendations for how I can solve this?
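One possible direction for the question above, offered as a hedged sketch and not as anything MogileFS, lighttpd, or Apache provides: replace the random pick with a pick of the server currently using the smallest fraction of its own bandwidth limit. Everything here is an assumption for illustration, including the class `BandwidthBalancer`, the per-server capacity figures, and the idea that some monitoring job reports current usage.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these names come from MogileFS, lighttpd, or Apache.
final class BandwidthBalancer {
    static final class Server {
        final String name;
        final double capacityMbps; // assumed to be known for each location
        double usedMbps;           // assumed to be reported by a monitoring job

        Server(String name, double capacityMbps, double usedMbps) {
            this.name = name;
            this.capacityMbps = capacityMbps;
            this.usedMbps = usedMbps;
        }

        // Fraction of this server's own limit that is currently in use.
        double utilization() {
            return usedMbps / capacityMbps;
        }
    }

    private BandwidthBalancer() {}

    // Pick the server with the lowest utilization; throws NoSuchElementException on an empty list.
    static Server leastUtilized(List<Server> servers) {
        return Collections.min(servers, Comparator.comparingDouble(Server::utilization));
    }

    public static void main(String[] args) {
        List<Server> servers = Arrays.asList(
                new Server("oslo", 100.0, 80.0),     // 80% used
                new Server("london", 1000.0, 300.0), // 30% used
                new Server("nyc", 200.0, 120.0));    // 60% used
        Server chosen = leastUtilized(servers);
        chosen.usedMbps += 2.0; // account for the new stream (2 Mbps is an assumed figure)
        System.out.println(chosen.name); // london
    }
}
```

Comparing fractions rather than absolute Mbps is what keeps a small location from being filled while a large one sits mostly idle.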
3 0.98582721 1601 high scalability-2014-02-25-Peter Norvig's 9 Master Steps to Improving a Program
Introduction: Inspired by an xkcd comic, Peter Norvig, Director of Research at Google and all around interesting and nice guy, has created an above par code kata involving a regex program that demonstrates the core inner loop of many successful systems profiled on HighScalability. The original code is at xkcd 1313: Regex Golf, which comes up with an algorithm to find a short regex that matches the winners and not the losers from two arbitrary lists. The Python code is readable, the process is TDDish, and the problem sounds simple but soon explodes into regex weirdness, as does most regex code. If you find regular expressions confusing you'll definitely benefit from Peter's deliberate strategy for finding a regex. The post demonstrating the iterated improvement of the program is at xkcd 1313: Regex Golf (Part 2: Infinite Problems). As with most first solutions it wasn't optimal. To improve the program Peter recommends the following steps: Profiling : Figure out wher
Introduction: The reference configurations described in this blueprint are starting points for building Sun Customer Ready HPC Clusters configured with Sun Fire X2100 M2 and X2200 M2 servers. The configurations define how Sun Systems Group products can be configured in a typical grid rack deployment. This document describes configurations in detail using Sun Fire X2100 M2 and X2200 M2 servers with a Gigabit Ethernet data fabric, as well as configurations using Sun Fire X2200 M2 servers with a high-speed InfiniBand fabric. These configurations focus on single rack solutions, with external connections through uplink ports of the switches. These reference configurations have been architected using Sun's expertise gained in actual, real-world installations. Within certain constraints, as described in the later sections, the system can be tailored to the customer needs. Certain system components described in this document are only available through Sun's factory integration. Although the information
5 0.96838605 1305 high scalability-2012-08-16-Paper: A Provably Correct Scalable Concurrent Skip List
Introduction: In MemSQL Architecture we learned one of the core strategies MemSQL uses to achieve their need for speed is lock-free skip lists. Skip lists are used to efficiently handle range queries. Making the skip-lists lock-free helps eliminate contention and make writes fast. If this all sounds a little pie-in-the-sky then here's a very good paper on the subject that might help make it clearer: A Provably Correct Scalable Concurrent Skip List . From the abstract: We propose a new concurrent skip list algorithm distinguished by a combination of simplicity and scalability. The algorithm employs optimistic synchronization, searching without acquiring locks, followed by short lock-based validation before adding or removing nodes. It also logically removes an item before physically unlinking it. Unlike some other concurrent skip list algorithms, this algorithm preserves the skiplist properties at all times, which facilitates reasoning about its correctness. Experimental evidence shows that
6 0.96293068 559 high scalability-2009-04-07-Six Lessons Learned Deploying a Large-scale Infrastructure in Amazon EC2
8 0.94650745 834 high scalability-2010-06-01-Web Speed Can Push You Off of Google Search Rankings! What Can You Do?
9 0.91784811 241 high scalability-2008-02-05-SLA monitoring
10 0.91569197 970 high scalability-2011-01-06-BankSimple Mini-Architecture - Using a Next Generation Toolchain
11 0.91456413 1025 high scalability-2011-04-16-The NewSQL Market Breakdown
12 0.91278899 1412 high scalability-2013-02-25-SongPop Scales to 1 Million Active Users on GAE, Showing PaaS is not Passé
same-blog 13 0.91023886 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010
14 0.90835893 1084 high scalability-2011-07-22-Stuff The Internet Says On Scalability For July 22, 2011
15 0.89662826 78 high scalability-2007-09-01-2 tier switch selection for colocation
16 0.87680876 1223 high scalability-2012-04-06-Stuff The Internet Says On Scalability For April 6, 2012
17 0.86932957 1174 high scalability-2012-01-13-Stuff The Internet Says On Scalability For January 13, 2012
18 0.86587036 266 high scalability-2008-03-04-Manage Downtime Risk by Connecting Multiple Data Centers into a Secure Virtual LAN
19 0.8624683 1023 high scalability-2011-04-14-Strategy: Cache Application Start State to Reduce Spin-up Times
20 0.86211789 863 high scalability-2010-07-22-How can we spark the movement of research out of the Ivory Tower and into production?