high_scalability high_scalability-2008 high_scalability-2008-223 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Update: Google added videos on Cluster Computing and MapReduce. There are five lectures: Introduction, MapReduce, Distributed File Systems, Clustering Algorithms, and Graph Algorithms. Advanced website design depends on deep distributed system design knowledge. Where do you get this knowledge? Try Google. They have a whole Code for Educators program with tutorials and lectures on AJAX programming, distributed systems, and web security. Looks pretty nice.
sentIndex sentText sentNum sentScore
1 Update: Google added videos on Cluster Computing and MapReduce. [sent-1, score-0.302]
2 There are five lectures: Introduction, MapReduce, Distributed File Systems, Clustering Algorithms, and Graph Algorithms. [sent-2, score-0.177]
3 Advanced website design depends on deep distributed system design knowledge. [sent-3, score-0.924]
4 They have a whole Code for Educators program with tutorials and lectures on AJAX programming, distributed systems, and web security. [sent-6, score-1.114]
wordName wordTfidf (topN-words)
[('lectures', 0.668), ('tutorials', 0.222), ('ajax', 0.213), ('clustering', 0.194), ('depends', 0.185), ('introduction', 0.182), ('videos', 0.179), ('distributed', 0.177), ('five', 0.177), ('design', 0.154), ('knowledge', 0.145), ('mapreduce', 0.143), ('graph', 0.136), ('algorithms', 0.132), ('looks', 0.125), ('nice', 0.123), ('added', 0.123), ('pretty', 0.122), ('deep', 0.115), ('systems', 0.11), ('try', 0.103), ('update', 0.103), ('programming', 0.102), ('website', 0.1), ('file', 0.095), ('computing', 0.091), ('google', 0.082), ('web', 0.047), ('get', 0.046), ('system', 0.039)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 223 high scalability-2008-01-25-Google: Introduction to Distributed System Design
2 0.36486992 948 high scalability-2010-11-24-Great Introductory Video on Scalability from Harvard Computer Science
Introduction: Professor David Malan gives a very good lecture on scalability for dynamic websites. It's not highly technical, it's an extension course, but it's a great introduction to a wide variety of topics. I really like his teaching style. He continually asks questions, prompts for input, and gives accessible explanations. Some of the topics covered: vertical scaling; horizontal scaling; PHP acceleration; load balancing: DNS, L7, sticky sessions, load balancers; caching; MySQL: replication, load balancing, partitioning, high availability. Watch it on Academic Earth. This is one lecture in a series of 13 lectures on building dynamic websites. Students learn how to: build dynamic websites with Ajax and with Linux, Apache, MySQL, and PHP (LAMP); set up domain names with DNS; structure pages with XHTML and CSS; program in JavaScript and PHP; configure Apache and MySQL; design and query databases with SQL; use Ajax with both XML and JSON;
3 0.35927197 362 high scalability-2008-08-11-Distributed Computing & Google Infrastructure
Introduction: A couple of videos about distributed computing with direct reference to Google infrastructure. You will get acquainted with: --MapReduce, the software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on commodity hardware --GFS and the way it stores its data in 64 MB chunks --Bigtable, which is the simple implementation of a non-relational database at Google. Cluster Computing and MapReduce Lectures 1-5.
4 0.13449576 590 high scalability-2009-05-06-Art of Distributed
Introduction: Art of Distributed Part 1: Rethinking distributed computing models. I'm getting a lot of questions lately about distributed computing, especially distributed computing models and MapReduce, such as: What is MapReduce? Can MapReduce fit all situations? How does it compare with other technologies such as Grid Computing? And what is the best solution for our situation? So I decided to write an article about distributed computing in two parts. The first part covers the distributed computing models and the differences between them. In the second part I will discuss reliability and distributed storage systems. Download the article in PDF format. Download the article in MS Word format. I look forward to your comments and questions, and I will answer them in part two.
5 0.10941868 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management
Introduction: Relational databases, document databases, and distributed hash tables get most of the hype these days, but there's another option: graph databases. Back to the future, it seems. Here's a really interesting paper by Marko A. Rodriguez introducing the graph model and its extension to representing the world wide web of data. Modern open source and commercial graph databases can store on the order of 1 billion relationships, with some databases reaching the 10 billion mark. These developments are making the graph database practical for applications that require large-scale knowledge structures. Moreover, with the Web of Data standards set forth by the Linked Data community, it is possible to interlink graph databases across the web into a giant global knowledge structure. This talk will discuss graph databases, their underlying data model, their querying mechanisms, and the benefits of the graph data structure for modeling and analysis.
6 0.10645368 628 high scalability-2009-06-13-Neo4j - a Graph Database that Kicks Buttox
8 0.10281667 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010
9 0.094786867 309 high scalability-2008-04-23-Behind The Scenes of Google Scalability
10 0.086333647 124 high scalability-2007-10-16-How Scalable are Single Page Ajax Apps?
11 0.084576093 1088 high scalability-2011-07-27-Making Hadoop 1000x Faster for Graph Problems
12 0.084395714 1406 high scalability-2013-02-14-When all the Program's a Graph - Prismatic's Plumbing Library
13 0.082167745 601 high scalability-2009-05-17-Product: Hadoop
14 0.082088135 483 high scalability-2009-01-04-Paper: MapReduce: Simplified Data Processing on Large Clusters
15 0.081638396 1375 high scalability-2012-12-21-Stuff The Internet Says On Scalability For December 21, 2012
16 0.079028815 1067 high scalability-2011-06-24-Stuff The Internet Says On Scalability For June 24, 2011
17 0.076562822 1194 high scalability-2012-02-16-A Super Short on the Youporn Stack - 300K QPS and 100 Million Page Views Per Day
18 0.075253539 621 high scalability-2009-06-06-Graph server
19 0.072086647 1049 high scalability-2011-05-31-Awesome List of Advanced Distributed Systems Papers
20 0.071298532 734 high scalability-2009-10-30-Hot Scalabilty Links for October 30 2009
topicId topicWeight
[(0, 0.1), (1, 0.029), (2, 0.011), (3, 0.067), (4, 0.004), (5, 0.054), (6, -0.039), (7, 0.002), (8, 0.019), (9, 0.147), (10, 0.014), (11, -0.062), (12, -0.013), (13, -0.082), (14, 0.035), (15, -0.064), (16, -0.037), (17, 0.045), (18, 0.102), (19, 0.026), (20, -0.018), (21, 0.012), (22, -0.086), (23, -0.04), (24, -0.043), (25, 0.031), (26, 0.06), (27, 0.09), (28, -0.059), (29, 0.017), (30, -0.013), (31, -0.022), (32, 0.021), (33, 0.001), (34, -0.045), (35, -0.05), (36, 0.042), (37, -0.038), (38, 0.023), (39, 0.024), (40, -0.044), (41, -0.038), (42, 0.039), (43, 0.042), (44, -0.022), (45, -0.091), (46, -0.023), (47, 0.052), (48, 0.011), (49, -0.041)]
simIndex simValue blogId blogTitle
same-blog 1 0.96398515 223 high scalability-2008-01-25-Google: Introduction to Distributed System Design
2 0.71801573 362 high scalability-2008-08-11-Distributed Computing & Google Infrastructure
Introduction: A couple of videos about distributed computing with direct reference to Google infrastructure. You will get acquainted with: --MapReduce, the software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on commodity hardware --GFS and the way it stores its data in 64 MB chunks --Bigtable, which is the simple implementation of a non-relational database at Google. Cluster Computing and MapReduce Lectures 1-5.
3 0.69945103 497 high scalability-2009-01-19-Papers: Readings in Distributed Systems
Introduction: Marton Trencseni has collected a wonderful list of different papers on distributed systems. He's organized them into the following sections: The Google Papers, Distributed Filesystems, Non-relational Distributed Databases, The Lamport Papers, and Implementation Issues. Many old favorites on the list and some that are likely new to you. My new favorite is "Frangipani: A Scalable Distributed File System." How can you not love "Frangipani" as a word?
4 0.6623134 1136 high scalability-2011-11-03-Paper: G2 : A Graph Processing System for Diagnosing Distributed Systems
Introduction: One of the problems in building distributed systems is figuring out what the heck is going on. Usually endless streams of log files are consulted like ancients using entrails to divine the will of the Gods. To rise above these ancient practices we must move up another level of abstraction, and that's the approach described in a Microsoft research paper: G2: A Graph Processing System for Diagnosing Distributed Systems, which uses execution graphs that model runtime events and their correlations in distributed systems. The problem with these schemes is viewing applications, written by programmers in low level code, as execution graphs. But we're heading in this direction in any case. To program a warehouse or an internet sized computer we'll have to write at higher levels of abstraction so code can be executed transparently at runtime on these giant distributed computers. There are many advantages to this approach; fault diagnosis and performance monitoring are just two of the wins
5 0.65952623 309 high scalability-2008-04-23-Behind The Scenes of Google Scalability
Introduction: The recent Data-Intensive Computing Symposium brought together experts in system design, programming, parallel algorithms, data management, scientific applications, and information-based applications to better understand existing capabilities in the development and application of large-scale computing systems, and to explore future opportunities. Google Fellow Jeff Dean had a very interesting presentation on Handling Large Datasets at Google: Current Systems and Future Directions. He discussed: • Hardware infrastructure • Distributed systems infrastructure: – Scheduling system – GFS – BigTable – MapReduce • Challenges and Future Directions: – Infrastructure that spans all datacenters – More automation. It is really like a "How does Google work" presentation in ~60 slides. Check out the slides and the video!
6 0.64406908 734 high scalability-2009-10-30-Hot Scalabilty Links for October 30 2009
7 0.63804263 631 high scalability-2009-06-15-Large-scale Graph Computing at Google
8 0.62834275 590 high scalability-2009-05-06-Art of Distributed
9 0.61379927 850 high scalability-2010-06-30-Paper: GraphLab: A New Framework For Parallel Machine Learning
10 0.59068847 1512 high scalability-2013-09-05-Paper: MillWheel: Fault-Tolerant Stream Processing at Internet Scale
11 0.56039947 592 high scalability-2009-05-06-DyradLINQ
12 0.55559677 483 high scalability-2009-01-04-Paper: MapReduce: Simplified Data Processing on Large Clusters
13 0.54616296 1049 high scalability-2011-05-31-Awesome List of Advanced Distributed Systems Papers
14 0.54350239 973 high scalability-2011-01-14-Stuff The Internet Says On Scalability For January 14, 2011
16 0.53293419 871 high scalability-2010-08-04-Dremel: Interactive Analysis of Web-Scale Datasets - Data as a Programming Paradigm
17 0.52495426 801 high scalability-2010-03-30-Running Large Graph Algorithms - Evaluation of Current State-of-the-Art and Lessons Learned
18 0.5244174 1406 high scalability-2013-02-14-When all the Program's a Graph - Prismatic's Plumbing Library
19 0.52203965 842 high scalability-2010-06-16-Hot Scalability Links for June 16, 2010
20 0.52197015 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010
topicId topicWeight
[(2, 0.818)]
simIndex simValue blogId blogTitle
1 1.0 56 high scalability-2007-08-03-Running Hadoop MapReduce on Amazon EC2 and Amazon S3
Introduction: Excellent article on using Hadoop in Amazon's services environment to solve real problems for very little money. It's excellent because it shows how the stack works together and it actually seems like something a real human could do.
2 1.0 565 high scalability-2009-04-13-Benchmark for keeping data in browser in AJAX projects
Introduction: Hi, We are using AJAX and see a lot of opportunity to keep session state on client browser with javascript objects. Is there any benchmark about how much data you can generally keep in javascript objects in browser? Thanks, Unmesh
3 0.99950212 436 high scalability-2008-11-02-Strategy: How to Manage Sessions Using Memcached
Introduction: Dormando shows an enlightened middle way for storing sessions in cache and the database. Sessions are a perfect cache candidate because they are transient and smallish, and since they are usually accessed on every page view, removing all that load from the database is a good thing. But as Dormando points out, session caches have problems. If you remove expiration times from the cache and you run out of memory, then no more logins. If a cache server fails or needs to be upgraded, then you just logged out a bunch of potentially angry users. The middle ground Dormando proposes is using both the cache and the database: Reads: read from the cache first, then the database. Typical cache logic. Writes: write to memcached every time, write to the database every N seconds (assuming the data has changed). There's a small chance of data loss, but you've still greatly reduced the database load while providing reliability. Nice solution.
same-blog 4 0.99886572 223 high scalability-2008-01-25-Google: Introduction to Distributed System Design
5 0.99551612 836 high scalability-2010-06-04-Strategy: Cache Larger Chunks - Cache Hit Rate is a Bad Indicator
Introduction: Isn't the secret to fast, scalable websites to cache everything? Caching, if not the secret sauce of many a website, is at least a popular condiment. But not so fast, says Peter Zaitsev in Beyond great cache hit ratio. The point Peter makes is that we read about websites like Amazon and Facebook that can literally make hundreds of calls to satisfy a user request. Even if you have an awesome cache hit ratio, pages can still be slow because making and processing all those requests takes time. The solution is to remove requests altogether. You do this by caching larger blocks so you have to make fewer requests. The post has a lot of good advice worth reading: 1) Make non-cacheable blocks as small as possible, 2) Maximize the number of uses of each cache item, 3) Control invalidation, 4) Multi-Get.
6 0.99529332 878 high scalability-2010-08-12-Strategy: Terminate SSL Connections in Hardware and Reduce Server Count by 40%
7 0.99433541 911 high scalability-2010-09-30-More Troubles with Caching
8 0.98756862 594 high scalability-2009-05-08-Eight Best Practices for Building Scalable Systems
9 0.98663294 455 high scalability-2008-12-01-MySQL Database Scale-out and Replication for High Growth Businesses
10 0.97804427 205 high scalability-2008-01-10-Letting Clients Know What's Changed: Push Me or Pull Me?
11 0.97735423 1155 high scalability-2011-12-12-Netflix: Developing, Deploying, and Supporting Software According to the Way of the Cloud
12 0.96354365 50 high scalability-2007-07-31-BerkeleyDB & other distributed high performance key-value databases
13 0.96271199 967 high scalability-2011-01-03-Stuff The Internet Says On Scalability For January 3, 2010
14 0.95821565 417 high scalability-2008-10-15-Outside.in Scales Up with Engine Yard and moving from PHP to Ruby on Rails
15 0.9578017 723 high scalability-2009-10-16-Paper: Scaling Online Social Networks without Pains
16 0.95563251 844 high scalability-2010-06-18-Paper: The Declarative Imperative: Experiences and Conjectures in Distributed Logic
17 0.94911659 1190 high scalability-2012-02-10-Stuff The Internet Says On Scalability For February 10, 2012
18 0.9485026 1199 high scalability-2012-02-27-Zen and the Art of Scaling - A Koan and Epigram Approach
19 0.94751209 1591 high scalability-2014-02-05-Little’s Law, Scalability and Fault Tolerance: The OS is your bottleneck. What you can do?
20 0.94230986 1006 high scalability-2011-03-17-Are long VM instance spin-up times in the cloud costing you money?
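The session strategy summarized in the memcached entry above (read from the cache first, then the database; write to the cache every time, write to the database at most every N seconds if the data changed) can be sketched roughly as follows. This is only a minimal illustration, not Dormando's code: the `SessionStore` class, its method names, and the `flush_interval` parameter are hypothetical, and plain dicts stand in for memcached and the database so the example runs on its own.

```python
import time


class SessionStore:
    """Hypothetical sketch: cached sessions with throttled database writes.

    `cache` and `db` are plain dicts standing in for memcached and a
    database table; they are placeholders, not real client APIs.
    """

    def __init__(self, flush_interval=60.0, clock=time.monotonic):
        self.cache = {}
        self.db = {}
        self.flush_interval = flush_interval  # the "N seconds" from the text
        self.clock = clock                    # injectable for testing
        self._last_flush = {}                 # session id -> last database write time

    def read(self, sid):
        # Typical cache logic: try the cache, fall back to the database.
        if sid in self.cache:
            return self.cache[sid]
        data = self.db.get(sid)
        if data is not None:
            # Repopulate the cache, e.g. after a cache server restart.
            self.cache[sid] = dict(data)
        return data

    def write(self, sid, data):
        # Every write goes to the cache.
        self.cache[sid] = dict(data)
        # The database is written at most once per interval, and only
        # when the stored copy differs from the new data.
        now = self.clock()
        last = self._last_flush.get(sid)
        due = last is None or now - last >= self.flush_interval
        if due and self.db.get(sid) != data:
            self.db[sid] = dict(data)
            self._last_flush[sid] = now
```

With this shape, losing the cache costs at most the changes made since the last database write, which is the small chance of data loss the summary mentions.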