high_scalability high_scalability-2008 high_scalability-2008-362 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: A couple of videos about distributed computing with direct reference to Google infrastructure. You will get acquainted with: --MapReduce, the software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on commodity hardware --GFS and the way it stores its data in 64 MB chunks --Bigtable, a simple implementation of a non-relational database at Google. Cluster Computing and MapReduce Lectures 1-5.
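To make the MapReduce and GFS descriptions above concrete, here is a minimal single-process sketch in Python. It is not Google's implementation: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`, `chunks_needed`) are hypothetical, the word-count job is just the customary illustration, and the 64 MB chunk size is taken from the introduction and interpreted in binary units as an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumption: "64 MB" is read as 64 * 2**20 bytes.
CHUNK_SIZE_BYTES = 64 * 1024 * 1024


def chunks_needed(file_size_bytes: int, chunk_size: int = CHUNK_SIZE_BYTES) -> int:
    """Number of fixed-size chunks needed to hold a file (ceiling division)."""
    return -(-file_size_bytes // chunk_size)


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    intermediate: List[Tuple[str, int]] = []
    for document in documents:  # in a real cluster each split goes to a worker
        intermediate.extend(map_phase(document))
    return reduce_phase(shuffle_phase(intermediate))


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
    print(chunks_needed(100 * 2**40))  # chunks for a 100 TiB data set
```

The point of the split into three functions is that the map and reduce steps only see independent pieces of data, which is what lets a framework spread them across many commodity machines; the sequential loop here stands in for that distribution.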
sentIndex sentText sentNum sentScore
1 A couple of videos about distributed computing with direct reference to Google infrastructure. [sent-1, score-1.089]
wordName wordTfidf (topN-words)
[('acquainted', 0.437), ('lectures', 0.377), ('terabyte', 0.294), ('computations', 0.249), ('reference', 0.214), ('computing', 0.206), ('direct', 0.202), ('videos', 0.202), ('couple', 0.198), ('greater', 0.188), ('google', 0.185), ('commodity', 0.184), ('implemented', 0.171), ('sets', 0.164), ('stores', 0.161), ('mapreduce', 0.161), ('framework', 0.142), ('implementation', 0.141), ('parallel', 0.123), ('support', 0.088), ('simple', 0.087), ('software', 0.071), ('large', 0.067), ('distributed', 0.067), ('data', 0.065), ('way', 0.065), ('database', 0.053), ('get', 0.052)]
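The listing does not say how the similarity values in the tables below were computed. One common way to compare posts represented as tf-idf word weights like those above is cosine similarity; the sketch below assumes that approach purely for illustration. The function name `cosine_similarity` and the weights in `other_post` are hypothetical; only the three weights in `this_post` come from the list above.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse word-weight vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Three weights copied from the topN-words list above.
this_post = {"acquainted": 0.437, "lectures": 0.377, "terabyte": 0.294}
# Hypothetical weights for some other post, invented for the example.
other_post = {"lectures": 0.5, "google": 0.3}

if __name__ == "__main__":
    print(cosine_similarity(this_post, this_post))
    print(cosine_similarity(this_post, other_post))
```

Under this measure a post compared with itself scores approximately 1, and posts that share no weighted words score 0.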
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 362 high scalability-2008-08-11-Distributed Computing & Google Infrastructure
Introduction: A couple of videos about distributed computing with direct reference to Google infrastructure. You will get acquainted with: --MapReduce, the software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on commodity hardware --GFS and the way it stores its data in 64 MB chunks --Bigtable, a simple implementation of a non-relational database at Google. Cluster Computing and MapReduce Lectures 1-5.
2 0.35927197 223 high scalability-2008-01-25-Google: Introduction to Distributed System Design
Introduction: Update: Google added videos on Cluster Computing and MapReduce. There are five lectures: Introduction, MapReduce, Distributed File Systems, Clustering Algorithms, and Graph Algorithms. Advanced website design depends on deep distributed system design knowledge. Where do you get this knowledge? Try Google. They have a whole Code for Educators program with tutorials and lectures on AJAX programming, distributed systems, and web security. Looks pretty nice.
3 0.158796 590 high scalability-2009-05-06-Art of Distributed
Introduction: Art of Distributed Part 1: Rethinking distributed computing models. I'm getting a lot of questions lately about distributed computing, especially the distributed computing model and MapReduce, such as: What is MapReduce? Can MapReduce fit all situations? How does it compare with other technologies such as Grid Computing? And what is the best solution for our situation? So I decided to write the distributed computing article in two parts. The first part covers the distributed computing models and the differences between them. In the second part I will discuss reliability and distributed storage systems. Download the article in PDF format. Download the article in MS Word format. I look forward to your comments and questions, and I will answer them in part two.
4 0.1540271 948 high scalability-2010-11-24-Great Introductory Video on Scalability from Harvard Computer Science
Introduction: Professor David Malan gives a very good lecture on scalability for dynamic websites. It's not highly technical, it's an extension course, but it's a great introduction to a wide variety of topics. I really like his teaching style. He continually asks questions, prompts for input, and gives accessible explanations. Some of the topics covered: vertical scaling; horizontal scaling; PHP acceleration; load balancing: DNS, L7, sticky sessions, load balancers; caching; MySQL: replication, load balancing, partitioning, high availability. Watch it on Academic Earth. This is one lecture in a series of 13 lectures on building dynamic websites. Students learn how to: build dynamic websites with Ajax and with Linux, Apache, MySQL, and PHP (LAMP); set up domain names with DNS; structure pages with XHTML and CSS; program in JavaScript and PHP; configure Apache and MySQL; design and query databases with SQL; use Ajax with both XML and JSON;
5 0.12445892 483 high scalability-2009-01-04-Paper: MapReduce: Simplified Data Processing on Large Clusters
Introduction: Update: MapReduce and PageRank Notes from Remzi Arpaci-Dusseau's Fall 2008 class. Collects interesting facts about MapReduce and PageRank. For example, the history of the solution to searching for the term "flu" is traced through multiple generations of technology. With Google entering the cloud space with Google AppEngine and a maturing Hadoop product, the MapReduce scaling approach might finally become a standard programmer practice. This is the best paper on the subject and is an excellent primer on a content-addressable memory future. Some interesting stats from the paper: Google executes 100k MapReduce jobs each day; more than 20 petabytes of data are processed per day; more than 10k MapReduce programs have been implemented; machines are dual processor with gigabit ethernet and 4-8 GB of memory. One common criticism ex-Googlers have is that it takes months to get up and be productive in the Google environment. Hopefully a way will be found to lower the learning curve a
6 0.11352886 445 high scalability-2008-11-14-Useful Cloud Computing Blogs
7 0.11203898 448 high scalability-2008-11-22-Google Architecture
8 0.10713919 601 high scalability-2009-05-17-Product: Hadoop
9 0.10405335 1194 high scalability-2012-02-16-A Super Short on the Youporn Stack - 300K QPS and 100 Million Page Views Per Day
10 0.10164188 414 high scalability-2008-10-15-Hadoop - A Primer
11 0.09591525 325 high scalability-2008-05-25-How do you explain cloud computing to your grandma?
12 0.092253238 15 high scalability-2007-07-16-Blog: MySQL Performance Blog - Everything about MySQL Performance.
13 0.09179607 1067 high scalability-2011-06-24-Stuff The Internet Says On Scalability For June 24, 2011
14 0.091257349 211 high scalability-2008-01-13-Google Reveals New MapReduce Stats
15 0.0894721 612 high scalability-2009-05-31-Parallel Programming for real-world
16 0.088615529 309 high scalability-2008-04-23-Behind The Scenes of Google Scalability
17 0.088462874 882 high scalability-2010-08-18-Misco: A MapReduce Framework for Mobile Systems - Start of the Ambient Cloud?
18 0.085374303 376 high scalability-2008-09-03-MapReduce framework Disco
19 0.085067056 666 high scalability-2009-07-30-Learn How to Think at Scale
20 0.083006606 871 high scalability-2010-08-04-Dremel: Interactive Analysis of Web-Scale Datasets - Data as a Programming Paradigm
topicId topicWeight
[(0, 0.092), (1, 0.055), (2, 0.045), (3, 0.091), (4, -0.029), (5, 0.057), (6, 0.016), (7, -0.02), (8, 0.035), (9, 0.126), (10, 0.009), (11, -0.068), (12, 0.02), (13, -0.014), (14, 0.041), (15, -0.042), (16, -0.1), (17, -0.053), (18, 0.091), (19, -0.009), (20, 0.022), (21, 0.016), (22, -0.042), (23, -0.02), (24, 0.039), (25, 0.028), (26, 0.052), (27, 0.091), (28, -0.055), (29, 0.068), (30, 0.011), (31, -0.008), (32, -0.007), (33, 0.004), (34, -0.036), (35, -0.046), (36, 0.081), (37, -0.019), (38, 0.016), (39, 0.082), (40, -0.034), (41, -0.014), (42, -0.006), (43, 0.039), (44, 0.005), (45, -0.086), (46, 0.022), (47, 0.004), (48, 0.0), (49, -0.036)]
simIndex simValue blogId blogTitle
same-blog 1 0.96932232 362 high scalability-2008-08-11-Distributed Computing & Google Infrastructure
Introduction: A couple of videos about distributed computing with direct reference to Google infrastructure. You will get acquainted with: --MapReduce, the software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on commodity hardware --GFS and the way it stores its data in 64 MB chunks --Bigtable, a simple implementation of a non-relational database at Google. Cluster Computing and MapReduce Lectures 1-5.
2 0.7948218 483 high scalability-2009-01-04-Paper: MapReduce: Simplified Data Processing on Large Clusters
Introduction: Update: MapReduce and PageRank Notes from Remzi Arpaci-Dusseau's Fall 2008 class. Collects interesting facts about MapReduce and PageRank. For example, the history of the solution to searching for the term "flu" is traced through multiple generations of technology. With Google entering the cloud space with Google AppEngine and a maturing Hadoop product, the MapReduce scaling approach might finally become a standard programmer practice. This is the best paper on the subject and is an excellent primer on a content-addressable memory future. Some interesting stats from the paper: Google executes 100k MapReduce jobs each day; more than 20 petabytes of data are processed per day; more than 10k MapReduce programs have been implemented; machines are dual processor with gigabit ethernet and 4-8 GB of memory. One common criticism ex-Googlers have is that it takes months to get up and be productive in the Google environment. Hopefully a way will be found to lower the learning curve a
3 0.78716993 309 high scalability-2008-04-23-Behind The Scenes of Google Scalability
Introduction: The recent Data-Intensive Computing Symposium brought together experts in system design, programming, parallel algorithms, data management, scientific applications, and information-based applications to better understand existing capabilities in the development and application of large-scale computing systems, and to explore future opportunities. Google Fellow Jeff Dean had a very interesting presentation on Handling Large Datasets at Google: Current Systems and Future Directions. He discussed: • Hardware infrastructure • Distributed systems infrastructure: –Scheduling system –GFS –BigTable –MapReduce • Challenges and Future Directions –Infrastructure that spans all datacenters –More automation. It is really like a "How does Google work" presentation in ~60 slides. Check out the slides and the video!
4 0.77541965 401 high scalability-2008-10-04-Is MapReduce going mainstream?
Introduction: Compares MapReduce to other parallel processing approaches and suggests a new paradigm for clouds and grids.
5 0.76153582 223 high scalability-2008-01-25-Google: Introduction to Distributed System Design
Introduction: Update: Google added videos on Cluster Computing and MapReduce. There are five lectures: Introduction, MapReduce, Distributed File Systems, Clustering Algorithms, and Graph Algorithms. Advanced website design depends on deep distributed system design knowledge. Where do you get this knowledge? Try Google. They have a whole Code for Educators program with tutorials and lectures on AJAX programming, distributed systems, and web security. Looks pretty nice.
6 0.75233221 590 high scalability-2009-05-06-Art of Distributed
7 0.71607608 871 high scalability-2010-08-04-Dremel: Interactive Analysis of Web-Scale Datasets - Data as a Programming Paradigm
8 0.68320179 409 high scalability-2008-10-13-Challenges from large scale computing at Google
9 0.68053609 376 high scalability-2008-09-03-MapReduce framework Disco
10 0.67014837 592 high scalability-2009-05-06-DyradLINQ
11 0.66421115 734 high scalability-2009-10-30-Hot Scalabilty Links for October 30 2009
12 0.64379549 211 high scalability-2008-01-13-Google Reveals New MapReduce Stats
13 0.6360777 1535 high scalability-2013-10-21-Google's Sanjay Ghemawat on What Made Google Google and Great Big Data Career Advice
14 0.62735301 650 high scalability-2009-07-02-Product: Hbase
15 0.61956638 850 high scalability-2010-06-30-Paper: GraphLab: A New Framework For Parallel Machine Learning
16 0.61531216 497 high scalability-2009-01-19-Papers: Readings in Distributed Systems
18 0.60051984 1328 high scalability-2012-09-24-Google Spanner's Most Surprising Revelation: NoSQL is Out and NewSQL is In
20 0.59168226 1540 high scalability-2013-10-30-Strategy: Use Your Quantum Computer Lab to Tell Intentional Blinks from Involuntary Blinks
topicId topicWeight
[(1, 0.051), (2, 0.198), (10, 0.116), (15, 0.335), (79, 0.13)]
simIndex simValue blogId blogTitle
same-blog 1 0.82810402 362 high scalability-2008-08-11-Distributed Computing & Google Infrastructure
Introduction: A couple of videos about distributed computing with direct reference to Google infrastructure. You will get acquainted with: --MapReduce, the software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on commodity hardware --GFS and the way it stores its data in 64 MB chunks --Bigtable, a simple implementation of a non-relational database at Google. Cluster Computing and MapReduce Lectures 1-5.
2 0.74432737 812 high scalability-2010-04-19-Strategy: Order Two Mediums Instead of Two Smalls and the EC2 Buffet
Introduction: Vaibhav Puranik in Web serving in the cloud – our experiences with nginx and instance sizes describes their experience trying to maximize traffic and minimize their web serving costs on EC2. Initially they tested with two m1.small instance types and then they switched to two c1.medium instance types. The m1s are the standard instance types and the c1s are the high CPU instance types. Obviously the mediums have greater capability, but the cost difference was interesting: In the long term they will save money using the larger instances and not autoscaling. With the small instances, traffic bursts caused autoscaling to kick in. New instances were started in response to load. The instances would be up for a short period of time and then spin down again. This constant churn costs a lot of money. Selecting the larger instance sizes, which are capable of handling the load without autoscaling, turns out to save money even though they are more expensive. Starting new instances also tak
3 0.73212999 1512 high scalability-2013-09-05-Paper: MillWheel: Fault-Tolerant Stream Processing at Internet Scale
Introduction: Ever wonder what powers Google's world spirit sensing Zeitgeist service? No, it's not a homunculus of Georg Wilhelm Friedrich Hegel sitting in each browser. It's actually a stream processing (think streaming MapReduce on steroids) system called MillWheel, described in this very well written paper: MillWheel: Fault-Tolerant Stream Processing at Internet Scale. MillWheel isn't just used for Zeitgeist at Google, it's also used for streaming joins for a variety of Ads customers, generalized anomaly-detection service, and network switch and cluster health monitoring. Abstract: MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework’s fault-tolerance guarantees. This paper describes MillWheel’s programming model as well as it
4 0.66635823 1455 high scalability-2013-05-10-Stuff The Internet Says On Scalability For May 10, 2013
Introduction: Hey, it's HighScalability time: ( In Thailand, they figured out how to solve the age-old queuing problem! ) Nanoscale : Plants IM Using Nanoscale Sound Waves; 100 petabytes : CERN data storage Quotable Quotes: Geoff Arnold : Arguably all interesting advances in computer science and software engineering occur when a resource that was previously scarce or expensive becomes cheap and plentiful. @jamesurquhart : "Complexity is a characteristic of the system, not of the parts in it." -Dekker @louisnorthmore : Scaling down - now that's scalability! @peakscale : Where distributed systems people retire to forget the madness: http://en.wikipedia.org/wiki/Antipaxos @dozba : "The Linux Game Database" ... Well, at least they will never have scaling problems. Michael Widenius : There is no reason at all to use MySQL @steveloughran : Whenever someone says "unlimited scalability", ask if that exceeds the ber
5 0.65739226 85 high scalability-2007-09-08-Making the case for PHP at Yahoo! (Oct 2002)
Introduction: This presentation by Michael Radwin describes why Yahoo! standardized on PHP going forward. It describes how, after reviewing all the web technologies including their own internal ones, PHP was chosen. It shows that not only technical reasons, but also business and development processes were taken into account.
6 0.64858222 414 high scalability-2008-10-15-Hadoop - A Primer
7 0.6335026 923 high scalability-2010-10-21-Machine VM + Cloud API - Rewriting the Cloud from Scratch
8 0.62129474 1297 high scalability-2012-08-03-Stuff The Internet Says On Scalability For August 3, 2012
9 0.62051672 1421 high scalability-2013-03-11-Low Level Scalability Solutions - The Conditioning Collection
10 0.61409402 20 high scalability-2007-07-16-Paper: The Clustered Storage Revolution
11 0.60839486 88 high scalability-2007-09-10-Blog: Scalable Web Architectures by Royans Tharakan
12 0.60746354 948 high scalability-2010-11-24-Great Introductory Video on Scalability from Harvard Computer Science
13 0.60405719 1353 high scalability-2012-11-01-Cost Analysis: TripAdvisor and Pinterest costs on the AWS cloud
14 0.59494704 1237 high scalability-2012-05-02-12 Ways to Increase Throughput by 32X and Reduce Latency by 20X
15 0.59366834 1425 high scalability-2013-03-18-Beyond Threads and Callbacks - Application Architecture Pros and Cons
16 0.59365261 1371 high scalability-2012-12-12-Pinterest Cut Costs from $54 to $20 Per Hour by Automatically Shutting Down Systems
17 0.59321821 353 high scalability-2008-07-20-Strategy: Front S3 with a Caching Proxy
18 0.59115893 157 high scalability-2007-11-16-Product: lbpool - Load Balancing JDBC Pool
19 0.5898819 257 high scalability-2008-02-22-Kevin's Great Adventures in SSDland
20 0.58803314 1491 high scalability-2013-07-15-Ask HS: What's Wrong with Twitter, Why Isn't One Machine Enough?