high_scalability high_scalability-2008 high_scalability-2008-401 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Compares MapReduce to other parallel processing approaches and suggests new paradigm for clouds and grids
sentIndex sentText sentNum sentScore
wordName wordTfidf (topN-words)
[('compares', 0.484), ('suggests', 0.421), ('paradigm', 0.381), ('clouds', 0.326), ('grids', 0.31), ('approaches', 0.306), ('mapreduce', 0.263), ('parallel', 0.2), ('processing', 0.182), ('new', 0.07)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 401 high scalability-2008-10-04-Is MapReduce going mainstream?
Introduction: Compares MapReduce to other parallel processing approaches and suggests new paradigm for clouds and grids
2 0.29046699 743 high scalability-2009-11-23-Big Data on Grids or on Clouds?
Introduction: Contributed by Wolfgang Gentzsch: Now that we have a new computing paradigm, Cloud Computing, how can Clouds help our data? Replace our internal data vaults as we hoped Grids would? Are Grids dead now that we have Clouds? Despite all the promising developments in the Grid and Cloud computing space, and the avalanche of publications and talks on this subject, many people still seem to be confused about internal data and compute resources, versus Grids versus Clouds, and they are hesitant to take the next step. I think there are a number of issues driving this uncertainty. read more at: BigDataMatters.com
3 0.18602523 590 high scalability-2009-05-06-Art of Distributed
Introduction: Art of Distributed Part 1: Rethinking distributed computing models. I'm getting a lot of questions lately about distributed computing, especially distributed computing models and MapReduce, such as: What is MapReduce? Can MapReduce fit all situations? How does it compare with other technologies such as Grid Computing? And what is the best solution for our situation? So I decided to write an article about distributed computing in two parts. The first covers the distributed computing models and the differences between them. In the second part I will discuss reliability and distributed storage systems. Download the article in PDF format. Download the article in MS Word format. I await your comments and questions, and I will answer them in part two.
4 0.1207296 483 high scalability-2009-01-04-Paper: MapReduce: Simplified Data Processing on Large Clusters
Introduction: Update: MapReduce and PageRank Notes from Remzi Arpaci-Dusseau's Fall 2008 class. Collects interesting facts about MapReduce and PageRank. For example, the history of the solution to searching for the term "flu" is traced through multiple generations of technology. With Google entering the cloud space with Google AppEngine and a maturing Hadoop product, the MapReduce scaling approach might finally become a standard programmer practice. This is the best paper on the subject and is an excellent primer on a content-addressable memory future. Some interesting stats from the paper: Google executes 100k MapReduce jobs each day; more than 20 petabytes of data are processed per day; more than 10k MapReduce programs have been implemented; machines are dual processor with gigabit Ethernet and 4-8 GB of memory. One common criticism ex-Googlers have is that it takes months to get up and be productive in the Google environment. Hopefully a way will be found to lower the learning curve a
5 0.10250118 871 high scalability-2010-08-04-Dremel: Interactive Analysis of Web-Scale Datasets - Data as a Programming Paradigm
Introduction: If Google was a boxer then MapReduce would be a probing right hand that sets up the massive left hook that is Dremel, Google's scalable (thousands of CPUs, petabytes of data, trillions of rows), SQL-based, columnar, interactive (results returned in seconds), ad hoc analytics system. If Google was a magician then MapReduce would be the shiny thing that distracts the mind while the trick goes unnoticed. I say that because even though Dremel has been around internally at Google since 2006, we have not heard a whisper about it. All we've heard about is MapReduce, clones of which have inspired entire new industries. Tricky. Dremel, according to Brian Bershad, Director of Engineering at Google, is targeted at solving BigData class problems: While we all know that systems are huge and will get even huger, the implications of this size on programmability, manageability, power, etc. are hard to comprehend. Alfred noted that the Internet is predicted to be carrying a zetta-byte (10^21
6 0.10157449 850 high scalability-2010-06-30-Paper: GraphLab: A New Framework For Parallel Machine Learning
7 0.09732195 718 high scalability-2009-10-08-Riak - web-shaped data storage system
8 0.094525076 1313 high scalability-2012-08-28-Making Hadoop Run Faster
9 0.08544752 376 high scalability-2008-09-03-MapReduce framework Disco
10 0.085402861 109 high scalability-2007-10-03-Save on a Load Balancer By Using Client Side Load Balancing
11 0.084016129 612 high scalability-2009-05-31-Parallel Programming for real-world
12 0.080078296 540 high scalability-2009-03-16-Cisco and Sun to Compete for Unified Computing?
13 0.079158716 1002 high scalability-2011-03-09-Productivity vs. Control tradeoffs in PaaS
14 0.076769158 485 high scalability-2009-01-05-Messaging is not just for investment banks
15 0.074259989 1509 high scalability-2013-08-30-Stuff The Internet Says On Scalability For August 30, 2013
16 0.072488353 601 high scalability-2009-05-17-Product: Hadoop
17 0.071392044 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
18 0.069194347 441 high scalability-2008-11-13-CloudCamp London 2: private clouds and standardisation
19 0.068007618 882 high scalability-2010-08-18-Misco: A MapReduce Framework for Mobile Systems - Start of the Ambient Cloud?
20 0.06707038 362 high scalability-2008-08-11-Distributed Computing & Google Infrastructure
topicId topicWeight
[(0, 0.045), (1, 0.011), (2, 0.024), (3, 0.077), (4, -0.043), (5, 0.041), (6, 0.01), (7, -0.0), (8, 0.01), (9, 0.076), (10, 0.031), (11, -0.015), (12, 0.018), (13, -0.015), (14, 0.031), (15, -0.026), (16, -0.032), (17, -0.041), (18, 0.022), (19, 0.016), (20, 0.013), (21, 0.014), (22, -0.032), (23, 0.023), (24, 0.067), (25, -0.019), (26, 0.022), (27, 0.086), (28, -0.018), (29, 0.058), (30, 0.038), (31, 0.05), (32, -0.048), (33, 0.027), (34, 0.028), (35, -0.049), (36, 0.099), (37, -0.021), (38, 0.013), (39, 0.054), (40, -0.028), (41, 0.044), (42, -0.041), (43, -0.019), (44, 0.018), (45, -0.061), (46, 0.004), (47, 0.008), (48, 0.012), (49, -0.044)]
simIndex simValue blogId blogTitle
same-blog 1 0.98583204 401 high scalability-2008-10-04-Is MapReduce going mainstream?
Introduction: Compares MapReduce to other parallel processing approaches and suggests new paradigm for clouds and grids
2 0.75272214 590 high scalability-2009-05-06-Art of Distributed
Introduction: Art of Distributed Part 1: Rethinking distributed computing models. I'm getting a lot of questions lately about distributed computing, especially distributed computing models and MapReduce, such as: What is MapReduce? Can MapReduce fit all situations? How does it compare with other technologies such as Grid Computing? And what is the best solution for our situation? So I decided to write an article about distributed computing in two parts. The first covers the distributed computing models and the differences between them. In the second part I will discuss reliability and distributed storage systems. Download the article in PDF format. Download the article in MS Word format. I await your comments and questions, and I will answer them in part two.
3 0.73746324 362 high scalability-2008-08-11-Distributed Computing & Google Infrastructure
Introduction: A couple of videos about distributed computing with direct reference to Google infrastructure. You will get acquainted with: MapReduce, the software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on commodity hardware; GFS and the way it stores its data in 64 MB chunks; and Bigtable, Google's simple implementation of a non-relational database. Cluster Computing and MapReduce Lectures 1-5.
4 0.70874256 612 high scalability-2009-05-31-Parallel Programming for real-world
Introduction: Multicore computers shift the burden of software performance from chip designers and architects to software developers. What is parallel computing? What is the difference between multithreading, concurrency, and parallelism? What are the differences between task and data parallelism? And how can we use them? A fundamental article on parallel programming...
5 0.68324399 376 high scalability-2008-09-03-MapReduce framework Disco
Introduction: Disco is an open-source implementation of the MapReduce framework for distributed computing. It was started at Nokia Research Center as a lightweight framework for rapid scripting of distributed data processing tasks. The Disco core is written in Erlang. The MapReduce jobs in Disco are natively described as Python programs, which makes it possible to express complex algorithmic and data processing tasks often only in tens of lines of code.
6 0.67203683 891 high scalability-2010-09-01-Scale-out vs Scale-up
7 0.66116041 483 high scalability-2009-01-04-Paper: MapReduce: Simplified Data Processing on Large Clusters
8 0.65959704 608 high scalability-2009-05-27-The Future of the Parallelism and its Challenges
9 0.62899381 850 high scalability-2010-06-30-Paper: GraphLab: A New Framework For Parallel Machine Learning
10 0.61633968 592 high scalability-2009-05-06-DyradLINQ
11 0.60744119 743 high scalability-2009-11-23-Big Data on Grids or on Clouds?
12 0.5906058 309 high scalability-2008-04-23-Behind The Scenes of Google Scalability
13 0.57865286 591 high scalability-2009-05-06-Dyrad
14 0.5563755 581 high scalability-2009-04-26-Map-Reduce for Machine Learning on Multicore
15 0.54002476 1049 high scalability-2011-05-31-Awesome List of Advanced Distributed Systems Papers
16 0.53593528 386 high scalability-2008-09-22-Cloud computing, grid computing, utility computing - list of top providers
17 0.52536988 652 high scalability-2009-07-08-Art of Parallelism presentation
18 0.52012312 470 high scalability-2008-12-18-Risk Analysis on the Cloud (Using Excel and GigaSpaces)
19 0.50961018 325 high scalability-2008-05-25-How do you explain cloud computing to your grandma?
20 0.5006752 575 high scalability-2009-04-21-Thread Pool Engine in MS CLR 4, and Work-Stealing scheduling algorithm
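Several entries in the list above describe MapReduce as a way to express parallel computations as small map and reduce functions. As a rough illustration only, the sketch below runs a word count through the three conceptual phases (map, group by key, reduce) in a single Python process. It does not use Hadoop, Disco, or Google's implementation; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over the documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["map reduce", "Map map"]))
```

In a real framework the map and reduce calls would be distributed across machines and the grouping step would move data over the network; here everything runs in memory so the data flow is easy to follow.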
topicId topicWeight
[(2, 0.053), (10, 0.109), (61, 0.08), (79, 0.514)]
simIndex simValue blogId blogTitle
same-blog 1 0.99679822 401 high scalability-2008-10-04-Is MapReduce going mainstream?
Introduction: Compares MapReduce to other parallel processing approaches and suggests new paradigm for clouds and grids
2 0.97097594 443 high scalability-2008-11-14-Paper: Pig Latin: A Not-So-Foreign Language for Data Processing
Introduction: Yahoo has developed a new language called Pig Latin that fits in a sweet spot between high-level declarative querying in the spirit of SQL and low-level, procedural programming à la map-reduce, combining the best of both worlds. The accompanying system, Pig, is fully implemented, and compiles Pig Latin into physical plans that are executed over Hadoop, an open-source, map-reduce implementation. Pig has just graduated from the Apache Incubator and joined Hadoop as a subproject. The paper has a few examples of how engineers at Yahoo! are using Pig to dramatically reduce the time required for the development and execution of their data analysis tasks, compared to using Hadoop directly. References: Apache Pig Wiki
3 0.96204334 692 high scalability-2009-09-01-Cheap storage: how backblaze takes matters in hand
Introduction: Backblaze blogs about how they built their own storage infrastructure on the cheap to run their cloud backup service. This episode: the hardware. Sorry, just a link this time.
4 0.96204334 1119 high scalability-2011-09-20-HighScalability is old news. Step your scaling game way up... (NSFW cartoon)
Introduction: Jeremy Raines tweeted a link to this cartoon, "my new filing technique is unstoppable", showing how Scotch tape can be used to create a new super-database. Very funny in a Dilbert sort of way, but definitely not NSFW... For more on Twisted Tuesday, you may enjoy: Hilarious Video: Relational Database Vs NoSQL Fanbois; NSFW: Hilarious Fault-Tolerance Cartoon
5 0.95846099 782 high scalability-2010-02-23-When to migrate your database?
Introduction: Why migrate your database? Efficiency and availability problems are harming your business: reports are out of date, your batch processing window is nearing its limits, and outages (planned or unplanned) frequently halt work. Database consolidation: remove the costs that result from a heterogeneous database environment (DBA time, database vendor pricing, database versions, hardware, OSs, patches, upgrades, etc.). OK, so the driving forces for migration are clear, what now? Read more on BigDataMatters.com
6 0.95590627 743 high scalability-2009-11-23-Big Data on Grids or on Clouds?
7 0.95248604 8 high scalability-2007-07-12-Should I use LAMP or Windows?
8 0.94707739 107 high scalability-2007-10-02-Some Real Financial Numbers for Your Startup
9 0.9362216 1169 high scalability-2012-01-05-Shutterfly Saw a Speedup of 500% With Flashcache
10 0.93538034 1100 high scalability-2011-08-18-Paper: The Akamai Network - 61,000 servers, 1,000 networks, 70 countries
11 0.93190753 372 high scalability-2008-08-27-Updating distributed web applications
12 0.91399926 1277 high scalability-2012-07-05-10 Golden Principles For Building Successful Mobile-Web Applications
13 0.8999247 784 high scalability-2010-02-25-Paper: High Performance Scalable Data Stores
14 0.88895816 323 high scalability-2008-05-19-Twitter as a scalability case study
15 0.86624485 871 high scalability-2010-08-04-Dremel: Interactive Analysis of Web-Scale Datasets - Data as a Programming Paradigm
16 0.84440744 75 high scalability-2007-08-28-Google Utilities : An online google guide,tools and Utilities.
17 0.84124643 1181 high scalability-2012-01-25-Google Goes MoreSQL with Tenzing - SQL Over MapReduce
18 0.83978105 1403 high scalability-2013-02-08-Stuff The Internet Says On Scalability For February 8, 2013
19 0.83345801 1162 high scalability-2011-12-23-Funny: A Cautionary Tale About Storage and Backup
20 0.82242072 1420 high scalability-2013-03-08-Stuff The Internet Says On Scalability For March 8, 2013