high_scalability high_scalability-2009 high_scalability-2009-592 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for ordinary programmers. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ).
sentIndex sentText sentNum sentScore
1 The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for ordinary programmers. [sent-1, score-0.993]
2 DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ). [sent-2, score-0.815]
wordName wordTfidf (topN-words)
[('dryadlinq', 0.74), ('dryad', 0.289), ('linq', 0.289), ('ordinary', 0.266), ('combines', 0.173), ('pieces', 0.162), ('execution', 0.139), ('integrated', 0.133), ('goal', 0.117), ('compute', 0.108), ('microsoft', 0.108), ('language', 0.106), ('engine', 0.103), ('distributed', 0.102), ('query', 0.096), ('cluster', 0.081), ('important', 0.08), ('enough', 0.08), ('computing', 0.079), ('technology', 0.078), ('simple', 0.067), ('two', 0.056), ('large', 0.052), ('make', 0.041)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 592 high scalability-2009-05-06-DyradLINQ
Introduction: The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for ordinary programmers. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ).
2 0.19747131 591 high scalability-2009-05-06-Dyrad
Introduction: The Dryad Project is investigating programming models for writing parallel and distributed programs to scale from a small cluster to a large data-center.
3 0.17032368 155 high scalability-2007-11-15-Video: Dryad: A general-purpose distributed execution platform
Introduction: Dryad is Microsoft's answer to Google's map-reduce. What's the question: How do you process really large amounts of data? My initial impression of Dryad is that it's like a giant Unix command line filter on steroids. There are lots of inputs, outputs, tees, queues, and merge sorts all connected together by a master exec program. What else does Dryad have to offer the scalable infrastructure wars? Dryad models programs as the execution of a directed acyclic graph. Each vertex is a program and edges are typed communication channels (files, TCP pipes, and shared memory channels within a process). Map-reduce uses a different model. It's more like a large distributed sort where the programmer defines functions for mapping, partitioning, and reducing. Each approach seems to borrow from the spirit of its creating organization. The graph approach seems a bit too complicated and map-reduce seems a bit too simple. How ironic, in the Alanis Morissette sense. Dryad is a middleware layer that ex
4 0.085452154 1136 high scalability-2011-11-03-Paper: G2 : A Graph Processing System for Diagnosing Distributed Systems
Introduction: One of the problems in building distributed systems is figuring out what the heck is going on. Usually endless streams of log files are consulted like ancients using entrails to divine the will of the Gods. To rise above these ancient practices we must rise to another level of abstraction, and that's the approach described in a Microsoft research paper: G2: A Graph Processing System for Diagnosing Distributed Systems, which uses execution graphs that model runtime events and their correlations in distributed systems. The problem with these schemes is viewing applications, written by programmers in low level code, as execution graphs. But we're heading in this direction in any case. To program a warehouse or an internet sized computer we'll have to write at higher levels of abstraction so code can be executed transparently at runtime on these giant distributed computers. There are many advantages to this approach; fault diagnosis and performance monitoring are just two of the wins
5 0.06697832 590 high scalability-2009-05-06-Art of Distributed
Introduction: Art of Distributed Part 1: Rethinking about distributed computing models. I'm getting a lot of questions lately about distributed computing, especially distributed computing models and MapReduce, such as: What is MapReduce? Can MapReduce fit all situations? How does it compare with other technologies such as Grid Computing? And what is the best solution for our situation? So I decided to write an article about distributed computing in two parts. The first covers distributed computing models and the differences between them. In the second part I will discuss reliability and distributed storage systems. Download the article in PDF format. Download the article in MS Word format. I look forward to your comments and questions, and I will answer them in part two.
6 0.063081056 40 high scalability-2007-07-30-Product: Amazon Elastic Compute Cloud
7 0.061395653 1528 high scalability-2013-10-07-Ask HS: Is Microsoft the Right Technology for a Scalable Web-based System?
8 0.056090519 279 high scalability-2008-03-17-Microsoft's New Database Cloud Ready to Rumble with Amazon
9 0.051029816 867 high scalability-2010-07-27-YeSQL: An Overview of the Various Query Semantics in the Post Only-SQL World
10 0.048299767 445 high scalability-2008-11-14-Useful Cloud Computing Blogs
11 0.047724836 801 high scalability-2010-03-30-Running Large Graph Algorithms - Evaluation of Current State-of-the-Art and Lessons Learned
12 0.047543041 1063 high scalability-2011-06-17-Stuff The Internet Says On Scalability For June 17, 2011
13 0.043398239 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
14 0.043136962 184 high scalability-2007-12-13-Amazon SimpleDB - Scalable Cloud Database
15 0.041624133 1635 high scalability-2014-04-21-This is why Microsoft won. And why they lost.
16 0.041136194 46 high scalability-2007-07-30-Product: Sun Utility Computing
18 0.040965658 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud
19 0.040016003 658 high scalability-2009-07-17-Against all the odds
20 0.039973743 1559 high scalability-2013-12-06-Stuff The Internet Says On Scalability For December 6th, 2013
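The Dryad entry in the list above describes programs modeled as a directed acyclic graph, where each vertex is a program and each edge is a communication channel. A minimal Python sketch of that idea follows. It assumes in-memory values stand in for the channels (Dryad itself uses files, TCP pipes, and shared memory), and the names run_dag, vertices, edges, and sources are hypothetical illustrations, not part of Dryad's API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(vertices, edges, sources):
    """Run each vertex once, in dependency order.

    vertices: name -> function taking a list of inputs (one per incoming edge)
    edges:    list of (producer, consumer) pairs; each pair is one channel
    sources:  name -> initial input for vertices with no incoming edges
    """
    # Record, for every vertex, which vertices feed it.
    preds = {name: [] for name in vertices}
    for producer, consumer in edges:
        preds[consumer].append(producer)

    outputs = {}
    # static_order() yields every vertex after all of its predecessors.
    for name in TopologicalSorter(preds).static_order():
        if preds[name]:
            inputs = [outputs[p] for p in preds[name]]
        else:
            inputs = sources[name]
        outputs[name] = vertices[name](inputs)
    return outputs


# Hypothetical four-vertex job: two sorted inputs, a merge, and a count.
vertices = {
    "read_a": lambda inputs: sorted(inputs),
    "read_b": lambda inputs: sorted(inputs),
    "merge": lambda inputs: sorted(inputs[0] + inputs[1]),
    "count": lambda inputs: len(inputs[0]),
}
edges = [("read_a", "merge"), ("read_b", "merge"), ("merge", "count")]
sources = {"read_a": [3, 1], "read_b": [2, 4]}
result = run_dag(vertices, edges, sources)
```

In this sketch the graph is executed sequentially on one machine; a real engine would schedule independent vertices (here read_a and read_b) on different cluster nodes.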
topicId topicWeight
[(0, 0.052), (1, 0.024), (2, 0.014), (3, 0.048), (4, -0.02), (5, 0.029), (6, -0.002), (7, -0.017), (8, -0.0), (9, 0.036), (10, -0.006), (11, 0.002), (12, 0.004), (13, -0.005), (14, 0.019), (15, -0.014), (16, -0.035), (17, -0.008), (18, 0.024), (19, 0.012), (20, -0.016), (21, -0.012), (22, -0.016), (23, 0.015), (24, 0.012), (25, 0.002), (26, 0.012), (27, 0.011), (28, 0.009), (29, -0.001), (30, -0.011), (31, -0.004), (32, -0.055), (33, 0.033), (34, 0.013), (35, -0.012), (36, 0.043), (37, -0.014), (38, -0.03), (39, 0.013), (40, -0.021), (41, -0.008), (42, 0.014), (43, -0.008), (44, -0.014), (45, 0.018), (46, -0.003), (47, 0.021), (48, 0.041), (49, -0.009)]
simIndex simValue blogId blogTitle
same-blog 1 0.94920671 592 high scalability-2009-05-06-DyradLINQ
Introduction: The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for ordinary programmers. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ).
2 0.62594396 590 high scalability-2009-05-06-Art of Distributed
Introduction: Art of Distributed Part 1: Rethinking about distributed computing models. I'm getting a lot of questions lately about distributed computing, especially distributed computing models and MapReduce, such as: What is MapReduce? Can MapReduce fit all situations? How does it compare with other technologies such as Grid Computing? And what is the best solution for our situation? So I decided to write an article about distributed computing in two parts. The first covers distributed computing models and the differences between them. In the second part I will discuss reliability and distributed storage systems. Download the article in PDF format. Download the article in MS Word format. I look forward to your comments and questions, and I will answer them in part two.
3 0.61784887 743 high scalability-2009-11-23-Big Data on Grids or on Clouds?
Introduction: Contributed by Wolfgang Gentzsch: Now that we have a new computing paradigm, Cloud Computing, how can Clouds help our data? Replace our internal data vaults as we hoped Grids would? Are Grids dead now that we have Clouds? Despite all the promising developments in the Grid and Cloud computing space, and the avalanche of publications and talks on this subject, many people still seem to be confused about internal data and compute resources, versus Grids versus Clouds, and they are hesitant to take the next step. I think there are a number of issues driving this uncertainty. read more at: BigDataMatters.com
4 0.60782993 1049 high scalability-2011-05-31-Awesome List of Advanced Distributed Systems Papers
Introduction: As part of Dr. Indranil Gupta's CS 525 Spring 2011 Advanced Distributed Systems class, he has collected an incredible list of resources on distributed systems. His research group is also doing some interesting work. The various topics include: Before there Were Clouds, Cloud Computing, P2P Systems, Basic Distributed Computing Concepts, Sensor Networks, Overlays and DHTs, Cloud Programming, Cloud Scheduling, Key-Value Stores, Storage, Sensor Net Routing, Geo-Distribution, P2P Apps, In-network processing, Epidemics, Probabilistic Membership Protocols, Distributed Monitoring and Management, Publish-Subscribe/CDNs, Measurement Studies, Old Wine: Stale or Vintage?, In Byzantium, Cloud Pricing, Other Industrial Systems, Structure of Networks, Completing the Circle, Green Clouds, Distributed Debugging, Flash!, The Middle or the End?, Availability-Aware Systems, Design Methodologies, Handling Stress, Sources of unreliability in networks, Handling Stress, Selfish algorithms, Securi
5 0.59873927 839 high scalability-2010-06-09-Paper: Propagation Networks: A Flexible and Expressive Substrate for Computation
Introduction: Alexey Radul, in his fascinating 174 page dissertation Propagation Networks: A Flexible and Expressive Substrate for Computation, offers to help us break free of the tyranny of linear time by arranging computation as a network of autonomous but interconnected machines. We can do this by organizing computation as a network of interconnected machines of some kind, each of which is free to run when it pleases, propagating information around the network as proves possible. The consequence of this freedom is that the structure of the aggregate does not impose an order of time. The abstract from his thesis is: In this dissertation I propose a shift in the foundations of computation. Modern programming systems are not expressive enough. The traditional image of a single computer that has global effects on a large memory is too restrictive. The propagation paradigm replaces this with computing by networks of local, independent, stateless machines interconnected with stateful storage
6 0.58457839 401 high scalability-2008-10-04-Is MapReduce going mainstream?
7 0.58286124 591 high scalability-2009-05-06-Dyrad
8 0.58281356 581 high scalability-2009-04-26-Map-Reduce for Machine Learning on Multicore
9 0.58184689 810 high scalability-2010-04-14-Parallel Information Retrieval and Other Search Engine Goodness
10 0.57810575 362 high scalability-2008-08-11-Distributed Computing & Google Infrastructure
11 0.56871718 393 high scalability-2008-09-25-GridGain: One Compute Grid, Many Data Grids
12 0.55971384 325 high scalability-2008-05-25-How do you explain cloud computing to your grandma?
13 0.55402261 850 high scalability-2010-06-30-Paper: GraphLab: A New Framework For Parallel Machine Learning
14 0.55381185 46 high scalability-2007-07-30-Product: Sun Utility Computing
15 0.55193007 983 high scalability-2011-02-02-Piccolo - Building Distributed Programs that are 11x Faster than Hadoop
16 0.54785591 1127 high scalability-2011-09-28-Pursue robust indefinite scalability with the Movable Feast Machine
17 0.54707688 355 high scalability-2008-07-21-Eucalyptus - Build Your Own Private EC2 Cloud
18 0.54622996 470 high scalability-2008-12-18-Risk Analysis on the Cloud (Using Excel and GigaSpaces)
19 0.54107696 891 high scalability-2010-09-01-Scale-out vs Scale-up
20 0.53755295 844 high scalability-2010-06-18-Paper: The Declarative Imperative: Experiences and Conjectures in Distributed Logic
topicId topicWeight
[(2, 0.132), (61, 0.119), (79, 0.128), (84, 0.41)]
simIndex simValue blogId blogTitle
same-blog 1 0.74505204 592 high scalability-2009-05-06-DyradLINQ
Introduction: The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for ordinary programmers. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ).
2 0.7099297 215 high scalability-2008-01-16-Strategy: Asynchronous Queued Virus Scanning
Introduction: Atif Ghaffar has a nice strategy to deal with virus checking uploads: Upload item into a safe area. If necessary, the uploader blocks waiting for a result. Queue a work order into a job system so all the work can be distributed throughout your cluster. A service in your cluster performs the virus scan and informs the uploader of the result. Move the vetted item into your system. This removes the CPU bottleneck from your web servers and distributes it through your cluster. Keep your web servers providing prompt service to users. Let your cluster do the heavy lifting. This minimizes response time and maximizes throughput. A similar system can be used for creating thumbnails, transcoding, copyright checks, updating indexes, event notification or any other kind of intensive work.
3 0.65650576 1384 high scalability-2013-01-09-The Story of How Turning Disk Into a Service Lead to a Deluge of Density
Introduction: We usually think of the wonderful advantages of service oriented architectures as a software thing, but it also applies to hardware. In Security Now 385, that Doyen of Disk, Steve Gibson, tells the fascinating story (@ about 41:30) of how moving to a service oriented architecture in hard drives, modeling a drive as a linear stream of sectors, helped create the amazing high density disk drives we enjoy today. When drives switched to use the IDE (integrated drive electronics) interface, the controller function moved into the drive instead of the computer. No longer were low level drive signals moved across cables and into the motherboard. Now we just ask the drive for the desired sector and the drive takes care of it. This allowed manufacturers to do anything they wanted to behind the IDE interface. The drive stopped being dumb, it became smart, providing a sort of sector service. Density skyrocketed because there was no dependency on the computer. All the internals could co
Introduction: Todd had originally posted an entry on collectl here at Collectl - Performance Data Collector. Collectl collects real-time data from a large number of subsystems like buddyinfo, cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp, all using one tool and in one consistent format. Since then a lot has happened. It's now part of both Fedora and Debian distros, not to mention several others. There has also been a pretty good summary written up by Joe Brockmeier. It's also pretty well documented (I like to think) on sourceforge. There have also been a few blog postings by Martin Bach on his blog. Anyhow, awhile back I released a new version of collectl-utils and gave a complete face-lift to one of the utilities, colmux, which is a collectl multiplexor. This tool has the ability to run collectl on multiple systems, which in turn send all their output back to colmux. Colmux then sorts the output on a user-specified column
5 0.59062999 1165 high scalability-2011-12-28-Strategy: Guaranteed Availability Requires Reserving Instances in Specific Zones
Introduction: When EC2 first started, the mental model was of a magic Pez dispenser supplying an infinite stream of instances in any desired flavor. If you needed an instance, because of either a failure or a traffic spike, it would be there. As amazing as EC2 is, this model turned out to be optimistic. From a thread on the Amazon discussion forum we learn any dispenser has limits: As Availability Zones grow over time, our ability to continue to expand them can become constrained. In these scenarios, we will prevent customers from launching in the constrained zone if they do not yet have existing resources in that zone. We also might remove the constrained zone entirely from the list of options for new customers. This means that occasionally, different customers will see a different number of Availability Zones in a particular Region. Both approaches aim to help customers avoid accidentally starting to build up their infrastructure in an Availability Zone where they might have less ability
6 0.57008684 739 high scalability-2009-11-09-10 NoSQL Systems Reviewed
7 0.5448848 1625 high scalability-2014-04-03-Leslie Lamport to Programmers: You're Doing it Wrong
9 0.48924911 730 high scalability-2009-10-28-GemFire: Solving the hardest problems in data management
11 0.44053835 1242 high scalability-2012-05-09-Cell Architectures
12 0.43216601 1337 high scalability-2012-10-10-Antirez: You Need to Think in Terms of Organizing Your Data for Fetching
13 0.43179375 1142 high scalability-2011-11-14-Using Gossip Protocols for Failure Detection, Monitoring, Messaging and Other Good Things
14 0.42927191 1018 high scalability-2011-04-07-Paper: A Co-Relational Model of Data for Large Shared Data Banks
15 0.42613256 283 high scalability-2008-03-18-Shared filesystem on EC2
16 0.42507654 1535 high scalability-2013-10-21-Google's Sanjay Ghemawat on What Made Google Google and Great Big Data Career Advice
17 0.4226746 867 high scalability-2010-07-27-YeSQL: An Overview of the Various Query Semantics in the Post Only-SQL World
18 0.42205435 561 high scalability-2009-04-08-N+1+caching is ok?
20 0.42086789 780 high scalability-2010-02-19-Twitter’s Plan to Analyze 100 Billion Tweets
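The "Asynchronous Queued Virus Scanning" entry in the list above outlines a flow: upload into a safe area, queue a work order, let a cluster service scan and report back, then move the vetted item into the system. A minimal single-process Python sketch of that flow follows. It assumes a thread stands in for the cluster service and a dictionary stands in for the real storage; fake_scan, scan_worker, and upload are hypothetical names, and fake_scan is only a placeholder for a real scanner.

```python
import queue
import threading


def fake_scan(data: bytes) -> bool:
    """Hypothetical stand-in for a real virus scanner; True means clean."""
    return b"EICAR" not in data


def scan_worker(jobs: queue.Queue, accepted: dict) -> None:
    """Cluster-side service: scan queued uploads and report each result."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut the worker down
            break
        name, data, reply = job
        clean = fake_scan(data)
        if clean:
            accepted[name] = data  # move the vetted item into the system
        reply.put(clean)  # inform the uploader of the result


def upload(jobs: queue.Queue, name: str, data: bytes) -> bool:
    """Web-server side: queue a work order, then block waiting for the verdict."""
    reply = queue.Queue(maxsize=1)
    jobs.put((name, data, reply))
    return reply.get(timeout=5)


jobs = queue.Queue()
accepted = {}
worker = threading.Thread(target=scan_worker, args=(jobs, accepted))
worker.start()
ok = upload(jobs, "report.txt", b"quarterly numbers")
bad = upload(jobs, "virus.bin", b"xxEICARxx")
jobs.put(None)
worker.join()
```

The web-server side only enqueues and waits, so the CPU-heavy scan runs elsewhere; the same shape would fit the thumbnailing or transcoding jobs that entry mentions.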