high_scalability high_scalability-2010 high_scalability-2010-852 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Professor Lance Fortnow, in his blog post Drowning in Data, says complexity has taught him this lesson: When storage is expensive, it is cheaper to recompute what you've already computed. And that's the world we now live in: Storage is pretty cheap but data acquisition and computation are even cheaper. Jouni, one of the commenters, thinks the opposite is true: storage is cheap, but computation is expensive. When you are dealing with massive data, the size of the data set is very often determined by the amount of computing power available for a certain price. With such data, a linear-time algorithm takes O(1) seconds to finish, while a quadratic-time algorithm requires O(n) seconds. But as computing power increases exponentially over time, the quadratic algorithm gets exponentially slower. For me it's not a matter of which is true; both positions can be true. What's interesting is to think that storage and computation are in some cases fungible. Your architecture can decide which tradeoffs to make based on the cost of resources and the nature of your data. I'm not sure, but this seems like a new degree of freedom in the design space.
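The fungibility point lends itself to a sketch: a decorator that either memoizes a function's results or lets them be recomputed on every call, chosen by comparing a storage cost against a compute cost. This is a minimal illustration, not from the post; the cost figures and function names are hypothetical assumptions.

```python
import functools

def recompute_or_remember(storage_cost, compute_cost):
    """Memoize only when remembering a result is cheaper than recomputing it.

    Both costs are per-result and purely illustrative; a real system would
    measure storage price and recomputation time for its own workload.
    """
    def decorate(fn):
        if storage_cost < compute_cost:
            # Storage is the cheaper resource: remember results.
            return functools.lru_cache(maxsize=None)(fn)
        # Computation is the cheaper resource: recompute on every call.
        return fn
    return decorate

@recompute_or_remember(storage_cost=0.01, compute_cost=1.0)
def expensive_aggregate(n):
    # Stand-in for a costly pass over a large data set.
    return sum(i * i for i in range(n))
```

Flipping the two cost arguments flips the strategy without touching any caller, which is exactly the architectural degree of freedom the post describes.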
sentIndex sentText sentNum sentScore
1 Professor Lance Fortnow, in his blog post Drowning in Data, says complexity has taught him this lesson: When storage is expensive, it is cheaper to recompute what you've already computed. [sent-1, score-0.831]
2 And that's the world we now live in: Storage is pretty cheap but data acquisition and computation are even cheaper. [sent-2, score-0.829]
3 Jouni, one of the commenters, thinks the opposite is true: storage is cheap, but computation is expensive. [sent-3, score-0.684]
4 When you are dealing with massive data, the size of the data set is very often determined by the amount of computing power available for a certain price. [sent-4, score-0.823]
5 With such data, a linear-time algorithm takes O(1) seconds to finish, while a quadratic-time algorithm requires O(n) seconds. [sent-5, score-0.71]
6 But as computing power increases exponentially over time, the quadratic algorithm gets exponentially slower. [sent-6, score-1.393]
7 For me it's not a matter of which is true, both positions can be true, but what's interesting is to think that storage and computation are in some cases fungible. [sent-7, score-0.724]
8 Your architecture can decide which tradeoffs to make based on the cost of resources and the nature of your data. [sent-8, score-0.387]
9 I'm not sure, but this seems like a new degree of freedom in the design space. [sent-9, score-0.327]
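Jouni's claim in sentences 4-6 follows from a simple cost model: if a fixed budget buys a machine doing C operations per second, and the data set is sized so a linear pass costs about C operations (n ≈ C), then the linear algorithm always finishes in about one second while the quadratic one needs n²/C = n seconds. A rough sketch under that assumed model:

```python
def runtime_seconds(ops_needed, ops_per_second):
    """Wall-clock time to run an algorithm on a machine of given throughput."""
    return ops_needed / ops_per_second

# Data set sized to the compute budget: n grows as machines get faster.
for ops_per_second in (10**6, 10**9, 10**12):
    n = ops_per_second
    linear = runtime_seconds(n, ops_per_second)         # stays ~1 second: O(1)
    quadratic = runtime_seconds(n * n, ops_per_second)  # grows as n: O(n) seconds
    print(f"n={n}: linear={linear:.0f}s, quadratic={quadratic:.0f}s")
```

So as computing power (and with it n) grows exponentially over time, the quadratic algorithm's wall-clock time grows exponentially too, even though the machine is faster.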
wordName wordTfidf (topN-words)
[('computation', 0.285), ('algorithm', 0.279), ('exponentially', 0.271), ('true', 0.244), ('drowning', 0.223), ('lance', 0.223), ('recompute', 0.209), ('quadratic', 0.209), ('commenters', 0.177), ('cheap', 0.168), ('professor', 0.166), ('acquisition', 0.161), ('taught', 0.156), ('storage', 0.146), ('determined', 0.145), ('finish', 0.145), ('freedom', 0.145), ('opposite', 0.142), ('positions', 0.137), ('tradeoffs', 0.135), ('lesson', 0.119), ('degree', 0.114), ('thinks', 0.111), ('computing', 0.105), ('decide', 0.104), ('dealing', 0.103), ('power', 0.099), ('cheaper', 0.094), ('increases', 0.088), ('nature', 0.088), ('seconds', 0.086), ('data', 0.083), ('certain', 0.081), ('complexity', 0.079), ('matter', 0.078), ('cases', 0.078), ('massive', 0.077), ('says', 0.075), ('blog', 0.072), ('expensive', 0.072), ('gets', 0.071), ('pretty', 0.07), ('seems', 0.068), ('space', 0.067), ('requires', 0.066), ('amount', 0.065), ('size', 0.065), ('sure', 0.064), ('live', 0.062), ('resources', 0.06)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 852 high scalability-2010-07-07-Strategy: Recompute Instead of Remember Big Data
2 0.1188164 1571 high scalability-2014-01-02-xkcd: How Standards Proliferate:
Introduction: The great thing about standards is there are so many to choose from. What is it about human nature that makes this so recognizably true?
3 0.1082279 1305 high scalability-2012-08-16-Paper: A Provably Correct Scalable Concurrent Skip List
Introduction: In MemSQL Architecture we learned one of the core strategies MemSQL uses to achieve their need for speed is lock-free skip lists. Skip lists are used to efficiently handle range queries. Making the skip-lists lock-free helps eliminate contention and make writes fast. If this all sounds a little pie-in-the-sky then here's a very good paper on the subject that might help make it clearer: A Provably Correct Scalable Concurrent Skip List . From the abstract: We propose a new concurrent skip list algorithm distinguished by a combination of simplicity and scalability. The algorithm employs optimistic synchronization, searching without acquiring locks, followed by short lock-based validation before adding or removing nodes. It also logically removes an item before physically unlinking it. Unlike some other concurrent skip list algorithms, this algorithm preserves the skiplist properties at all times, which facilitates reasoning about its correctness. Experimental evidence shows that
4 0.1070085 761 high scalability-2010-01-17-Applications Become Black Boxes Using Markets to Scale and Control Costs
Introduction: This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud. We tend to think of compute resources as residing primarily in datacenters. Given the fast pace of innovation we will likely see compute resources become pervasive. Some will reside in datacenters, but compute resources can be anywhere, not just in the datacenter; we'll actually see the bulk of compute resources live outside of datacenters in the future. Given the diversity of compute resources it's reasonable to assume they won't be homogeneous or conform to a standard API. They will specialize by service. Programmers will have to use those specialized service interfaces to build applications that are adaptive enough to take advantage of whatever leverage they can find, whenever and wherever they can find it. Once found, the application will have to reorganize on the fly to use whatever new resources it has found and let go of whatever resources it doe
5 0.10330343 340 high scalability-2008-06-06-Economies of Non-Scale
Introduction: Scalability forces us to think differently. What worked on a small scale doesn't always work on a large scale -- and costs are no different. If 90% of our application is free of contention, and only 10% is spent on shared resources, we will need to grow our compute resources by a factor of 100 to scale by a factor of 10! Another important thing to note is that 10x, in this case, is the limit of our ability to scale, even if more resources are added. 1. The cost of non-linearly scalable applications grows exponentially with the demand for more scale. 2. Non-linearly scalable applications have an absolute limit of scalability. According to Amdahl's Law, with 10% contention, the maximum scaling limit is 10. With 40% contention, our maximum scaling limit is 2.5 - no matter how many hardware resources we throw at the problem. This post discusses in further detail how to measure the true cost of non-linearly scalable systems and suggests a model for reducing that cost significantly.
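The Amdahl's Law numbers quoted in that excerpt can be checked directly: with a contended (serial) fraction s, speedup on p processors is 1/(s + (1-s)/p), which approaches 1/s as p grows. A minimal check:

```python
def amdahl_speedup(contended_fraction, processors):
    """Speedup predicted by Amdahl's Law for a given serial fraction."""
    s = contended_fraction
    return 1.0 / (s + (1.0 - s) / processors)

def amdahl_limit(contended_fraction):
    """Upper bound on speedup as processor count goes to infinity."""
    return 1.0 / contended_fraction
```

With 10% contention the ceiling is 10x and with 40% it is 2.5x, matching the excerpt; note that even 100 processors only reach about 9.2x of that 10x ceiling.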
6 0.10239805 527 high scalability-2009-03-06-Cloud Programming Directly Feeds Cost Allocation Back into Software Design
9 0.091532007 575 high scalability-2009-04-21-Thread Pool Engine in MS CLR 4, and Work-Stealing scheduling algorithm
10 0.083807364 70 high scalability-2007-08-22-How many machines do you need to run your site?
11 0.083152153 445 high scalability-2008-11-14-Useful Cloud Computing Blogs
12 0.081480846 898 high scalability-2010-09-09-6 Scalability Lessons
13 0.080981314 1186 high scalability-2012-02-02-The Data-Scope Project - 6PB storage, 500GBytes-sec sequential IO, 20M IOPS, 130TFlops
14 0.080532387 839 high scalability-2010-06-09-Paper: Propagation Networks: A Flexible and Expressive Substrate for Computation
15 0.079101972 1255 high scalability-2012-06-01-Stuff The Internet Says On Scalability For June 1, 2012
16 0.078775749 692 high scalability-2009-09-01-Cheap storage: how backblaze takes matters in hand
17 0.076864645 1600 high scalability-2014-02-21-Stuff The Internet Says On Scalability For February 21st, 2014
18 0.076564036 823 high scalability-2010-05-05-How will memristors change everything?
19 0.076519951 11 high scalability-2007-07-15-Coyote Point Load Balancing Systems
20 0.074148357 1642 high scalability-2014-05-02-Stuff The Internet Says On Scalability For May 2nd, 2014
topicId topicWeight
[(0, 0.111), (1, 0.067), (2, 0.031), (3, 0.069), (4, -0.052), (5, 0.003), (6, -0.01), (7, 0.005), (8, -0.024), (9, 0.022), (10, 0.001), (11, -0.034), (12, -0.015), (13, 0.049), (14, 0.047), (15, 0.006), (16, -0.013), (17, -0.019), (18, -0.005), (19, 0.03), (20, -0.038), (21, 0.019), (22, 0.001), (23, -0.001), (24, -0.009), (25, -0.043), (26, 0.032), (27, 0.042), (28, -0.009), (29, 0.02), (30, -0.011), (31, 0.008), (32, 0.001), (33, 0.053), (34, -0.065), (35, -0.011), (36, 0.033), (37, 0.026), (38, 0.036), (39, -0.02), (40, -0.013), (41, -0.033), (42, 0.003), (43, 0.035), (44, 0.062), (45, 0.012), (46, -0.017), (47, 0.012), (48, -0.022), (49, -0.038)]
simIndex simValue blogId blogTitle
same-blog 1 0.93413085 852 high scalability-2010-07-07-Strategy: Recompute Instead of Remember Big Data
2 0.72087586 901 high scalability-2010-09-16-How Can the Large Hadron Collider Withstand One Petabyte of Data a Second?
Introduction: Why is there something rather than nothing? That's the kind of question the Large Hadron Collider at CERN is hopefully poised to answer. And what is the output of this beautiful 17-mile long, 6 billion dollar wabi-sabish proton smashing machine? Data. Great heaping torrents of Grand Canyon sized data. 15 million gigabytes every year. That's 1000 times the information printed in books every year. It's so much data that 10,000 scientists will use a grid of 80,000+ computers, in 300 computer centers, in 50 different countries just to help make sense of it all. How will all this data be collected, transported, stored, and analyzed? It turns out, using what amounts to a sort of Internet of Particles instead of an Internet of Things. Two good articles have recently shed some electro-magnetic energy in the human visible spectrum on the IT aspects of the collider: LHC computing grid pushes petabytes of data, beats expectations by John Timmer on Ars Technica and an overview of the Br
3 0.70041376 823 high scalability-2010-05-05-How will memristors change everything?
Introduction: A non-random sample of my tech friends shows that not many have heard of memristors (though I do suspect vote tampering). I'd read a little about memristors in 2008 when the initial hubbub about the existence of memristors was raised. I, however, immediately filed them into that comforting conceptual bucket of potentially revolutionary technologies I didn't have to worry about because like most wondertech, nothing would ever come of it. Wrong. After watching Finding the Missing Memristor by R. Stanley Williams I've had to change my mind. Memristors have gone from "maybe never" to holy cow this could happen soon and it could change everything. Let's assume for the sake of dreaming memristors do prove out. How will we design systems when we have access to a new material that is two orders of magnitude more efficient from a power perspective than traditional transistor technologies, contains multiple petabits (1 petabit = 128TB) of persistent storage, and can be reconfigured t
4 0.69370091 839 high scalability-2010-06-09-Paper: Propagation Networks: A Flexible and Expressive Substrate for Computation
Introduction: Alexey Radul in his fascinating 174 page dissertation Propagation Networks: A Flexible and Expressive Substrate for Computation, offers to help us break free of the tyranny of linear time by arranging computation as a network of autonomous but interconnected machines. We can do this by organizing computation as a network of interconnected machines of some kind, each of which is free to run when it pleases, propagating information around the network as proves possible. The consequence of this freedom is that the structure of the aggregate does not impose an order of time. The abstract from his thesis is: In this dissertation I propose a shift in the foundations of computation. Modern programming systems are not expressive enough. The traditional image of a single computer that has global effects on a large memory is too restrictive. The propagation paradigm replaces this with computing by networks of local, independent, stateless machines interconnected with stateful storage
5 0.66783637 1600 high scalability-2014-02-21-Stuff The Internet Says On Scalability For February 21st, 2014
Introduction: Hey, it's HighScalability time (a particularly bountiful week): The Telephone Wires of Manhattan in 1887 ( full ) $19 billion: you know what it is; $46 billion : cost of Sochi Olympics; 400 gigabytes : data transmitted during the Sochi opening ceremony; 26.9 million : Stack Overflow community monthly visitors; 93 million : Candy Crush daily active users; 200-400 Gbps : The New Normal in DDoS Attacks Quotable Quotes: @brianacton : Facebook turned me down. It was a great opportunity to connect with some fantastic people. Looking forward to life's next adventure. @BenedictEvans : Flickr: $35m. Youtube: $1.65bn Whatsapp: $19bn. Mobile is big. And global. And the next computing platform. Paying attention? @taziden : On the Internet, worst cases will become common cases #fosdem #postfix Brian Hayes : Any quantum program must have a stovepipe architecture: Information flows straight through. So you think V
6 0.66375417 188 high scalability-2007-12-19-How can I learn to scale my project?
7 0.66186327 786 high scalability-2010-03-02-Using the Ambient Cloud as an Application Runtime
8 0.66023237 403 high scalability-2008-10-06-Paper: Scaling Genome Sequencing - Complete Genomics Technology Overview
9 0.65969378 463 high scalability-2008-12-09-Rules of Thumb in Data Engineering
10 0.65872997 1430 high scalability-2013-03-27-The Changing Face of Scale - The Downside of Scaling in the Contextual Age
11 0.65675735 387 high scalability-2008-09-22-Paper: On Delivering Embarrassingly Distributed Cloud Services
12 0.65428221 1186 high scalability-2012-02-02-The Data-Scope Project - 6PB storage, 500GBytes-sec sequential IO, 20M IOPS, 130TFlops
13 0.65366465 1035 high scalability-2011-05-05-Paper: A Study of Practical Deduplication
14 0.63934171 527 high scalability-2009-03-06-Cloud Programming Directly Feeds Cost Allocation Back into Software Design
15 0.63810682 917 high scalability-2010-10-08-4 Scalability Themes from Surgecon
16 0.63761944 777 high scalability-2010-02-15-Scaling Ambition at StackOverflow
17 0.63555038 992 high scalability-2011-02-18-Stuff The Internet Says On Scalability For February 18, 2011
18 0.63341033 526 high scalability-2009-03-05-Strategy: In Cloud Computing Systematically Drive Load to the CPU
19 0.63246769 69 high scalability-2007-08-21-What does the next generation data center look like?
20 0.63225967 1127 high scalability-2011-09-28-Pursue robust indefinite scalability with the Movable Feast Machine
topicId topicWeight
[(1, 0.171), (2, 0.174), (10, 0.022), (47, 0.439), (79, 0.074)]
simIndex simValue blogId blogTitle
1 0.89287519 57 high scalability-2007-08-03-Scaling IMAP and POP3
Introduction: Just thought I'd drop a brief suggestion to anyone building a large mail system. Our solution for scaling mail pickup was to develop a sharded architecture whereby accounts are spread across a cluster of servers, each with imap/pop3 capability. Then we use a cluster of reverse proxies (Perdition) speaking to the backend imap/pop3 servers. The benefit of this approach is you can simply use round-robin or HA load balancing on the perdition servers that end users connect to (e.g. admins can easily move accounts around on the backend storage servers without affecting end users). Perdition manages routing users to the appropriate backend servers and has MySQL support. What we also liked about this approach was that it had no dependency on a distributed or networked filesystem, so less chance of corruption or data consistency issues. When an individual server reaches capacity, we just offload users to a less used server. If any server goes offline, it only affects the fraction of users
2 0.85637945 81 high scalability-2007-09-06-Scaling IMAP and POP3
Introduction: Another scalability strategy brought to you by Erik Osterman: Just thought I'd drop a brief suggestion to anyone building a large mail system. Our solution for scaling mail pickup was to develop a sharded architecture whereby accounts are spread across a cluster of servers, each with imap/pop3 capability. Then we use a cluster of reverse proxies (Perdition) speaking to the backend imap/pop3 servers. The benefit of this approach is you can simply use round-robin or HA load balancing on the perdition servers that end users connect to (e.g. admins can easily move accounts around on the backend storage servers without affecting end users). Perdition manages routing users to the appropriate backend servers and has MySQL support. What we also liked about this approach was that it had no dependency on a distributed or networked file system, so less chance of corruption or data consistency issues. When an individual server reaches capacity, we just offload users to a less u
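The sharded routing both mail-scaling excerpts describe can be sketched as the lookup a proxy performs before connecting to a backend. Perdition actually consults a MySQL routing table; here a stable hash plus an override map stands in for it, and all hostnames and account names are hypothetical.

```python
import hashlib

BACKENDS = ["imap1.example.com", "imap2.example.com", "imap3.example.com"]

# Stand-in for Perdition's MySQL routing table: explicit overrides let admins
# move an account to a less-loaded backend without end users noticing.
OVERRIDES = {"heavy_user@example.com": "imap3.example.com"}

def route(account):
    """Return the backend imap/pop3 server holding this account's mailbox."""
    if account in OVERRIDES:
        return OVERRIDES[account]
    # Stable hash so the same account always lands on the same backend.
    digest = int(hashlib.sha1(account.encode()).hexdigest(), 16)
    return BACKENDS[digest % len(BACKENDS)]
```

Because the routing decision lives behind the proxies, the front-end proxy tier stays stateless and can safely sit behind plain round-robin load balancing, as the excerpts note.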
3 0.82664132 94 high scalability-2007-09-17-Blog: Adding Simplicity by Dan Pritchett
Introduction: Dan has genuine insight into building software, and large-scale systems in particular. You'll always learn something interesting reading his blog. A Quick Hit of What's Inside: Inverting the Reliability Stack, In Support of Non-Stop Software, Chaotic Perspectives, Latency Exists, Cope!, A Real eBay Architect Analyzes Part 3, Avoiding Two Phase Commit, Redux. Site: http://www.addsimplicity.com/
4 0.80283594 708 high scalability-2009-09-17-Infinispan narrows the gap between open source and commercial data caches
Introduction: Recently I attended a lecture presented by Manik Surtani, JBoss Cache & Infinispan project lead. The goal of the talk was to provide a technical overview of both products and outline Infinispan's road-map. Infinispan is the successor to the open-source JBoss Cache. JBoss Cache was originally targeted at simple web page caching and Infinispan builds on this to take it into the Cloud paradigm. Why did I attend? Well, over the past few years I have worked on projects that have used commercial distributed caching (aka data grid) technologies such as GemFire, GigaSpaces XAP or Oracle Coherence. These projects required more functionality than is currently provided by open-source solutions such as memcached or EHCache. Looking at the road-map for Infinispan, I was struck by its ambition – will it provide the functionality that I need? Read more at: http://bigdatamatters.com/bigdatamatters/2009/09/infinispan-vs-gigaspaces.html
5 0.79859895 760 high scalability-2010-01-13-10 Hot Scalability Links for January 13, 2010
Introduction: Has Amazon EC2 become over subscribed? by Alan Williamson. Systemic problems hit AWS as users experience problems across Amazon's infrastructure. It seems the strange attractor of a cloud may be the same as for a shared hosting service. Understanding Infrastructure 2.0 by James Urquhart. We need to take a systems view of our entire infrastructure, and build our automation around the end-to-end architecture of that system . Hey You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds . We show that it is possible to map the internal cloud infrastructure . Hadoop World: Building Data Intensive Apps with Hadoop and EC2 by Pete Skomoroch. Dives into detail about how he built TrendingTopics.org using Hadoop and EC2 . A Crash Course in Modern Hardware by Cliff Click. Yes, your mind will hurt after watching this. And no, you probably don't know what your microprocessor is doing anymore. EVE Scalability Explained by James Harrison. This pos
6 0.78670508 163 high scalability-2007-11-21-n-phase commit for FS writes, reads stay local
7 0.78030574 1326 high scalability-2012-09-20-How Vimeo Saves 50% on EC2 by Playing a Smarter Game
same-blog 8 0.7790181 852 high scalability-2010-07-07-Strategy: Recompute Instead of Remember Big Data
9 0.76901418 756 high scalability-2009-12-30-Terrastore - Scalable, elastic, consistent document store.
10 0.7542541 676 high scalability-2009-08-08-Yahoo!'s PNUTS Database: Too Hot, Too Cold or Just Right?
11 0.72069341 1166 high scalability-2011-12-30-Stuff The Internet Says On Scalability For December 30, 2011
12 0.6923582 144 high scalability-2007-11-07-What CDN would you recommend?
13 0.69144416 550 high scalability-2009-03-30-Ebay history and architecture
14 0.68646097 1062 high scalability-2011-06-15-101 Questions to Ask When Considering a NoSQL Database
15 0.68061906 1054 high scalability-2011-06-06-NoSQL Pain? Learn How to Read-write Scale Without a Complete Re-write
16 0.66358829 1530 high scalability-2013-10-11-Stuff The Internet Says On Scalability For October 11th, 2013
17 0.61911583 276 high scalability-2008-03-15-New Website Design Considerations
18 0.60772818 683 high scalability-2009-08-18-Hardware Architecture Example (geographical level mapping of servers)
19 0.58895892 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
20 0.57954466 696 high scalability-2009-09-07-Product: Infinispan - Open Source Data Grid