high_scalability high_scalability-2008 high_scalability-2008-311 knowledge-graph by maker-knowledge-mining

311 high scalability-2008-04-29-Strategy: Sample to Reduce Data Set


meta info for this blog

Source: html

Introduction: Update: Arjen links to the video Supporting Scalable Online Statistical Processing, which shows that "rather than doing complete aggregates, use statistical sampling to provide a reasonable estimate (unbiased guess) of the result." When you have a lot of data, sampling allows you to draw conclusions from a much smaller amount of data. That's why sampling is a scalability solution. If you don't have to process all your data to get the information you need, then you've made the problem smaller: you'll need fewer resources and you'll get more timely results. Sampling is not useful when you need a complete list of the records that match a specific criterion. If you need to know the exact set of people who bought a car in the last week, then sampling won't help. But if you want to know how many people bought a car, then you could take a sample and create an estimate for the full data set. The difference is that you won't know the exact car count. You'll have a confidence interval saying how confident you are in your estimate.
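To make the car example concrete, here is a minimal Python sketch (an illustration, not code from the post) of estimating a count from a uniform random sample. It assumes the records fit in an in-memory list and uses the textbook normal-approximation confidence interval for a proportion, with a finite population correction because sampling is without replacement. `estimate_count` and its parameters are hypothetical names. The normal approximation is rough when only a handful of sampled records match, so a Wilson or exact interval may be a better choice there.

```python
import math
import random


def estimate_count(population, predicate, sample_size, z=1.96, rng=None):
    """Estimate how many items in `population` satisfy `predicate`
    by inspecting only a uniform random sample of `sample_size` items.

    Returns (estimate, low, high). `low`/`high` form an approximate
    confidence interval (normal approximation; z=1.96 is roughly 95%).
    """
    rng = rng or random.Random()
    n_total = len(population)
    if not 0 < sample_size <= n_total:
        raise ValueError("sample_size must be between 1 and len(population)")

    # Uniform random sample without replacement.
    sample = rng.sample(population, sample_size)
    hits = sum(1 for item in sample if predicate(item))
    p_hat = hits / sample_size  # fraction of the sample that matches

    # Standard error of p_hat, shrunk by the finite population correction.
    fpc = math.sqrt((n_total - sample_size) / (n_total - 1)) if n_total > 1 else 0.0
    se = math.sqrt(p_hat * (1 - p_hat) / sample_size) * fpc

    # Scale the proportion (and its interval) up to the full data set.
    estimate = p_hat * n_total
    low = max(0.0, (p_hat - z * se) * n_total)
    high = min(float(n_total), (p_hat + z * se) * n_total)
    return estimate, low, high


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in data: 1,000,000 purchases, about 3% of them cars.
    purchases = ["car" if rng.random() < 0.03 else "other" for _ in range(1_000_000)]
    est, low, high = estimate_count(purchases, lambda p: p == "car", 10_000, rng=rng)
    print(f"estimated car buyers: {est:,.0f} (95% CI {low:,.0f} to {high:,.0f})")
    print(f"exact count from a full scan: {purchases.count('car'):,}")
```

Inspecting 10,000 of 1,000,000 records is 1% of the work; the price is that the answer is a range rather than an exact count, which is the trade-off described above. When the sample is the whole population, the correction factor drops to zero and the interval collapses to the exact count.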


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Update: Arjen links to the video Supporting Scalable Online Statistical Processing, which shows that "rather than doing complete aggregates, use statistical sampling to provide a reasonable estimate (unbiased guess) of the result." [sent-1, score-1.28]

2 When you have a lot of data, sampling allows you to draw conclusions from a much smaller amount of data. [sent-2, score-0.95]

3 If you don't have to process all your data to get the information you need, then you've made the problem smaller: you'll need fewer resources and you'll get more timely results. [sent-4, score-0.735]

4 Sampling is not useful when you need a complete list of the records that match a specific criterion. [sent-5, score-0.51]

5 If you need to know the exact set of people who bought a car in the last week then sampling won't help. [sent-6, score-1.815]

6 But if you want to know how many people bought a car, then you could take a sample and create an estimate for the full data set. [sent-7, score-1.263]

7 The difference is you won't really know the exact car count. [sent-8, score-0.833]

8 You'll have a confidence interval saying how confident you are in your estimate. [sent-9, score-0.466]

9 But if running a report takes an entire day because the data set is so large, then taking a sample is an excellent way to scale. [sent-11, score-0.645]
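For the day-long report, one option in the spirit of "rather than doing complete aggregates" is an online-aggregation style loop: read rows in random order, keep a running estimate of the total, and stop as soon as the confidence interval is tight enough. The Python sketch below is an illustration, not the method presented in the video. It assumes the data is an in-memory list of numbers, and `online_sum_estimate`, `rel_precision`, and `min_samples` are hypothetical names. Because it checks the interval after every row and stops at the first narrow one, the stated confidence is slightly optimistic, so treat the interval as approximate.

```python
import math
import random


def online_sum_estimate(values, rel_precision=0.01, z=1.96, min_samples=30, rng=None):
    """Estimate sum(values) from a growing random sample, stopping early once
    the approximate confidence interval is within rel_precision of the estimate.

    Returns (estimate, half_width, samples_used). If every value ends up
    being read, the result is the exact sum with half_width 0.
    """
    rng = rng or random.Random()
    n_total = len(values)
    min_samples = max(min_samples, 2)  # need at least 2 points for a variance
    order = list(range(n_total))
    rng.shuffle(order)  # random order: every prefix is a uniform random sample

    count, mean, m2, total = 0, 0.0, 0.0, 0.0
    for idx in order:
        x = values[idx]
        count += 1
        total += x
        # Welford's update of the running mean and sum of squared deviations.
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        if count < min_samples or count == n_total:
            continue
        variance = m2 / (count - 1)  # sample variance so far
        fpc = (n_total - count) / (n_total - 1)  # finite population correction
        half_width = z * math.sqrt(variance / count * fpc) * n_total
        estimate = mean * n_total
        if half_width <= rel_precision * abs(estimate):
            return estimate, half_width, count  # good enough: stop scanning
    return total, 0.0, count  # read everything, so the sum is exact


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Hypothetical "report" data: 1,000,000 order amounts.
    amounts = [data_rng.expovariate(1 / 40.0) for _ in range(1_000_000)]
    est, hw, used = online_sum_estimate(amounts, rel_precision=0.005, rng=random.Random(2))
    print(f"estimated total: {est:,.0f} +/- {hw:,.0f} after reading {used:,} of {len(amounts):,} rows")
```

Shuffling an index list here still touches every position once; in a real system the saving comes from reading rows from storage in a random (or pre-randomized) order, so that stopping early avoids most of the scan.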


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('sampling', 0.465), ('car', 0.344), ('exact', 0.31), ('statistical', 0.246), ('estimate', 0.237), ('bought', 0.219), ('sample', 0.215), ('conclusions', 0.155), ('aggregates', 0.148), ('smaller', 0.147), ('interval', 0.143), ('draw', 0.126), ('confidence', 0.124), ('timely', 0.122), ('matches', 0.121), ('complete', 0.119), ('confident', 0.117), ('wo', 0.115), ('guess', 0.104), ('know', 0.098), ('reasonable', 0.097), ('fewer', 0.091), ('need', 0.091), ('report', 0.091), ('week', 0.084), ('saying', 0.082), ('difference', 0.081), ('generally', 0.08), ('set', 0.077), ('shows', 0.068), ('people', 0.065), ('specific', 0.064), ('taking', 0.063), ('last', 0.062), ('online', 0.06), ('useful', 0.058), ('amount', 0.057), ('list', 0.057), ('excellent', 0.057), ('rather', 0.055), ('resources', 0.053), ('update', 0.052), ('entire', 0.05), ('provide', 0.048), ('takes', 0.048), ('get', 0.047), ('made', 0.046), ('full', 0.045), ('day', 0.044), ('create', 0.04)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 311 high scalability-2008-04-29-Strategy: Sample to Reduce Data Set

2 0.18584603 237 high scalability-2008-02-03-Product: Collectl - Performance Data Collector

Introduction: From their website: There are a number of times in which you find yourself needing performance data. These can include benchmarking, monitoring a system's general health or trying to determine what your system was doing at some time in the past. Sometimes you just want to know what the system is doing right now. Depending on what you're doing, you often end up using different tools, each designed for that specific situation. Features include: You are able to run with non-integral sampling intervals. Collectl uses very little CPU. In fact it has been measured to use <0.1% when run as a daemon using the default sampling interval of 60 seconds for process and slab data and 10 seconds for everything else. Brief, verbose, and plot formats are supported. You can report aggregated performance numbers on many devices such as CPUs, Disks, interconnects such as Infiniband or Quadrics, Networks or even Lustre file systems. Collectl will align its sampling on integral sec

3 0.16345103 815 high scalability-2010-04-27-Paper: Dapper, Google's Large-Scale Distributed Systems Tracing Infrastructure

Introduction: Imagine a single search request coursing through Google's massive infrastructure. A single request can run across thousands of machines and involve hundreds of different subsystems. And oh by the way, you are processing more requests per second than any other system in the world. How do you debug such a system? How do you figure out where the problems are? How do you determine if programmers are coding correctly? How do you keep sensitive data secret and safe? How do you ensure products don't use more resources than they are assigned? How do you store all the data? How do you make use of it? That's where Dapper comes in. Dapper is Google's tracing system and it was originally created to understand the system behaviour from a search request. Now Google's production clusters generate more than 1 terabyte of sampled trace data per day. So how does Dapper do what Dapper does? Dapper is described in a very well written and intricately detailed paper: Dapper, a Large-Scale Distributed Sy

4 0.12800348 1388 high scalability-2013-01-16-What if Cars Were Rented Like We Hire Programmers?

Introduction: Imagine if you will that car rental agencies rented cars like programmers are hired at many software companies... Agency: So sorry you had to wait in the reception area for an hour. Nobody knew you were coming today. I finally found 8 people to interview before we can rent you a car. If we like you, you may have to come in for another round of interviews tomorrow because our manager isn't in today. I didn't have a chance to read your application, so I'll just start with a question. What car do you drive today? Applicant: I drive a 2008 Subaru. Agency: That's a shame. We don't have a Subaru to rent you. Applicant: That's OK. Any car will do. Agency: No, we can only take on clients who know how to drive the cars we stock. We find it's safer that way. There are so many little differences between cars, we just don't want to take a chance. Applicant: I have a driver's license. I know how to drive. I've been driving all kinds of cars for 15 years, I am sure I can adapt.

5 0.12161886 657 high scalability-2009-07-16-Scaling Traffic: People Pod Pool of On Demand Self Driving Robotic Cars who Automatically Refuel from Cheap Solar

Introduction: Update 17: Are Wireless Road Trains the Cure for Traffic Congestion? by Addy Dugdale. The concept of road trains--up to eight vehicles zooming down the road together--has long been considered a faster, safer, and greener way of traveling long distances by car. Update 16: The first electric vehicle in the country powered completely by ultracapacitors. The minibus can be fully recharged in fifteen minutes, unlike battery vehicles, which typically take hours to recharge. Update 15: How to Make UAVs Fully Autonomous. The Sense-and-Avoid system uses a four-megapixel camera on a pan tilt to detect obstacles from the ground. It puts red boxes around planes and birds, and blue boxes around movement that it determines is not an obstacle (e.g., dust on the lens). Update 14: ATNMBL is a concept vehicle for 2040 that represents the end of driving and an alternative approach to car design. Upon entering ATNMBL, you are presented with a simple question: "Where can I take you

6 0.10626826 9 high scalability-2007-07-15-Blog: Occam’s Razor by Avinash Kaushik

7 0.098922029 1545 high scalability-2013-11-08-Stuff The Internet Says On Scalability For November 8th, 2013

8 0.084717073 1166 high scalability-2011-12-30-Stuff The Internet Says On Scalability For December 30, 2011

9 0.084530294 583 high scalability-2009-04-26-Scale-up vs. Scale-out: A Case Study by IBM using Nutch-Lucene

10 0.081417881 224 high scalability-2008-01-27-Scalability vs Performance vs Availability vs Reliability.. Also scale up vs scale out ???

11 0.072118595 793 high scalability-2010-03-10-Saying Yes to NoSQL; Going Steady with Cassandra at Digg

12 0.070115507 1131 high scalability-2011-10-24-StackExchange Architecture Updates - Running Smoothly, Amazon 4x More Expensive

13 0.065822266 1413 high scalability-2013-02-27-42 Monster Problems that Attack as Loads Increase

14 0.065403447 1355 high scalability-2012-11-05-Gone Fishin': Building Super Scalable Systems: Blade Runner Meets Autonomic Computing In The Ambient Cloud

15 0.065000139 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud

16 0.064317286 1228 high scalability-2012-04-16-Instagram Architecture Update: What’s new with Instagram?

17 0.063120365 151 high scalability-2007-11-12-a8cjdbc - Database Clustering via JDBC

18 0.062878385 558 high scalability-2009-04-06-How do you monitor the performance of your cluster?

19 0.062587641 1231 high scalability-2012-04-20-Stuff The Internet Says On Scalability For April 20, 2012

20 0.061675552 284 high scalability-2008-03-19-RAD Lab is Creating a Datacenter Operating System


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.099), (1, 0.046), (2, -0.004), (3, 0.012), (4, 0.004), (5, -0.017), (6, -0.01), (7, 0.04), (8, 0.008), (9, -0.026), (10, 0.003), (11, 0.005), (12, -0.01), (13, 0.019), (14, 0.054), (15, -0.022), (16, 0.002), (17, -0.028), (18, -0.031), (19, 0.017), (20, 0.006), (21, -0.021), (22, 0.001), (23, 0.031), (24, -0.005), (25, 0.003), (26, -0.017), (27, 0.026), (28, -0.012), (29, 0.017), (30, 0.006), (31, 0.02), (32, 0.014), (33, 0.02), (34, -0.014), (35, -0.0), (36, 0.022), (37, -0.015), (38, 0.002), (39, -0.034), (40, -0.017), (41, 0.001), (42, 0.018), (43, 0.012), (44, 0.012), (45, 0.026), (46, -0.032), (47, -0.025), (48, 0.001), (49, -0.013)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94254947 311 high scalability-2008-04-29-Strategy: Sample to Reduce Data Set

2 0.83703727 390 high scalability-2008-09-23-Scaling your cookie recipes

Introduction: This article on scaling cookie baking recipes showed up in one of my keyword alerts. Lots of weird things show up in alerts, but I really like cookies and the parallels were just so delicious. Scaling in the cookie baking world is: the process of multiplying your recipe by many times to produce much more dough for many more cookies. It’s the difference between making enough dough in one batch to make two dozen cookies, or 2000 cookies. Hey, pretty close to the website notion. Yet as any good cook knows any scaled up recipe must be tweaked a little as things change at scale. Let's see what else we're supposed to do (quoted from the article): Be Patient - When making large batches of cookies, the most important thing that you have to remember is not to rush. Use Fresh Ingredients - This is always an important thing to keep in mind. Don’t use as much leavening - When you’re making a large batch of cookie dough, remember to scale down the amount of baking powder that you

3 0.81045437 919 high scalability-2010-10-14-I, Cloud

Introduction: Every time a technological innovation has spurred automation – since the time of Henry Ford right up to a minute ago – someone has claimed that machines will displace human beings. But the rainbow and unicorn dream attributed to business stakeholders everywhere, i.e. the elimination of IT, is just that – a dream. It isn’t realistic and in fact it’s downright silly to think that systems that only a few years ago were unable to automatically scale up and scale down will suddenly be able to perform the complex analysis required of IT to keep the business running. The rare reports of the elimination of IT staff due to cloud computing and automation are highlighted in the news because they evoke visceral reactions in technologists everywhere and, to be honest, they get the click counts rising. But the jury remains out on this one and in fact many postulate that it is not a reduction in staff that will occur, but a transformation of staff, which may eliminate some old timey positions

4 0.79706532 1566 high scalability-2013-12-18-How to get started with sizing and capacity planning, assuming you don't know the software behavior?

Introduction: Here's a common situation and question from the mechanical-sympathy Google group by Avinash Agrawal on the black art of capacity planning: How to get started with sizing and capacity planning, assuming we don't know the software behavior and it's a completely new product to deal with? Gil Tene, Vice President of Technology and CTO & Co-Founder, wrote a very understandable and useful answer that is worth highlighting: Start with requirements. I see way too many "capacity planning" exercises that go off spending weeks measuring some irrelevant metrics about a system (like how many widgets per hour can this thing do) without knowing what they actually need it to do. There are two key sets of metrics to state here: the "how much" set and the "how bad" set: In the "How Much" part, you need to establish, based on expected business needs, Numbers for things (like connections, users, streams, transactions or messages per second) that you expect to interact with at the peak t

5 0.79343957 1410 high scalability-2013-02-20-Smart Companies Fail Because they Do Everything Right - Staying Alive to Scale

Introduction: Wired has a wonderful interview with Clayton Christensen, author of the tech ninja's bible, Innovator's Dilemma. Innovation is the name of the game in Silicon Valley and if you want to understand the rules of the game this article is a quick and clear way of learning. Everything is simply explained with compelling examples by the man himself. Just as every empire has fallen, every organization is open to disruption. It's the human condition to become comfortable and discount potential dangers. It takes a great deal of mindfulness to outwit and outlast the human condition. If you want to be the disruptor and avoid being the disruptee, this is good stuff. He also talks about his new book, The Capitalist's Dilemma, which addresses this puzzle: if corporations are doing so well why are individuals doing so bad? If someone can help you see a deep meaningful pattern in life then they haven't brought you a fish, they've taught you how to fish. That's what Christensen does. Here'

6 0.78141958 1500 high scalability-2013-08-12-100 Curse Free Lessons from Gordon Ramsay on Building Great Software

7 0.77676004 863 high scalability-2010-07-22-How can we spark the movement of research out of the Ivory Tower and into production?

8 0.77341932 347 high scalability-2008-07-07-Five Ways to Stop Framework Fixation from Crashing Your Scaling Strategy

9 0.77284104 1388 high scalability-2013-01-16-What if Cars Were Rented Like We Hire Programmers?

10 0.77130461 1225 high scalability-2012-04-09-Why My Slime Mold is Better than Your Hadoop Cluster

11 0.77098358 657 high scalability-2009-07-16-Scaling Traffic: People Pod Pool of On Demand Self Driving Robotic Cars who Automatically Refuel from Cheap Solar

12 0.76820403 643 high scalability-2009-06-29-How to Succeed at Capacity Planning Without Really Trying : An Interview with Flickr's John Allspaw on His New Book

13 0.76519448 917 high scalability-2010-10-08-4 Scalability Themes from Surgecon

14 0.75867134 1204 high scalability-2012-03-06-Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

15 0.7567572 1503 high scalability-2013-08-19-What can the Amazing Race to the South Pole Teach us About Startups?

16 0.75210851 719 high scalability-2009-10-09-Have you collectl'd yet? If not, maybe collectl-utils will make it easier to do so

17 0.74862915 1506 high scalability-2013-08-23-Stuff The Internet Says On Scalability For August 23, 2013

18 0.74187487 1492 high scalability-2013-07-17-How do you create a 100th Monkey software development culture?

19 0.73736918 1366 high scalability-2012-12-03-Resiliency is the New Normal - A Deep Look at What It Means and How to Build It

20 0.73341852 1458 high scalability-2013-05-15-Lesson from Airbnb: Give Yourself Permission to Experiment with Non-scalable Changes


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.115), (2, 0.193), (37, 0.269), (61, 0.182), (85, 0.025), (94, 0.085)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.86124814 1033 high scalability-2011-05-02-The Updated Big List of Articles on the Amazon Outage

Introduction: Since The Big List Of Articles On The Amazon Outage was published we've had a few updates that people might not have seen. Amazon of course released their Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region. Netflix shared their Lessons Learned from the AWS Outage as did Heroku (How Heroku Survived the Amazon Outage), Smug Mug (How SmugMug survived the Amazonpocalypse), and SimpleGeo (How SimpleGeo Stayed Up During the AWS Downtime). The curious thing from my perspective is the general lack of response to Amazon's explanation. I expected more discussion. There's been almost none that I've seen. My guess is very few people understand what Amazon was talking about enough to comment whereas almost everyone feels qualified to talk about the event itself. Lesson for crisis handlers: deep dive post-mortems that are timely, long, honestish, and highly technical are the most effective means of staunching the downward spiral of media attention.

same-blog 2 0.85696125 311 high scalability-2008-04-29-Strategy: Sample to Reduce Data Set

3 0.81830633 314 high scalability-2008-05-03-Product: nginx

Introduction: Update 6: nginx_http_push_module. Turn nginx into a long-polling message queuing HTTP push server. Update 5: In Load Balancer Update Barry describes how WordPress.com moved from Pound to Nginx and are now "regularly serving about 8-9k requests/second and about 1.2Gbit/sec through a few Nginx instances and have plenty of room to grow!". Update 4: Nginx better than Pound for load balancing. Pound spikes at 80% CPU, Nginx uses 3% and is easier to understand and better documented. Update 3: igvita.com combines two cool tools together for better performance in Nginx and Memcached, a 400% boost! Update 2: Software Project on Installing Nginx Web Server w/ PHP and SSL. Breaking away from mother Apache can be a scary proposition and this kind of getting started article really helps ease the separation. Update: Slicehost has some nice tutorials on setting up Nginx. From their website: Nginx ("engine x") is a high-performance HTTP server and reverse proxy, as wel

4 0.81541198 1029 high scalability-2011-04-25-The Big List of Articles on the Amazon Outage

Introduction: Please see The Updated Big List Of Articles On The Amazon Outage for a new improved list. So many great articles have been written on the Amazon Outage. Some aim at being helpful, some chastise developers for being so stupid, some chastise Amazon for being so incompetent, some talk about the pain they and their companies have experienced, and some even predict the downfall of the cloud. Still others say we have seen a sea change in the future of the cloud, a prediction that's hard to disagree with, though the shape of the change remains...cloudy. I'll try to keep this list updated as more information comes out. There will be a lot for developers to consider going forward. If there's a resource you think should be added, just let me know. Amazon's Explanation of What Happened Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region Hacker News thread on AWS Service Disruption Post Mortem Quite Funny Commentary on the Summary Experiences f

5 0.80539423 891 high scalability-2010-09-01-Scale-out vs Scale-up

Introduction: In this post I'll cover the difference between multi-core concurrency that is often referred to as Scale-Up and distributed computing that is often referred to as Scale-Out mode.

6 0.78210688 965 high scalability-2010-12-29-Pinboard.in Architecture - Pay to Play to Keep a System Small

7 0.74777144 1133 high scalability-2011-10-27-Strategy: Survive a Comet Strike in the East With Reserved Instances in the West

8 0.74719429 1379 high scalability-2012-12-31-Designing for Resiliency will be so 2013

9 0.7326923 113 high scalability-2007-10-07-Paper: Architecture of a Highly Scalable NIO-Based Server

10 0.73199326 103 high scalability-2007-09-28-Kosmos File System (KFS) is a New High End Google File System Option

11 0.72735357 329 high scalability-2008-05-27-Secure Remote Administration for Large-Scale Networks

12 0.71398836 775 high scalability-2010-02-10-ElasticSearch - Open Source, Distributed, RESTful Search Engine

13 0.70688695 150 high scalability-2007-11-12-Slashdot Architecture - How the Old Man of the Internet Learned to Scale

14 0.70563489 337 high scalability-2008-05-31-memcached and Storage of Friend list

15 0.70415682 501 high scalability-2009-01-25-Where do I start?

16 0.70364946 1360 high scalability-2012-11-19-Gone Fishin': Tumblr Architecture - 15 Billion Page Views A Month And Harder To Scale Than Twitter

17 0.70364916 1191 high scalability-2012-02-13-Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter

18 0.70339966 1461 high scalability-2013-05-20-The Tumblr Architecture Yahoo Bought for a Cool Billion Dollars

19 0.70338058 1031 high scalability-2011-04-28-PaaS on OpenStack - Run Applications on Any Cloud, Any Time Using Any Thing

20 0.70296955 1411 high scalability-2013-02-22-Stuff The Internet Says On Scalability For February 22, 2013