high_scalability high_scalability-2009 high_scalability-2009-622 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: We need to measure the number of queries-per-second our site gets for capacity planning purposes. Obviously, we need to provision the site based on the peak QPS, not average QPS. There will always be some spikes in traffic, though, where for one particular second we get a really huge number of queries. It's ok if site performance slightly degrades during that time. So what I'd really like to do is estimate the *near* peak QPS based on average or median QPS. Near peak might be defined as the QPS that I get at the 95th percentile of the busiest seconds during the day. My guess is that this is similar to what ISPs do when they measure your bandwidth usage and then charge for usage over the 95th percentile. What we've done is analyzed our logs, counted the queries executed during each second during the day, sorted from the busiest seconds to the least busy ones, and graphed it. What you get is a histogram that steeply declines and flattens out near zero. Does anyone know if there is a mathematical formula that describes this distribution? I'd like to say with some certainty that the second at the 95th percentile will get X times the average or median QPS.
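A minimal sketch of the analysis described above, in Python. The log format (one request per line with an ISO-8601 timestamp in the first field) and the file name are assumptions for illustration, not details from the original post. It buckets requests into one-second bins, then reports average, median, and 95th-percentile QPS plus the p95/median ratio, i.e. the "X times the median" factor the question asks about.

```python
# Sketch only: assumes each log line starts with an ISO-8601 timestamp,
# e.g. "2009-06-08T14:03:21 GET /search?q=...". Adjust parsing for your log format.
from collections import Counter
from statistics import mean, median

def qps_distribution(log_path):
    per_second = Counter()
    with open(log_path) as log:
        for line in log:
            if not line.strip():
                continue
            stamp = line.split()[0][:19]   # keep whole-second resolution
            per_second[stamp] += 1

    # Seconds with zero queries are simply absent from the counter.
    counts = sorted(per_second.values(), reverse=True)   # busiest second first
    p95 = counts[int(0.05 * len(counts))]                # exceeded by only ~5% of seconds
    return {
        "avg_qps": mean(counts),
        "median_qps": median(counts),
        "p95_qps": p95,
        "p95_over_median": p95 / median(counts),
    }

if __name__ == "__main__":
    print(qps_distribution("access.log"))
```

Run over the six weeks of logs mentioned below, p95_over_median (or p95 over the average) gives an empirical value for the multiplier the post would like a theoretical justification for, roughly the 4x headroom figure it cites.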
sentIndex sentText sentNum sentScore
1 We need to measure the number of queries-per-second our site gets for capacity planning purposes. [sent-1, score-0.388]
2 Obviously, we need to provision the site based on the peak QPS, not average QPS. [sent-2, score-0.596]
3 There will always be some spikes in traffic, though, where for one particular second we get a really huge number of queries. [sent-3, score-0.374]
4 It's ok if site performance slightly degrades during that time. [sent-4, score-0.365]
5 So what I'd really like to do is estimate the *near* peak QPS based on average or median QPS. [sent-5, score-0.826]
6 Near peak might be defined as the QPS that I get at the 95th percentile of the busiest seconds during the day. [sent-6, score-0.97]
7 My guess is that this is similar to what ISPs do when they measure your bandwidth usage and then charge for usage over the 95th percentile. [sent-7, score-0.47]
8 What we've done is analyzed our logs, counted the queries executed during each second during the day, sorted from the busiest seconds to the least busy ones, and graphed it. [sent-8, score-1.059]
9 What you get is a histogram that steeply declines and flattens out near zero. [sent-9, score-0.693]
10 Does anyone know if there is a mathematical formula that describes this distribution? [sent-10, score-0.341]
11 I'd like to say with some certainty that the second at the 95th percentile will get X times the average or median QPS. [sent-11, score-1.353]
12 (Experimentally, our data shows, over a six week period, an avg QPS of 7. [sent-12, score-0.261]
13 But I want a better theoretical basis for claiming that we need to be able to handle 4x the average amount of traffic. [sent-14, score-0.523]
wordName wordTfidf (topN-words)
[('qps', 0.409), ('median', 0.319), ('percentile', 0.288), ('busiest', 0.219), ('average', 0.214), ('near', 0.206), ('peak', 0.2), ('flattens', 0.154), ('graphed', 0.154), ('declines', 0.138), ('experimentally', 0.138), ('claiming', 0.129), ('formula', 0.125), ('certainty', 0.125), ('counted', 0.122), ('histogram', 0.122), ('seconds', 0.119), ('avg', 0.119), ('measure', 0.116), ('number', 0.113), ('isps', 0.113), ('degrades', 0.109), ('theoretical', 0.109), ('second', 0.108), ('site', 0.096), ('usage', 0.096), ('estimate', 0.093), ('sorted', 0.091), ('mathematical', 0.088), ('provision', 0.086), ('analyzed', 0.086), ('guess', 0.082), ('slightly', 0.082), ('executed', 0.081), ('charge', 0.08), ('spikes', 0.08), ('busy', 0.079), ('ok', 0.078), ('six', 0.076), ('get', 0.073), ('describes', 0.073), ('basis', 0.071), ('defined', 0.071), ('period', 0.07), ('ones', 0.067), ('logs', 0.067), ('week', 0.066), ('planning', 0.063), ('distribution', 0.056), ('anyone', 0.055)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999982 622 high scalability-2009-06-08-Distribution of queries per second
2 0.26850232 1250 high scalability-2012-05-23-Averages, web performance data, and how your analytics product is lying to you
Introduction: This guest post is written by Josh Fraser, co-founder and CEO of Torbit. Torbit creates tools for measuring, analyzing and optimizing web performance. Did you know that 5% of the pageviews on Walmart.com take over 20 seconds to load? Walmart discovered this recently after adding real user measurement (RUM) to analyze their web performance for every single visitor to their site. Walmart used JavaScript to measure their median load time as well as key metrics like their 95th percentile. While 20 seconds is a long time to wait for a website to load, the Walmart story is actually not that uncommon. Remember, this is the worst 5% of their pageviews, not the typical experience. Walmart's median load time was reported at around 4 seconds, meaning half of their visitors loaded Walmart.com faster than 4 seconds and the other half took longer than 4 seconds to load. Using this knowledge, Walmart was prepared to act. By reducing page load times by even one second, Walmart found that
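A toy illustration of why that post distrusts averages, using made-up load times rather than Walmart's actual data: a long tail of slow page views barely moves the mean, while the 95th percentile exposes it.

```python
# Synthetic page load times (seconds); not Walmart's real measurements.
from statistics import mean, median

def percentile(samples, pct):
    """Nearest-rank percentile."""
    ordered = sorted(samples)
    rank = int(round(pct / 100.0 * len(ordered))) - 1
    return ordered[max(0, min(len(ordered) - 1, rank))]

# 92 reasonably fast page views plus 8 very slow ones (a long-tailed mix).
load_times = [2.0] * 50 + [4.0] * 42 + [25.0] * 8

print(f"mean   {mean(load_times):.1f}s")            # looks acceptable on a dashboard
print(f"median {median(load_times):.1f}s")          # what the typical visitor sees
print(f"p95    {percentile(load_times, 95):.1f}s")  # the slow tail the mean hides
```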
Introduction: Update 2: Velocity 09: John Allspaw, 10+ Deploys Per Day: Dev and Ops Cooperation at Flickr. Insightful talk. Some highlights: Change is good if you can build tools and culture to lower the risk of change. Operations and developers need to become of one mind and respect each other. An automated infrastructure is the one tool you need most. Common source control. One step build. One step deploy. Don't be a pussy, deploy. Always ship trunk. Feature flags - don't branch code, make features runtime configurable in code. Dark launch - release data paths early without UI component. Shared metrics. Adaptive feedback to prioritize important features. IRC for communication for human context. Best solutions occur when dev and ops work together and trust each other. Trust is earned by helping each other solve their problems. Look at what new features imply for operations, what can go wrong, and how to recover. Provide knobs and levers to help operations. Devs should have access to production
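A toy sketch of two of the techniques listed above, feature flags and dark launches. The flag names, storage, and search functions here are hypothetical stand-ins, not Flickr's actual code; the point is that behavior is switched at runtime rather than by branching and redeploying.

```python
# Hypothetical feature-flag / dark-launch sketch (not Flickr's real tooling).
# Flags would normally live in a config store or admin UI so they can be
# flipped at runtime without a deploy.
FLAGS = {
    "new_search_backend": False,     # serve users from the new code path?
    "dark_launch_new_search": True,  # exercise the new path silently?
}

def old_search(query):
    return [f"old backend result for {query!r}"]

def new_search(query):
    return [f"new backend result for {query!r}"]

def search(query):
    if FLAGS["new_search_backend"]:
        return new_search(query)
    if FLAGS["dark_launch_new_search"]:
        try:
            new_search(query)            # run the new data path under real load...
        except Exception as exc:         # ...but never let a failure reach the user
            print("dark launch failure:", exc)
    return old_search(query)             # users keep getting the proven path

print(search("scalability"))
```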
4 0.090590149 269 high scalability-2008-03-08-Audiogalaxy.com Architecture
Introduction: Update 3: Always Refer to Your V1 As a Prototype. You really do have to plan to throw one away. Update 2: Lessons Learned Scaling the Audiogalaxy Search Engine. Things he should have done and fun things he couldn’t justify doing. Update: Design details of Audiogalaxy.com’s high performance MySQL search engine. At peak times, the search engine needed to handle 1500-2000 searches every second against a MySQL database with about 200 million rows. Search was one of the most interesting problems at Audiogalaxy. It was one of the core functions of the site, and somewhere between 50 and 70 million searches were performed every day.
5 0.08596459 1038 high scalability-2011-05-11-Troubleshooting response time problems – why you cannot trust your system metrics
Introduction: Production Monitoring is about ensuring the stability and health of our system, that also includes the application. A lot of times we encounter production systems that concentrate on System Monitoring, under the assumption that a stable system leads to stable and healthy applications. So let’s see what System Monitoring can tell us about our Application . Let’s take a very simple two tier Web Application: This is a simple multi-tier eCommerce solution. Users are concerned about bad performance when they do a search. Let's see what we can find out about it if performance is not satisfactory. We start by looking at a couple of simple metrics. CPU Utilization The best known operating system metric is CPU utilization, but it is also the most misunderstood. This metric tells us how much time the CPU spent executing code in the last interval and how much more it could execute theoretically. Like all other utilization measures it tells us something about the capaci
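For concreteness, a rough sketch of where that utilization number comes from on Linux: sample /proc/stat twice and compare non-idle time to total time over the interval. This is a generic illustration, not code from the post, and it only works on Linux.

```python
# Rough CPU-utilization sample (Linux only): compare busy vs. total jiffies
# between two readings of the aggregate "cpu" line in /proc/stat.
import time

def cpu_times():
    with open("/proc/stat") as f:
        fields = [float(x) for x in f.readline().split()[1:]]
    idle = fields[3] + fields[4]          # idle + iowait
    return idle, sum(fields)

def cpu_utilization(interval=1.0):
    idle_1, total_1 = cpu_times()
    time.sleep(interval)
    idle_2, total_2 = cpu_times()
    busy = (total_2 - total_1) - (idle_2 - idle_1)
    return 100.0 * busy / (total_2 - total_1)

print(f"CPU utilization over the last second: {cpu_utilization():.1f}%")
```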
6 0.084534608 533 high scalability-2009-03-11-The Implications of Punctuated Scalabilium for Website Architecture
7 0.077923886 1207 high scalability-2012-03-12-Google: Taming the Long Latency Tail - When More Machines Equals Worse Results
8 0.076729856 934 high scalability-2010-11-04-Facebook at 13 Million Queries Per Second Recommends: Minimize Request Variance
9 0.07435932 232 high scalability-2008-01-29-When things aren't scalable
10 0.074205913 1004 high scalability-2011-03-14-Twitter by the Numbers - 460,000 New Accounts and 140 Million Tweets Per Day
11 0.072786674 152 high scalability-2007-11-13-Flickr Architecture
12 0.072567202 35 high scalability-2007-07-28-Product: FastStats Log Analyzer
13 0.069548965 1266 high scalability-2012-06-18-Google on Latency Tolerant Systems: Making a Predictable Whole Out of Unpredictable Parts
14 0.069231935 1335 high scalability-2012-10-08-How UltraDNS Handles Hundreds of Thousands of Zones and Tens of Millions of Records
15 0.068820409 1566 high scalability-2013-12-18-How to get started with sizing and capacity planning, assuming you don't know the software behavior?
16 0.066872172 1105 high scalability-2011-08-25-The Cloud and The Consumer: The Impact on Bandwidth and Broadband
17 0.066018716 602 high scalability-2009-05-17-Scaling Django Web Apps by Mike Malone
18 0.066012263 614 high scalability-2009-06-01-Guess How Many Users it Takes to Kill Your Site?
19 0.06575457 1413 high scalability-2013-02-27-42 Monster Problems that Attack as Loads Increase
20 0.065720573 106 high scalability-2007-10-02-Secrets to Fotolog's Scaling Success
topicId topicWeight
[(0, 0.09), (1, 0.058), (2, -0.026), (3, -0.045), (4, -0.006), (5, -0.035), (6, 0.007), (7, 0.054), (8, 0.017), (9, -0.001), (10, -0.012), (11, -0.019), (12, 0.019), (13, 0.022), (14, 0.037), (15, -0.006), (16, 0.001), (17, -0.021), (18, -0.032), (19, 0.01), (20, 0.004), (21, -0.004), (22, 0.002), (23, 0.004), (24, -0.014), (25, -0.039), (26, -0.063), (27, 0.023), (28, 0.061), (29, -0.006), (30, 0.056), (31, -0.005), (32, -0.011), (33, 0.018), (34, -0.007), (35, 0.065), (36, 0.022), (37, -0.003), (38, -0.063), (39, -0.008), (40, -0.024), (41, -0.008), (42, 0.06), (43, -0.019), (44, 0.019), (45, -0.011), (46, 0.015), (47, 0.044), (48, 0.044), (49, -0.0)]
simIndex simValue blogId blogTitle
same-blog 1 0.96924895 622 high scalability-2009-06-08-Distribution of queries per second
2 0.73231703 934 high scalability-2010-11-04-Facebook at 13 Million Queries Per Second Recommends: Minimize Request Variance
Introduction: Facebook gave a MySQL Tech Talk where they talked about many things MySQL, but one of the more subtle and interesting points was their focus on controlling the variance of request response times and not just worrying about maximizing queries per second. But first the scalability porn. Facebook's OLTP performance numbers were, as usual, quite dramatic: Query response times: 4ms reads, 5ms writes. Rows read per second: 450M peak. Network bytes per second: 38GB peak. Queries per second: 13M peak. Rows changed per second: 3.5M peak. InnoDB disk ops per second: 5.2M peak. Some thoughts on creating quality, not quantity: They don't care about average response times; instead, they want to minimize variance. Every click must be responded to quickly. The quality of service for each request matters. It's OK if a query is slow as long as it is always slow. They don't try to get the highest queries per second out of each machine. What is important is that the edge case
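A small numeric illustration of that point, with synthetic latencies rather than Facebook's data: two backends with the same average response time but very different variance, where only the spread and the high percentile reveal which one gives every click a fast response.

```python
# Two synthetic backends with identical mean latency but different variance.
from statistics import mean, pstdev

def p99(samples):
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]

steady = [5.0] * 100                    # every request takes about 5 ms
spiky = [2.0] * 90 + [32.0] * 10        # same 5 ms mean, but a fat tail

for name, latencies in (("steady", steady), ("spiky", spiky)):
    print(f"{name}: mean={mean(latencies):.1f}ms "
          f"stdev={pstdev(latencies):.1f} p99={p99(latencies):.1f}ms")
```

Both report a 5 ms average, but the spiky one would fail the "every click must be responded to quickly" bar.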
3 0.71191257 614 high scalability-2009-06-01-Guess How Many Users it Takes to Kill Your Site?
Introduction: Update: Here's the first result. Good response time until 400 users. At 1,340 users the response time was 6 seconds. And at 2,000 users the site was effectively dead. An interesting point was that errors that could harm a site's reputation started at 1,000 users. Cheers to the company that had the guts to give this a try. That which doesn't kill your site makes it stronger. Or at least that's the capacity planning strategy John Allspaw recommends (not really, but I'm trying to make a point here) in The Art of Capacity Planning: Using production traffic to define your resource ceilings in a controlled setting allows you to see firsthand what would happen when you run out of capacity in a particular resource. Of course I'm not suggesting that you run your site into the ground, but better to know what your real (not simulated) loads are while you're watching, than find out the hard way. In addition, a lot of unexpected systemic things can happen when load increases in a particular
4 0.67685115 1250 high scalability-2012-05-23-Averages, web performance data, and how your analytics product is lying to you
Introduction: Twitter has taken some hits lately, but they are still pumping out the tweets and adding massive numbers of new users. Here are some numbers they just published , hot off the analytics engine: Tweets 3 years, 2 months and 1 day. The time it took from the first Tweet to the billionth Tweet. 1 week. The time it now takes for users to send a billion Tweets. 50 million. The average number of Tweets people sent per day, one year ago. 140 million. The average number of Tweets people sent per day, in the last month. 177 million. Tweets sent on March 11, 2011. 456. Tweets per second (TPS) when Michael Jackson died on June 25, 2009 (a record at that time). 6,939. Current TPS record, set 4 seconds after midnight in Japan on New Year’s Day. Accounts 572,000. Number of new accounts created on March 12, 2011. 460,000. Average number of new accounts per day over the last month. 182%. Increase in number of mobile users over the past year. Employee
6 0.63496971 611 high scalability-2009-05-31-Need help on Site loading & database optimization - URGENT
7 0.6158275 711 high scalability-2009-09-22-How Ravelry Scales to 10 Million Requests Using Rails
8 0.61195588 965 high scalability-2010-12-29-Pinboard.in Architecture - Pay to Play to Keep a System Small
9 0.60078776 189 high scalability-2007-12-21-Strategy: Limit Result Sets
10 0.59741002 304 high scalability-2008-04-19-How to build a real-time analytics system?
12 0.59454346 593 high scalability-2009-05-06-Guinness Book of World Records Anyone?
13 0.5881322 533 high scalability-2009-03-11-The Implications of Punctuated Scalabilium for Website Architecture
14 0.58261842 714 high scalability-2009-10-02-HighScalability has Moved to Squarespace.com!
15 0.5797745 1453 high scalability-2013-05-07-Not Invented Here: A Comical Series on Scalability
16 0.57808799 249 high scalability-2008-02-16-S3 Failed Because of Authentication Overload
17 0.57757002 425 high scalability-2008-10-22-Scalability Best Practices: Lessons from eBay
18 0.57593417 788 high scalability-2010-03-04-How MySpace Tested Their Live Site with 1 Million Concurrent Users
19 0.56439066 159 high scalability-2007-11-18-Reverse Proxy
20 0.56079084 48 high scalability-2007-07-30-What is Mashery?
topicId topicWeight
[(1, 0.101), (2, 0.283), (10, 0.097), (66, 0.281), (79, 0.05), (85, 0.037), (94, 0.036)]
simIndex simValue blogId blogTitle
same-blog 1 0.8680073 622 high scalability-2009-06-08-Distribution of queries per second
2 0.82994843 185 high scalability-2007-12-13-Is premature scalation a real disease?
Introduction: Update 3: InfoQ's Big Architecture Up Front - A Case of Premature Scalaculation? twines several different threads on the topic together into a fine noose. Update 2: Kevin says the biggest problem he sees with startups is that they need to scale their backend (no, the other one). Update: My bad. It's hard to sell scalability so just forget it. The premise of Startups and The Problem Of Premature Scalaculation and Don't scale: 99.999% uptime is for Wal-Mart is that you shouldn't spend precious limited resources worrying about scaling before you've first implemented the functionality that will make you successful enough to have scaling problems in the first place. It's kind of an embodied life force model of system creation. Energy is scarce so any parasites siphoning off energy must be hunted down and destroyed so the body has its best chance of survival. Is this really how it works? If I ever believed this I certainly don't believe it anymore. The world has c
3 0.81235141 375 high scalability-2008-09-01-A Scalability checklist?
Introduction: Hi everyone, I'm researching Scalability for a college paper, and found this site great, but it has too many tips, articles and the like, and I can't see a hierarchical organization of subjects. I would need something like a checklist of things or fields, or technologies to take into account when assessing scalability. So far I've identified these: - Hardware scalability: - scale out - scale up - Cache What types of cache are there? app-level, os-level, network-level, I/O-level? - Load Balancing - DB Clustering Am I missing something important? (I'm sure I am) I don't expect you to give a lecture here, but maybe point some things out, give me some useful links... Thanks!
4 0.79938859 130 high scalability-2007-10-24-Scaling Operations Saves Money and Scales Faster
Introduction: Jesse Robbins at O'Reilly Radar has a nice post on how spending a little up-front time on figuring out how to scale your operations process saves money on ops people and allows you to save time adding and upgrading servers. Adding, monitoring, and upgrading servers can get so incredibly screwed up that a herd of squirrels has to work overtime just to put out a release. Or it can be one-button simple from your automated build system out to your servers. This is one area where "do the simplest thing that could possibly work" is a dumb idea, and Jesse does a good job capturing the advantages of doing it right.
5 0.77685946 973 high scalability-2011-01-14-Stuff The Internet Says On Scalability For January 14, 2011
Introduction: Submitted for your reading pleasure... On the new year Twitter set a record with 6,939 Tweets Per Second (TPS). Cool video visualizing New Year's Eve Tweet data across the world. Marko Rodriguez in Memoirs of a Graph Addict: Despair to Redemption tells a stirring tale of how graph programming saved the world from certain destruction by realizing Aristotle's dream of a eudaimonia-driven society. Could a relational database do that? The tools of the revolution can be found at tinkerprop.com, which describes a database-agnostic stack for working with property graphs; they include Blueprints - a property graph model interface; Pipes - a dataflow network using process graphs; Gremlin - a graph-based programming language; Rexster - a RESTful graph shell. The never-ending battle of good versus evil has nothing on programmers arguing about bracket policies or sync vs async programming models. In this node.js thread, I love async, but I can't code like this, the batt
7 0.75054848 684 high scalability-2009-08-18-Real World Web: Performance & Scalability
8 0.74729335 406 high scalability-2008-10-08-Strategy: Flickr - Do the Essential Work Up-front and Queue the Rest
9 0.74075538 1425 high scalability-2013-03-18-Beyond Threads and Callbacks - Application Architecture Pros and Cons
10 0.73375648 1251 high scalability-2012-05-24-Build your own twitter like real time analytics - a step by step guide
11 0.73267561 242 high scalability-2008-02-07-Looking for good business examples of compaines using Hadoop
12 0.72889423 1124 high scalability-2011-09-26-17 Techniques Used to Scale Turntable.fm and Labmeeting to Millions of Users
13 0.72837681 1429 high scalability-2013-03-25-AppBackplane - A Framework for Supporting Multiple Application Architectures
14 0.7277118 1446 high scalability-2013-04-25-Paper: Making reliable distributed systems in the presence of software errors
15 0.72729105 1204 high scalability-2012-03-06-Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
16 0.72658557 1001 high scalability-2011-03-09-Google and Netflix Strategy: Use Partial Responses to Reduce Request Sizes
17 0.72587538 1346 high scalability-2012-10-24-Saving Cash Using Less Cache - 90% Savings in the Caching Tier
18 0.72523254 1237 high scalability-2012-05-02-12 Ways to Increase Throughput by 32X and Reduce Latency by 20X
19 0.72466618 1283 high scalability-2012-07-13-Stuff The Internet Says On Scalability For July 13, 2012
20 0.72347289 1373 high scalability-2012-12-17-11 Uses For the Humble Presents Queue, er, Message Queue