high_scalability high_scalability-2009 high_scalability-2009-514 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Google AppEngine Numbers. This group of numbers is from Brett Slatkin in Building Scalable Web Apps with Google App Engine.

Writes are expensive! Datastore is transactional: writes require disk access. Disk access means disk seeks. Rule of thumb: 10ms for a disk seek. Simple math: 1s / 10ms = 100 seeks/sec maximum. Depends on: * the size and shape of your data * doing work in batches (batch puts and gets)

Reads are cheap! Reads do not need to be transactional, just consistent. Data is read from disk once, then it's easily cached. All subsequent reads come straight from memory. Rule of thumb: 250usec for 1MB of data from memory. Simple math: 1s / 250usec = 4GB/sec maximum. * For a 1MB entity, that's 4000 fetches/sec

Numbers Miscellaneous. This group of numbers is from a presentation Jeff Dean gave at an Engineering All-Hands Meeting at Google. L1 cache reference 0.5 ns. Branch mispredict 5 ns. L2 cache reference 7 ns. Mutex lock/unlock 100 ns. Main memory reference 100 ns.
sentIndex sentText sentNum sentScore
1 Datastore is transactional: writes require disk access. Disk access means disk seeks. Rule of thumb: 10ms for a disk seek. Simple math: 1s / 10ms = 100 seeks/sec maximum. Depends on: the size and shape of your data, and doing work in batches (batch puts and gets). Reads are cheap! [sent-3, score-0.413]
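The rules of thumb above reduce to simple arithmetic; a quick back-of-the-envelope sketch (constants taken directly from the numbers in the text):

```python
# Back-of-the-envelope limits derived from the rules of thumb above.

DISK_SEEK_S = 0.010      # ~10 ms per disk seek
MEM_READ_1MB_S = 250e-6  # ~250 us to read 1 MB from memory

# Writes are seek-bound: at most ~100 seeks/sec per disk.
max_seeks_per_sec = 1 / DISK_SEEK_S

# Cached reads are memory-bandwidth-bound: ~4000 MB/sec = 4 GB/sec.
mem_bandwidth_mb_per_sec = 1 / MEM_READ_1MB_S

# For a 1 MB entity, that bandwidth means ~4000 fetches/sec.
fetches_per_sec_1mb_entity = mem_bandwidth_mb_per_sec

print(max_seeks_per_sec)
print(fetches_per_sec_1mb_entity)
```

The asymmetry this makes plain (100 writes/sec vs. 4000 cached reads/sec) drives every design choice that follows.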
2 Given that the number of writes that can be made per second is so limited, a high write load serializes and slows down the whole process. [sent-22, score-0.314]
3 Reads are cheap, so we trade a single easily read counter for multiple reads that recover the actual count. [sent-31, score-0.436]
4 Frequently updated shared variables are expensive, so we shard and parallelize those writes. [sent-32, score-0.331]
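A minimal in-memory sketch of the sharded-counter idea (plain Python, not the App Engine datastore API; the `ShardedCounter` class and shard count are illustrative assumptions): each write picks a random shard so concurrent writers rarely contend on the same slot, and a read sums all shards.

```python
import random

class ShardedCounter:
    """Spread increments across N shards; read by summing all shards."""

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self):
        # Each write touches one randomly chosen shard, so writers
        # rarely collide on the same slot (row, in datastore terms).
        idx = random.randrange(len(self.shards))
        self.shards[idx] += 1

    def count(self):
        # Reads are cheap: sum all shards (and in practice cache the sum).
        return sum(self.shards)

c = ShardedCounter()
for _ in range(1000):
    c.increment()
print(c.count())  # 1000
```

In the real datastore each shard would be its own entity, so the write load fans out across entity groups instead of serializing on one.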
5 But to scale writes you need to partition, and once you partition it becomes difficult to keep any shared state, like counters. [sent-34, score-0.294]
6 As a comment is made you get a sequence number and that's the order comments are displayed. [sent-39, score-0.643]
7 But as we saw in the last section, shared state like a single counter won't scale in high-write environments. [sent-40, score-0.461]
8 A sharded counter won't work in this situation either, because summing the sharded counters isn't transactional. [sent-41, score-0.515]
9 There's no way to guarantee each comment will get back the sequence number it was allocated, so we could have duplicates. [sent-42, score-0.502]
10 So what is needed for a key is something unique and alphabetical, so when searching through comments you can go forward and backward using only keys. [sent-44, score-0.459]
11 BigTable knows how to get things by keys so you must make keys that return data in the proper order. [sent-48, score-0.364]
12 In the grand old tradition of making unique keys we just keep appending stuff until it becomes unique. [sent-49, score-0.293]
13 The suggested key for GAE is: timestamp + user ID + user comment ID. [sent-50, score-0.576]
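One way that key could be assembled (a sketch; the zero-padded `%`-format and field widths are assumptions needed to make lexicographic string order match chronological order):

```python
def comment_key(timestamp_ms, user_id, user_comment_seq):
    # Zero-pad numeric parts so string comparison matches numeric order;
    # widths here are illustrative, not from the original post.
    return "%013d|%s|%06d" % (timestamp_ms, user_id, user_comment_seq)

keys = [
    comment_key(1234567890123, "alice", 1),
    comment_key(1234567890123, "bob", 1),
    comment_key(1234567890500, "alice", 2),
]
# The keys sort in comment order without consulting the entities.
assert keys == sorted(keys)
```

Without the padding, "10" would sort before "9" as strings, breaking the ordering the scheme depends on.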
14 What we need then is a sequence number for each user's comments. [sent-57, score-0.287]
15 Our goal is to remove write contention so we want to parallelize writes. [sent-59, score-0.287]
16 When a user adds a comment it's added to a user's comment list and a sequence number is allocated. [sent-62, score-0.844]
17 So each comment add is guaranteed to be unique because updates in an Entity Group are serialized. [sent-64, score-0.408]
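A toy model of that per-user allocation (plain Python; a `UserComments` object stands in for one entity group, whose serialized updates make the per-user sequence safe):

```python
class UserComments:
    # All of a user's comments live in one entity group; updates to the
    # group are serialized, so this per-user sequence number is safe.
    def __init__(self):
        self.next_seq = 1
        self.comments = []

    def add(self, text):
        seq = self.next_seq
        self.next_seq += 1
        self.comments.append((seq, text))
        return seq

users = {"alice": UserComments(), "bob": UserComments()}
print(users["alice"].add("first"))   # 1
print(users["bob"].add("hello"))     # 1
print(users["alice"].add("second"))  # 2
```

Note that contention is now per user rather than global: two users commenting at once never touch the same counter.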
18 The resulting key is guaranteed unique and sorts properly in alphabetical order. [sent-65, score-0.4]
19 When paging, a query is made across entity groups using the ID index. [sent-66, score-0.299]
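Paging forward and backward then needs only key comparisons, which is exactly the access pattern BigTable serves well; a sketch where a sorted Python list stands in for key-ordered storage:

```python
import bisect

def page_after(sorted_keys, start_key, page_size):
    # Forward page: the first page_size keys strictly greater than start_key.
    i = bisect.bisect_right(sorted_keys, start_key)
    return sorted_keys[i:i + page_size]

def page_before(sorted_keys, start_key, page_size):
    # Backward page: the last page_size keys strictly less than start_key.
    i = bisect.bisect_left(sorted_keys, start_key)
    return sorted_keys[max(0, i - page_size):i]

keys = ["a01", "a02", "b01", "b02", "c01"]
print(page_after(keys, "a02", 2))   # ['b01', 'b02']
print(page_before(keys, "b02", 2))  # ['a02', 'b01']
```

Because the keys already encode the display order, no sorting or counting happens at query time, only a range scan from the cursor key.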
20 The idea of keeping per-user comment indexes is out there. [sent-71, score-0.342]
wordName wordTfidf (topN-words)
[('counter', 0.221), ('comment', 0.215), ('sequence', 0.211), ('alphabetical', 0.207), ('nsread', 0.207), ('keys', 0.182), ('entity', 0.152), ('sequentially', 0.149), ('shared', 0.148), ('paging', 0.147), ('writes', 0.146), ('comments', 0.141), ('nssend', 0.138), ('reads', 0.137), ('count', 0.135), ('user', 0.127), ('mb', 0.123), ('unique', 0.111), ('gae', 0.109), ('bigtable', 0.108), ('stamp', 0.107), ('thumb', 0.105), ('reference', 0.102), ('contention', 0.1), ('parallelize', 0.095), ('appengine', 0.094), ('write', 0.092), ('transactional', 0.091), ('disk', 0.089), ('shard', 0.088), ('math', 0.083), ('guaranteed', 0.082), ('cheap', 0.078), ('counters', 0.076), ('number', 0.076), ('numbers', 0.074), ('bytes', 0.073), ('situation', 0.07), ('cleverly', 0.069), ('counterswe', 0.069), ('deangave', 0.069), ('inbuilding', 0.069), ('mispredict', 0.069), ('nsbranch', 0.069), ('nscompress', 0.069), ('nsdisk', 0.069), ('nsmain', 0.069), ('nsmutex', 0.069), ('nsround', 0.069), ('zippy', 0.069)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 514 high scalability-2009-02-18-Numbers Everyone Should Know
2 0.31927207 978 high scalability-2011-01-26-Google Pro Tip: Use Back-of-the-envelope-calculations to Choose the Best Design
Introduction: How do you know which is the "best" design for a given problem? If, for example, you were given the problem of generating an image search results page of 30 thumbnails, would you load images sequentially? In parallel? Would you cache? How would you decide? If you could harness the power of the multiverse you could try every possible option in the design space and see which worked best. But that's crazy impractical, isn't it? Another option is to consider the order of various algorithm alternatives. As a prophet for the Golden Age of Computational Thinking , Google would definitely do this, but what else might Google do? Use Back-of-the-envelope Calculations to Evaluate Different Designs Jeff Dean , Head of Google's School of Infrastructure Wizardry—instrumental in many of Google's key systems: ad serving, BigTable; search, MapReduce, ProtocolBuffers—advocates evaluating different designs using back-of-the-envelope calculations . He gives the full story in this Stanfor
3 0.30648389 327 high scalability-2008-05-27-How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale
Introduction: Update 3 : ReadWriteWeb says Google App Engine Announces New Pricing Plans, APIs, Open Access . Pricing is specified but I'm not sure what to make of it yet. An image manipulation library is added (thus the need to pay for more CPU :-) and memcached support has been added. Memcached will help resolve the can't write for every read problem that pops up when keeping counters. Update 2 : onGWT.com threw a GAE load party and a lot of people came. The results at Load test : Google App Engine = 1, Community = 0 . GAE handled a peak of 35 requests/second and a sustained 10 requests/second. Some think performance was good, others not so good. My GMT watch broke and I was late to arrive. Maybe next time. Also added a few new design rules from the post. Update : Added a few new rules gleaned from the GAE Meetup : Design By Explicit Cost Model and Puts are Precious. How do you structure your database using a distributed hash table like BigTable ? The answer isn't what you might expect. If
4 0.19274458 517 high scalability-2009-02-21-Google AppEngine - A Second Look
Introduction: Update 6: : Back to the Future for Data Storage . We are in the middle of a renaissance in data storage with the application of many new ideas and techniques; there's huge potential for breaking out of thinking about data storage in just one way. Update 5 : Building Scalable Web Applications with Google App Engine by Brett Slatkin. Update 4 : Why Google App Engine is broken and what Google must do to fix it by Aral Balkan. We don't care that it can scale. We care that it does scale. And that it scales when you need it the most. Issues: 1MB limit on data structures; 1MB limit on data structures; the short-term high CPU quota; quotas in general; Admin? What's that? Update 3 : BigTable Blues . Catherine Devlin couldn't port an application to GAE because it can't do basic filtering and can't search 5,000 records without timing out: "Querying from 5000 records - too much for the mighty BigTable, apparently." Followup: not the future database . "90% of the work of this proje
5 0.17767158 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
Introduction: For everything given something seems to be taken. Caching is a great scalability solution, but caching also comes with problems . Sharding is a great scalability solution, but as Foursquare recently revealed in a post-mortem about their 17 hours of downtime, sharding also has problems. MongoDB, the database Foursquare uses, also contributed their post-mortem of what went wrong too. Now that everyone has shared and resharded, what can we learn to help us skip these mistakes and quickly move on to a different set of mistakes? First, like for Facebook , huge props to Foursquare and MongoDB for being upfront and honest about their problems. This helps everyone get better and is a sign we work in a pretty cool industry. Second, overall, the fault didn't flow from evil hearts or gross negligence. As usual the cause was more mundane: a key system, that could be a little more robust, combined with a very popular application built by a small group of people, under immense pressure
6 0.1682815 687 high scalability-2009-08-24-How Google Serves Data from Multiple Datacenters
7 0.15432666 1112 high scalability-2011-09-07-What Google App Engine Price Changes Say About the Future of Web Architecture
8 0.14378382 972 high scalability-2011-01-11-Google Megastore - 3 Billion Writes and 20 Billion Read Transactions Daily
9 0.14185093 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
10 0.14154199 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
11 0.14112096 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard
12 0.14096756 1205 high scalability-2012-03-07-Scale Indefinitely on S3 With These Secrets of the S3 Masters
13 0.13354024 1611 high scalability-2014-03-12-Paper: Scalable Eventually Consistent Counters over Unreliable Networks
14 0.13111539 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
15 0.13019846 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
16 0.12925465 1345 high scalability-2012-10-22-Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale
17 0.12778534 152 high scalability-2007-11-13-Flickr Architecture
18 0.1224717 301 high scalability-2008-04-08-Google AppEngine - A First Look
19 0.11965584 1475 high scalability-2013-06-13-Busting 4 Modern Hardware Myths - Are Memory, HDDs, and SSDs Really Random Access?
20 0.11655425 476 high scalability-2008-12-28-How to Organize a Database Table’s Keys for Scalability
topicId topicWeight
[(0, 0.194), (1, 0.147), (2, -0.037), (3, -0.037), (4, 0.001), (5, 0.099), (6, 0.023), (7, 0.039), (8, 0.007), (9, -0.078), (10, 0.001), (11, -0.003), (12, -0.079), (13, 0.083), (14, 0.037), (15, -0.009), (16, -0.133), (17, -0.032), (18, 0.067), (19, -0.004), (20, 0.019), (21, -0.015), (22, 0.033), (23, -0.003), (24, -0.024), (25, -0.036), (26, 0.065), (27, -0.008), (28, 0.026), (29, -0.012), (30, 0.051), (31, -0.113), (32, 0.053), (33, 0.031), (34, -0.04), (35, -0.009), (36, 0.026), (37, 0.014), (38, 0.055), (39, -0.046), (40, -0.064), (41, 0.1), (42, -0.028), (43, 0.004), (44, 0.06), (45, 0.017), (46, -0.007), (47, 0.013), (48, 0.033), (49, 0.051)]
simIndex simValue blogId blogTitle
same-blog 1 0.9661808 514 high scalability-2009-02-18-Numbers Everyone Should Know
2 0.82792932 327 high scalability-2008-05-27-How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale
3 0.72585726 1135 high scalability-2011-10-31-15 Ways to Make Your Application Feel More Responsive under Google App Engine
Introduction: Small Imrovements , makers of a hosted, lightweight feedback platform, have written an excellent article on Performance issues on GAE, and how we resolved them . They show how they trimmed most of their requests to between 300ms and 800ms, some still take 2 seconds when memcache is stale, and others clock in at 150ms. Not zippy overall, but acceptable, especially if you really like GAE's PaaS promise. What's tricky with PaaS is if your performance is poor, there's often not a lot you can do about it. But the folks at Small Improvements have been clever and diligent, giving many specific details and timings. Though their advice is specifically for GAE, it will apply to a lot of different situations as well. Here are the 15 ways they made small performance improvements: Understand App Engine has bad days . App Engine can have bad days where performance can degrade. Your design needs to take this potential for high latency variability into account. Don't always assume the bes
4 0.70633137 561 high scalability-2009-04-08-N+1+caching is ok?
Introduction: Hibernate and iBATIS and other similar tools have documentation with recommendations for avoiding the "N+1 select" problem. The problem being that if you wanted to retrieve a set of widgets from a table, one query would be used to to retrieve all the ids of the matching widgets (select widget_id from widget where ...) and then for each id, another select is used to retrieve the details of that widget (select * from widget where widget_id = ?). If you have 100 widgets, it requires 101 queries to get the details of them all. I can see why this is bad, but what if you're doing entity caching? i.e. If you run the first query to get your list of ids, and then for each widget you retrive it from the cache. Surely in that case, N+1(+caching) is good? Assuming of course that there is a high probability of all of the matching entities being in the cache. I may be asking a daft question here - one whose answer is obviously implied by the large scalable mechanisms for storing data th
5 0.70175612 972 high scalability-2011-01-11-Google Megastore - 3 Billion Writes and 20 Billion Read Transactions Daily
Introduction: A giant step into the fully distributed future has been taken by the Google App Engine team with the release of their High Replication Datastore . The HRD is targeted at mission critical applications that require data replicated to at least three datacenters, full ACID semantics for entity groups , and lower consistency guarantees across entity groups. This is a major accomplishment. Few organizations can implement a true multi-datacenter datastore. Other than SimpleDB, how many other publicly accessible database services can operate out of multiple datacenters? Now that capability can be had by anyone. But there is a price, literally and otherwise. Because the HRD uses three times the resources as Google App Engine's Master/Slave datastatore, it will cost three times as much. And because it is a distributed database, with all that implies in the CAP sense, developers will have to be very careful in how they architect their applications because as costs increased, reliability incre
6 0.69748271 1205 high scalability-2012-03-07-Scale Indefinitely on S3 With These Secrets of the S3 Masters
7 0.68684727 517 high scalability-2009-02-21-Google AppEngine - A Second Look
8 0.68422943 978 high scalability-2011-01-26-Google Pro Tip: Use Back-of-the-envelope-calculations to Choose the Best Design
10 0.68021345 1222 high scalability-2012-04-05-Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory
11 0.67504179 1345 high scalability-2012-10-22-Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale
12 0.67126781 828 high scalability-2010-05-17-7 Lessons Learned While Building Reddit to 270 Million Page Views a Month
13 0.6690045 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
14 0.66439271 1475 high scalability-2013-06-13-Busting 4 Modern Hardware Myths - Are Memory, HDDs, and SSDs Really Random Access?
15 0.66246361 351 high scalability-2008-07-16-The Mother of All Database Normalization Debates on Coding Horror
16 0.66039723 687 high scalability-2009-08-24-How Google Serves Data from Multiple Datacenters
17 0.66003627 281 high scalability-2008-03-18-Database Design 101
18 0.65943813 763 high scalability-2010-01-22-How BuddyPoke Scales on Facebook Using Google App Engine
19 0.65600485 1472 high scalability-2013-06-07-Stuff The Internet Says On Scalability For June 7, 2013
20 0.64673549 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard
topicId topicWeight
[(1, 0.129), (2, 0.199), (10, 0.056), (27, 0.015), (40, 0.039), (51, 0.011), (52, 0.014), (61, 0.056), (76, 0.043), (77, 0.034), (79, 0.105), (85, 0.048), (86, 0.084), (94, 0.069)]
simIndex simValue blogId blogTitle
same-blog 1 0.9527154 514 high scalability-2009-02-18-Numbers Everyone Should Know
2 0.93476778 1626 high scalability-2014-04-04-Stuff The Internet Says On Scalability For April 4th, 2014
Introduction: Hey, it's HighScalability time: The world ends not with a bang, but with 1 exaFLOP of bitcoin whimpers. Quotable Quotes: @EtienneRoy : Algorithm: you must encode and leverage your ignorance, not only your knowledge #hadoopsummit - enthralling Chris Brenny : A material is nothing without a process. While the constituent formulation imbues the final product with fundamental properties, the bridge between material and function has a dramatic effect on its perception and use. @gallifreya n: Using AWS c1, m1, m2? @adrianco says don't. c3, m3, r3 are now better and cheaper. #cloudconnect #ccevent @christianhern : Mobile banking in the UK: 1,800 transactions per MINUTE. A "seismic shift" that banks were unprepared for While we are waiting for that epic article deeply comparing Google's Cloud with AWS, we have Adrian Cockcroft's highly hopped slide comparing the two . Google: no enterprise customers, no reservation options, need m
3 0.93129957 789 high scalability-2010-03-05-Strategy: Planning for a Power Outage Google Style
Introduction: We can all learn from problems. The Google App Engine team has created a teachable moment through a remarkably honest and forthcoming post-mortem for February 24th, 2010 outage post, chronicling in elaborate detail a power outage that took down Google App Engine for a few hours. The world is ending! The cloud is unreliable! Jump ship! Not. This is not evidence that the cloud is a beautiful, powerful and unsinkable ship that goes down on its maiden voyage. Stuff happens, no matter how well you prepare. If you think private datacenters don't go down, well, then I have some rearangeable deck chairs to sell you. The goal is to keep improving and minimizing those failure windows. From that perspective there is a lot to learn from the problems the Google App Engine team encountered and how they plan to fix them. Please read the article for all the juicy details, but here's what struck me as key: Power fails. Plan for it . This seems to happen with unexpected frequency for su
4 0.9311462 1122 high scalability-2011-09-23-Stuff The Internet Says On Scalability For September 23, 2011
Introduction: I'd walk a mile for HighScalability : 1/12th the World Population on Facebook in One Day ; 1.8 ZettaBytes of data in 2011; 1 Billion Foursquare Checkins ; 2 million on Spotify ; 1 Million on GitHub ; $1,279-per-hour, 30,000-core cluster built on EC2 ; Patent trolls cost .5 trillion dollars ; 235 terabytes of data collected by the U.S. Library of Congress in April . Potent quotables: @jstogdill : Corporations over protect low value info assets (which screws up collaboration) and under protects high value assets. #strataconf @sbtourist : I think BigMemory-like approaches based on large put-and-forget memory cans, are rarely a solution to performance/scalability problems. 1 Million TCP Connections . Remember when 10K was a real limit and you had to build out boxes just to handle the load? Amazing. We don't know how much processing can be attached to these connections, how much memory the apps use, or what the response latency is to
5 0.92939001 371 high scalability-2008-08-24-A Scalable, Commodity Data Center Network Architecture
Introduction: Looks interesting... Abstract: Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodi
6 0.92794323 978 high scalability-2011-01-26-Google Pro Tip: Use Back-of-the-envelope-calculations to Choose the Best Design
7 0.92690378 736 high scalability-2009-11-04-Damn, Which Database do I Use Now?
8 0.92604232 1612 high scalability-2014-03-14-Stuff The Internet Says On Scalability For March 14th, 2014
9 0.92597997 1389 high scalability-2013-01-18-Stuff The Internet Says On Scalability For January 18, 2013
10 0.92579019 1602 high scalability-2014-02-26-The WhatsApp Architecture Facebook Bought For $19 Billion
11 0.92410749 1576 high scalability-2014-01-10-Stuff The Internet Says On Scalability For January 10th, 2014
12 0.92402077 517 high scalability-2009-02-21-Google AppEngine - A Second Look
13 0.92329127 1117 high scalability-2011-09-16-Stuff The Internet Says On Scalability For September 16, 2011
14 0.92320895 1431 high scalability-2013-03-29-Stuff The Internet Says On Scalability For March 29, 2013
15 0.92299557 1179 high scalability-2012-01-23-Facebook Timeline: Brought to You by the Power of Denormalization
16 0.92295891 1600 high scalability-2014-02-21-Stuff The Internet Says On Scalability For February 21st, 2014
17 0.92295641 716 high scalability-2009-10-06-Building a Unique Data Warehouse
18 0.92271876 498 high scalability-2009-01-20-Product: Amazon's SimpleDB
19 0.92263597 1382 high scalability-2013-01-07-Analyzing billions of credit card transactions and serving low-latency insights in the cloud
20 0.92240512 1297 high scalability-2012-08-03-Stuff The Internet Says On Scalability For August 3, 2012