high_scalability high_scalability-2014 high_scalability-2014-1589 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Raymond Blum leads a team of Site Reliability Engineers charged with keeping Google's data secret and keeping it safe. Of course Google would never say how much data this actually is, but from comments it seems that it is not yet a yottabyte, but is many exabytes in size. GMail alone is approaching low exabytes of data. Mr. Blum, in the video How Google Backs Up the Internet, explained why common backup strategies don't work for Google, for a very googly sounding reason: typically they scale effort with capacity. If backing up twice as much data requires twice as much stuff to do it, where stuff is time, energy, space, etc., it won't work; it doesn't scale. You have to find efficiencies so that capacity can scale faster than the effort needed to support that capacity. A different plan is needed when making the jump from backing up one exabyte to backing up two exabytes. And the talk is largely about how Google makes that happen. Some major themes of the talk follow.
1 If backing up twice as much data requires twice as much stuff to do it, where stuff is time, energy, space, and so on, it won't work; it doesn't scale.
2 Even the infamous GMail outage did not lose data, but the story is more complicated than just having a lot of tape backup.
3 Tape capacity is following Moore's law, so they are fairly happy with tape as a backup medium, though they are working on alternatives they won't name.
4 When writing to tape, tell the writer to hold on to the data until we say it's OK to change it.
5 Build up 4 full tapes, then generate a 5th code tape by XORing them together.
6 You can lose any one of the 5 tapes and still recover the data.
7 Hundreds of tapes a month are lost, but there aren't hundreds of cases of data loss per month because of this process.
8 If one tape is lost, the loss is detected by the continuous restore process, and the sibling tapes are used to rebuild a replacement tape; all is well.
9 In the rare case where two tapes are corrupted, you've only lost data if the same two spots on both tapes are damaged, so reconstruction is done at the subtape level.
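The 4-tapes-plus-code-tape scheme above is the standard RAID-4-style XOR parity trick, and a minimal sketch makes the recovery math concrete. The `xor_blocks` helper and the tiny 8-byte "tapes" here are hypothetical illustrations, not Google's actual format:

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together into one block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Four full "tapes" of data (toy 8-byte blocks).
tapes = [bytes([v] * 8) for v in (1, 2, 3, 4)]

# Fifth code tape: XOR of the other four.
code_tape = xor_blocks(tapes)

# Lose tape 2; rebuild it from the three sibling tapes plus the code tape.
# XORing the survivors cancels out everything except the missing tape.
survivors = [tapes[0], tapes[2], tapes[3], code_tape]
rebuilt = xor_blocks(survivors)
assert rebuilt == tapes[1]
```

Because XOR is its own inverse, any single lost tape drops out of the equation the same way; doing this per-block rather than per-tape is what allows the subtape-level reconstruction mentioned above when two tapes are damaged in different spots.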
10 Write two half-full tapes and read them in parallel, so you get the data back in half the time.
11 You can't just say you want more network bandwidth and more tape drives.
12 Do you have 10,000 times the loading dock space to stage the tape drives on until a truck picks them up?
13 Or an alert might go out if the rate of tape breakage jumps from 100 tapes per day to 300 tapes per day.
14 But until then, don't tell me that 100 tapes a day broke if that's within the norm.
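The alerting rule in sentences 13-14 (page on a change in the failure rate, not on individual failures within the norm) can be sketched as follows. The `should_alert` helper, the window, and the 2x threshold factor are all assumptions for illustration, not Google's actual policy:

```python
def should_alert(broken_today, recent_daily_counts, factor=2.0):
    """Alert only when today's breakage exceeds the recent baseline by `factor`.

    Losses at or below the historical norm are expected and stay silent;
    only a rate *change* is worth a human's attention.
    """
    baseline = sum(recent_daily_counts) / len(recent_daily_counts)
    return broken_today > factor * baseline

recent = [95, 105, 100, 98, 102]      # ~100 broken tapes/day is the norm

assert not should_alert(100, recent)  # within the norm: stay quiet
assert should_alert(300, recent)      # rate tripled: wake someone up
```

The point of the design is that with hundreds of tapes breaking every month as a matter of course, alerting on absolute counts would just train operators to ignore the pager.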
15 Data was restored from many tapes and verified.
16 Otherwise no single location would have enough power to read all the tapes involved in the restoration process.
17 A restore from a tape has to happen where the tape is located.
18 But until the data makes it onto tape, it could be in New York while the backup is in Oregon, because that's where there was capacity.
19 You can't just say you're going to deploy more tape drives without also having the operations staff.
20 Ensure data is in a guaranteed location, and guaranteed not to be in a certain location; this goes against much of the rest of the philosophy, which is location diversity and location independence.
simIndex simValue blogId blogTitle
same-blog 1 1.0 1589 high scalability-2014-02-03-How Google Backs Up the Internet Along With Exabytes of Other Data
2 0.48152894 1010 high scalability-2011-03-24-Strategy: Disk Backup for Speed, Tape Backup to Save Your Bacon, Just Ask Google
Introduction: In Stack Overflow Architecture Update - Now At 95 Million Page Views A Month, a commenter expressed surprise about Stack Overflow's backup strategy: Backup is to disk for fast retrieval and to tape for historical archiving. The comment was: Really? People still do this? I know some organizations invested a tremendous amount in automated, robotic tape backup, but seriously, a site founded in 2008 is backing up to tape? The Case of the Missing Gmail Accounts I admit that I was surprised at this strategy too. In this age of copying data to disk three times for safety, I also wondered if tape backups were still necessary? Then, like in a movie, an event happened that made sense of everything: Google suffered the quintessential #firstworldproblem, gmail accounts went missing! Cue emphatic music. And what's more, they were taking a long time to come back. There was a palpable fear in the land that email accounts might never be restored. Think about that. They might ne
Introduction: "But it is not complicated. [There's] just a lot of it." --Richard Feynman, on how the immense variety of the world arises from simple rules. Contents: Have We Reached the End of Scaling?; Applications Become Black Boxes Using Markets to Scale and Control Costs; Let's Welcome our Neo-Feudal Overlords; The Economic Argument for the Ambient Cloud; What Will Kill the Cloud?; The Amazing Collective Compute Power of the Ambient Cloud; Using the Ambient Cloud as an Application Runtime; Applications as Virtual States; Conclusion. We have not yet begun to scale. The world is still fundamentally disconnected, and for all our wisdom we are still in the earliest days of learning how to build truly large planet-scaling applications. Today 350 million users on Facebook is a lot of users and five million followers on Twitter is a lot of followers. This may seem like a lot now, but consider that we have no planet-wide applications yet. None. Tomorrow the numbers foreshadow a new Cambrian explosion of connectivity that will look as
5 0.14331217 1588 high scalability-2014-01-31-Stuff The Internet Says On Scalability For January 31st, 2014
Introduction: Hey, it's HighScalability time: Largest battle ever on Eve Online. 2,000 players. $200K in damage. Awesome pics. Teaspoon of soil: hosts up to a billion bacteria spread among a million species. Quotable Quotes: Vivek Prakash: The problem of scaling always takes a toll on you. @jcsalterego: See This One Weird Trick Hypervisors Don't Want You To Know. Upgrades are the great killer of software systems. Do you really want a pill that would supply materials with instructions for nanobots to form new neurons and place them near existing cells to be replaced, so you have a new brain within six months? Scary as hell. But there's a nanoapp for that. Ted Nelson has a fascinating series of Computers for Cynics vidcasts on YouTube. I'd only really known of Mr. Nelson from his writings on hypertext, but he has a broad and penetrating insight into the early days of the computer industry. He's not really
6 0.1424147 1386 high scalability-2013-01-14-MongoDB and GridFS for Inter and Intra Datacenter Data Replication
7 0.13999537 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
8 0.13815391 687 high scalability-2009-08-24-How Google Serves Data from Multiple Datacenters
9 0.13453737 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard
10 0.13209818 448 high scalability-2008-11-22-Google Architecture
11 0.12737399 1320 high scalability-2012-09-11-How big is a Petabyte, Exabyte, Zettabyte, or a Yottabyte?
12 0.12620482 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
13 0.125296 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
14 0.12479876 1508 high scalability-2013-08-28-Sean Hull's 20 Biggest Bottlenecks that Reduce and Slow Down Scalability
15 0.12299875 1131 high scalability-2011-10-24-StackExchange Architecture Updates - Running Smoothly, Amazon 4x More Expensive
16 0.12226311 1240 high scalability-2012-05-07-Startups are Creating a New System of the World for IT
17 0.12222111 1596 high scalability-2014-02-14-Stuff The Internet Says On Scalability For February 14th, 2014
18 0.12200358 1359 high scalability-2012-11-15-Gone Fishin': Justin.Tv's Live Video Broadcasting Architecture
19 0.12192616 796 high scalability-2010-03-16-Justin.tv's Live Video Broadcasting Architecture
20 0.11774195 691 high scalability-2009-08-31-Squarespace Architecture - A Grid Handles Hundreds of Millions of Requests a Month
simIndex simValue blogId blogTitle
same-blog 1 0.94383907 1589 high scalability-2014-02-03-How Google Backs Up the Internet Along With Exabytes of Other Data
2 0.91964394 1010 high scalability-2011-03-24-Strategy: Disk Backup for Speed, Tape Backup to Save Your Bacon, Just Ask Google
3 0.83092505 23 high scalability-2007-07-24-Major Websites Down: Or Why You Want to Run in Two or More Data Centers.
Introduction: A lot of sites hosted in San Francisco are down because of at least 6 back-to-back power outages. More details at laughingsquid. Sites like SecondLife, Craigslist, Technorati, Yelp and all Six Apart properties, TypePad, LiveJournal and Vox are all down. The cause was an underground explosion in a transformer vault under a manhole at 560 Mission Street. Flames shot 6 feet out from the manhole cover. Over 30,000 PG&E customers are without power. What's perplexing is that the UPS backup and diesel generators didn't kick in to bring the datacenter back online. I've never toured that datacenter, but they usually have massive backup systems. It's probably one of those multiple simultaneous failure situations that you hope never happen in real life, but too often do. Or maybe the infrastructure wasn't rolled out completely. Update: the cause was a cascade of failures in a tightly coupled system that could never happen :-) Details at Failure Happens: A summary of the power
4 0.78994972 1209 high scalability-2012-03-14-The Azure Outage: Time Is a SPOF, Leap Day Doubly So
Introduction: This is a guest post by Steve Newman, co-founder of Writely (Google Docs), tech lead on the Paxos-based synchronous replication in Megastore, and founder of cloud service provider Scalyr.com . Microsoft’s Azure service suffered a widely publicized outage on February 28th / 29th. Microsoft recently published an excellent postmortem . For anyone trying to run a high-availability service, this incident can teach several important lessons. The central lesson is that, no matter how much work you put into redundancy, problems will arise. Murphy is strong and, I might say, creative; things go wrong. So preventative measures are important, but how you react to problems is just as important. It’s interesting to review the Azure incident in this light. The postmortem is worth reading in its entirety, but here’s a quick summary: each time Azure launches a new VM, it creates a “transfer certificate” to secure communications with that VM. There was a bug in the code that determines the ce
5 0.78245878 1503 high scalability-2013-08-19-What can the Amazing Race to the South Pole Teach us About Startups?
Introduction: At the heart of every software adventure exists a journey in service of a quest. Melodramatic much? Sorry, but while wandering dazzled through Race to the End of the Earth , a fantastic exhibit at the Royal BC Museum on the 1911-1912 race to the South Pole between Norwegian explorer Roald Amundsen and British naval officer Robert Scott , I couldn’t help but think of the two radically different approaches each team took to the race and it shocked me to see that some of the same principles that lead to success or failure in software development also seem to lead to success or failure in exploration. I wish I could reproduce the experience of walking through the exhibit . Plaque after plaque I remember wondering out loud at Scott’s choices and then nod in agreement with Amundsen’s approach. The core conflict was straight out of any ancient Agile (Amundsen) vs Waterfall (Scott) thread you can find on Usenet. And Waterfall lost. As background here are some sources you may want
6 0.78171879 917 high scalability-2010-10-08-4 Scalability Themes from Surgecon
7 0.77603596 120 high scalability-2007-10-11-How Flickr Handles Moving You to Another Shard
8 0.77371186 789 high scalability-2010-03-05-Strategy: Planning for a Power Outage Google Style
9 0.76380461 1464 high scalability-2013-05-24-Stuff The Internet Says On Scalability For May 24, 2013
11 0.7481876 919 high scalability-2010-10-14-I, Cloud
12 0.74326622 1027 high scalability-2011-04-20-Packet Pushers: How to Build a Low Cost Data Center
14 0.74019969 1012 high scalability-2011-03-28-Aztec Empire Strategy: Use Dual Pipes in Your Aqueduct for High Availability
15 0.73843104 1612 high scalability-2014-03-14-Stuff The Internet Says On Scalability For March 14th, 2014
16 0.73632246 1634 high scalability-2014-04-18-Stuff The Internet Says On Scalability For April 18th, 2014
17 0.7350142 1500 high scalability-2013-08-12-100 Curse Free Lessons from Gordon Ramsay on Building Great Software
19 0.73298281 978 high scalability-2011-01-26-Google Pro Tip: Use Back-of-the-envelope-calculations to Choose the Best Design
20 0.73139459 1345 high scalability-2012-10-22-Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale
simIndex simValue blogId blogTitle
Introduction: Facebook has been teasing us. While many of their recent acquisitions have been surprising, shocking is the only word adequately describing Facebook's 5 day whirlwind acquisition of Oculus , immersive virtual reality visionaries, for a now paltry sounding $2 billion. The backlash is a pandemic, jumping across social networks with the speed only a meme powered by the directly unaffected can generate. For more than 30 years VR has been the dream burning in the heart of every science fiction fan. Now that this future might finally be here, Facebook’s ownage makes it seem like a wonderful and hopeful timeline has been choked off, killing the Metaverse before it even had a chance to begin. For the many who voted for an open future with their Kickstarter dollars , there’s a deep and personal sense of betrayal, despite Facebook’s promise to leave Oculus alone. The intensity of the reaction is because Oculus matters to people. It's new, it's different, it create
2 0.95679075 1213 high scalability-2012-03-22-Paper: Revisiting Network I-O APIs: The netmap Framework
Introduction: Here's a really good article in the Communications of the ACM on reducing network packet processing overhead by redesigning the network stack: Revisiting Network I/O APIs: The Netmap Framework by Luigi Rizzo . As commodity networking performance increases operating systems need to keep up or all those CPUs will go to waste. How do they make this happen? Abstract: Today 10-gigabit interfaces are used more and more in datacenters and servers. On these links, packets flow as fast as one every 67.2 nanoseconds, yet modern operating systems can take 10-20 times longer just to move one packet between the wire and the application. We can do much better, not with more powerful hardware but by revising architectural decisions made long ago regarding the design of device drivers and network stacks. The netmap framework is a promising step in this direction. Thanks to a careful design and the engineering of a new packet I/O API, netmap eliminates much unnecessary overhead and moves
same-blog 3 0.95598936 1589 high scalability-2014-02-03-How Google Backs Up the Internet Along With Exabytes of Other Data
4 0.94628674 919 high scalability-2010-10-14-I, Cloud
Introduction: Every time a technological innovation has spurred automation – since the time of Henry Ford right up to a minute ago – someone has claimed that machines will displace human beings. But the rainbow and unicorn dream attributed to business stakeholders everywhere, i.e. the elimination of IT, is just that – a dream. It isn’t realistic and in fact it’s downright silly to think that systems that only a few years ago were unable to automatically scale up and scale down will suddenly be able to perform the complex analysis required of IT to keep the business running. The rare reports of the elimination of IT staff due to cloud computing and automation are highlighted in the news because they evoke visceral reactions in technologists everywhere and, to be honest, they get the click counts rising. But the jury remains out on this one and in fact many postulate that it is not a reduction in staff that will occur, but a transformation of staff, which may eliminate some old timey positions
5 0.94483191 289 high scalability-2008-03-27-Amazon Announces Static IP Addresses and Multiple Datacenter Operation
Introduction: Amazon is fixing two of their major problems: no static IP addresses and single datacenter operation. By adding these two new features, developers can finally build a no-apology system on Amazon. Before, you always had to throw in an apology or two. No, we don't have low failover times because of the silly DNS games and unacceptable DNS update and propagation times, and no, we don't operate in more than one datacenter. No more. Now Amazon is adding Elastic IP Addresses and Availability Zones. Elastic IP addresses are far better than normal IP addresses because they are both in tight with Jessica Alba and they are: Static IP addresses designed for dynamic cloud computing. An Elastic IP address is associated with your account, not a particular instance, and you control that address until you choose to explicitly release it. Unlike traditional static IP addresses, however, Elastic IP addresses allow you to mask instance or availability zone failures by programmatica
6 0.94239992 716 high scalability-2009-10-06-Building a Unique Data Warehouse
7 0.94229555 1316 high scalability-2012-09-04-Changing Architectures: New Datacenter Networks Will Set Your Code and Data Free
10 0.94027686 1343 high scalability-2012-10-18-Save up to 30% by Selecting Better Performing Amazon Instances
11 0.93945307 1612 high scalability-2014-03-14-Stuff The Internet Says On Scalability For March 14th, 2014
12 0.93877763 1157 high scalability-2011-12-14-Virtualization and Cloud Computing is Changing the Network to East-West Routing
13 0.93849969 1186 high scalability-2012-02-02-The Data-Scope Project - 6PB storage, 500GBytes-sec sequential IO, 20M IOPS, 130TFlops
14 0.93705648 1436 high scalability-2013-04-05-Stuff The Internet Says On Scalability For April 5, 2013
15 0.93636763 1649 high scalability-2014-05-16-Stuff The Internet Says On Scalability For May 16th, 2014
16 0.93500739 1630 high scalability-2014-04-11-Stuff The Internet Says On Scalability For April 11th, 2014
17 0.93433255 517 high scalability-2009-02-21-Google AppEngine - A Second Look
18 0.93427056 761 high scalability-2010-01-17-Applications Become Black Boxes Using Markets to Scale and Control Costs
19 0.93418652 1275 high scalability-2012-07-02-C is for Compute - Google Compute Engine (GCE)
20 0.93418264 763 high scalability-2010-01-22-How BuddyPoke Scales on Facebook Using Google App Engine