high_scalability high_scalability-2008 high_scalability-2008-274 knowledge-graph by maker-knowledge-mining

274 high scalability-2008-03-12-YouTube Architecture


meta information for this blog

Source: html

Introduction: Update 2: YouTube Reaches One Billion Views Per Day. That’s at least 11,574 views per second, 694,444 views per minute, and 41,666,667 views per hour. Update: YouTube: The Platform. YouTube adds a new rich set of APIs in order to become your video platform leader--all for free. Upload, edit, watch, search, and comment on video from your own site without visiting YouTube. Compose your site internally from APIs because you'll need to expose them later anyway. YouTube grew incredibly fast, to over 100 million video views per day, with only a handful of people responsible for scaling the site. How did they manage to deliver all that video to all those users? And how have they evolved since being acquired by Google? Information Sources: Google Video. Platform: Apache, Python, Linux (SuSe), MySQL, psyco (a dynamic python->C compiler), lighttpd for video instead of Apache. What's Inside? The Stats: Supports the delivery of over 100 million vide

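The per-second, per-minute and per-hour figures in Update 2 come from dividing one billion by the number of seconds (86,400), minutes (1,440) and hours (24) in a day. They are averages over the whole day, so the busiest periods will see more traffic than this. A quick check, where the function name `average_rates` is illustrative only:

```python
# Convert a daily view count into average per-second, per-minute and per-hour rates.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MINUTES_PER_DAY = 24 * 60       # 1,440
HOURS_PER_DAY = 24


def average_rates(daily_views: int) -> dict[str, int]:
    """Return average (not peak) rates, rounded to the nearest whole view."""
    return {
        "per_second": round(daily_views / SECONDS_PER_DAY),
        "per_minute": round(daily_views / MINUTES_PER_DAY),
        "per_hour": round(daily_views / HOURS_PER_DAY),
    }


print(average_rates(1_000_000_000))
# {'per_second': 11574, 'per_minute': 694444, 'per_hour': 41666667}
```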

Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 That’s at least 11,574 views per second, 694,444 views per minute, and 41,666,667 views per hour. [sent-2, score-0.759]

2 YouTube adds a new rich set of APIs in order to become your video platform leader--all for free. [sent-4, score-0.342]

3 Upload, edit, watch, search, and comment on video from your own site without visiting YouTube. [sent-5, score-0.342]

4 YouTube grew incredibly fast, to over 100 million video views per day, with only a handful of people responsible for scaling the site. [sent-7, score-0.595]

5 How did they manage to deliver all that video to all those users? [sent-8, score-0.342]

6 Information Sources: Google Video. Platform: Apache, Python, Linux (SuSe), MySQL, psyco (a dynamic python->C compiler), lighttpd for video instead of Apache. What's Inside? [sent-10, score-0.501]

7 The most popular content is moved to a CDN (content delivery network). CDNs replicate content in multiple places. [sent-41, score-0.467]

8 There's a better chance of content being closer to the user, with fewer hops, and content will run over a more friendly network. [sent-42, score-0.358]

9 CDN machines mostly serve out of memory because the content is so popular that there's little thrashing of content into and out of memory. [sent-43, score-0.467]

10 Less popular content (1-20 views per day) uses YouTube servers in various colo sites. [sent-44, score-0.619]

11 A video may have a few plays, but lots of videos are being played. [sent-46, score-0.342]

12 There are about 4 thumbnails for each video, so there are many more thumbnails than videos. [sent-64, score-0.656]

13 Rebooting a machine took 6-10 hours for the cache to warm up enough to stop going to disk. [sent-80, score-0.247]

14 They went through a common evolution: a single server, then a single master with multiple read slaves, then a partitioned database, and finally a sharding approach. [sent-91, score-0.242]

15 Slaves are single-threaded and usually run on lesser machines, and replication is asynchronous, so the slaves can lag significantly behind the master. [sent-94, score-0.346]

16 One of their solutions was to prioritize traffic by splitting the data into two clusters: a video watch pool and a general cluster. [sent-97, score-0.537]

17 The idea is that people want to watch video so that function should get the most resources. [sent-98, score-0.44]

18 The social networking features of YouTube are less important so they can be routed to a less capable cluster. [sent-99, score-0.245]

19 If a video is popular enough it will move into the CDN. [sent-116, score-0.451]

20 It's true that nobody really knows what simplicity is, but if you aren't afraid to make changes then that's a good sign simplicity is happening. [sent-134, score-0.28]
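
The summary describes a popularity split: popular videos are moved to a CDN, while the long tail (roughly 1-20 views per day) is served from YouTube's own colo servers. The sketch below illustrates that split under stated assumptions: the class name, the daily counting window and the cutoff of 20 views per day are hypothetical, and promotion is one-way because the summary says nothing about moving videos back out of the CDN.

```python
from collections import Counter

# Hypothetical cutoff. The summary only says that content with roughly
# 1-20 views per day stays on YouTube's own colo servers.
CDN_THRESHOLD_VIEWS_PER_DAY = 20


class TieredPlacement:
    """Toy popularity-based placement: hot videos go to a CDN, the long tail stays in colo."""

    def __init__(self, threshold: int = CDN_THRESHOLD_VIEWS_PER_DAY) -> None:
        self.threshold = threshold
        self.views_today: Counter = Counter()  # video_id -> views in the current window
        self.on_cdn: set = set()               # video_ids already promoted to the CDN

    def record_view(self, video_id: str) -> str:
        """Count one view and return the tier that should serve it."""
        self.views_today[video_id] += 1
        if self.views_today[video_id] > self.threshold:
            # Popular enough: promote to the CDN (one-way in this sketch).
            self.on_cdn.add(video_id)
        return "cdn" if video_id in self.on_cdn else "colo"

    def start_new_day(self) -> None:
        """Reset the daily counters; promoted videos stay on the CDN."""
        self.views_today.clear()


placement = TieredPlacement()
tiers = [placement.record_view("popular-video") for _ in range(25)]
print(tiers[0], tiers[-1])                       # colo cdn
print(placement.record_view("long-tail-video"))  # colo
```

Counting per window keeps the decision cheap; the point of the split is that the CDN only has to hold the small set of videos that get most of the plays.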

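The summary says the database went from a single server, to a master with read slaves, to partitioning, and finally to sharding. A minimal sketch of that last step, assuming a fixed list of shards keyed by a user ID (the shard names, their number and the choice of key are all hypothetical), uses a stable hash so every application server maps the same key to the same shard:

```python
import hashlib

# Hypothetical shard names; the summary does not say how many shards there were.
SHARDS = ("db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3")


def shard_for(user_id: str, shards: tuple[str, ...] = SHARDS) -> str:
    """Map a key to one shard using a stable hash.

    Python's built-in hash() is randomized per process for strings, so it
    could send the same user to different shards on different servers.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


print(shard_for("user-42") == shard_for("user-42"))  # True
```

With plain modulo placement, changing the number of shards moves most keys, so a real deployment would more likely keep an explicit lookup table or use consistent hashing; this sketch ignores rebalancing.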

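The traffic-prioritization idea in the summary amounts to routing by request type: anything on the video-watching path goes to a dedicated, better-provisioned pool, and social features go to the less capable general cluster. A minimal sketch; the pool names and the set of request kinds are hypothetical:

```python
# Hypothetical pool names: the summary only says there is a video watch
# pool and a less capable general cluster.
WATCH_POOL = "video-watch-pool"
GENERAL_POOL = "general-cluster"

# Hypothetical request kinds treated as part of watching a video.
WATCH_PATH_KINDS = frozenset({"watch_page", "video_metadata"})


def route(request_kind: str) -> str:
    """Return the database cluster that should serve a request of this kind."""
    if request_kind in WATCH_PATH_KINDS:
        # Watching video is what users come for, so it gets the most resources.
        return WATCH_POOL
    # Comments, friend lists and other social features can tolerate a weaker cluster.
    return GENERAL_POOL


for kind in ("watch_page", "comment", "friend_list"):
    print(kind, "->", route(kind))
```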
similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('video', 0.342), ('youtube', 0.199), ('content', 0.179), ('views', 0.161), ('thumbnails', 0.157), ('went', 0.121), ('popular', 0.109), ('slaves', 0.107), ('simplicity', 0.107), ('watch', 0.098), ('prioritize', 0.097), ('cache', 0.093), ('per', 0.092), ('hardware', 0.088), ('lag', 0.086), ('threaded', 0.085), ('less', 0.083), ('images', 0.082), ('credit', 0.081), ('machine', 0.08), ('compiler', 0.08), ('lighttpd', 0.079), ('routed', 0.079), ('cdn', 0.079), ('directory', 0.079), ('cards', 0.079), ('python', 0.079), ('uses', 0.078), ('took', 0.074), ('seeks', 0.073), ('tail', 0.073), ('living', 0.071), ('bigtable', 0.071), ('html', 0.07), ('essential', 0.07), ('apache', 0.069), ('usually', 0.068), ('replica', 0.068), ('inode', 0.068), ('multilevel', 0.068), ('precalculated', 0.068), ('sourcesgoogle', 0.068), ('handling', 0.067), ('afraid', 0.066), ('expensive', 0.065), ('informations', 0.063), ('suse', 0.063), ('favorable', 0.063), ('leased', 0.063), ('printers', 0.063)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 274 high scalability-2008-03-12-YouTube Architecture

2 0.32939675 796 high scalability-2010-03-16-Justin.tv's Live Video Broadcasting Architecture

Introduction: The future is live. The future is real-time. The future is now. That's the hype anyway. And as it has a habit of doing, the hype is slowly becoming reality. We are seeing live searches, live tweets, live location, live reality augmentation, live crab (fresh and local), and live event publishing. One of the most challenging of all live technologies is that of live video broadcasting. Imagine a world in which everyone becomes a broadcaster and a consumer of video streams, all in real-time (< 250 msec latency), all so you can talk and interact directly without feeling like you are in the middle of a time shift war. The resources and the engineering needed to make this happen must be substantial. How do you do that? To find out I talked to Kyle Vogt, Justin.tv Founder and VP of Engineering. Justin.tv certainly has the numbers. Their 30 million unique monthly visitors even outshine YouTube in the video upload game, reportedly uploading nearly 30 hours per minute of video compared to Y

3 0.32789889 1359 high scalability-2012-11-15-Gone Fishin': Justin.Tv's Live Video Broadcasting Architecture

Introduction: This is one of my favorite posts for a couple of reasons. I think it gives a lot of useful information in an interesting space. And Kyle Vogt was just a real pleasure to talk to. He was very helpful and forthcoming, which makes the whole experience better for everyone. The future is live. The future is real-time. The future is now. That's the hype anyway. And as it has a habit of doing, the hype is slowly becoming reality. We are seeing live searches, live tweets, live location, live reality augmentation, live crab (fresh and local), and live event publishing. One of the most challenging of all live technologies is that of live video broadcasting. Imagine a world in which everyone becomes a broadcaster and a consumer of video streams, all in real-time (< 250 msec latency), all so you can talk and interact directly without feeling like you are in the middle of a time shift war. The resources and the engineering needed to make this happen must be substantial. How do you do tha

4 0.31853104 1215 high scalability-2012-03-26-7 Years of YouTube Scalability Lessons in 30 Minutes

Introduction: If you started out building a dating site and instead ended up building a video sharing site (YouTube) that handles 4 billion views a day, then it’s just possible you learned something along the way. And indeed, Mike Solomon, one of the original engineers at YouTube, did learn a lot and he has given a talk about it at PyCon: Scalability at YouTube. This isn’t an architecture-driven talk where we are led through a description of how a lot of boxes connect to each other. Mike could give that sort of talk. He has worked on building YouTube’s servlet infrastructure, video indexing feature, video transcoding system, their full text search, a CDN, and much more. But instead, he’s taken a step back, taken a long look around at what time has wrought, and shared some deep lessons, obviously hard won from experience. The key takeaway of the talk for me was doing a lot with really simple tools. While many teams are moving on to more complex ecosystems, YouTube really does keep it

5 0.25941944 1201 high scalability-2012-02-29-Strategy: Put Mobile Video Into Cold Storage After 30 Days

Introduction: Limelight says 95% of Mobile Video Views Take Place in First 90 Days and 88.8 percent of views take place in the first 30 days. Since a lot of people are working with video, which is expensive to store and serve, the implication: there's little need to keep your video close to the user or on a CDN after 30 days.

6 0.24148735 576 high scalability-2009-04-21-What CDN would you recommend?

7 0.24018805 1037 high scalability-2011-05-10-Viddler Architecture - 7 Million Embeds a Day and 1500 Req-Sec Peak

8 0.2377042 991 high scalability-2011-02-16-Paper: An Experimental Investigation of the Akamai Adaptive Video Streaming

9 0.19004698 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App

10 0.18840717 70 high scalability-2007-08-22-How many machines do you need to run your site?

11 0.17527309 638 high scalability-2009-06-26-PlentyOfFish Architecture

12 0.17487313 1597 high scalability-2014-02-17-How the AOL.com Architecture Evolved to 99.999% Availability, 8 Million Visitors Per Day, and 200,000 Requests Per Second

13 0.17478833 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?

14 0.1737003 661 high scalability-2009-07-25-Latency is Everywhere and it Costs You Sales - How to Crush it

15 0.17364793 1361 high scalability-2012-11-22-Gone Fishin': PlentyOfFish Architecture

16 0.17108694 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?

17 0.16617022 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud

18 0.16603312 1355 high scalability-2012-11-05-Gone Fishin': Building Super Scalable Systems: Blade Runner Meets Autonomic Computing In The Ambient Cloud

19 0.16522887 1289 high scalability-2012-07-23-State of the CDN: More Traffic, Stable Prices, More Products, Profits - Not So Much

20 0.16495349 856 high scalability-2010-07-12-Creating Scalable Digital Libraries

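The page does not say how its tfidf similarity values were computed. A common approach, assumed here, weights each term by its count in a post times log(N / df), where N is the number of posts and df is the number of posts containing the term, and then compares posts by the cosine of the angle between their weight vectors. Identical posts score 1, posts with no weighted terms in common score 0, and a term that appears in every post gets weight 0. The self-match printing as 1.0000002 is most likely floating-point rounding. A small sketch with hypothetical, heavily shortened token lists:

```python
import math
from collections import Counter

Vector = dict[str, float]


def tfidf_vectors(docs: list[list[str]]) -> list[Vector]:
    """Weight each term by (count in doc) * log(N / number of docs containing it)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical "posts" reduced to a few tokens each.
posts = [
    "youtube video cdn popular video".split(),
    "live video broadcasting cdn".split(),
    "mysql replication lag sharding".split(),
]
vecs = tfidf_vectors(posts)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```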

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.323), (1, 0.171), (2, -0.071), (3, -0.179), (4, -0.051), (5, -0.062), (6, -0.008), (7, 0.048), (8, -0.012), (9, 0.062), (10, -0.012), (11, -0.128), (12, -0.072), (13, 0.015), (14, 0.005), (15, 0.14), (16, -0.021), (17, 0.062), (18, -0.063), (19, -0.132), (20, -0.078), (21, 0.064), (22, 0.091), (23, 0.017), (24, -0.007), (25, 0.082), (26, 0.037), (27, 0.025), (28, 0.035), (29, 0.132), (30, 0.035), (31, -0.01), (32, 0.049), (33, -0.03), (34, 0.009), (35, -0.061), (36, -0.069), (37, -0.084), (38, 0.032), (39, 0.067), (40, -0.046), (41, -0.001), (42, -0.016), (43, 0.003), (44, 0.041), (45, -0.062), (46, 0.018), (47, -0.009), (48, 0.09), (49, 0.049)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97366089 274 high scalability-2008-03-12-YouTube Architecture

2 0.88877511 796 high scalability-2010-03-16-Justin.tv's Live Video Broadcasting Architecture

3 0.88814455 1359 high scalability-2012-11-15-Gone Fishin': Justin.Tv's Live Video Broadcasting Architecture

4 0.82683545 1037 high scalability-2011-05-10-Viddler Architecture - 7 Million Embeds a Day and 1500 Req-Sec Peak

Introduction: Viddler is in the high quality Video as a Service business for a customer who wants to pay a fixed cost, be done with it, and just have it work. Similar to Blip and Ooyala, it is more focussed on business than YouTube. They serve thousands of business customers, including high traffic websites like FailBlog, Engadget, and Gawker. Viddler is a good case to learn from because they are a small company trying to provide a challenging service in a crowded field. We are catching them just as they are transitioning from a startup that began in one direction, as a YouTube competitor, and pivoted into a slightly larger company focussed on paying business customers. Transition is the key word for Viddler: transitioning from a free YouTube clone to a high quality paid service. Transitioning from a few colo sites that didn't work well to a new higher quality datacenter. Transitioning from an architecture that was typical of a startup to one that features redundancy, high availability, and automation. Tr

5 0.81868714 576 high scalability-2009-04-21-What CDN would you recommend?

Introduction: Update 10: The Value of CDNs by Mike Axelrod of Google. Google implements a distributed content cache from within large ISPs. This allows them to serve content from the edge of the network and save bandwidth on the ISPs' backbones. Update 9: Just Jump: Start using Clouds and CDNs. Bob Buffone gives a really nice and practical tutorial on how to use CloudFront as your CDN. Update 8: Akamai’s Services Become Affordable for Anyone! Blazing Web Site Performance by Distribution Cloud. Distribution Cloud starts at $150 per month for access to the best content distribution network in the world and the leader of Content Distribution Networks. Update 7: Where Amazon’s Data Centers Are Located, Expanding the Cloud: Amazon CloudFront. Why Amazon's CDN Offering Is No Threat To Akamai, Limelight or CDN Pricing. Amazon has launched their CDN with “low latency, high data transfer speeds, and no commitments.” The perfect relationship for many. The m

6 0.81496632 198 high scalability-2008-01-01-HOW CDN works

7 0.80532128 1201 high scalability-2012-02-29-Strategy: Put Mobile Video Into Cold Storage After 30 Days

8 0.78567082 294 high scalability-2008-04-01-How to update video views count effectively?

9 0.7607832 991 high scalability-2011-02-16-Paper: An Experimental Investigation of the Akamai Adaptive Video Streaming

10 0.75303298 1215 high scalability-2012-03-26-7 Years of YouTube Scalability Lessons in 30 Minutes

11 0.72152615 60 high scalability-2007-08-07-Can you profit from the coming Content Delivery Network wars?

12 0.72068369 39 high scalability-2007-07-30-Product: Akamai

13 0.70618719 1289 high scalability-2012-07-23-State of the CDN: More Traffic, Stable Prices, More Products, Profits - Not So Much

14 0.69653231 800 high scalability-2010-03-26-Strategy: Caching 404s Saved the Onion 66% on Server Time

15 0.68681914 136 high scalability-2007-10-28-Scaling Early Stage Startups

16 0.68162221 437 high scalability-2008-11-03-How Sites are Scaling Up for the Election Night Crush

17 0.67849374 998 high scalability-2011-03-03-Stack Overflow Architecture Update - Now at 95 Million Page Views a Month

18 0.67389321 1261 high scalability-2012-06-08-Stuff The Internet Says On Scalability For June 8, 2012

19 0.662076 1229 high scalability-2012-04-17-YouTube Strategy: Adding Jitter isn't a Bug

20 0.66196632 1052 high scalability-2011-06-03-Stuff The Internet Says On Scalability For June 3, 2011

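The LSI section describes each post as a sparse list of (topicId, topicWeight) pairs. Assuming the similarity is again a cosine between such vectors (the page does not state this), topics missing from one list count as zero. Unlike probabilities, LSI weights can be negative, so the cosine can range from -1 to 1. The example reuses the first four topic weights listed for this post, truncated for brevity:

```python
import math

Topics = list[tuple[int, float]]


def sparse_cosine(a: Topics, b: Topics) -> float:
    """Cosine similarity of two (topic_id, weight) lists; missing topics count as 0."""
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(topic, 0.0) for topic, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


this_post = [(0, 0.323), (1, 0.171), (2, -0.071), (3, -0.179)]
flipped = [(topic, -w) for topic, w in this_post]
print(round(sparse_cosine(this_post, this_post), 6))  # 1.0
print(round(sparse_cosine(this_post, flipped), 6))    # -1.0
```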

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.149), (2, 0.232), (10, 0.075), (30, 0.024), (33, 0.015), (38, 0.056), (40, 0.015), (47, 0.035), (61, 0.059), (77, 0.014), (79, 0.129), (85, 0.058), (94, 0.057)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9845897 1452 high scalability-2013-05-06-7 Not So Sexy Tips for Saving Money On Amazon

Introduction: Harish Ganesan, CTO of 8KMiles, has a very helpful blog, Cloud, Big Data and Mobile, where he shows a nice analytical bent which leads to a lot of practical advice and cost saving tips: Use SQS Batch Requests to reduce the number of requests hitting SQS, which saves costs. In the example, sending 10 messages in a single batch request saves $30/month. Use SQS Long Polling to reduce extra polling requests, cutting down empty receives, which in the example saves ~$600 in empty receive leakage costs. Choose the right search technology to save costs in AWS by matching your activity pattern to the technology. For a small application with constant load, a heavily utilized search tier, or seasonal loads, Amazon Cloud Search looks like the cost efficient play. Use Amazon CloudFront Price Class to minimize costs by selecting the right Price Class for your audience to potentially reduce delivery costs by excluding Amazon CloudFront’s more expensive edge locatio

2 0.97579789 716 high scalability-2009-10-06-Building a Unique Data Warehouse

Introduction: There are many reasons to roll your own data storage solution on top of existing technologies. We've seen stories on HighScalability about custom databases for very large sets of individual data (like Twitter) and large amounts of binary data (like Facebook pictures). However, I recently ran into a unique type of problem. I was tasked with recording and storing bandwidth information for more than 20,000 servers and their associated networking equipment. This data needed to be accessed in real-time, with less than a 5 minute delay between the data being recorded and the data showing up on customer bandwidth graphs on our customer portal. After numerous false starts with off the shelf components and existing database clustering technology, we decided we must roll our own system. The real key to our problem (literally) was the ratio of the size of the key to the size of the actual data. Because the tracked metric was so small (a 64-bit counter) compared to the unique ide

3 0.97524005 1186 high scalability-2012-02-02-The Data-Scope Project - 6PB storage, 500GBytes-sec sequential IO, 20M IOPS, 130TFlops

Introduction: “Data is everywhere, never be at a single location. Not scalable, not maintainable.” –Alex Szalay While Galileo played life and death doctrinal games over the mysteries revealed by the telescope, another revolution went unnoticed: the microscope gave up mystery after mystery and nobody yet understood how subversive what it revealed would be. For the first time these new tools of perceptual augmentation allowed humans to peek behind the veil of appearance. A new eye, driving human invention and discovery for hundreds of years. Data is another material that hides, revealing itself only when we look at different scales and investigate its underlying patterns. If the universe is truly made of information, then we are looking into truly primal stuff. A new eye is needed for Data and an ambitious project called Data-scope aims to be the lens. A detailed paper on the Data-Scope tells more about what it is: The Data-Scope is a new scientific instrum

same-blog 4 0.97430193 274 high scalability-2008-03-12-YouTube Architecture

5 0.96775925 1431 high scalability-2013-03-29-Stuff The Internet Says On Scalability For March 29, 2013

Introduction: Hey, it's HighScalability time: (Ukrainian daredevil scaling buildings) 44.6 billion - Tumblr posts; 300 Gb/s - DDoS DNS amplification attacks; 100 million - Eventbrite tickets processed. Quotable Quotes: @tveskov: Alan Kay: “The past 30 years have been completely mundane. It’s all been scaling (of old technology) and Angry Birds” @phrawzty: OH "Complexity is accelerating. We must augment our ability to manage it." #monitorama @stallent: new wave of apps that are bringing the server doing actual legit valuable work back into vogue know scaling is more than hard @calvdee: Fending off a 300Gb/s DDoS attacks would constitute a feat of #highscalability @solarce: "A spider farted in Finland and screwed up my IOPS!" -- @lusis #monitorama @cra: Listening to Shawn Pearce talk about scaling #git at Google with JGit... Android AOSP repos: 19.4GB, 2.5Mreq/day, 5.0TB/day #e

6 0.96661127 1612 high scalability-2014-03-14-Stuff The Internet Says On Scalability For March 14th, 2014

7 0.96647966 72 high scalability-2007-08-22-Wikimedia architecture

8 0.96603066 1041 high scalability-2011-05-15-Building a Database remote availability site

9 0.96564001 789 high scalability-2010-03-05-Strategy: Planning for a Power Outage Google Style

10 0.96520567 1075 high scalability-2011-07-07-Myth: Google Uses Server Farms So You Should Too - Resurrection of the Big-Ass Machines

11 0.96492815 1604 high scalability-2014-03-03-The “Four Hamiltons” Framework for Mitigating Faults in the Cloud: Avoid it, Mask it, Bound it, Fix it Fast

12 0.96463925 674 high scalability-2009-08-07-The Canonical Cloud Architecture

13 0.96463555 498 high scalability-2009-01-20-Product: Amazon's SimpleDB

14 0.96448553 1209 high scalability-2012-03-14-The Azure Outage: Time Is a SPOF, Leap Day Doubly So

15 0.96433163 1600 high scalability-2014-02-21-Stuff The Internet Says On Scalability For February 21st, 2014

16 0.96413946 1479 high scalability-2013-06-21-Stuff The Internet Says On Scalability For June 21, 2013

17 0.96410084 1148 high scalability-2011-11-29-DataSift Architecture: Realtime Datamining at 120,000 Tweets Per Second

18 0.9639535 1304 high scalability-2012-08-14-MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) and Familiar (SQL)

19 0.96387136 1245 high scalability-2012-05-14-DynamoDB Talk Notes and the SSD Hot S3 Cold Pattern

20 0.96317387 671 high scalability-2009-08-05-Stack Overflow Architecture
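
LDA represents each post as a probability distribution over topics. The list for this post keeps thirteen topics whose weights add up to about 0.918 rather than 1, presumably because topics below some weight cutoff are omitted. The page does not say which measure produced the similarity values above. One common choice for comparing topic distributions, assumed here, is the Hellinger distance: after renormalizing, compute the Bhattacharyya coefficient BC = Σ sqrt(p_i · q_i), then H = sqrt(1 - BC), and report 1 - H as a similarity. A minimal sketch using this post's listed weights:

```python
import math

Topics = list[tuple[int, float]]


def normalize(dist: Topics) -> dict[int, float]:
    """Rescale weights to sum to 1 (the listed weights omit small topics)."""
    total = sum(w for _, w in dist)
    return {topic: w / total for topic, w in dist}


def hellinger_similarity(p: Topics, q: Topics) -> float:
    """Return 1 - Hellinger distance between two sparse topic distributions."""
    dp, dq = normalize(p), normalize(q)
    # Bhattacharyya coefficient: only topics present in both lists contribute.
    bc = sum(math.sqrt(w * dq[topic]) for topic, w in dp.items() if topic in dq)
    # Hellinger distance for probability distributions; clamp tiny rounding error.
    distance = math.sqrt(max(0.0, 1.0 - bc))
    return 1.0 - distance


this_post = [(1, 0.149), (2, 0.232), (10, 0.075), (30, 0.024), (33, 0.015),
             (38, 0.056), (40, 0.015), (47, 0.035), (61, 0.059), (77, 0.014),
             (79, 0.129), (85, 0.058), (94, 0.057)]
print(hellinger_similarity(this_post, this_post))  # close to 1.0
```

Hellinger is bounded between 0 and 1 and only rewards topics that both posts actually share, which suits non-negative topic proportions better than a raw dot product.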