high_scalability high_scalability-2009 high_scalability-2009-599 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: An interesting post on DataCenterKnowledge! 1&1 Internet: 55,000 servers Rackspace: 50,038 servers The Planet: 48,500 servers Akamai Technologies: 48,000 servers OVH: 40,000 servers SBC Communications: 29,193 servers Verizon: 25,788 servers Time Warner Cable: 24,817 servers SoftLayer: 21,000 servers AT&T: 20,268 servers iWeb: 10,000 servers How about Google, Microsoft, Amazon, eBay, Yahoo, GoDaddy, Facebook? Check out the post on DataCenterKnowledge and of course here on highscalability.com!
sentIndex sentText sentNum sentScore
1 Check out the post on DataCenterKnowledge and of course here on highscalability. [sent-3, score-0.163]
wordName wordTfidf (topN-words)
[('datacenterknowledge', 0.673), ('godaddy', 0.305), ('warner', 0.287), ('serversthe', 0.274), ('poston', 0.248), ('aninteresting', 0.242), ('cable', 0.198), ('planet', 0.194), ('communications', 0.184), ('yahoo', 0.136), ('check', 0.106), ('course', 0.102), ('technologies', 0.092), ('internet', 0.084), ('facebook', 0.082), ('post', 0.061)]
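The list above pairs each token with a TF-IDF-style weight for this post. The actual maker-knowledge-mining pipeline (its tokenizer, weighting, and normalization) is not shown in this dump, so the following is only a minimal illustrative sketch of how such top-N weights can be derived from a small corpus of post introductions, using a plain term-frequency times inverse-document-frequency score and a hypothetical tiny corpus.

```python
# Minimal sketch of deriving per-post TF-IDF weights like the list above.
# The real mining pipeline is not shown here; this is illustrative only.
import math
import re
from collections import Counter

def tokenize(text):
    # Crude lowercase word tokenizer; the real pipeline's tokenizer is unknown.
    return re.findall(r"[a-z0-9&]+", text.lower())

def tfidf_top_n(docs, doc_index, n=10):
    """Return the n highest-weighted (term, tf*idf) pairs for one document."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))          # count documents containing each term

    tokens = tokenized[doc_index]
    tf = Counter(tokens)
    scores = {
        term: (count / len(tokens)) * math.log(len(docs) / doc_freq[term])
        for term, count in tf.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical tiny corpus of post introductions, for illustration.
docs = [
    "An interesting post on DataCenterKnowledge about who has the most web servers",
    "The Google Operating System blog has an interesting post on Google's scale",
    "Facebook has the second largest installation of Hadoop, Yahoo being the first",
]
print(tfidf_top_n(docs, 0))
```

Terms unique to the query post (like "datacenterknowledge") get the highest weights, while terms shared by every post score zero, which matches the shape of the list above.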
simIndex simValue blogId blogTitle
same-blog 1 1.0 599 high scalability-2009-05-14-Who Has the Most Web Servers?
Introduction: An interesting post on DataCenterKnowledge! 1&1 Internet: 55,000 servers Rackspace: 50,038 servers The Planet: 48,500 servers Akamai Technologies: 48,000 servers OVH: 40,000 servers SBC Communications: 29,193 servers Verizon: 25,788 servers Time Warner Cable: 24,817 servers SoftLayer: 21,000 servers AT&T;: 20,268 servers iWeb: 10,000 servers How about Google , Microsoft, Amazon , eBay , Yahoo, GoDaddy, Facebook? Check out the post on DataCenterKnowledge and of course here on highscalability.com!
2 0.073440172 211 high scalability-2008-01-13-Google Reveals New MapReduce Stats
Introduction: The Google Operating System blog has an interesting post on Google's scale based on an updated version of Google's paper about MapReduce. The input data for some of the MapReduce jobs run in September 2007 was 403,152 TB (terabytes), the average number of machines allocated for a MapReduce job was 394, and the average completion time was six and a half minutes. The paper mentions that Google's indexing system processes more than 20 TB of raw data. Niall Kennedy calculates that the average MapReduce job runs across a $1 million hardware infrastructure, assuming that Google still uses the same cluster configurations from 2004: two 2 GHz Intel Xeon processors with Hyper-Threading enabled, 4 GB of memory, two 160 GB IDE hard drives and a gigabit Ethernet link. Greg Linden notes that Google's infrastructure is an important competitive advantage. "Anyone at Google can process terabytes of data. And they can get their results back in about 10 minutes, so they ca
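The $1 million figure is simple arithmetic over the averages quoted in that summary. A minimal back-of-the-envelope sketch follows; the per-machine price (roughly $2,500 for the 2004-era dual-Xeon configuration) is an assumed round number used for illustration, not a figure taken from the paper.

```python
# Back-of-the-envelope sketch of the "$1 million of hardware per average
# MapReduce job" estimate. The per-machine price is an assumption chosen
# for illustration; the machine count comes from the figures quoted above.
avg_machines_per_job = 394            # average machines per job, September 2007
assumed_cost_per_machine = 2_500      # USD, assumed price for the 2004-era config

hardware_cost = avg_machines_per_job * assumed_cost_per_machine
print(f"Hardware behind an average MapReduce job: ~${hardware_cost:,}")  # ~ $985,000
```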
3 0.063406765 596 high scalability-2009-05-11-Facebook, Hadoop, and Hive
Introduction: Facebook has the second largest installation of Hadoop (a software platform that lets one easily write and run applications that process vast amounts of data), with Yahoo having the largest. Learn how they do it and what the challenges are on the DBMS2 blog, a blog for people who care about database and analytic technologies.
4 0.054206613 627 high scalability-2009-06-11-Yahoo! Distribution of Hadoop
Introduction: Many people in the Apache Hadoop community have asked Yahoo! to publish the version of Apache Hadoop they test and deploy across their large Hadoop clusters. As a service to the Hadoop community, Yahoo is releasing the Yahoo! Distribution of Hadoop -- a source code distribution that is based entirely on code found in the Apache Hadoop project. This source distribution includes code patches that they have added to improve the stability and performance of their clusters. In all cases, these patches have already been contributed back to Apache, but they may not yet be available in an Apache release of Hadoop. Read more and get the Hadoop distribution from Yahoo
5 0.049563278 720 high scalability-2009-10-12-High Performance at Massive Scale – Lessons learned at Facebook
Introduction: Jeff Rothschild, Vice President of Technology at Facebook, gave a great presentation at UC San Diego on our favorite subject: "High Performance at Massive Scale – Lessons learned at Facebook". The abstract for the talk is: Facebook has grown into one of the largest sites on the Internet today, serving over 200 billion pages per month. The nature of social data makes engineering a site for this level of scale a particularly challenging proposition. In this presentation, I will discuss the aspects of social data that present challenges for scalability and will describe the core architectural components and design principles that Facebook has used to address these challenges. In addition, I will discuss emerging technologies that offer new opportunities for building cost-effective high performance web architectures. There's a lot that's interesting about this talk that we'll get into later, but I thought you might want a head start on learning how Facebook handles 30K+ machines,
6 0.047855668 624 high scalability-2009-06-10-Hive - A Petabyte Scale Data Warehouse using Hadoop
7 0.046420757 617 high scalability-2009-06-04-New Book: Even Faster Web Sites: Performance Best Practices for Web Developers
8 0.045090701 1105 high scalability-2011-08-25-The Cloud and The Consumer: The Impact on Bandwidth and Broadband
9 0.044866722 845 high scalability-2010-06-22-Exploring the software behind Facebook, the world’s largest site
10 0.041433766 85 high scalability-2007-09-08-Making the case for PHP at Yahoo! (Oct 2002)
11 0.038405549 886 high scalability-2010-08-24-21 Quality Screencasts on Scaling Rails
12 0.036512408 273 high scalability-2008-03-09-Best Practices for Speeding Up Your Web Site
13 0.036219873 450 high scalability-2008-11-24-Scalability Perspectives #3: Marc Andreessen – Internet Platforms
14 0.035678554 244 high scalability-2008-02-11-Yahoo Live's Scaling Problems Prove: Release Early and Often - Just Don't Screw Up
15 0.03486418 1490 high scalability-2013-07-12-Stuff The Internet Says On Scalability For July 12, 2013
16 0.03401947 1558 high scalability-2013-12-04-How Can Batching Requests Actually Reduce Latency?
17 0.032813743 1371 high scalability-2012-12-12-Pinterest Cut Costs from $54 to $20 Per Hour by Automatically Shutting Down Systems
18 0.032636046 1389 high scalability-2013-01-18-Stuff The Internet Says On Scalability For January 18, 2013
19 0.032513063 509 high scalability-2009-02-05-Product: HAProxy - The Reliable, High Performance TCP-HTTP Load Balancer
20 0.032225817 1123 high scalability-2011-09-23-The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…
topicId topicWeight
[(0, 0.021), (1, 0.008), (2, 0.011), (3, 0.005), (4, 0.017), (5, -0.012), (6, -0.023), (7, 0.017), (8, 0.014), (9, 0.031), (10, -0.002), (11, 0.002), (12, 0.018), (13, -0.012), (14, -0.014), (15, 0.012), (16, 0.02), (17, -0.007), (18, 0.009), (19, 0.005), (20, 0.031), (21, 0.047), (22, 0.025), (23, 0.007), (24, 0.015), (25, 0.006), (26, -0.001), (27, -0.012), (28, 0.022), (29, -0.011), (30, -0.016), (31, 0.01), (32, 0.013), (33, 0.037), (34, -0.001), (35, 0.013), (36, 0.003), (37, -0.012), (38, -0.017), (39, -0.015), (40, -0.001), (41, 0.014), (42, 0.013), (43, -0.012), (44, -0.012), (45, 0.006), (46, 0.018), (47, -0.021), (48, -0.028), (49, 0.025)]
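The simValue numbers in the listings before and after this vector are presumably similarity scores computed between per-post weight vectors such as the one above; the exact measure used by the mining pipeline is not stated. A minimal cosine-similarity sketch under that assumption, with hypothetical truncated vectors:

```python
# Minimal sketch, assuming the simValue scores are cosine similarities
# between per-post weight vectors like the topic-weight vector listed above.
# The actual similarity measure used by the mining pipeline is not shown.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical truncated topic-weight vectors for two posts.
post_a = [0.021, 0.008, 0.011, 0.005, 0.017]
post_b = [0.019, 0.010, 0.009, 0.004, 0.020]
print(round(cosine_similarity(post_a, post_b), 3))   # near 1.0 for similar topic mixes
```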
simIndex simValue blogId blogTitle
same-blog 1 0.96293628 599 high scalability-2009-05-14-Who Has the Most Web Servers?
Introduction: An interesting post on DataCenterKnowledge! 1&1 Internet: 55,000 servers Rackspace: 50,038 servers The Planet: 48,500 servers Akamai Technologies: 48,000 servers OVH: 40,000 servers SBC Communications: 29,193 servers Verizon: 25,788 servers Time Warner Cable: 24,817 servers SoftLayer: 21,000 servers AT&T: 20,268 servers iWeb: 10,000 servers How about Google, Microsoft, Amazon, eBay, Yahoo, GoDaddy, Facebook? Check out the post on DataCenterKnowledge and of course here on highscalability.com!
2 0.7048493 624 high scalability-2009-06-10-Hive - A Petabyte Scale Data Warehouse using Hadoop
Introduction: This post about using Hive and Hadoop for analytics comes straight from Facebook engineers. Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook - both engineering and non-engineering. Apart from ad hoc analysis and business intelligence applications used by analysts across the company, a number of Facebook products are also based on analytics. These products range from simple reporting applications like Insights for the Facebook Ad Network to more advanced kinds such as Facebook's Lexicon product. As a result, a flexible infrastructure that caters to the needs of these diverse applications and users, and that also scales up in a cost-effective manner with the ever-increasing amounts of data being generated on Facebook, is critical. Hive and Hadoop are the technologies that we have used to address these requirements at Facebook. Read the rest of the article on Engineering @ Facebook's Notes page
3 0.69742084 596 high scalability-2009-05-11-Facebook, Hadoop, and Hive
Introduction: Facebook has the second largest installation of Hadoop (a software platform that lets one easily write and run applications that process vast amounts of data), with Yahoo having the largest. Learn how they do it and what the challenges are on the DBMS2 blog, a blog for people who care about database and analytic technologies.
4 0.69030976 845 high scalability-2010-06-22-Exploring the software behind Facebook, the world’s largest site
Introduction: Peter Alguacil at Pingdom wrote a HighScalability-worthy article on Facebook's architecture: Exploring the software behind Facebook, the world’s largest site. It covers the challenges Facebook faces, the software it uses, and the techniques it employs to keep on scaling. Definitely worth a look.
5 0.67123556 720 high scalability-2009-10-12-High Performance at Massive Scale – Lessons learned at Facebook
Introduction: Jeff Rothschild, Vice President of Technology at Facebook, gave a great presentation at UC San Diego on our favorite subject: "High Performance at Massive Scale – Lessons learned at Facebook". The abstract for the talk is: Facebook has grown into one of the largest sites on the Internet today, serving over 200 billion pages per month. The nature of social data makes engineering a site for this level of scale a particularly challenging proposition. In this presentation, I will discuss the aspects of social data that present challenges for scalability and will describe the core architectural components and design principles that Facebook has used to address these challenges. In addition, I will discuss emerging technologies that offer new opportunities for building cost-effective high performance web architectures. There's a lot that's interesting about this talk that we'll get into later, but I thought you might want a head start on learning how Facebook handles 30K+ machines,
6 0.64542007 1081 high scalability-2011-07-18-Building your own Facebook Realtime Analytics System
7 0.59000409 1323 high scalability-2012-09-15-4 Reasons Facebook Dumped HTML5 and Went Native
8 0.57730639 562 high scalability-2009-04-10-Facebook's Aditya giving presentation on Facebook Architecture
9 0.57640678 405 high scalability-2008-10-07-Help a Scoble out. What should Robert ask in his scalability interview?
10 0.56122655 966 high scalability-2010-12-31-Facebook in 20 Minutes: 2.7M Photos, 10.2M Comments, 4.6M Messages
11 0.55902648 1123 high scalability-2011-09-23-The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…
12 0.55386525 264 high scalability-2008-03-03-Read This Site and Ace Your Next Interview!
13 0.54218405 563 high scalability-2009-04-10-Facebook Chat Architecture
14 0.53357899 1619 high scalability-2014-03-26-Oculus Causes a Rift, but the Facebook Deal Will Avoid a Scaling Crisis for Virtual Reality
15 0.47788519 870 high scalability-2010-08-02-7 Scaling Strategies Facebook Used to Grow to 500 Million Users
16 0.45699137 700 high scalability-2009-09-10-The technology behind Tornado, FriendFeed's web server
17 0.45138097 943 high scalability-2010-11-16-Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month
18 0.42636317 1274 high scalability-2012-06-29-Stuff The Internet Says On Scalability For June 29, 2012 - The Velocity Edition
19 0.42528671 646 high scalability-2009-07-01-Podcast about Facebook's Cassandra Project and the New Wave of Distributed Databases
20 0.42195925 1100 high scalability-2011-08-18-Paper: The Akamai Network - 61,000 servers, 1,000 networks, 70 countries
topicId topicWeight
[(1, 0.063), (14, 0.668), (85, 0.035)]
simIndex simValue blogId blogTitle
same-blog 1 0.9473753 599 high scalability-2009-05-14-Who Has the Most Web Servers?
Introduction: An interesting post on DataCenterKnowledge! 1&1 Internet: 55,000 servers Rackspace: 50,038 servers The Planet: 48,500 servers Akamai Technologies: 48,000 servers OVH: 40,000 servers SBC Communications: 29,193 servers Verizon: 25,788 servers Time Warner Cable: 24,817 servers SoftLayer: 21,000 servers AT&T: 20,268 servers iWeb: 10,000 servers How about Google, Microsoft, Amazon, eBay, Yahoo, GoDaddy, Facebook? Check out the post on DataCenterKnowledge and of course here on highscalability.com!
2 0.82403642 441 high scalability-2008-11-13-CloudCamp London 2: private clouds and standardisation
Introduction: CloudCamp returned to London yesterday, organised with the help of Skills Matter at the Crypt on Clerkenwell Green. The main topics of this cloud/grid computing community meeting were service-level agreements, connecting private and public clouds, and standardisation issues.
3 0.64093763 405 high scalability-2008-10-07-Help a Scoble out. What should Robert ask in his scalability interview?
Introduction: One of the cool things about Mr. Scoble is he doesn't pretend to know everything, which can be a deadly boring affliction in this field. In this case Robert is asking for help in an upcoming interview. Maybe we can help? Here's Robert's plight: I’m really freaked out. I have one of the biggest interviews of my life coming up and I’m way underqualified to host it. It’s on Thursday and it’s about Scalability and Performance of Web Services. Look at who will be on. Matt Mullenweg, founder of Automattic, the company behind WordPress (and behind this blog). Paul Buchheit, one of the founders of FriendFeed and the creator of Gmail (he’s also the guy who gave Google the “don’t be evil” admonition). Nat Brown, CTO of iLike, which got six million users on Facebook in about 10 days. What would you ask?
4 0.50820494 725 high scalability-2009-10-21-Manage virtualized sprawl with VRMs
Introduction: The essence of my work is coming into daily contact with innovative technologies. A recent example was at the request of a partner company who wanted to answer: which one of these tools will best solve my virtualized datacenter headache? After initial analysis, all the products could be classified as tools that troubleshoot VM sprawl, but there was no universally accepted term for them. The most descriptive term that I found was Virtual Resource Manager (VRM) from DynamicOps. As I delved deeper into their workings, the distinction between VRMs and Private Clouds became blurred. What are the differences? Read more at: http://bigdatamatters.com/bigdatamatters/2009/10/cloud-vs-vrm.html
5 0.36870816 981 high scalability-2011-02-01-Google Strategy: Tree Distribution of Requests and Responses
Introduction: If a large number of leaf node machines send requests to a central root node, then that root node can become overwhelmed: the CPU becomes a bottleneck, for either processing requests or sending replies, because it can't possibly deal with the flood of requests. The network interface becomes a bottleneck because a wide fan-in causes TCP drops and retransmissions, which cause latency. Then clients start retrying requests, which quickly causes a spiral of death in an undisciplined system. One solution to this problem is a strategy given by Dr. Jeff Dean, Head of Google's School of Infrastructure Wizardry, in this Stanford video presentation: Tree Distribution of Requests and Responses. Instead of having a root node connected to leaves in a flat topology, the idea is to create a tree of nodes. So a root node talks to a number of parent nodes and the parent nodes talk to a number of leaf nodes. Requests are pushed down the tree through the parents and only hit a subset
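The shape of that topology can be sketched in a few lines: instead of every leaf talking directly to the root, each parent fans a request out to its own leaves and aggregates their replies before passing a single response upward, so no node sees the full fan-in. This is a minimal sketch only; the node names and the use of a simple sum as the aggregation step are illustrative assumptions, not details from the talk.

```python
# Minimal sketch of tree distribution of requests and responses: the root
# talks only to a few parents, each parent fans out to its own leaves and
# aggregates their replies. Node names and the sum() aggregation are
# illustrative assumptions.

def leaf_handle(leaf_id, request):
    """A leaf does its share of the work and returns a partial result."""
    return len(request) + leaf_id          # stand-in for real per-leaf work

def parent_handle(parent_id, leaf_ids, request):
    """A parent forwards the request to its leaves and aggregates the replies."""
    partials = [leaf_handle(leaf, request) for leaf in leaf_ids]
    return sum(partials)                    # aggregation happens below the root

def root_handle(tree, request):
    """The root contacts only its parents, never the full set of leaves."""
    partials = [parent_handle(p, leaves, request) for p, leaves in tree.items()]
    return sum(partials)

# A root with 3 parents, each responsible for 4 leaves: the root's fan-in is 3,
# not 12, which is the point of the tree topology.
tree = {p: list(range(p * 4, p * 4 + 4)) for p in range(3)}
print(root_handle(tree, "query terms"))
```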
6 0.3557156 1162 high scalability-2011-12-23-Funny: A Cautionary Tale About Storage and Backup
7 0.34757647 495 high scalability-2009-01-17-Intro to Caching,Caching algorithms and caching frameworks part 1
8 0.32044744 694 high scalability-2009-09-04-Hot Links for 2009-9-4
9 0.28026244 128 high scalability-2007-10-21-Paper: Standardizing Storage Clusters (with pNFS)
10 0.27592117 1253 high scalability-2012-05-28-The Anatomy of Search Technology: Crawling using Combinators
11 0.27255133 537 high scalability-2009-03-12-QCon London 2009: Database projects to watch closely
12 0.25630501 487 high scalability-2009-01-08-Paper: Sharding with Oracle Database
13 0.21264276 744 high scalability-2009-11-24-Hot Scalability Links for Nov 24 2009
14 0.16661222 1278 high scalability-2012-07-06-Stuff The Internet Says On Scalability For July 6, 2012
15 0.10656466 646 high scalability-2009-07-01-Podcast about Facebook's Cassandra Project and the New Wave of Distributed Databases
16 0.097384803 46 high scalability-2007-07-30-Product: Sun Utility Computing
17 0.093866177 905 high scalability-2010-09-21-Sponsored Post: Joyent, DeviantART, CloudSigma, ManageEngine, Site24x7
20 0.093144074 896 high scalability-2010-09-07-Sponsored Post: deviantART, Okta, CloudSigma, ManageEngine, Site24x7