high_scalability high_scalability-2009 high_scalability-2009-596 knowledge-graph by maker-knowledge-mining
Introduction: Facebook has the second largest installation of Hadoop (a software platform that lets one easily write and run applications that process vast amounts of data), Yahoo being the first. Learn how they do it and what the challenges are on the DBMS2 blog, a blog for people who care about database and analytic technologies.
sentIndex sentText sentNum sentScore
1 Facebook has the second largest installation of Hadoop (a software platform that lets one easily write and run applications that process vast amounts of data), Yahoo being the first. [sent-1, score-2.328]
2 Learn how they do it and what the challenges are on the DBMS2 blog, a blog for people who care about database and analytic technologies. [sent-2, score-1.32]
wordName wordTfidf (topN-words)
[('blog', 0.369), ('analytic', 0.367), ('installation', 0.306), ('vast', 0.273), ('lets', 0.272), ('yahoo', 0.255), ('amounts', 0.242), ('care', 0.226), ('largest', 0.205), ('challenges', 0.193), ('hadoop', 0.191), ('technologies', 0.173), ('easily', 0.164), ('facebook', 0.154), ('platform', 0.154), ('second', 0.133), ('write', 0.127), ('process', 0.115), ('run', 0.098), ('people', 0.095), ('applications', 0.095), ('software', 0.093), ('database', 0.07), ('one', 0.051), ('data', 0.043)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 596 high scalability-2009-05-11-Facebook, Hadoop, and Hive
Introduction: Facebook has the second largest installation of Hadoop (a software platform that lets one easily write and run applications that process vast amounts of data), Yahoo being the first. Learn how they do it and what the challenges are on the DBMS2 blog, a blog for people who care about database and analytic technologies.
2 0.22988829 414 high scalability-2008-10-15-Hadoop - A Primer
Introduction: Hadoop is a distributed computing platform written in Java. It incorporates features similar to those of the Google File System and of MapReduce to process vast amounts of data. "Hadoop is a Free Java software framework that supports data intensive distributed applications running on large clusters of commodity computers. It enables applications to easily scale out to thousands of nodes and petabytes of data" (Wikipedia). What platform does Hadoop run on? Java 1.5.x or higher, preferably from Sun; Linux; Windows for development; Solaris.
3 0.20452347 627 high scalability-2009-06-11-Yahoo! Distribution of Hadoop
Introduction: Many people in the Apache Hadoop community have asked Yahoo! to publish the version of Apache Hadoop they test and deploy across their large Hadoop clusters. As a service to the Hadoop community, Yahoo is releasing the Yahoo! Distribution of Hadoop -- a source code distribution that is based entirely on code found in the Apache Hadoop project. This source distribution includes code patches that they have added to improve the stability and performance of their clusters. In all cases, these patches have already been contributed back to Apache, but they may not yet be available in an Apache release of Hadoop. Read more and get the Hadoop distribution from Yahoo
4 0.15782987 242 high scalability-2008-02-07-Looking for good business examples of compaines using Hadoop
Introduction: I have read the blog about Mailtrust/Rackspace as well as the interesting things with Google and Yahoo. Who else is using Hadoop/MapReduce to solve business problems? TIA johnmwillis.com
5 0.15338388 624 high scalability-2009-06-10-Hive - A Petabyte Scale Data Warehouse using Hadoop
Introduction: This post about using Hive and Hadoop for analytics comes straight from Facebook engineers. Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook - both engineering and non-engineering. Apart from ad hoc analysis and business intelligence applications used by analysts across the company, a number of Facebook products are also based on analytics. These products range from simple reporting applications like Insights for the Facebook Ad Network to more advanced kinds such as Facebook's Lexicon product. As a result, a flexible infrastructure that caters to the needs of these diverse applications and users, and that also scales up in a cost-effective manner with the ever-increasing amounts of data being generated on Facebook, is critical. Hive and Hadoop are the technologies that we have used to address these requirements at Facebook. Read the rest of the article on Engineering @ Facebook's Notes page
6 0.15223026 845 high scalability-2010-06-22-Exploring the software behind Facebook, the world’s largest site
7 0.14641321 88 high scalability-2007-09-10-Blog: Scalable Web Architectures by Royans Tharakan
8 0.13315937 445 high scalability-2008-11-14-Useful Cloud Computing Blogs
9 0.12073368 415 high scalability-2008-10-15-Need help with your Hadoop deployment? This company may help!
10 0.11734603 601 high scalability-2009-05-17-Product: Hadoop
11 0.11709669 450 high scalability-2008-11-24-Scalability Perspectives #3: Marc Andreessen – Internet Platforms
12 0.11579649 720 high scalability-2009-10-12-High Performance at Massive Scale – Lessons learned at Facebook
13 0.11435291 243 high scalability-2008-02-07-clusteradmin.blogspot.com - blog about building and administering clusters
14 0.10981609 263 high scalability-2008-02-27-Product: System Imager - Automate Deployment and Installs
15 0.10951215 272 high scalability-2008-03-08-Product: FAI - Fully Automatic Installation
16 0.1055261 769 high scalability-2010-02-02-Scale out your identity management
17 0.10382088 254 high scalability-2008-02-19-Hadoop Getting Closer to 1.0 Release
18 0.1037994 617 high scalability-2009-06-04-New Book: Even Faster Web Sites: Performance Best Practices for Web Developers
19 0.10273136 829 high scalability-2010-05-20-Strategy: Scale Writes to 734 Million Records Per Day Using Time Partitioning
20 0.10202453 1123 high scalability-2011-09-23-The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…
topicId topicWeight
[(0, 0.121), (1, 0.006), (2, 0.026), (3, 0.024), (4, 0.063), (5, 0.017), (6, -0.005), (7, -0.027), (8, 0.082), (9, 0.147), (10, 0.043), (11, 0.029), (12, 0.085), (13, -0.032), (14, 0.008), (15, -0.022), (16, -0.018), (17, -0.058), (18, 0.031), (19, 0.068), (20, 0.07), (21, 0.139), (22, 0.098), (23, 0.028), (24, 0.047), (25, 0.021), (26, 0.099), (27, 0.02), (28, 0.03), (29, 0.01), (30, -0.021), (31, 0.053), (32, 0.054), (33, 0.091), (34, -0.011), (35, 0.019), (36, -0.047), (37, -0.034), (38, -0.01), (39, -0.039), (40, -0.011), (41, 0.004), (42, 0.057), (43, -0.047), (44, 0.054), (45, 0.025), (46, 0.135), (47, -0.039), (48, -0.072), (49, 0.119)]
simIndex simValue blogId blogTitle
same-blog 1 0.97398472 596 high scalability-2009-05-11-Facebook, Hadoop, and Hive
Introduction: Facebook has the second largest installation of Hadoop (a software platform that lets one easily write and run applications that process vast amounts of data), Yahoo being the first. Learn how they do it and what the challenges are on the DBMS2 blog, a blog for people who care about database and analytic technologies.
2 0.71593165 599 high scalability-2009-05-14-Who Has the Most Web Servers?
Introduction: An interesting post on DataCenterKnowledge! 1&1 Internet: 55,000 servers; Rackspace: 50,038 servers; The Planet: 48,500 servers; Akamai Technologies: 48,000 servers; OVH: 40,000 servers; SBC Communications: 29,193 servers; Verizon: 25,788 servers; Time Warner Cable: 24,817 servers; SoftLayer: 21,000 servers; AT&T: 20,268 servers; iWeb: 10,000 servers. How about Google, Microsoft, Amazon, eBay, Yahoo, GoDaddy, Facebook? Check out the post on DataCenterKnowledge and of course here on highscalability.com!
3 0.65577513 624 high scalability-2009-06-10-Hive - A Petabyte Scale Data Warehouse using Hadoop
Introduction: This post about using Hive and Hadoop for analytics comes straight from Facebook engineers. Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook - both engineering and non-engineering. Apart from ad hoc analysis and business intelligence applications used by analysts across the company, a number of Facebook products are also based on analytics. These products range from simple reporting applications like Insights for the Facebook Ad Network to more advanced kinds such as Facebook's Lexicon product. As a result, a flexible infrastructure that caters to the needs of these diverse applications and users, and that also scales up in a cost-effective manner with the ever-increasing amounts of data being generated on Facebook, is critical. Hive and Hadoop are the technologies that we have used to address these requirements at Facebook. Read the rest of the article on Engineering @ Facebook's Notes page
4 0.63554305 627 high scalability-2009-06-11-Yahoo! Distribution of Hadoop
Introduction: Many people in the Apache Hadoop community have asked Yahoo! to publish the version of Apache Hadoop they test and deploy across their large Hadoop clusters. As a service to the Hadoop community, Yahoo is releasing the Yahoo! Distribution of Hadoop -- a source code distribution that is based entirely on code found in the Apache Hadoop project. This source distribution includes code patches that they have added to improve the stability and performance of their clusters. In all cases, these patches have already been contributed back to Apache, but they may not yet be available in an Apache release of Hadoop. Read more and get the Hadoop distribution from Yahoo
5 0.632092 414 high scalability-2008-10-15-Hadoop - A Primer
Introduction: Hadoop is a distributed computing platform written in Java. It incorporates features similar to those of the Google File System and of MapReduce to process vast amounts of data. "Hadoop is a Free Java software framework that supports data intensive distributed applications running on large clusters of commodity computers. It enables applications to easily scale out to thousands of nodes and petabytes of data" (Wikipedia). What platform does Hadoop run on? Java 1.5.x or higher, preferably from Sun; Linux; Windows for development; Solaris.
6 0.63208896 415 high scalability-2008-10-15-Need help with your Hadoop deployment? This company may help!
7 0.57801986 650 high scalability-2009-07-02-Product: Hbase
8 0.56310844 601 high scalability-2009-05-17-Product: Hadoop
9 0.56208688 443 high scalability-2008-11-14-Paper: Pig Latin: A Not-So-Foreign Language for Data Processing
10 0.55756265 1081 high scalability-2011-07-18-Building your own Facebook Realtime Analytics System
11 0.55309761 968 high scalability-2011-01-04-Map-Reduce With Ruby Using Hadoop
12 0.54577404 1265 high scalability-2012-06-15-Stuff The Internet Says On Scalability For June 15, 2012
13 0.52715278 405 high scalability-2008-10-07-Help a Scoble out. What should Robert ask in his scalability interview?
14 0.52565008 851 high scalability-2010-07-02-Hot Scalability Links for July 2, 2010
15 0.52421838 1076 high scalability-2011-07-08-Stuff The Internet Says On Scalability For July 8, 2011
16 0.52154136 254 high scalability-2008-02-19-Hadoop Getting Closer to 1.0 Release
17 0.51283544 943 high scalability-2010-11-16-Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month
18 0.50716293 720 high scalability-2009-10-12-High Performance at Massive Scale – Lessons learned at Facebook
19 0.50248414 649 high scalability-2009-07-02-Product: Facebook's Cassandra - A Massive Distributed Store
20 0.49664247 88 high scalability-2007-09-10-Blog: Scalable Web Architectures by Royans Tharakan
topicId topicWeight
[(1, 0.341), (2, 0.057), (30, 0.09), (61, 0.117), (79, 0.221)]
simIndex simValue blogId blogTitle
Introduction: Where do you draw the line between scalability, performance, high availability, and reliability? I guess at the end of the day we all want high availability, great performance, and constant reliability. So is it safe to say that scalability is the answer? Also, when do you start to think about scaling out vs. scaling up?
2 0.98317206 305 high scalability-2008-04-21-Google App Engine - what about existing applications?
Introduction: Recently, Google announced Google App Engine, another announcement in the rapidly growing world of cloud computing. This brings up some very serious questions: 1. If we want to take advantage of one of the clouds, are we doomed to be locked in for life? 2. Must we rewrite our existing applications to use the cloud? 3. Do we need to learn a brand new technology or language for the cloud? This post presents a pattern that will enable us to abstract our application code from the underlying cloud provider infrastructure. This will enable us to easily migrate our EXISTING applications to a cloud-based environment, thus avoiding the need for a complete rewrite.
same-blog 3 0.95589012 596 high scalability-2009-05-11-Facebook, Hadoop, and Hive
Introduction: Facebook has the second largest installation of Hadoop (a software platform that lets one easily write and run applications that process vast amounts of data), Yahoo being the first. Learn how they do it and what the challenges are on the DBMS2 blog, a blog for people who care about database and analytic technologies.
4 0.95021284 1056 high scalability-2011-06-09-Retrospect on recent AWS outage and Resilient Cloud-Based Architecture
Introduction: A bit over a month ago Amazon experienced its infamous AWS outage in the US East Region. As a cloud evangelist, I was intrigued by the history of the outage as it occurred. There were great posts during and after the outage from those who went down. But more interesting for me as an architect were the detailed posts of those who managed to survive the outage relatively unharmed, such as SimpleGeo, Netflix, SmugMug, SmugMug’s CTO, Twilio, Bizo and others. Reading through the experience of others, I tried to summarize the patterns, principles and best practices that emerged from these posts, as I believe we can learn a lot from them on how to design our business applications to truly leverage the benefits that the cloud offers in high availability and scalability. The main principles, patterns and best practices are: Design for failure; Stateless and autonomous services; Redundant hot copies spread across zones; Spread across several public cloud vendors and/or
5 0.94869244 480 high scalability-2008-12-30-Scalability Perspectives #5: Werner Vogels – The Amazon Technology Platform
Introduction: Scalability Perspectives is a series of posts that highlights the ideas that will shape the next decade of IT architecture. Each post is dedicated to a thought leader of the information age and his vision of the future. Be warned though – the journey into the minds and perspectives of these people requires an open mind. Werner Vogels Dr. Werner Vogels is Vice President & Chief Technology Officer at Amazon.com where he is responsible for driving the company’s technology vision, which is to continuously enhance the innovation on behalf of Amazon’s customers at a global scale. Prior to joining Amazon, he worked as a researcher at Cornell University where he was a principal investigator in several research projects that target the scalability and robustness of mission-critical enterprise computing systems. He is regarded as one of the world's top experts on ultra-scalable systems and he uses his weblog to educate the community about issues such as eventual consistency. Information
6 0.94345391 273 high scalability-2008-03-09-Best Practices for Speeding Up Your Web Site
7 0.93489337 40 high scalability-2007-07-30-Product: Amazon Elastic Compute Cloud
8 0.9296239 51 high scalability-2007-07-31-Book: Scalable Internet Architectures
10 0.92721522 693 high scalability-2009-09-03-Storage Systems for High Scalable Systems presentation
11 0.92354548 539 high scalability-2009-03-16-Books: Web 2.0 Architectures and Cloud Application Architectures
12 0.91199154 617 high scalability-2009-06-04-New Book: Even Faster Web Sites: Performance Best Practices for Web Developers
13 0.91187716 184 high scalability-2007-12-13-Amazon SimpleDB - Scalable Cloud Database
14 0.91135949 114 high scalability-2007-10-07-Product: Wackamole
15 0.90915275 410 high scalability-2008-10-13-SQL Server 2008 Database Performance and Scalability
16 0.90904307 588 high scalability-2009-05-04-STRUCTURE 09 IS BACK!
17 0.89725196 1115 high scalability-2011-09-14-Big List of Scalabilty Conferences
18 0.89323866 1515 high scalability-2013-09-11-Ten Lessons from GitHub’s First Year in 2008
19 0.89270771 245 high scalability-2008-02-12-Product: rPath - Creating and Managing Virtual Appliances
20 0.89224267 813 high scalability-2010-04-19-The cost of High Availability (HA) with Oracle