high_scalability high_scalability-2011 high_scalability-2011-1081 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Recently, I was reading Todd Hoff's write-up on Facebook's real-time analytics system. As usual, Todd did an excellent job of summarizing this video from Alex Himel, Engineering Manager at Facebook. In the first post, I’d like to summarize the case study and consider some things that weren't mentioned in the summaries. This will lead to an architecture for building your own Real Time Analytics for Big Data that might be easier to implement, using Facebook's experience as a starting point and guide, as well as the experience gathered through recent work with a few GigaSpaces customers. The second post provides a summary of that new approach as well as a pattern and a demo for building your own Real Time Analytics system. References: Real Time Analytics for Big Data: Facebook's New Realtime Analytics System; Real Time Analytics for Big Data: An Alternative Approach
sentIndex sentText sentNum sentScore
1 Recently, I was reading Todd Hoff's write-up on Facebook's real-time analytics system. [sent-1, score-0.827]
2 As usual, Todd did an excellent job of summarizing this video from Alex Himel, Engineering Manager at Facebook. [sent-2, score-0.209]
3 In the first post, I’d like to summarize the case study and consider some things that weren't mentioned in the summaries. [sent-3, score-0.57]
4 This will lead to an architecture for building your own Real Time Analytics for Big Data that might be easier to implement, using Facebook's experience as a starting point and guide, as well as the experience gathered through recent work with a few GigaSpaces customers. [sent-4, score-1.252]
5 The second post provides a summary of that new approach as well as a pattern and a demo for building your own Real Time Analytics system. [sent-5, score-0.758]
wordName wordTfidf (topN-words)
[('analytics', 0.516), ('todd', 0.355), ('realtime', 0.282), ('facebook', 0.265), ('hoff', 0.191), ('summarizing', 0.187), ('gathered', 0.17), ('demo', 0.16), ('summarize', 0.147), ('alex', 0.136), ('gigaspaces', 0.127), ('time', 0.124), ('mentioned', 0.123), ('usual', 0.122), ('guide', 0.117), ('study', 0.117), ('approach', 0.116), ('experience', 0.111), ('pattern', 0.106), ('alternative', 0.105), ('real', 0.1), ('summary', 0.099), ('building', 0.097), ('recent', 0.096), ('starting', 0.094), ('recently', 0.093), ('lead', 0.088), ('reading', 0.087), ('manager', 0.086), ('big', 0.085), ('well', 0.083), ('easier', 0.083), ('consider', 0.081), ('implement', 0.08), ('video', 0.078), ('engineering', 0.072), ('excellent', 0.072), ('case', 0.061), ('job', 0.059), ('might', 0.053), ('point', 0.05), ('post', 0.049), ('new', 0.048), ('things', 0.048), ('architecture', 0.042), ('first', 0.04), ('data', 0.037), ('work', 0.034), ('using', 0.023), ('like', 0.021)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 1081 high scalability-2011-07-18-Building your own Facebook Realtime Analytics System
2 0.27420694 14 high scalability-2007-07-15-Web Analytics: An Hour a Day
Introduction: Web Analytics: An Hour A Day is the first book by an in-the-trenches practitioner of web analytics. It provides a unique insider’s perspective of the challenges and opportunities that web analytics presents to each person who touches the Web in your organization. Rather than spamming you with metrics and definitions, Web Analytics: An Hour A Day will enhance your mindset and teach you how to fish for yourself. Avinash Kaushik is an expert in web analytics and author of the top-rated blog Occam’s Razor (http://www.kaushik.net/avinash). In this book, he goes beyond web analytics concepts and definitions to provide a step-by-step guide to implementing a successful web analytics strategy. His revolutionary approach to web analytics challenges prevalent thinking about the field and guides readers to a solution that will provide truly informed and actionable insights.
3 0.25601077 9 high scalability-2007-07-15-Blog: Occam’s Razor by Avinash Kaushik
Introduction: Author of Web Analytics: An Hour a Day. Has a fresh and practical take on unlocking the power of web research and web analytics to create truly data-driven organizations for gaining a strategic competitive advantage. A Quick Hit of What's Inside: Find Your Web Analytics Soul Mate (How To Run An Effective Tool Pilot), AK’s Web Analytics Tool Evaluation “Tips From A Tough Life”, Web Analytics Data Sampling 411, Six Data Visualizations That Rock!, Why “looking beyond the click” to optimize the experience is so necessary. Site: http://www.kaushik.net/avinash/
4 0.21211916 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
Introduction: Facebook did it again. They've built another system capable of doing something useful with ginormous streams of realtime data. Last time we saw Facebook release their New Real-Time Messaging System: HBase To Store 135+ Billion Messages A Month. This time it's a realtime analytics system handling over 20 billion events per day (200,000 events per second) with a lag of less than 30 seconds. Alex Himel, Engineering Manager at Facebook, explains what they've built (video) and the scale required: Social plugins have become an important and growing source of traffic for millions of websites over the past year. We released a new version of Insights for Websites last week to give site owners better analytics on how people interact with their content and to help them optimize their websites in real time. To accomplish this, we had to engineer a system that could process over 20 billion events per day (200,000 events per second) with a lag of less than 30 seconds. Alex does a
5 0.18157719 624 high scalability-2009-06-10-Hive - A Petabyte Scale Data Warehouse using Hadoop
Introduction: This post about using Hive and Hadoop for analytics comes straight from Facebook engineers. Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook, both engineering and non-engineering. Apart from ad hoc analysis and business intelligence applications used by analysts across the company, a number of Facebook products are also based on analytics. These products range from simple reporting applications like Insights for the Facebook Ad Network to more advanced kinds such as Facebook's Lexicon product. As a result, a flexible infrastructure that caters to the needs of these diverse applications and users, and that also scales up in a cost-effective manner with the ever-increasing amounts of data being generated on Facebook, is critical. Hive and Hadoop are the technologies that we have used to address these requirements at Facebook. Read the rest of the article on Engineering @ Facebook's Notes page
6 0.14757282 845 high scalability-2010-06-22-Exploring the software behind Facebook, the world’s largest site
7 0.13748172 720 high scalability-2009-10-12-High Performance at Massive Scale – Lessons learned at Facebook
8 0.1330547 1123 high scalability-2011-09-23-The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…
9 0.1306259 1251 high scalability-2012-05-24-Build your own twitter like real time analytics - a step by step guide
10 0.10694929 1444 high scalability-2013-04-23-Facebook Secrets of Web Performance
11 0.10557486 1638 high scalability-2014-04-28-How Disqus Went Realtime with 165K Messages Per Second and Less than .2 Seconds Latency
12 0.10315812 943 high scalability-2010-11-16-Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month
16 0.10068903 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
17 0.10022787 464 high scalability-2008-12-13-Strategy: Facebook Tweaks to Handle 6 Time as Many Memcached Requests
18 0.094931528 264 high scalability-2008-03-03-Read This Site and Ace Your Next Interview!
20 0.092205495 1644 high scalability-2014-05-07-Update on Disqus: It's Still About Realtime, But Go Demolishes Python
topicId topicWeight
[(0, 0.116), (1, 0.042), (2, 0.033), (3, 0.006), (4, 0.056), (5, -0.011), (6, -0.086), (7, 0.047), (8, 0.077), (9, 0.06), (10, 0.037), (11, 0.061), (12, 0.064), (13, -0.007), (14, -0.035), (15, 0.014), (16, 0.092), (17, -0.022), (18, 0.007), (19, 0.017), (20, 0.083), (21, 0.089), (22, 0.095), (23, 0.049), (24, 0.061), (25, -0.057), (26, 0.018), (27, -0.07), (28, 0.089), (29, 0.04), (30, -0.075), (31, -0.048), (32, 0.056), (33, 0.059), (34, -0.063), (35, 0.039), (36, 0.028), (37, -0.069), (38, 0.0), (39, 0.023), (40, 0.013), (41, 0.019), (42, 0.058), (43, -0.014), (44, 0.069), (45, -0.003), (46, -0.004), (47, -0.075), (48, 0.048), (49, -0.023)]
simIndex simValue blogId blogTitle
same-blog 1 0.98024482 1081 high scalability-2011-07-18-Building your own Facebook Realtime Analytics System
2 0.81019646 624 high scalability-2009-06-10-Hive - A Petabyte Scale Data Warehouse using Hadoop
Introduction: This post about using Hive and Hadoop for analytics comes straight from Facebook engineers. Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook, both engineering and non-engineering. Apart from ad hoc analysis and business intelligence applications used by analysts across the company, a number of Facebook products are also based on analytics. These products range from simple reporting applications like Insights for the Facebook Ad Network to more advanced kinds such as Facebook's Lexicon product. As a result, a flexible infrastructure that caters to the needs of these diverse applications and users, and that also scales up in a cost-effective manner with the ever-increasing amounts of data being generated on Facebook, is critical. Hive and Hadoop are the technologies that we have used to address these requirements at Facebook. Read the rest of the article on Engineering @ Facebook's Notes page
3 0.7802645 845 high scalability-2010-06-22-Exploring the software behind Facebook, the world’s largest site
Introduction: Peter Alguacil at Pingdom wrote a HighScalability worthy article on Facebook's architecture: Exploring the software behind Facebook, the world’s largest site . It covers the challenges Facebook faces, the software Facebook uses, and the techniques Facebook uses to keep on scaling. Definitely worth a look.
4 0.75517464 720 high scalability-2009-10-12-High Performance at Massive Scale – Lessons learned at Facebook
Introduction: Jeff Rothschild, Vice President of Technology at Facebook, gave a great presentation at UC San Diego on our favorite subject: "High Performance at Massive Scale – Lessons learned at Facebook". The abstract for the talk is: Facebook has grown into one of the largest sites on the Internet today serving over 200 billion pages per month. The nature of social data makes engineering a site for this level of scale a particularly challenging proposition. In this presentation, I will discuss the aspects of social data that present challenges for scalability and will describe the core architectural components and design principles that Facebook has used to address these challenges. In addition, I will discuss emerging technologies that offer new opportunities for building cost-effective high performance web architectures. There's a lot that's interesting about this talk that we'll get into later, but I thought you might want a head start on learning how Facebook handles 30K+ machines,
5 0.74249583 599 high scalability-2009-05-14-Who Has the Most Web Servers?
Introduction: An interesting post on DataCenterKnowledge! 1&1 Internet: 55,000 servers; Rackspace: 50,038 servers; The Planet: 48,500 servers; Akamai Technologies: 48,000 servers; OVH: 40,000 servers; SBC Communications: 29,193 servers; Verizon: 25,788 servers; Time Warner Cable: 24,817 servers; SoftLayer: 21,000 servers; AT&T: 20,268 servers; iWeb: 10,000 servers. How about Google, Microsoft, Amazon, eBay, Yahoo, GoDaddy, Facebook? Check out the post on DataCenterKnowledge and of course here on highscalability.com!
6 0.74192882 1323 high scalability-2012-09-15-4 Reasons Facebook Dumped HTML5 and Went Native
8 0.69129026 264 high scalability-2008-03-03-Read This Site and Ace Your Next Interview!
9 0.68540633 562 high scalability-2009-04-10-Facebook's Aditya giving presentation on Facebook Architecture
10 0.68066758 966 high scalability-2010-12-31-Facebook in 20 Minutes: 2.7M Photos, 10.2M Comments, 4.6M Messages
11 0.63736123 1016 high scalability-2011-04-04-Scaling Social Ecommerce Architecture Case study
12 0.63735962 646 high scalability-2009-07-01-Podcast about Facebook's Cassandra Project and the New Wave of Distributed Databases
13 0.63103342 1123 high scalability-2011-09-23-The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…
14 0.62721509 870 high scalability-2010-08-02-7 Scaling Strategies Facebook Used to Grow to 500 Million Users
15 0.62612826 1444 high scalability-2013-04-23-Facebook Secrets of Web Performance
16 0.61610305 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
17 0.59942979 943 high scalability-2010-11-16-Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month
18 0.58889437 378 high scalability-2008-09-03-Some Facebook Secrets to Better Operations
19 0.58069116 464 high scalability-2008-12-13-Strategy: Facebook Tweaks to Handle 6 Time as Many Memcached Requests
20 0.57678956 596 high scalability-2009-05-11-Facebook, Hadoop, and Hive
topicId topicWeight
[(1, 0.161), (2, 0.18), (7, 0.199), (30, 0.196), (85, 0.039), (94, 0.083)]
simIndex simValue blogId blogTitle
same-blog 1 0.87669009 1081 high scalability-2011-07-18-Building your own Facebook Realtime Analytics System
2 0.7982704 263 high scalability-2008-02-27-Product: System Imager - Automate Deployment and Installs
Introduction: From their website: SystemImager is software that makes the installation of Linux to masses of similar machines relatively easy. It makes software distribution, configuration, and operating system updates easy, and can also be used for content distribution. SystemImager makes it easy to do automated installs (clones), software distribution, content or data distribution, configuration changes, and operating system updates to your network of Linux machines. You can even update from one Linux release version to another! It can also be used to ensure safe production deployments. By saving your current production image before updating to your new production image, you have a highly reliable contingency mechanism. If the new production environment is found to be flawed, simply roll back to the last production image with a simple update command! Some typical environments include: Internet server farms, database server farms, high performance clusters, computer labs, and corporate
3 0.79308271 783 high scalability-2010-02-24-Hot Scalability Links for February 24, 2010
Introduction: Cassandra @ Twitter: An Interview with Ryan King. Great interview by Alex Popescu on Twitter's thought process for switching to Cassandra. Twitter chose Cassandra because it had more big system features out of the box. Is that Cassandra FTW? I Had Downtime Today. Here’s What I’m Doing About It by Patrick McKenzie. Awesome deep dive into what went wrong with Bingo Card Creator. Sh*t happens. How do you design a process to help prevent it from happening and how do you deal with problems with integrity when they do? High Availability Principle: Request Queueing by Ashish Soni. Queue requests to ride out traffic spikes: 1) Request Queuing allows your system to operate at optimal throughput. 2) Your users only experience linear degradation versus exponential degradation. 3) Your system experiences NO degradation. pfffft twatter tweeter by Knowbuddy. The reason you should care [about NoSQL] is because now you have more options--you're not stuck trying to wedge your system into
4 0.78912455 1016 high scalability-2011-04-04-Scaling Social Ecommerce Architecture Case study
Introduction: A recent study showed that over 92 percent of executives from leading retailers are focusing their marketing efforts on Facebook and subsequent applications. Furthermore, over 71 percent of users have confirmed they are more likely to make a purchase after “liking” a brand they find online. (source) Sears Architect Tomer Gabel provides an insightful overview of how they built a Social Ecommerce solution for Sears.com that can handle complex relationship queries in real time. Tomer goes through: the architectural considerations behind their solution; why they chose memory over disk; how they partitioned the data to gain scalability; why they chose to execute code with the data using the GigaSpaces Map/Reduce execution framework; how they integrated with Facebook; why they chose GigaSpaces over Coherence and Terracotta for in-memory caching and scale. In this post I tried to summarize the main takeaways from the interview. You can also watch the full interview (highly reco
5 0.78690028 500 high scalability-2009-01-22-Heterogeneous vs. Homogeneous System Architectures
Introduction: I follow a certain philosophy when developing system architectures. I assume that very few systems will ever exist in a consistent form for more than a short period of time. What constitutes a “short period of time” differs depending on the specifics of each system, but in an effort to quantify it, I generally find that it falls somewhere between a week and a month. The driving forces behind the need for an ever-changing architecture are largely based on business requirements. This is a side effect of the reality that software development, in most cases, plays a supporting role within the business unit it serves. As business requirements (i.e. additional features, new products, etc.) pour forth, it is the developer’s job to evolve their software system to accommodate these requirements and provide a software-based solution to whatever problems lie ahead. Given that many businesses can be identified as having the above characteristics, I can now begin to explain why I believe t
6 0.7831406 16 high scalability-2007-07-16-Book: High Performance MySQL
7 0.77827555 518 high scalability-2009-02-22-Building and Scaling a Startup on Rails: 12 Things We Learned the Hard Way
8 0.77605212 261 high scalability-2008-02-25-Make Your Site Run 10 Times Faster
9 0.7696842 68 high scalability-2007-08-20-TypePad Architecture
10 0.76066411 699 high scalability-2009-09-10-How to handle so many socket connection
11 0.76027197 308 high scalability-2008-04-22-Simple NFS failover solution with symbolic link?
12 0.75567853 788 high scalability-2010-03-04-How MySpace Tested Their Live Site with 1 Million Concurrent Users
13 0.75273144 182 high scalability-2007-12-12-Oracle Can Do Read-Write Splitting Too
14 0.75253564 319 high scalability-2008-05-14-Scaling an image upload service
15 0.74919069 334 high scalability-2008-05-29-Amazon Improves Diagonal Scaling Support with High-CPU Instances
16 0.74718839 1284 high scalability-2012-07-16-Cinchcast Architecture - Producing 1,500 Hours of Audio Every Day
17 0.74597347 26 high scalability-2007-07-25-Paper: Lightweight Web servers
18 0.74263132 991 high scalability-2011-02-16-Paper: An Experimental Investigation of the Akamai Adaptive Video Streaming
19 0.74132991 336 high scalability-2008-05-31-Biggest Under Reported Story: Google's BigTable Costs 10 Times Less than Amazon's SimpleDB
20 0.73580366 1459 high scalability-2013-05-16-Paper: Warp: Multi-Key Transactions for Key-Value Stores