high_scalability high_scalability-2012 high_scalability-2012-1175 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: How do you scale an inbox that has multiple highly volatile feeds? That's a problem faced by social networks like Tumblr, Facebook, and Twitter. Follow a few hundred event sources and it's hard to scalably order an inbox so that you see a correct view as event sources continually publish new events. This can be considered like a view materialization problem in a database. In a database a view is a virtual table defined by a query that can be accessed like a table. Materialization refers to when the data behind the view is created. If a view is a join on several tables and that join is performed when the view is accessed, then performance will be slow. If the view is precomputed, access to the view will be fast, but more resources are used, especially considering that the view may never be accessed. Your wall/inbox/stream is a view on all the people/things you follow. If you never look at your inbox, then materializing the view in your inbox is a waste of resources, yet you'll be mad if displaying your inbox takes forever because all your event streams must be read, sorted, and filtered.
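The two extremes of that tradeoff can be sketched in a few lines. This is a minimal illustration only, not code from the post or the paper; the stream data and function names are invented. The pull model computes the view at read time; the push model precomputes it on every publish.

```python
import heapq

# Event streams you follow: producer -> [(timestamp, event)]
streams = {
    "alice": [(5, "a1"), (9, "a2")],
    "bob":   [(7, "b1")],
}

def inbox_on_demand(followed, limit=10):
    """Pull: read, sort, and filter every stream at view time.

    Cheap writes, but reads get slower as the number of followed
    streams grows -- and the work is repeated on every page load.
    """
    events = [ev for producer in followed for ev in streams[producer]]
    return [e for _, e in heapq.nlargest(limit, events)]

# Push: keep a precomputed inbox, updated on every publish.
# Reads are fast, but the write-time work is wasted if the
# inbox is never displayed.
inbox = []

def publish_and_materialize(producer, ts, event):
    streams.setdefault(producer, []).append((ts, event))
    heapq.heappush(inbox, (-ts, event))  # max-heap via negated timestamp

def inbox_precomputed(limit=10):
    return [e for _, e in heapq.nsmallest(limit, inbox)]
```

Both return the same newest-first view; they differ only in when the merge-and-sort work is paid for, which is exactly the knob the paper tunes per producer.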
sentIndex sentText sentNum sentScore
1 How do you scale an inbox that has multiple highly volatile feeds? [sent-1, score-0.349]
2 Follow a few hundred event sources and it's hard to scalably order an inbox so that you see a correct view as event sources continually publish new events. [sent-3, score-1.371]
3 This can be considered like a view materialization problem in a database. [sent-4, score-0.488]
4 In a database a view is a virtual table defined by a query that can be accessed like a table. [sent-5, score-0.431]
5 Materialization refers to when the data behind the view is created. [sent-6, score-0.348]
6 If a view is a join on several tables and that join is performed when the view is accessed, then performance will be slow. [sent-7, score-0.688]
7 If the view is precomputed, access to the view will be fast, but more resources are used, especially considering that the view may never be accessed. [sent-8, score-0.917]
8 Your wall/inbox/stream is a view on all the people/things you follow. [sent-9, score-0.283]
9 If you never look at your inbox then materializing the view in your inbox is a waste of resources, yet you'll be mad if displaying your inbox takes forever because all your event streams must be read, sorted, and filtered. [sent-10, score-2.213]
10 What's a smart way of handling the materialization problem? [sent-11, score-0.145]
11 This technique minimizes system cost both for workloads with a high query rate and those with a high event rate. [sent-14, score-0.413]
12 I learned about this paper from Tumblr's Blake Matheny, in an interview with him for a forthcoming post. [sent-16, score-0.154]
13 This is broadly how they handle the inbox problem at Tumblr. [sent-17, score-0.477]
14 Abstract from the paper: Near real-time event streams are becoming a key feature of many popular web applications. [sent-19, score-0.461]
15 Many web sites allow users to create a personalized feed by selecting one or more event streams they wish to follow. [sent-20, score-0.873]
16 Examples include Twitter and Facebook, which allow a user to follow other users' activity, and iGoogle and My Yahoo, which allow users to follow selected RSS streams. [sent-21, score-0.425]
17 Constructing such a feed must be fast so the page loads quickly, yet reflect recent updates to the underlying event streams. [sent-23, score-0.523]
18 The wide fanout of popular streams (those with many followers) and high skew (fanout and update rates vary widely) make it difficult to scale such applications. [sent-24, score-0.4]
19 We associate feeds with consumers and event streams with producers. [sent-25, score-0.683]
20 We demonstrate that the best performance results from selectively materializing each consumer's feed: events from high-rate producers are retrieved at query time, while events from lower-rate producers are materialized in advance. [sent-26, score-1.155]
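The hybrid strategy from the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the paper's actual PNUTS-based implementation; the threshold value, class, and method names are all invented. Events from low-rate producers are pushed into followers' materialized feeds at publish time, while events from high-rate producers are pulled and merged at query time.

```python
import heapq
from collections import defaultdict

PUSH_THRESHOLD = 10.0  # events/hour; producers at or above this are pulled at query time

class FeedService:
    def __init__(self):
        self.follows = defaultdict(set)           # consumer -> set of producers
        self.producer_events = defaultdict(list)  # producer -> [(ts, event)]
        self.producer_rate = defaultdict(float)   # producer -> observed events/hour
        self.materialized = defaultdict(list)     # consumer -> [(ts, event)] pushed in advance

    def follow(self, consumer, producer):
        self.follows[consumer].add(producer)

    def publish(self, producer, ts, event):
        self.producer_events[producer].append((ts, event))
        # Low-rate producers: fan the event out to every follower's
        # materialized feed now (push), since such events are rare.
        if self.producer_rate[producer] < PUSH_THRESHOLD:
            for consumer, producers in self.follows.items():
                if producer in producers:
                    self.materialized[consumer].append((ts, event))

    def read_feed(self, consumer, limit=20):
        # High-rate producers are pulled at query time and merged with
        # the precomputed portion, so their frequent writes cost nothing
        # until someone actually looks at the feed.
        pulled = []
        for producer in self.follows[consumer]:
            if self.producer_rate[producer] >= PUSH_THRESHOLD:
                pulled.extend(self.producer_events[producer])
        merged = heapq.nlargest(limit, self.materialized[consumer] + pulled)
        return [event for _, event in merged]
```

The per-producer threshold is the knob: it trades write-time fanout cost against read-time merge cost, which is why the approach works for both query-heavy and event-heavy workloads.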
wordName wordTfidf (topN-words)
[('inbox', 0.349), ('materializing', 0.291), ('view', 0.283), ('event', 0.257), ('streams', 0.204), ('feed', 0.203), ('selectively', 0.185), ('feeds', 0.159), ('materialization', 0.145), ('frenzy', 0.141), ('fanout', 0.128), ('producers', 0.123), ('feeding', 0.12), ('tumblr', 0.113), ('yahoo', 0.113), ('events', 0.112), ('follow', 0.1), ('consumer', 0.088), ('rate', 0.084), ('aview', 0.084), ('producer', 0.079), ('cooper', 0.079), ('knob', 0.079), ('ramakrishnan', 0.079), ('paper', 0.078), ('allow', 0.077), ('accessed', 0.076), ('sources', 0.076), ('forthcoming', 0.076), ('silberstein', 0.076), ('matheny', 0.076), ('scalably', 0.073), ('blake', 0.073), ('results', 0.072), ('query', 0.072), ('users', 0.071), ('constructing', 0.07), ('broadly', 0.068), ('skew', 0.068), ('precomputed', 0.068), ('displaying', 0.067), ('refers', 0.065), ('materialized', 0.065), ('pnuts', 0.065), ('mad', 0.064), ('associate', 0.063), ('reflects', 0.063), ('join', 0.061), ('personalized', 0.061), ('problem', 0.06)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 1175 high scalability-2012-01-17-Paper: Feeding Frenzy: Selectively Materializing Users’ Event Feeds
2 0.20419709 1538 high scalability-2013-10-28-Design Decisions for Scaling Your High Traffic Feeds
Introduction: Guest post by Thierry Schellenbach, Founder/CTO of Fashiolista.com, follow @tschellenbach on Twitter and Github. Fashiolista started out as a hobby project which we built on the side. We had absolutely no idea it would grow into one of the largest online fashion communities. The entire first version took about two weeks to develop and our feed implementation was dead simple. We've come a long way since then and I'd like to share our experience with scaling feed systems. Feeds are a core component of many large startups such as Pinterest, Instagram, Wanelo and Fashiolista. At Fashiolista the feed system powers the flat feed, aggregated feed, and the notification system. This article will explain the troubles we ran into when scaling our feeds and the design decisions involved with building your own solution. Understanding the basics of how these feed systems work is essential as more and more applications rely on them. Furthermore, we've open sourced Feedly, the Python m
3 0.17731151 1461 high scalability-2013-05-20-The Tumblr Architecture Yahoo Bought for a Cool Billion Dollars
Introduction: It's being reported Yahoo bought Tumblr for $1.1 billion. You may recall Instagram was profiled on HighScalability and they were also bought by Facebook for a ton of money. A coincidence? You be the judge. Just what is Yahoo buying? The business acumen of the deal is not something I can judge, but if you are doing due diligence on the technology then Tumblr would probably get a big thumbs up. To see why, please keep on reading... With over 15 billion page views a month Tumblr has become an insanely popular blogging platform. Users may like Tumblr for its simplicity, its beauty, its strong focus on user experience, or its friendly and engaged community, but like it they do. Growing at over 30% a month has not been without challenges. Some reliability problems among them. It helps to realize that Tumblr operates at surprisingly huge scales: 500 million page views a day, a peak rate of ~40k requests per second, ~3TB of new data to store a day, all running on 1000+ servers. One of the common patt
4 0.17352824 1191 high scalability-2012-02-13-Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter
Introduction: With over 15 billion page views a month Tumblr has become an insanely popular blogging platform. Users may like Tumblr for its simplicity, its beauty, its strong focus on user experience, or its friendly and engaged community, but like it they do. Growing at over 30% a month has not been without challenges. Some reliability problems among them. It helps to realize that Tumblr operates at surprisingly huge scales: 500 million page views a day, a peak rate of ~40k requests per second, ~3TB of new data to store a day, all running on 1000+ servers. One of the common patterns across successful startups is the perilous chasm crossing from startup to wildly successful startup. Finding people, evolving infrastructures, servicing old infrastructures, while handling huge month over month increases in traffic, all with only four engineers, means you have to make difficult choices about what to work on. This was Tumblr's situation. Now with twenty engineers there's enough energy to work on issues an
6 0.12257134 141 high scalability-2007-11-05-Quick question about efficiently implementing Facebook 'news feed' like functionality
7 0.11606656 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
10 0.10685256 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
11 0.10547429 7 high scalability-2007-07-12-FeedBurner Architecture
12 0.10376027 1303 high scalability-2012-08-13-Ask HighScalability: Facing scaling issues with news feeds on Redis. Any advice?
14 0.10258079 406 high scalability-2008-10-08-Strategy: Flickr - Do the Essential Work Up-front and Queue the Rest
15 0.10166786 1359 high scalability-2012-11-15-Gone Fishin': Justin.Tv's Live Video Broadcasting Architecture
16 0.10030168 796 high scalability-2010-03-16-Justin.tv's Live Video Broadcasting Architecture
17 0.095356382 265 high scalability-2008-03-03-Two data streams for a happy website
18 0.095212065 676 high scalability-2009-08-08-Yahoo!'s PNUTS Database: Too Hot, Too Cold or Just Right?
19 0.092176259 1123 high scalability-2011-09-23-The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…
20 0.090529472 1000 high scalability-2011-03-08-Medialets Architecture - Defeating the Daunting Mobile Device Data Deluge
topicId topicWeight
[(0, 0.143), (1, 0.066), (2, -0.007), (3, -0.039), (4, 0.031), (5, 0.006), (6, -0.04), (7, 0.038), (8, 0.016), (9, 0.004), (10, 0.053), (11, 0.083), (12, 0.024), (13, -0.043), (14, -0.032), (15, 0.062), (16, -0.031), (17, -0.076), (18, 0.032), (19, -0.02), (20, -0.014), (21, 0.011), (22, -0.001), (23, 0.014), (24, 0.005), (25, -0.009), (26, -0.042), (27, 0.038), (28, 0.02), (29, -0.037), (30, 0.05), (31, -0.008), (32, 0.04), (33, 0.036), (34, -0.027), (35, 0.005), (36, 0.024), (37, -0.028), (38, -0.052), (39, -0.031), (40, -0.007), (41, 0.032), (42, -0.005), (43, -0.013), (44, 0.025), (45, 0.025), (46, -0.049), (47, -0.033), (48, 0.07), (49, 0.038)]
simIndex simValue blogId blogTitle
same-blog 1 0.94823557 1175 high scalability-2012-01-17-Paper: Feeding Frenzy: Selectively Materializing Users’ Event Feeds
2 0.70673347 1461 high scalability-2013-05-20-The Tumblr Architecture Yahoo Bought for a Cool Billion Dollars
3 0.70595902 1191 high scalability-2012-02-13-Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter
Introduction: Toy solutions solving Twitter's "problems" are a favorite scalability trope. Everybody has this idea that Twitter is easy. With a little architectural hand waving we have a scalable Twitter, just that simple. Well, it's not that simple, as Raffi Krikorian, VP of Engineering at Twitter, describes in his superb and very detailed presentation on Timelines at Scale. If you want to know how Twitter works - then start here. It happened gradually so you may have missed it, but Twitter has grown up. It started as a struggling three-tierish Ruby on Rails website to become a beautifully service driven core that we actually go to now to see if other services are down. Quite a change. Twitter now has 150M world wide active users, handles 300K QPS to generate timelines, and a firehose that churns out 22 MB/sec. 400 million tweets a day flow through the system and it can take up to 5 minutes for a tweet to flow from Lady Gaga's fingers to her 31 million followers. A couple o
6 0.69270104 1538 high scalability-2013-10-28-Design Decisions for Scaling Your High Traffic Feeds
7 0.64960134 1148 high scalability-2011-11-29-DataSift Architecture: Realtime Datamining at 120,000 Tweets Per Second
9 0.61767608 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
11 0.6021288 141 high scalability-2007-11-05-Quick question about efficiently implementing Facebook 'news feed' like functionality
12 0.59723383 976 high scalability-2011-01-20-75% Chance of Scale - Leveraging the New Scaleogenic Environment for Growth
13 0.59542435 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
14 0.57897055 1476 high scalability-2013-06-14-Stuff The Internet Says On Scalability For June 14, 2013
15 0.57373106 1194 high scalability-2012-02-16-A Super Short on the Youporn Stack - 300K QPS and 100 Million Page Views Per Day
16 0.56870502 1228 high scalability-2012-04-16-Instagram Architecture Update: What’s new with Instagram?
17 0.56832039 774 high scalability-2010-02-08-How FarmVille Scales to Harvest 75 Million Players a Month
18 0.5676623 431 high scalability-2008-10-27-Notify.me Architecture - Synchronicity Kills
19 0.56760103 715 high scalability-2009-10-06-10 Ways to Take your Site from One to One Million Users by Kevin Rose
20 0.56524408 1356 high scalability-2012-11-07-Gone Fishin': 10 Ways to Take your Site from One to One Million Users by Kevin Rose
topicId topicWeight
[(1, 0.154), (2, 0.179), (10, 0.031), (30, 0.049), (47, 0.021), (61, 0.104), (73, 0.248), (79, 0.095), (85, 0.021), (94, 0.018)]
simIndex simValue blogId blogTitle
Introduction: It's time to do something a little different and for me that doesn't mean cutting off my hair and joining a monastery, nor does it mean buying a cherry red convertible (yet), it means doing a webinar! On December 14th, 2:00 PM - 3:00 PM EST, I'll be hosting What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications. The webinar is sponsored by VoltDB, but it will be completely vendor independent, as that's the only honor preserving and technically accurate way of doing these things. The webinar will run about 60 minutes, with 40 minutes of speechifying and 20 minutes for questions. The hashtag for the event on Twitter will be SQLNoSQL. I'll be monitoring that hashtag if you have any suggestions for the webinar or if you would like to ask questions during the webinar. The motivation for me to do the webinar was a talk I had with another audience member at the NoSQL Evening in Palo Alto. He said he came from a Java background and was confused ab
3 0.91398257 471 high scalability-2008-12-19-Gigaspaces curbs latency outliers with Java Real Time
Introduction: Today, most banks have migrated their internal software development from C/C++ to the Java language because of well-known advantages in development productivity (Java Platform), robustness & reliability (Garbage Collector) and platform independence (Java Bytecode). They may even have gotten better throughput performance through the use of standard architectures and application servers (Java Enterprise Edition). Among the few banking applications that have not been able to benefit yet from the Java revolution, you find the latency-critical applications connected to the trading floor. Why? Because of the unpredictable pauses introduced by the garbage collector which result in significant jitter (variance of execution time). In this post Frederic Pariente, Engineering Manager at Sun Microsystems, posted a summary of a case study on how Sun Real Time JVM and GigaSpaces were used in the context of a customer proof-of-concept this summer to ensure guaranteed latency per m
same-blog 4 0.90270495 1175 high scalability-2012-01-17-Paper: Feeding Frenzy: Selectively Materializing Users’ Event Feeds
5 0.89934546 1587 high scalability-2014-01-29-10 Things Bitly Should Have Monitored
Introduction: Monitor, monitor, monitor. That's the advice every startup gives once they reach a certain size. But can you ever monitor enough? If you are Bitly and everyone will complain when you are down, probably not. Here are 10 Things We Forgot to Monitor from Bitly, along with good stories and copious amounts of code snippets. Well worth reading, especially after you've already started monitoring the lower hanging fruit. An interesting revelation from the article is that: We run bitly split across two data centers, one is a managed environment with DELL hardware, and the second is Amazon EC2. Fork Rate . A strange configuration issue caused processes to be created at a rate of several hundred a second rather than the expected 1-10/second. Flow control packets . A network configuration that honors flow control packets and isn’t configured to disable them, can temporarily cause dropped traffic. Swap In/Out Rate . Measure the right thing. It's the rate memory is swapped
6 0.86814415 125 high scalability-2007-10-18-another approach to replication
7 0.85859603 333 high scalability-2008-05-28-Webinar: Designing and Implementing Scalable Applications with Memcached and MySQL
8 0.85798138 1196 high scalability-2012-02-20-Berkeley DB Architecture - NoSQL Before NoSQL was Cool
9 0.83119643 980 high scalability-2011-01-28-Stuff The Internet Says On Scalability For January 28, 2011
10 0.82257664 33 high scalability-2007-07-26-ThemBid Architecture
11 0.82160473 986 high scalability-2011-02-10-Database Isolation Levels And Their Effects on Performance and Scalability
12 0.8172363 192 high scalability-2007-12-25-IBMer Says LAMP Can't Scale
13 0.81697369 284 high scalability-2008-03-19-RAD Lab is Creating a Datacenter Operating System
14 0.81149608 795 high scalability-2010-03-16-1 Billion Reasons Why Adobe Chose HBase
15 0.80920261 181 high scalability-2007-12-11-Hosting and CDN for startup video sharing site
16 0.807486 1642 high scalability-2014-05-02-Stuff The Internet Says On Scalability For May 2nd, 2014
17 0.80080587 217 high scalability-2008-01-17-Load Balancing of web server traffic
18 0.76572359 709 high scalability-2009-09-19-Space Based Programming in .NET
19 0.7654593 1313 high scalability-2012-08-28-Making Hadoop Run Faster
20 0.74879158 1227 high scalability-2012-04-13-Stuff The Internet Says On Scalability For April 13, 2012