high_scalability high_scalability-2007 high_scalability-2007-7 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: FeedBurner is a news feed management provider launched in 2004. FeedBurner provides custom RSS feeds and management tools to bloggers, podcasters, and other web-based content publishers. Services provided to publishers include traffic analysis and an optional advertising system. Site: http://www.feedburner.com Information Sources: FeedBurner - Scalable Web Applications using MySQL and Java; What the Web’s most popular sites are running on. Platform: Java, MySQL, Hibernate, Spring, Tomcat, Cacti. Load balancing: NetScaler Application Switches. Routers, switches: HP, Cisco. DNS: bind. The Stats: FeedBurner is growing faster than MySpace and Digg, with 385% traffic growth. Total feeds: 808,707. Number of publishers: 471,686. 11 million subscribers in 190 countries. Scaling History: July 2004: 300Kbps, 5,600 feeds, 3 app servers, 3 web servers, 2 DB servers, Round Robin DNS; April 2005: 5Mbps, 47,700 feeds, 6 app servers, 6 web servers (same mac
sentIndex sentText sentNum sentScore
1 FeedBurner is a news feed management provider launched in 2004. [sent-1, score-0.103]
2 FeedBurner provides custom RSS feeds and management tools to bloggers, podcasters, and other web-based content publishers. [sent-2, score-0.383]
3 Services provided to publishers include traffic analysis and an optional advertising system. [sent-3, score-0.214]
4 Scalability Problem 2: Stats recording/mgmt - Every hit is recorded, which slows everything down because of table-level locks. [sent-11, score-0.146]
5 - Only today's stats are calculated in real time. [sent-13, score-0.284]
6 Scalability Problem 3: Primary DB overload - Use master DB for everything. [sent-15, score-0.194]
7 slave load. Scalability Problem 4: Total DB overload - Everything slowed down; they were using the database as a cache and using MyISAM - Add caching layers. [sent-18, score-0.319]
8 RAM on the machines, memcached, and in the database. Scalability Problem 5: Lazy initialization - When stats were rolled up on demand, popular feeds slowed down the whole system - Turned to batch processing, doing the rollups once a night. [sent-19, score-0.915]
9 Scalability Problem 6: Stats writes, again - Wrote to the master too much. [sent-20, score-0.194]
10 Added more stats tracking for ads, items, and circulation. [sent-22, score-0.284]
11 - Went to horizontal partitioning: ad serving, flare serving, circulation. [sent-25, score-0.107]
12 Scalability Problem 7: Master DB Failure - With a primary and a slave, the primary is a single point of failure because it's hard to promote a slave to master. [sent-27, score-0.54]
13 Too much hardware: they didn't like having half the hardware go to waste, and it needed a really fast connection between data centers. [sent-31, score-0.089]
14 - Created a custom solution to download feeds to remote servers. [sent-32, score-0.383]
15 They have two sites in primary and secondary roles (active-passive) as their geographical redundancy plan. [sent-33, score-0.12]
16 Profile your code; this is usually only needed for hard-to-find leaks. [sent-39, score-0.089]
17 The greatest challenge was finding the most efficient ways to locate hotspots and bottlenecks in the application. [sent-40, score-0.176]
18 With a loose methodology for locating problems, the analysis became very easy. [sent-41, score-0.316]
19 Detailed monitoring was crucial in this, keeping track of disk, CPU and memory usage, slow database queries, handler details in MySQL, etc. [sent-42, score-0.08]
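Sentences 4, 5, and 8 describe recording every hit, computing only today's stats in real time, and replacing on-demand rollups with a nightly batch job. A minimal sketch of that pattern, assuming a single in-memory store: the class and method names (`StatsStore`, `record_hit`, `nightly_rollup`, `hits_for`) are hypothetical, and the two dicts stand in for what would presumably be separate raw-hit and summary tables. FeedBurner's actual implementation (Java on MySQL) is not described in the source.

```python
from collections import defaultdict
from datetime import date, timedelta


class StatsStore:
    """Hypothetical in-memory stand-in for a raw-hit table and a daily summary table."""

    def __init__(self):
        # Raw hit counts keyed by (feed_id, day); ideally only unrolled days live here.
        self.raw_hits = defaultdict(int)
        # Precomputed daily totals keyed by (feed_id, day), filled by the nightly job.
        self.daily_summary = {}

    def record_hit(self, feed_id, day):
        # Hits only touch the raw counts, never the summary.
        self.raw_hits[(feed_id, day)] += 1

    def nightly_rollup(self, day):
        # Batch job: fold one finished day's raw hits into the summary,
        # then drop those raw entries so the raw store stays small.
        finished = [key for key in self.raw_hits if key[1] == day]
        for key in finished:
            self.daily_summary[key] = self.daily_summary.get(key, 0) + self.raw_hits.pop(key)

    def hits_for(self, feed_id, day, today):
        # Today's number is computed live from raw hits; past days are a
        # cheap lookup instead of an on-demand rollup.
        if day == today:
            return self.raw_hits.get((feed_id, day), 0)
        return self.daily_summary.get((feed_id, day), 0)


today = date(2007, 7, 12)
yesterday = today - timedelta(days=1)
store = StatsStore()
for _ in range(3):
    store.record_hit("feed-a", yesterday)
store.record_hit("feed-a", today)
store.nightly_rollup(yesterday)
print(store.hits_for("feed-a", yesterday, today))  # 3
print(store.hits_for("feed-a", today, today))      # 1
```

The point of the split is that reading a past day never triggers aggregation work, so a popular feed cannot stall the system the way on-demand rollups did; only the current day's counts are aggregated at read time.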
wordName wordTfidf (topN-words)
[('feeds', 0.304), ('stats', 0.284), ('db', 0.255), ('feedburner', 0.202), ('master', 0.194), ('slave', 0.162), ('cacti', 0.16), ('slowed', 0.157), ('problem', 0.152), ('publishers', 0.144), ('primary', 0.12), ('flare', 0.107), ('podcasters', 0.107), ('feed', 0.103), ('daythe', 0.101), ('lea', 0.101), ('locating', 0.101), ('netscaler', 0.101), ('rollups', 0.101), ('hotspots', 0.096), ('hottest', 0.096), ('promote', 0.096), ('doug', 0.09), ('needed', 0.089), ('machines', 0.086), ('bloggers', 0.083), ('serving', 0.082), ('crushing', 0.08), ('handler', 0.08), ('locate', 0.08), ('custom', 0.079), ('mbps', 0.079), ('methodology', 0.077), ('went', 0.077), ('servers', 0.077), ('total', 0.076), ('lazy', 0.076), ('multi', 0.076), ('monitored', 0.074), ('recorded', 0.073), ('everything', 0.073), ('july', 0.071), ('rss', 0.071), ('subscribers', 0.071), ('analysis', 0.07), ('million', 0.07), ('popular', 0.069), ('myspace', 0.069), ('loose', 0.068), ('plain', 0.068)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 7 high scalability-2007-07-12-FeedBurner Architecture
2 0.16031559 6 high scalability-2007-07-11-Friendster Architecture
Introduction: Friendster is one of the largest social network sites on the web. It emphasizes genuine friendships and the discovery of new people through friends. Site: http://www.friendster.com/ Information Sources Friendster - Scaling for 1 Billion Queries per day Platform MySQL Perl PHP Linux Apache What's Inside? Dual x86-64 AMD Opterons with 8 GB of RAM Faster disk (SAN) Optimized indexes Traditional 3-tier architecture with hardware load balancer in front of the databases Clusters based on types: ad, app, photo, monitoring, DNS, gallery search DB, profile DB, user info DB, IM status cache, message DB, testimonial DB, friend DB, graph servers, gallery search, object cache. Lessons Learned No persistent database connections. Removed all sorts. Optimized indexes Don’t go after the biggest problems first Optimize without downtime Split load Moved sorting query types into the application and added LIMITS. Reduced ranges R
3 0.14492579 138 high scalability-2007-10-30-Feedblendr Architecture - Using EC2 to Scale
Introduction: A man had a dream. His dream was to blend a bunch of RSS/Atom/RDF feeds into a single feed. The man is Beau Lebens of Feedville and like most dreamers he was a little short on coin. So he took refuge in the home of a cheap hosting provider and Beau realized his dream, creating FEEDblendr . But FEEDblendr chewed up so much CPU creating blended feeds that the cheap hosting provider ordered Beau to find another home. Where was Beau to go? He eventually found a new home in the virtual machine room of Amazon's EC2. This is the story of how Beau was finally able to create his one feeds safe within the cradle of affordable CPU cycles. Site: http://feedblendr.com/ The Platform EC2 (Fedora Core 6 Lite distro) S3 Apache PHP MySQL DynDNS (for round robin DNS) The Stats Beau is a developer with some sysadmin skills, not a web server admin, so a lot of learning was involved in creating FEEDblendr. FEEDblendr uses 2 EC2 instances. The same Amazon Instance (AMI) is
4 0.14189146 1303 high scalability-2012-08-13-Ask HighScalability: Facing scaling issues with news feeds on Redis. Any advice?
Introduction: We just released a social section to our iOS app several days ago and we are already facing scaling issues with the users' news feeds. We're basically using a Fan-out-on-write (push) model for the users' news feeds (posts of people and topics they follow) and we're using Redis for this (backend is Rails on Heroku). However, our current 60,000 news feeds are ballooning our Redis store to almost 1GB in just a few days (it's growing way too fast for our budget). Currently we're storing the entire news feed for the user (post id, post text, author, icon url, etc) and we cap the entries to 300 per feed. I'm wondering if we need to just store the post IDs of each user feed in Redis and then store the rest of the post information somewhere else? Would love some feedback here. In this case, our iOS app would make an api call to our Rails app to retrieve a user's news feed. Rails app would retrieve news feed list (just post IDs) from Redis, and then Rails app would need to query to g
5 0.14108966 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
Introduction: Pinterest has been riding an exponential growth curve, doubling every month and a half. They've gone from 0 to 10s of billions of page views a month in two years, from 2 founders and one engineer to over 40 engineers, from one little MySQL server to 180 Web Engines, 240 API Engines, 88 MySQL DBs (cc2.8xlarge) + 1 slave each, 110 Redis Instances, and 200 Memcache Instances. Stunning growth. So what's Pinterest's story? To tell their story we have our bards, Pinterest's Yashwanth Nelapati and Marty Weiner, who tell the dramatic story of Pinterest's architecture evolution in a talk titled Scaling Pinterest. This is the talk they would have liked to hear a year and a half ago when they were scaling fast and there were a lot of options to choose from. And they made a lot of incorrect choices. This is a great talk. It's full of amazing details. It's also very practical, down to earth, and it contains strategies adoptable by nearly anyone. Highly recommended. Two of my favorite lessons from the talk: Arc
6 0.13382237 554 high scalability-2009-04-04-Digg Architecture
7 0.1314975 511 high scalability-2009-02-12-MySpace Architecture
8 0.12871727 1000 high scalability-2011-03-08-Medialets Architecture - Defeating the Daunting Mobile Device Data Deluge
9 0.12832369 1313 high scalability-2012-08-28-Making Hadoop Run Faster
10 0.12495308 1228 high scalability-2012-04-16-Instagram Architecture Update: What’s new with Instagram?
11 0.12440058 1538 high scalability-2013-10-28-Design Decisions for Scaling Your High Traffic Feeds
12 0.12439098 248 high scalability-2008-02-13-What's your scalability plan?
13 0.12422992 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App
14 0.12347317 152 high scalability-2007-11-13-Flickr Architecture
15 0.11989374 160 high scalability-2007-11-19-Tailrank Architecture - Learn How to Track Memes Across the Entire Blogosphere
16 0.11961653 1542 high scalability-2013-11-04-ESPN's Architecture at Scale - Operating at 100,000 Duh Nuh Nuhs Per Second
18 0.11803146 70 high scalability-2007-08-22-How many machines do you need to run your site?
19 0.11676957 1508 high scalability-2013-08-28-Sean Hull's 20 Biggest Bottlenecks that Reduce and Slow Down Scalability
20 0.11650407 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
topicId topicWeight
[(0, 0.201), (1, 0.071), (2, -0.034), (3, -0.16), (4, 0.001), (5, 0.013), (6, 0.001), (7, -0.035), (8, 0.017), (9, -0.011), (10, -0.01), (11, -0.016), (12, 0.029), (13, 0.005), (14, -0.027), (15, 0.02), (16, -0.002), (17, 0.018), (18, 0.006), (19, 0.035), (20, -0.001), (21, 0.032), (22, -0.03), (23, -0.012), (24, 0.042), (25, 0.016), (26, -0.011), (27, 0.044), (28, -0.028), (29, -0.008), (30, 0.047), (31, -0.05), (32, -0.002), (33, -0.022), (34, 0.029), (35, -0.007), (36, 0.025), (37, -0.029), (38, 0.022), (39, 0.067), (40, 0.008), (41, 0.021), (42, -0.057), (43, 0.042), (44, 0.083), (45, 0.03), (46, -0.071), (47, 0.013), (48, -0.058), (49, -0.015)]
simIndex simValue blogId blogTitle
same-blog 1 0.96105951 7 high scalability-2007-07-12-FeedBurner Architecture
2 0.80532885 6 high scalability-2007-07-11-Friendster Architecture
3 0.77666968 68 high scalability-2007-08-20-TypePad Architecture
Introduction: TypePad is considered the largest paid blogging service in the world. After experiencing problems because of their meteoric growth, they eventually transitioned to an architecture patterned after their sister company, LiveJournal. Site: http://www.typepad.com/ The Platform MySQL Memcached Perl MogileFS Apache Linux The Stats As of 2005 TypePad sends 250mbps of traffic using multiple network pipes for 3TB of traffic a day. They were growing by 10-20% each month. I was unable to find more recent statistics. The Architecture Original Architecture: - Single server running Linux, Apache, Postgres, Perl, mod_perl - Storage was NFS on a filer. A Devastating Crash Caused a New Direction - A RAID controller failed and spewed data across all RAID disks. - The database was corrupted and the backups were corrupted. - Their redundant filers suffered from "split brain" syndrome. They moved to LiveJournal Architecture type architecture which isn't surprising
4 0.76830339 1340 high scalability-2012-10-15-Simpler, Cheaper, Faster: Playtomic's Move from .NET to Node and Heroku
Introduction: This is a guest post by Ben Lowry, CEO of Playtomic. Playtomic is a game analytics service implemented in about 8000 mobile, web and downloadable games played by approximately 20 million people daily. Here's a good summary quote by Ben Lowry on Hacker News: Just over 20,000,000 people hit my API yesterday 700,749,252 times, playing the ~8,000 games my analytics platform is integrated in for a bit under 600 years in total play time. That's just yesterday. There are lots of different bottlenecks waiting for people operating at scale. Heroku and NodeJS, for my use case, eventually alleviated a whole bunch of them very cheaply. Playtomic began with an almost exclusively Microsoft .NET and Windows architecture which held up for 3 years before being replaced with a complete rewrite using NodeJS. During its lifetime the entire platform grew from shared space on a single server to a full dedicated, then spread to second dedicated, then the API server was offloaded to a VPS pro
5 0.76440173 1188 high scalability-2012-02-06-The Design of 99designs - A Clean Tens of Millions Pageviews Architecture
Introduction: 99designs is a crowdsourced design contest marketplace based out of Melbourne, Australia. The idea is that if you have a design you need created you create a contest and designers compete to give you the best design within your budget. If you are a medium-sized commerce site this is a clean example architecture of a site that reliably supports a lot of users and a complex workflow on the cloud. Lars Yencken wrote a nicely written overview of the architecture behind 99designs in Infrastructure at 99designs. Here's a gloss on their architecture: Stats Team has 8 devs, 2 dev ops, 2 ux/designers Hundreds of thousands of unique visitors a month Tens of millions pageviews a month Stack Largely an Amazon based stack Elastic Load Balancer (ELB) Varnish PHP with Apache/mod_php S3 Beanstalk for in-memory queuing using Pheanstalk bindings Amazon's RDS (MySQL) Memcached MongoDB Redis Rightscale/Chef NewRelic, CloudWatch, Statsd Infrastructure
6 0.76119089 928 high scalability-2010-10-26-Scaling DISQUS to 75 Million Comments and 17,000 RPS
7 0.74942756 808 high scalability-2010-04-12-Poppen.de Architecture
8 0.73206288 72 high scalability-2007-08-22-Wikimedia architecture
9 0.72808492 1228 high scalability-2012-04-16-Instagram Architecture Update: What’s new with Instagram?
10 0.72642946 1220 high scalability-2012-04-02-YouPorn - Targeting 200 Million Views a Day and Beyond
11 0.72398418 821 high scalability-2010-05-03-MocoSpace Architecture - 3 Billion Mobile Page Views a Month
12 0.71928918 1638 high scalability-2014-04-28-How Disqus Went Realtime with 165K Messages Per Second and Less than .2 Seconds Latency
13 0.71579725 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
14 0.71573365 511 high scalability-2009-02-12-MySpace Architecture
15 0.71159792 554 high scalability-2009-04-04-Digg Architecture
16 0.71145248 602 high scalability-2009-05-17-Scaling Django Web Apps by Mike Malone
17 0.70595664 150 high scalability-2007-11-12-Slashdot Architecture - How the Old Man of the Internet Learned to Scale
18 0.70351303 389 high scalability-2008-09-23-How to Scale with Ruby on Rails
19 0.70342666 998 high scalability-2011-03-03-Stack Overflow Architecture Update - Now at 95 Million Page Views a Month
20 0.70100373 106 high scalability-2007-10-02-Secrets to Fotolog's Scaling Success
topicId topicWeight
[(1, 0.162), (2, 0.177), (10, 0.052), (23, 0.175), (30, 0.022), (40, 0.012), (61, 0.124), (77, 0.015), (79, 0.128), (85, 0.021), (94, 0.021)]
simIndex simValue blogId blogTitle
Introduction: Summary In this presentation, a three steps approach for turning your existing stateful tier-based/Spring-application into a dynamically scalable services application using OpenSpaces is demonstrated. The existing programming model is kept the same while focusing on abstracting and replacing the underlying implementations of the middleware stack in a way that will fit the scale-out model. Bio Nati Shalom is the CTO and Founder of GigaSpaces and responsible for the technology roadmap. He has 10 years of experience with distributed technology and architecture namely CORBA, Jini, J2EE, Grid and SOA. Nati is the Head of the Israeli Grid consortium and an evangelist of Space Based Architecture and Data Grid patterns. Blog: Gigaspaces Blog Read the rest of the article here on InfoQ .
2 0.94839281 979 high scalability-2011-01-27-Comet - An Example of the New Key-Code Databases
Introduction: Comet is an active distributed key-value store built at the University of Washington. The paper describing Comet is Comet: An active distributed key-value store; there are also slides, and an MP3 of a presentation given at OSDI '10. Here's a succinct overview of Comet: Today's cloud storage services, such as Amazon S3 or peer-to-peer DHTs, are highly inflexible and impose a variety of constraints on their clients: specific replication and consistency schemes, fixed data timeouts, limited logging, etc. We witnessed such inflexibility first-hand as part of our Vanish work, where we used a DHT to store encryption keys temporarily. To address this issue, we built Comet, an extensible storage service that allows clients to inject snippets of code that control their data's behavior inside the storage service. I found this paper quite interesting because it takes the initial steps of collocating code with a key-value store, which turns it into what might be called a key-code
3 0.94652379 654 high scalability-2009-07-09-No to SQL? Anti-database movement gains steam – My Take
Introduction: In this post I wrote my view on the anti-SQL database movement and where the alternative approach fits in: - SQL databases are not going away anytime soon. - The current "one size fits all" database thinking was and is wrong. - There is definitely a place for more specialized data management solutions alongside traditional SQL databases. In addition to the options that were mentioned in the original article, I pointed out the in-memory alternative approach and how that fits into the puzzle. I used a real-life scenario: a scalable social-network-based eCommerce site, where I outlined how the in-memory approach was the only way they could scale and meet their application performance and response time requirements.
same-blog 4 0.93406087 7 high scalability-2007-07-12-FeedBurner Architecture
5 0.91833282 1105 high scalability-2011-08-25-The Cloud and The Consumer: The Impact on Bandwidth and Broadband
Introduction: Cloud-based services for all things digital will either drive – or die by – bandwidth. Consumers, by definition, consume. In the realm of the Internet, they consume far more than they produce. Or so it’s been in the past. Broadband connectivity across all providers has long offered asymmetric network feeds because it mirrored reality: an HTTP request is significantly smaller than its corresponding response, and in general web-based activity is heavily biased toward fat download and thin upload speeds. The term “broadband” is really a misnomer, as it focuses only on the download speed and ignores the very narrowband of a typical consumer’s upload speed. Cloud computing, or to be more accurate, cloud-hosted services aimed at consumers may very well change the status quo by necessity. As providers continue to push the notion of storing all things digital “in the cloud”, network providers must consider the impact on them – and the satisfaction of their customer base with performa
6 0.91794038 669 high scalability-2009-08-03-Building a Data Intensive Web Application with Cloudera, Hadoop, Hive, Pig, and EC2
7 0.91710335 1529 high scalability-2013-10-08-F1 and Spanner Holistically Compared
8 0.90679312 74 high scalability-2007-08-23-Product: Varnish
9 0.90434515 1559 high scalability-2013-12-06-Stuff The Internet Says On Scalability For December 6th, 2013
10 0.90176737 1109 high scalability-2011-09-02-Stuff The Internet Says On Scalability For September 2, 2011
11 0.86303818 1264 high scalability-2012-06-15-Cloud Bursting between AWS and Rackspace
12 0.86100703 853 high scalability-2010-07-08-Cloud AWS Infrastructure vs. Physical Infrastructure
13 0.85958111 1002 high scalability-2011-03-09-Productivity vs. Control tradeoffs in PaaS
14 0.8584252 990 high scalability-2011-02-15-Wordnik - 10 million API Requests a Day on MongoDB and Scala
15 0.85778672 1040 high scalability-2011-05-13-Stuff The Internet Says On Scalability For May 13, 2011
16 0.8576811 576 high scalability-2009-04-21-What CDN would you recommend?
17 0.85670012 1448 high scalability-2013-04-29-AWS v GCE Face-off and Why Innovation Needs Lower Cost Infrastructures
18 0.85522854 1180 high scalability-2012-01-24-The State of NoSQL in 2012
19 0.85443509 1289 high scalability-2012-07-23-State of the CDN: More Traffic, Stable Prices, More Products, Profits - Not So Much
20 0.85373116 1302 high scalability-2012-08-10-Stuff The Internet Says On Scalability For August 10, 2012