high_scalability high_scalability-2010 high_scalability-2010-808 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: This is a guest post by Alvaro Videla describing the architecture of Poppen.de, a popular German dating site. This site is very much NSFW, so be careful before clicking on the link. What I found most interesting is how they manage to successfully blend a little of the old with a little of the new, using technologies like Nginx, MySQL, CouchDB, Erlang, Memcached, RabbitMQ, PHP, Graphite, Red5, and Tsung. What is Poppen.de? Poppen.de (NSFW) is the top dating website in Germany, and while it may be a small site compared to giants like Flickr or Facebook, we believe it's a nice architecture to learn from if you are starting to get some scaling problems. The Stats: 2,000,000 users; 20,000 concurrent users; 300,000 private messages per day; 250,000 logins per day. We have a team of eleven developers, two designers and two sysadmins for this project. Business Model: The site works with a freemium model, where users can do things like the following for free: Search
sentIndex sentText sentNum sentScore
1 Much more… If they want to send unlimited messages or have unlimited picture uploads then they can pay for different kinds of membership according to their needs. [sent-20, score-0.378]
2 Then we have separate machines to serve the site images. [sent-28, score-0.296]
3 One of the cool things that Nginx lets us do is deliver many requests straight out of Memcached, without hitting the PHP machines to get content that is already cached. [sent-34, score-0.455]
4 There are 8000 requests per minute delivered out of the Memcached. [sent-38, score-0.234]
5 If the picture is not in the local cache filesystem, Nginx downloads the picture from the central server, stores it in its local cache and serves it. [sent-42, score-0.428]
6 This lets us load balance the image distribution and alleviate the load in the main storage machine. [sent-43, score-0.428]
7 On one hand, this means a larger resource footprint; on the other hand, it gives us speed of development and a well-known framework that lets us integrate new developers into the team with ease. [sent-54, score-0.316]
8 Because the framework is easy to customize and configure, we were able to cache in APC most of the expensive calculations that were adding extra load to the servers. [sent-60, score-0.245]
9 We want to partition the data by user id, since most of the information on the site is centered on the user itself, like images, videos, messages, etc. [sent-66, score-0.319]
10 We also have an NDB cluster composed of 4 machines for write-intensive data, like the statistics of which user visited which other user's profile. [sent-71, score-0.244]
11 We have a system that automatically invalidates the cache every time a record of that table is modified. [sent-82, score-0.391]
12 During the last month we have been moving more and more stuff to the queue, meaning that at the moment the 28 PHP frontend machines are publishing around 500. [sent-88, score-0.234]
13 To enqueue messages we use one of the coolest features provided by PHP-FPM which is the fastcgi_finish_request() function. [sent-91, score-0.249]
14 This allows us to send messages to the queue in an asynchronous fashion. [sent-92, score-0.399]
15 We have two machines dedicated to consuming those messages, currently running 40 PHP processes in total to consume the jobs. [sent-96, score-0.234]
16 This system lets us improve resource management. [sent-100, score-0.227]
17 From requests per module/action to Memcached hits/misses, RabbitMQ status monitoring, Unix Load of the servers and much more. [sent-112, score-0.233]
18 We were able to run one version of the site in half of the servers while the new version was running in the others. [sent-119, score-0.385]
19 In mid-2009 we were streaming 17TB of video per month to our users. [sent-128, score-0.239]
20 Then we replayed that traffic and hit the machines in our lab with thousands of concurrent users generated by Tsung. [sent-132, score-0.294]
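Sentences 5 and 6 describe a read-through cache for images: an image server checks its local filesystem cache, fetches the picture from the central storage machine on a miss, keeps a local copy, and serves it. The Python sketch below illustrates only that logic under stated assumptions; it is not Poppen.de's Nginx configuration, and the class name, the `fetch_from_central` callable and the file-naming scheme are hypothetical.

```python
import os
import tempfile
from typing import Callable


class ReadThroughImageCache:
    """Serve images from a local directory, pulling misses from a central store.

    Hypothetical sketch: `fetch_from_central` stands in for a request to the
    main storage machine; it is not an API from the original article.
    """

    def __init__(self, cache_dir: str, fetch_from_central: Callable[[str], bytes]):
        self.cache_dir = cache_dir
        self.fetch_from_central = fetch_from_central
        self.central_hits = 0  # how many requests had to reach the central server
        os.makedirs(cache_dir, exist_ok=True)

    def _local_path(self, image_id: str) -> str:
        # Flatten the id into a single file name inside the cache directory.
        return os.path.join(self.cache_dir, image_id.replace("/", "_"))

    def get(self, image_id: str) -> bytes:
        path = self._local_path(image_id)
        if os.path.exists(path):
            # Local hit: the central storage machine is not touched.
            with open(path, "rb") as f:
                return f.read()
        # Miss: download from the central server, then keep a local copy.
        data = self.fetch_from_central(image_id)
        self.central_hits += 1
        # Write to a temp file and rename it into place, so a concurrent
        # reader never sees a partially written image.
        fd, tmp_path = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)
        return data


if __name__ == "__main__":
    central_store = {"u42/avatar.jpg": b"\xff\xd8fake-jpeg-bytes"}
    with tempfile.TemporaryDirectory() as cache_dir:
        cache = ReadThroughImageCache(cache_dir, lambda key: central_store[key])
        cache.get("u42/avatar.jpg")  # miss: fetched from the central store
        cache.get("u42/avatar.jpg")  # hit: served from the local directory
        print(cache.central_hits)  # prints 1
```

Only the first request for a given image reaches the central store, which is the load-spreading effect sentence 6 describes. In the setup the article outlines, Nginx itself performs this work on each image server; the source does not say which Nginx directives were used.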
wordName wordTfidf (topN-words)
[('nginx', 0.272), ('xhprof', 0.212), ('php', 0.195), ('graphite', 0.191), ('rabbitmq', 0.175), ('messages', 0.159), ('machines', 0.155), ('unix', 0.147), ('site', 0.141), ('lets', 0.138), ('symfony', 0.129), ('cache', 0.121), ('profiles', 0.117), ('memcached', 0.11), ('alvaro', 0.106), ('videla', 0.106), ('erlang', 0.1), ('proved', 0.1), ('per', 0.098), ('nsfw', 0.096), ('couchdb', 0.095), ('picture', 0.093), ('version', 0.091), ('enqueue', 0.09), ('user', 0.089), ('us', 0.089), ('queue', 0.086), ('logs', 0.084), ('pictures', 0.083), ('apc', 0.083), ('mid', 0.08), ('moment', 0.079), ('image', 0.077), ('users', 0.073), ('requests', 0.073), ('logins', 0.071), ('dating', 0.071), ('benchmarking', 0.066), ('lab', 0.066), ('table', 0.066), ('invalidate', 0.066), ('send', 0.065), ('profile', 0.063), ('minute', 0.063), ('load', 0.062), ('servers', 0.062), ('video', 0.061), ('function', 0.061), ('uploads', 0.061), ('login', 0.059)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 808 high scalability-2010-04-12-Poppen.de Architecture
2 0.24916156 314 high scalability-2008-05-03-Product: nginx
Introduction: Update 6: nginx_http_push_module. Turn nginx into a long-polling message queuing HTTP push server. Update 5: In Load Balancer Update, Barry describes how WordPress.com moved from Pound to Nginx and is now "regularly serving about 8-9k requests/second and about 1.2Gbit/sec through a few Nginx instances and have plenty of room to grow!". Update 4: Nginx better than Pound for load balancing. Pound spikes at 80% CPU, Nginx uses 3% and is easier to understand and better documented. Update 3: igvita.com combines two cool tools together for better performance in Nginx and Memcached, a 400% boost! Update 2: Software Project on Installing Nginx Web Server w/ PHP and SSL. Breaking away from mother Apache can be a scary proposition, and this kind of getting-started article really helps ease the separation. Update: Slicehost has some nice tutorials on setting up Nginx. From their website: Nginx ("engine x") is a high-performance HTTP server and reverse proxy, as wel
3 0.1885481 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
Introduction: The number one recommendation for speeding up a website is almost always to add cache and more cache. And after that add a little more cache just in case. Memcached is almost always given as the recommended cache to use. What we don't often hear is how to effectively use a cache in our own products. MySQL hosted two excellent webinars (referenced below) on the subject of how to deploy and use memcached. The star of the show, other than MySQL of course, is Farhan Mashraqi of Fotolog. You may recall we did an earlier article on Fotolog in Secrets to Fotolog's Scaling Success, which was one of my personal favorites. Fotolog, as they themselves point out, is probably the largest site nobody has ever heard of, pulling in more page views than even Flickr. Fotolog has 51 instances of memcached on 21 servers with 175G in use and 254G available. As a large successful photo-blogging site they have very demanding performance and scaling requirements. To meet those requirements they've developed a
4 0.1772912 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
Introduction: With Lavabit shutting down under murky circumstances, it seems fitting to repost an old (2009), yet still very good post by Ladar Levison on Lavabit's architecture. I don't know how much of this information is still current, but it should give you a general idea what Lavabit was all about. Getting to Know You What is the name of your system and where can we find out more about it? Note: these links are no longer valid... Lavabit http://lavabit.com http://lavabit.com/network.html http://lavabit.com/about.html What is your system for? Lavabit is a mid-sized email service provider. We currently have about 140,000 registered users with more than 260,000 email addresses. While most of our accounts belong to individual users, we also provide corporate email services to approximately 70 companies. Why did you decide to build this system? We built the system to compete against the other large free email providers, with an emphasis on serving the privacy c
5 0.17384781 1644 high scalability-2014-05-07-Update on Disqus: It's Still About Realtime, But Go Demolishes Python
Introduction: Our last article on Disqus, How Disqus Went Realtime With 165K Messages Per Second And Less Than .2 Seconds Latency, was a little out of date, but the folks at Disqus have been busy implementing, not talking, so we don't know a lot about what they are doing now, but we do have a short update in C1MM and NGINX by John Watson and an article Trying out this Go thing. So Disqus has grown a bit: 1.3 billion unique visitors 10 billion page views 500 million users engaged in discussions 3 million communities 25 million comments They are still all about realtime, but Go replaced Python in their Realtime system: Original Realtime backend was written in a pretty lightweight Python + gevent. The realtime service is a hybrid of CPU intensive tasks + lots of network IO. Gevent was handling the network IO without an issue, but at higher contention, the CPU was choking everything. Switching over to Go removed that contention, which was the primary issue that was being se
7 0.16925059 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
8 0.16588004 1359 high scalability-2012-11-15-Gone Fishin': Justin.Tv's Live Video Broadcasting Architecture
9 0.16579805 796 high scalability-2010-03-16-Justin.tv's Live Video Broadcasting Architecture
10 0.16564995 33 high scalability-2007-07-26-ThemBid Architecture
11 0.16469802 554 high scalability-2009-04-04-Digg Architecture
12 0.16324478 821 high scalability-2010-05-03-MocoSpace Architecture - 3 Billion Mobile Page Views a Month
13 0.16142789 1577 high scalability-2014-01-13-NYTimes Architecture: No Head, No Master, No Single Point of Failure
14 0.16082653 274 high scalability-2008-03-12-YouTube Architecture
15 0.16009083 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App
16 0.15711553 1456 high scalability-2013-05-13-The Secret to 10 Million Concurrent Connections -The Kernel is the Problem, Not the Solution
17 0.15631655 1331 high scalability-2012-10-02-An Epic TripAdvisor Update: Why Not Run on the Cloud? The Grand Experiment.
18 0.15567321 1197 high scalability-2012-02-21-Pixable Architecture - Crawling, Analyzing, and Ranking 20 Million Photos a Day
19 0.15539362 152 high scalability-2007-11-13-Flickr Architecture
20 0.15424171 1486 high scalability-2013-07-03-5 Rockin' Tips for Scaling PHP to 30,000 Concurrent Users Per Server
topicId topicWeight
[(0, 0.317), (1, 0.127), (2, -0.06), (3, -0.219), (4, 0.037), (5, -0.026), (6, 0.041), (7, 0.012), (8, 0.02), (9, 0.03), (10, 0.003), (11, -0.023), (12, 0.087), (13, -0.023), (14, -0.032), (15, -0.03), (16, -0.012), (17, -0.021), (18, -0.011), (19, -0.034), (20, -0.017), (21, -0.02), (22, -0.022), (23, 0.045), (24, 0.026), (25, 0.027), (26, 0.003), (27, 0.053), (28, 0.007), (29, -0.074), (30, 0.006), (31, 0.003), (32, -0.074), (33, 0.034), (34, 0.03), (35, 0.001), (36, -0.009), (37, -0.044), (38, -0.021), (39, 0.082), (40, -0.055), (41, -0.023), (42, 0.002), (43, -0.022), (44, -0.028), (45, -0.044), (46, -0.023), (47, 0.027), (48, -0.009), (49, -0.066)]
simIndex simValue blogId blogTitle
same-blog 1 0.97230512 808 high scalability-2010-04-12-Poppen.de Architecture
2 0.81109846 884 high scalability-2010-08-23-6 Ways to Kill Your Servers - Learning How to Scale the Hard Way
Introduction: This is a guest post by Steffen Konerow, author of the High Performance Blog . Learning how to scale isn’t easy without any prior experience. Nowadays you have plenty of websites like highscalability.com to get some inspiration, but unfortunately there is no solution that fits all websites and needs. You still have to think on your own to find a concept that works for your requirements. So did I. A few years ago, my bosses came to me and said “We’ve got a new project for you. It’s the relaunch of a website that has already 1 million users a month. You have to build the website and make sure we’ll be able to grow afterwards”. I was already an experienced coder, but not in these dimensions, so I had to start learning how to scale – the hard way. The software behind the website was a PHP content management system, based on Smarty and MySQL. The first task was finding a proper hosting company who had the experience and would also manage the servers for us. After some researc
3 0.79990542 711 high scalability-2009-09-22-How Ravelry Scales to 10 Million Requests Using Rails
Introduction: Tim Bray has a wonderful interview with Casey Forbes, creator of Ravelry, a Ruby on Rails site supporting a 400,000+ strong community of dedicated knitters and crocheters. Casey and his small team have done great things with Ravelry. It is a very focused site that provides a lot of value for users. And users absolutely adore the site. That's obvious from their enthusiastic comments and rocket-fast adoption of Ravelry. Ten years ago a site like Ravelry would have been a multi-million dollar operation. Today Casey is the sole engineer for Ravelry and running it takes only a few people. He was able to code it in 4 months working nights and weekends. Take a look below at all the technologies used to make Ravelry and you'll see how it is constructed almost completely from free, off-the-shelf software that Casey has stitched together into a complete system. There's an amazing amount of leverage in today's ecosystem when you combine all the quality tools, languages, storage, bandwidth
4 0.79440659 312 high scalability-2008-04-30-Rather small site architecture.
Introduction: Website stats: Webserver: Apache 2.2 Database: MySQL 5.0 APC cache for php CMS: Drupal 6.2 (bleeding-edge version)* *Aggressive caching is ON, Page Compression ON, Block Cache ON (can't use CCS),Optimize CSS/JS ON. 2 Servers: Apache/Mysql (low-tech servers - Celeron processors, 512 MB RAM, 7200 RPM HDD) Bandwidth 10 Mb/s The benchmark: Used ab : ab -n 1000 -c 20 howwhatwho.com Server Software: Apache/2.2.3 Server Hostname: howwhatwho.com Server Port: 80 Document Path: / Document Length: 41639 bytes Concurrency Level: 20 Time taken for tests: 13.556796 seconds Complete requests: 1000 Failed requests: 0 Write errors: 0 Total transferred: 42118000 bytes HTML transferred: 41639000 bytes Requests per second: 73.76 [#/sec] (mean) Time per request: 271.136 [ms] (mean) Time per request: 13.557 [ms] (mean, across all concurrent requests) Transfer rate: 3033.90 [Kbytes/sec] received
Introduction: Here's an Update On Disqus: It's Still About Realtime, But Go Demolishes Python. How do you add realtime functionality to a web scale application? That's what Adam Hitchcock, a Software Engineer at Disqus, talks about in an excellent talk: Making DISQUS Realtime (slides). Disqus had to take their commenting system and add realtime capabilities to it. Not something that's easy to do when, at the time of the talk (2013), they had just hit a billion unique visitors a month. What Disqus developed is a realtime commenting system called “realertime” that was tested to handle 1.5 million concurrently connected users, 45,000 new connections per second, 165,000 messages/second, with less than .2 seconds latency end-to-end. The nature of a commenting system is that it is IO bound and has a high fanout, that is, a comment comes in and must be sent out to a lot of readers. It's a problem very similar to what Twitter must solve. Disqus' solution was quite interesting as was th
6 0.77774543 437 high scalability-2008-11-03-How Sites are Scaling Up for the Election Night Crush
7 0.77317804 33 high scalability-2007-07-26-ThemBid Architecture
8 0.76804096 136 high scalability-2007-10-28-Scaling Early Stage Startups
9 0.76404989 1220 high scalability-2012-04-02-YouPorn - Targeting 200 Million Views a Day and Beyond
10 0.75832516 1197 high scalability-2012-02-21-Pixable Architecture - Crawling, Analyzing, and Ranking 20 Million Photos a Day
11 0.75791442 7 high scalability-2007-07-12-FeedBurner Architecture
12 0.75734991 985 high scalability-2011-02-08-Mollom Architecture - Killing Over 373 Million Spams at 100 Requests Per Second
13 0.75403905 1438 high scalability-2013-04-10-Check Yourself Before You Wreck Yourself - Avocado's 5 Early Stages of Architecture Evolution
14 0.7513569 825 high scalability-2010-05-10-Sify.com Architecture - A Portal at 3900 Requests Per Second
15 0.75075561 1188 high scalability-2012-02-06-The Design of 99designs - A Clean Tens of Millions Pageviews Architecture
16 0.74926716 1486 high scalability-2013-07-03-5 Rockin' Tips for Scaling PHP to 30,000 Concurrent Users Per Server
17 0.74633229 821 high scalability-2010-05-03-MocoSpace Architecture - 3 Billion Mobile Page Views a Month
18 0.74325395 554 high scalability-2009-04-04-Digg Architecture
19 0.74011493 512 high scalability-2009-02-14-Scaling Digg and Other Web Applications
20 0.73902291 1063 high scalability-2011-06-17-Stuff The Internet Says On Scalability For June 17, 2011
topicId topicWeight
[(1, 0.169), (2, 0.22), (4, 0.025), (10, 0.032), (28, 0.021), (30, 0.023), (40, 0.025), (47, 0.012), (49, 0.012), (61, 0.11), (77, 0.025), (79, 0.105), (85, 0.066), (94, 0.044)]
simIndex simValue blogId blogTitle
1 0.98650146 998 high scalability-2011-03-03-Stack Overflow Architecture Update - Now at 95 Million Page Views a Month
Introduction: A lot has happened since my first article on the Stack Overflow Architecture. Contrary to the theme of that last article, which lavished attention on Stack Overflow's dedication to a scale-up strategy, Stack Overflow has both grown up and out in the last few years. Stack Overflow has grown up by more than doubling in size to over 16 million users and multiplying its number of page views nearly 6 times to 95 million page views a month. Stack Overflow has grown out by expanding into the Stack Exchange Network, which includes Stack Overflow, Server Fault, and Super User, for a grand total of 43 different sites. That's a lot of fruitful multiplying going on. What hasn't changed is Stack Overflow's openness about what they are doing. And that's what prompted this update. A recent series of posts talks a lot about how they've been handling their growth: Stack Exchange’s Architecture in Bullet Points, Stack Overflow’s New York Data Center, Designing For Scalability of Manageme
2 0.98598963 106 high scalability-2007-10-02-Secrets to Fotolog's Scaling Success
Introduction: Fotolog, a social blogging site centered around photos, grew from about 300 thousand users in 2004 to over 11 million users in 2007. Though they initially experienced the inevitable pains of rapid growth, they overcame their problems and now manage over 300 million photos, with 800,000 new photos added each day. Generating all that fabulous content are 20 million unique monthly visitors and a volunteer army of 30,000 new users each day. They did so well that a very impressed suitor bought them out for a cool $90 million. That's scale meets success by anyone's standards. How did they do it? Site: http://www.fotolog.com Information Sources Scaling the World's Largest Photo Blogging Community Congrats to Fotolog on $90mm sale to Hi-Media Fotolog overtaking Flickr? Fotolog Hits 11 Million Members and 300 Million Photos Posted Site of the Week: Fotolog.com by PC Magazine CEO John Borthwick's Blog. DBA Frank Mash's Blog Fotolog, lessons learnt by John B
Introduction: This is a guest post by Eric Czech, Chief Architect at Next Big Sound, who talks about some unique approaches to solving scalability challenges in music analytics. Tracking online activity is hardly a new idea, but doing it for the entire music industry isn't easy. Half a billion music video streams, track downloads, and artist page likes occur each day, and measuring all of this activity across platforms such as Spotify, iTunes, YouTube, Facebook, and more poses some interesting scalability challenges. Next Big Sound collects this type of data from over a hundred sources, standardizes everything, and offers that information to record labels, band managers, and artists through a web-based analytics platform. Many of our applications use open-source systems like Hadoop, HBase, Cassandra, Mongo, RabbitMQ, and MySQL, and our usage is fairly standard, but there is one aspect of what we do that is pretty unique. We collect or receive information from 100+ sources and we s
4 0.98283482 903 high scalability-2010-09-17-Hot Scalability Links For Sep 17, 2010
Introduction: Disqus - Scaling the World's Largest Django App. Interesting overview of a commenting system with 75 million comments and 250 million visitors. Lots of good details on how they partition their database, testing, continuous integration, feature switches, caching, delayed signals, and more. Things I learnt tracking a billion events in 24 hours: Know your host, Scaling isn't just servers, My servers need to talk to me more, Kill switches for users, What you don't know is the problem, Don't mix server roles, Know your most important users outside of your site. Tweets of Gold: georgebarnett : I read High Scalability for useful articles about large scaling. Sadly though, nothing useful ever shows up. #NoLongerBothering northscale : wow that is fast! :) RT @cgoldberg: was just running > 100k ops/sec against my 2-node #Membase cluster... zazooom #nosql turbofunctor : The root of many (horizontal) scalability problems is an application level access to a writab
5 0.98188269 1302 high scalability-2012-08-10-Stuff The Internet Says On Scalability For August 10, 2012
Introduction: It's HighScalability Time: TNW : On an average day, out of 30 trillion URLs on the web, Google crawls 20B web pages and now serves 100B searches every month. Quotable Quotes: @tapbot_paul : The 2 computers on the Curiosity rover are RAD750 based, they are approximately 1/10th the speed of an iPhone 4s and “only” cost $200k each. @merv : #cassandra12 Why @adrianco loves what he's doing: "You are no longer IO-bound, you’re CPU bound, like you’re supposed to be." @maxtaco : Garbage collection solves a minuscule %age of bugs, that are non-critical (memleaks? big deal!) and easy to find and fix. At a HUGE expense. @merv : #cassandra12 @eddie_satterly describing $1M savings in first year migrating from MS SQL Server with SAN to Cassandra solution - w more data. @mattbrauchler : A slow node is worse than a down node #cassandra12 @practicingEA : "The math of predictive analytics has been around for years, its the computers t
6 0.9815017 1600 high scalability-2014-02-21-Stuff The Internet Says On Scalability For February 21st, 2014
7 0.98048431 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
8 0.97984934 1559 high scalability-2013-12-06-Stuff The Internet Says On Scalability For December 6th, 2013
9 0.97971934 1506 high scalability-2013-08-23-Stuff The Internet Says On Scalability For August 23, 2013
10 0.97915912 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
same-blog 11 0.97905475 808 high scalability-2010-04-12-Poppen.de Architecture
12 0.97873628 160 high scalability-2007-11-19-Tailrank Architecture - Learn How to Track Memes Across the Entire Blogosphere
13 0.97852135 1389 high scalability-2013-01-18-Stuff The Internet Says On Scalability For January 18, 2013
14 0.97851682 152 high scalability-2007-11-13-Flickr Architecture
15 0.97833651 1499 high scalability-2013-08-09-Stuff The Internet Says On Scalability For August 9, 2013
17 0.97782445 1431 high scalability-2013-03-29-Stuff The Internet Says On Scalability For March 29, 2013
18 0.97779018 1040 high scalability-2011-05-13-Stuff The Internet Says On Scalability For May 13, 2011
19 0.97756255 331 high scalability-2008-05-27-eBay Architecture
20 0.97710592 1109 high scalability-2011-09-02-Stuff The Internet Says On Scalability For September 2, 2011