high_scalability high_scalability-2007 high_scalability-2007-68 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: TypePad is considered the largest paid blogging service in the world. After experiencing problems because of their meteoric growth, they eventually transitioned to an architecture patterned after their sister company, LiveJournal. Site: http://www.typepad.com/ The Platform: MySQL, Memcached, Perl, MogileFS, Apache, Linux. The Stats: As of 2005 TypePad sends 250 Mbps of traffic using multiple network pipes for 3 TB of traffic a day. They were growing by 10-20% each month. I was unable to find more recent statistics. The Architecture. Original Architecture: - Single server running Linux, Apache, Postgres, Perl, mod_perl - Storage was NFS on a filer. A Devastating Crash Caused a New Direction: - A RAID controller failed and spewed data across all RAID disks. - The database was corrupted and the backups were corrupted. - Their redundant filers suffered from "split brain" syndrome. They moved to a LiveJournal-style architecture, which isn't surprising since TypePad and LiveJournal are both owned by Six Apart.
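As a rough consistency check on the 2005 stats, assuming the 250 Mbps figure is a sustained average and using decimal terabytes, 250 Mbps × 86,400 s/day comes to about 2.7 TB/day, which is in line with the quoted 3 TB. The sketch below also shows what 10-20% monthly growth compounds to over a year; that multiplier is illustrative arithmetic, not a figure reported for TypePad.

```python
# Back-of-envelope check of the 2005 TypePad figures.
# Assumptions: 250 Mbps is a sustained average, 1 TB = 10**12 bytes,
# and the 10-20% monthly growth compounds month over month.

SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(mbps: float) -> float:
    """Convert a sustained rate in megabits/s to terabytes transferred per day."""
    bits_per_day = mbps * 1_000_000 * SECONDS_PER_DAY
    return bits_per_day / 8 / 1e12


def compound_growth(monthly_rate: float, months: int) -> float:
    """Traffic multiplier after `months` of compounding at `monthly_rate` (0.10 = 10%)."""
    return (1 + monthly_rate) ** months


if __name__ == "__main__":
    print(f"250 Mbps sustained ~= {daily_volume_tb(250):.2f} TB/day")
    for rate in (0.10, 0.20):
        print(f"{rate:.0%}/month for 12 months -> x{compound_growth(rate, 12):.1f}")
```

At the low end (10%/month) traffic roughly triples in a year; at the high end (20%/month) it grows almost ninefold, which helps explain why a single-server design ran out of headroom.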
sentIndex sentText sentNum sentScore
1 TypePad is considered the largest paid blogging service in the world. [sent-1, score-0.199]
2 After experiencing problems because of their meteoric growth, they eventually transitioned to an architecture patterned after their sister company, LiveJournal. [sent-2, score-0.604]
3 The Platform: MySQL, Memcached, Perl, MogileFS, Apache, Linux. The Stats: As of 2005 TypePad sends 250 Mbps of traffic using multiple network pipes for 3 TB of traffic a day. [sent-5, score-0.371]
4 A Devastating Crash Caused a New Direction - A RAID controller failed and spewed data across all RAID disks. [sent-9, score-0.17]
5 - The database was corrupted and the backups were corrupted. [sent-10, score-0.201]
6 - Their redundant filers suffered from "split brain" syndrome. [sent-11, score-0.269]
7 They moved to a LiveJournal-style architecture, which isn't surprising since TypePad and LiveJournal are both owned by Six Apart. [sent-12, score-0.347]
8 - A global DB generated globally unique sequence numbers and mapped users to partitions. [sent-14, score-0.452]
9 - The Linux clustering heartbeat was used to fail over using virtual IP addresses. [sent-17, score-0.322]
10 Perlbal is used as reverse proxy and to load balance requests. [sent-19, score-0.272]
11 A reliable, asynchronous job dispatch system called TheSchwartz is used to support moblogging, adding comments, future publishing, cache invalidation, and publishing. [sent-20, score-0.231]
12 Memcached is used to store counts, sets, stats, and heavyweight data. [sent-21, score-0.241]
13 Migration from the old architecture to the new architecture was tricky: - All users were migrated over without service interruption. [sent-22, score-0.443]
14 - During the migration images were served from NFS and MogileFS. [sent-24, score-0.084]
15 Benefits of their new architecture: - Can easily add new machines and adjust workload. [sent-25, score-0.116]
16 - More highly available and cheaply scalable. Lessons Learned: Small details are important. [sent-26, score-0.139]
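The summary says a global DB generated globally unique sequence numbers and mapped users to partitions. The in-memory Python sketch below illustrates that directory role under stated assumptions: the class and method names, the partition names, and the least-loaded placement policy are all hypothetical, and a real deployment would keep this state in a database (the platform list names MySQL) rather than in Python dicts.

```python
import itertools
import threading
from collections import Counter


class GlobalDirectory:
    """Hypothetical stand-in for the global DB: unique IDs plus a user -> partition map."""

    def __init__(self, partitions):
        if not partitions:
            raise ValueError("need at least one partition")
        self._seq = itertools.count(1)                    # globally unique, increasing IDs
        self._user_partition = {}                         # user_id -> partition name
        self._load = Counter({p: 0 for p in partitions})  # users per partition
        self._lock = threading.Lock()                     # single allocator, like one global DB

    def next_id(self):
        """Allocate a globally unique ID (usable for users, posts, comments, ...)."""
        with self._lock:
            return next(self._seq)

    def add_partition(self, name):
        """A new machine joins empty, so new users are placed on it first."""
        with self._lock:
            self._load.setdefault(name, 0)

    def create_user(self):
        """Allocate a user ID and pin the user to the least-loaded partition (assumed policy)."""
        with self._lock:
            user_id = next(self._seq)
            partition = min(self._load, key=lambda p: (self._load[p], p))
            self._user_partition[user_id] = partition
            self._load[partition] += 1
            return user_id, partition

    def partition_for(self, user_id):
        """Route a request: all of this user's data lives on one partition."""
        return self._user_partition[user_id]

    def move_user(self, user_id, new_partition):
        """Rebalance by updating the map (after the user's rows have been copied over)."""
        with self._lock:
            if new_partition not in self._load:
                raise KeyError(new_partition)
            old = self._user_partition[user_id]
            self._load[old] -= 1
            self._load[new_partition] += 1
            self._user_partition[user_id] = new_partition
```

Usage: with `GlobalDirectory(["db-a", "db-b"])`, the first two users land on `db-a` and `db-b`; after `add_partition("db-c")` the next user goes to `db-c`. Keeping an explicit map, rather than hashing user IDs to partitions, is one way to get the "add new machines and adjust workload" benefit: moving a user is a copy plus a single directory update.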
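The summary describes TheSchwartz as a reliable, asynchronous job dispatch system used for moblogging, comments, future publishing, and cache invalidation. TheSchwartz itself is a Perl library; the Python below is not its API but a small sketch of the general pattern such a system relies on: a worker leases ("grabs") a job for a limited time, and if it never reports completion the lease lapses and the job becomes available again. The class names, parameter names, and the injectable clock are assumptions made for this illustration.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    job_id: int
    funcname: str                 # which kind of worker handles it, e.g. "publish_post"
    arg: dict
    run_after: float = 0.0        # earliest run time (covers "future publishing")
    grabbed_until: float = 0.0    # lease expiry; 0 means not leased
    attempts: int = 0


class JobQueue:
    """Hypothetical in-memory store illustrating lease-based, at-least-once job dispatch."""

    def __init__(self, grab_for: float = 30.0, max_retries: int = 3,
                 clock: Callable[[], float] = time.monotonic):
        self.grab_for = grab_for
        self.max_retries = max_retries
        self.clock = clock
        self._ids = itertools.count(1)
        self._jobs = {}           # job_id -> Job, in insertion order
        self.failed = []          # jobs that exhausted their retries

    def insert(self, funcname: str, arg: dict, delay: float = 0.0) -> int:
        job = Job(next(self._ids), funcname, arg, run_after=self.clock() + delay)
        self._jobs[job.job_id] = job
        return job.job_id

    def grab(self) -> Optional[Job]:
        """Lease the oldest runnable job; if the worker dies, the lease lapses and it reappears."""
        now = self.clock()
        for job in self._jobs.values():       # dicts keep insertion order: oldest first
            if job.run_after <= now and job.grabbed_until <= now:
                job.grabbed_until = now + self.grab_for
                return job
        return None

    def completed(self, job: Job) -> None:
        self._jobs.pop(job.job_id, None)

    def failed_attempt(self, job: Job, retry_delay: float = 60.0) -> None:
        """Schedule a retry, or give up after more than max_retries failures."""
        job.attempts += 1
        if job.attempts > self.max_retries:
            self.failed.append(self._jobs.pop(job.job_id))
        else:
            job.grabbed_until = 0.0
            job.run_after = self.clock() + retry_delay
```

Because a job whose worker crashed mid-task is handed out again, delivery is at-least-once, so handlers such as cache invalidation or publishing should be idempotent.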
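The summary notes that Linux clustering heartbeat was used to fail over using virtual IP addresses: clients always connect to one shared address, and when the active node stops sending heartbeats the standby takes that address over. The Python below simulates only the ownership decision for a two-node pair; the node names, the 3-second dead time, and the explicit `now` timestamps are assumptions, and the actual network-level move of the address is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float = 0.0


class VipFailover:
    """Hypothetical two-node active/standby pair sharing one virtual IP."""

    def __init__(self, active: Node, standby: Node, deadtime: float = 3.0):
        self.active = active
        self.standby = standby
        self.deadtime = deadtime      # silence longer than this means "peer is dead"

    def heartbeat(self, node: Node, now: float) -> None:
        node.last_heartbeat = now

    def vip_owner(self, now: float) -> str:
        """Return the node that should hold the VIP, failing over if the active node went silent."""
        active_dead = now - self.active.last_heartbeat > self.deadtime
        standby_alive = now - self.standby.last_heartbeat <= self.deadtime
        if active_dead and standby_alive:
            self.active, self.standby = self.standby, self.active
        return self.active.name
```

This naive rule is also how split brain happens: if the heartbeat link breaks while both nodes are still running, each side can conclude the other is dead and claim the address, which is the failure mode the redundant filers hit. Production clusters therefore typically add fencing or quorum on top of heartbeats.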
wordName wordTfidf (topN-words)
[('typepad', 0.417), ('mapped', 0.197), ('nfs', 0.194), ('postgres', 0.19), ('architecture', 0.165), ('sister', 0.161), ('filers', 0.151), ('raid', 0.151), ('devastating', 0.144), ('patterned', 0.144), ('cheaply', 0.139), ('transitioned', 0.134), ('heartbeat', 0.128), ('heavyweight', 0.128), ('pipes', 0.125), ('corrupted', 0.12), ('dispatch', 0.118), ('suffered', 0.118), ('adjust', 0.116), ('blogging', 0.116), ('used', 0.113), ('migrated', 0.113), ('invalidation', 0.11), ('unable', 0.107), ('owned', 0.104), ('publishing', 0.101), ('sequence', 0.098), ('linux', 0.098), ('mistake', 0.094), ('coordination', 0.09), ('controller', 0.089), ('crash', 0.088), ('sends', 0.088), ('counts', 0.087), ('migration', 0.084), ('paid', 0.083), ('globally', 0.082), ('backups', 0.081), ('failed', 0.081), ('clustering', 0.081), ('reverse', 0.081), ('six', 0.079), ('traffic', 0.079), ('surprising', 0.078), ('proxy', 0.078), ('brain', 0.077), ('generated', 0.075), ('partitioned', 0.074), ('perl', 0.074), ('caused', 0.072)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 68 high scalability-2007-08-20-TypePad Architecture
2 0.15247677 110 high scalability-2007-10-03-Why most large-scale Web sites are not written in Java
Introduction: There is a lot of information in the blogosphere describing the architecture of many popular sites, such as Google, Amazon, eBay, LinkedIn, TypePad, Wikipedia and others. I've summarized this issue in a blog post here. I would really appreciate your opinion on this matter.
3 0.13061152 310 high scalability-2008-04-29-High performance file server
Introduction: We have a bunch of applications running on Debian servers that process a huge amount of data stored on a shared NFS drive. We have 3 applications working as a pipeline, which process data stored on the NFS drive. The first application processes the data and stores the output in a folder on the NFS drive, the second app in the pipeline processes the data from the previous step, and so on. The data load to the pipeline is about 1 GB per minute. I think the NFS drive is the bottleneck here. Would buying a specialized file server improve the performance of data reads and writes from the disk?
4 0.12139256 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
Introduction: With Lavabit shutting down under murky circumstances, it seems fitting to repost an old (2009), yet still very good post by Ladar Levison on Lavabit's architecture. I don't know how much of this information is still current, but it should give you a general idea what Lavabit was all about. Getting to Know You What is the name of your system and where can we find out more about it? Note: these links are no longer valid... Lavabit http://lavabit.com http://lavabit.com/network.html http://lavabit.com/about.html What is your system for? Lavabit is a mid-sized email service provider. We currently have about 140,000 registered users with more than 260,000 email addresses. While most of our accounts belong to individual users, we also provide corporate email services to approximately 70 companies. Why did you decide to build this system? We built the system to compete against the other large free email providers, with an emphasis on serving the privacy c
5 0.11212905 308 high scalability-2008-04-22-Simple NFS failover solution with symbolic link?
Introduction: I've been trying to find a high availability file storage solution without success. I tried GlusterFS, which looks very promising, but experienced stability problems, and I don't want something I can't easily control and rely on. Other solutions are too complicated or have a SPOF. So I'm thinking of the following setup: Two NFS servers, a primary and a warm backup. The primary server will be rsynced with the warm backup every minute or two. I can do it this frequently because a PHP script will know from a database which directories have changed recently and only rsync those. Both servers will be NFS mounted on a cluster of web servers as /mnt/nfs-primary (sym linked as /home/websites) and /mnt/nfs-backup. I'll then use Ucarp (http://www.ucarp.org/project/ucarp) to monitor both NFS servers' availability every couple of seconds, and when one goes down, the Ucarp up script will be set to change the symbolic link on all web servers for the /home/websites dir from /mnt/nfs-primary to /mn
6 0.10236641 229 high scalability-2008-01-29-Building scalable storage into application - Instead of MogileFS OpenAFS etc.
7 0.10201563 73 high scalability-2007-08-23-Postgresql on high availability websites?
8 0.099187247 276 high scalability-2008-03-15-New Website Design Considerations
9 0.098280467 691 high scalability-2009-08-31-Squarespace Architecture - A Grid Handles Hundreds of Millions of Requests a Month
10 0.097431466 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
11 0.092772841 23 high scalability-2007-07-24-Major Websites Down: Or Why You Want to Run in Two or More Data Centers.
12 0.091278397 274 high scalability-2008-03-12-YouTube Architecture
13 0.09104529 275 high scalability-2008-03-14-Problem: Mobbing the Least Used Resource Error
14 0.089747824 290 high scalability-2008-03-28-How to Get DNS Names of a Web Server
15 0.086975381 170 high scalability-2007-12-02-Database-Clustering: a8cjdbc - update: version 1.3
16 0.086975381 171 high scalability-2007-12-02-a8cjdbc - update verision 1.3
17 0.086958446 297 high scalability-2008-04-05-Skype Plans for PostgreSQL to Scale to 1 Billion Users
18 0.086607821 1188 high scalability-2012-02-06-The Design of 99designs - A Clean Tens of Millions Pageviews Architecture
19 0.084392346 554 high scalability-2009-04-04-Digg Architecture
20 0.084004536 389 high scalability-2008-09-23-How to Scale with Ruby on Rails
topicId topicWeight
[(0, 0.143), (1, 0.066), (2, -0.048), (3, -0.098), (4, -0.001), (5, 0.019), (6, 0.017), (7, -0.092), (8, -0.004), (9, 0.025), (10, -0.024), (11, -0.004), (12, 0.035), (13, 0.006), (14, 0.026), (15, 0.062), (16, 0.011), (17, -0.001), (18, -0.021), (19, 0.026), (20, 0.02), (21, 0.034), (22, -0.063), (23, -0.006), (24, -0.032), (25, 0.057), (26, 0.051), (27, -0.002), (28, -0.005), (29, 0.005), (30, 0.01), (31, -0.052), (32, -0.027), (33, -0.032), (34, 0.018), (35, -0.002), (36, 0.021), (37, 0.012), (38, 0.027), (39, 0.045), (40, 0.018), (41, -0.081), (42, -0.008), (43, 0.038), (44, 0.032), (45, 0.03), (46, -0.038), (47, -0.015), (48, -0.066), (49, 0.035)]
simIndex simValue blogId blogTitle
same-blog 1 0.95794863 68 high scalability-2007-08-20-TypePad Architecture
2 0.73717773 7 high scalability-2007-07-12-FeedBurner Architecture
Introduction: FeedBurner is a news feed management provider launched in 2004. FeedBurner provides custom RSS feeds and management tools to bloggers, podcasters, and other web-based content publishers. Services provided to publishers include traffic analysis and an optional advertising system. Site: http://www.feedburner.com Information Sources FeedBurner - Scalable Web Applications using MySQL and Java What the Web’s most popular sites are running on Platform Java MySQL Hibernate Spring Tomcat Cacti Load balancing: NetScaler Application Switches Routers, switches: HP, Cisco DNS: bind The Stats FeedBurner is growing faster than MySpace and Digg with 385% traffic growth. Total feeds: 808,707, Number of publishers: 471,686. 11 million subscribers in 190 countries Scaling History - July 2004: 300Kbps, 5,600 feeds, 3 app servers, 3 web servers 2 DB servers, Round Robin DNS - April 2005: 5Mbps, 47,700 feeds, 6 app servers, 6 web servers (same mac
3 0.70119113 72 high scalability-2007-08-22-Wikimedia architecture
Introduction: Wikimedia is the platform on which Wikipedia, Wiktionary, and the other seven wiki dwarfs are built. This document is just excellent for the student trying to scale the heights of giant websites. It is full of details and innovative ideas that have been proven on some of the most used websites on the internet. Site: http://wikimedia.org/ Information Sources Wikimedia architecture http://meta.wikimedia.org/wiki/Wikimedia_servers scale-out vs scale-up in the from Oracle to MySQL blog. Platform Apache Linux MySQL PHP Squid LVS Lucene for Search Memcached for Distributed Object Cache Lighttpd Image Server The Stats 8 million articles spread over hundreds of language projects (english, dutch, ...) 10th busiest site in the world (source: Alexa) Exponential growth: doubling every 4-6 months in terms of visitors / traffic / servers 30 000 HTTP requests/s during peak-time 3 Gbit/s of data traffic 3 data centers: Tampa, A
4 0.69076574 5 high scalability-2007-07-10-mixi.jp Architecture
Introduction: Mixi is a fast growing social networking site in Japan. They provide services like: diary, community, message, review, and photo album. Having a lot in common with LiveJournal, they also developed many of the same approaches. Their write up on how they scaled their system is easily one of the best out there. Site: http://mixi.jp Information Sources mixi.jp - scaling out with open source Platform Linux Apache MySQL Perl Memcached Squid Shard What's Inside? They grew to approximately 4 million users in two years and add over 15,000 new users/day. Ranks 35th on Alexa and 3rd in Japan. More than 100 MySQL servers Add more than 10 servers/month Use non-persistent connections. Diary traffic is 85% read and 15% write. Message traffic is 75% read and 25% write. Ran into replication performance problems so they had to split the database. Considered splitting vertically by user or splitting horizontally by table type. The ende
5 0.6854341 473 high scalability-2008-12-20-Second Life Architecture - The Grid
Introduction: Update: Presentation: Second Life’s Architecture . Ian Wilkes, VP of Systems Engineering, describes the architecture used by the popular game named Second Life. Ian presents how the architecture was at its debut and how it evolved over years as users and features have been added. Second Life is a 3-D virtual world created by its Residents. Virtual Worlds are expected to be more and more popular on the internet so their architecture might be of interest. Especially important is the appearance of open virtual worlds or metaverses. What happens when video games meet Web 2.0? What happens is the metaverse . Information Sources Second Life runs MySQL Interview with Ian Wilkes TechTrends: Inside Linden Lab Town Hall with Cory Linden InformationWeek articles ( 1 , 2 ) and blog Second Life Wiki: Server Architecture Wikipedia: Second Life Server Second Life Blog Second Life: A Guide to Your Virtual World Platform
6 0.67611396 308 high scalability-2008-04-22-Simple NFS failover solution with symbolic link?
7 0.6663062 511 high scalability-2009-02-12-MySpace Architecture
8 0.66450852 297 high scalability-2008-04-05-Skype Plans for PostgreSQL to Scale to 1 Billion Users
9 0.65638989 1046 high scalability-2011-05-23-Evernote Architecture - 9 Million Users and 150 Million Requests a Day
10 0.65573913 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
11 0.65385592 57 high scalability-2007-08-03-Scaling IMAP and POP3
12 0.65281844 302 high scalability-2008-04-10-Mysql scalability and failover...
13 0.65045035 81 high scalability-2007-09-06-Scaling IMAP and POP3
14 0.64939392 1041 high scalability-2011-05-15-Building a Database remote availability site
15 0.64878172 160 high scalability-2007-11-19-Tailrank Architecture - Learn How to Track Memes Across the Entire Blogosphere
16 0.6412828 1261 high scalability-2012-06-08-Stuff The Internet Says On Scalability For June 8, 2012
17 0.64050281 391 high scalability-2008-09-23-The 7 Stages of Scaling Web Apps
18 0.64016372 1521 high scalability-2013-09-23-Salesforce Architecture - How they Handle 1.3 Billion Transactions a Day
19 0.63701534 103 high scalability-2007-09-28-Kosmos File System (KFS) is a New High End Google File System Option
20 0.63410139 271 high scalability-2008-03-08-Product: DRBD - Distributed Replicated Block Device
topicId topicWeight
[(1, 0.155), (2, 0.186), (7, 0.204), (10, 0.07), (30, 0.077), (40, 0.026), (47, 0.026), (61, 0.04), (79, 0.043), (94, 0.084)]
simIndex simValue blogId blogTitle
1 0.89523166 518 high scalability-2009-02-22-Building and Scaling a Startup on Rails: 12 Things We Learned the Hard Way
Introduction: Garry Tan, cofounder of Posterous, lists 12 lessons for scaling that apply to more than just Rails. Use cloud storage for static files. Use HTTP Cache Control to tell the browser what it can cache. Use Sphinx for text search. Use InnoDB for more crash resistant and faster writes. Don't use textbook Rails ActiveRecord objects. Use New Relic to find exactly what is slow in your system. Use memcache later so you find your database bottlenecks now. Use mongrel proctitle to find your slow queries. You are only as fast as your slowest queries. Use asynchronous job queuing to do work in parallel. Use monitoring so you'll know when your site went down and why. Learn by reading the source code, fixing problems, and submitting them back to the community. Use new plugins. Old plugins can't be trusted. Use new information. Old information can't be trusted.
same-blog 2 0.86988658 68 high scalability-2007-08-20-TypePad Architecture
3 0.86548799 1081 high scalability-2011-07-18-Building your own Facebook Realtime Analytics System
Introduction: Recently, I was reading Todd Hoff's write-up on Facebook's real-time analytics system. As usual, Todd did an excellent job in summarizing this video from Alex Himel, Engineering Manager at Facebook. In the first post, I’d like to summarize the case study and consider some things that weren't mentioned in the summaries. This will lead to an architecture for building your own Real Time Analytics for Big Data that might be easier to implement, using Facebook's experience as a starting point and guide, as well as the experience gathered through recent work with a few of GigaSpaces' customers. The second post provides a summary of that new approach as well as a pattern and a demo for building your own Real Time Analytics system. References Real Time analytics for Big Data: Facebook's New Realtime Analytics System Real Time Analytics for Big Data: An Alternative Approach
4 0.85375768 26 high scalability-2007-07-25-Paper: Lightweight Web servers
Introduction: This paper is a great overview of different lightweight web servers. A lot of websites use lightweight web servers to serve images and static content. YouTube is one example: YouTube Architecture. So if you need to improve performance, consider switching to a different web server for some types of content. Overview: Recent years have enjoyed a florescence of interesting implementations of Web servers, including lighttpd, litespeed, and mongrel, among others. These Web servers boast different combinations of performance, ease of administration, portability, security, and related values. The following engineering study surveys the field of lightweight Web servers to help you find one likely to meet the technical requirements of your next project. "Lightweight" Web servers like lighttpd, litespeed, and mongrel can offer dramatic benefits for your projects. This article surveys the possibilities and shows how they apply to you. Important dimensions for evaluation of a Web serv
5 0.81226075 325 high scalability-2008-05-25-How do you explain cloud computing to your grandma?
Introduction: Update 2: Nice introductory New York Times article Cloud Computing: So You Don’t Have to Stand Still. Good example of how Animoto used RightScale and Amazon to meet a Facebook driven demand of 25,000 test drives an hour. Update: Peter Laird in Understanding the Cloud Computing/SaaS/PaaS markets: a Map of the Players in the Industry paints a very cool visual map of all the cloud service players. It's a larger industry than you might think. Once upon a time I worked at an Asynchronous Transfer Mode (ATM) switch startup. Over a delicious Christmas punch, my grandma asked me what I did for a living that I could afford such extravagantly inexpensive gifts. Always so subtle. I explained I worked on an ATM switch. Mistake. She sniffed, said that's nice, and asked me why the Automated Teller Machine ate her bank card that morning. No matter how hard I tried I couldn't convince her I didn't work on bank ATMs. To all future job interrogations I waxed off, protesting I do boring soft
6 0.80626124 219 high scalability-2008-01-21-Product: Hyperic
7 0.80041528 964 high scalability-2010-12-28-Netflix: Continually Test by Failing Servers with Chaos Monkey
9 0.78962553 1397 high scalability-2013-02-01-Stuff The Internet Says On Scalability For February 1, 2013
10 0.78372931 1267 high scalability-2012-06-18-The Clever Ways Chrome Hides Latency by Anticipating Your Every Need
11 0.77435791 256 high scalability-2008-02-21-Tracking usage of public resources - throttling accesses per hour
13 0.76867151 1434 high scalability-2013-04-03-5 Steps to Benchmarking Managed NoSQL - DynamoDB vs Cassandra
14 0.76658332 996 high scalability-2011-02-28-A Practical Guide to Varnish - Why Varnish Matters
15 0.76476741 136 high scalability-2007-10-28-Scaling Early Stage Startups
16 0.76382709 788 high scalability-2010-03-04-How MySpace Tested Their Live Site with 1 Million Concurrent Users
17 0.76357746 1220 high scalability-2012-04-02-YouPorn - Targeting 200 Million Views a Day and Beyond
18 0.76356161 881 high scalability-2010-08-16-Scaling an AWS infrastructure - Tools and Patterns
19 0.76349747 1329 high scalability-2012-09-26-WordPress.com Serves 70,000 req-sec and over 15 Gbit-sec of Traffic using NGINX
20 0.76306355 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data