high_scalability high_scalability-2007 high_scalability-2007-33 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: ThemBid provides a market where people needing work done broadcast their request and accept bids from people competing for the job. Unlike many of the sites profiled at HighScalability, ThemBid is not in the popular press as often as Paris Hilton. It's not a media darling or a giant of the industry. But what I like is that they have a strategy, a point of view for building websites, and were gracious enough to share very detailed instructions on how to go about building a website. They even delve into actual installation details of the various software packages they use. Anyone can benefit by taking a look at their work. Site: http://www.thembid.com/. Information sources: Build Scalable Web 2.0 Sites with Ubuntu, Symfony, and Lighttpd. Platform: Linux (Ubuntu), Symfony, Lighttpd, PHP, eAccelerator, Eclipse, Munin, AWStats. What's inside? The stats: Started work in December of 2006 and had a full demo by March 2007. One developer/sys admin worked with a pa
sentIndex sentText sentNum sentScore
1 ThemBid provides a market where people needing work done broadcast their request and accept bids from people competing for the job. [sent-1, score-0.14]
2 Unlike many of the sites profiled at HighScalability, ThemBid is not in the popular press as often as Paris Hilton. [sent-2, score-0.182]
3 They even delve into actual installation details of the various software packages they use. [sent-5, score-0.218]
4 They went with Layeredtech for the managed server because of past positive experiences. [sent-18, score-0.146]
5 They chose the server distribution of Ubuntu because that's what they use on the client side, and Ubuntu supports "simpler installation and easier maintenance than typical IT deployments." [sent-22, score-0.366]
6 Lighttpd is used to handle static content and forward the dynamic PHP page requests to FastCGI. [sent-24, score-0.119]
7 When growth requires it, the plan is to move to a master-slave arrangement and then maybe MySQL Cluster. [sent-27, score-0.266]
8 They chose Symfony as their framework because of its nice documentation and active development community. [sent-30, score-0.348]
9 eAccelerator is used to compile and cache PHP scripts. [sent-34, score-0.289]
10 Many of the pieces are used over and over again, so putting them in memory will speed up the entire system and take pressure off the database and the IO system. [sent-38, score-0.201]
11 Initially they used an SQLite cache on top of a memory-based file system. [sent-39, score-0.297]
12 Lighttpd's mod_expire module is used to prevent JavaScript, style sheets, and images that rarely change from being unnecessarily re-downloaded by the browser. [sent-43, score-0.119]
13 AWStats is used to track hits and types of requests. [sent-49, score-0.119]
14 - Move to a distributed memory cache using memcached. [sent-58, score-0.178]
15 Lessons Learned: It's possible to create a nice site fairly quickly with just a few people using commonly available low-cost tools. [sent-63, score-0.158]
16 Good documentation and an active community draw people. [sent-67, score-0.379]
17 These are very attractive qualities for people making decisions about what to use. [sent-68, score-0.149]
18 It's hard to go with a tool chain when it looks like you may get stuck in the future with no way out and no help. [sent-69, score-0.125]
19 You don't want to delay releasing your site so you can learn a completely different tool chain that may make your life somewhat easier in some projected future. [sent-73, score-0.43]
20 It also means there's an active community that can help you when you have problems. [sent-78, score-0.191]
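The growth plan above, moving to a master-slave arrangement, usually means sending writes to one MySQL master and spreading reads across replicas. This is a minimal Python sketch of that routing decision under stated assumptions: the `ReplicatedRouter` class, the host names, and the keyword-based write detection are hypothetical and do not come from ThemBid's code.

```python
import itertools
from typing import List

# Leading SQL keywords treated as writes (an assumption; extend as needed).
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}


class ReplicatedRouter:
    """Hypothetical router for a MySQL-style master-slave setup.

    Writes go to the single master; reads rotate round-robin across slaves.
    Replication in such setups is usually asynchronous, so a read sent to a
    slave right after a write may not see that write yet.
    """

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        # With no slaves configured, every read falls back to the master.
        self._slaves = itertools.cycle(slaves or [master])

    def route(self, sql: str) -> str:
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        # A fuller router would also pin transactions and SELECT ... FOR UPDATE
        # to the master; this sketch only inspects the leading keyword.
        if verb in WRITE_VERBS:
            return self.master
        return next(self._slaves)


router = ReplicatedRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.route("UPDATE bids SET amount = 120 WHERE id = 7"))  # db-master
print(router.route("SELECT * FROM jobs"))                          # db-slave-1
print(router.route("select * from bids WHERE job_id = 7"))         # db-slave-2
```

Round-robin is the simplest read-balancing choice; weighting slaves or checking replication lag would be natural refinements.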
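Keeping frequently reused pieces in memory (and later moving to a distributed memcached tier) follows the cache-aside pattern: check the cache first and query the database only on a miss. This sketch assumes a process-local dictionary as a stand-in for memcached; `InMemoryCache`, `get_or_load`, and the sample query are hypothetical, and a real deployment would use a memcached client instead.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Process-local stand-in for a memcached-style get/set store (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (time.monotonic() + ttl, value)


def get_or_load(cache: InMemoryCache, key: str,
                loader: Callable[[], Any], ttl: float = 60.0) -> Any:
    """Cache-aside read: serve from memory, hit the database only on a miss.

    A loader result of None is never cached in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = loader()  # e.g. the expensive database query
        if value is not None:
            cache.set(key, value, ttl)
    return value


db_calls = []


def load_job() -> Dict[str, Any]:
    db_calls.append("SELECT ... FROM jobs WHERE id = 42")  # hypothetical query
    return {"id": 42, "title": "Logo design"}


cache = InMemoryCache()
get_or_load(cache, "job:42", load_job)
get_or_load(cache, "job:42", load_job)
print(len(db_calls))  # 1: the second read was served from memory
```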
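The initial SQLite cache on a memory-based file system can be approximated with Python's built-in `sqlite3` module. This is a sketch, not Symfony's cache implementation: the `SQLiteCache` class and its table layout are assumptions, and `:memory:` stands in for a database file on a RAM-backed (tmpfs) mount.

```python
import sqlite3
import time
from typing import Optional


class SQLiteCache:
    """Key/value cache kept in an SQLite table (layout is hypothetical).

    ":memory:" keeps the database entirely in RAM; pointing `path` at a file
    on a tmpfs mount approximates a cache file on a memory-based filesystem.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "cache_key TEXT PRIMARY KEY, cache_value TEXT NOT NULL, "
            "expires_at REAL NOT NULL)"
        )

    def set(self, key: str, value: str, ttl: float) -> None:
        # INSERT OR REPLACE overwrites an existing row with the same key.
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (cache_key, cache_value, expires_at) "
            "VALUES (?, ?, ?)",
            (key, value, time.time() + ttl),
        )
        self.conn.commit()

    def get(self, key: str) -> Optional[str]:
        # Expired rows are ignored here; purge_expired() actually deletes them.
        row = self.conn.execute(
            "SELECT cache_value FROM cache WHERE cache_key = ? AND expires_at > ?",
            (key, time.time()),
        ).fetchone()
        return row[0] if row else None

    def purge_expired(self) -> int:
        cur = self.conn.execute(
            "DELETE FROM cache WHERE expires_at <= ?", (time.time(),)
        )
        self.conn.commit()
        return cur.rowcount


cache = SQLiteCache()
cache.set("page:/jobs", "<ul>...</ul>", ttl=300)
print(cache.get("page:/jobs"))  # <ul>...</ul>
print(cache.get("page:/bids"))  # None
```

An SQLite file is local to one machine, which is one reason a multi-server setup would move to a shared cache such as memcached.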
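Lighttpd's mod_expire attaches expiry information (an `Expires` header and a `Cache-Control` max-age) to responses so browsers keep rarely changing JavaScript, style sheets, and images instead of re-downloading them. This Python sketch computes equivalent headers to show the idea; the extension-to-lifetime table and the `expiry_headers` function are hypothetical, and in lighttpd itself this is configured in the server rather than written in application code.

```python
import os
from email.utils import formatdate
from typing import Dict, Optional

# Hypothetical lifetimes (seconds) for assets that rarely change.
MAX_AGE_BY_EXT: Dict[str, int] = {
    ".js": 7 * 24 * 3600,
    ".css": 7 * 24 * 3600,
    ".png": 30 * 24 * 3600,
    ".jpg": 30 * 24 * 3600,
    ".gif": 30 * 24 * 3600,
}


def expiry_headers(path: str, now: float) -> Optional[Dict[str, str]]:
    """Return browser-caching headers for a static asset, or None otherwise."""
    ext = os.path.splitext(path)[1].lower()
    max_age = MAX_AGE_BY_EXT.get(ext)
    if max_age is None:
        return None  # e.g. dynamic .php pages get no far-future expiry
    return {
        "Cache-Control": f"max-age={max_age}",
        # Expires is an absolute HTTP date in GMT: request time + lifetime.
        "Expires": formatdate(now + max_age, usegmt=True),
    }


print(expiry_headers("/css/main.css", now=0))
# {'Cache-Control': 'max-age=604800', 'Expires': 'Thu, 08 Jan 1970 00:00:00 GMT'}
print(expiry_headers("/jobs/view.php", now=0))  # None
```

A long lifetime is only safe if a changed asset gets a new URL (for example a versioned file name); otherwise browsers keep serving the stale copy until it expires.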
wordName wordTfidf (topN-words)
[('ubuntu', 0.346), ('thembid', 0.211), ('symfony', 0.189), ('php', 0.178), ('munin', 0.155), ('yahoo', 0.141), ('chain', 0.125), ('used', 0.119), ('documentation', 0.118), ('active', 0.118), ('installation', 0.113), ('digg', 0.112), ('chose', 0.112), ('trend', 0.109), ('cut', 0.107), ('eaccelerator', 0.105), ('arrangement', 0.105), ('awstats', 0.105), ('darling', 0.105), ('delve', 0.105), ('layeredtech', 0.105), ('learnedit', 0.105), ('paris', 0.105), ('gracious', 0.099), ('sites', 0.096), ('cache', 0.096), ('sheets', 0.095), ('sqlite', 0.095), ('statsstarted', 0.095), ('site', 0.088), ('lvs', 0.086), ('profiled', 0.086), ('move', 0.085), ('memory', 0.082), ('visiting', 0.082), ('frees', 0.08), ('qualities', 0.079), ('bid', 0.077), ('worked', 0.077), ('plugin', 0.076), ('went', 0.076), ('maybe', 0.076), ('releasing', 0.075), ('webservers', 0.075), ('compile', 0.074), ('community', 0.073), ('projected', 0.071), ('easier', 0.071), ('people', 0.07), ('server', 0.07)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 33 high scalability-2007-07-26-ThemBid Architecture
2 0.24064074 248 high scalability-2008-02-13-What's your scalability plan?
Introduction: How do you plan to scale your system as you reach predictable milestones? This topic came up in another venue and it reminded me about a great comment an Anonymous wrote a while ago and I wanted to make sure that comment didn't get lost. The Anonymous scaling plan was relatively simple and direct: My two cents on what I'm using to start a website from scratch using a single server for now. Later, I'll scale out horizontally when the need arises. Phase 1 Single Server, Dual Quad-Core 2.66, 8gb RAM, 500gb Disk Raid 10 OS: Fedora 8. You could go with pretty much any Linux though. I like Fedora 8 best for servers. Proxy Cache: Varnish - it is way faster than Squid per my own benchmarks. Squid chokes bigtime. Web Server: Lighttpd - faster than Apache 2 and easier to configure for me. Object Cache: Memcached. Very scalable. PHP Cache: APC. Easy to configure and seems to work fine. Language: PHP 5 - no bloated frameworks, waste of time for me. You spend too mu
3 0.17018582 554 high scalability-2009-04-04-Digg Architecture
Introduction: Update 4: Introducing Digg’s IDDB Infrastructure by Joe Stump. IDDB is a way to partition both indexes (e.g. integer sequences and unique character indexes) and actual tables across multiple storage servers (MySQL and MemcacheDB are currently supported with more to follow). Update 3: Scaling Digg and Other Web Applications. Update 2: How Digg Works and How Digg Really Works (wear ear plugs). Brought to you straight from Digg's blog. A very succinct explanation of the major elements of the Digg architecture while tracing a request through the system. I've updated this profile with the new information. Update: Digg now receives 230 million plus page views per month and 26 million unique visitors, traffic that necessitated major internal upgrades. Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. How does Digg handle billions of req
4 0.16564995 808 high scalability-2010-04-12-Poppen.de Architecture
Introduction: This is a guest post by Alvaro Videla describing their architecture for Poppen.de, a popular German dating site. This site is very much NSFW, so be careful before clicking on the link. What I found most interesting is how they manage to successfully blend a little of the old with a little of the new, using technologies like Nginx, MySQL, CouchDB, Erlang, Memcached, RabbitMQ, PHP, Graphite, Red5, and Tsung. What is Poppen.de? Poppen.de (NSFW) is the top dating website in Germany, and while it may be a small site compared to giants like Flickr or Facebook, we believe it's a nice architecture to learn from if you are starting to get some scaling problems. The stats: 2,000,000 users; 20,000 concurrent users; 300,000 private messages per day; 250,000 logins per day. We have a team of eleven developers, two designers and two sysadmins for this project. Business model: The site works with a freemium model, where users can do things for free, like: Search
5 0.13966726 512 high scalability-2009-02-14-Scaling Digg and Other Web Applications
Introduction: Joe Stump, Lead Architect at Digg, gave this presentation at the Web 2.0 Expo. I couldn't find the actual presentation, but fortunately Kris Jordan took some great notes. That's how key moments in history are accidentally captured forever. Joe was also kind enough to respond to my email questions with a phone call. In this first part of the post Joe shares some timeless wisdom that you may or may not have read before. I of course take some pains to extract all the wit from the original presentation in favor of simple rules. What really struck me, however, was how Joe thought MemcacheDB will be the biggest new kid on the block in scaling. MemcacheDB has been around for a little while and I've never thought of it in that way. We'll learn why Joe is so excited by MemcacheDB at the end of the post. Impressive stats: 80th-100th largest site in the world; 26 million uniques a month; 30 million users. Uniques are only half that traffic. Traffic = unique web visitors + APIs + Digg
6 0.13587034 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
7 0.13157651 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
8 0.12621966 1469 high scalability-2013-06-03-GOV.UK - Not Your Father's Stack
9 0.12250311 32 high scalability-2007-07-26-Product: eAccelerator a PHP Accelerator
10 0.11850988 31 high scalability-2007-07-26-Product: Symfony a Web Framework
12 0.11714092 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud
13 0.11634554 617 high scalability-2009-06-04-New Book: Even Faster Web Sites: Performance Best Practices for Web Developers
14 0.11383105 1492 high scalability-2013-07-17-How do you create a 100th Monkey software development culture?
15 0.11332304 274 high scalability-2008-03-12-YouTube Architecture
16 0.11292974 1102 high scalability-2011-08-22-Strategy: Run a Scalable, Available, and Cheap Static Site on S3 or GitHub
17 0.1111852 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
18 0.11094876 1617 high scalability-2014-03-21-Stuff The Internet Says On Scalability For March 21st, 2014
19 0.1108782 106 high scalability-2007-10-02-Secrets to Fotolog's Scaling Success
20 0.11007187 1 high scalability-2007-07-06-Start Here
topicId topicWeight
[(0, 0.223), (1, 0.086), (2, -0.04), (3, -0.142), (4, 0.039), (5, -0.046), (6, -0.049), (7, -0.013), (8, -0.014), (9, 0.052), (10, -0.049), (11, -0.019), (12, 0.032), (13, 0.029), (14, 0.013), (15, -0.103), (16, 0.007), (17, -0.013), (18, 0.004), (19, -0.022), (20, -0.053), (21, 0.036), (22, 0.003), (23, 0.06), (24, -0.017), (25, 0.059), (26, 0.039), (27, -0.026), (28, -0.034), (29, -0.13), (30, -0.004), (31, 0.056), (32, -0.001), (33, -0.033), (34, 0.007), (35, -0.032), (36, -0.073), (37, -0.015), (38, -0.072), (39, 0.004), (40, -0.05), (41, 0.001), (42, 0.035), (43, -0.004), (44, -0.009), (45, -0.05), (46, -0.024), (47, -0.017), (48, 0.043), (49, -0.025)]
simIndex simValue blogId blogTitle
same-blog 1 0.96759629 33 high scalability-2007-07-26-ThemBid Architecture
2 0.76535982 808 high scalability-2010-04-12-Poppen.de Architecture
3 0.75331485 218 high scalability-2008-01-17-Moving old to new. Do not be afraid of the re-write -- but take some help
Introduction: Recently I had to help users on one of my open-source projects, ISPMan (http://ispman.net). This project started in 2001 because I was unwilling to take care of the DNS and VirtualHosting stuff, as it was a side task at the company I worked for (so I wrote software that took care of all these little details). Summary: A large project that needs a rewrite can be done in a matter of days. I will not give you a full case study about a project that went through a rewrite, but a case study about how easy it is to rewrite something. Details: My boss was cool enough to let me open-source the project and, obviously, I got a lot of cool-cred out of it. Later on I also did some support and implementation work and earned quite a bit of money with it. Eventually I had to let the project go out of my hands to the community, as I only did it to facilitate a job that I wasn't willing to do. (Set up DNS zones on multiple servers, find out which host should host the website and put VirtualHost
4 0.74481332 136 high scalability-2007-10-28-Scaling Early Stage Startups
Introduction: Mark Maunder of No VC Required (who advocates not taking VC money lest you be turned into a frog instead of the prince or princess you were dreaming of) has an excellent slide deck on how to scale an early stage startup. His blog also has some good SEO tips and a very spooky widget showing the geographical location of his readers. Perfect for Halloween! What are Mark's otherworldly scaling strategies for startups? Site: http://novcrequired.com/. Information sources: Slides from Seattle Tech Startup Talk; Scaling Early Stage Startups blog post by Mark Maunder. The platform: Linux; an ISAM-type data store; Perl; Httperf is used for benchmarking; Websitepulse.com is used for perf monitoring. The architecture: Performance matters because being slow could cost you 20% of your revenue. The UIE guys disagree, saying this ain't necessarily so. They explain their reasoning in Usability Tools Podcast: The Truth About Page Download Time. The idea i
5 0.73672515 1486 high scalability-2013-07-03-5 Rockin' Tips for Scaling PHP to 30,000 Concurrent Users Per Server
Introduction: Jonathan Block, CTO at RockThePost.com, a crowdfunding company, has written a nice set of tips for smaller sites on how to scale a service on EC2 using a small two-person development team. Their service has a typical small-scale structure: PHP with Zend Framework 2; two m1.medium instances for web servers; ELB to split the load; a master/slave MySQL database; Siege for load testing. The very sensible tips that can handle 30,000 concurrent users per web server: Use PHP's APC feature. APC is an opcode cache that is "really a requirement in order for a website to have a chance at performing well." Put everything that's not a .php request on a CDN. Don't serve static files from your web server. They put everything on S3 and use CloudFront as their CDN. Recent CloudFront problems have caused them to serve directly from S3. Don't make connections to other servers in your PHP code. Making connections to other servers blocks the server and slows down processing. Use the APC k
6 0.72383225 1102 high scalability-2011-08-22-Strategy: Run a Scalable, Available, and Cheap Static Site on S3 or GitHub
7 0.72009182 1469 high scalability-2013-06-03-GOV.UK - Not Your Father's Stack
8 0.71651024 248 high scalability-2008-02-13-What's your scalability plan?
9 0.70927912 66 high scalability-2007-08-16-What tech is used to build your favorite site?
10 0.70405936 884 high scalability-2010-08-23-6 Ways to Kill Your Servers - Learning How to Scale the Hard Way
11 0.7016449 434 high scalability-2008-10-30-Olio Web2.0 Toolkit - Evaluate Web Technologies and Tools
12 0.6948235 711 high scalability-2009-09-22-How Ravelry Scales to 10 Million Requests Using Rails
13 0.6941762 996 high scalability-2011-02-28-A Practical Guide to Varnish - Why Varnish Matters
14 0.69036609 584 high scalability-2009-04-27-Some Questions from a newbie
15 0.68538505 344 high scalability-2008-06-09-FaceStat's Rousing Tale of Scaling Woe and Wisdom Won
16 0.68392378 512 high scalability-2009-02-14-Scaling Digg and Other Web Applications
17 0.67900616 232 high scalability-2008-01-29-When things aren't scalable
18 0.67769831 298 high scalability-2008-04-07-Lazy web sites run faster
19 0.6739586 1617 high scalability-2014-03-21-Stuff The Internet Says On Scalability For March 21st, 2014
20 0.67367727 118 high scalability-2007-10-09-High Load on production Webservers after Sourcecode sync
topicId topicWeight
[(1, 0.103), (2, 0.192), (10, 0.044), (26, 0.02), (30, 0.047), (40, 0.018), (47, 0.011), (61, 0.097), (73, 0.17), (79, 0.136), (85, 0.03), (94, 0.065)]
simIndex simValue blogId blogTitle
Introduction: It's time to do something a little different, and for me that doesn't mean cutting off my hair and joining a monastery, nor does it mean buying a cherry red convertible (yet); it means doing a webinar! On December 14th, 2:00 PM - 3:00 PM EST, I'll be hosting What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications. The webinar is sponsored by VoltDB, but it will be completely vendor independent, as that's the only honor-preserving and technically accurate way of doing these things. The webinar will run about 60 minutes, with 40 minutes of speechifying and 20 minutes for questions. The hashtag for the event on Twitter will be SQLNoSQL. I'll be monitoring that hashtag if you have any suggestions for the webinar or if you would like to ask questions during the webinar. The motivation for me to do the webinar was a talk I had with another audience member at the NoSQL Evening in Palo Alto. He said he came from a Java background and was confused ab
3 0.93854463 1587 high scalability-2014-01-29-10 Things Bitly Should Have Monitored
Introduction: Monitor, monitor, monitor. That's the advice every startup gives once they reach a certain size. But can you ever monitor enough? If you are Bitly and everyone will complain when you are down, probably not. Here are 10 Things We Forgot to Monitor from Bitly, along with good stories and copious amounts of code snippets. Well worth reading, especially after you've already started monitoring the lower-hanging fruit. An interesting revelation from the article is that: We run bitly split across two data centers, one is a managed environment with DELL hardware, and the second is Amazon EC2. Fork rate. A strange configuration issue caused processes to be created at a rate of several hundred a second rather than the expected 1-10/second. Flow control packets. A network configuration that honors flow control packets and isn’t configured to disable them can temporarily cause dropped traffic. Swap in/out rate. Measure the right thing. It's the rate memory is swapped
4 0.92328495 1175 high scalability-2012-01-17-Paper: Feeding Frenzy: Selectively Materializing Users’ Event Feeds
Introduction: How do you scale an inbox that has multiple highly volatile feeds? That's a problem faced by social networks like Tumblr, Facebook, and Twitter. Follow a few hundred event sources and it's hard to scalably order an inbox so that you see a correct view as event sources continually publish new events. This can be considered a view materialization problem in a database. In a database a view is a virtual table defined by a query that can be accessed like a table. Materialization refers to when the data behind the view is created. If a view is a join on several tables and that join is performed when the view is accessed, then performance will be slow. If the view is precomputed, access to the view will be fast, but more resources are used, especially considering that the view may never be accessed. Your wall/inbox/stream is a view on all the people/things you follow. If you never look at your inbox, then materializing the view in your inbox is a waste of resources, yet you'll be ma
5 0.91150039 980 high scalability-2011-01-28-Stuff The Internet Says On Scalability For January 28, 2011
Introduction: Submitted for your reading pleasure... Something we get to say more often than you might expect - funny NoSQL comic: How to Write a CV (SFW). Playtomic shows how to handle over 300 million events per day, in real time, on a budget. More Speed, at $80,000 a Millisecond. Does latency matter? Oh yes... “On the Chicago to New York route in the US, three milliseconds can mean the difference between US$2,000 a month and US$250,000 a month.” Quotable Quotes: @jkalucki: Throwing 1,920 CPUs and 4TB of RAM at an annoyance, as you do. @jointheflock @hkanji: Scale can come quick and come hard. Be prepared. @elenacarstoiu: When you say #Cloud, everybody's thinking lower cost. Agility, scalability and fast access are advantages far more important. @BillGates: From Melinda - Research proves we can save newborn lives at scale. Kosmix with a fascinating look at Cassandra on SSD, summarizing some of what they've learned over the past year runni
same-blog 6 0.90861261 33 high scalability-2007-07-26-ThemBid Architecture
7 0.8964358 125 high scalability-2007-10-18-another approach to replication
8 0.89471519 471 high scalability-2008-12-19-Gigaspaces curbs latency outliers with Java Real Time
9 0.89371979 795 high scalability-2010-03-16-1 Billion Reasons Why Adobe Chose HBase
10 0.89159966 284 high scalability-2008-03-19-RAD Lab is Creating a Datacenter Operating System
11 0.88567752 1642 high scalability-2014-05-02-Stuff The Internet Says On Scalability For May 2nd, 2014
12 0.87698191 1196 high scalability-2012-02-20-Berkeley DB Architecture - NoSQL Before NoSQL was Cool
13 0.87251127 709 high scalability-2009-09-19-Space Based Programming in .NET
14 0.86694747 1313 high scalability-2012-08-28-Making Hadoop Run Faster
15 0.8491528 986 high scalability-2011-02-10-Database Isolation Levels And Their Effects on Performance and Scalability
16 0.84729463 1649 high scalability-2014-05-16-Stuff The Internet Says On Scalability For May 16th, 2014
17 0.84687757 181 high scalability-2007-12-11-Hosting and CDN for startup video sharing site
18 0.84613621 1112 high scalability-2011-09-07-What Google App Engine Price Changes Say About the Future of Web Architecture
19 0.84566516 890 high scalability-2010-09-01-Paper: The Case for Determinism in Database Systems
20 0.84527659 301 high scalability-2008-04-08-Google AppEngine - A First Look