high_scalability high_scalability-2013 high_scalability-2013-1473 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: In the moral realm there may be 7 deadly sins, but scalability maven Sean Hull has come up with Five More Things Deadly to Scalability that, when added to his earlier 5 Things That are Toxic to Scalability, make for a numerologically satisfying 10 sins against scalability: Slow Disk I/O – RAID 5 – Multi-tenant EBS. Use RAID 10; it provides good protection along with good read and write performance. The design of RAID 5 means poor performance and long repair times on failure. On AWS, consider Provisioned IOPS as a way around I/O bottlenecks. Using the database for Queuing. The database may seem like the perfect place to keep work queues, but under load, locking and scanning overhead kills performance. Use specialized products like RabbitMQ and SQS to remove this bottleneck. Using Database for full-text searching. Search seems like another perfect database feature. At scale, search doesn't perform well. Use specialized technologies like Solr or Sphinx. Insufficient Caching
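The queuing advice above can be illustrated with a small sketch. It assumes an in-process `queue.Queue` as a stand-in for a dedicated broker such as RabbitMQ or SQS, purely so the example runs without a server; `process_jobs` and the doubling "work" are hypothetical names chosen for illustration. The point it shows is that workers block waiting for a message instead of repeatedly scanning and locking a database table.

```python
import queue
import threading


def process_jobs(job_list, n_workers=4):
    """Fan jobs out to worker threads through a queue and collect results."""
    jobs = queue.Queue()          # stand-in for a real broker (assumption)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = jobs.get()      # blocks until work arrives; no table polling
            if job is None:       # sentinel: this worker should stop
                break
            outcome = job * 2     # hypothetical unit of work
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for job in job_list:
        jobs.put(job)
    for _ in threads:             # one sentinel per worker
        jobs.put(None)
    for t in threads:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    print(process_jobs([1, 2, 3]))
```

With a real broker the producer and the workers would live in separate processes or machines, and delivery guarantees would depend on the product chosen; none of that is modeled here.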
sentIndex sentText sentNum sentScore
1 Use RAID 10; it provides good protection along with good read and write performance. [sent-2, score-0.087]
2 The design of RAID 5 means poor performance and long repair times on failure. [sent-3, score-0.177]
3 The database may seem like the perfect place to keep work queues, but under load, locking and scanning overhead kills performance. [sent-6, score-0.769]
4 Use specialized products like RabbitMQ and SQS to remove this bottleneck. [sent-7, score-0.248]
5 Use a page cache like Varnish between users and your webserver. [sent-14, score-0.194]
6 Rewrite problem code instead of continually paying an implementation tax for poorly written code. [sent-17, score-0.399]
7 Locks are like stop signs; traffic circles keep the traffic flowing. [sent-22, score-0.402]
8 Row level locking is better than table level locking. [sent-23, score-0.246]
9 Create parallel databases and let a driver select between them. [sent-28, score-0.237]
10 Be able to turn off features via a flag so when a spike hits features can be turned off to reduce load. [sent-32, score-0.217]
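The feature-flag advice in the last sentence above can be sketched as follows. This is a minimal illustration, not a description of any particular flag library: `FeatureFlags`, `render_page`, and the `recommendations` flag are hypothetical names, and the flags live in memory, whereas a production system would typically read them from shared configuration.

```python
class FeatureFlags:
    """In-memory flag store (assumption: real systems use shared config)."""

    def __init__(self, defaults):
        self._flags = dict(defaults)

    def is_enabled(self, name):
        # Unknown flags are treated as off, the safer default under load.
        return self._flags.get(name, False)

    def disable(self, name):
        self._flags[name] = False

    def enable(self, name):
        self._flags[name] = True


def render_page(flags):
    """Build a page, skipping the expensive optional part when its flag is off."""
    parts = ["core content"]
    if flags.is_enabled("recommendations"):   # hypothetical expensive feature
        parts.append("recommendations")
    return parts


if __name__ == "__main__":
    flags = FeatureFlags({"recommendations": True})
    print(render_page(flags))
    flags.disable("recommendations")          # traffic spike: shed load
    print(render_page(flags))
```

The design choice is that the core path never depends on the flagged feature, so turning the flag off reduces load without a deploy.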
wordName wordTfidf (topN-words)
[('sins', 0.311), ('deadly', 0.269), ('locking', 0.246), ('raid', 0.233), ('specialized', 0.151), ('select', 0.147), ('circles', 0.143), ('toxic', 0.138), ('maven', 0.135), ('satisfying', 0.128), ('choke', 0.126), ('hull', 0.126), ('flag', 0.123), ('poorly', 0.123), ('moral', 0.123), ('signs', 0.121), ('realm', 0.119), ('scanning', 0.114), ('sean', 0.114), ('tax', 0.113), ('perfect', 0.108), ('pays', 0.108), ('sqs', 0.105), ('serial', 0.104), ('kills', 0.103), ('repair', 0.103), ('scalability', 0.103), ('provisioned', 0.102), ('database', 0.101), ('rabbitmq', 0.1), ('solr', 0.1), ('like', 0.097), ('coupled', 0.096), ('async', 0.094), ('spike', 0.094), ('varnish', 0.093), ('driver', 0.09), ('earlier', 0.088), ('iops', 0.088), ('use', 0.088), ('protection', 0.087), ('html', 0.085), ('proper', 0.085), ('paying', 0.084), ('row', 0.083), ('eventual', 0.081), ('traffic', 0.081), ('continually', 0.079), ('caching', 0.077), ('poor', 0.074)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 1473 high scalability-2013-06-10-The 10 Deadly Sins Against Scalability
2 0.44003043 1121 high scalability-2011-09-21-5 Scalability Poisons and 3 Cloud Scalability Antidotes
Introduction: Sean Hull with two helpful posts: 5 Things That are Toxic to Scalability : Object Relational Mappers. Create complex queries that are hard to optimize and tweak. Synchronous, Serial, Coupled or Locking Processes. Locks are like stop signs; traffic circles keep the traffic flowing. Row level locking is better than table level locking. Use async replication. Use eventual consistency for clusters. One Copy of Your Database. A single database server is a choke point. Create parallel databases and let a driver select between them. Having No Metrics. Visualize what's happening to your system using one of the many monitoring packages. Lack of Feature Flags. Be able to turn off features via a flag so when a spike hits features can be turned off to reduce load. 3 Ways to Boost Cloud Scalability : Use Auto-scaling. Spin up new instances when a threshold is passed and back down again when traffic drops. Horizontally Scale the Database Tier. MySQL in a master
3 0.1439849 1508 high scalability-2013-08-28-Sean Hull's 20 Biggest Bottlenecks that Reduce and Slow Down Scalability
Introduction: This article is a lightly edited version of 20 Obstacles to Scalability by Sean Hull (with permission) from the always excellent and thought-provoking ACM Queue . 1. TWO-PHASE COMMIT Normally when data is changed in a database, it is written both to memory and to disk. When a commit happens, a relational database makes a commitment to freeze the data somewhere on real storage media. Remember, memory doesn't survive a crash or reboot. Even if the data is cached in memory, the database still has to write it to disk. MySQL binary logs or Oracle redo logs fit the bill. With a MySQL cluster or distributed file system such as DRBD (Distributed Replicated Block Device) or Amazon Multi-AZ (Multi-Availability Zone), a commit occurs not only locally, but also at the remote end. A two-phase commit means waiting for an acknowledgment from the far end. Because of network and other latency, those commits can be slowed down by milliseconds, as though all the cars on a highway were slowe
4 0.13216104 1279 high scalability-2012-07-09-Data Replication in NoSQL Databases
Introduction: This is the third guest post ( part 1 , part 2 ) of a series by Greg Lindahl, CTO of blekko, the spam free search engine. Previously, Greg was Founder and Distinguished Engineer at PathScale, at which he was the architect of the InfiniPath low-latency InfiniBand HCA, used to build tightly-coupled supercomputing clusters. blekko's home-grown NoSQL database was designed from the start to support a web-scale search engine, with 1,000s of servers and petabytes of disk. Data replication is a very important part of keeping the database up and serving queries. Like many NoSQL database authors, we decided to keep R=3 copies of each piece of data in the database, and not use RAID to improve reliability. The key goal we were shooting for was a database which degrades gracefully when there are many small failures over time, without needing human intervention. Why don't we like RAID for big NoSQL databases? Most big storage systems use RAID levels like 3, 4, 5, or 10 to improve relia
5 0.13020517 53 high scalability-2007-08-01-Product: MogileFS
Introduction: MogileFS is an open source distributed filesystem. Its properties and features include: Application level, No single point of failure, Automatic file replication, Better than RAID, Flat Namespace, Shared-Nothing, No RAID required, Local filesystem agnostic.
6 0.11336645 1114 high scalability-2011-09-13-Must see: 5 Steps to Scaling MongoDB (Or Any DB) in 8 Minutes
8 0.10540308 1378 high scalability-2012-12-28-Stuff The Internet Says On Scalability For December 28, 2012
9 0.10139599 1204 high scalability-2012-03-06-Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
10 0.100256 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
11 0.098503396 825 high scalability-2010-05-10-Sify.com Architecture - A Portal at 3900 Requests Per Second
12 0.096644677 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
13 0.094505347 1065 high scalability-2011-06-21-Running TPC-C on MySQL-RDS
14 0.093633331 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
15 0.093582571 1101 high scalability-2011-08-19-Stuff The Internet Says On Scalability For August 19, 2011
16 0.093174219 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App
17 0.092047408 1032 high scalability-2011-05-02-Stack Overflow Makes Slow Pages 100x Faster by Simple SQL Tuning
18 0.091913357 1233 high scalability-2012-04-25-The Anatomy of Search Technology: blekko’s NoSQL database
19 0.090755396 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
20 0.090072848 849 high scalability-2010-06-28-VoltDB Decapitates Six SQL Urban Myths and Delivers Internet Scale OLTP in the Process
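The "One Copy of Your Database" item in the 5 Scalability Poisons summary listed above suggests creating parallel databases and letting a driver select between them. One common reading of that advice is read/write splitting, sketched below under stated assumptions: `ReplicaSelector` and the host names are hypothetical, selection is plain round-robin, and replication lag and failover are not handled.

```python
import itertools


class ReplicaSelector:
    """Send writes to the primary and rotate reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary for reads when no replicas are configured.
        self._reads = itertools.cycle(list(replicas) or [primary])

    def for_write(self):
        return self.primary

    def for_read(self):
        return next(self._reads)


if __name__ == "__main__":
    selector = ReplicaSelector("db-primary", ["db-replica-1", "db-replica-2"])
    print(selector.for_write())
    print([selector.for_read() for _ in range(3)])
```

In practice this logic often lives in the database driver or a proxy rather than in application code; the sketch only shows the selection step.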
topicId topicWeight
[(0, 0.173), (1, 0.09), (2, -0.052), (3, -0.049), (4, 0.02), (5, 0.046), (6, 0.001), (7, -0.043), (8, -0.013), (9, -0.072), (10, -0.028), (11, -0.035), (12, -0.059), (13, 0.014), (14, -0.035), (15, -0.03), (16, 0.006), (17, 0.026), (18, -0.029), (19, -0.001), (20, 0.021), (21, 0.002), (22, 0.02), (23, 0.003), (24, -0.035), (25, -0.023), (26, -0.01), (27, 0.048), (28, -0.017), (29, 0.012), (30, -0.061), (31, 0.036), (32, 0.007), (33, 0.011), (34, -0.006), (35, -0.022), (36, 0.016), (37, -0.023), (38, -0.028), (39, -0.089), (40, 0.03), (41, -0.025), (42, -0.098), (43, -0.031), (44, -0.047), (45, -0.025), (46, 0.043), (47, 0.057), (48, 0.006), (49, 0.08)]
simIndex simValue blogId blogTitle
same-blog 1 0.94776213 1473 high scalability-2013-06-10-The 10 Deadly Sins Against Scalability
2 0.84122878 1121 high scalability-2011-09-21-5 Scalability Poisons and 3 Cloud Scalability Antidotes
Introduction: Sean Hull with two helpful posts: 5 Things That are Toxic to Scalability : Object Relational Mappers. Create complex queries that are hard to optimize and tweak. Synchronous, Serial, Coupled or Locking Processes. Locks are like stop signs; traffic circles keep the traffic flowing. Row level locking is better than table level locking. Use async replication. Use eventual consistency for clusters. One Copy of Your Database. A single database server is a choke point. Create parallel databases and let a driver select between them. Having No Metrics. Visualize what's happening to your system using one of the many monitoring packages. Lack of Feature Flags. Be able to turn off features via a flag so when a spike hits features can be turned off to reduce load. 3 Ways to Boost Cloud Scalability : Use Auto-scaling. Spin up new instances when a threshold is passed and back down again when traffic drops. Horizontally Scale the Database Tier. MySQL in a master
Introduction: What do you get when you take a SQL database and start a new implementation from scratch, taking advantage of the latest research and modern hardware? Mike Stonebraker , the sword wielding Johnny Appleseed of the database world, hopes you get something like his new database, VoltDB : a pure SQL, pure ACID, pure OLTP, shared nothing, sharded, scalable, lockless, open source, in-memory DBMS, purpose-built for running hundreds of thousands of transactions a second. VoltDB claims to be 100 times faster than MySQL, up to 13 times faster than Cassandra , and 45 times faster than Oracle, with near-linear scaling. Will VoltDB kill off the new NoSQL upstarts? Will VoltDB cause a mass extinction of ancient databases? Probably no and no to both questions, but it's a product with a definite point-of-view and is worth a look as the transaction component in your system. But will it be right for you? Let's see... I first heard the details about VoltDB at Gluecon , where Mr. Stonebraker pres
4 0.65994298 1101 high scalability-2011-08-19-Stuff The Internet Says On Scalability For August 19, 2011
Introduction: You may not scale often, but when you scale, please drink HighScalability: Akamai: - 95,811 Servers, 1,000 Networks, 70 Countries . Quotably quotable quotes: @segphault : Linus talking about the kernel's scalability. Beneficial to have one kernel used from embedded to high-end bc improvements span use cases. suspended : I am sure that scalability is the future, there are just too many platforms and screen sizes out there @russferriday : Just completed a proposal for a rare bird data gathering system using #CouchDB *and* #Cassandra. Nice project. #NoSQL @drelu : Oracle - everything is very convenient until it fails. #nosql How do you model Google+ circles with MongoDB? Some ideas in this Google Groups thread . More on MongoDB with Mat Wall explaining Why I Chose MongoDB for guardian.co.uk . ACM SIGCOMM Test of Time Paper Award . Award winning papers through the years. A lot of good ones, worth a peruse. Read Amplification Factor . Mark C
5 0.65621638 1318 high scalability-2012-09-07-Stuff The Internet Says On Scalability For September 7, 2012
Introduction: It's HighScalability Time: Quotable Quotes: Where did all the supercomputers go? Inside Intel. @Jacattell : I love the smell of high scalability in the morning :-) @nkohari : Post on HN about GitHub scalability. Top comment? “…someone wasted valuable time making the dashboard look so pretty” Evolution of SoundCloud’s Architecture : The way we develop SoundCloud is to identify the points of scale then isolate and optimize the read and write paths individually, in anticipation of the next magnitude of growth. How We Build Our 60-Node (Almost) Distributed Web Crawler . Semantics3 crawls 1-3 million pages a day at a cost of ~$3 a day (excluding storage costs) using micro-instances, Gearman, redis, perl, chef, and capistrano. Werner Vogels continues his 50 Shades of Programming book club with Back-to-Basics Weekend Reading - Granularity of locks . Highlight is a touching remembrance of Jim Gray. Speaking of locks and stories, the MySQL Per
6 0.6551283 602 high scalability-2009-05-17-Scaling Django Web Apps by Mike Malone
7 0.65286505 1080 high scalability-2011-07-15-Stuff The Internet Says On Scalability For July 15, 2011
8 0.65130496 1026 high scalability-2011-04-18-6 Ways Not to Scale that Will Make You Hip, Popular and Loved By VCs
9 0.64789063 799 high scalability-2010-03-23-Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL
10 0.64775515 1508 high scalability-2013-08-28-Sean Hull's 20 Biggest Bottlenecks that Reduce and Slow Down Scalability
11 0.64726889 1633 high scalability-2014-04-16-Six Lessons Learned the Hard Way About Scaling a Million User System
12 0.64442843 1114 high scalability-2011-09-13-Must see: 5 Steps to Scaling MongoDB (Or Any DB) in 8 Minutes
13 0.64438421 1065 high scalability-2011-06-21-Running TPC-C on MySQL-RDS
14 0.64428371 1097 high scalability-2011-08-12-Stuff The Internet Says On Scalability For August 12, 2011
15 0.64147979 1322 high scalability-2012-09-14-Stuff The Internet Says On Scalability For September 14, 2012
16 0.64129698 1334 high scalability-2012-10-04-Stuff The Internet Says On Scalability For October 5, 2012
17 0.63963592 828 high scalability-2010-05-17-7 Lessons Learned While Building Reddit to 270 Million Page Views a Month
18 0.63025355 1032 high scalability-2011-05-02-Stack Overflow Makes Slow Pages 100x Faster by Simple SQL Tuning
19 0.62842667 1443 high scalability-2013-04-19-Stuff The Internet Says On Scalability For April 19, 2013
20 0.62725389 1279 high scalability-2012-07-09-Data Replication in NoSQL Databases
topicId topicWeight
[(1, 0.177), (2, 0.257), (10, 0.049), (30, 0.018), (38, 0.079), (61, 0.064), (79, 0.029), (85, 0.089), (96, 0.152)]
simIndex simValue blogId blogTitle
same-blog 1 0.93025327 1473 high scalability-2013-06-10-The 10 Deadly Sins Against Scalability
2 0.91628724 348 high scalability-2008-07-09-Federation at Flickr: Doing Billions of Queries Per Day
Introduction: Flickr's lone database guy Dathan Pattishall made his excellent presentation available on how Flickr scales its backend to handle tremendous loads. Some of this information is available in Flickr Architecture , but the paper is so good it's worth another read. If you want to see sharding done right, at scale, take a look.
3 0.90172815 1528 high scalability-2013-10-07-Ask HS: Is Microsoft the Right Technology for a Scalable Web-based System?
Introduction: This question was asked over email and I thought a larger audience might want to take a whack at it. I have a problem I’d like to have your view on. I’ve looked around a lot, and I haven’t found a definite answer. The question is this: Is it true that for a scalable web-based system targeting millions of users (hopefully), using Microsoft technology(.Net/SQL Server) over open source technologies like python/ruby/php and mysql (mariadb) / postgresql will cost you more? Is there any justification for paying up for Microsoft licenses(OS, SQL Server, …)? I am in charge of selecting the technology toolbox for a startup which is going to build a scalable public web platform. I’ve worked as a developer and database developer/admin (mainly as a DBA) using different platforms and technologies, but my main focus is on Microsoft technology. I’ve considered all other important factors for this decision, and at the end, I always come back to the question of money. When I finish developing th
4 0.90075672 1212 high scalability-2012-03-21-The Conspecific Hybrid Cloud
Introduction: When you’re looking to add new tank mates to an existing aquarium ecosystem, one of the concerns you must have is whether a particular breed of fish is amenable to conspecific cohabitants. Many species are not, which means if you put them together in a confined space, they’re going to fight. Viciously. To the death. Responsible aquarists try to avoid such situations, so careful attention to the conspecificity of animals is a must. Now, while in many respects the data center ecosystem correlates well to an aquarium ecosystem, in this case it does not. It’s what you usually get, today, but it's not actually the best model. That’s because what you want in the data center ecosystem – particularly when it extends to include public cloud computing resources – is conspecificity in infrastructure. This desire and practice is being seen both in enterprise data center decision making as well as in startups suddenly dealing with massive growth and increasingly encountering pe
5 0.89706236 828 high scalability-2010-05-17-7 Lessons Learned While Building Reddit to 270 Million Page Views a Month
Introduction: Steve Huffman , co-founder of social news site Reddit , gave an excellent presentation ( slides , transcript ) on the lessons he learned while building and growing Reddit to 7.5 million users per month, 270 million page views per month, and 20+ database servers. Steve says a lot of the lessons were really obvious, so you may not find a lot of completely new ideas in the presentation. But Steve has an earnestness and genuineness about him that is so obviously grounded in experience that you can't help but think deeply about what you could be doing different. And if Steve didn't know about these lessons, I'm betting others don't either. There are seven lessons, each has their own summary section: Lesson one: Crash Often; Lesson 2: Separation of Services; Lesson 3: Open Schema; Lesson 4: Keep it Stateless; Lesson 5: Memcache; Lesson 6: Store Redundant Data; Lesson 7: Work Offline. By far the most surprising feature of their architecture is in Lesson Six, whose essential idea is:
6 0.89581555 868 high scalability-2010-07-30-Basho Lives up to their Name With Consistent Smashing
7 0.8945027 1549 high scalability-2013-11-15-Stuff The Internet Says On Scalability For November 15th, 2013
8 0.88546377 422 high scalability-2008-10-17-Scaling Spam Eradication Using Purposeful Games: Die Spammer Die!
9 0.883331 281 high scalability-2008-03-18-Database Design 101
10 0.88218039 397 high scalability-2008-09-28-Product: Happy = Hadoop + Python
11 0.88141084 1511 high scalability-2013-09-04-Wide Fast SATA: the Recipe for Hot Performance
12 0.8784368 1121 high scalability-2011-09-21-5 Scalability Poisons and 3 Cloud Scalability Antidotes
13 0.87470824 117 high scalability-2007-10-08-Paper: Understanding and Building High Availability-Load Balanced Clusters
14 0.87418884 162 high scalability-2007-11-20-what is j2ee stack
15 0.87370896 1052 high scalability-2011-06-03-Stuff The Internet Says On Scalability For June 3, 2011
16 0.86778259 1228 high scalability-2012-04-16-Instagram Architecture Update: What’s new with Instagram?
17 0.86583662 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App
18 0.86534536 151 high scalability-2007-11-12-a8cjdbc - Database Clustering via JDBC
19 0.86464369 821 high scalability-2010-05-03-MocoSpace Architecture - 3 Billion Mobile Page Views a Month
20 0.86462146 1401 high scalability-2013-02-06-Super Bowl Advertisers Ready for the Traffic? Nope..It's Lights Out.