high_scalability high_scalability-2009 high_scalability-2009-492 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: The most important aspect of a scalable web architecture is data partitioning. Most components in a modern data center are completely stateless, meaning they just do batches of work that is handed to them, but don't store any data long-term. This is true of most web application servers, caches like memcached, and all of the network infrastructure that connects them. Data storage is becoming a specialized function, delegated most often to relational databases. This makes sense, because stateless servers are easiest to scale - you just keep adding more. Since they don't store anything, failures are easy to handle too - just take the failed server out of rotation. Stateful servers require more careful attention. If you are storing all of your data in a relational database, and the load on that database exceeds its capacity, there is no automatic solution that allows you to simply add more hardware and scale up. (One day, there will be, but that's for another post). In the meantime, most websites
sentIndex sentText sentNum sentScore
1 The most important aspect of a scalable web architecture is data partitioning. [sent-1, score-0.518]
2 Most components in a modern data center are completely stateless, meaning they just do batches of work that is handed to them, but don't store any data long-term. [sent-2, score-1.365]
3 This is true of most web application servers, caches like memcached, and all of the network infrastructure that connects them. [sent-3, score-0.495]
4 Data storage is becoming a specialized function, delegated most often to relational databases. [sent-4, score-0.774]
5 This makes sense, because stateless servers are easiest to scale - you just keep adding more. [sent-5, score-0.835]
6 Since they don't store anything, failures are easy to handle too - just take the failed server out of rotation. [sent-6, score-0.325]
7 Stateful servers require more careful attention. [sent-7, score-0.384]
8 If you are storing all of your data in a relational database, and the load on that database exceeds its capacity, there is no automatic solution that allows you to simply add more hardware and scale up. [sent-8, score-1.22]
9 (One day, there will be, but that's for another post). [sent-9, score-0.064]
10 In the meantime, most websites are building their own scalable clusters using sharding. [sent-10, score-0.311]
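The sharding idea in the last sentence can be sketched in a few lines. This is only an illustration under stated assumptions, not the post's own scheme: the shard names, the `shard_for` and `group_by_shard` helpers, and the choice of CRC32 as the hash are all hypothetical. It routes each key to one of a fixed set of databases by hashing the key and taking the result modulo the number of shards.

```python
import zlib

# Hypothetical shard names; a real deployment would map these to database hosts.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(user_id, shards=SHARDS):
    """Pick the shard that owns a key: stable hash of the key, modulo shard count."""
    # CRC32 returns the same value on every run and every machine,
    # unlike Python's built-in hash() for strings, which is randomized per process.
    digest = zlib.crc32(str(user_id).encode("utf-8"))
    return shards[digest % len(shards)]


def group_by_shard(user_ids, shards=SHARDS):
    """Group keys by owning shard so each database is queried only once."""
    groups = {}
    for uid in user_ids:
        groups.setdefault(shard_for(uid, shards), []).append(uid)
    return groups


if __name__ == "__main__":
    for shard, ids in sorted(group_by_shard(range(10)).items()):
        print(shard, ids)
```

One design consequence worth noting: with plain modulo routing, changing the number of shards changes the owner of most keys, so growing the cluster generally means moving data. That is part of why stateful servers need more care than stateless ones.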
wordName wordTfidf (topN-words)
[('stateless', 0.347), ('meantime', 0.278), ('handed', 0.247), ('delegated', 0.247), ('exceeds', 0.201), ('batches', 0.196), ('relational', 0.191), ('easiest', 0.186), ('connects', 0.183), ('meaning', 0.176), ('careful', 0.164), ('aspect', 0.157), ('automatic', 0.15), ('store', 0.142), ('specialized', 0.135), ('caches', 0.132), ('becoming', 0.128), ('servers', 0.127), ('function', 0.124), ('failures', 0.115), ('clusters', 0.111), ('data', 0.111), ('true', 0.108), ('scalable', 0.107), ('modern', 0.106), ('storing', 0.104), ('sense', 0.098), ('blog', 0.096), ('completely', 0.095), ('anything', 0.094), ('require', 0.093), ('websites', 0.093), ('adding', 0.092), ('components', 0.091), ('center', 0.09), ('simply', 0.089), ('memcached', 0.085), ('scale', 0.083), ('since', 0.079), ('add', 0.079), ('capacity', 0.076), ('allows', 0.075), ('often', 0.073), ('web', 0.072), ('database', 0.072), ('important', 0.071), ('easy', 0.068), ('day', 0.066), ('solution', 0.065), ('another', 0.064)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 492 high scalability-2009-01-16-Database Sharding for startups
2 0.13223544 70 high scalability-2007-08-22-How many machines do you need to run your site?
Introduction: Amazingly, TechCrunch runs their website on one web server and one database server, according to the fascinating survey What the Web’s most popular sites are running on by Pingdom, a provider of uptime and response time monitoring. Earlier we learned PlentyOfFish catches and releases many millions of hits a day on just 1 web server and three database servers. Google runs a Dalek army full of servers. YouSendIt, a company making it easy to send and receive large files, has 24 web servers, 3 database servers, 170 storage servers, and a few miscellaneous servers. Vimeo, a video sharing company, has 100 servers for streaming video, 4 web servers, and 2 database servers. Meebo, an AJAX based instant messaging company, uses 40 servers to handle messaging, over 40 web servers, and 10 servers for forums, jabber, testing, and so on. FeedBurner, a news feed management company, has 70 web servers, 15 database servers, and 10 miscellaneous servers. Now
3 0.12016897 881 high scalability-2010-08-16-Scaling an AWS infrastructure - Tools and Patterns
Introduction: This is a guest post by Frédéric Faure (architect at Ysance); you can follow him on Twitter. How do you scale an AWS (Amazon Web Services) infrastructure? This article will give you a detailed reply in two parts: the tools you can use to make the most of Amazon’s dynamic approach, and the architectural model you should adopt for a scalable infrastructure. I base my report on my experience gained in several AWS production projects in casual gaming (Facebook), e-commerce infrastructures and within the mainstream GIS (Geographic Information System). It’s true that my experience in gaming (IsCool, The Game) is currently the most representative in terms of scalability, due to the number of users (over 800 thousand DAU – daily active users – at peak usage and over 20 million page views every day); however, my experiences in e-commerce and GIS (currently underway) provide a different view of scalability, taking into account the various problems of availability and da
4 0.1140549 1138 high scalability-2011-11-07-10 Core Architecture Pattern Variations for Achieving Scalability
Introduction: Srinath Perera has put together a strong list of architecture patterns based on three meta patterns: distribution, caching, and asynchronous processing. He contends these three are the primal patterns and the following patterns are but different combinations: LB (Load Balancers) + Shared nothing Units. Units that do not share anything with each other fronted with a load balancer that routes incoming messages to a unit based on some criteria. LB + Stateless Nodes + Scalable Storage. Several stateless nodes talking to a scalable storage, and a load balancer distributes load among the nodes. Peer to Peer Architectures (Distributed Hash Table (DHT) and Content Addressable Networks (CAN)). Algorithm for scaling up logarithmically. Distributed Queues. Queue implementation (FIFO delivery) implemented as a network service. Publish/Subscribe Paradigm. Network publish subscribe brokers that route messages to each other. Gossip and Nature-inspired Architectures. Each
5 0.1107155 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
Introduction: We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk based databases like SimpleDB and BigTable are complicated beasts, typical last gasp products of any aging technology before a change. The next era is the age of Memory and Cloud which will allow for new players to succeed. The tipping point will be soon. Let's take a short trip down web architecture lane: It's 1993: Yahoo runs on FreeBSD, Apache, Perl scripts and a SQL database It's 1995: Scale-up the database. It's 1998: LAMP It's 1999: Stateless + Load Balanced + Database + SAN It's 2001: In-memory data-grid. It's 2003: Add a caching layer. It's 2004: Add scale-out and partitioning. It's 2005: Add asynchronous job scheduling and maybe a distributed file system. It's 2007: Move it all into the cloud. It's 2008: C
6 0.10583532 589 high scalability-2009-05-05-Drop ACID and Think About Data
7 0.10308553 1068 high scalability-2011-06-27-TripAdvisor Architecture - 40M Visitors, 200M Dynamic Page Views, 30TB Data
8 0.1017696 785 high scalability-2010-02-26-MySQL and Memcached: End of an Era?
9 0.10076656 906 high scalability-2010-09-22-Applying Scalability Patterns to Infrastructure Architecture
10 0.10076249 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
11 0.10065489 554 high scalability-2009-04-04-Digg Architecture
12 0.10052039 448 high scalability-2008-11-22-Google Architecture
13 0.098647356 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
14 0.095409073 1064 high scalability-2011-06-20-35+ Use Cases for Choosing Your Next NoSQL Database
15 0.095385924 327 high scalability-2008-05-27-How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale
16 0.094798557 728 high scalability-2009-10-26-Facebook's Memcached Multiget Hole: More machines != More Capacity
17 0.091310553 250 high scalability-2008-02-17-Web Accelerators - snake oil or miracle remedy?
18 0.088753313 674 high scalability-2009-08-07-The Canonical Cloud Architecture
19 0.087271392 1631 high scalability-2014-04-14-How do you even do anything without using EBS?
20 0.086310051 757 high scalability-2010-01-04-11 Strategies to Rock Your Startup’s Scalability in 2010
topicId topicWeight
[(0, 0.168), (1, 0.058), (2, -0.0), (3, -0.053), (4, -0.008), (5, 0.046), (6, 0.004), (7, -0.099), (8, -0.031), (9, 0.021), (10, 0.022), (11, 0.003), (12, -0.047), (13, 0.01), (14, 0.03), (15, 0.005), (16, 0.038), (17, 0.003), (18, 0.004), (19, 0.043), (20, -0.01), (21, 0.012), (22, 0.032), (23, -0.036), (24, 0.05), (25, -0.046), (26, 0.015), (27, 0.061), (28, -0.018), (29, 0.024), (30, -0.037), (31, 0.026), (32, 0.017), (33, 0.021), (34, -0.041), (35, 0.009), (36, 0.004), (37, -0.011), (38, 0.003), (39, -0.012), (40, 0.025), (41, -0.062), (42, -0.05), (43, 0.012), (44, 0.047), (45, 0.057), (46, 0.016), (47, -0.073), (48, -0.041), (49, 0.001)]
simIndex simValue blogId blogTitle
same-blog 1 0.9697367 492 high scalability-2009-01-16-Database Sharding for startups
2 0.76718175 897 high scalability-2010-09-08-4 General Core Scalability Patterns
Introduction: Jesper Söderlund put together an excellent list of four general scalability patterns and four subpatterns in his post Scalability patterns and an interesting story: Load distribution - Spread the system load across multiple processing units. Load balancing / load sharing - Spreading the load across many components with equal properties for handling the request. Partitioning - Spreading the load across many components by routing an individual request to a component that owns that specific data. Vertical partitioning - Spreading the load across the functional boundaries of a problem space, separate functions being handled by different processing units. Horizontal partitioning - Spreading a single type of data element across many instances, according to some partitioning key, e.g. hashing the player id and doing a modulus operation, etc. Quite often referred to as sharding. Queuing and batch - Achieve efficiencies of scale by
3 0.7659089 65 high scalability-2007-08-16-Scaling Secret #2: Denormalizing Your Way to Speed and Profit
Introduction: Alan Watts once observed how after we accepted Descartes' separation of the mind and body we've been trying to smash them back together again ever since when really they were never separate to begin with. The database normalization-denormalization dualism has the same Möbius-shaped reverberations as Descartes' error. We separate data into a million jagged little pieces and then spend all our time stooping over, picking them up, and joining them back together again. Normalization has been standard practice now for decades. But times are changing. Many mega-website architects are concluding Watts was right: the data was never separate to begin with. And even more radical, we may even need to store multiple copies of data. Information Sources Normalization Is for Sissies by Pat Helland Data normalization, is it really that good? by Arnon Rotem-Gal-Oz When Not to Normalize your SQL Database by Dare Obasanjo MegaData by Joe Gregorio Audio
4 0.75988609 799 high scalability-2010-03-23-Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL
Introduction: O'Reilly Radar's James Turner conducted a very informative interview with Joe Stump, current CTO of SimpleGeo and former lead architect at Digg, in which Joe makes some of his usual insightful comments on his experience using Cassandra vs MySQL. As Digg started out with a MySQL oriented architecture and has recently been moving full speed to Cassandra, his observations on some of their lessons learned and the motivation for the move are especially valuable. Here are some of the key takeaways you may find useful: Precompute on writes, make reads fast. This is an oldie as a scaling strategy, but it's valuable to see how SimpleGeo is applying it to their problem of finding entities within a certain geographical region. Using Cassandra they've built two clusters: one for indexes and one for records. The records cluster, as you might imagine, is a simple data lookup. The index cluster has a carefully constructed key for every lookup scenario. The indexes are computed on the wr
5 0.75071895 906 high scalability-2010-09-22-Applying Scalability Patterns to Infrastructure Architecture
Introduction: Too often software design patterns are overlooked by network and application delivery network architects, but these patterns are often equally applicable to addressing a broad range of architectural challenges in the application delivery tier of the data center. By Lori MacVittie, F5 Networks. The “High Scalability” blog is fast becoming one of my favorite reads. Last week did not disappoint with a post highlighting a set of scalability design patterns that was, apparently, inspired by yet another High Scalability post on “6 Ways to Kill Your Servers: Learning to Scale the Hard Way.” This particular post caught my attention primarily because although I’ve touched on many of these patterns in the past, I’ve never thought to call them what they are: scalability patterns. That’s probably a side-effect of forgetting that building an architecture of any kind is at its core computer science and thus
6 0.74381882 391 high scalability-2008-09-23-The 7 Stages of Scaling Web Apps
7 0.73322463 1138 high scalability-2011-11-07-10 Core Architecture Pattern Variations for Achieving Scalability
8 0.73012543 1276 high scalability-2012-07-04-Top Features of a Scalable Database
9 0.72293305 70 high scalability-2007-08-22-How many machines do you need to run your site?
10 0.72084504 658 high scalability-2009-07-17-Against all the odds
11 0.71352315 481 high scalability-2009-01-02-Strategy: Understanding Your Data Leads to the Best Scalability Solutions
12 0.71120065 1161 high scalability-2011-12-22-Architecting Massively-Scalable Near-Real-Time Risk Analysis Solutions
13 0.71015745 72 high scalability-2007-08-22-Wikimedia architecture
14 0.70931453 126 high scalability-2007-10-20-Should you build your next website using 3tera's grid OS?
15 0.709207 151 high scalability-2007-11-12-a8cjdbc - Database Clustering via JDBC
16 0.7083565 1542 high scalability-2013-11-04-ESPN's Architecture at Scale - Operating at 100,000 Duh Nuh Nuhs Per Second
17 0.70710868 310 high scalability-2008-04-29-High performance file server
18 0.70466155 351 high scalability-2008-07-16-The Mother of All Database Normalization Debates on Coding Horror
19 0.70264268 297 high scalability-2008-04-05-Skype Plans for PostgreSQL to Scale to 1 Billion Users
20 0.69996071 1077 high scalability-2011-07-11-ATMCash Exploits Virtualization for Security - Immutability and Reversion
topicId topicWeight
[(1, 0.174), (2, 0.184), (85, 0.521)]
simIndex simValue blogId blogTitle
1 0.97060013 191 high scalability-2007-12-23-Synchronizing Memcached application
Introduction: I have an application with a couple of web servers that uses memcached. How can I synchronize concurrent puts to the cache? The value of the entry is a list. An atomic append operation could have been helpful, but unfortunately memcached doesn't support atomic append.
2 0.96764588 1049 high scalability-2011-05-31-Awesome List of Advanced Distributed Systems Papers
Introduction: As part of Dr. Indranil Gupta's CS 525 Spring 2011 Advanced Distributed Systems class, he has collected an incredible list of resources on distributed systems. His research group is also doing some interesting work. The various topics include: Before there Were Clouds, Cloud Computing, P2P Systems, Basic Distributed Computing Concepts, Sensor Networks, Overlays and DHTs, Cloud Programming, Cloud Scheduling, Key-Value Stores, Storage, Sensor Net Routing, Geo-Distribution, P2P Apps, In-network processing, Epidemics, Probabilistic Membership Protocols, Distributed Monitoring and Management, Publish-Subscribe/CDNs, Measurement Studies, Old Wine: Stale or Vintage?, In Byzantium, Cloud Pricing, Other Industrial Systems, Structure of Networks, Completing the Circle, Green Clouds, Distributed Debugging, Flash!, The Middle or the End?, Availability-Aware Systems, Design Methodologies, Handling Stress, Sources of unreliability in networks, Handling Stress, Selfish algorithms, Securi
3 0.95252275 59 high scalability-2007-08-04-Try Squid as a Reverse Proxy
Introduction: This scalability strategy is brought to you by Erik Osterman: My recommendation for anyone dealing with explosive growth on a limited budget with lots of cacheable content (e.g. content capable of returning valid expiration headers) is to employ a reverse proxy as mentioned in this article. In the last week, we had a site get AP'd, triggering 100K unique visitors to a single IIS server in under 5 hours. It took out the IIS server. Placing a single Squid in front of the server handled the entire onslaught with a max server load of 0.10 on a modest Intel IV 3GHz. It's trivial to implement for anyone interested...
4 0.90975392 102 high scalability-2007-09-27-Product: Sequoia Database Clustering Technology
Introduction: Sequoia is a transparent middleware solution offering clustering, load balancing and failover services for any database. Sequoia is the continuation of the C-JDBC project. The database is distributed and replicated among several nodes and Sequoia balances the queries among these nodes. Sequoia handles node and network failures with transparent failover. It also provides support for hot recovery, online maintenance operations and online upgrades. Features in a nutshell No modification of existing applications or databases. Operational with any database providing a JDBC driver. High availability provided by advanced RAIDb technology. Transparent failover and recovery capabilities. Performance scalability with unique load balancing and query result caching features. Integrated JMX-based administration and monitoring. 100% Java implementation allowing portability across platforms with a JRE 1.4 or greater. Open source licensed under Apache v2 license. Professi
5 0.8943252 820 high scalability-2010-05-03-100 Node Hazelcast cluster on Amazon EC2
Introduction: Deploying, running and monitoring an application on a big cluster is a challenging task. Recently the Hazelcast team deployed a demo application on the Amazon EC2 platform to show how a Hazelcast p2p cluster scales, and screen-recorded the entire process from deployment to monitoring. Hazelcast is an open source (Apache License), transactional, distributed caching solution for Java. It is a little more than a cache though, as it provides distributed implementations of map, multimap, queue, topic, lock and executor service. Details of running a 100 node Hazelcast cluster on Amazon EC2 can be found here. Make sure to watch the screencast!
6 0.89324111 143 high scalability-2007-11-06-Product: ChironFS
7 0.88831985 1039 high scalability-2011-05-12-Paper: Mind the Gap: Reconnecting Architecture and OS Research
8 0.85282695 646 high scalability-2009-07-01-Podcast about Facebook's Cassandra Project and the New Wave of Distributed Databases
9 0.83946234 447 high scalability-2008-11-19-High Definition Video Delivery on the Web?
same-blog 10 0.83288789 492 high scalability-2009-01-16-Database Sharding for startups
11 0.81737125 1032 high scalability-2011-05-02-Stack Overflow Makes Slow Pages 100x Faster by Simple SQL Tuning
12 0.80776584 1500 high scalability-2013-08-12-100 Curse Free Lessons from Gordon Ramsay on Building Great Software
13 0.80091 53 high scalability-2007-08-01-Product: MogileFS
14 0.79590893 1577 high scalability-2014-01-13-NYTimes Architecture: No Head, No Master, No Single Point of Failure
15 0.79317117 1239 high scalability-2012-05-04-Stuff The Internet Says On Scalability For May 4, 2012
16 0.77579141 118 high scalability-2007-10-09-High Load on production Webservers after Sourcecode sync
17 0.75801051 1024 high scalability-2011-04-15-Stuff The Internet Says On Scalability For April 15, 2011
18 0.74602765 1592 high scalability-2014-02-07-Stuff The Internet Says On Scalability For February 7th, 2014
19 0.74234033 638 high scalability-2009-06-26-PlentyOfFish Architecture
20 0.73889196 1361 high scalability-2012-11-22-Gone Fishin': PlentyOfFish Architecture