high_scalability high_scalability-2009 high_scalability-2009-668 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: These are from Laura Thomson of OmniTI: Profile early, profile often. Pick a profiling tool and learn it inside and out. Dev-ops cooperation is essential. It is the most critical difference in organizations that handle crises well. Test on production data. Code behavior (especially performance) is often data driven. Track and trend. Understanding your historical performance characteristics is essential for spotting emerging problems. Assumptions will burn you. Systems are complex and often break in unexpected ways. Decouple. Isolate performance failures. Cache. Caching is the core of most optimizations. Federate. Data federation is taking a single data set and spreading it across multiple database/application servers. Replicate. Replication is making synchronized copies of data available in more than one place. Avoid straining hard-to-scale resources. Some resources are inherently hard to scale: uncacheable data, data with a very high read+write rate
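The federation idea above (one data set spread across several servers) can be sketched in a few lines. This is only an illustration, not the approach described in the talk: the server names, `server_for`, and `federate` are hypothetical, and hash-modulo placement is just one simple way to decide which server owns a record.

```python
import hashlib

# Hypothetical server names, used purely for illustration.
SERVERS = ["db0.example.internal", "db1.example.internal", "db2.example.internal"]


def server_for(key, servers=SERVERS):
    """Map a record key to exactly one server using a stable hash."""
    # hashlib produces the same digest in every process, unlike the built-in
    # hash(), which is randomized per interpreter run for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def federate(keys, servers=SERVERS):
    """Group keys by the server that owns them."""
    placement = {server: [] for server in servers}
    for key in keys:
        placement[server_for(key, servers)].append(key)
    return placement
```

Calling `federate(["user:1", "user:2", "user:3"])` returns a dict from server name to the keys it owns. A known trade-off of modulo placement is that changing the number of servers moves most keys; consistent hashing is a common alternative when servers are added or removed often.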
sentIndex sentText sentNum sentScore
1 These are from Laura Thomson of OmniTI: Profile early, profile often. [sent-1, score-0.179]
2 The most critical difference in organizations that handle crises well. [sent-4, score-0.524]
3 Code behavior (especially performance) is often data driven. [sent-6, score-0.363]
4 Understanding your historical performance characteristics is essential for spotting emerging problems. [sent-8, score-0.493]
5 Systems are complex and often break in unexpected ways. [sent-10, score-0.304]
6 Data federation is taking a single data set and spreading it across multiple database/application servers. [sent-16, score-0.501]
7 Replication is making synchronized copies of data available in more than one place. [sent-18, score-0.403]
8 Some resources are inherently hard to scale: uncacheable data, data with a very high read+write rate, non-federatable data, data in a black box. Use a compiler cache. [sent-20, score-0.375]
9 A compiler cache sits inside the engine and caches the parsed optrees. [sent-21, score-0.702]
10 External data (RDBMS, App Server, 3rd Party data feeds) are the number one cause of application bottlenecks. [sent-23, score-0.379]
11 Don’t try to work around perceived inefficiencies in PHP (at least not in userspace code!) [sent-27, score-0.511]
12 Caching is the most important tool in your tool box. [sent-29, score-0.362]
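Since the list names external data as the leading bottleneck and caching as the main remedy, a minimal cache-aside sketch may help make the pattern concrete. Everything here is an assumption for illustration: `TTLCache`, `get_or_load`, and the `loader` callback are hypothetical names, and a real deployment would more likely sit on a shared cache such as Memcached than on an in-process dict.

```python
import time


class TTLCache:
    """In-process cache-aside helper with a fixed time-to-live per entry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh hit: the slow source is not touched
        value = loader(key)        # miss or stale: go to the external source
        self._entries[key] = (now + self._ttl, value)
        return value
```

A caller would wrap the slow lookup, for example `cache.get_or_load(user_id, fetch_user_from_db)`, where `fetch_user_from_db` stands in for whatever external query is being protected. The sketch does no eviction beyond expiry and no locking, so it is not safe as-is for multi-threaded use or unbounded key sets.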
wordName wordTfidf (topN-words)
[('recursive', 0.324), ('compiler', 0.242), ('outsmart', 0.204), ('userspace', 0.204), ('mindful', 0.183), ('crises', 0.183), ('thomson', 0.183), ('tool', 0.181), ('profile', 0.179), ('inefficiencies', 0.166), ('parsed', 0.162), ('cooperation', 0.162), ('looping', 0.162), ('burn', 0.159), ('data', 0.153), ('federation', 0.153), ('synchronized', 0.147), ('perceived', 0.141), ('omniti', 0.134), ('inherently', 0.133), ('sits', 0.131), ('spreading', 0.129), ('profiling', 0.116), ('historical', 0.111), ('unexpected', 0.11), ('behavior', 0.109), ('characteristics', 0.105), ('essential', 0.105), ('copies', 0.103), ('organizations', 0.102), ('rdbms', 0.102), ('often', 0.101), ('emerging', 0.101), ('feeds', 0.097), ('break', 0.093), ('caches', 0.091), ('external', 0.087), ('difference', 0.085), ('party', 0.083), ('heavy', 0.081), ('critical', 0.077), ('handles', 0.077), ('inside', 0.076), ('cause', 0.073), ('performance', 0.071), ('code', 0.07), ('php', 0.069), ('rate', 0.068), ('taking', 0.066), ('early', 0.066)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 668 high scalability-2009-08-01-15 Scalability and Performance Best Practices
2 0.091558918 897 high scalability-2010-09-08-4 General Core Scalability Patterns
Introduction: Jesper Söderlund put together an excellent list of four general scalability patterns and four subpatterns in his post Scalability patterns and an interesting story : Load distribution - Spread the system load across multiple processing units Load balancing / load sharing - Spreading the load across many components with equal properties for handling the request Partitioning - Spreading the load across many components by routing an individual request to a component that owns that data specific Vertical partitioning - Spreading the load across the functional boundaries of a problem space, separate functions being handled by different processing units Horizontal partitioning - Spreading a single type of data element across many instances, according to some partitioning key, e.g. hashing the player id and doing a modulus operation, etc. Quite often referred to as sharding. Queuing and batch - Achieve efficiencies of scale by
3 0.091079809 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard
Introduction: Update 4: Why you don’t want to shard. by Morgon on the MySQL Performance Blog. Optimize everything else first, and then if performance still isn’t good enough, it’s time to take a very bitter medicine. Update 3: Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes by Dare Obasanjo. Excellent discussion of why and when you would choose a sharding architecture, how to shard, and problems with sharding. Update 2: Mr. Moore gets to punt on sharding by Alan Rimm-Kaufman of 37signals. Insightful article on design tradeoffs and the evils of premature optimization. With more memory, more CPU, and new tech like SSD, problems can be avoided before more exotic architectures like sharding are needed. Add features not infrastructure. Jeremy Zawodny says he's wrong wrong wrong. we're running multi-core CPUs at slower clock speeds. Moore won't save you. Update: Dan Pritchett shares some excellent Sharding Lessons : Size Your Shards, Use Math on Shard C
4 0.087795556 859 high scalability-2010-07-14-DynaTrace's Top 10 Performance Problems taken from Zappos, Monster, Thomson and Co
Introduction: DynaTrace, in Top 10 Performance Problems taken from Zappos, Monster, Thomson and Co, has provided a useful compilation of performance problems, with potential solutions, that they've found while working with their clients. Too Many Database Calls - too many database queries per request/transaction. Synchronized to Death - in a high-load or production environment over-synchronization results in severe performance and scalability problems. Too chatty on the remoting channels - too many calls across these remoting boundaries, which in the end causes performance and scalability problems. Wrong usage of O/R-Mappers - incorrect usage of the framework itself too often results in unexpected performance and scalability problems within these frameworks. Memory Leaks - GC does not prevent memory leaks, it is important to release object references as soon as they are no longer needed. Problematic 3rd Party Code/Components - check of every framework before introducing i
5 0.078987703 1246 high scalability-2012-05-16-Big List of 20 Common Bottlenecks
Introduction: In Zen And The Art Of Scaling - A Koan And Epigram Approach, Russell Sullivan offered an interesting conjecture: there are 20 classic bottlenecks. This sounds suspiciously like the idea that there are only 20 basic story plots. And depending on how you chunkify things, it may be true, but in practice we all know bottlenecks come in infinite flavors, all tasting of sour and ash. One day Aurelien Broszniowski from Terracotta emailed me his list of bottlenecks, we cc’ed Russell in on the conversation, he gave me his list, I have a list, and here’s the resulting stone soup. Russell said this is his “I wish I knew when I was younger" list and I think that’s an enriching way to look at it. The more experience you have, the more different types of projects you tackle, the more lessons you’ll be able to add to a list like this. So when you read this list, and when you make your own, you are stepping through years of accumulated experience and more than a little frustration, but in ea
6 0.078562766 71 high scalability-2007-08-22-Profiling WEB applications
7 0.078353807 89 high scalability-2007-09-10-Is there a difference between partitioning and federation and sharding?
8 0.077818528 274 high scalability-2008-03-12-YouTube Architecture
9 0.07371515 771 high scalability-2010-02-04-Hot Scalability Links for February 4, 2010
12 0.071874 517 high scalability-2009-02-21-Google AppEngine - A Second Look
13 0.070139997 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
14 0.069750339 88 high scalability-2007-09-10-Blog: Scalable Web Architectures by Royans Tharakan
18 0.069304675 1259 high scalability-2012-06-07-3 Secrets to Lightning Fast Mobile Design at Instagram
19 0.06844373 464 high scalability-2008-12-13-Strategy: Facebook Tweaks to Handle 6 Time as Many Memcached Requests
topicId topicWeight
[(0, 0.129), (1, 0.037), (2, -0.034), (3, -0.019), (4, 0.018), (5, 0.043), (6, 0.04), (7, 0.003), (8, -0.017), (9, -0.001), (10, -0.005), (11, -0.003), (12, -0.001), (13, 0.023), (14, 0.002), (15, -0.03), (16, 0.023), (17, -0.035), (18, 0.05), (19, 0.016), (20, -0.03), (21, 0.018), (22, 0.045), (23, 0.014), (24, -0.016), (25, 0.025), (26, -0.046), (27, 0.002), (28, -0.014), (29, 0.01), (30, -0.013), (31, -0.003), (32, 0.027), (33, 0.04), (34, -0.015), (35, 0.024), (36, 0.013), (37, -0.01), (38, -0.001), (39, 0.019), (40, -0.06), (41, -0.004), (42, -0.001), (43, -0.008), (44, 0.041), (45, -0.017), (46, -0.021), (47, -0.021), (48, -0.009), (49, -0.007)]
simIndex simValue blogId blogTitle
same-blog 1 0.89745438 668 high scalability-2009-08-01-15 Scalability and Performance Best Practices
2 0.75447035 359 high scalability-2008-07-29-Ehcache - A Java Distributed Cache
Introduction: Ehcache is a pure Java cache with the following features: fast, simple, small footprint, minimal dependencies, provides memory and disk stores for scalability into gigabytes, scalable to hundreds of caches, is a pluggable cache for Hibernate, tuned for high concurrent load on large multi-cpu servers, provides LRU, LFU and FIFO cache eviction policies, and is production tested. Ehcache is used by LinkedIn to cache member profiles. The user guide says it's possible to get a 2.5 times system speedup for persistent Object Relational Caching, a 1000 times system speedup for Web Page Caching, and a 1.6 times system speedup for Web Page Fragment Caching. From the website: Introduction Ehcache is a cache library. Before getting into ehcache, it is worth stepping back and thinking about caching generally. About Caches Wiktionary defines a cache as A store of things that will be required in future, and can be retrieved rapidly. That is the nub of it. In computer science terms, a cac
3 0.75384229 633 high scalability-2009-06-19-GemFire 6.0: New innovations in data management
Introduction: GemStone has unveiled GemFire 6.0 which is the culmination of several years of development and the continuous solving of the hardest data management problems in the world. With this release GemFire touts some of the latest innovative features in data management. In this release: - GemFire introduces a resource manager to continuously monitor and protect cache instances from running out of memory, triggering rebalancing to migrate data to less loaded nodes or allow dynamic increase/decrease in the number of nodes hosting data for linear scalability without impeding ongoing operations (no contention points). - GemFire provides explicit control over when rebalancing can be triggered, on what class of data and even allows the administrator to simulate a "rebalance" operation to quantify the benefits before actually doing it. - With built in instrumentation that captures throughput and latency metrics, GemFire now enables applications to sense changing performance patterns and proactiv
4 0.74850219 351 high scalability-2008-07-16-The Mother of All Database Normalization Debates on Coding Horror
Introduction: Jeff Atwood started a barn burner of a conversation in Maybe Normalizing Isn't Normal on how to create a fast scalable tagging system. Jeff eventually asks that terrible question: which is better -- a normalized database, or a denormalized database? And all hell breaks loose. I know, it's hard to imagine database debates becoming contentious, but it does happen :-) It's lucky developers don't have temporal power or rivers of blood would flow. Here are a few of the pithier points (summarized): Normalization is not magical fairy dust you sprinkle over your database to cure all ills; it often creates as many problems as it solves. (Jeff) Normalize until it hurts, denormalize until it works. (Jeff) Use materialized views which are tables created and maintained by your RDBMS. So a materialized view will act exactly like a de-normalized table would - except you keep you original normalized structure and any change to original data will propagate to the view automat
5 0.72332811 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
Introduction: This is a guest post by Zardosht Kasheff , Software Developer at Tokutek , a storage engine company that delivers 21st-Century capabilities to the leading open source data management platforms. As software developers, we value abstraction. The simpler the API, the more attractive it becomes. Arguably, MongoDB’s greatest strengths are its elegant API and its agility , which let developers simply code. But when MongoDB runs into scalability problems on big data , developers need to peek underneath the covers to understand the underlying issues and how to fix them. Without understanding, one may end up with an inefficient solution that costs time and money. For example, one may shard prematurely, increasing hardware and management costs, when a simpler replication setup would do. Or, one may increase the size of a replica set when upgrading to SSDs would suffice. This article shows how to reason about some big data scalability problems in an effort to find efficient solut
6 0.72306937 188 high scalability-2007-12-19-How can I learn to scale my project?
7 0.71485877 1135 high scalability-2011-10-31-15 Ways to Make Your Application Feel More Responsive under Google App Engine
8 0.7059921 1336 high scalability-2012-10-09-Batoo JPA - The new JPA Implementation that runs over 15 times faster...
9 0.70367193 673 high scalability-2009-08-07-Strategy: Break Up the Memcache Dog Pile
10 0.70189536 1337 high scalability-2012-10-10-Antirez: You Need to Think in Terms of Organizing Your Data for Fetching
11 0.6994648 1246 high scalability-2012-05-16-Big List of 20 Common Bottlenecks
12 0.69791192 910 high scalability-2010-09-30-Facebook and Site Failures Caused by Complex, Weakly Interacting, Layered Systems
13 0.69528055 189 high scalability-2007-12-21-Strategy: Limit Result Sets
14 0.6927588 697 high scalability-2009-09-09-GridwiseTech revolutionizes data management
15 0.69230127 609 high scalability-2009-05-28-Scaling PostgreSQL using CUDA
16 0.69075108 1570 high scalability-2014-01-01-Paper: Nanocubes: Nanocubes for Real-Time Exploration of Spatiotemporal Datasets
17 0.68391889 1179 high scalability-2012-01-23-Facebook Timeline: Brought to You by the Power of Denormalization
18 0.68029404 250 high scalability-2008-02-17-Web Accelerators - snake oil or miracle remedy?
19 0.67947382 1082 high scalability-2011-07-18-New Relic Architecture - Collecting 20+ Billion Metrics a Day
20 0.67892462 815 high scalability-2010-04-27-Paper: Dapper, Google's Large-Scale Distributed Systems Tracing Infrastructure
topicId topicWeight
[(1, 0.187), (2, 0.17), (10, 0.066), (11, 0.406), (61, 0.018), (94, 0.037)]
simIndex simValue blogId blogTitle
1 0.82328051 25 high scalability-2007-07-25-Paper: Designing Disaster Tolerant High Availability Clusters
Introduction: A very detailed (339 pages) paper on how to use HP products to create a highly available cluster. It's somewhat dated and obviously concentrates on HP products, but it is still good information. Table of contents: 1. Disaster Tolerance and Recovery in a Serviceguard Cluster 2. Building an Extended Distance Cluster Using ServiceGuard 3. Designing a Metropolitan Cluster 4. Designing a Continental Cluster 5. Building Disaster-Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP 6. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF 7. Cascading Failover in a Continental Cluster Evaluating the Need for Disaster Tolerance What is a Disaster Tolerant Architecture? Types of Disaster Tolerant Clusters Extended Distance Clusters Metropolitan Cluster Continental Cluster Continental Cluster With Cascading Failover Disaster Tolerant Architecture Guidelines Protecting Nodes through Geographic Dispersion Protecting Data th
same-blog 2 0.81338108 668 high scalability-2009-08-01-15 Scalability and Performance Best Practices
Introduction: These are from Laura Thomson of OmniTI: Profile early, profile often. Pick a profiling tool and learn it inside and out. Dev-ops cooperation is essential. It is the most critical difference in organizations that handle crises well. Test on production data. Code behavior (especially performance) is often data driven. Track and trend. Understanding your historical performance characteristics is essential for spotting emerging problems. Assumptions will burn you. Systems are complex and often break in unexpected ways. Decouple. Isolate performance failures. Cache. Caching is the core of most optimizations. Federate. Data federation is taking a single data set and spreading it across multiple database/application servers. Replicate. Replication is making synchronized copies of data available in more than one place. Avoid straining hard-to-scale resources. Some resources are inherently hard to scale: uncacheable data, data with a very high read+write rate
Introduction: Sun Fire™ X4540 Server as Backup Server for Zmanda's Amanda Enterprise 2.6 Software, by Thomas Hanvey (Sun Microsystems) and Dmitri Joukovski and Ken Crandall (Zmanda), September 2008. Explosive data growth, combined with demanding requirements for data availability, has placed a tremendous burden on IT operations staff at businesses of all sizes. Yet, many organizations do not have the staff or budget to purchase and manage complex and expensive backup and recovery software products. The Sun Fire™ X4540 server can deliver massive storage capacity and remarkable throughput so it is well-suited as a nearline storage platform for backup and restore applications. Combining the power of the Solaris™ 10 Operating System with the data integrity and simplified administration of ZFS, the Sun Fire X4540 server can be an ideal candidate for streamlining and improving backup and restore operations. Amanda Enterprise Edition from Zmanda was designed to address these challenges,
4 0.71034294 134 high scalability-2007-10-26-Paper: Wikipedia's Site Internals, Configuration, Code Examples and Management Issues
Introduction: Wikipedia and Wikimedia have some of the best, most complete real-world documentation on how to build highly scalable systems. This paper by Domas Mituzas covers a lot of details about how Wikipedia works, including: an overview of the different packages used (Linux, PowerDNS, LVS, Squid, lighttpd, Apache, PHP5, Lucene, Mono, Memcached), how they use their CDN, how caching works, how they profile their code, how they store their media, how they structure their database access, how they handle search, how they handle load balancing and administration. All with real code examples and examples of configuration files. This is a really useful resource. Related Articles Wikimedia Architecture Domas Mituzas' Blog
Introduction: Snooze is an open-source, scalable, autonomic, and energy-efficient virtual machine (VM) management framework for private clouds. Similarly to other VM management frameworks such as Nimbus, OpenNebula, Eucalyptus, and OpenStack it allows to build compute infrastructures from virtualized resources. Particularly, once installed and configured users can submit and control the life-cycle of a large number of VMs. However, contrary to existing frameworks for scalability and fault tolerance, Snooze employs a self-organizing and healing (based on Apache ZooKeeper) hierarchical architecture. Moreover, it performs distributed VM management and is designed to be energy efficient. Therefore, it implements features to monitor and estimate VM resource (CPU, memory, network Rx, network Tx) demands, detect and resolve overload/underload situations, perform dynamic VM consolidation through live migration, and finally power management to save energy. Last but not least, it integrates a g
6 0.63263863 908 high scalability-2010-09-28-6 Strategies for Scaling BBC iPlayer
7 0.62628937 771 high scalability-2010-02-04-Hot Scalability Links for February 4, 2010
8 0.62361544 1236 high scalability-2012-04-30-Masstree - Much Faster than MongoDB, VoltDB, Redis, and Competitive with Memcached
9 0.59616512 105 high scalability-2007-10-01-Statistics Logging Scalability
10 0.59001088 1055 high scalability-2011-06-08-Stuff to Watch from Google IO 2011
11 0.58656609 136 high scalability-2007-10-28-Scaling Early Stage Startups
12 0.58440953 824 high scalability-2010-05-06-Going global on EC2
13 0.5695911 699 high scalability-2009-09-10-How to handle so many socket connection
14 0.54645228 442 high scalability-2008-11-13-Plenty of Fish Says Scaling for Free Doesn't Pay
15 0.53098226 292 high scalability-2008-03-30-Scaling Out MySQL
16 0.53070861 5 high scalability-2007-07-10-mixi.jp Architecture
17 0.52331829 310 high scalability-2008-04-29-High performance file server
18 0.52237278 1517 high scalability-2013-09-16-The Hidden DNS Tax - Cascading Timeouts and Errors
19 0.52199066 876 high scalability-2010-08-10-Sponsored Post: Okta, EzRez, VoltDB, Digg, Cloud Sigma, Applications Manager, Site24x7
20 0.52191162 866 high scalability-2010-07-27-Sponsored Post: Okta, EzRez, VoltDB, Digg, Cloud Sigma, Applications Manager, Site24x7