high_scalability high_scalability-2007 high_scalability-2007-188 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: This is a question asked on the ycombinator list and there are some good responses. I gave a quick response, but I particularly like neilk's knock-out-of-the-park, insightful answer: Read Cal Henderson's book. (I'd add in Theo's book and Release It! too) The center of your design should be the data store, not a process. You transition the data store from state to state, securely and reliably, in small increments. Avoid globals and session state. The more "pure" your function is, the easier it will be to cache or partition. Don't make your data store too smart. Calculations and renderings should happen in a separate, asynchronous process. The data store should be able to handle lots of concurrent connections. Minimize locking. (Read about optimistic locking). Protect your algorithm from the implementation of the data store, with a helper class or module or whatever. But don't (DO NOT) try to build a framework for any conceivable query. Just the ones your algorithm needs. V
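The advice about moving the data store from state to state in small increments, with optimistic rather than pessimistic locking, and hiding the store behind a small helper module, might look roughly like the sketch below. It assumes a hypothetical `orders` table with a `version` column and uses Python's built-in `sqlite3` module purely for illustration; the table, the function names, and `StaleWriteError` are inventions for this example, not part of neilk's answer.

```python
import sqlite3


class StaleWriteError(Exception):
    """Raised when another writer changed the row before we did."""


def open_store():
    # Hypothetical helper: callers never see SQL, only these functions.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, state TEXT NOT NULL, version INTEGER NOT NULL)"
    )
    return conn


def create_order(conn, order_id):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (id, state, version) VALUES (?, 'new', 0)",
            (order_id,),
        )


def read_order(conn, order_id):
    # Returns (state, version), or None if the order does not exist.
    return conn.execute(
        "SELECT state, version FROM orders WHERE id = ?", (order_id,)
    ).fetchone()


def transition(conn, order_id, expected_version, new_state):
    # Optimistic locking: no lock is held between read and write.
    # The UPDATE only matches if the version is still the one we read.
    with conn:
        cur = conn.execute(
            "UPDATE orders SET state = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_state, order_id, expected_version),
        )
    if cur.rowcount != 1:
        raise StaleWriteError(f"order {order_id} changed since version {expected_version}")
```

A caller reads the current version, computes the next state, and calls `transition`; on `StaleWriteError` it re-reads and retries. Only the queries the algorithm needs are exposed, in keeping with the warning against building a general query framework.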
sentIndex sentText sentNum sentScore
1 This is a question asked on the ycombinator list and there are some good responses. [sent-1, score-0.082]
2 I gave a quick response, but I particularly like neilk's knock-out-of-the-park, insightful answer: Read Cal Henderson's book. [sent-2, score-0.526]
3 The center of your design should be the data store, not a process. [sent-4, score-0.184]
4 You transition the data store from state to state, securely and reliably, in small increments. [sent-5, score-0.65]
5 The data store should be able to handle lots of concurrent connections. [sent-10, score-0.29]
6 Protect your algorithm from the implementation of the data store, with a helper class or module or whatever. [sent-13, score-0.493]
7 But don't (DO NOT) try to build a framework for any conceivable query. [sent-14, score-0.158]
8 Viewing an application as a series of state transitions instead of a blizzard of actions and events is a vastly underappreciated design perspective. [sent-16, score-0.614]
9 This is one of the key design approaches for making robust embedded systems. [sent-17, score-0.347]
10 A great paper talking about this sort of stuff is Mission Planning and Execution Within the Mission Data System - an effort to make engineering flight software more straightforward and less prone to error through the explicit modeling of spacecraft state. [sent-18, score-0.667]
11 Another interesting paper is CLEaR: Closed Loop Execution and Recovery High-Level Onboard Autonomy for Rover Operations. [sent-19, score-0.109]
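As a rough illustration of viewing an application as a series of state transitions rather than a blizzard of actions and events, the sketch below keeps every legal transition in one explicit table and makes the step function pure. The states, events, and function names are hypothetical and are not taken from the Mission Data System or CLEaR papers.

```python
from enum import Enum
from functools import reduce


class State(Enum):
    IDLE = "idle"
    RUNNING = "running"
    FAILED = "failed"


# Every legal transition is listed here; anything else is rejected.
TRANSITIONS = {
    (State.IDLE, "start"): State.RUNNING,
    (State.RUNNING, "finish"): State.IDLE,
    (State.RUNNING, "fault"): State.FAILED,
    (State.FAILED, "recover"): State.IDLE,
}


def next_state(state, event):
    # Pure function: no globals are mutated, so it is easy to test and replay.
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.value} on {event!r}") from None


def replay(events, initial=State.IDLE):
    # Fold a sequence of events into the resulting state.
    return reduce(next_state, events, initial)
```

Because the table is data, it can be reviewed, logged, or checked for unreachable states without reading any event-handling code.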
wordName wordTfidf (topN-words)
[('neilk', 0.389), ('store', 0.211), ('globals', 0.176), ('helper', 0.166), ('henderson', 0.158), ('cal', 0.158), ('conceivable', 0.158), ('onboard', 0.158), ('rover', 0.152), ('state', 0.148), ('algorithm', 0.147), ('execution', 0.146), ('glad', 0.143), ('park', 0.143), ('flight', 0.137), ('transitions', 0.137), ('autonomy', 0.134), ('appreciated', 0.129), ('securely', 0.122), ('prone', 0.12), ('knock', 0.116), ('theo', 0.111), ('insightful', 0.111), ('reliably', 0.11), ('explicit', 0.11), ('paper', 0.109), ('design', 0.105), ('closed', 0.104), ('module', 0.101), ('straightforward', 0.1), ('optimistic', 0.098), ('loop', 0.098), ('calculations', 0.096), ('actions', 0.095), ('mission', 0.092), ('modeling', 0.091), ('transition', 0.09), ('embedded', 0.088), ('pure', 0.088), ('locking', 0.088), ('brought', 0.085), ('asked', 0.082), ('book', 0.082), ('particularly', 0.08), ('data', 0.079), ('robust', 0.078), ('recovery', 0.077), ('ones', 0.077), ('gave', 0.076), ('approaches', 0.076)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 188 high scalability-2007-12-19-How can I learn to scale my project?
2 0.15283127 282 high scalability-2008-03-18-Database War Stories #3: Flickr
Introduction: [Tim O'Reilly] Continuing my series of queries about how "Web 2.0" companies used databases, I asked Cal Henderson of Flickr to tell me "how the folksonomy model intersects with the traditional database. How do you manage a tag cloud?"
3 0.1017306 1302 high scalability-2012-08-10-Stuff The Internet Says On Scalability For August 10, 2012
Introduction: It's HighScalability Time: TNW : On an average day, out of 30 trillion URLs on the web, Google crawls 20B web pages and now serves 100B searches every month. Quotable Quotes: @tapbot_paul : The 2 computers on the Curiosity rover are RAD750 based, they are approximately 1/10th the speed of an iPhone 4s and “only” cost $200k each. @merv : #cassandra12 Why @adrianco loves what he's doing: "You are no longer IO-bound, you’re CPU bound, like you’re supposed to be." @maxtaco : Garbage collection solves a minuscule %age of bugs, that are non-critical (memleaks? big deal!) and easy to find and fix. At a HUGE expense. @merv : #cassandra12 @eddie_satterly describing $1M savings in first year migrating from MS SQL Server with SAN to Cassandra solution - w more data. @mattbrauchler : A slow node is worse than a down node #cassandra12 @practicingEA : "The math of predictive analytics has been around for years, its the computers t
Introduction: Update 2: Velocity 09: John Allspaw, 10+ Deploys Per Day: Dev and Ops Cooperation at Flickr . Insightful talk. Some highlights: Change is good if you can build tools and culture to lower the risk of change. Operations and developers need to become of one mind and respect each other. An automated infrastructure is the one tool you need most. Common source control. One step build. One step deploy. Don't be a pussy, deploy. Always ship trunk. Feature flags - don't branch code, make features runtime configurable in code. Dark launch - release data paths early without UI component. Shared metrics. Adaptive feedback to prioritize important features. IRC for communication for human context. Best solutions occur when dev and op work together and trust each other. Trust is earned by helping each other solve their problems. Look at what new features imply for operations, what can go wrong, and how to recover. Provide knobs and levers to help operations. Devs should have access to production
5 0.098795161 527 high scalability-2009-03-06-Cloud Programming Directly Feeds Cost Allocation Back into Software Design
Introduction: Update 6 : CARS = Cost Aware Runtimes and Services by William Louth. Update 5 : Damn You Google, Damn You Yahoo! Why D'Ya Do This to Us? Free accounts on a cloud platform are a constant drain of money. Update 4: Caching becomes even more important in CPU based billing environments . Avoiding the CPU means saving money. Update 3: An interesting simple example of this idea showed up on the Google AppEngine list. With one paging algorithm and one use of AJAX the yearly cost of the site was $1000. By changing those algorithms the site went under quota and became free again. This will make life a lot more interesting for developers. Update 2: Business Model Influencing Software Architecture by Brandon Watson. The profitability of your project could disappear overnight on account of code behaving badly . Update: Amazon adds Elastic Block Store at $0.10 per 1 million I/O requests. Now I need some cost minimization storage algorithms! In the GAE Meetup yesterday a very in
6 0.097900189 589 high scalability-2009-05-05-Drop ACID and Think About Data
7 0.091798216 507 high scalability-2009-02-03-Paper: Optimistic Replication
8 0.091420174 1305 high scalability-2012-08-16-Paper: A Provably Correct Scalable Concurrent Skip List
9 0.083646387 705 high scalability-2009-09-16-Paper: A practical scalable distributed B-tree
10 0.080394983 978 high scalability-2011-01-26-Google Pro Tip: Use Back-of-the-envelope-calculations to Choose the Best Design
11 0.079709336 357 high scalability-2008-07-26-Google's Paxos Made Live – An Engineering Perspective
12 0.079494469 240 high scalability-2008-02-05-Handling of Session for a site running from more than 1 data center
13 0.079098187 97 high scalability-2007-09-18-Session management in highly scalable web sites
14 0.079082504 1509 high scalability-2013-08-30-Stuff The Internet Says On Scalability For August 30, 2013
15 0.07868427 1425 high scalability-2013-03-18-Beyond Threads and Callbacks - Application Architecture Pros and Cons
16 0.072864868 1136 high scalability-2011-11-03-Paper: G2 : A Graph Processing System for Diagnosing Distributed Systems
17 0.072500281 132 high scalability-2007-10-25-Who can answer or analyze the image store and visit solution about alibaba.com?Thanks
18 0.072023436 1596 high scalability-2014-02-14-Stuff The Internet Says On Scalability For February 14th, 2014
19 0.071725957 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
20 0.07044407 1565 high scalability-2013-12-16-22 Recommendations for Building Effective High Traffic Web Software
topicId topicWeight
[(0, 0.119), (1, 0.081), (2, -0.006), (3, 0.034), (4, 0.018), (5, 0.052), (6, 0.01), (7, 0.002), (8, -0.028), (9, -0.002), (10, -0.008), (11, 0.023), (12, -0.042), (13, 0.004), (14, 0.016), (15, -0.02), (16, 0.044), (17, -0.016), (18, 0.055), (19, -0.021), (20, 0.002), (21, -0.017), (22, 0.005), (23, 0.016), (24, -0.047), (25, -0.02), (26, -0.0), (27, 0.035), (28, 0.015), (29, 0.011), (30, -0.005), (31, 0.038), (32, 0.025), (33, 0.027), (34, -0.038), (35, -0.017), (36, 0.023), (37, -0.009), (38, 0.026), (39, 0.005), (40, -0.046), (41, 0.031), (42, 0.022), (43, 0.032), (44, 0.059), (45, 0.036), (46, -0.042), (47, -0.016), (48, -0.043), (49, -0.037)]
simIndex simValue blogId blogTitle
same-blog 1 0.92918158 188 high scalability-2007-12-19-How can I learn to scale my project?
2 0.76818746 1305 high scalability-2012-08-16-Paper: A Provably Correct Scalable Concurrent Skip List
Introduction: In MemSQL Architecture we learned one of the core strategies MemSQL uses to achieve their need for speed is lock-free skip lists. Skip lists are used to efficiently handle range queries. Making the skip-lists lock-free helps eliminate contention and make writes fast. If this all sounds a little pie-in-the-sky then here's a very good paper on the subject that might help make it clearer: A Provably Correct Scalable Concurrent Skip List . From the abstract: We propose a new concurrent skip list algorithm distinguished by a combination of simplicity and scalability. The algorithm employs optimistic synchronization, searching without acquiring locks, followed by short lock-based validation before adding or removing nodes. It also logically removes an item before physically unlinking it. Unlike some other concurrent skip list algorithms, this algorithm preserves the skiplist properties at all times, which facilitates reasoning about its correctness. Experimental evidence shows that
3 0.74064034 507 high scalability-2009-02-03-Paper: Optimistic Replication
Introduction: To scale in the large you have to partition. Data has to be spread around, replicated, and kept consistent (keeping replicas sufficiently similar to one another despite operations being submitted independently at different sites). The result is a highly available, well performing, and scalable system. Partitioning is required, but it's a pain to do efficiently and correctly. Until Quantum teleportation becomes a reality how data is kept consistent across a bewildering number of failure scenarios is a key design decision. This excellent paper by Yasushi Saito and Marc Shapiro takes us on a wild ride (OK, maybe not so wild) of different approaches to achieving consistency. What's cool about this paper is they go over some real systems that we are familiar with and cover how they work: DNS (single-master, state-transfer), Usenet (multi-master), PDAs (multi-master, state-transfer, manual or application-specific conflict resolution), Bayou (multi-master, operation-transfer, epidemic
4 0.73150557 963 high scalability-2010-12-23-Paper: CRDTs: Consistency without concurrency control
Introduction: For a great Christmas read forget The Night Before Christmas, a heartwarming poem written by Clement Moore for his children, that created the modern idea of Santa Claus we all know and anticipate each Christmas Eve. Instead, curl up with some potent eggnog, nog being any drink made with rum, and read CRDTs: Consistency without concurrency control by Mihai Letia, Nuno Preguiça, and Marc Shapiro, which talks about CRDTs (Commutative Replicated Data Type), a data type whose operations commute when they are concurrent. From the introduction, which also serves as a nice concise overview of distributed consistency issues: Shared read-only data is easy to scale by using well-understood replication techniques. However, sharing mutable data at a large scale is a difficult problem, because of the CAP impossibility result [5]. Two approaches dominate in practice. One ensures scalability by giving up consistency guarantees, for instance using the Last-Writer-Wins (LWW) approach [
5 0.6977911 357 high scalability-2008-07-26-Google's Paxos Made Live – An Engineering Perspective
Introduction: This is an unusually well written and useful paper . It talks in detail about experiences implementing a complex project, something we don't see very often. They shockingly even admit that creating a working implementation of Paxos was more difficult than just translating the pseudo code. Imagine that, programmers aren't merely typists! I particularly like the explanation of the Paxos algorithm and why anyone would care about it, working with disk corruption, using leases to support simultaneous reads, using epoch numbers to indicate a new master election, using snapshots to prevent unbounded logs, using MultiOp to implement database transactions, how they tested the system, and their openness with the various problems they had. A lot to learn here. From the paper: We describe our experience building a fault-tolerant data-base using the Paxos consensus algorithm. Despite the existing literature in the field, building such a database proved to be non-trivial. We describe selected alg
6 0.68881285 1431 high scalability-2013-03-29-Stuff The Internet Says On Scalability For March 29, 2013
7 0.68765175 1611 high scalability-2014-03-12-Paper: Scalable Eventually Consistent Counters over Unreliable Networks
8 0.67977804 958 high scalability-2010-12-16-7 Design Patterns for Almost-infinite Scalability
10 0.67089832 1509 high scalability-2013-08-30-Stuff The Internet Says On Scalability For August 30, 2013
11 0.66809511 1567 high scalability-2013-12-20-Stuff The Internet Says On Scalability For December 20th, 2013
12 0.66763943 850 high scalability-2010-06-30-Paper: GraphLab: A New Framework For Parallel Machine Learning
13 0.66136891 1337 high scalability-2012-10-10-Antirez: You Need to Think in Terms of Organizing Your Data for Fetching
14 0.66108072 733 high scalability-2009-10-29-Paper: No Relation: The Mixed Blessings of Non-Relational Databases
15 0.66062105 1222 high scalability-2012-04-05-Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory
16 0.65303051 1062 high scalability-2011-06-15-101 Questions to Ask When Considering a NoSQL Database
17 0.65114844 1017 high scalability-2011-04-06-Netflix: Run Consistency Checkers All the time to Fixup Transactions
18 0.65048355 351 high scalability-2008-07-16-The Mother of All Database Normalization Debates on Coding Horror
19 0.65038812 844 high scalability-2010-06-18-Paper: The Declarative Imperative: Experiences and Conjectures in Distributed Logic
20 0.64675057 1239 high scalability-2012-05-04-Stuff The Internet Says On Scalability For May 4, 2012
topicId topicWeight
[(1, 0.083), (2, 0.253), (4, 0.034), (10, 0.01), (30, 0.022), (61, 0.05), (79, 0.158), (90, 0.219), (94, 0.077)]
simIndex simValue blogId blogTitle
same-blog 1 0.90015936 188 high scalability-2007-12-19-How can I learn to scale my project?
2 0.87473053 1546 high scalability-2013-11-11-Ask HS: What is a good OLAP database choice with node.js?
Introduction: This question was asked over email and I thought a larger audience might want to take a whack at it. With a business associate, I am trying to develop a financial software that handles financial reports of listed companies. We managed to create this database with all the data necessary to do financial analysis. My associate is a Business Intelligence specialist so he is keen to use OLAPs databases like Microsoft Analysis Services or Jedox Palo, which enables in-memory calculations and very fast aggregation, slicing and dicing of data or write-backs. At the same time I did an online course (MOOC) from Stanford CS184 called Startup Engineering which promoted/talked a lot about javascript and especially node.js as the language of the future for servers. As I am keen to use open-source technologies (would be keen to avoid MS SSAS) for the development of a website to access this financial data , and there are so many choices for databases out there (Postgre, MongoDB, MySQL etc..but d
3 0.85587198 344 high scalability-2008-06-09-FaceStat's Rousing Tale of Scaling Woe and Wisdom Won
Introduction: Lukas Biewald shares a fascinating slam by slam recount of how his FaceStat (upload your picture and be judged by the masses) site was battered by a link on Yahoo's main page that caused an almost instantaneous 650,000 page view jump on their site. Yahoo spends considerable effort making sure its own properties can handle the truly massive flow from the main page. Turning the Great Eye of the Internet towards an unsuspecting newborn site must be quite the diaper ready experience. Theo Schlossnagle eerily prophesized about such events in The Implications of Punctuated Scalabilium for Website Architecture : massive, unexpected and sudden traffic spikes will become more common as a fickle internet seeks ever for new entertainments (my summary). Exactly FaceStat's situation. This is also one of our first exposures to an application written on Merb, a popular Ruby on Rails competitor. For those who think Ruby is the problem, their architecture now serves 100 times the original load
4 0.85296977 1404 high scalability-2013-02-11-At Scale Even Little Wins Pay Off Big - Google and Facebook Examples
Introduction: There's a popular line of thought that says don't waste time on optimization because developing features is more important than saving money. True, you can always add resources, but at some point, especially in a more mature part of a product lifecycle: performance equals $$$. Two great examples of this evolution come from Facebook and Google. The upshot is that when you spend time and money on optimizing your tool chain you can get huge wins in performance, control, and costs. Certainly, don’t bother if you are just starting, but at some point you may want to switch to big development efforts in improving efficiency. Facebook and HipHop The Facebook example is quite well known: HipHop , a static PHP compiler released in 2010, after two years of development. PHP because Facebook implements their web tier in PHP . They've now developed a dynamic compiler, HipHop VM , using techniques like JIT, side exits, HipHop bytecode, type prediction, and parallel tracelet l
5 0.84033847 673 high scalability-2009-08-07-Strategy: Break Up the Memcache Dog Pile
Introduction: Update: Asynchronous HTTP cache validations . A proposed HTTP caching extension: if your application can afford to show slightly out of date content, then stale-while-revalidate can guarantee that the user will always be served directly from the cache, hence guaranteeing a consistent response-time user-experience. Caching is like aspirin for headaches. Head hurts: pop a 'sprin. Slow site: add caching. Facebook must have a lot of headaches because they popped 805 memcached servers between 10,000 web servers and 1,800 MySQL servers and they reportedly have a 99% cache hit rate. But what's the best way for you to cache for your application? It's a remarkably complex and rich topic. Alexey Kovyrin talks about one common caching problem called the Dog Pile Effect in Dog-pile Effect and How to Avoid it with Ruby on Rails . Glenn Franxman also has a Django solution in MintCache . Data is usually cached because it's too expensive to calculate for every hit. Maybe it's a gnarly S
6 0.82040912 1374 high scalability-2012-12-18-Georeplication: When Bad Things Happen to Good Systems
7 0.81226963 1380 high scalability-2013-01-02-Why Pinterest Uses the Cloud Instead of Going Solo - To Be Or Not To Be
8 0.81096417 1204 high scalability-2012-03-06-Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
9 0.80804908 935 high scalability-2010-11-05-Hot Scalability Links For November 5th, 2010
10 0.80707043 17 high scalability-2007-07-16-Paper: Guide to Cost-effective Database Scale-Out using MySQL
11 0.80489552 1589 high scalability-2014-02-03-How Google Backs Up the Internet Along With Exabytes of Other Data
12 0.80390286 409 high scalability-2008-10-13-Challenges from large scale computing at Google
13 0.80230713 1143 high scalability-2011-11-16-Google+ Infrastructure Update - the JavaScript Story
14 0.80229175 978 high scalability-2011-01-26-Google Pro Tip: Use Back-of-the-envelope-calculations to Choose the Best Design
15 0.80196708 119 high scalability-2007-10-10-WAN Accelerate Your Way to Lightening Fast Transfers Between Data Centers
16 0.80151409 1313 high scalability-2012-08-28-Making Hadoop Run Faster
17 0.80146539 601 high scalability-2009-05-17-Product: Hadoop
18 0.80072606 76 high scalability-2007-08-29-Skype Failed the Boot Scalability Test: Is P2P fundamentally flawed?
19 0.79948813 1436 high scalability-2013-04-05-Stuff The Internet Says On Scalability For April 5, 2013
20 0.79927945 1561 high scalability-2013-12-09-Site Moves from PHP to Facebook's HipHop, Now Pages Load in .6 Seconds Instead of Five