high_scalability high_scalability-2009 high_scalability-2009-689 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Solve only 80% of a problem. That's usually good enough and you'll not only get done faster, you'll actually have a chance of getting done at all. This strategy is given by Amix in HOW TWITTER (AND FACEBOOK) SOLVE PROBLEMS PARTIALLY. The idea is that solving 100% of a complex problem can be so hard and so expensive that you'll end up wasting all your bullets on a problem that could have been satisfactorily solved in a much simpler way. The example given is Twitter's real-time search. Real-time search, almost by definition, is focused on recent events. So in the design, should you be able to search historically back from the beginning of time, or should you just be able to search recent time periods? A complete historical search is the 100% solution. The recent-data-only search is the 80% solution. Which should you choose? The 100% solution is dramatically more difficult to solve. It requires searching disk in real-time, which is a killer. So it makes more sense to work on the
sentIndex sentText sentNum sentScore
1 The idea is that solving 100% of a complex problem can be so hard and so expensive that you'll end up wasting all your bullets on a problem that could have been satisfactorily solved in a much simpler way. [sent-4, score-0.419]
2 Real-time search almost by definition is focused on recent events. [sent-6, score-0.396]
3 So in the design, should you be able to search historically back from the beginning of time, or should you just be able to search recent time periods? [sent-7, score-0.631]
4 By reducing the amount of data you need to search, it's possible to make some simplifying design choices, like using fixed-size buffers that reside completely in memory (a sketch follows this list). [sent-14, score-0.338]
5 Sometimes as programmers we are blinded by the glory of the challenge of solving the 100% solution when there's a more reasonable, rational alternative that's almost as good. [sent-18, score-0.576]
6 The lesson to be learned from this is that it is often undesirable to go for the right thing first. [sent-24, score-0.33]
7 It is better to get half of the right thing available so that it spreads like a virus. [sent-25, score-0.31]
8 Once people are hooked on it, take the time to improve it to 90% of the right thing. [sent-26, score-0.208]
9 Unix, C, C++, Twitter and almost every product that has experienced wide adoption has followed this philosophy. [sent-27, score-0.211]
10 Worse-is-Better solutions have the following characteristics: Simplicity - The design must be simple, both in implementation and interface. [sent-28, score-0.395]
11 It is more important for the implementation to be simpler than the interface. [sent-29, score-0.285]
12 Correctness - The design must be correct in all observable aspects. [sent-31, score-0.388]
13 Consistency - The design must not be overly inconsistent. [sent-33, score-0.379]
14 Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency. [sent-34, score-0.757]
15 Completeness - The design must cover as many important situations as is practical. [sent-35, score-0.369]
16 Completeness can be sacrificed in favor of any other quality. [sent-37, score-0.365]
17 In fact, completeness must be sacrificed whenever implementation simplicity is jeopardized. [sent-38, score-1.099]
18 Consistency can be sacrificed to achieve completeness if simplicity is retained; especially worthless is consistency of interface. [sent-39, score-1.083]
19 In my gut I think Worse-is-Better is different than "Solve Only 80 Percent of the Problem" primarily because Worse-is-Better is more about product adoption curves and 80% is more a design heuristic. [sent-40, score-0.439]
20 After some cogitating this seems a false distinction, so I have to conclude I'm wrong and have added Worse-is-Better to this post. [sent-41, score-0.167]
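To make sentence 4 concrete, here is a minimal sketch of a recent-data-only search built on a fixed-size in-memory buffer. This is illustrative Python with hypothetical names, not Twitter's actual implementation: capping the searchable window at the N most recent messages bounds memory use and makes eviction free, which is exactly the simplification the 80% solution buys.

from collections import deque
from dataclasses import dataclass
from time import time

@dataclass
class Message:
    timestamp: float
    text: str

class RecentSearch:
    """Search over only the most recent messages, held entirely in memory.

    A deque with maxlen acts as a fixed-size ring buffer: appending to a
    full buffer silently evicts the oldest entry, so memory use stays
    bounded and the disk is never touched.
    """

    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)  # ring buffer of Message

    def add(self, text):
        self.buffer.append(Message(time(), text))

    def search(self, term, limit=20):
        # Newest-first linear scan; over a bounded in-memory window this
        # is fast enough, where a full historical search would force an
        # on-disk index.
        term = term.lower()
        hits = []
        for msg in reversed(self.buffer):
            if term in msg.text.lower():
                hits.append(msg)
                if len(hits) == limit:
                    break
        return hits

# Usage: only the most recent `capacity` messages are ever searchable.
index = RecentSearch(capacity=100_000)
index.add("shipping the 80% solution today")
print(index.search("80%"))

Dropping the long tail of history is what buys the dramatically simpler system the post describes.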
wordName wordTfidf (topN-words)
[('sacrificed', 0.365), ('completeness', 0.285), ('simplicity', 0.223), ('search', 0.169), ('design', 0.169), ('news', 0.144), ('twitter', 0.131), ('must', 0.124), ('recent', 0.124), ('thing', 0.118), ('right', 0.117), ('adoption', 0.108), ('simpler', 0.107), ('blinded', 0.106), ('glory', 0.106), ('worthless', 0.106), ('consistency', 0.104), ('almost', 0.103), ('implementation', 0.102), ('breakthe', 0.099), ('strategy', 0.097), ('solve', 0.096), ('retained', 0.095), ('observable', 0.095), ('undesirable', 0.095), ('solution', 0.093), ('blisteringly', 0.091), ('concluded', 0.091), ('hooked', 0.091), ('solving', 0.089), ('gabriel', 0.088), ('overly', 0.086), ('inhow', 0.086), ('hacker', 0.085), ('systemsby', 0.084), ('joseph', 0.084), ('praise', 0.084), ('gut', 0.082), ('clay', 0.082), ('curves', 0.08), ('rational', 0.079), ('distinction', 0.076), ('important', 0.076), ('spreads', 0.075), ('reasonably', 0.075), ('characteristic', 0.075), ('problem', 0.075), ('cases', 0.074), ('consideration', 0.074), ('wasting', 0.073)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 689 high scalability-2009-08-28-Strategy: Solve Only 80 Percent of the Problem
2 0.12341341 837 high scalability-2010-06-07-Six Ways Twitter May Reach its Big Hairy Audacious Goal of One Billion Users
Introduction: Twitter has a big hairy audacious goal of reaching one billion users by 2013. Three forces stand against Twitter. The world will end in 2012. But let's be optimistic and assume we'll make it. Next is Facebook. Currently Facebook is the user leader with over 400 million users. Will Facebook stumble or will they rocket to one billion users before Twitter? And lastly, there's Twitter's "low" starting point and "slow" growth rate. Twitter currently has 106 million registered users and adds about 300,000 new users a day. That doesn't add up to a billion in three years. Twitter needs to triple the number of registered users they add per day. How will Twitter reach its goal of over one billion users served? From recent infrastructure announcements and information gleaned at Chirp (videos) and other talks, it has become a little clearer how they hope to reach their billion user goal: 1) Make a Big Hairy Audacious Goal 2) Hire Lots of Quality People 3) Hug Developers and Users 4) D
3 0.11614029 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
Introduction: It's a truism that we should choose the right tool for the job. Everyone says that. And who can disagree? The problem is this is not helpful advice without being able to answer more specific questions like: What jobs are the tools good at? Will they work on jobs like mine? Is it worth the risk to try something new when all my people know something else and we have a deadline to meet? How can I make all the tools work together? In the NoSQL space this kind of real-world data is still a bit vague. When asked, vendors tend to give very general answers like NoSQL is good for BigData or key-value access. What does that mean for the developer in the trenches, faced with the task of solving a specific problem when there are a dozen confusing choices and no obvious winner? Not a lot. It's often hard to take that next step and imagine how their specific problems could be solved in a way that's worth taking the trouble and risk. Let's change that. What problems are you using NoSQL to sol
4 0.1136869 1395 high scalability-2013-01-28-DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing
Introduction: This is an interview with Gabriel Weinberg, founder of Duck Duck Go and general all-around startup guru, on what DDG’s architecture looks like in 2012. Innovative search engine upstart DuckDuckGo had 30 million searches in February 2012 and averages over 1 million searches a day. It’s being positioned by super investor Fred Wilson as a clean, private, impartial and fast search engine. After talking with Gabriel I like what Fred Wilson said earlier; it seems closer to the heart of the matter: We invested in DuckDuckGo for the Reddit, Hacker News anarchists. Choosing DuckDuckGo can be thought of as not just a technical choice, but a vote for revolution. In an age when knowing your essence is not about love or friendship, but about more effectively selling you to advertisers, DDG is positioning themselves as the do-not-track alternative, keepers of the privacy flame. You will still be monetized of course, but in a more civilized and an
5 0.10933993 930 high scalability-2010-10-28-NoSQL Took Away the Relational Model and Gave Nothing Back
Introduction: Update: Benjamin Black said he was the source of the quote and also said I was wrong about what he meant. His real point: The meaning of the statement was that NoSQL systems (really the various map-reduce systems) are lacking a standard model for describing and querying, and that developing one should be a high-priority task for them. At A NoSQL Evening in Palo Alto, an audience member, sorry, I couldn't tell who, said something I found really interesting: NoSQL took away the relational model and gave nothing back. The idea being that NoSQL has focused on ease of use, scalability, performance, etc., but it has lost the idea of how data relates to other data. True to its name, the relational model is very good at capturing and managing relationships. With NoSQL all relationships have been pushed back onto the poor programmer to implement in code rather than the database managing it. We've sacrificed usability. NoSQL is about concurrency, latency, and scalability, but it
6 0.1091129 332 high scalability-2008-05-28-Job queue and search engine
7 0.10660512 398 high scalability-2008-09-30-Scalability Worst Practices
8 0.10369477 269 high scalability-2008-03-08-Audiogalaxy.com Architecture
9 0.10250484 639 high scalability-2009-06-27-Scaling Twitter: Making Twitter 10000 Percent Faster
10 0.096434705 780 high scalability-2010-02-19-Twitter’s Plan to Analyze 100 Billion Tweets
11 0.093974203 840 high scalability-2010-06-10-The Four Meta Secrets of Scaling at Facebook
12 0.0868086 746 high scalability-2009-11-26-Kngine Snippet Search New Indexing Technology
13 0.084789276 855 high scalability-2010-07-11-So, Why is Twitter Really Not Using Cassandra to Store Tweets?
15 0.084273838 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud
16 0.083316267 96 high scalability-2007-09-18-Amazon Architecture
17 0.081848852 1017 high scalability-2011-04-06-Netflix: Run Consistency Checkers All the time to Fixup Transactions
18 0.081458345 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
19 0.081073746 931 high scalability-2010-10-28-Notes from A NOSQL Evening in Palo Alto
20 0.080572411 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
topicId topicWeight
[(0, 0.157), (1, 0.093), (2, -0.016), (3, 0.047), (4, 0.039), (5, -0.008), (6, -0.06), (7, 0.046), (8, 0.008), (9, -0.031), (10, -0.003), (11, 0.058), (12, -0.05), (13, 0.005), (14, 0.022), (15, 0.001), (16, 0.044), (17, -0.031), (18, 0.011), (19, 0.004), (20, 0.044), (21, -0.053), (22, 0.048), (23, 0.034), (24, -0.044), (25, -0.028), (26, -0.002), (27, -0.016), (28, -0.009), (29, 0.08), (30, -0.027), (31, 0.023), (32, -0.057), (33, 0.024), (34, -0.012), (35, 0.033), (36, 0.007), (37, 0.031), (38, -0.074), (39, -0.025), (40, 0.086), (41, 0.015), (42, -0.044), (43, 0.032), (44, 0.013), (45, 0.006), (46, 0.001), (47, 0.017), (48, 0.029), (49, -0.059)]
simIndex simValue blogId blogTitle
same-blog 1 0.9868558 689 high scalability-2009-08-28-Strategy: Solve Only 80 Percent of the Problem
2 0.7507894 1601 high scalability-2014-02-25-Peter Norvig's 9 Master Steps to Improving a Program
Introduction: Inspired by an xkcd comic, Peter Norvig, Director of Research at Google and all-around interesting and nice guy, has created an above-par code kata involving a regex program that demonstrates the core inner loop of many successful systems profiled on HighScalability. The original code is at xkcd 1313: Regex Golf, which comes up with an algorithm to find a short regex that matches the winners and not the losers from two arbitrary lists. The Python code is readable, the process is TDDish, and the problem, which sounds simple, soon explodes into regex weirdness, as does most regex code. If you find regular expressions confusing you'll definitely benefit from Peter's deliberate strategy for finding a regex. The post demonstrating the iterated improvement of the program is at xkcd 1313: Regex Golf (Part 2: Infinite Problems). As with most first solutions it wasn't optimal. To improve the program Peter recommends the following steps: Profiling: Figure out wher
3 0.71970952 1395 high scalability-2013-01-28-DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing
4 0.6919713 1239 high scalability-2012-05-04-Stuff The Internet Says On Scalability For May 4, 2012
Introduction: It's HighScalability Time: Quotable quotes: Richard Feynman: Suppose that little things behave very differently than anything big. @orgnet: "Data, data everywhere, but not a thought to think" -- John Allen Paulos, Mathematician. @bcarlso: just throw out the word "scalability". That'll bring em out. @codypo: Here are the steps to the Scalability Shuffle. 1: log everything. 2: analyze logs. 3: profile. 4: refactor. 5: repeat. @FoggSeastack: If math had been taught in a relevant way I might have been a #BigData person today. @secboffin: I know a programming joke about 10,000 mutexes, but it's a bit contentious. Twitter gets personal with Improved personalization algorithms and real-time indexing, a tale of a real-time tool chain. Earlybird is Twitter's real-time search system. Every Tweet has its URLs extracted and expanded. URL contents are fetched via SpiderDuck. Cassovary, a graph processing library, is used t
5 0.67541277 1148 high scalability-2011-11-29-DataSift Architecture: Realtime Datamining at 120,000 Tweets Per Second
Introduction: I remember the excitement of when Twitter first opened up their firehose. As an early adopter of the Twitter API I could easily imagine some of the cool things you could do with all that data. I also remember the disappointment of learning that in the land of BigData, data has a price, and that price would be too high for little fish like me. It was like learning for the first time there would be no BigData Santa Claus. For a while though I had the pleasure of pondering just how I would handle all that data. It's a fascinating problem. You have to be able to reliably consume it, normalize it, merge it with other data, apply functions on it, store it, query it, distribute it, and oh yah, monetize it. Most of that in realish-time. And if you are trying to create a platform for allowing the entire Internet to do the same thing to the firehose, the challenge is exponentially harder. DataSift is in the exciting position of creating just such a firehose eating, data chomping machine. Y
6 0.67082685 1141 high scalability-2011-11-11-Stuff The Internet Says On Scalability For November 11, 2011
7 0.66722143 1491 high scalability-2013-07-15-Ask HS: What's Wrong with Twitter, Why Isn't One Machine Enough?
8 0.66472423 1447 high scalability-2013-04-26-Stuff The Internet Says On Scalability For April 26, 2013
9 0.65394682 780 high scalability-2010-02-19-Twitter’s Plan to Analyze 100 Billion Tweets
10 0.65343988 1403 high scalability-2013-02-08-Stuff The Internet Says On Scalability For February 8, 2013
11 0.65270764 1431 high scalability-2013-03-29-Stuff The Internet Says On Scalability For March 29, 2013
12 0.65269709 1458 high scalability-2013-05-15-Lesson from Airbnb: Give Yourself Permission to Experiment with Non-scalable Changes
13 0.65233999 1389 high scalability-2013-01-18-Stuff The Internet Says On Scalability For January 18, 2013
14 0.64941806 166 high scalability-2007-11-27-Solving the Client Side API Scalability Problem with a Little Game Theory
15 0.6491971 855 high scalability-2010-07-11-So, Why is Twitter Really Not Using Cassandra to Store Tweets?
16 0.64907557 1235 high scalability-2012-04-27-Stuff The Internet Says On Scalability For April 27, 2012
17 0.64716762 347 high scalability-2008-07-07-Five Ways to Stop Framework Fixation from Crashing Your Scaling Strategy
18 0.64649868 330 high scalability-2008-05-27-Should Twitter be an All-You-Can-Eat Buffet or a Vending Machine?
19 0.64323139 741 high scalability-2009-11-16-Building Scalable Systems Using Data as a Composite Material
20 0.64317179 1253 high scalability-2012-05-28-The Anatomy of Search Technology: Crawling using Combinators
topicId topicWeight
[(1, 0.073), (2, 0.206), (10, 0.44), (40, 0.014), (61, 0.114), (85, 0.016), (94, 0.054)]
simIndex simValue blogId blogTitle
1 0.9763093 178 high scalability-2007-12-10-1 Master, N Slaves
Introduction: Hello all, Reading the site you can note that the "1 Master for writes, N Slaves for reads" scheme is used often. How is this implemented? Who decides where writes and reads go? Something in the application level or specific database proxies, like Slony-I? Thanks.
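One common answer is that the split lives in the application layer, as in the hedged sketch below (the connection objects are hypothetical placeholders; real deployments often delegate this to a replication-aware driver or a proxy such as ProxySQL instead):

import random

class ReadWriteRouter:
    """Application-level read/write splitting for a 1-master, N-slaves setup.

    Writes always go to the single master; reads are load-balanced
    across the slave replicas.
    """

    def __init__(self, master, slaves):
        self.master = master        # the only writable connection
        self.slaves = list(slaves)  # read-only replica connections

    def connection_for(self, sql):
        # Naive dispatch on the leading verb. A production router must
        # also pin reads-after-writes to the master, because replicas lag.
        if sql.lstrip().lower().startswith("select"):
            return random.choice(self.slaves)
        return self.master

# Usage with placeholder connections:
router = ReadWriteRouter(master="db-master", slaves=["db-slave-1", "db-slave-2"])
print(router.connection_for("SELECT * FROM users"))         # routed to a slave
print(router.connection_for("UPDATE users SET name = 'x'")) # routed to the master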
2 0.97282165 874 high scalability-2010-08-07-ArchCamp: Scalable Databases (NoSQL)
Introduction: ArchCamp: Scalable Databases (NoSQL) The ArchCamp unconference was held this past Friday at HackerDojo in Mountain View, CA. There was plenty of pizza, beer, and great conversation. This session started out free-form, but shaped up pretty quickly into a discussion of the popular open source scalable NoSQL databases and the architectural categories in which they belong.
Introduction: If you stayed up all night watching the life-reaffirming Curiosity landing on Mars, then this paper, High-Performance Concurrency Control Mechanisms for Main-Memory Databases, has nothing to do with that at all, but it is an excellent look at how to use optimistic MVCC schemes to reduce lock overhead on in-memory data structures: A database system optimized for in-memory storage can support much higher transaction rates than current systems. However, standard concurrency control methods used today do not scale to the high transaction rates achievable by such systems. In this paper we introduce two efficient concurrency control methods specifically designed for main-memory databases. Both use multiversioning to isolate read-only transactions from updates but differ in how atomicity is ensured: one is optimistic and one is pessimistic. To avoid expensive context switching, transactions never block during normal processing but they may have to wait before commit to ensure corr
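As a toy illustration of the optimistic flavor described in that abstract (a sketch of the general idea only, not the paper's algorithm): readers never block, and a writer validates at commit time that the version it read is still current, retrying on conflict.

import threading

class VersionedCell:
    """Toy optimistic concurrency control over a single value."""

    def __init__(self, value):
        self._lock = threading.Lock()  # taken only at commit, never on read
        self._state = (value, 0)       # (value, version), swapped atomically

    def read(self):
        # Lock-free snapshot: reading one attribute is atomic in CPython.
        return self._state

    def try_commit(self, new_value, read_version):
        with self._lock:
            _, version = self._state
            if version != read_version:
                return False  # conflict: another writer committed first
            self._state = (new_value, version + 1)
            return True

# Usage: retry on conflict instead of holding locks while working.
cell = VersionedCell(100)
while True:
    value, version = cell.read()
    if cell.try_commit(value + 1, version):
        break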
4 0.9476456 1066 high scalability-2011-06-22-It's the Fraking IOPS - 1 SSD is 44,000 IOPS, Hard Drive is 180
Introduction: Planning your next buildout and thinking SSDs are still far in the future? Still too expensive, too low density. Hard disks are cheap, familiar, and store lots of stuff. In this short and entertaining video Wikia's Artur Bergman wants to change your mind about SSDs. SSDs are for today, get with the math already. Here's Artur's logic: Wikia is all SSD in production. The new Wikia file servers have a theoretical read rate of ~10GB/sec sequential, 6GB/sec random and 1.2 million IOPs. If you can't do math or love the past, you love spinning rust. If you are awesome you love SSDs. SSDs are cheaper than drives using the most relevant metric: $/GB/IOPS. 1 SSD is 44,000 IOPS and one hard drive is 180 IOPS. Need 1 SSD instead of 50 hard drives. With 8 million files there's a 9 minute fsck. Full backup in 12 minutes (X-25M based). 4 GB/sec random read average latency 1 msec. 2.2 GB/sec random write average latency 1 msec. 50TBs of SSDs in one machine for $80,000. With the densi
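Artur's arithmetic is easy to check. A back-of-the-envelope version (the device prices below are assumptions for illustration, not figures from the talk):

SSD_IOPS, HDD_IOPS = 44_000, 180
print(f"HDDs needed to match one SSD on IOPS: {SSD_IOPS / HDD_IOPS:.0f}")  # ~244

# $/IOPS with assumed prices: even at several times the per-device cost,
# the SSD wins by more than an order of magnitude on this metric.
ssd_price, hdd_price = 800.0, 100.0
print(f"SSD $/IOPS: {ssd_price / SSD_IOPS:.4f}")  # ~$0.018
print(f"HDD $/IOPS: {hdd_price / HDD_IOPS:.2f}")  # ~$0.56

One SSD delivers the IOPS of roughly 244 hard drives, so replacing a 50-drive array with a single SSD leaves plenty of headroom.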
Introduction: This is a guest post by Ali Khajeh-Hosseini, Technical Lead at PlanForCloud. The original article was published on their site. With 29 cloud price reductions I thought it would be interesting to see how the bottom line would change compared to an article we published last year. The result is surprisingly little for TripAdvisor because prices for On Demand instances have not dropped as fast as for other instance types. Over the last year and a half, we counted 29 price reductions in cloud services provided by AWS, Google Compute Engine, Windows Azure, and Rackspace Cloud. Price reductions have a direct effect on cloud users, but given the usual tiny reductions, how significant is that effect on the bottom line? Last year I wrote about cloud cost forecasts for TripAdvisor and Pinterest. TripAdvisor was experimenting with AWS and attempted to process 700K HTTP requests per minute on a replica of its live site, and Pinterest was growing massively on AWS. In th
6 0.9205665 430 high scalability-2008-10-26-Should you use a SAN to scale your architecture?
7 0.91165489 1635 high scalability-2014-04-21-This is why Microsoft won. And why they lost.
same-blog 8 0.90935272 689 high scalability-2009-08-28-Strategy: Solve Only 80 Percent of the Problem
9 0.89195895 584 high scalability-2009-04-27-Some Questions from a newbie
10 0.89169252 171 high scalability-2007-12-02-a8cjdbc - update verision 1.3
11 0.89167261 170 high scalability-2007-12-02-Database-Clustering: a8cjdbc - update: version 1.3
12 0.8828904 792 high scalability-2010-03-10-How FarmVille Scales - The Follow-up
13 0.88088608 1046 high scalability-2011-05-23-Evernote Architecture - 9 Million Users and 150 Million Requests a Day
14 0.87730157 1631 high scalability-2014-04-14-How do you even do anything without using EBS?
15 0.86878604 767 high scalability-2010-01-27-Hot Scalability Links for January 28 2010
16 0.85389841 1585 high scalability-2014-01-24-Stuff The Internet Says On Scalability For January 24th, 2014
17 0.81094742 142 high scalability-2007-11-05-Strategy: Diagonal Scaling - Don't Forget to Scale Out AND Up
19 0.79621464 1331 high scalability-2012-10-02-An Epic TripAdvisor Update: Why Not Run on the Cloud? The Grand Experiment.
20 0.78811908 1004 high scalability-2011-03-14-Twitter by the Numbers - 460,000 New Accounts and 140 Million Tweets Per Day