high_scalability high_scalability-2013 high_scalability-2013-1462 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: What data structure is more sacred than the link list? If we get rid of it what silly interview questions would we use instead? But not using linked-lists is exactly what Aater Suleman recommends in Should you ever use Linked-Lists? In The Secret To 10 Million Concurrent Connections one of the important strategies is not scribbling data all over memory via pointers because following pointers increases cache misses which reduces performance . And there’s nothing more iconic of pointers than the link list. Here are Aeter's reasons to be anti-linked-list: They reduce the benefit of out-of-order execution. They throw off hardware prefetching. They reduce DRAM and TLB locality. They cannot leverage SIMD. They are harder to send to GPUs. He also demolishes the pros of linked-lists, finding arrays a better option in almost every case. Good discussion in the comment section as not everyone agrees. Patrick Wyatt details how a linked-list threading bug repeated
sentIndex sentText sentNum sentScore
1 What data structure is more sacred than the link list? [sent-1, score-0.448]
2 If we get rid of it what silly interview questions would we use instead? [sent-2, score-0.082]
3 But not using linked-lists is exactly what Aater Suleman recommends in Should you ever use Linked-Lists? [sent-3, score-0.194]
4 In The Secret To 10 Million Concurrent Connections one of the important strategies is not scribbling data all over memory via pointers because following pointers increases cache misses which reduces performance . [sent-4, score-1.118]
5 And there’s nothing more iconic of pointers than the link list. [sent-5, score-0.738]
6 Here are Aeter's reasons to be anti-linked-list: They reduce the benefit of out-of-order execution. [sent-6, score-0.076]
7 He also demolishes the pros of linked-lists, finding arrays a better option in almost every case. [sent-11, score-0.359]
8 Good discussion in the comment section as not everyone agrees. [sent-12, score-0.25]
9 Patrick Wyatt details how a linked-list threading bug repeatedly crashed Starcraft. [sent-13, score-0.174]
10 Also a good comment discussion with a lot of disagreement. [sent-14, score-0.25]
11 The idea is mathematical complexity does not reflect real-life performance in situ on actual hardware: Big-O notations tells nothing about how one (data structure with algorithm) fare against another. [sent-16, score-0.576]
12 Big-O will only tell you how performance will degrade as n increases. [sent-17, score-0.096]
13 So comparing one a data structure that is RAM intensive to another data structure that is cache friendly from an abstract Big-O point-of-view is just pointless. [sent-18, score-0.435]
14 When memory is local and you can avoid cache misses and avoid paging then operations like copying are dirt cheap, when in the mind of the programmer they are expensive. [sent-20, score-0.806]
15 When you don't have locality of reference the thing that programmers think is dirt cheap, following pointers, is extremely expensive. [sent-21, score-0.35]
16 While the first part of the article is frustratingly vague, the last half is full of excellent explanatory detail. [sent-22, score-0.218]
17 And there are even more good comments in the comment section. [sent-24, score-0.166]
18 Probably as much as people used to love goto statements. [sent-26, score-0.124]
19 So if performance really matters to you then this is something to think about rather than immediately dismiss as heresy. [sent-27, score-0.124]
20 Related Articles Discuss on Hacker News and on Reddit On C Linked Lists (Profiling and Optimizing) High performance alternative to bounded queues for exchanging data between concurrent threads - linked lists make poor queues also. [sent-28, score-0.725]
wordName wordTfidf (topN-words)
[('pointers', 0.377), ('lists', 0.237), ('misses', 0.193), ('dirt', 0.189), ('structure', 0.175), ('comment', 0.166), ('link', 0.156), ('arrays', 0.153), ('linked', 0.148), ('goto', 0.124), ('dismiss', 0.124), ('situ', 0.124), ('fare', 0.124), ('iconic', 0.124), ('aater', 0.124), ('suleman', 0.124), ('demolishes', 0.124), ('redditon', 0.124), ('tlb', 0.117), ('sacred', 0.117), ('ever', 0.111), ('explanatory', 0.111), ('wyatt', 0.111), ('queues', 0.11), ('frustratingly', 0.107), ('crashed', 0.101), ('crunching', 0.101), ('degrade', 0.096), ('concurrent', 0.096), ('cheap', 0.094), ('exchanging', 0.093), ('paging', 0.088), ('avoid', 0.086), ('following', 0.086), ('cache', 0.085), ('discussion', 0.084), ('recommends', 0.083), ('pros', 0.082), ('inthe', 0.082), ('silly', 0.082), ('nothing', 0.081), ('copying', 0.079), ('bounded', 0.079), ('buffers', 0.077), ('ring', 0.077), ('dram', 0.076), ('reduce', 0.076), ('locality', 0.075), ('threading', 0.073), ('reflect', 0.072)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 1462 high scalability-2013-05-22-Strategy: Stop Using Linked-Lists
Introduction: What data structure is more sacred than the link list? If we get rid of it what silly interview questions would we use instead? But not using linked-lists is exactly what Aater Suleman recommends in Should you ever use Linked-Lists? In The Secret To 10 Million Concurrent Connections one of the important strategies is not scribbling data all over memory via pointers because following pointers increases cache misses which reduces performance . And there’s nothing more iconic of pointers than the link list. Here are Aeter's reasons to be anti-linked-list: They reduce the benefit of out-of-order execution. They throw off hardware prefetching. They reduce DRAM and TLB locality. They cannot leverage SIMD. They are harder to send to GPUs. He also demolishes the pros of linked-lists, finding arrays a better option in almost every case. Good discussion in the comment section as not everyone agrees. Patrick Wyatt details how a linked-list threading bug repeated
2 0.10091242 1151 high scalability-2011-12-05-Stuff The Internet Says On Scalability For December 5, 2011
Introduction: It's HighScalability Time! Quotable quotes: @jaykreps : Was wondering, How can I turn my boring, cachable, read-only traffic into random writes on mongodb? And lo! link @marshallk : Google runs 100-200 experiments every day on UI, algorithm & product @styggiti : The problem with companies like IBM and Oracle baking NoSQL "scalability" into their products isn't the tech, it's the $$ licensing. Blazing fast node.js: 10 performance tips from LinkedIn Mobile . You may have thought that node.js made just everything magically fast, but Shravya Garlapati has some great strategies for going even faster: Avoid synchronous code; Turn off socket pooling; Don't use Node.js for static assets; Render on the client-side; Use gzip; Go parallel; Go session-free; Use binary modules; Use standard V8 JavaScript instead of client-side libraries; Keep your code small and light. Nice thread in NoSQL Databases on HBase and Consistency in CAP . The short summary of the a
3 0.099282183 1456 high scalability-2013-05-13-The Secret to 10 Million Concurrent Connections -The Kernel is the Problem, Not the Solution
Introduction: Now that we have the C10K concurrent connection problem licked, how do we level up and support 10 million concurrent connections? Impossible you say. Nope, systems right now are delivering 10 million concurrent connections using techniques that are as radical as they may be unfamiliar. To learn how it’s done we turn to Robert Graham , CEO of Errata Security, and his absolutely fantastic talk at Shmoocon 2013 called C10M Defending The Internet At Scale . Robert has a brilliant way of framing the problem that I’ve never heard of before. He starts with a little bit of history, relating how Unix wasn’t originally designed to be a general server OS, it was designed to be a control system for a telephone network. It was the telephone network that actually transported the data so there was a clean separation between the control plane and the data plane. The problem is we now use Unix servers as part of the data plane , which we shouldn’t do at all. If we were des
4 0.099124312 545 high scalability-2009-03-19-Product: Redis - Not Just Another Key-Value Store
Introduction: With the introduction of Redis your options in the key-value space just grew and your choice of which to pick just got a lot harder. But when you think about it, that's not a bad position to be in at all. Redis (REmote DIctionary Server) - a key-value database. It's similar to memcached but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists and sets with atomic operations to push/pop elements. The key points are: open source; speed (benchmarked performing 110,000 SET operations, and 81,000 GETs, per second); persistence, but in an asynchronous way taking everything in memory; support for higher level data structures and atomic operations. The home page is well organized so I'll spare the excessive-copying-to-make-this-post-longer. For a good overview of Redis take a look at Antonio Cangiano's article: Introducing Redis: a fast key-value database . If you are looking at a way to understand how Redis is different than something like
5 0.094597772 514 high scalability-2009-02-18-Numbers Everyone Should Know
Introduction: Google AppEngine Numbers This group of numbers is from Brett Slatkin in Building Scalable Web Apps with Google App Engine . Writes are expensive! Datastore is transactional: writes require disk access Disk access means disk seeks Rule of thumb: 10ms for a disk seek Simple math: 1s / 10ms = 100 seeks/sec maximum Depends on: * The size and shape of your data * Doing work in batches (batch puts and gets) Reads are cheap! Reads do not need to be transactional, just consistent Data is read from disk once, then it's easily cached All subsequent reads come straight from memory Rule of thumb: 250usec for 1MB of data from memory Simple math: 1s / 250usec = 4GB/sec maximum * For a 1MB entity, that's 4000 fetches/sec Numbers Miscellaneous This group of numbers is from a presentation Jeff Dean gave at a Engineering All-Hands Meeting at Google. L1 cache reference 0.5 ns Branch mispredict 5 ns L2 cache reference 7 ns Mutex lock/unlock 100 ns Main me
6 0.082822971 661 high scalability-2009-07-25-Latency is Everywhere and it Costs You Sales - How to Crush it
7 0.08269456 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
8 0.080849655 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
9 0.07989525 435 high scalability-2008-10-30-The case for functional decomposition
10 0.078062095 1292 high scalability-2012-07-27-Stuff The Internet Says On Scalability For July 27, 2012
11 0.077437162 1235 high scalability-2012-04-27-Stuff The Internet Says On Scalability For April 27, 2012
12 0.077335604 360 high scalability-2008-08-04-A Bunch of Great Strategies for Using Memcached and MySQL Better Together
13 0.07695125 1305 high scalability-2012-08-16-Paper: A Provably Correct Scalable Concurrent Skip List
14 0.074441448 152 high scalability-2007-11-13-Flickr Architecture
15 0.07434275 1559 high scalability-2013-12-06-Stuff The Internet Says On Scalability For December 6th, 2013
16 0.073760517 155 high scalability-2007-11-15-Video: Dryad: A general-purpose distributed execution platform
17 0.073308781 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
18 0.073095322 1304 high scalability-2012-08-14-MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) and Familiar (SQL)
19 0.073074467 780 high scalability-2010-02-19-Twitter’s Plan to Analyze 100 Billion Tweets
20 0.072530285 971 high scalability-2011-01-10-Riak's Bitcask - A Log-Structured Hash Table for Fast Key-Value Data
topicId topicWeight
[(0, 0.129), (1, 0.099), (2, -0.014), (3, -0.002), (4, -0.001), (5, 0.041), (6, 0.004), (7, 0.06), (8, -0.023), (9, -0.02), (10, -0.015), (11, -0.042), (12, 0.003), (13, 0.038), (14, -0.051), (15, -0.041), (16, 0.023), (17, 0.013), (18, -0.012), (19, -0.01), (20, -0.033), (21, -0.038), (22, 0.015), (23, 0.05), (24, -0.032), (25, -0.029), (26, 0.044), (27, 0.005), (28, -0.006), (29, -0.004), (30, -0.024), (31, 0.017), (32, -0.015), (33, 0.012), (34, -0.016), (35, -0.004), (36, 0.044), (37, 0.06), (38, 0.03), (39, 0.026), (40, 0.014), (41, 0.029), (42, -0.043), (43, 0.01), (44, 0.026), (45, 0.02), (46, -0.008), (47, 0.025), (48, 0.009), (49, -0.033)]
simIndex simValue blogId blogTitle
same-blog 1 0.95131278 1462 high scalability-2013-05-22-Strategy: Stop Using Linked-Lists
Introduction: What data structure is more sacred than the link list? If we get rid of it what silly interview questions would we use instead? But not using linked-lists is exactly what Aater Suleman recommends in Should you ever use Linked-Lists? In The Secret To 10 Million Concurrent Connections one of the important strategies is not scribbling data all over memory via pointers because following pointers increases cache misses which reduces performance . And there’s nothing more iconic of pointers than the link list. Here are Aeter's reasons to be anti-linked-list: They reduce the benefit of out-of-order execution. They throw off hardware prefetching. They reduce DRAM and TLB locality. They cannot leverage SIMD. They are harder to send to GPUs. He also demolishes the pros of linked-lists, finding arrays a better option in almost every case. Good discussion in the comment section as not everyone agrees. Patrick Wyatt details how a linked-list threading bug repeated
2 0.78583592 462 high scalability-2008-12-06-Paper: Real-world Concurrency
Introduction: An excellent article by Bryan Cantrill and Jeff Bonwick on how to write multi-threaded code. With more processors and no magic bullet solution for how to use them, knowing how to write multiprocessor code that doesn't screw up your system is still a valuable skill. Some topics: Know your cold paths from your hot paths. Intuition is frequently wrong—be data intensive. Know when—and when not—to break up a lock. Be wary of readers/writer locks. Consider per-CPU locking. Know when to broadcast—and when to signal. Learn to debug postmortem. Design your systems to be composable. Don't use a semaphore where a mutex would suffice. Consider memory retiring to implement per-chain hash-table locks. Be aware of false sharing. Consider using nonblocking synchronization routines to monitor contention. When reacquiring locks, consider using generation counts to detect state change. Use wait- and lock-free structures only if you absolutely must. Prepare for the th
3 0.74518168 1246 high scalability-2012-05-16-Big List of 20 Common Bottlenecks
Introduction: In Zen And The Art Of Scaling - A Koan And Epigram Approach , Russell Sullivan offered an interesting conjecture: there are 20 classic bottlenecks. This sounds suspiciously like the idea that there only 20 basic story plots . And depending on how you chunkify things, it may be true, but in practice we all know bottlenecks come in infinite flavors, all tasting of sour and ash. One day Aurelien Broszniowski from Terracotta emailed me his list of bottlenecks, we cc’ed Russell in on the conversation, he gave me his list, I have a list, and here’s the resulting stone soup. Russell said this is his “I wish I knew when I was younger" list and I think that’s an enriching way to look at it. The more experience you have, the more different types of projects you tackle, the more lessons you’ll be able add to a list like this. So when you read this list, and when you make your own, you are stepping through years of accumulated experience and more than a little frustration, but in ea
4 0.74155599 1407 high scalability-2013-02-15-Stuff The Internet Says On Scalability For February 15, 2013
Introduction: Hey, it's HighScalability time: The Herokulypse. A cautionary tale of what can happen when scalability is left for later. Rap Genius created quite a stir ( reddit , Hacker News ) when they documented high costs ($20K/month for 15 million monthly uniques) and poor performance (6 second average response times) using Heroku's random routing mesh . The cause was tracked to queuing at the dyno level when the expectation was requests are routed to free dynos. Heroku admits this is a problem . So poor load balancing combined with RoR single threading = poor performance, one that adding more dynos and spending more money won't necessarily help. While it seems clear Heroku didn't make this aspect of their system crystal clear, the incident has generated a lot of teaching moments, if you slog through it all. This is a developing story. You need money to feed the beast. Fred Wilson has some revenue ideas for you : Paid App Downloads - ex. WhatsApp; In-app purchases - ex. Zynga Poke
5 0.73673701 1621 high scalability-2014-03-28-Stuff The Internet Says On Scalability For March 28th, 2014
Introduction: Hey, it's HighScalability time: Looks like a multiverse , if you can keep it. Quotable Quotes: @abt_programming : "I am a Unix Creationist. I believe the world was created on January 1, 1970 and as prophesized, will end on January 19, 2038" - @teropa @demisbellot : Cloud prices are hitting attractive price points, one more 40-50% drop and there'd be little reason to go it alone. @scott4arrows : Dentist "Do you floss regularly?" Me "Do you back up your computer data regularly?" @avestal : "I Kickstarted the Oculus Rift, what do I get?" You get a lesson in how capitalism works. @mtabini : “$20 charger that cost $2 to make.” Not pictured here: the $14 you pay for the 10,000 charger iterations that never made it to production. @strlen : "I built the original assembler in JS, because it's what I prefer to use when I need to get down to bare metal." - Adm. Grace Hopper tedchs : I'd like to propose a new rule for Hacker News: only if you have b
6 0.73628283 1387 high scalability-2013-01-15-More Numbers Every Awesome Programmer Must Know
7 0.72896361 1318 high scalability-2012-09-07-Stuff The Internet Says On Scalability For September 7, 2012
8 0.72349375 1151 high scalability-2011-12-05-Stuff The Internet Says On Scalability For December 5, 2011
9 0.72168821 1455 high scalability-2013-05-10-Stuff The Internet Says On Scalability For May 10, 2013
10 0.7129643 1475 high scalability-2013-06-13-Busting 4 Modern Hardware Myths - Are Memory, HDDs, and SSDs Really Random Access?
11 0.7123847 174 high scalability-2007-12-05-Product: Tugela Cache
12 0.71190834 1509 high scalability-2013-08-30-Stuff The Internet Says On Scalability For August 30, 2013
13 0.69489872 1652 high scalability-2014-05-21-9 Principles of High Performance Programs
14 0.68864471 1572 high scalability-2014-01-03-Stuff The Internet Says On Scalability For January 3rd, 2014
15 0.68181103 1633 high scalability-2014-04-16-Six Lessons Learned the Hard Way About Scaling a Million User System
16 0.67984259 1451 high scalability-2013-05-03-Stuff The Internet Says On Scalability For May 3, 2013
17 0.6784339 1141 high scalability-2011-11-11-Stuff The Internet Says On Scalability For November 11, 2011
18 0.67574131 1302 high scalability-2012-08-10-Stuff The Internet Says On Scalability For August 10, 2012
19 0.67349023 978 high scalability-2011-01-26-Google Pro Tip: Use Back-of-the-envelope-calculations to Choose the Best Design
20 0.67266899 594 high scalability-2009-05-08-Eight Best Practices for Building Scalable Systems
topicId topicWeight
[(1, 0.087), (2, 0.186), (9, 0.326), (10, 0.032), (61, 0.08), (79, 0.101), (85, 0.013), (94, 0.086)]
simIndex simValue blogId blogTitle
1 0.8882122 604 high scalability-2009-05-20-Paper: Flux: An Adaptive Partitioning Operator for Continuous Query Systems
Introduction: At the core of the new real-time web, which is really really old, are continuous queries. I like how this paper proposed to handle dynamic demand and dynamic resource availability by making the underlying system adaptable, which seems like a very cloudy kind of thing to do. Abstract: The long-running nature of continuous queries poses new scalability challenges for dataflow processing. CQ systems execute pipelined dataflows that may be shared across multiple queries. The scalability of these dataflows is limited by their constituent, stateful operators – e.g. windowed joins or grouping operators. To scale such operators, a natural solution is to partition them across a shared-nothing platform. But in the CQ context, traditional, static techniques for partitioned parallelism can exhibit detrimental imbalances as workload and runtime conditions evolve. Longrunning CQ dataflows must continue to function robustly in the face of these imbalances. To address this challenge, we introduce
same-blog 2 0.8297562 1462 high scalability-2013-05-22-Strategy: Stop Using Linked-Lists
Introduction: What data structure is more sacred than the link list? If we get rid of it what silly interview questions would we use instead? But not using linked-lists is exactly what Aater Suleman recommends in Should you ever use Linked-Lists? In The Secret To 10 Million Concurrent Connections one of the important strategies is not scribbling data all over memory via pointers because following pointers increases cache misses which reduces performance . And there’s nothing more iconic of pointers than the link list. Here are Aeter's reasons to be anti-linked-list: They reduce the benefit of out-of-order execution. They throw off hardware prefetching. They reduce DRAM and TLB locality. They cannot leverage SIMD. They are harder to send to GPUs. He also demolishes the pros of linked-lists, finding arrays a better option in almost every case. Good discussion in the comment section as not everyone agrees. Patrick Wyatt details how a linked-list threading bug repeated
3 0.7099725 520 high scalability-2009-02-25-Advanced BPM program in USA and India discount for Group Membership
Introduction: One day, Advanced BPM Certified program led by Global Leader, Steve Towers. Latest Case Studies and innovations - hands-on, practical. Event locations USA San Francisco 16 Mar 09 Atlanta 17 Mar 09 New York 19 Mar 09 Chicago 20 Mar 09 www.BESTBPMTRAINING.COM India Mumbai 23 Mar 09 Bangalore 24 Mar 09 Hyderabad 26 Mar 09 Delhi 27 Mar 09 www.BPMTRAININGNOW.COM For more information please visit For registrations, group discounts or further details please contact Caroline.smith@icmgworld.com
4 0.69833273 18 high scalability-2007-07-16-Paper: MySQL Scale-Out by application partitioning
Introduction: MySQL Scale-Out by application partitioning by Oli Sennhauser Eventually every database system hit its limits. Especially on the Internet, where you have millions of users which theoretically access your database simultaneously, eventually your IO system will be a bottleneck. [A] promising but more complex solution with nearly no scale-out limits is application partitioning. If and when you get into the top-1000 rank on alexa [1], you have to think about such solutions. A Quick Hit of What's Inside Horizontal application partitioning, Vertical application partitioning, Disk IO calculations, How to partition an entity
5 0.69608772 519 high scalability-2009-02-23-Database Sharding at Netlog, with MySQL and PHP
Introduction: Jurriaan Persyn is a Lead Web Developer at Netlog, a social portal site that gets 50 million unique visitors and 5+ billion page views per month. In this paper Jurriaan goes into a lot of excellent nuts and bolts details about how they used sharding to scale their system. If you are pondering sharding as a solution to your scaling problems you'll want to read this paper. As the paper is quite well organized there's no reason to write a summary, but I especially liked this part from the conclusion: If you can do with simpler solutions (better hardware, more hardware, server tweaking and tuning, vertical partitioning, sql query optimization, ...) that require less development cost, why invest lots of effort in sharding? On the other hand, when your visitor statistics really start blowing through the roof, it is a good direction to go. After all, it worked for us.
6 0.69580626 1580 high scalability-2014-01-15-Vedis - An Embedded Implementation of Redis Supporting Terabyte Sized Databases
7 0.6897192 577 high scalability-2009-04-22-Gear6 Web cache - the hardware solution for working with Memcache
8 0.68444645 409 high scalability-2008-10-13-Challenges from large scale computing at Google
9 0.67987436 1270 high scalability-2012-06-22-Stuff The Internet Says On Scalability For June 22, 2012
10 0.647246 169 high scalability-2007-12-01-many website, one setup, many databases
11 0.63584518 942 high scalability-2010-11-15-Strategy: Biggest Performance Impact is to Reduce the Number of HTTP Requests
14 0.61237401 863 high scalability-2010-07-22-How can we spark the movement of research out of the Ivory Tower and into production?
15 0.61092484 1112 high scalability-2011-09-07-What Google App Engine Price Changes Say About the Future of Web Architecture
16 0.61090457 119 high scalability-2007-10-10-WAN Accelerate Your Way to Lightening Fast Transfers Between Data Centers
17 0.61083347 1402 high scalability-2013-02-07-Ask HighScalability: Web asset server concept - 3rd party software available?
18 0.61068547 1516 high scalability-2013-09-13-Stuff The Internet Says On Scalability For September 13, 2013
19 0.61045736 517 high scalability-2009-02-21-Google AppEngine - A Second Look
20 0.61019826 1649 high scalability-2014-05-16-Stuff The Internet Says On Scalability For May 16th, 2014