high_scalability high_scalability-2008 high_scalability-2008-374 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I found the discussion of the available bandwidth of tree vs. higher-dimensional virtual network topologies quite, to quote Spock, fascinating: A mathematical analysis by Ritter (2002) (one of the original developers of Napster) presented a detailed numerical argument demonstrating that the Gnutella network could not scale to the capacity of its competitor, the Napster network. Essentially, that model showed that the Gnutella network is severely bandwidth-limited long before the P2P population reaches a million peers. In each of these previous studies, the conclusions have overlooked the intrinsic bandwidth limits of the underlying topology in the Gnutella network: a Cayley tree (Rains and Sloane 1999) (see Sect. 9.4 for the definition). Trees are known to have lower aggregate bandwidth than higher-dimensional topologies, e.g., hypercubes and hypertori. Studies of interconnection topologies in the literature have tended to focus on hardware implementations (see, e.g., Culler et
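The bandwidth gap the excerpt describes can be made concrete with a back-of-the-envelope comparison. The following is an illustrative sketch, not taken from the paper: it uses two standard graph-theoretic proxies for aggregate bandwidth, total link count and bisection width, and the function names are my own.

```python
# Compare a tree against a d-dimensional hypercube on the same number
# of nodes. Bisection width (the minimum number of links you must cut
# to split the network in half) is a common proxy for the aggregate
# bandwidth available between the two halves.

def tree_stats(n):
    # Any tree on n nodes has exactly n-1 links, and cutting a single
    # well-chosen link splits it roughly in half: bisection width is 1.
    return {"nodes": n, "links": n - 1, "bisection": 1}

def hypercube_stats(d):
    # A d-dimensional hypercube has 2^d nodes, each of degree d, giving
    # d * 2^(d-1) links; its bisection width is 2^(d-1).
    n = 2 ** d
    return {"nodes": n, "links": d * n // 2, "bisection": n // 2}

if __name__ == "__main__":
    for d in (10, 20):  # roughly 1 thousand and 1 million peers
        n = 2 ** d
        t, h = tree_stats(n), hypercube_stats(d)
        print(f"n={n}: tree bisection={t['bisection']}, "
              f"hypercube bisection={h['bisection']}")
```

At a million peers the tree's bisection width is still 1 link while the hypercube's grows to half the node count, which is the intuition behind the claim that tree topologies like Gnutella's Cayley tree are bandwidth-limited while virtual hypercubes and hypertori scale near-linearly (subject, as the excerpt notes, to how many TCP/IP connections each peer can keep open).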