high_scalability high_scalability-2008 high_scalability-2008-371 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Looks interesting... Abstract: Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher-end solutions.
sentIndex sentText sentNum sentScore
1 Abstract: Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. [sent-4, score-1.172]
2 The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. [sent-5, score-1.115]
3 Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. [sent-6, score-1.277]
4 Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance. [sent-7, score-0.499]
5 In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. [sent-8, score-1.7]
6 Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher-end solutions. [sent-9, score-2.038]
7 Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP. [sent-10, score-0.482]
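A useful back-of-envelope companion to sentences 5-6: the paper's approach (the well-known fat-tree design of Al-Fares et al., SIGCOMM 2008 — a detail from the full paper, not this excerpt) wires identical k-port commodity switches into k pods plus a core layer, supporting k^3/4 hosts at full bisection bandwidth. A minimal sketch of that sizing arithmetic:

```python
def fat_tree_capacity(k):
    """Sizing for a k-ary fat-tree of identical k-port switches
    (k even): k pods of k/2 edge + k/2 aggregation switches,
    plus (k/2)^2 core switches, and k^3/4 hosts at full bisection."""
    assert k % 2 == 0, "port count must be even"
    hosts = k ** 3 // 4
    switches = k * k + (k // 2) ** 2   # pod switches + core switches
    return hosts, switches

for k in (24, 48):
    hosts, switches = fat_tree_capacity(k)
    print(f"k={k}: {hosts:,} hosts, {switches:,} switches")
# k=24: 3,456 hosts, 720 switches
# k=48: 27,648 hosts, 2,880 switches
```

With stock 48-port switches this reaches 27,648 hosts — the "tens of thousands of elements" the abstract claims — without the 50%-or-worse oversubscription of conventional tree topologies (sentence 3).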
wordName wordTfidf (topN-words)
[('aggregate', 0.257), ('commodity', 0.228), ('ethernet', 0.216), ('bandwidth', 0.199), ('largely', 0.193), ('tens', 0.187), ('switches', 0.185), ('mpps', 0.181), ('nonuniform', 0.17), ('specialized', 0.165), ('smps', 0.162), ('computers', 0.159), ('ip', 0.154), ('incurring', 0.151), ('progressively', 0.147), ('critically', 0.147), ('complicates', 0.144), ('backward', 0.14), ('interconnected', 0.138), ('clusters', 0.136), ('topologies', 0.131), ('modifications', 0.129), ('appropriately', 0.125), ('consisting', 0.118), ('network', 0.117), ('architected', 0.115), ('thousands', 0.112), ('tremendous', 0.111), ('consists', 0.107), ('argue', 0.105), ('today', 0.105), ('resulting', 0.102), ('elements', 0.101), ('equipment', 0.099), ('compatible', 0.096), ('switching', 0.096), ('unfortunately', 0.096), ('replaced', 0.094), ('contain', 0.092), ('may', 0.09), ('tree', 0.088), ('leverage', 0.085), ('deploying', 0.081), ('among', 0.08), ('tcp', 0.079), ('edge', 0.078), ('typically', 0.078), ('available', 0.077), ('limits', 0.076), ('significant', 0.076)]
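The (word, score) pairs above are tf-idf weights computed by the mining pipeline; its exact normalization isn't shown here, so the following is only a generic sketch of how such scores arise (the toy corpus and weighting scheme are assumptions, not the pipeline's actual code):

```python
import math
from collections import Counter

def tfidf(docs):
    """Standard tf-idf: term frequency * log(N / document frequency).
    docs: list of token lists. Returns one {term: weight} dict per doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] / len(doc) * math.log(n / df[t])
                        for t in tf})
    return weights

# Hypothetical toy corpus; rarer terms score higher.
docs = [["aggregate", "bandwidth", "network"],
        ["network", "tree", "network"],
        ["commodity", "switches", "network"]]
top = sorted(tfidf(docs)[0].items(), key=lambda kv: -kv[1])
print(top)   # 'network' appears in every doc -> idf 0 -> weight 0
```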
simIndex simValue blogId blogTitle
same-blog 1 1.0 371 high scalability-2008-08-24-A Scalable, Commodity Data Center Network Architecture
Introduction: Looks interesting... Abstract: Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher-end solutions.
2 0.12045136 1316 high scalability-2012-09-04-Changing Architectures: New Datacenter Networks Will Set Your Code and Data Free
Introduction: One consequence of IT standardization and commodification has been Google’s "the datacenter is the computer" view of the world. In that view all compute resources (memory, CPU, storage) are fungible. They are interchangeable and location independent; individual computers lose identity and become just a part of a service. Thwarting that nirvana has been the abysmal performance of commodity datacenter networks, which has driven a preference for architectures that colocate state and behaviour on the same box. MapReduce famously ships code over to storage nodes for just this reason. Change the network and you change the fundamental assumption driving colocation-based software architectures. You are then free to store data anywhere and move compute anywhere you wish. The datacenter becomes the computer. On the host side, with an x8 slot running at PCI-Express 3.0 speeds able to push 8 GB/sec (that’s bytes) of bandwidth in both directions (this figure is checked in a sketch after this list), we have
3 0.12040368 374 high scalability-2008-08-30-Paper: GargantuanComputing—GRIDs and P2P
Introduction: I found the discussion of the available bandwidth of tree vs. higher-dimensional virtual network topologies quite, to quote Spock, fascinating (a small bisection-bandwidth comparison appears after this list): A mathematical analysis by Ritter (2002) (one of the original developers of Napster) presented a detailed numerical argument demonstrating that the Gnutella network could not scale to the capacity of its competitor, the Napster network. Essentially, that model showed that the Gnutella network is severely bandwidth-limited long before the P2P population reaches a million peers. In each of these previous studies, the conclusions have overlooked the intrinsic bandwidth limits of the underlying topology in the Gnutella network: a Cayley tree (Rains and Sloane 1999) (see Sect. 9.4 for the definition). Trees are known to have lower aggregate bandwidth than higher-dimensional topologies, e.g., hypercubes and hypertori. Studies of interconnection topologies in the literature have tended to focus on hardware implementations (see, e.g., Culler et al.)
4 0.10929081 645 high scalability-2009-06-30-Hot New Trend: Linking Clouds Through Cheap IP VPNs Instead of Private Lines
Introduction: You might think major Internet companies have a latency, availability, and bandwidth advantage because they can afford expensive dedicated point-to-point private line networks between their data centers. And you would be right. It's a great advantage. Or at least it was a great advantage. Cost is the great equalizer and companies are now scrambling for ways to cut costs. Many of the most recognizable Internet companies are moving to IP VPNs (Virtual Private Networks) as a much cheaper alternative to private lines. This is a strategy you can effectively use too. This trend has historical precedent in the data center. In the same way leading edge companies moved early to virtualize their data centers, leading edge companies are now virtualizing their networks using IP VPNs to build inexpensive private networks over a shared public network. In kindergarten we learned sharing was polite; it turns out sharing can also save a lot of money in both the data center and on the network. The
5 0.10898954 1213 high scalability-2012-03-22-Paper: Revisiting Network I-O APIs: The netmap Framework
Introduction: Here's a really good article in the Communications of the ACM on reducing network packet processing overhead by redesigning the network stack: Revisiting Network I/O APIs: The Netmap Framework by Luigi Rizzo. As commodity networking performance increases, operating systems need to keep up or all those CPUs will go to waste. How do they make this happen? Abstract: Today 10-gigabit interfaces are used more and more in datacenters and servers. On these links, packets flow as fast as one every 67.2 nanoseconds, yet modern operating systems can take 10-20 times longer just to move one packet between the wire and the application. We can do much better, not with more powerful hardware but by revising architectural decisions made long ago regarding the design of device drivers and network stacks. The netmap framework is a promising step in this direction. Thanks to a careful design and the engineering of a new packet I/O API, netmap eliminates much unnecessary overhead and moves
7 0.087679341 1594 high scalability-2014-02-12-Paper: Network Stack Specialization for Performance
8 0.082636334 1118 high scalability-2011-09-19-Big Iron Returns with BigMemory
9 0.082280621 414 high scalability-2008-10-15-Hadoop - A Primer
10 0.08135812 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud
12 0.077981673 101 high scalability-2007-09-27-Product: Ganglia Monitoring System
13 0.077452108 168 high scalability-2007-11-30-Strategy: Efficiently Geo-referencing IPs
14 0.077165648 289 high scalability-2008-03-27-Amazon Announces Static IP Addresses and Multiple Datacenter Operation
15 0.075479597 793 high scalability-2010-03-10-Saying Yes to NoSQL; Going Steady with Cassandra at Digg
16 0.074617602 448 high scalability-2008-11-22-Google Architecture
17 0.073883295 1177 high scalability-2012-01-19-Is it time to get rid of the Linux OS model in the cloud?
18 0.073020481 1075 high scalability-2011-07-07-Myth: Google Uses Server Farms So You Should Too - Resurrection of the Big-Ass Machines
19 0.072916158 1027 high scalability-2011-04-20-Packet Pushers: How to Build a Low Cost Data Center
20 0.07289163 1256 high scalability-2012-06-04-OpenFlow-SDN is Not a Silver Bullet for Network Scalability
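Two of the excerpts above contain numbers worth checking. First, the 8 GB/sec PCI-Express figure in entry 2 (referenced there): PCIe 3.0 signals at 8 GT/s per lane with 128b/130b encoding, so an x8 slot's usable rate is straightforward arithmetic. A quick sketch using the standard spec values:

```python
# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding (standard spec values).
gts_per_lane = 8e9          # transfers/sec, 1 bit per transfer
encoding = 128 / 130        # usable bits per transferred bit
lanes = 8                   # the x8 slot from the excerpt

bytes_per_sec = gts_per_lane * encoding * lanes / 8
print(f"{bytes_per_sec / 1e9:.2f} GB/s per direction")   # ~7.88 GB/s, i.e. ~8 GB/s
```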
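Second, the tree-versus-hypercube claim in entry 3 (referenced there): with unit-capacity links, a balanced tree's bisection is a single link no matter how many nodes it has, while a 2^d-node hypercube's bisection grows linearly with node count. A small comparison, assuming unit link capacity:

```python
def bisection_links(nodes, topology):
    """Links crossing a balanced bisection, unit-capacity links.
    Balanced tree: one link near the root separates the halves.
    Hypercube (nodes = 2^d): each of the nodes/2 vertices in one
    half has exactly one neighbor across the cut -> nodes/2 links."""
    if topology == "tree":
        return 1
    if topology == "hypercube":
        assert nodes & (nodes - 1) == 0, "hypercube needs 2^d nodes"
        return nodes // 2
    raise ValueError(topology)

for n in (64, 1024, 2 ** 20):
    print(n, bisection_links(n, "tree"), bisection_links(n, "hypercube"))
# The widening gap is the 'intrinsic bandwidth limit' of tree topologies.
```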
topicId topicWeight
[(0, 0.11), (1, 0.048), (2, 0.026), (3, 0.025), (4, -0.073), (5, 0.009), (6, 0.056), (7, 0.002), (8, -0.051), (9, 0.064), (10, 0.006), (11, -0.027), (12, 0.003), (13, 0.006), (14, 0.055), (15, 0.057), (16, 0.024), (17, 0.045), (18, -0.033), (19, -0.021), (20, 0.002), (21, 0.052), (22, -0.017), (23, -0.05), (24, 0.03), (25, 0.031), (26, -0.04), (27, -0.058), (28, -0.002), (29, -0.001), (30, -0.016), (31, -0.011), (32, 0.018), (33, 0.048), (34, -0.008), (35, 0.029), (36, -0.002), (37, 0.032), (38, -0.012), (39, 0.062), (40, 0.023), (41, 0.012), (42, -0.024), (43, 0.022), (44, 0.048), (45, 0.088), (46, -0.056), (47, -0.039), (48, -0.057), (49, -0.037)]
simIndex simValue blogId blogTitle
same-blog 1 0.97406697 371 high scalability-2008-08-24-A Scalable, Commodity Data Center Network Architecture
Introduction: Looks interesting... Abstract: Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher-end solutions.
2 0.79081839 1316 high scalability-2012-09-04-Changing Architectures: New Datacenter Networks Will Set Your Code and Data Free
Introduction: One consequence of IT standardization and commodification has been Google’s "the datacenter is the computer" view of the world. In that view all compute resources (memory, CPU, storage) are fungible. They are interchangeable and location independent; individual computers lose identity and become just a part of a service. Thwarting that nirvana has been the abysmal performance of commodity datacenter networks, which has driven a preference for architectures that colocate state and behaviour on the same box. MapReduce famously ships code over to storage nodes for just this reason. Change the network and you change the fundamental assumption driving colocation-based software architectures. You are then free to store data anywhere and move compute anywhere you wish. The datacenter becomes the computer. On the host side, with an x8 slot running at PCI-Express 3.0 speeds able to push 8 GB/sec (that’s bytes) of bandwidth in both directions, we have
3 0.78288108 374 high scalability-2008-08-30-Paper: GargantuanComputing—GRIDs and P2P
Introduction: I found the discussion of the available bandwidth of tree vs. higher-dimensional virtual network topologies quite, to quote Spock, fascinating: A mathematical analysis by Ritter (2002) (one of the original developers of Napster) presented a detailed numerical argument demonstrating that the Gnutella network could not scale to the capacity of its competitor, the Napster network. Essentially, that model showed that the Gnutella network is severely bandwidth-limited long before the P2P population reaches a million peers. In each of these previous studies, the conclusions have overlooked the intrinsic bandwidth limits of the underlying topology in the Gnutella network: a Cayley tree (Rains and Sloane 1999) (see Sect. 9.4 for the definition). Trees are known to have lower aggregate bandwidth than higher-dimensional topologies, e.g., hypercubes and hypertori. Studies of interconnection topologies in the literature have tended to focus on hardware implementations (see, e.g., Culler et al.)
4 0.73840022 1213 high scalability-2012-03-22-Paper: Revisiting Network I-O APIs: The netmap Framework
Introduction: Here's a really good article in the Communications of the ACM on reducing network packet processing overhead by redesigning the network stack: Revisiting Network I/O APIs: The Netmap Framework by Luigi Rizzo (the per-packet timing quoted below is worked out in a sketch after this list). As commodity networking performance increases, operating systems need to keep up or all those CPUs will go to waste. How do they make this happen? Abstract: Today 10-gigabit interfaces are used more and more in datacenters and servers. On these links, packets flow as fast as one every 67.2 nanoseconds, yet modern operating systems can take 10-20 times longer just to move one packet between the wire and the application. We can do much better, not with more powerful hardware but by revising architectural decisions made long ago regarding the design of device drivers and network stacks. The netmap framework is a promising step in this direction. Thanks to a careful design and the engineering of a new packet I/O API, netmap eliminates much unnecessary overhead and moves
5 0.71411335 1256 high scalability-2012-06-04-OpenFlow-SDN is Not a Silver Bullet for Network Scalability
Introduction: Ivan Pepelnjak (CCIE#1354 Emeritus) is Chief Technology Advisor at NIL Data Communications, author of numerous webinars and advanced networking books, and a prolific blogger. He focuses on data center and cloud networking, network virtualization, and scalable application design. OpenFlow is an interesting emerging networking technology that appeared seemingly out of nowhere with much hype and fanfare in March 2011. More than a year later, there are two commercial products based on OpenFlow (NEC’s Programmable Flow and Nicira’s Network Virtualization Platform) and probably fewer than a dozen production-grade implementations (including Google’s G-Scale network and Indiana University’s campus network). Is this an expected result for an emerging technology, or another case of an overhyped technology hitting limits imposed by reality? OpenFlow-based solutions have to overcome the numerous problems every emerging technology faces, in OpenFlow’s case ranging from compatibility
6 0.70100206 1594 high scalability-2014-02-12-Paper: Network Stack Specialization for Performance
7 0.69877797 645 high scalability-2009-06-30-Hot New Trend: Linking Clouds Through Cheap IP VPNs Instead of Private Lines
8 0.63976735 453 high scalability-2008-12-01-Breakthrough Web-Tier Solutions with Record-Breaking Performance
9 0.62611681 1140 high scalability-2011-11-10-Kill the Telcos Save the Internet - The Unsocial Network
11 0.62146443 446 high scalability-2008-11-18-Scalability Perspectives #2: Van Jacobson – Content-Centric Networking
12 0.61194217 146 high scalability-2007-11-08-scaling drupal - an open-source infrastructure for high-traffic drupal sites
13 0.60706371 1105 high scalability-2011-08-25-The Cloud and The Consumer: The Impact on Bandwidth and Broadband
14 0.60649592 1338 high scalability-2012-10-11-RAMCube: Exploiting Network Proximity for RAM-Based Key-Value Store
15 0.59715831 1572 high scalability-2014-01-03-Stuff The Internet Says On Scalability For January 3rd, 2014
16 0.59555393 266 high scalability-2008-03-04-Manage Downtime Risk by Connecting Multiple Data Centers into a Secure Virtual LAN
17 0.59363359 403 high scalability-2008-10-06-Paper: Scaling Genome Sequencing - Complete Genomics Technology Overview
18 0.59102875 1651 high scalability-2014-05-20-It's Networking. In Space! Or How E.T. Will Phone Home.
19 0.58899611 463 high scalability-2008-12-09-Rules of Thumb in Data Engineering
20 0.58383036 119 high scalability-2007-10-10-WAN Accelerate Your Way to Lightening Fast Transfers Between Data Centers
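The 67.2 ns per-packet figure quoted in entry 4 above (referenced there) is just minimum-size Ethernet framing arithmetic at 10 Gb/s: a 64-byte frame plus 8 bytes of preamble/SFD plus a 12-byte interframe gap occupies 84 bytes on the wire. A quick check with standard Ethernet constants:

```python
frame = 64          # minimum Ethernet frame, bytes
preamble_sfd = 8    # preamble + start-of-frame delimiter
ifg = 12            # interframe gap
line_rate = 10e9    # 10 GbE, bits/sec

wire_bits = (frame + preamble_sfd + ifg) * 8           # 672 bits on the wire
print(f"{wire_bits / line_rate * 1e9:.1f} ns/packet")  # 67.2 ns
print(f"{line_rate / wire_bits / 1e6:.2f} Mpps")       # ~14.88 Mpps at line rate
```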
topicId topicWeight
[(1, 0.072), (2, 0.108), (5, 0.027), (10, 0.108), (30, 0.015), (61, 0.068), (77, 0.04), (79, 0.126), (85, 0.05), (86, 0.157), (94, 0.13)]
simIndex simValue blogId blogTitle
same-blog 1 0.92408264 371 high scalability-2008-08-24-A Scalable, Commodity Data Center Network Architecture
Introduction: Looks interesting... Abstract: Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher-end solutions.
Introduction: This is a guest post by Matt Abrams (@abramsm), from Clearspring, discussing how they are able to accurately estimate the cardinality of sets with billions of distinct elements using surprisingly small data structures (a minimal sketch of the technique appears after this list). Their servers receive well over 100 billion events per month. At Clearspring we like to count things. Counting the number of distinct elements (the cardinality) of a set is a challenge when the cardinality of the set is large. To better understand the challenge of determining the cardinality of large sets, let's imagine that you have a 16 character ID and you'd like to count the number of distinct IDs that you've seen in your logs. Here is an example: 4f67bfc603106cb2 These 16 characters represent 128 bits. 65K IDs would require 1 megabyte of space. We receive over 3 billion events per day, and each event has an ID. Those IDs require 384,000,000,000 bits or 45 gigabytes of storage. And that is just the space that the ID field requires! To get the
3 0.76494735 1223 high scalability-2012-04-06-Stuff The Internet Says On Scalability For April 6, 2012
Introduction: It's HighScalability Time: Exascale Supercomputer: how IBM plans to understand data from a universe of light; 905 Billion Objects and 650,000 Requests/Second: S3; 64-cores: PostgreSQL shows linear read scalability; Quotable quotes: pkaler: Programming is hard. Scaling is harder. @crucially: As far as I can tell, openstack is what happens when ops people write code. @DEVOPS_BORAT: Goal of sysadmin is replace itself with small shell script. Goal of devops is replace itself with small REST API. @fowlduck: ec2, where dynamic scalability means them running out of instances :( hcarvalhoalves: You know what is amazing? Is that as soon you hit bigger or more general problems, you always face the compromise of "trading X resource for accuracy". Which leads me to believe that software, so far, has only been deterministic by pure accident. Kyle Lemmons: Clearly Go is a superior weapon if the goal is to shoot everyone in the foot at the same time
4 0.7595284 1174 high scalability-2012-01-13-Stuff The Internet Says On Scalability For January 13, 2012
Introduction: With a name like HighScalability... it has to be good: Facebook: 1 Billion Users?; Internet Archive: 500,000 users/day, 6PB of data, 150 billion pages, 1000 queries a second; 6,180: The number of patents granted to IBM in 2011; 676: The number of patents granted to Apple in 2011; Live TV is Dead; Kickstarter: 10,000 successfully funded projects; $82bn: Apple's cash hoard; 100 Billion Planets: Our home sweet galaxy; Creative: 100-core system-on-a-chip; 15 million: Lines of code in the Linux kernel; According to Twitter: Justin Bieber > President Obama. Quotable quotes: @florind: I just realized that the Santa story is a classical scalability myth. @juokaz: doesn't always use dating sites, but when he does, he finds out about them on High Scalability http://bit.ly/xYfBmq. True story @niclashulting: "The Yahoo! homepage is updated 45,000 times every five minutes." A content strategy is vital. Google’s Data Center Engineer Sh
5 0.75595933 266 high scalability-2008-03-04-Manage Downtime Risk by Connecting Multiple Data Centers into a Secure Virtual LAN
Introduction: Update: VcubeV - an OpenVPN-based solution designed to build and operate a multisourced infrastructure. True high availability requires a presence in multiple data centers. The recent downtime of even a high quality operation like Amazon makes this need all the more clear. Typically only the big boys can afford the complexity of operating in two or more data centers. Cloud computing along with utility billing starts to change that equation, leveling the playing field. Even smaller outfits will be in a position to manage risk by spreading machines amongst EC2, 3tera, Slicehost, Mosso and other providers. The question then becomes: given we aren't Angels, how do we walk amongst the clouds? One fascinating answer is exquisitely explained by Dmitriy Samovskiy in his Linux Journal article titled Building a Multisourced Infrastructure Using OpenVPN. Dmitriy's idea is to create a secure UDP tunnel between different data centers over public internet links so your application
6 0.75419039 976 high scalability-2011-01-20-75% Chance of Scale - Leveraging the New Scaleogenic Environment for Growth
7 0.7478193 289 high scalability-2008-03-27-Amazon Announces Static IP Addresses and Multiple Datacenter Operation
8 0.74658746 1402 high scalability-2013-02-07-Ask HighScalability: Web asset server concept - 3rd party software available?
9 0.74618256 863 high scalability-2010-07-22-How can we spark the movement of research out of the Ivory Tower and into production?
10 0.74569893 1371 high scalability-2012-12-12-Pinterest Cut Costs from $54 to $20 Per Hour by Automatically Shutting Down Systems
11 0.74275434 1018 high scalability-2011-04-07-Paper: A Co-Relational Model of Data for Large Shared Data Banks
12 0.74251515 716 high scalability-2009-10-06-Building a Unique Data Warehouse
13 0.74110687 706 high scalability-2009-09-16-The VeriScale Architecture - Elasticity and efficiency for private clouds
14 0.73884386 1626 high scalability-2014-04-04-Stuff The Internet Says On Scalability For April 4th, 2014
15 0.73795015 736 high scalability-2009-11-04-Damn, Which Database do I Use Now?
16 0.73790354 978 high scalability-2011-01-26-Google Pro Tip: Use Back-of-the-envelope-calculations to Choose the Best Design
17 0.73726416 1382 high scalability-2013-01-07-Analyzing billions of credit card transactions and serving low-latency insights in the cloud
18 0.73698765 1516 high scalability-2013-09-13-Stuff The Internet Says On Scalability For September 13, 2013
19 0.73434973 142 high scalability-2007-11-05-Strategy: Diagonal Scaling - Don't Forget to Scale Out AND Up
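The unnumbered Clearspring excerpt above stops just as it gets to the technique. Their production code lives in a separate library, but a minimal HyperLogLog-style estimator in the same spirit looks like the sketch below (register count, hash choice, and bias constant are illustrative assumptions, not Clearspring's actual parameters):

```python
import hashlib
import math

def estimate_cardinality(items, b=10):
    """Minimal HyperLogLog-style estimator: 2^b one-byte registers
    (~1 KB here) instead of gigabytes of raw IDs."""
    m = 1 << b
    registers = [0] * m
    for item in items:
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - b)                      # first b bits pick a register
        rest = h & ((1 << (64 - b)) - 1)         # remaining 64-b bits
        rank = (64 - b) - rest.bit_length() + 1  # leading zeros + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)             # bias correction for m >= 128
    est = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if est <= 2.5 * m and zeros:                 # small-range (linear counting) fix
        est = m * math.log(m / zeros)
    return int(est)

ids = [f"{i:016x}" for i in range(100_000)]      # hypothetical 16-char IDs
print(estimate_cardinality(ids))                 # ~100,000, within a few percent
```

For 3 billion 16-byte IDs this trades the excerpt's ~45 GB of raw storage for about a kilobyte of registers, at the cost of a few percent error.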