high_scalability high_scalability-2008 high_scalability-2008-266 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Update: VcubeV - an OpenVPN-based solution designed to build and operate a multisourced infrastructure. True high availability requires a presence in multiple data centers. The recent downtime of even a high quality operation like Amazon makes this need all the more clear. Typically only the big boys can afford the complexity of operating in two or more data centers. Cloud computing along with utility billing starts to change that equation, leveling the playing field. Even smaller outfits will be in a position to manage risk by spreading machines amongst EC2, 3tera, Slicehost, Mosso and other providers. The question then becomes: given we aren't Angels, how do we walk amongst the clouds? One fascinating answer is exquisitely explained by Dmitriy Samovskiy in his Linux Journal article titled Building a Multisourced Infrastructure Using OpenVPN. Dmitriy's idea is to create a secure UDP tunnel between different data centers over public internet links so your applicatio
sentIndex sentText sentNum sentScore
1 Dmitriy's idea is to create a secure UDP tunnel between different data centers over public internet links so your application sees a flat virtual network even though the machines run in different data centers. [sent-9, score-0.516]
2 Your machines think they are on the same local network when in reality clusters of machines are maintained in multiple locations communicating over the internet. [sent-10, score-0.242]
3 Latency over the public network is higher than it is with your local Ethernet. [sent-15, score-0.418]
4 Why would I want to create a virtual LAN rather than create a service layer and access services over http? [sent-20, score-0.209]
5 With hosts in 2 different datacenters which are operated by different hosting companies, and assuming no private connectivity (like a private T1 which you pay for and support), the only way for hosts to talk to each other is via public Internet. [sent-22, score-0.639]
6 If the data your services will be exchanging do not need to be protected from external eyes and you don't need to restrict access directly to services from Internet, then service layer and access over http would definitely be easier. [sent-23, score-0.356]
7 However, if you don't want public access to those services, the first thing we did was have a firewall and restrict who can access which service by IP. [sent-24, score-0.56]
8 Whenever we get a new machine, we adjust its firewall and adjust firewalls on all other machines which it's going to communicate with. [sent-27, score-0.688]
9 In our case, we adjusted firewall on LDAP server so a new host could talk to LDAP. [sent-28, score-0.243]
10 With time this peer-to-peer firewall adjusting became too error prone and time consuming as the number of hosts you have goes up. [sent-29, score-0.456]
11 In our example - we set up LDAP replica and now all hosts needed to be reconfigured to failover to replica if the primary was not reachable - which meant a lot of firewall changes on multiple hosts. [sent-31, score-0.514]
12 With more services and more hosts, I was dreading we'd end up with a pile of unmanageable firewall rules. [sent-32, score-0.316]
13 Another aspect missing was data encryption when data pass on public Internet links. [sent-33, score-0.287]
14 We got encryption and once a server has a virtual IP, it's easier to manage firewalls - I choose to manage it on server side (so in our example, on LDAP server). [sent-36, score-0.415]
15 you can assign static virtual IPs to hosts based on ssl key/cert pairs. [sent-46, score-0.287]
16 Yes you can, provided all your hosts that need to connect to VcubeV have physical network connectivity to at least one OpenVPN server (either over LAN or WAN). [sent-50, score-0.443]
17 Primarily it's "don't multisource if an app delivers better value when singlesourced. [sent-60, score-0.191]
18 I personally would not multisource an app that does broadcast or multicast, since it's too low level and imho is likely to have other issues with being deployed in environment which is drastically different from what its designers had in mind. [sent-66, score-0.263]
19 One depends on public Internet links, so latency can't be controlled. [sent-69, score-0.26]
20 If latency is a key aspect of application (trading, for example), don't multisource or at least think twice. [sent-71, score-0.36]
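The firewall argument in sentences 7 to 15 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subnet `10.8.0.0/24`, the host names, and every function name below are hypothetical and do not come from the article or from VcubeV. It shows static virtual IPs keyed by certificate name, and why one server-side rule per service scales better than adjusting a firewall for every pair of hosts.

```python
import ipaddress

# Hypothetical virtual network; the source does not name an address range.
VPN_SUBNET = ipaddress.ip_network("10.8.0.0/24")


def assign_virtual_ips(cert_names):
    """Map each certificate common name to a fixed virtual IP, in sorted order."""
    hosts = iter(VPN_SUBNET.hosts())
    next(hosts)  # reserve the first usable address (10.8.0.1) for the VPN server
    return {name: next(hosts) for name in sorted(cert_names)}


def pairwise_rule_count(num_clients, num_servers):
    """Peer-to-peer approach: every server lists every client's public IP."""
    return num_clients * num_servers


def server_side_rule_count(num_servers):
    """Virtual LAN approach: each server allows the VPN subnet once."""
    return num_servers


def is_allowed(source_ip):
    """Server-side check: accept only traffic that arrives from the virtual subnet."""
    return ipaddress.ip_address(source_ip) in VPN_SUBNET


if __name__ == "__main__":
    print(assign_virtual_ips(["web1", "ldap", "ldap-replica"]))
    print(pairwise_rule_count(20, 2), "rules vs", server_side_rule_count(2))
    print(is_allowed("10.8.0.77"), is_allowed("203.0.113.5"))
```

With 20 clients and an LDAP primary plus replica, the pairwise approach needs 40 entries that change whenever a host is added, while the server-side approach keeps two rules that never mention individual public addresses.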
wordName wordTfidf (topN-words)
[('openvpn', 0.543), ('hosts', 0.209), ('multisource', 0.191), ('firewall', 0.189), ('firewalls', 0.156), ('public', 0.153), ('ldap', 0.132), ('adjust', 0.125), ('udp', 0.118), ('tunnels', 0.1), ('restrict', 0.094), ('machines', 0.093), ('dmitriy', 0.092), ('nat', 0.086), ('lan', 0.085), ('clouds', 0.08), ('retry', 0.079), ('virtual', 0.078), ('tcp', 0.076), ('ip', 0.074), ('encryption', 0.073), ('links', 0.072), ('broadcast', 0.072), ('amongst', 0.072), ('services', 0.069), ('connectivity', 0.068), ('internet', 0.064), ('solution', 0.064), ('access', 0.062), ('aspect', 0.061), ('replica', 0.058), ('adjustable', 0.058), ('adjusting', 0.058), ('dreading', 0.058), ('leveling', 0.058), ('pairwise', 0.058), ('titledbuilding', 0.058), ('least', 0.056), ('network', 0.056), ('depends', 0.055), ('beach', 0.054), ('exquisitely', 0.054), ('preemptive', 0.054), ('server', 0.054), ('operate', 0.053), ('latency', 0.052), ('please', 0.052), ('boys', 0.052), ('angels', 0.052), ('multicasting', 0.052)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 266 high scalability-2008-03-04-Manage Downtime Risk by Connecting Multiple Data Centers into a Secure Virtual LAN
2 0.14018157 1140 high scalability-2011-11-10-Kill the Telcos Save the Internet - The Unsocial Network
Introduction: Someone is killing the Internet. Since you probably use the Internet every day you might find this surprising. It almost sounds silly, and the reason is technical, but our crack team of networking experts has examined the patient and made the diagnosis. What did they find? Diagnostic team: the Packet Pushers gang (Greg Ferro, Jan Zorz, Ivan Pepelnjak) in the podcast How We Are Killing the Internet. Diagnosis: invasive tunnelation. (tubes anyone?) Prognosis: even Dr. House might not be able to help. Cure: go back to what the Internet was; kill the tunnels; route IPv4 and IPv6; have public addresses on everything; disrupt the telcos. This is a classic story in a strange setting--the network--but the themes are universal: centralization vs. decentralization (that's where the telcos obviously come in), good vs. evil, order vs. disorder, tyranny vs. freedom, change vs. stasis, simplicity vs. complexity. And it's all being carried out on battlefield few get
Introduction: Ivan Pepelnjak, in his short and information-packed REDUNDANT DATA CENTER INTERNET CONNECTIVITY video, shows why networking as played at the highest levels is something you want to leave to professionals, like a large animal country veterinarian delivering a stuck foal at 2AM on a dark and stormy night. There are always a lot of questions about the black art of building redundant datacenter networks and there's a shortage of accessible explanations. What I liked about Ivan's video is how effortlessly he explains the issues and tradeoffs you can expect in designing your own solution, as well as giving creative solutions to those problems. A lot of years of experience are boiled down to a 17 minute video. Ivan begins by showing what a canonical fully redundant datacenter would look like: It's like an ark where everything goes two by two. You have two datacenters, each datacenter has redundant core switches, redundant servers, redundant disk arrays, redundant links between d
Introduction: All in all this is still my favorite post and I still think it's an accurate vision of a future. Not everyone agrees, but I guess we'll see... "But it is not complicated. [There's] just a lot of it." --Richard Feynman on how the immense variety of the world arises from simple rules. Contents: Have We Reached the End of Scaling?; Applications Become Black Boxes Using Markets to Scale and Control Costs; Let's Welcome our Neo-Feudal Overlords; The Economic Argument for the Ambient Cloud; What Will Kill the Cloud?; The Amazing Collective Compute Power of the Ambient Cloud; Using the Ambient Cloud as an Application Runtime; Applications as Virtual States; Conclusion. We have not yet begun to scale. The world is still fundamentally disconnected and for all our wisdom we are still in the earliest days of learning how to build truly large planet-scaling applications. Today 350 million users on Facebook is a lot of users and five million followers on Twitter is a lot of followers. This may seem like a lot now, but c
Introduction: "But it is not complicated. [There's] just a lot of it." --Richard Feynman on how the immense variety of the world arises from simple rules. Contents: Have We Reached the End of Scaling?; Applications Become Black Boxes Using Markets to Scale and Control Costs; Let's Welcome our Neo-Feudal Overlords; The Economic Argument for the Ambient Cloud; What Will Kill the Cloud?; The Amazing Collective Compute Power of the Ambient Cloud; Using the Ambient Cloud as an Application Runtime; Applications as Virtual States; Conclusion. We have not yet begun to scale. The world is still fundamentally disconnected and for all our wisdom we are still in the earliest days of learning how to build truly large planet-scaling applications. Today 350 million users on Facebook is a lot of users and five million followers on Twitter is a lot of followers. This may seem like a lot now, but consider we have no planet wide applications yet. None. Tomorrow the numbers foreshadow a new Cambrian explosion of connectivity that will look as
6 0.12809287 228 high scalability-2008-01-28-Product: ISPMan Centralized ISP Management System
7 0.12539241 645 high scalability-2009-06-30-Hot New Trend: Linking Clouds Through Cheap IP VPNs Instead of Private Lines
8 0.12379412 1240 high scalability-2012-05-07-Startups are Creating a New System of the World for IT
9 0.12260756 853 high scalability-2010-07-08-Cloud AWS Infrastructure vs. Physical Infrastructure
10 0.12226435 289 high scalability-2008-03-27-Amazon Announces Static IP Addresses and Multiple Datacenter Operation
11 0.11711114 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
12 0.11692072 444 high scalability-2008-11-14-Private-Public Cloud
13 0.1161292 661 high scalability-2009-07-25-Latency is Everywhere and it Costs You Sales - How to Crush it
14 0.10982706 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
17 0.10773578 881 high scalability-2010-08-16-Scaling an AWS infrastructure - Tools and Patterns
18 0.10600642 96 high scalability-2007-09-18-Amazon Architecture
19 0.1041665 1359 high scalability-2012-11-15-Gone Fishin': Justin.Tv's Live Video Broadcasting Architecture
20 0.10373896 796 high scalability-2010-03-16-Justin.tv's Live Video Broadcasting Architecture
topicId topicWeight
[(0, 0.214), (1, 0.076), (2, 0.034), (3, 0.008), (4, -0.066), (5, -0.068), (6, 0.061), (7, -0.055), (8, -0.036), (9, -0.006), (10, -0.024), (11, 0.039), (12, -0.032), (13, -0.011), (14, 0.053), (15, 0.026), (16, 0.048), (17, 0.024), (18, -0.024), (19, -0.026), (20, -0.005), (21, 0.016), (22, -0.002), (23, 0.006), (24, 0.04), (25, 0.082), (26, 0.013), (27, -0.028), (28, -0.058), (29, -0.004), (30, -0.005), (31, -0.029), (32, -0.009), (33, 0.047), (34, 0.045), (35, 0.052), (36, 0.059), (37, 0.045), (38, -0.02), (39, 0.027), (40, 0.013), (41, 0.036), (42, -0.045), (43, -0.011), (44, 0.022), (45, 0.027), (46, 0.005), (47, -0.012), (48, -0.034), (49, -0.0)]
simIndex simValue blogId blogTitle
same-blog 1 0.95757759 266 high scalability-2008-03-04-Manage Downtime Risk by Connecting Multiple Data Centers into a Secure Virtual LAN
2 0.84031504 1140 high scalability-2011-11-10-Kill the Telcos Save the Internet - The Unsocial Network
Introduction: Someone is killing the Internet. Since you probably use the Internet every day you might find this surprising. It almost sounds silly, and the reason is technical, but our crack team of networking experts has examined the patient and made the diagnosis. What did they find? Diagnostic team: the Packet Pushers gang (Greg Ferro, Jan Zorz, Ivan Pepelnjak) in the podcast How We Are Killing the Internet. Diagnosis: invasive tunnelation. (tubes anyone?) Prognosis: even Dr. House might not be able to help. Cure: go back to what the Internet was; kill the tunnels; route IPv4 and IPv6; have public addresses on everything; disrupt the telcos. This is a classic story in a strange setting--the network--but the themes are universal: centralization vs. decentralization (that's where the telcos obviously come in), good vs. evil, order vs. disorder, tyranny vs. freedom, change vs. stasis, simplicity vs. complexity. And it's all being carried out on battlefield few get
3 0.82532668 645 high scalability-2009-06-30-Hot New Trend: Linking Clouds Through Cheap IP VPNs Instead of Private Lines
Introduction: You might think major Internet companies have a latency, availability, and bandwidth advantage because they can afford expensive dedicated point-to-point private line networks between their data centers. And you would be right. It's a great advantage. Or at least it was a great advantage. Cost is the great equalizer and companies are now scrambling for ways to cut costs. Many of the most recognizable Internet companies are moving to IP VPNs (Virtual Private Networks) as a much cheaper alternative to private lines. This is a strategy you can effectively use too. This trend has historical precedent in the data center. In the same way leading edge companies moved early to virtualize their data centers, leading edge companies are now virtualizing their networks using IP VPNs to build inexpensive private networks over a shared public network. In kindergarten we learned sharing was polite, it turns out sharing can also save a lot of money in both the data center and on the network. The
Introduction: Ivan Pepelnjak, in his short and information-packed REDUNDANT DATA CENTER INTERNET CONNECTIVITY video, shows why networking as played at the highest levels is something you want to leave to professionals, like a large animal country veterinarian delivering a stuck foal at 2AM on a dark and stormy night. There are always a lot of questions about the black art of building redundant datacenter networks and there's a shortage of accessible explanations. What I liked about Ivan's video is how effortlessly he explains the issues and tradeoffs you can expect in designing your own solution, as well as giving creative solutions to those problems. A lot of years of experience are boiled down to a 17 minute video. Ivan begins by showing what a canonical fully redundant datacenter would look like: It's like an ark where everything goes two by two. You have two datacenters, each datacenter has redundant core switches, redundant servers, redundant disk arrays, redundant links between d
5 0.79711992 1256 high scalability-2012-06-04-OpenFlow-SDN is Not a Silver Bullet for Network Scalability
Introduction: Ivan Pepelnjak (CCIE#1354 Emeritus) is Chief Technology Advisor at NIL Data Communications, author of numerous webinars and advanced networking books, and a prolific blogger. He’s focusing on data center and cloud networking, network virtualization, and scalable application design. OpenFlow is an interesting emerging networking technology appearing seemingly out of nowhere with much hype and fanfare in March 2011. More than a year later, there are two commercial products based on OpenFlow (NEC’s Programmable Flow and Nicira’s Network Virtualization Platform) and probably less than a dozen production-grade implementations (including Google’s G-Scale network and Indiana University’s campus network). Is this an expected result for an emerging technology or another case of overhyped technology hitting limits imposed by reality? OpenFlow-based solutions have to overcome numerous problems every emerging technology is facing, in OpenFlow’s case ranging from compatibili
7 0.76212597 686 high scalability-2009-08-20-VMware to bridge a DMZ.
8 0.75990486 960 high scalability-2010-12-20-Netflix: Use Less Chatty Protocols in the Cloud - Plus 26 Fixes
9 0.75051439 446 high scalability-2008-11-18-Scalability Perspectives #2: Van Jacobson – Content-Centric Networking
10 0.74514639 1105 high scalability-2011-08-25-The Cloud and The Consumer: The Impact on Bandwidth and Broadband
11 0.74413061 987 high scalability-2011-02-10-Dispelling the New SSL Myth
12 0.74128598 1316 high scalability-2012-09-04-Changing Architectures: New Datacenter Networks Will Set Your Code and Data Free
13 0.73993313 1157 high scalability-2011-12-14-Virtualization and Cloud Computing is Changing the Network to East-West Routing
14 0.73816222 1091 high scalability-2011-08-02-How Will DIDO Wireless Networking Change Everything?
15 0.73592508 1077 high scalability-2011-07-11-ATMCash Exploits Virtualization for Security - Immutability and Reversion
16 0.72396266 1381 high scalability-2013-01-04-Stuff The Internet Says On Scalability For January 4, 2013
17 0.71951526 1051 high scalability-2011-06-01-Why is your network so slow? Your switch should tell you.
19 0.71520114 1651 high scalability-2014-05-20-It's Networking. In Space! Or How E.T. Will Phone Home.
20 0.71236616 119 high scalability-2007-10-10-WAN Accelerate Your Way to Lightening Fast Transfers Between Data Centers
topicId topicWeight
[(1, 0.119), (2, 0.165), (10, 0.036), (18, 0.014), (30, 0.041), (47, 0.022), (61, 0.073), (77, 0.011), (79, 0.095), (85, 0.028), (94, 0.313)]
simIndex simValue blogId blogTitle
1 0.98381293 1601 high scalability-2014-02-25-Peter Norvig's 9 Master Steps to Improving a Program
Introduction: Inspired by an xkcd comic, Peter Norvig, Director of Research at Google and all around interesting and nice guy, has created an above par code kata involving a regex program that demonstrates the core inner loop of many successful systems profiled on HighScalability. The original code is at xkcd 1313: Regex Golf, which comes up with an algorithm to find a short regex that matches the winners and not the losers from two arbitrary lists. The Python code is readable, the process is TDDish, and the problem, which sounds simple, soon explodes into regex weirdness, as does most regex code. If you find regular expressions confusing you'll definitely benefit from Peter's deliberate strategy for finding a regex. The post demonstrating the iterated improvement of the program is at xkcd 1313: Regex Golf (Part 2: Infinite Problems). As with most first solutions it wasn't optimal. To improve the program Peter recommends the following steps: Profiling: Figure out wher
Introduction: This is a guest post by Matt Abrams (@abramsm), from Clearspring, discussing how they are able to accurately estimate the cardinality of sets with billions of distinct elements using surprisingly small data structures. Their servers receive well over 100 billion events per month. At Clearspring we like to count things. Counting the number of distinct elements (the cardinality) of a set is a challenge when the cardinality of the set is large. To better understand the challenge of determining the cardinality of large sets let's imagine that you have a 16 character ID and you'd like to count the number of distinct IDs that you've seen in your logs. Here is an example: 4f67bfc603106cb2 These 16 characters represent 128 bits. 65K IDs would require 1 megabyte of space. We receive over 3 billion events per day, and each event has an ID. Those IDs require 384,000,000,000 bits or 45 gigabytes of storage. And that is just the space that the ID field requires! To get the
3 0.96724379 115 high scalability-2007-10-07-Using ThreadLocal to pass context information around in web applications
Introduction: Hi, in Java web servers, each HTTP request is handled by a thread from a thread pool, so a thread is assigned to the servlet handling the request. It is tempting (and very convenient) to keep context information in a ThreadLocal variable. I recently had a requirement to attach the logged-in user id and a timestamp to requests sent to web services. Because we already had the code in place, it was extremely difficult to change the method signatures to pass the user id everywhere. The solution I thought of is: class ReferenceIdGenerator { public static void setReferenceId(String login) { threadLocal.set(login + System.currentTimeMillis()); } public static String getReferenceId() { return threadLocal.get(); } private static final ThreadLocal<String> threadLocal = new ThreadLocal<>(); } class MyServlet { void service(.....) { HttpSession session = request.getSession(false); String userId = (String) session.getAttribute("userId"); ReferenceIdGenerator.setReferenceId(userId
4 0.95830691 605 high scalability-2009-05-22-Distributed content system with bandwidth balancing
Introduction: I am looking for a way to distribute files over servers in different physical locations. My main concern is that I have bandwidth limitations on each location, and wish to spread the bandwidth load evenly. At the moment I just have 1:1 copies of the files on all servers, and have the application pick a random server to serve the file as a temporary fix. It's a small video streaming service. I want to spoonfeed the stream to the client with a max bandwidth output, and support seek. At present I use PHP to limit the network stream, and read the file at a given offset sent as a GET parameter from the player for seek. It's pseudo streaming, but it works. I have been looking at MogileFS, which would solve the storage part. With MogileFS I can make use of my current PHP solution as it supports lighttpd and Apache (with mod_rewrite or similar). However I don't see how I can apply MogileFS to check for bandwidth % usage. Any recommendations for how I can solve this?
5 0.94639033 1305 high scalability-2012-08-16-Paper: A Provably Correct Scalable Concurrent Skip List
Introduction: In MemSQL Architecture we learned one of the core strategies MemSQL uses to achieve their need for speed is lock-free skip lists. Skip lists are used to efficiently handle range queries. Making the skip-lists lock-free helps eliminate contention and make writes fast. If this all sounds a little pie-in-the-sky then here's a very good paper on the subject that might help make it clearer: A Provably Correct Scalable Concurrent Skip List . From the abstract: We propose a new concurrent skip list algorithm distinguished by a combination of simplicity and scalability. The algorithm employs optimistic synchronization, searching without acquiring locks, followed by short lock-based validation before adding or removing nodes. It also logically removes an item before physically unlinking it. Unlike some other concurrent skip list algorithms, this algorithm preserves the skiplist properties at all times, which facilitates reasoning about its correctness. Experimental evidence shows that
7 0.93316293 1412 high scalability-2013-02-25-SongPop Scales to 1 Million Active Users on GAE, Showing PaaS is not Passé
8 0.93233603 1025 high scalability-2011-04-16-The NewSQL Market Breakdown
9 0.92592323 834 high scalability-2010-06-01-Web Speed Can Push You Off of Google Search Rankings! What Can You Do?
10 0.91906047 559 high scalability-2009-04-07-Six Lessons Learned Deploying a Large-scale Infrastructure in Amazon EC2
11 0.9173342 1084 high scalability-2011-07-22-Stuff The Internet Says On Scalability For July 22, 2011
12 0.91371071 1223 high scalability-2012-04-06-Stuff The Internet Says On Scalability For April 6, 2012
13 0.90902716 970 high scalability-2011-01-06-BankSimple Mini-Architecture - Using a Next Generation Toolchain
same-blog 14 0.90614867 266 high scalability-2008-03-04-Manage Downtime Risk by Connecting Multiple Data Centers into a Secure Virtual LAN
15 0.90587425 78 high scalability-2007-09-01-2 tier switch selection for colocation
16 0.90202665 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010
17 0.89895964 1174 high scalability-2012-01-13-Stuff The Internet Says On Scalability For January 13, 2012
18 0.89674425 863 high scalability-2010-07-22-How can we spark the movement of research out of the Ivory Tower and into production?
19 0.88754117 241 high scalability-2008-02-05-SLA monitoring
20 0.8829906 976 high scalability-2011-01-20-75% Chance of Scale - Leveraging the New Scaleogenic Environment for Growth