high_scalability high_scalability-2010 high_scalability-2010-773 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: CNBC, like many large web sites, relied on a CDN for content delivery. Recently, we started looking to see if we could improve this model. Our criteria were: - improve response time - have better control over traffic (real-time reporting, change management and alerting) - better utilize internal datacenters and their infrastructure - shield users from any troubles at the origin infrastructure - reduce cost After researching the market, we turned to two vendors: Dyn (Dynamic Network Services) and aiScaler. We have had about a year's worth of experience with aiScaler (search for "CNBC" to see my previous post), but Dyn was a new vendor for us. We started building our relationship at the Velocity conference in the summer of 2009. Dyn has recently started offering a geo-aware DNS load balancing solution, using Anycast and the distributed nature of their DNS presence to enable a key component of what we were trying to achieve: steer users to the geographically closest origin
sentIndex sentText sentNum sentScore
1 Recently, we started looking to see if we could improve this model. [sent-2, score-0.062]
2 We started building our relationship at the Velocity conference in the summer of 2009. [sent-5, score-0.062]
3 Dyn has recently started offering a geo-aware DNS load balancing solution, using Anycast and the distributed nature of their DNS presence to enable a key component of what we were trying to achieve: steer users to the geographically closest origin point. [sent-6, score-0.704]
4 In principle, to direct a user to the geographically closest origin point, one has to have an idea of the user's location. [sent-10, score-0.636]
5 A very different, albeit less granular, way to accomplish the same is to use Internet routing (BGP protocol) to advertise routes to the same IP addresses from multiple points of presence. [sent-13, score-0.126]
6 Each cluster is positioned at a major peering point: US East Coast, West Coast, one in the EU and one in Asia. [sent-21, score-0.078]
7 Through the magic of routing, DNS requests from users in Asia will arrive at one's DNS servers in Asia, requests from the EU at servers in the EU, and so on. [sent-24, score-0.065]
8 It is easy to see how this implied knowledge of requestor's geo location can now be used to direct their traffic in a certain, location-specific way. [sent-25, score-0.533]
9 com , his/her browser requests DNS resolution for www. [sent-29, score-0.065]
10 The DNS request will naturally flow to the closest Dyn DNS cluster. [sent-32, score-0.125]
11 The DNS servers at that cluster have implied awareness of their location. [sent-33, score-0.097]
12 Based on that, the DNS server infers that the requests are also coming from users in the same geo area and, using that inference and the set of rules we configure, directs the requesting user to the proper origin point for www. [sent-34, score-0.719]
13 For origin points, we've chosen our own datacenters, each with multiple gigabits of egress capacity, on the East and West coasts of the US. [sent-37, score-0.444]
14 Just 4 common 1RU blade servers, 2 at each location, are all we needed to deliver all of the traffic to our US user base. [sent-39, score-0.247]
15 The latest iteration of the aiScaler product, v6, has been tested at more than 250,000 RPS per common HP DL360 server. [sent-40, score-0.092]
16 com peak at over 3000 RPS, so we have a lot of excess capacity for any possible traffic spikes. [sent-45, score-0.28]
17 Here're our results so far: - we were able to shave about 1 sec (about 30%! [sent-46, score-0.115]
18 - our CDN traffic has seen about 80% reduction as well - complete with 80% reduction in CDN fees - we're now better utilizing our own datacenter capacity - we now have the ability to instantaneously affect our caching rules or load distribution. [sent-49, score-0.428]
19 Summary: Dyn's Dynamic and Geo-aware DNS load balancing solution and aiScaler's proven caching software have enabled a top-tier financial news website to shave 30% off response time, save money, and gain a better real-time monitoring, reporting and alerting setup. [sent-51, score-0.412]
20 And lastly: the above doesn't constitute, in any way, shape or form, an endorsement of the mentioned products, vendors and/or solutions, by CNBC, NBC, GE or any of its subsidiaries. [sent-53, score-0.124]
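The routing decision described in sentences 10 to 12 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Dyn's actual implementation or configuration format: the names `ORIGIN_IPS`, `ORIGIN_PREFERENCE` and `pick_origin` are hypothetical, the addresses are RFC 5737 documentation placeholders, and the mapping of the EU cluster to the East Coast origin and the Asia cluster to the West Coast origin is a guess rather than something the post states. The fallback to the other origin when one is unhealthy is likewise an assumption, motivated by the stated goal of shielding users from trouble at an origin.

```python
# Sketch of geo-aware DNS answers: the Anycast cluster that receives a query
# stands in for the user's location, and a rule table picks the origin.
# All names, addresses and rules below are hypothetical placeholders.

# The two US origin datacenters mentioned in the post (placeholder IPs).
ORIGIN_IPS = {
    "us-east": "192.0.2.10",
    "us-west": "198.51.100.10",
}

# Ordered origin preference per DNS cluster region (assumed rules).
ORIGIN_PREFERENCE = {
    "us-east": ["us-east", "us-west"],
    "us-west": ["us-west", "us-east"],
    "eu": ["us-east", "us-west"],
    "asia": ["us-west", "us-east"],
}


def pick_origin(cluster_region, healthy_origins):
    """Return the origin IP to answer with for a query seen at cluster_region.

    The first preferred origin that is currently healthy wins.
    """
    for origin in ORIGIN_PREFERENCE[cluster_region]:
        if origin in healthy_origins:
            return ORIGIN_IPS[origin]
    raise LookupError(f"no healthy origin for region {cluster_region!r}")


if __name__ == "__main__":
    # A query arriving at the Asia cluster while both origins are healthy.
    print(pick_origin("asia", {"us-east", "us-west"}))
```

Because the rules live in a plain table, changing the load distribution amounts to editing one entry, which is in the spirit of the "instantaneously affect our caching rules or load distribution" result in sentence 18.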
wordName wordTfidf (topN-words)
[('dns', 0.324), ('origin', 0.323), ('eu', 0.23), ('dyn', 0.199), ('coast', 0.195), ('cnbc', 0.192), ('traffic', 0.188), ('rps', 0.156), ('asia', 0.152), ('withaiscaler', 0.141), ('cms', 0.136), ('alerting', 0.131), ('closest', 0.125), ('west', 0.117), ('shave', 0.115), ('east', 0.115), ('anycast', 0.111), ('geo', 0.107), ('reporting', 0.102), ('bgp', 0.102), ('cdn', 0.1), ('implied', 0.097), ('excess', 0.092), ('send', 0.087), ('rules', 0.087), ('location', 0.082), ('datacenters', 0.082), ('point', 0.078), ('reduction', 0.071), ('geographically', 0.07), ('requests', 0.065), ('vendors', 0.064), ('coasts', 0.064), ('requestor', 0.064), ('albeit', 0.064), ('ge', 0.064), ('balancing', 0.064), ('started', 0.062), ('addresses', 0.062), ('nbc', 0.06), ('uucp', 0.06), ('steer', 0.06), ('endorsement', 0.06), ('direct', 0.059), ('user', 0.059), ('russia', 0.057), ('busier', 0.057), ('egress', 0.057), ('valve', 0.057), ('distro', 0.055)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 773 high scalability-2010-02-06-GEO-aware traffic load balancing and caching at CNBC.com
2 0.27482149 573 high scalability-2009-04-16-Serving 250M quotes-day at CNBC.com with aiCache
Introduction: As traffic to cnbc.com continued to grow, we found ourselves in an all-too-familiar situation where one feels that a BIG change in how things are done was in order, the status-quo was a road to nowhere. The spending on HW, amount of space and power required to host additional servers, less-than-stellar response times, having to resort to frequent "micro"-caching and similar tricks to try to improve code performance - all of these were surfacing in plain sight, hard to ignore. While code base could clearly be improved, the limited Dev resources and having to innovate to stay competitive always limits ability to go about refactoring. So how can one go about addressing performance and other needs without a full blown effort across the entire team ? For us, the answer was aiCache - a Web caching and application acceleration product (aicache.com). The idea behind caching is simple - handle the requests before they ever hit your regular Apache<->JK
3 0.25104761 1517 high scalability-2013-09-16-The Hidden DNS Tax - Cascading Timeouts and Errors
Introduction: This is a guest post by Nick Burling, VP of Product Management of BlueStripe. Readers of High Scalability are well versed in performance optimization techniques. Reverse proxies, Varnish, Redis — you hear about them daily. But what you may not realize is that one of the oldest technologies in your stack can be one of your biggest bottlenecks: DNS. People don't spend a lot of time thinking about DNS. It's not sexy. It's an infrastructure service, and it's just supposed to work. At BlueStripe, we work with many teams running applications that support millions of web requests a day. We keep seeing DNS delays and errors that the platform operations team never knows about. It's so common we've started calling it the Hidden DNS Tax. What is the Hidden DNS Tax? The Hidden DNS Tax is a hard-to-see performance hit your users take from DNS timeouts and errors in your back-end architecture. We've seen it bring down the main web application for a Fortune 10 company.
Introduction: This is a guest post by Dave Hagler, Systems Architect at AOL. The AOL homepages receive more than 8 million visitors per day. That’s more daily viewers than Good Morning America or the Today Show on television. Over a billion page views are served each month. AOL.com has been a major internet destination since 1996, and still has a strong following of loyal users. The architecture for AOL.com is in its 5th generation. It has essentially been rebuilt from scratch 5 times over two decades. The current architecture was designed 6 years ago. Pieces have been upgraded and new components have been added along the way, but the overall design remains largely intact. The code, tools, development and deployment processes are highly tuned over 6 years of continual improvement, making the AOL.com architecture battle tested and very stable. The engineering team is made up of developers, testers, and operations and totals around 25 people. The majority are in Dulles, Virginia
5 0.14634924 289 high scalability-2008-03-27-Amazon Announces Static IP Addresses and Multiple Datacenter Operation
Introduction: Amazon is fixing two of their major problems: no static IP addresses and single datacenter operation. By adding these two new features developers can finally build a no apology system on Amazon. Before you always had to throw in an apology or two. No, we don't have low failover times because of the silly DNS games and unexceptionable DNS update and propagation times and no, we don't operate in more than one datacenter. No more. Now Amazon is adding Elastic IP Addresses and Availability Zones . Elastic IP addresses are far better than normal IP addresses because they are both in tight with Jessica Alba and they are: Static IP addresses designed for dynamic cloud computing. An Elastic IP address is associated with your account, not a particular instance, and you control that address until you choose to explicitly release it. Unlike traditional static IP addresses, however, Elastic IP addresses allow you to mask instance or availability zone failures by programmatica
6 0.1436508 290 high scalability-2008-03-28-How to Get DNS Names of a Web Server
7 0.12481849 576 high scalability-2009-04-21-What CDN would you recommend?
8 0.12039539 796 high scalability-2010-03-16-Justin.tv's Live Video Broadcasting Architecture
9 0.1203413 1359 high scalability-2012-11-15-Gone Fishin': Justin.Tv's Live Video Broadcasting Architecture
10 0.11591496 1557 high scalability-2013-12-02-Evolution of Bazaarvoice’s Architecture to 500M Unique Users Per Month
11 0.1118378 1289 high scalability-2012-07-23-State of the CDN: More Traffic, Stable Prices, More Products, Profits - Not So Much
12 0.10836665 138 high scalability-2007-10-30-Feedblendr Architecture - Using EC2 to Scale
13 0.10335331 382 high scalability-2008-09-09-Content Delivery Networks (CDN) – a comprehensive list of providers
14 0.10329983 788 high scalability-2010-03-04-How MySpace Tested Their Live Site with 1 Million Concurrent Users
15 0.10319521 1335 high scalability-2012-10-08-How UltraDNS Handles Hundreds of Thousands of Zones and Tens of Millions of Records
16 0.09878131 1331 high scalability-2012-10-02-An Epic TripAdvisor Update: Why Not Run on the Cloud? The Grand Experiment.
17 0.098483346 1123 high scalability-2011-09-23-The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…
20 0.096145809 220 high scalability-2008-01-22-The high scalability community
topicId topicWeight
[(0, 0.17), (1, 0.044), (2, -0.036), (3, -0.088), (4, -0.048), (5, -0.091), (6, 0.012), (7, -0.025), (8, -0.02), (9, 0.008), (10, -0.019), (11, 0.0), (12, -0.028), (13, -0.054), (14, 0.019), (15, 0.042), (16, 0.074), (17, 0.033), (18, -0.018), (19, -0.069), (20, 0.003), (21, 0.051), (22, 0.019), (23, -0.026), (24, 0.012), (25, 0.029), (26, -0.062), (27, 0.02), (28, -0.029), (29, -0.032), (30, -0.001), (31, 0.041), (32, -0.01), (33, 0.024), (34, 0.042), (35, 0.017), (36, -0.002), (37, -0.029), (38, -0.031), (39, 0.002), (40, 0.025), (41, 0.03), (42, 0.021), (43, -0.001), (44, 0.006), (45, 0.074), (46, -0.015), (47, 0.035), (48, 0.003), (49, 0.029)]
simIndex simValue blogId blogTitle
same-blog 1 0.97074753 773 high scalability-2010-02-06-GEO-aware traffic load balancing and caching at CNBC.com
2 0.77709275 270 high scalability-2008-03-08-DNS-Record TTL on worst case scenarios
Introduction: I haven't found a good solution to this problem yet: imagine you're responsible for a small CDN (static images) with two different datacenters. Balancing between the two DCs is done with an anycast name service (a nameserver in every DC; the user gets the nearest location). One scenario is that one of the datacenters goes down completely. You can run monitoring on the nameserver and route only to the DC that is still alive, no problem. But what about the TTL of the DNS records? Tiny TTLs such as 2 min. are often ignored by several ISPs (e.g. AOL), so the client doesn't get the IP of the other datacenter. What could be a solution in this scenario?
3 0.77569306 1517 high scalability-2013-09-16-The Hidden DNS Tax - Cascading Timeouts and Errors
Introduction: This is a guest post by Nick Burling, VP of Product Management of BlueStripe. Readers of High Scalability are well versed in performance optimization techniques. Reverse proxies, Varnish, Redis — you hear about them daily. But what you may not realize is that one of the oldest technologies in your stack can be one of your biggest bottlenecks: DNS. People don't spend a lot of time thinking about DNS. It's not sexy. It's an infrastructure service, and it's just supposed to work. At BlueStripe, we work with many teams running applications that support millions of web requests a day. We keep seeing DNS delays and errors that the platform operations team never knows about. It's so common we've started calling it the Hidden DNS Tax. What is the Hidden DNS Tax? The Hidden DNS Tax is a hard-to-see performance hit your users take from DNS timeouts and errors in your back-end architecture. We've seen it bring down the main web application for a Fortune 10 company.
4 0.76012099 573 high scalability-2009-04-16-Serving 250M quotes-day at CNBC.com with aiCache
Introduction: As traffic to cnbc.com continued to grow, we found ourselves in an all-too-familiar situation where one feels that a BIG change in how things are done was in order, the status-quo was a road to nowhere. The spending on HW, amount of space and power required to host additional servers, less-than-stellar response times, having to resort to frequent "micro"-caching and similar tricks to try to improve code performance - all of these were surfacing in plain sight, hard to ignore. While code base could clearly be improved, the limited Dev resources and having to innovate to stay competitive always limits ability to go about refactoring. So how can one go about addressing performance and other needs without a full blown effort across the entire team ? For us, the answer was aiCache - a Web caching and application acceleration product (aicache.com). The idea behind caching is simple - handle the requests before they ever hit your regular Apache<->JK
Introduction: This is a guest post by Barry Abrahamson, Chief Systems Wrangler at Automattic, and Nginx's Co-founder Andrew Alexeev. WordPress.com serves more than 33 million sites attracting over 339 million people and 3.4 billion pages each month. Since April 2008, WordPress.com has experienced about 4.4 times growth in page views. WordPress.com VIP hosts many popular sites including CNN’s Political Ticker, NFL, Time Inc’s The Page, People Magazine’s Style Watch, corporate blogs for Flickr and KROQ, and many more. Automattic operates two thousand servers in twelve globally distributed data centers. WordPress.com customer data is instantly replicated between different locations to provide an extremely reliable and fast web experience for hundreds of millions of visitors. Problem WordPress.com, which began in 2005, started on shared hosting, much like all of the WordPress.org sites. It was soon moved to a single dedicated server and then to two servers. In late 2005, WordPress.com
6 0.72376639 1401 high scalability-2013-02-06-Super Bowl Advertisers Ready for the Traffic? Nope..It's Lights Out.
8 0.69875675 1335 high scalability-2012-10-08-How UltraDNS Handles Hundreds of Thousands of Zones and Tens of Millions of Records
9 0.68583131 800 high scalability-2010-03-26-Strategy: Caching 404s Saved the Onion 66% on Server Time
10 0.68441635 1267 high scalability-2012-06-18-The Clever Ways Chrome Hides Latency by Anticipating Your Every Need
11 0.68410009 228 high scalability-2008-01-28-Product: ISPMan Centralized ISP Management System
12 0.68360299 1359 high scalability-2012-11-15-Gone Fishin': Justin.Tv's Live Video Broadcasting Architecture
13 0.68335193 796 high scalability-2010-03-16-Justin.tv's Live Video Broadcasting Architecture
14 0.68290591 1587 high scalability-2014-01-29-10 Things Bitly Should Have Monitored
15 0.67999727 138 high scalability-2007-10-30-Feedblendr Architecture - Using EC2 to Scale
16 0.67875159 1392 high scalability-2013-01-23-Building Redundant Datacenter Networks is Not For Sissies - Use an Outside WAN Backbone
17 0.67259169 788 high scalability-2010-03-04-How MySpace Tested Their Live Site with 1 Million Concurrent Users
18 0.67181969 290 high scalability-2008-03-28-How to Get DNS Names of a Web Server
19 0.66850978 987 high scalability-2011-02-10-Dispelling the New SSL Myth
20 0.66536736 533 high scalability-2009-03-11-The Implications of Punctuated Scalabilium for Website Architecture
topicId topicWeight
[(1, 0.127), (2, 0.153), (10, 0.041), (30, 0.077), (32, 0.012), (47, 0.029), (56, 0.025), (61, 0.065), (77, 0.02), (79, 0.089), (81, 0.191), (85, 0.016), (93, 0.032), (94, 0.041)]
simIndex simValue blogId blogTitle
1 0.88973051 540 high scalability-2009-03-16-Cisco and Sun to Compete for Unified Computing?
Introduction: A recent InfoWorld article claims that "With Cisco expected to enter the blade market and Sun expected to offer networking equipment, things could get interesting awfully fast." How does this affect your infrastructure strategy and decisions? Would you consider building scalable web applications on the Cisco Unified Computing System? Or would you consider building a router out of a server using OpenSolaris and Project Crossbow, as the article suggests? Will any of these initiatives change the way we build scalable web infrastructure, or are these just attempts to sell these systems? What do you think?
same-blog 2 0.8702246 773 high scalability-2010-02-06-GEO-aware traffic load balancing and caching at CNBC.com
3 0.831056 19 high scalability-2007-07-16-Paper: Replication Under Scalable Hashing
Introduction: Replication Under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution From the abstract: Typical algorithms for decentralized data distribution work best in a system that is fully built before it is first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (Replication Under Scalable Hashing), that maps replicated objects to a scalable collection of storage servers or disks. RUSH algorithms distribute objects to servers according to user-specified server weighting. While all RUSH variants support addition of servers to the system, different variants have different characteristics with respect to lookup time in petabyte-scale systems, performance with mirroring (as opposed to redundancy codes), and storage server removal. All RUSH variants redistribute as few objects as possible when new servers are added or existing servers
4 0.80212069 100 high scalability-2007-09-26-Use a CDN to Instantly Improve Your Website's Performance by 20% or More
Introduction: If you have a lot of static content to store and you aren't looking forward to setting up and maintaining your own giganto SAN, maybe you can push off a lot of the heavy lifting to a CDN? Jesse Robbins at O'Reilly Radar posts that you have a lot more options now because the number of Content Distribution Networks has doubled since last year. In fact, Dan Rayburn says there are now 28 CDN providers in the market. Hopefully you can find reasonable pricing at one of them. Other than easing your burden, why might a CDN work for you? Because it makes your site faster and customers like that. How can a CDN so dramatically improve your site's performance? Steve Souders, author of High Performance Web Sites: Essential Knowledge for Front-End Engineers, has using a CDN as one of his "Thirteen Simple Rules for Speeding Up Your Web Site." About CDNs Steve says: Remember that 80-90% of the end-user response time is spent downloading all the components in
5 0.79734021 133 high scalability-2007-10-26-How Gravatar scales on WordPress.com hardware
Introduction: Automattic recently purchased Gravatar and has switched the server onto its hosting platform. WordPress.com hosts over 1.7 million blogs, with well over 60,000 new posts submitted each day, generating 10 - 12 million page views per day. Barry on WordPress.com has a great post on the changes they've introduced to help Gravatar scale.
6 0.79163712 1375 high scalability-2012-12-21-Stuff The Internet Says On Scalability For December 21, 2012
7 0.77821553 985 high scalability-2011-02-08-Mollom Architecture - Killing Over 373 Million Spams at 100 Requests Per Second
8 0.77577257 1575 high scalability-2014-01-08-Under Snowden's Light Software Architecture Choices Become Murky
10 0.76715851 195 high scalability-2007-12-28-Amazon's EC2: Pay as You Grow Could Cut Your Costs in Half
11 0.75872606 1330 high scalability-2012-09-28-Stuff The Internet Says On Scalability For September 28, 2012
12 0.75772101 1450 high scalability-2013-05-01-Myth: Eric Brewer on Why Banks are BASE Not ACID - Availability Is Revenue
13 0.75678152 1109 high scalability-2011-09-02-Stuff The Internet Says On Scalability For September 2, 2011
14 0.75588894 576 high scalability-2009-04-21-What CDN would you recommend?
15 0.75503683 573 high scalability-2009-04-16-Serving 250M quotes-day at CNBC.com with aiCache
16 0.75415421 1011 high scalability-2011-03-25-Did the Microsoft Stack Kill MySpace?
17 0.75369471 881 high scalability-2010-08-16-Scaling an AWS infrastructure - Tools and Patterns
18 0.75367647 851 high scalability-2010-07-02-Hot Scalability Links for July 2, 2010
19 0.75364083 1476 high scalability-2013-06-14-Stuff The Internet Says On Scalability For June 14, 2013
20 0.75344503 857 high scalability-2010-07-13-DbShards Part Deux - The Internals