high_scalability high_scalability-2009 high_scalability-2009-573 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: As traffic to cnbc.com continued to grow, we found ourselves in an all-too-familiar situation: a big change in how things were done was in order, and the status quo was a road to nowhere. Spending on hardware, the space and power required to host additional servers, less-than-stellar response times, and having to resort to frequent "micro"-caching and similar tricks to improve code performance were all in plain sight and hard to ignore. While the code base could clearly be improved, limited dev resources and the need to innovate to stay competitive always limit the ability to refactor. So how can one address performance and other needs without a full-blown effort across the entire team? For us, the answer was aiCache, a Web caching and application acceleration product (aicache.com). The idea behind caching is simple - handle the requests before they ever hit your regular Apache<->JK
sentIndex sentText sentNum sentScore
1 Spending on hardware, the space and power required to host additional servers, less-than-stellar response times, and having to resort to frequent "micro"-caching and similar tricks to improve code performance were all in plain sight and hard to ignore. [sent-3, score-0.258]
2 For us, the answer was aiCache - a Web caching and application acceleration product (aicache. [sent-6, score-0.207]
3 The idea behind caching is simple - handle the requests before they ever hit your regular Apache<->JK<->Java<->Database response generation train (we're mostly a Java shop). [sent-8, score-0.306]
4 In our case we have many more caching sub-systems, aimed at speeding up access to stock and company-related information. [sent-10, score-0.195]
5 aiCache takes this basic idea of caching and front-ending the user traffic to your Web environment to a whole new level. [sent-16, score-0.267]
6 I don't believe any of aiCache's features are revolutionary in nature, rather it is the sheer number of features it offers that seems to address our every imaginable need. [sent-17, score-0.215]
7 In the interest of space, here are some quick facts about our experience with the product, in no particular order: · Runs on any Linux distro; our standard happens to be RedHat 5, 64-bit, on HP DL360G5 · Responses are cached in RAM, not on disk. [sent-7, score-0.43]
8 No latency for cached responses - stress tests show TTFB at 0 ms. [sent-21, score-0.569]
9 Extremely low resource utilization - aiCache servers serving in excess of 2000 req/sec are reported to be 99% idle! [sent-9, score-0.246]
10 Not being the trusting type, I verified the vendor's claim and stress tested these to about 25,000 req/sec per server - with load averages of about 2 (! [sent-10, score-0.276]
11 · We cache both GET and POST results, with query and parameter busting (selectively removing those semi-random parameters that complicate caching) · For user comments, we use response-driven expiration to refresh comment threads when a new comment is posted. [sent-25, score-0.283]
12 · Had a chance to use the site-fallback feature (where aiCache serves cached responses and shields origin servers from any traffic) to expedite service recovery · Used origin-server tagging a few times to get us out of code-deployment-gone-bad situations. [sent-12, score-0.956]
13 Have already downsized a number of production Web farms: having offloaded so much traffic from the origin server infrastructure, we see much lower resource utilization across Web, DB and other backend systems · Keynote reports a significant improvement in response times - about 30%. [sent-13, score-0.629]
14 You get to see req/sec, response time, number of good/bad origin servers, client and origin server connections, input and output BW and so on - all reported per cached sub-domain. [sent-30, score-0.987]
15 · Their CLI is something I like a lot too: you can see the inventory of responses, write out any response, expire responses, and report responses sorted by request, size, fill time, refreshes and so on, all in real time, with no log crunching required. [sent-15, score-0.401]
16 We use F5 load balancers and have configured the virtual IPs to have both aiCache servers and origin servers enabled at the same time. [sent-16, score-0.371]
17 Using F5's VIP priority feature, we direct all of the traffic to aiCache servers as long as at least one is available, but have the ability to fail over all of the traffic to origin servers, automatically or on demand. [sent-17, score-0.576]
18 It probably helped that I have experience with other caching products - going back to circa 2000, using Novell ICS. [sent-43, score-0.199]
19 But it all mostly boils down to knowing what URLs can be cached and for how long. [sent-44, score-0.239]
20 And lastly - when you want to stress test aiCache, make sure to hit it directly, by the server's IP - otherwise you will most likely melt down one or more of the other network infrastructure components! [sent-20, score-0.282]
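The two ideas the extracted sentences lean on most, parameter busting and knowing how long each URL can be cached, can be illustrated with a small sketch. This is not aiCache's implementation or configuration syntax, neither of which is shown in the source; the parameter names in BUSTED_PARAMS, the cache_key function and the TtlCache class are all hypothetical, and the sketch assumes responses are simply held in a Python dictionary in RAM.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical semi-random parameters that should not affect the cache key.
BUSTED_PARAMS = {"rand", "ts", "_"}


def cache_key(url, busted=BUSTED_PARAMS):
    """Build a cache key from the path plus the query parameters that remain
    after the busted ones are removed. Sorting makes the key independent of
    parameter order."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in busted
    )
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")


class TtlCache:
    """In-memory response cache where each entry has its own time to live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # cache key -> (expiry time, response body)

    def get(self, url):
        """Return the cached body, or None if it is missing or has expired."""
        key = cache_key(url)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, body = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return body

    def put(self, url, body, ttl_seconds):
        """Store a response body for ttl_seconds."""
        self._entries[cache_key(url)] = (self._clock() + ttl_seconds, body)

    def expire(self, url):
        """Drop an entry early, e.g. after a new comment is posted."""
        self._entries.pop(cache_key(url), None)
```

With this sketch, a request for /quotes?symbol=GE&rand=123 and one for /quotes?rand=456&symbol=GE map to the same key, so the second can be answered from memory without reaching the origin. The expire method stands in, very loosely, for the response-driven expiration mentioned for comment threads.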
wordName wordTfidf (topN-words)
[('aicache', 0.348), ('origin', 0.302), ('responses', 0.256), ('cached', 0.174), ('imaginable', 0.15), ('nbc', 0.15), ('stress', 0.139), ('traffic', 0.137), ('caching', 0.13), ('response', 0.111), ('reported', 0.098), ('infrastructures', 0.092), ('reporting', 0.085), ('surfacing', 0.08), ('bw', 0.08), ('cnbc', 0.08), ('expedite', 0.08), ('melt', 0.08), ('refreshes', 0.08), ('utilization', 0.079), ('product', 0.077), ('auxiliary', 0.075), ('shields', 0.075), ('uucp', 0.075), ('stumbled', 0.072), ('russia', 0.072), ('weighs', 0.072), ('blazing', 0.072), ('busier', 0.072), ('complicate', 0.072), ('sprinkled', 0.072), ('trusting', 0.072), ('comment', 0.071), ('distro', 0.069), ('snmp', 0.069), ('busting', 0.069), ('circa', 0.069), ('servers', 0.069), ('cli', 0.067), ('vip', 0.067), ('resort', 0.067), ('ratios', 0.065), ('sight', 0.065), ('sheer', 0.065), ('crunching', 0.065), ('aimed', 0.065), ('hw', 0.065), ('verified', 0.065), ('mostly', 0.065), ('lastly', 0.063)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999952 573 high scalability-2009-04-16-Serving 250M quotes-day at CNBC.com with aiCache
2 0.27482149 773 high scalability-2010-02-06-GEO-aware traffic load balancing and caching at CNBC.com
Introduction: CNBC, like many large web sites, relied on a CDN for content delivery. Recently, we started looking to see if we could improve this model. Our criteria were: - improve response time - have better control over traffic (real time reporting, change management and alerting) - better utilize internal datacenters and their infrastructure - shield users from any troubles at the origin infrastructure - cost out After researching the market, we turned to two vendors: Dyn (Dynamic Network Services) and aiScaler. We have had about a year's worth of experience with aiScaler (search for "CNBC" to see my previous post), but Dyn was a new vendor for us. We started building our relationship at the Velocity conference in the summer of 2009. Dyn has recently started offering a geo-aware DNS load balancing solution, using Anycast and the distributed nature of their DNS presence to enable a key component of what we were trying to achieve: steer users to geographically closest origin
Introduction: This is a guest post by Dave Hagler, Systems Architect at AOL. The AOL homepages receive more than 8 million visitors per day. That’s more daily viewers than Good Morning America or the Today Show on television. Over a billion page views are served each month. AOL.com has been a major internet destination since 1996, and still has a strong following of loyal users. The architecture for AOL.com is in its 5th generation. It has essentially been rebuilt from scratch 5 times over two decades. The current architecture was designed 6 years ago. Pieces have been upgraded and new components have been added along the way, but the overall design remains largely intact. The code, tools, development and deployment processes are highly tuned over 6 years of continual improvement, making the AOL.com architecture battle tested and very stable. The engineering team is made up of developers, testers, and operations and totals around 25 people. The majority are in Dulles, Virginia
Introduction: Who's Hiring? Torbit is hiring ! Care about performance? Care about making the internet faster and better? At Torbit we use lots of Golang, Node.js, JavaScript and PHP to solve big challenges. Fun and Informative Events GigaSpaces Upcoming Events: webinar on Transactional Cross-Site Data Replication , CloudCamp lightning talk , A Groovy Kind of Java , Cloud Computing World Forum , QCon . Cool Products and Services New Relic - real user monitoring optimize for humans, not bots. Live application stats, SQL/NoSQL performance, web transactions, proactive notifications. Take 2 minutes to sign up for a free trial. NetDNA , a Tier-1 GlobalContent Delivery Network, offers a Dual-CDN strategy which allows companies to utilize a redundant infrastructure while leveraging the advantages of multiple CDNs to reduce costs. Digital Ocean is a Simple Cloud Hosting platform that offers Free Unlimited Bandwidth and Virtual Servers from $10 per month. Sign up f
Introduction: Who's Hiring? Torbit is hiring ! Care about performance? Care about making the internet faster and better? At Torbit we use lots of Golang, Node.js, JavaScript and PHP to solve big challenges. Fun and Informative Events GigaSpaces Upcoming Events: webinar on Transactional Cross-Site Data Replication , CloudCamp lightning talk , A Groovy Kind of Java , Cloud Computing World Forum , QCon . O'Reilly Velocity , the Web Performance and Operations conference, is happening in Santa Clara, CA from June 25-27. Learn from your peers, exchange ideas with experts, and share best practices and lessons learned. Register here . Sign up for this free 30-minute webinar exploring how new technology can determine which ads have been seen by users and will discuss the C3 Metrics Labs analysis of over 2 billion impressions. Cool Products and Services NetDNA , a Tier-1 GlobalContent Delivery Network, offers a Dual-CDN strategy which allows companies to utilize a re
topicId topicWeight
[(0, 0.234), (1, 0.028), (2, -0.078), (3, -0.108), (4, -0.032), (5, -0.051), (6, 0.042), (7, 0.016), (8, -0.042), (9, 0.012), (10, 0.009), (11, -0.019), (12, -0.013), (13, 0.013), (14, 0.008), (15, -0.017), (16, 0.059), (17, -0.005), (18, -0.017), (19, -0.046), (20, -0.013), (21, 0.009), (22, 0.036), (23, -0.002), (24, 0.023), (25, 0.024), (26, -0.025), (27, 0.001), (28, -0.046), (29, -0.042), (30, -0.021), (31, 0.035), (32, 0.011), (33, 0.023), (34, 0.025), (35, 0.025), (36, 0.008), (37, 0.027), (38, 0.002), (39, 0.013), (40, 0.022), (41, -0.002), (42, 0.015), (43, -0.01), (44, -0.034), (45, -0.015), (46, 0.01), (47, 0.019), (48, 0.004), (49, 0.054)]
simIndex simValue blogId blogTitle
same-blog 1 0.97198713 573 high scalability-2009-04-16-Serving 250M quotes-day at CNBC.com with aiCache
Introduction: This is a guest post by Dave Hagler, Systems Architect at AOL. The AOL homepages receive more than 8 million visitors per day. That’s more daily viewers than Good Morning America or the Today Show on television. Over a billion page views are served each month. AOL.com has been a major internet destination since 1996, and still has a strong following of loyal users. The architecture for AOL.com is in its 5th generation. It has essentially been rebuilt from scratch 5 times over two decades. The current architecture was designed 6 years ago. Pieces have been upgraded and new components have been added along the way, but the overall design remains largely intact. The code, tools, development and deployment processes are highly tuned over 6 years of continual improvement, making the AOL.com architecture battle tested and very stable. The engineering team is made up of developers, testers, and operations and totals around 25 people. The majority are in Dulles, Virginia
3 0.81241977 1401 high scalability-2013-02-06-Super Bowl Advertisers Ready for the Traffic? Nope..It's Lights Out.
Introduction: Advertising for the Super Bowl is bigger than the game for many viewers. So you gotta figure advertisers are ready for the traffic bursts generated by their expensive ads? Not exactly... Yottaa reports an amazing 13 advertiser websites crashed during the Super Bowl. Coke was interactively au courant, asking viewers to vote for the ending of a commercial, but load times went to 62 seconds. SodaStream, Calvin Klein, Axe, Got Milk?, The Walking Dead, many movie sites, and many car sites were all flagged with delay of fame penalties. Lots of time, money, and creative energy is spent lovingly perfecting every detail of these commercials. It won't be a surprise to any programmer that this can't usually be said of the follow-through on the backend. So what can you do? Yottaa has some good tips and Michael Hamrah has a wonderful post on dealing with the Super Bowl Burst Problem: Yottaa's tips: Reduce the number of assets and asset weight to create smaller, more lightweight page
4 0.81209564 773 high scalability-2010-02-06-GEO-aware traffic load balancing and caching at CNBC.com
Introduction: CNBC, like many large web sites, relied on a CDN for content delivery. Recently, we started looking to see if we could improve this model. Our criteria were: - improve response time - have better control over traffic (real time reporting, change management and alerting) - better utilize internal datacenters and their infrastructure - shield users from any troubles at the origin infrastructure - cost out After researching the market, we turned to two vendors: Dyn (Dynamic Network Services) and aiScaler. We have had about a year's worth of experience with aiScaler (search for "CNBC" to see my previous post), but Dyn was a new vendor for us. We started building our relationship at the Velocity conference in the summer of 2009. Dyn has recently started offering a geo-aware DNS load balancing solution, using Anycast and the distributed nature of their DNS presence to enable a key component of what we were trying to achieve: steer users to geographically closest origin
5 0.79716372 985 high scalability-2011-02-08-Mollom Architecture - Killing Over 373 Million Spams at 100 Requests Per Second
Introduction: Mollom is one of those cool SaaS companies every developer dreams of creating when they wrack their brains looking for a viable software-as-a-service startup. Mollom profitably runs a useful service— spam filtering —with a small group of geographically distributed developers. Mollom helps protect nearly 40,000 websites from spam, including one of mine , which is where I first learned about Mollom. In a desperate attempt to stop spam on a Drupal site, where every other form of CAPTCHA had failed miserably, I installed Mollom in about 10 minutes and it immediately started working. That's the out of the box experience I was looking for. From the time Mollom opened its digital inspection system they've rejected over 373 million spams and in the process they've learned that a stunning 90% of all messages are spam. This spam torrent is handled by only two geographically distributed machines that handle 100 requests/ second, each running a Java application server and Cassandra. So few res
6 0.79568988 800 high scalability-2010-03-26-Strategy: Caching 404s Saved the Onion 66% on Server Time
7 0.76612347 808 high scalability-2010-04-12-Poppen.de Architecture
8 0.76490164 821 high scalability-2010-05-03-MocoSpace Architecture - 3 Billion Mobile Page Views a Month
9 0.75870907 942 high scalability-2010-11-15-Strategy: Biggest Performance Impact is to Reduce the Number of HTTP Requests
10 0.75592977 136 high scalability-2007-10-28-Scaling Early Stage Startups
11 0.7500506 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
12 0.74666554 998 high scalability-2011-03-03-Stack Overflow Architecture Update - Now at 95 Million Page Views a Month
13 0.74460417 1438 high scalability-2013-04-10-Check Yourself Before You Wreck Yourself - Avocado's 5 Early Stages of Architecture Evolution
14 0.74456006 1333 high scalability-2012-10-04-LinkedIn Moved from Rails to Node: 27 Servers Cut and Up to 20x Faster
15 0.74389356 987 high scalability-2011-02-10-Dispelling the New SSL Myth
16 0.74201971 884 high scalability-2010-08-23-6 Ways to Kill Your Servers - Learning How to Scale the Hard Way
17 0.73910755 150 high scalability-2007-11-12-Slashdot Architecture - How the Old Man of the Internet Learned to Scale
19 0.73793888 1284 high scalability-2012-07-16-Cinchcast Architecture - Producing 1,500 Hours of Audio Every Day
20 0.73790407 1336 high scalability-2012-10-09-Batoo JPA - The new JPA Implementation that runs over 15 times faster...
topicId topicWeight
[(1, 0.175), (2, 0.203), (10, 0.04), (11, 0.016), (30, 0.074), (40, 0.027), (56, 0.02), (61, 0.06), (77, 0.024), (79, 0.085), (85, 0.028), (93, 0.148), (94, 0.035)]
simIndex simValue blogId blogTitle
1 0.95202744 403 high scalability-2008-10-06-Paper: Scaling Genome Sequencing - Complete Genomics Technology Overview
Introduction: Although the problem of scaling human genome sequencing is not exactly about building bigger, faster and more reliable websites it is most interesting in terms of scalability. The paper describes a new technology by the startup company Complete Genomics to sequence the full human genome for the fraction of the cost of earlier possibilities. Complete Genomics is building the world’s largest commercial human genome sequencing center to provide turnkey, outsourced complete human genome sequencing to customers worldwide. By 2010, their data center will contain approximately 60,000 processors with 30 petabytes of storage running their sequencing software on Linux clusters. Do you find this interesting and relevant to HighScalability.com?
2 0.93230927 1450 high scalability-2013-05-01-Myth: Eric Brewer on Why Banks are BASE Not ACID - Availability Is Revenue
Introduction: In NoSQL: Past, Present, Future Eric Brewer has a particularly fine section on explaining the often hard to understand ideas of BASE (Basically Available, Soft State, Eventually Consistent), ACID (Atomicity, Consistency, Isolation, Durability), CAP (Consistency Availability, Partition Tolerance), in terms of a pernicious long standing myth about the sanctity of consistency in banking. Myth : Money is important, so banks must use transactions to keep money safe and consistent, right? Reality : Banking transactions are inconsistent, particularly for ATMs. ATMs are designed to have a normal case behaviour and a partition mode behaviour. In partition mode Availability is chosen over Consistency. Why? 1) Availability correlates with revenue and consistency generally does not. 2) Historically there was never an idea of perfect communication so everything was partitioned. Your ATM transaction must go through so Availability is more important than
same-blog 3 0.93184382 573 high scalability-2009-04-16-Serving 250M quotes-day at CNBC.com with aiCache
4 0.92367119 1330 high scalability-2012-09-28-Stuff The Internet Says On Scalability For September 28, 2012
Introduction: It's HighScalability Time: Quotable Quotes: @dbasch : The world is full of "scalability engineers" who would die from an orgasm if their software ever saw 10,000 requests in a day. @mtnygard : “Scaling issues are always expressed as a queue backing up somewhere.” —@moonpolysoft #strangeloop @rbranson : If your data fits in main memory, you're doing it wrong. #strangeloop @peakscale : Using schemaless DBs an "overreaction" & "confuses the poor impl. of schemas with the value that schemas provide" @adrianco : GM: Performance analysis is complicated by your brain thinking LINEARLY about a computer system that is NONLINEAR. @littleidea : it's better to have infinite scalability and not need it, than to need infinite scalability and not have it Looks like Google is on the right track with their language understanding efforts. How hierarchical is language use : In this paper, we review evidence from the recen
5 0.91316289 1198 high scalability-2012-02-24-Stuff The Internet Says On Scalability For February 24, 2012
Introduction: This is not your father's HighScalability: 13,000 times the world’s GDP : Cost of the Death Star Quotable quotes: @chrissalzman : Scalability is the enemy of right now. @resatsch : I like our IT team: "We used Redis before Youporn did it" @virtual_bill : Mixing flash and spinning disk to balance cost is like strapping a rocket to a turtle. @jaksprats : HDDs got slower at random access as they got bigger, cuz disk seeks stayed almost the same, similar phenomenon w/ Flash Priam, king of Troy, begat a daughter, Cassandra, and Netflix, king of true distributed Amazon infrastructure, begat a co-processor for Cassandra, Priam , used for Backup and recovery, Bootstrapping, Centralized configuration management, and RESTful monitoring and metrics. This is why Troy was never actually destroyed, it was simply backedup in-situ to another region. Evernote is everfaithful to SQL because SQL gives it all the ACID it needs to keep its billion Note
6 0.9072926 168 high scalability-2007-11-30-Strategy: Efficiently Geo-referencing IPs
7 0.90578079 349 high scalability-2008-07-10-Can cloud computing smite down evil zombie botnet armies?
8 0.90056241 1513 high scalability-2013-09-06-Stuff The Internet Says On Scalability For September 6, 2013
9 0.90021622 944 high scalability-2010-11-17-Some Services are More Equal than Others
10 0.89529032 1637 high scalability-2014-04-25-Stuff The Internet Says On Scalability For April 25th, 2014
11 0.88344592 58 high scalability-2007-08-04-Product: Cacti
12 0.88125426 788 high scalability-2010-03-04-How MySpace Tested Their Live Site with 1 Million Concurrent Users
13 0.88002557 1011 high scalability-2011-03-25-Did the Microsoft Stack Kill MySpace?
16 0.87804765 1533 high scalability-2013-10-16-Interview With Google's Ilya Grigorik On His New Book: High Performance Browser Networking
18 0.87605149 1279 high scalability-2012-07-09-Data Replication in NoSQL Databases
19 0.87558919 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
20 0.87544096 195 high scalability-2007-12-28-Amazon's EC2: Pay as You Grow Could Cut Your Costs in Half