high_scalability high_scalability-2008 high_scalability-2008-241 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Hi, We're running an enterprise SaaS solution that currently holds about 700 customers with up to 50,000 users per customer (growing quickly). Our customers have SLA agreements with us that contain guaranteed uptimes, response times and other performance counters. With an increasing number of customers and traffic, we find it difficult to provide our customers with actual SLA data. We could set up external probes that monitor certain parts of the application, but this is time-consuming with 700 customers (we do it today for our biggest clients). We can also extract data from web logs, but they are now approaching 30-40 GB a day. What we really need is monitoring software that not only focuses on the internal performance counters but also lets us see the application from the customer's viewpoint and allows us to aggregate data in different ways. Would the best approach be to develop a custom solution (for instance a distributed app that aggregates data from different logs e
sentIndex sentText sentNum sentScore
1 Hi, We're running an enterprise SaaS solution that currently holds about 700 customers with up to 50. [sent-1, score-0.754]
2 Our customers have SLA agreements with us that contain guaranteed uptimes, response times and other performance counters. [sent-3, score-1.02]
3 With an increasing number of customers and traffic, we find it difficult to provide our customers with actual SLA data. [sent-4, score-0.782]
4 We could set up external probes that monitor certain parts of the application, but this is time-consuming with 700 customers (we do it today for our biggest clients). [sent-5, score-1.314]
5 We can also extract data from web logs, but they are now approaching 30-40 GB a day. [sent-6, score-0.594]
6 What we really need is monitoring software that not only focuses on the internal performance counters but also lets us see the application from the customer's viewpoint and allows us to aggregate data in different ways. [sent-7, score-1.375]
7 Would the best approach be to develop a custom solution (for instance, a distributed app that aggregates data from different logs every night and stores it in a data warehouse), or are there products out there that are suitable for a high-scalability environment? [sent-8, score-1.269]
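The custom-solution option mentioned in the question (aggregating log data per customer on a nightly schedule) could look roughly like the sketch below. It is only an illustration under stated assumptions: the log line layout (`timestamp customer_id status latency_ms`), the function names, and the choice of success ratio and 95th-percentile latency as SLA metrics are all hypothetical and are not taken from the original post.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CustomerStats:
    requests: int = 0
    errors: int = 0  # responses with a 5xx status code
    latencies_ms: list = field(default_factory=list)


def parse_line(line):
    """Parse 'timestamp customer_id status latency_ms' (hypothetical format).

    Returns None for lines that do not match, so they can be skipped.
    """
    parts = line.split()
    if len(parts) != 4:
        return None
    _, customer_id, status, latency_ms = parts
    try:
        return customer_id, int(status), float(latency_ms)
    except ValueError:
        return None


def aggregate(lines):
    """Group request counts, error counts and latencies by customer."""
    stats = defaultdict(CustomerStats)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        customer_id, status, latency_ms = parsed
        entry = stats[customer_id]
        entry.requests += 1
        if status >= 500:
            entry.errors += 1
        entry.latencies_ms.append(latency_ms)
    return stats


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def sla_report(stats):
    """Build one summary row per customer, ready to load into a warehouse."""
    return {
        customer_id: {
            "requests": s.requests,
            "success_ratio": (s.requests - s.errors) / s.requests,
            "p95_ms": percentile(s.latencies_ms, 95),
        }
        for customer_id, s in stats.items()
    }


sample = [
    "2008-02-05T10:00:00 acme 200 120",
    "2008-02-05T10:00:01 acme 500 900",
    "2008-02-05T10:00:02 globex 200 80",
    "garbage line",
]
report = sla_report(aggregate(sample))
```

Keeping every latency in memory is fine for a sample like this, but at 30-40 GB of logs a day a real job would more likely split the work across machines and use fixed-size histograms per customer instead of raw lists.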
wordName wordTfidf (topN-words)
[('customers', 0.353), ('sla', 0.308), ('probes', 0.237), ('logs', 0.207), ('agreements', 0.204), ('aggregates', 0.18), ('appreciated', 0.174), ('approaching', 0.174), ('customer', 0.158), ('consuming', 0.148), ('holds', 0.148), ('focuses', 0.148), ('monitors', 0.146), ('us', 0.145), ('suitable', 0.142), ('extract', 0.142), ('guaranteed', 0.14), ('warehouse', 0.133), ('counters', 0.131), ('greatly', 0.131), ('night', 0.118), ('input', 0.117), ('lets', 0.113), ('contains', 0.112), ('aggregate', 0.112), ('solution', 0.104), ('gb', 0.103), ('actual', 0.103), ('external', 0.101), ('biggest', 0.092), ('saas', 0.089), ('clients', 0.089), ('internal', 0.088), ('custom', 0.087), ('increasing', 0.087), ('certain', 0.086), ('develop', 0.082), ('parts', 0.082), ('difficult', 0.081), ('currently', 0.075), ('growing', 0.074), ('enterprise', 0.074), ('environment', 0.073), ('data', 0.071), ('quickly', 0.071), ('instance', 0.071), ('different', 0.069), ('today', 0.069), ('products', 0.067), ('response', 0.066)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 241 high scalability-2008-02-05-SLA monitoring
2 0.13925485 1289 high scalability-2012-07-23-State of the CDN: More Traffic, Stable Prices, More Products, Profits - Not So Much
Introduction: CDNs (content delivery networks) are the secret shadow superpowers behind the web, and Dan Rayburn at streamingmedia.com is the go-to investigative reporter for quality information on CDNs. Every year Dan has a Content Delivery Summit on all things CDN, and those videos are now available. Dan also gives a kind of state-of-the-industry talk where he does something wonderful: he gives real numbers and prices. Dan really knows his stuff and is an excellent speaker, so watch the video, but here’s my gloss on the state of the CDN so far this year: Massive growth. Large customers are expecting 126% growth in video traffic over last year; medium-sized customers are seeing 48% traffic growth; small-sized customers are seeing 73.3% traffic growth. More traffic != More profit. Traffic growth doesn’t lead to more profit because the traffic growth is concentrated in larger customers that can make the best deals. Video takes up the largest amount of traffic on a
3 0.12790094 38 high scalability-2007-07-30-Build an Infinitely Scalable Infrastructure for $100 Using Amazon Services
Introduction: Can you really create an infinitely scalable infrastructure for less than $100 using Amazon's storage, grid, and queuing services platform? It appears so, at least for the right application. Amazon beams a spotlight on the future battle of the roll-your-own versus the connect-the-dots approach to building next generation websites using core external services. Their argument is strong. Using Amazon's platform you can quickly build an infrastructure that would otherwise take an eternity to make, a pile of money to create, and an unbounded mass of people to implement and maintain. Yet Amazon doesn't provide SLAs, so can you really trust them with your crown jewels? Facebook recently leapfrogged Amazon's vision with an even more comprehensive set of services. The battle for the future is on. Site: http://aws.amazon.com/ Information Sources Slides: Building Highly Scalable Web Applications Podcast: Technometria: Amazon Web Services Amazon Services Home . Platform
4 0.12551667 1082 high scalability-2011-07-18-New Relic Architecture - Collecting 20+ Billion Metrics a Day
Introduction: This is a guest post by Brian Doll , Application Performance Engineer at New Relic. New Relic’s multitenant, SaaS web application monitoring service collects and persists over 100,000 metrics every second on a sustained basis, while still delivering an average page load time of 1.5 seconds. We believe that good architecture and good tools can help you handle an extremely large amount of data while still providing extremely fast service. Here we'll show you how we do it. New Relic is Application Performance Management (APM) as a Service In-app agent instrumentation (bytecode instrumentation, etc.) Support for 5 programming languages (Ruby, Java, PHP, .NET, Python) 175,000+ app processes monitored globally 10,000+ customers The Stats 20+ Billion application metrics collected every day 1.7+ Billion web page metrics collected every week Each "timeslice" metric is about 250 bytes 100k timeslice records inserted every second 7 Billion new rows of d
5 0.12513138 204 high scalability-2008-01-08-Virus Scanning for Uploaded content
Introduction: All, What is the best way to scan the content being uploaded by users? Is there any open source solution available to do that? How do YouTube, Flickr and other sites with user-uploaded content handle this? Any insight would be greatly appreciated! Regards, Janakan Rajendran.
8 0.11826557 663 high scalability-2009-07-28-37signals Architecture
9 0.1161874 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
16 0.1076032 96 high scalability-2007-09-18-Amazon Architecture
17 0.10537437 691 high scalability-2009-08-31-Squarespace Architecture - A Grid Handles Hundreds of Millions of Requests a Month
20 0.098637514 236 high scalability-2008-02-03-Ideas on how to scale a shared inventory database???
topicId topicWeight
[(0, 0.166), (1, 0.001), (2, -0.024), (3, -0.016), (4, 0.002), (5, -0.028), (6, 0.051), (7, -0.058), (8, 0.024), (9, -0.014), (10, 0.005), (11, 0.008), (12, 0.025), (13, -0.026), (14, 0.043), (15, 0.018), (16, -0.01), (17, 0.013), (18, 0.014), (19, -0.024), (20, -0.011), (21, 0.031), (22, 0.029), (23, 0.019), (24, 0.044), (25, -0.042), (26, -0.075), (27, -0.053), (28, 0.044), (29, -0.0), (30, 0.064), (31, -0.01), (32, 0.008), (33, 0.018), (34, -0.061), (35, 0.007), (36, -0.02), (37, -0.006), (38, 0.012), (39, -0.015), (40, 0.008), (41, -0.016), (42, -0.004), (43, -0.012), (44, -0.014), (45, 0.027), (46, -0.005), (47, 0.046), (48, -0.021), (49, -0.055)]
simIndex simValue blogId blogTitle
same-blog 1 0.94762731 241 high scalability-2008-02-05-SLA monitoring
2 0.72794783 1082 high scalability-2011-07-18-New Relic Architecture - Collecting 20+ Billion Metrics a Day
Introduction: This is a guest post by Brian Doll , Application Performance Engineer at New Relic. New Relic’s multitenant, SaaS web application monitoring service collects and persists over 100,000 metrics every second on a sustained basis, while still delivering an average page load time of 1.5 seconds. We believe that good architecture and good tools can help you handle an extremely large amount of data while still providing extremely fast service. Here we'll show you how we do it. New Relic is Application Performance Management (APM) as a Service In-app agent instrumentation (bytecode instrumentation, etc.) Support for 5 programming languages (Ruby, Java, PHP, .NET, Python) 175,000+ app processes monitored globally 10,000+ customers The Stats 20+ Billion application metrics collected every day 1.7+ Billion web page metrics collected every week Each "timeslice" metric is about 250 bytes 100k timeslice records inserted every second 7 Billion new rows of d
3 0.66478753 944 high scalability-2010-11-17-Some Services are More Equal than Others
Introduction: Remember when the iPhone launched? Remember the complaints about the device not maintaining calls well? Was it really the hardware? Or was it the service provider network, overwhelmed by not just the call volume but millions of hyper-customers experimenting with their new toy? Look – a video! Look a video and a call. Hey, I’m on Facebook, Twitter, YouTube, and streaming audio at the same time I’m making a call! How awesome is that? Meanwhile, there’s an entire army of operators at a service provider’s NOC who are stalking through the data center with scissors because it’s the only way to stop the madness. Service providers, probably better than any other, understand “services”. For longer than the enterprise has been talking about them, service providers have been implementing them. They’ve got their own set of standards and reference architectures and even language to describe them, but in a nutshell that’s what a service provider does: offers services. The proble
4 0.65655369 663 high scalability-2009-07-28-37signals Architecture
Introduction: Update 7: Basecamp, now with more vroom. Basecamp application servers running Ruby code were upgraded and virtualization was removed. The result: A 66% reduction in the response time while handling multiples of the traffic is beyond what I expected. They still use virtualization (Linux KVM), just less of it now. Update 6: Things We’ve Learned at 37Signals. Themes: less is more; don't worry be happy. Update 5: Nuts & Bolts: HAproxy. Nice explanation (post, screencast) by Mark Imbriaco of why HAProxy (load balancing proxy server) is their favorite (fast, efficient, graceful configuration, queues requests when Mongrels are busy) for spreading dynamic content between Apache web servers and Mongrel application servers. Update 4: O'Reilly's Tim O'Brien interviews David Hansson, Rails creator and 37signals partner. Says BaseCamp scales horizontally on the application and web tier. Scales up for the database, using one "big ass" 128GB machine. Says: As technology moves on,
5 0.65267193 1090 high scalability-2011-08-01-Peecho Architecture - scalability on a shoestring
Introduction: This is a guest post by Marcel Panse and Sander Nagtegaal from Peecho . Although architecture descriptions are an interesting read, the problems that start-ups face are hardly ever addressed. We would like to change that, so here is our architecture story. Introducing a start-up The Amsterdam-based company Peecho offers print-as-a-service. Our embeddable print button allows you to sell your digital content as professionally printed products, like photo books, magazines or canvases - straight from your own website. There is an API, too. Printcloud is the system that powers the print button. It exists in the cloud only, growing when needed and becoming smaller if it can. The system takes in print orders, magically transforms tough data into print-ready files and routes the orders to the production facility that is closest to the intended recipient. To preserve the environment, Peecho's philosophy is to facilitate global ordering, but to aim for local production on
6 0.64846152 1557 high scalability-2013-12-02-Evolution of Bazaarvoice’s Architecture to 500M Unique Users Per Month
7 0.64066797 250 high scalability-2008-02-17-Web Accelerators - snake oil or miracle remedy?
8 0.63986701 1331 high scalability-2012-10-02-An Epic TripAdvisor Update: Why Not Run on the Cloud? The Grand Experiment.
10 0.62841725 1068 high scalability-2011-06-27-TripAdvisor Architecture - 40M Visitors, 200M Dynamic Page Views, 30TB Data
11 0.62826949 1521 high scalability-2013-09-23-Salesforce Architecture - How they Handle 1.3 Billion Transactions a Day
12 0.62003529 1482 high scalability-2013-06-26-Leveraging Cloud Computing at Yelp - 102 Million Monthly Vistors and 39 Million Reviews
13 0.61971825 1485 high scalability-2013-07-01-PRISM: The Amazingly Low Cost of Using BigData to Know More About You in Under a Minute
14 0.61470252 195 high scalability-2007-12-28-Amazon's EC2: Pay as You Grow Could Cut Your Costs in Half
15 0.61104816 881 high scalability-2010-08-16-Scaling an AWS infrastructure - Tools and Patterns
17 0.60725486 38 high scalability-2007-07-30-Build an Infinitely Scalable Infrastructure for $100 Using Amazon Services
18 0.60424596 1260 high scalability-2012-06-07-Case Study on Scaling PaaS infrastructure
19 0.60204351 1362 high scalability-2012-11-26-BigData using Erlang, C and Lisp to Fight the Tsunami of Mobile Data
20 0.60094947 1508 high scalability-2013-08-28-Sean Hull's 20 Biggest Bottlenecks that Reduce and Slow Down Scalability
topicId topicWeight
[(1, 0.217), (2, 0.137), (94, 0.532)]
simIndex simValue blogId blogTitle
Introduction: The reference configurations described in this blueprint are starting points for building Sun Customer Ready HPC Clusters configured with Sun Fire X2100 M2 and X2200 M2 servers. The configurations define how Sun Systems Group products can be configured in a typical grid rack deployment. This document describes configurations in detail using Sun Fire X2100 M2 and X2200 M2 servers with a Gigabit Ethernet data fabric, as well as configurations using Sun Fire X2200 M2 servers with a high-speed InfiniBand fabric. These configurations focus on single rack solutions, with external connections through uplink ports of the switches. These reference configurations have been architected using Sun's expertise gained in actual, real-world installations. Within certain constraints, as described in the later sections, the system can be tailored to the customer needs. Certain system components described in this document are only available through Sun's factory integration. Although the information
2 0.98884064 559 high scalability-2009-04-07-Six Lessons Learned Deploying a Large-scale Infrastructure in Amazon EC2
Introduction: Lessons learned from OpenX's large-scale deployment to Amazon EC2: Expect failures; what's more, embrace them. Fully automate your infrastructure deployments. Design your infrastructure so that it scales horizontally. Establish clear, measurable goals. Be prepared to quickly identify and eliminate bottlenecks. Play whack-a-mole for a while, until things get stable.
3 0.97809464 605 high scalability-2009-05-22-Distributed content system with bandwidth balancing
Introduction: I am looking for a way to distribute files over servers in different physical locations. My main concern is that I have bandwidth limitations at each location, and I wish to spread the bandwidth load evenly. At the moment I just have 1:1 copies of the files on all servers, and have the application pick a random server to serve the file as a temporary fix. It's a small video streaming service. I want to spoon-feed the stream to the client with a maximum bandwidth output, and support seek. At present I use PHP to limit the network stream, and read the file at a given offset sent as a GET parameter from the player for seek. It's pseudo-streaming, but it works. I have been looking at MogileFS, which would solve the storage part. With MogileFS I can make use of my current PHP solution, as it supports lighttpd and Apache (with mod_rewrite or similar). However, I don't see how I can apply MogileFS to check for bandwidth % usage. Any recommendations for how I can solve this?
4 0.9652704 115 high scalability-2007-10-07-Using ThreadLocal to pass context information around in web applications
Introduction: Hi, In Java web servers, each HTTP request is handled by a thread in a thread pool. So for a Servlet handling the request, a thread is assigned. It is tempting (and very convenient) to keep context information in a ThreadLocal variable. I recently had a requirement where we needed to assign the logged-in user id and a timestamp to requests sent to web services. Because we already had the code in place, it was extremely difficult to change the method signatures to pass the user id everywhere. The solution I thought of is class ReferenceIdGenerator { public static void setReferenceId(String login) { threadLocal.set(login + System.currentTimeMillis()); } public static String getReferenceId() { return threadLocal.get(); } private static ThreadLocal<String> threadLocal = new ThreadLocal<String>(); } class MyServlet { void service(.....) { HttpSession session = request.getSession(false); String userId = (String) session.getAttribute("userId"); ReferenceIdGenerator.setReferenceId(userId
5 0.92296982 1305 high scalability-2012-08-16-Paper: A Provably Correct Scalable Concurrent Skip List
Introduction: In MemSQL Architecture we learned one of the core strategies MemSQL uses to achieve their need for speed is lock-free skip lists. Skip lists are used to efficiently handle range queries. Making the skip-lists lock-free helps eliminate contention and make writes fast. If this all sounds a little pie-in-the-sky then here's a very good paper on the subject that might help make it clearer: A Provably Correct Scalable Concurrent Skip List . From the abstract: We propose a new concurrent skip list algorithm distinguished by a combination of simplicity and scalability. The algorithm employs optimistic synchronization, searching without acquiring locks, followed by short lock-based validation before adding or removing nodes. It also logically removes an item before physically unlinking it. Unlike some other concurrent skip list algorithms, this algorithm preserves the skiplist properties at all times, which facilitates reasoning about its correctness. Experimental evidence shows that
6 0.91336036 1601 high scalability-2014-02-25-Peter Norvig's 9 Master Steps to Improving a Program
same-blog 7 0.89403462 241 high scalability-2008-02-05-SLA monitoring
8 0.87008786 834 high scalability-2010-06-01-Web Speed Can Push You Off of Google Search Rankings! What Can You Do?
10 0.8365382 91 high scalability-2007-09-13-Design Preparations for Scaling
11 0.8289488 39 high scalability-2007-07-30-Product: Akamai
12 0.81622201 1412 high scalability-2013-02-25-SongPop Scales to 1 Million Active Users on GAE, Showing PaaS is not Passé
13 0.81101263 970 high scalability-2011-01-06-BankSimple Mini-Architecture - Using a Next Generation Toolchain
14 0.80588901 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010
15 0.78993034 1025 high scalability-2011-04-16-The NewSQL Market Breakdown
16 0.78735423 78 high scalability-2007-09-01-2 tier switch selection for colocation
17 0.77660573 1084 high scalability-2011-07-22-Stuff The Internet Says On Scalability For July 22, 2011
18 0.75704169 44 high scalability-2007-07-30-Product: Photobucket
19 0.73122412 1223 high scalability-2012-04-06-Stuff The Internet Says On Scalability For April 6, 2012
20 0.72914296 1174 high scalability-2012-01-13-Stuff The Internet Says On Scalability For January 13, 2012