high_scalability high_scalability-2009 high_scalability-2009-553 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: It's been a while since I've said anything about collectl and I wanted to let this group know I'm currently working on an interface to ganglia, since I've seen a variety of posts ranging from how much data to log and where to log it to which tools/mechanisms to use for logging. From my perspective there are essentially 2 camps on the monitoring front - one says to have distributed agents all sending their data to a central point, but don't send too much or too often. The other camp (which is the one I'm in) says do it all locally with a highly efficient data collector, because you need a lot of data (I also read a post in here about logging everything) and you can't possibly monitor 100s or 1000s of nodes remotely at the granularity necessary to get anything meaningful. Enter collectl and its evolving interface for ganglia. This will allow you to log lots of detailed data on local nodes at the usual 10 sec interval (or more frequently if you prefer) at about 0.1% system overhead while sending a subset at a lower rate to the ganglia gmonds.
sentIndex sentText sentNum sentScore
1 It's been a while since I've said anything about collectl and I wanted to let this group know I'm currently working on an interface to ganglia, since I've seen a variety of posts ranging from how much data to log and where to log it to which tools/mechanisms to use for logging. [sent-1, score-2.296]
2 From my perspective there are essentially 2 camps on the monitoring front - one says to have distributed agents all sending their data to a central point, but don't send too much or too often. [sent-2, score-0.984]
3 Enter collectl and its evolving interface for ganglia. [sent-4, score-0.595]
4 This will allow you to log lots of detailed data on local nodes at the usual 10 sec interval (or more frequently if you prefer) at about 0.1% system overhead while sending a subset at a lower rate to the ganglia gmonds. [sent-5, score-1.001] [sent-6, score-0.639]
5 This would give you the best of both worlds but I don't know if people are too married to the centralized concept to try something different. [sent-7, score-0.635]
6 I don't know how many people who follow this forum have actually tried it, I know at least a few of you have, but to learn more just go to http://collectl.net/ and look at some of the documentation or just download the rpm and type 'collectl'. [sent-8, score-0.558] [sent-10, score-0.315]
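To make the "best of both worlds" approach concrete, here is a minimal sketch of the pattern the post describes: log everything locally at a 10-second interval while forwarding only an occasional summary to a central listener. This is not collectl's actual code; the file path, port, forwarding ratio, and the JSON-over-UDP wire format are all assumptions (real ganglia gmonds speak their own XDR-based protocol).

```python
import json
import socket
import time
from itertools import count

LOCAL_LOG = "/var/log/node-metrics.log"  # detailed history stays on the node
CENTRAL = ("gmond.example.com", 8649)    # hypothetical central listener
DETAIL_INTERVAL = 10                     # the "usual 10 sec interval"
FORWARD_EVERY = 6                        # forward 1 in 6 samples (~1/minute)

def read_metrics():
    """One detailed sample; /proc/loadavg stands in for the many
    subsystems a real collector like collectl reads."""
    load1, load5, load15 = open("/proc/loadavg").read().split()[:3]
    return {"ts": time.time(), "load1": float(load1),
            "load5": float(load5), "load15": float(load15)}

def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    with open(LOCAL_LOG, "a") as log:
        for n in count():
            sample = read_metrics()
            log.write(json.dumps(sample) + "\n")  # everything, locally
            log.flush()
            if n % FORWARD_EVERY == 0:            # a small subset, centrally
                sock.sendto(json.dumps(sample).encode(), CENTRAL)
            time.sleep(DETAIL_INTERVAL)

if __name__ == "__main__":
    main()
```

The point is the split: the expensive, detailed stream never leaves the node, so the central side only has to absorb a trickle.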
wordName wordTfidf (topN-words)
[('collectl', 0.339), ('ganglia', 0.269), ('log', 0.201), ('married', 0.19), ('sending', 0.183), ('camps', 0.169), ('sec', 0.169), ('camp', 0.165), ('interval', 0.148), ('awhile', 0.146), ('remotely', 0.144), ('granularity', 0.14), ('interface', 0.138), ('agents', 0.138), ('says', 0.136), ('know', 0.135), ('anything', 0.129), ('collector', 0.126), ('rpm', 0.125), ('monitors', 0.125), ('worlds', 0.125), ('evolving', 0.118), ('forum', 0.115), ('documentation', 0.114), ('frequent', 0.114), ('nodes', 0.112), ('prefer', 0.11), ('since', 0.109), ('locally', 0.107), ('ranging', 0.106), ('subset', 0.104), ('essentially', 0.102), ('usual', 0.1), ('logging', 0.098), ('centralized', 0.097), ('possibly', 0.097), ('tried', 0.093), ('perspective', 0.092), ('central', 0.088), ('concept', 0.088), ('posts', 0.088), ('variety', 0.087), ('wanted', 0.085), ('overhead', 0.083), ('detailed', 0.081), ('follow', 0.08), ('necessary', 0.079), ('said', 0.078), ('data', 0.076), ('download', 0.076)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 553 high scalability-2009-04-03-Collectl interface to Ganglia - any interest?
2 0.36072877 558 high scalability-2009-04-06-How do you monitor the performance of your cluster?
Introduction: I had posted a note the other day about collectl and its ganglia interface but perhaps I wasn't provocative enough to get any responses, so let me ask it a different way: specifically, how do people monitor their clusters and, more importantly, how often? Do you monitor to get a general sense of what the system is doing OR do you monitor with the expectation that when something goes wrong you'll have enough data to diagnose the problem? Or both? I suspect both... Many cluster-based monitoring tools tend to have a data collection daemon running on each target node which periodically sends data to some central management station. That machine typically writes the data to some database from which it can then extract historical plots. Some even put up graphics in real-time. From my experience working with large clusters - and I'm talking either many hundreds or even 1000s of nodes - most have to limit both the amount of data they manage centrally as well as the frequency that they
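As a complement to the collector sketch above, here is a toy version of the central management station pattern this paragraph describes: a daemon that receives samples and appends them to a database that historical plots can query. The schema, port, and JSON encoding are invented for illustration; real tools use their own wire formats and storage (often RRD files rather than SQL).

```python
import json
import socket
import sqlite3

# Toy central management station: receive one JSON sample per UDP
# datagram and append it to a history table that plots can query later.
db = sqlite3.connect("metrics.db")
db.execute("""CREATE TABLE IF NOT EXISTS samples
              (host TEXT, ts REAL, metric TEXT, value REAL)""")

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 8649))   # port chosen to echo gmond's default

while True:
    data, (host, _port) = sock.recvfrom(65535)
    sample = json.loads(data)
    ts = sample.pop("ts")      # remaining keys are metric name -> value
    db.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)",
                   [(host, ts, k, float(v)) for k, v in sample.items()])
    db.commit()
```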
3 0.18829516 77 high scalability-2007-08-30-Log Everything All the Time
Introduction: This JoelOnSoftware thread asks the age-old question of what and how to log. The usual trace/error/warning/info advice is totally useless in a large scale distributed system. Instead, you need to log everything all the time so you can solve problems that have already happened across a potentially huge range of servers. Yes, it can be done. To see why the typical logging approach is broken, imagine this scenario: Your site has been up and running great for weeks. No problems. A foreshadowing beeper goes off at 2AM. It seems some users can no longer add comments to threads. Then you hear the debugging deathknell: it's an intermittent problem and customers are pissed. Fix it. Now. So how are you going to debug this? The monitoring system doesn't show any obvious problems or errors. You quickly post a comment and it works fine. This won't be easy. So you think. Commenting involves a bunch of servers and networks. There's the load balancer, spam filter, web server, database server,
Introduction: Todd had originally posted an entry on collectl here at Collectl - Performance Data Collector. Collectl collects real-time data from a large number of subsystems like buddyinfo, cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp, all using one tool and in one consistent format. Since then a lot has happened. It's now part of both the Fedora and Debian distros, not to mention several others. There has also been a pretty good summary written up by Joe Brockmeier. It's also pretty well documented (I like to think) on sourceforge. There have also been a few blog postings by Martin Bach on his blog. Anyhow, a while back I released a new version of collectl-utils and gave a complete face-lift to one of the utilities, colmux, which is a collectl multiplexor. This tool has the ability to run collectl on multiple systems, which in turn send all their output back to colmux. Colmux then sorts the output on a user-specified column
5 0.17342745 719 high scalability-2009-10-09-Have you collectl'd yet? If not, maybe collectl-utils will make it easier to do so
Introduction: I'm not sure how many people who follow this have even tried collectl but I wanted to let you all know that I just released a set of utilities called, strangely enough, collectl-utils, which you can get at http://collectl-utils.sourceforge.net . One web-based utility called colplot gives you the ability to very easily plot data from multiple systems in a way that makes correlating them over time very easy. Another utility called colmux lets you look at multiple systems in real time. In fact if you go to the page that describes it in more detail you'll see a photo which shows the CPU loads on 192 systems once a second, one set of data per line! In fact the display is so wide it takes 3 large monitors side-by-side to see it all, and even though you can't actually read the displays you can easily see which systems are loaded and which aren't. Anyhow give it a look and let me know what you think. -mark
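For a feel of what a colmux-style multiplexor does, here is a hedged sketch: run the same one-line stats command on several hosts, merge the results, and sort on a chosen column. The host names and remote command are placeholders, and unlike the real colmux this takes a single snapshot instead of updating a live display.

```python
import subprocess

# Toy colmux-style multiplexor: the same one-line command runs on every
# host, and the merged rows are sorted on one column, the way colmux
# sorts collectl output to surface the hottest nodes.
HOSTS = ["node01", "node02", "node03"]   # placeholder host names
REMOTE_CMD = "cat /proc/loadavg"
SORT_COLUMN = 0                          # 1-minute load average

def one_sample(host):
    out = subprocess.run(["ssh", host, REMOTE_CMD],
                         capture_output=True, text=True, timeout=10)
    return host, out.stdout.split()

rows = [one_sample(h) for h in HOSTS]
rows = [r for r in rows if r[1]]         # drop hosts that failed to answer
rows.sort(key=lambda r: float(r[1][SORT_COLUMN]), reverse=True)  # hottest first
for host, fields in rows:
    print(f"{host:12s} {' '.join(fields)}")
```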
6 0.14575593 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
8 0.12037378 323 high scalability-2008-05-19-Twitter as a scalability case study
9 0.11759448 237 high scalability-2008-02-03-Product: Collectl - Performance Data Collector
10 0.11425724 30 high scalability-2007-07-26-Product: AWStats a Log Analyzer
11 0.10644431 37 high scalability-2007-07-28-Product: Web Log Storming
12 0.1050081 541 high scalability-2009-03-16-Product: Smart Inspect
13 0.097988546 449 high scalability-2008-11-24-Product: Scribe - Facebook's Scalable Logging System
14 0.089433089 937 high scalability-2010-11-09-Paper: Hyder - Scaling Out without Partitioning
15 0.086451098 1578 high scalability-2014-01-14-Ask HS: Design and Implementation of scalable services?
16 0.081448123 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
17 0.076701649 105 high scalability-2007-10-01-Statistics Logging Scalability
18 0.076154724 821 high scalability-2010-05-03-MocoSpace Architecture - 3 Billion Mobile Page Views a Month
19 0.074839354 1492 high scalability-2013-07-17-How do you create a 100th Monkey software development culture?
20 0.07434047 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
topicId topicWeight
[(0, 0.138), (1, 0.047), (2, -0.023), (3, -0.017), (4, 0.019), (5, 0.008), (6, 0.045), (7, 0.036), (8, 0.029), (9, -0.006), (10, -0.009), (11, 0.044), (12, 0.025), (13, -0.04), (14, 0.094), (15, 0.0), (16, 0.035), (17, 0.001), (18, -0.055), (19, -0.0), (20, -0.009), (21, -0.048), (22, -0.053), (23, 0.132), (24, 0.045), (25, -0.044), (26, -0.043), (27, 0.011), (28, -0.056), (29, -0.022), (30, -0.04), (31, -0.102), (32, 0.03), (33, 0.002), (34, -0.039), (35, 0.044), (36, -0.002), (37, -0.04), (38, 0.047), (39, 0.014), (40, 0.011), (41, 0.029), (42, 0.013), (43, -0.009), (44, -0.01), (45, 0.045), (46, 0.02), (47, -0.003), (48, 0.0), (49, -0.022)]
simIndex simValue blogId blogTitle
same-blog 1 0.91502631 553 high scalability-2009-04-03-Collectl interface to Ganglia - any interest?
2 0.8001712 77 high scalability-2007-08-30-Log Everything All the Time
Introduction: This is a guest post by Gordon Worley, a Software Engineer at Korrelate, where they correlate (see what they did there) online purchases to offline purchases. Several weeks ago, we came into the office one morning to find every server alarm going off. Pixel log processing was behind by 8 hours and not making headway. Checking the logs, we discovered that a big client had come online during the night and was giving us 10 times more traffic than we were originally told to expect. I wouldn’t say we panicked, but the office was certainly more jittery than usual. Over the next several hours, though, thanks both to foresight and quick thinking, we were able to scale up to handle the added load and clear the backlog to return log processing to a steady state. At Korrelate, we deploy tracking pixels, also known as beacons or web bugs, that our partners use to send us information about their users. These tiny web objects contain no visible content, but may include transparent 1 by 1 gif
4 0.7416749 449 high scalability-2008-11-24-Product: Scribe - Facebook's Scalable Logging System
Introduction: In Log Everything All the Time I advocate applications shouldn't bother logging at all. Why waste all that time and code? No, wait, that's not right. I preach logging everything all the time. Doh. Facebook obviously feels similarly, which is why they open sourced Scribe, their internal logging system, capable of logging 10s of billions of messages per day. These messages include access logs, performance statistics, actions that went to News Feed, and many others. Imagine hundreds of thousands of machines across many geographically dispersed datacenters just aching to send their precious log payload to the central repository of all knowledge. Because really, when you combine all the meta data with all the events you pretty much have a complete picture of your operations. Once in the central repository logs can be scanned, indexed, summarized, aggregated, refactored, diced, data cubed, and mined for every scrap of potentially useful information. Just imagine the log stream from a
5 0.71726733 541 high scalability-2009-03-16-Product: Smart Inspect
Introduction: Smart Inspect has added quite a few features specifically tailored to high scalability and high performance environments over the years. This includes the ability to log to memory and dump log files on demand (when a crash occurs, for example), special backlog queue features, a log service application for central log storage and a lot more. Additionally, our SmartInspect Console (the viewer application) makes viewing, filtering and inspecting large amounts of logging data a lot easier and more practical.
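The "log to memory, dump on demand" idea is easy to sketch with a ring buffer: record everything at full verbosity into a bounded in-memory backlog, and only write it to disk when a crash makes the detail worth keeping. This sketch uses Python's standard logging machinery, not SmartInspect's actual API.

```python
import collections
import logging

# Bounded in-memory backlog: cheap enough to keep at DEBUG verbosity.
ring = collections.deque(maxlen=10_000)

class RingHandler(logging.Handler):
    """Append formatted records to the ring instead of writing to disk."""
    def emit(self, record):
        ring.append(self.format(record))

def dump_backlog(path="crash-backlog.log"):
    """Call from a crash handler to flush the in-memory backlog to disk."""
    with open(path, "w") as f:
        f.write("\n".join(ring))

log = logging.getLogger("app")
log.setLevel(logging.DEBUG)
log.addHandler(RingHandler())

log.debug("cheap, always-on detail")   # stays in memory
try:
    1 / 0
except ZeroDivisionError:
    log.exception("crash!")
    dump_backlog()                     # now the detail is worth the disk I/O
```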
6 0.68345195 30 high scalability-2007-07-26-Product: AWStats a Log Analyzer
7 0.67179227 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
8 0.66291225 237 high scalability-2008-02-03-Product: Collectl - Performance Data Collector
9 0.66116112 558 high scalability-2009-04-06-How do you monitor the performance of your cluster?
10 0.65588707 1498 high scalability-2013-08-07-RAFT - In Search of an Understandable Consensus Algorithm
11 0.6527698 937 high scalability-2010-11-09-Paper: Hyder - Scaling Out without Partitioning
12 0.64904606 719 high scalability-2009-10-09-Have you collectl'd yet? If not, maybe collectl-utils will make it easier to do so
13 0.64794129 1104 high scalability-2011-08-25-Colmux - Finding Memory Leaks, High I-O Wait Times, and Hotness on 3000 Node Clusters
14 0.64523852 45 high scalability-2007-07-30-Product: SmarterStats
15 0.61965001 105 high scalability-2007-10-01-Statistics Logging Scalability
16 0.61696446 304 high scalability-2008-04-19-How to build a real-time analytics system?
17 0.61005622 295 high scalability-2008-04-02-Product: Supervisor - Monitor and Control Your Processes
18 0.6009165 36 high scalability-2007-07-28-Product: Web Log Expert
19 0.60015237 488 high scalability-2009-01-08-file synchronization solutions
20 0.59820771 570 high scalability-2009-04-15-Implementing large scale web analytics
topicId topicWeight
[(1, 0.174), (2, 0.213), (10, 0.071), (57, 0.246), (61, 0.108), (77, 0.025), (94, 0.057)]
simIndex simValue blogId blogTitle
1 0.95033604 159 high scalability-2007-11-18-Reverse Proxy
Introduction: Hi, I saw a year ago that NetApp sold NetCache to Blue Coat; my site is a heavy NetCache user and we cached 83% of our site. We tested with Blue Coat and F5 WA and we are not getting the same performance as NetCache. Do any of you guys have the same issue? Or does somebody know another product that can handle as much traffic? Thanks Rodrigo
2 0.91928804 1144 high scalability-2011-11-17-Five Misconceptions on Cloud Portability
Introduction: The term "cloud portability" is often considered a synonym for "Cloud API portability," which implies a series of misconceptions. If we break away from dogma, we can find that what we really looking for in cloud portability is Application portability between clouds which can be a vastly simpler requirement, as we can achieve application portability without settling on a common Cloud API. In this post i'll be covering five common misconceptions people have WRT to cloud portability. Cloud portability = Cloud API portability . API portability is easy; cloud API portability is not. The main incentive for Cloud Portability is - Avoiding Vendor lock-in .Cloud portability is more about business agility than it is about vendor lock-in. Cloud portability isn’t for startups . Every startup that is expecting rapid growth should re-examine their deployments and plan for cloud portability rather than wait to be forced to make the switch when you are least prepared to do so.
3 0.9064585 731 high scalability-2009-10-28-Need for change in your IT infrastructure
Introduction: Companies' earnings outstrip forecasts, consumer confidence is returning and city bonuses are back. What does this mean for business? Growth! After the recent years of cost cutting in IT budgets, there is sudden fear induced by increased demand. Pre-existing trouble points in IT infrastructures that have lain dormant will suddenly be exposed. Monthly reporting and real time analytics will suffer as data grows. IT departments across the land will be crying out “The engine canna take no more, captain”. What can be done? What we need is a scalable system that grows with the business. A system that can handle sudden increases in data growth without falling over. There are two core principles to a scalable system: (1) users experience constant QoS as demand grows; (2) system architects can grow system capacity proportionally with the available resources. In other words, if demand increases twofold, it is “enough” to purchase twice the hardware. This is linear growth. Is it e
4 0.89887226 968 high scalability-2011-01-04-Map-Reduce With Ruby Using Hadoop
Introduction: A demonstration, with repeatable steps, of how to quickly fire up a Hadoop cluster on Amazon EC2, load data onto the HDFS (Hadoop Distributed File-System), write map-reduce scripts in Ruby and use them to run a map-reduce job on your Hadoop cluster. You will not need to ssh into the cluster, as all tasks are run from your local machine. Below I am using my MacBook Pro as my local machine, but the steps I have provided should be reproducible on other platforms running bash and Java. Fire Up Your Hadoop Cluster I choose the Cloudera distribution of Hadoop which is still 100% Apache licensed, but has some additional benefits. One of these benefits is that it is released by Doug Cutting, who started Hadoop and drove its development at Yahoo! He also started Lucene, which is another of my favourite Apache Projects, so I have good faith that he knows what he is doing. Another benefit, as you will see, is that it is simple to fire up a Hadoop cluster. I am going to use C
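The post's scripts are in Ruby, but the Hadoop Streaming contract it relies on is language-agnostic: a mapper reads raw lines on stdin and emits tab-separated key/value lines, the framework sorts them by key, and a reducer folds each key's values. Here is the same word-count logic sketched in Python (the file name and invocation are assumptions, not the post's actual scripts):

```python
import sys
from itertools import groupby

def mapper():
    # Emit "word<TAB>1" for every word; Hadoop sorts these by key.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Input arrives sorted by key, so each word's counts are contiguous.
    pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(v) for _, v in group)}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

A hypothetical run would pass the same script as both stages, e.g. hadoop jar hadoop-streaming.jar -input in -output out -mapper 'wc.py map' -reducer 'wc.py reduce' -file wc.py.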
5 0.87604475 807 high scalability-2010-04-09-Vagrant - Build and Deploy Virtualized Development Environments Using Ruby
Introduction: One of the cool things we are seeing is more tools and tool chains for performing very high level operations quite simply. Vagrant is such a tool for building and distributing virtualized development environments. Web developers use virtual environments every day with their web applications. From EC2 and Rackspace Cloud to specialized solutions such as EngineYard and Heroku, virtualization is the tool of choice for easy deployment and infrastructure management. Vagrant aims to take those very same principles and put them to work in the heart of the application lifecycle. By providing easy to configure, lightweight, reproducible, and portable virtual machines targeted at development environments, Vagrant helps maximize your productivity and flexibility. If you've created a build and deployment system before, Vagrant does a lot of the work for you: automated virtual machine creation using Oracle’s VirtualBox; automated provisioning of virtual environments
6 0.87174439 433 high scalability-2008-10-29-CTL - Distributed Control Dispatching Framework
7 0.86584735 1211 high scalability-2012-03-19-LinkedIn: Creating a Low Latency Change Data Capture System with Databus
same-blog 8 0.84963995 553 high scalability-2009-04-03-Collectl interface to Ganglia - any interest?
9 0.84821159 218 high scalability-2008-01-17-Moving old to new. Do not be afraid of the re-write -- but take some help
10 0.84135979 1138 high scalability-2011-11-07-10 Core Architecture Pattern Variations for Achieving Scalability
11 0.82811427 6 high scalability-2007-07-11-Friendster Architecture
12 0.80056256 855 high scalability-2010-07-11-So, Why is Twitter Really Not Using Cassandra to Store Tweets?
13 0.79445583 232 high scalability-2008-01-29-When things aren't scalable
14 0.77969623 351 high scalability-2008-07-16-The Mother of All Database Normalization Debates on Coding Horror
15 0.77331322 1087 high scalability-2011-07-26-Web 2.0 Killed the Middleware Star
16 0.75732601 64 high scalability-2007-08-10-How do we make a large real-time search engine?
17 0.75388163 1329 high scalability-2012-09-26-WordPress.com Serves 70,000 req-sec and over 15 Gbit-sec of Traffic using NGINX
18 0.75298655 691 high scalability-2009-08-31-Squarespace Architecture - A Grid Handles Hundreds of Millions of Requests a Month
19 0.75218797 1507 high scalability-2013-08-26-Reddit: Lessons Learned from Mistakes Made Scaling to 1 Billion Pageviews a Month
20 0.75135303 1312 high scalability-2012-08-27-Zoosk - The Engineering behind Real Time Communications