high_scalability high_scalability-2007 high_scalability-2007-30 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: AWStats is a free, powerful, and featureful tool that generates advanced web, streaming, FTP, or mail server statistics graphically. This log analyzer works as a CGI or from the command line and shows you all possible information your log contains in a few graphical web pages. It uses a partial information file to be able to process large log files often and quickly. It can analyze log files from all major server tools like Apache log files (NCSA combined/XLF/ELF log format or common/CLF log format), WebStar, IIS (W3C log format), and a lot of other web, proxy, WAP, and streaming servers, mail servers, and some FTP servers.
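The "partial information file" idea above can be illustrated with a minimal sketch. This is not AWStats's own code or file format: the function name `update_stats`, the JSON state file, and the per-status-code counter are all hypothetical, chosen only to show how saved intermediate state lets each run read just the lines added since the previous run. The regular expression assumes the common (CLF) log layout mentioned above.

```python
import json
import os
import re
from collections import Counter

# Common Log Format (CLF): host ident user [time] "request" status bytes.
# The combined format appends more fields, which this pattern simply ignores.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def update_stats(log_path, state_path):
    """Count hits per status code, reading only lines added since the last run."""
    # Hypothetical state file: byte offset already processed plus running totals.
    state = {"offset": 0, "status_counts": {}}
    if os.path.exists(state_path):
        with open(state_path, "r", encoding="utf-8") as fh:
            state = json.load(fh)

    # If the log shrank (rotated or truncated), re-read it from the start
    # but keep the totals accumulated so far.
    if os.path.getsize(log_path) < state["offset"]:
        state["offset"] = 0

    counts = Counter(state["status_counts"])
    with open(log_path, "rb") as log:
        log.seek(state["offset"])
        for raw in log:
            if not raw.endswith(b"\n"):
                break  # partially written line: leave it for the next run
            match = CLF_PATTERN.match(raw.decode("utf-8", errors="replace"))
            if match:
                counts[match.group("status")] += 1
            state["offset"] += len(raw)

    # Persist the new offset and totals for the next run.
    state["status_counts"] = dict(counts)
    with open(state_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    return dict(counts)
```

Calling `update_stats("access.log", "access.state.json")` repeatedly (both file names are placeholders) only parses the newly appended lines each time, which is what makes frequent runs over a large log cheap.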
sentIndex sentText sentNum sentScore
1 AWStats is a free, powerful, and featureful tool that generates advanced web, streaming, FTP, or mail server statistics graphically. [sent-1, score-1.22]
2 This log analyzer works as a CGI or from the command line and shows you all possible information your log contains in a few graphical web pages. [sent-2, score-1.953]
3 It uses a partial information file to be able to process large log files often and quickly. [sent-3, score-1.045]
4 It can analyze log files from all major server tools like Apache log files (NCSA combined/XLF/ELF log format or common/CLF log format), WebStar, IIS (W3C log format), and a lot of other web, proxy, WAP, and streaming servers, mail servers, and some FTP servers. [sent-4, score-4.687]
wordName wordTfidf (topN-words)
[('log', 0.563), ('ftp', 0.318), ('format', 0.302), ('mail', 0.256), ('wap', 0.213), ('files', 0.207), ('featureful', 0.2), ('streaming', 0.175), ('analyzer', 0.173), ('cgi', 0.165), ('iis', 0.156), ('graphical', 0.147), ('partial', 0.119), ('generates', 0.116), ('proxy', 0.103), ('contains', 0.101), ('statistics', 0.101), ('command', 0.1), ('servers', 0.091), ('analyze', 0.09), ('information', 0.087), ('advanced', 0.086), ('web', 0.078), ('shows', 0.074), ('powerful', 0.073), ('apache', 0.073), ('line', 0.069), ('major', 0.064), ('tool', 0.063), ('server', 0.057), ('tools', 0.055), ('often', 0.053), ('file', 0.052), ('free', 0.051), ('possible', 0.05), ('uses', 0.049), ('works', 0.049), ('able', 0.046), ('process', 0.043), ('large', 0.033), ('lot', 0.032), ('like', 0.018)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 30 high scalability-2007-07-26-Product: AWStats a Log Analyzer
Introduction: AWStats is a free, powerful, and featureful tool that generates advanced web, streaming, FTP, or mail server statistics graphically. This log analyzer works as a CGI or from the command line and shows you all possible information your log contains in a few graphical web pages. It uses a partial information file to be able to process large log files often and quickly. It can analyze log files from all major server tools like Apache log files (NCSA combined/XLF/ELF log format or common/CLF log format), WebStar, IIS (W3C log format), and a lot of other web, proxy, WAP, and streaming servers, mail servers, and some FTP servers.
2 0.33573663 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
Introduction: How do you query hundreds of gigabytes of new data each day streaming in from over 600 hyperactive servers? If you think this sounds like the perfect battleground for a head-to-head skirmish in the great MapReduce Versus Database War, you would be correct. Bill Boebel, CTO of Mailtrust (Rackspace's mail division), has generously provided a fascinating account of how they evolved their log processing system from an early amoeba'ic text-file-stored-on-each-machine approach, to a Neandertholic relational database solution that just couldn't compete, and finally to a Homo sapien'ic Hadoop-based solution that works wisely for them and has virtually unlimited scalability potential. Rackspace faced a now-familiar problem. Lots and lots of data streaming in. Where do you store all that data? How do you do anything useful with it? In the first version of their system, logs were stored in flat text files and had to be manually searched by engineers logging into each individual machine. T
Introduction: This is a guest post by Gordon Worley, a Software Engineer at Korrelate, where they correlate (see what they did there) online purchases to offline purchases. Several weeks ago, we came into the office one morning to find every server alarm going off. Pixel log processing was behind by 8 hours and not making headway. Checking the logs, we discovered that a big client had come online during the night and was giving us 10 times more traffic than we were originally told to expect. I wouldn’t say we panicked, but the office was certainly more jittery than usual. Over the next several hours, though, thanks both to foresight and quick thinking, we were able to scale up to handle the added load and clear the backlog to return log processing to a steady state. At Korrelate, we deploy tracking pixels, also known as beacons or web bugs, that our partners use to send us information about their users. These tiny web objects contain no visible content, but may include transparent 1 by 1 gif
4 0.28348666 37 high scalability-2007-07-28-Product: Web Log Storming
Introduction: Web Log Storming is an interactive, desktop-based Web Log Analyzer for Windows. The whole new concept of log analysis makes it clearly different from any other web log analyzer. Browse through statistics to get into details - down to individual visitor's session. Check individual visitor behavior pattern and how it fits into your desired scenario. Web Log Storming does far more than just generate common reports - it displays detailed web site statistics with interactive graphs and reports. Very complete detailed log analysis of activity from every visitor to your web site is only a mouse-click away. In other words, analyze your web logs like never before! It's easy to track sessions, hits, page views, downloads, or whatever metric is most important to each user. You can look at referring pages and see which search engines and keywords were used to bring visitors to the site. Web site behavior, from the top entry and exit pages, to the paths that users follow, can be analyzed. You
5 0.27000982 77 high scalability-2007-08-30-Log Everything All the Time
Introduction: This JoelOnSoftware thread asks the age-old question of what and how to log. The usual trace/error/warning/info advice is totally useless in a large-scale distributed system. Instead, you need to log everything all the time so you can solve problems that have already happened across a potentially huge range of servers. Yes, it can be done. To see why the typical logging approach is broken, imagine this scenario: Your site has been up and running great for weeks. No problems. A foreshadowing beeper goes off at 2AM. It seems some users can no longer add comments to threads. Then you hear the debugging death knell: it's an intermittent problem and customers are pissed. Fix it. Now. So how are you going to debug this? The monitoring system doesn't show any obvious problems or errors. You quickly post a comment and it works fine. This won't be easy. So you think. Commenting involves a bunch of servers and networks. There's the load balancer, spam filter, web server, database server,
6 0.23025817 541 high scalability-2009-03-16-Product: Smart Inspect
7 0.21310468 105 high scalability-2007-10-01-Statistics Logging Scalability
8 0.20861992 36 high scalability-2007-07-28-Product: Web Log Expert
9 0.20449072 937 high scalability-2010-11-09-Paper: Hyder - Scaling Out without Partitioning
10 0.18136902 208 high scalability-2008-01-11-FTP Sanity: Redundancy, archiving, consolidation.
11 0.17725281 35 high scalability-2007-07-28-Product: FastStats Log Analyzer
12 0.15631682 570 high scalability-2009-04-15-Implementing large scale web analytics
13 0.15205415 304 high scalability-2008-04-19-How to build a real-time analytics system?
14 0.13979743 449 high scalability-2008-11-24-Product: Scribe - Facebook's Scalable Logging System
15 0.12531279 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
16 0.12434482 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
17 0.12173943 80 high scalability-2007-09-06-Product: Perdition Mail Retrieval Proxy
18 0.11425724 553 high scalability-2009-04-03-Collectl interface to Ganglia - any interest?
19 0.11382544 1640 high scalability-2014-04-30-10 Tips for Optimizing NGINX and PHP-fpm for High Traffic Sites
20 0.098397657 283 high scalability-2008-03-18-Shared filesystem on EC2
topicId topicWeight
[(0, 0.094), (1, 0.02), (2, -0.018), (3, -0.113), (4, 0.011), (5, 0.009), (6, 0.105), (7, -0.008), (8, 0.057), (9, 0.096), (10, 0.012), (11, -0.045), (12, 0.045), (13, -0.109), (14, 0.104), (15, 0.035), (16, 0.024), (17, 0.023), (18, -0.038), (19, -0.002), (20, 0.061), (21, -0.13), (22, -0.126), (23, 0.244), (24, 0.198), (25, -0.05), (26, -0.145), (27, 0.041), (28, -0.003), (29, -0.053), (30, -0.082), (31, -0.151), (32, 0.03), (33, -0.075), (34, -0.1), (35, 0.02), (36, -0.126), (37, 0.019), (38, 0.152), (39, -0.064), (40, 0.001), (41, 0.093), (42, 0.005), (43, -0.078), (44, -0.067), (45, -0.114), (46, 0.067), (47, 0.017), (48, -0.009), (49, -0.062)]
simIndex simValue blogId blogTitle
same-blog 1 0.98577106 30 high scalability-2007-07-26-Product: AWStats a Log Analyzer
Introduction: AWStats is a free, powerful, and featureful tool that generates advanced web, streaming, FTP, or mail server statistics graphically. This log analyzer works as a CGI or from the command line and shows you all possible information your log contains in a few graphical web pages. It uses a partial information file to be able to process large log files often and quickly. It can analyze log files from all major server tools like Apache log files (NCSA combined/XLF/ELF log format or common/CLF log format), WebStar, IIS (W3C log format), and a lot of other web, proxy, WAP, and streaming servers, mail servers, and some FTP servers.
2 0.87500685 541 high scalability-2009-03-16-Product: Smart Inspect
Introduction: Over the years, Smart Inspect has added quite a few features to our tool that are specifically tailored to high scalability and high performance environments. These include the ability to log to memory and dump log files on demand (when a crash occurs, for example), special backlog queue features, a log service application for central log storage, and a lot more. Additionally, our SmartInspect Console (the viewer application) makes viewing, filtering, and inspecting large amounts of logging data a lot easier and more practical.
Introduction: This is a guest post by Gordon Worley, a Software Engineer at Korrelate, where they correlate (see what they did there) online purchases to offline purchases. Several weeks ago, we came into the office one morning to find every server alarm going off. Pixel log processing was behind by 8 hours and not making headway. Checking the logs, we discovered that a big client had come online during the night and was giving us 10 times more traffic than we were originally told to expect. I wouldn’t say we panicked, but the office was certainly more jittery than usual. Over the next several hours, though, thanks both to foresight and quick thinking, we were able to scale up to handle the added load and clear the backlog to return log processing to a steady state. At Korrelate, we deploy tracking pixels, also known as beacons or web bugs, that our partners use to send us information about their users. These tiny web objects contain no visible content, but may include transparent 1 by 1 gif
4 0.8064658 35 high scalability-2007-07-28-Product: FastStats Log Analyzer
Introduction: FastStats Log Analyzer enables you to: Determine whether your CPC advertising is profitable: Are you spending $0.75 per click on Google or Overture, but only receiving $0.56 per click in revenue? Tune site traffic patterns: FastStats's Hyperlink Tree View feature lets you visually see how traffic flows through your web site. High-performance solution for even the busiest web sites: Our software has been clocked at over 1000 MB/min. Other popular log file analysis tools (we won't name names), run at 1/40th the speed. We've been in the business for over 6 years, delivering value, quality, and good customer service to our clients. Our products are used for data mining at some of the world's busiest web sites -- why not give FastStats a try at your web site? FastStats log file analysis supports a wide variety of web server log files, including Apache logs and Microsoft IIS logs.
5 0.79840404 37 high scalability-2007-07-28-Product: Web Log Storming
Introduction: Web Log Storming is an interactive, desktop-based Web Log Analyzer for Windows. The whole new concept of log analysis makes it clearly different from any other web log analyzer. Browse through statistics to get into details - down to individual visitor's session. Check individual visitor behavior pattern and how it fits into your desired scenario. Web Log Storming does far more than just generate common reports - it displays detailed web site statistics with interactive graphs and reports. Very complete detailed log analysis of activity from every visitor to your web site is only a mouse-click away. In other words, analyze your web logs like never before! It's easy to track sessions, hits, page views, downloads, or whatever metric is most important to each user. You can look at referring pages and see which search engines and keywords were used to bring visitors to the site. Web site behavior, from the top entry and exit pages, to the paths that users follow, can be analyzed. You
6 0.7780987 937 high scalability-2010-11-09-Paper: Hyder - Scaling Out without Partitioning
7 0.73977578 77 high scalability-2007-08-30-Log Everything All the Time
8 0.7373957 449 high scalability-2008-11-24-Product: Scribe - Facebook's Scalable Logging System
9 0.70122802 36 high scalability-2007-07-28-Product: Web Log Expert
10 0.67179155 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
11 0.65657252 570 high scalability-2009-04-15-Implementing large scale web analytics
12 0.60889477 105 high scalability-2007-10-01-Statistics Logging Scalability
13 0.57429689 45 high scalability-2007-07-30-Product: SmarterStats
14 0.56629342 553 high scalability-2009-04-03-Collectl interface to Ganglia - any interest?
15 0.55934107 304 high scalability-2008-04-19-How to build a real-time analytics system?
16 0.54191703 1640 high scalability-2014-04-30-10 Tips for Optimizing NGINX and PHP-fpm for High Traffic Sites
17 0.53791648 1196 high scalability-2012-02-20-Berkeley DB Architecture - NoSQL Before NoSQL was Cool
18 0.49260542 186 high scalability-2007-12-13-un-article: the setup behind microsoft.com
19 0.4600856 1301 high scalability-2012-08-08-3 Tips and Tools for Creating Reliable Billion Page View Web Services
20 0.43939474 1096 high scalability-2011-08-10-LevelDB - Fast and Lightweight Key-Value Database From the Authors of MapReduce and BigTable
topicId topicWeight
[(1, 0.184), (2, 0.089), (85, 0.017), (93, 0.422), (94, 0.128)]
simIndex simValue blogId blogTitle
1 0.91403145 175 high scalability-2007-12-05-how to: Load Balancing with iis
Introduction: Hello world, can you tell me how I can implement load balancing for a web site running under IIS on Windows Server 2003/08?
2 0.80644387 403 high scalability-2008-10-06-Paper: Scaling Genome Sequencing - Complete Genomics Technology Overview
Introduction: Although the problem of scaling human genome sequencing is not exactly about building bigger, faster, and more reliable websites, it is most interesting in terms of scalability. The paper describes a new technology by the startup company Complete Genomics to sequence the full human genome for a fraction of the cost of earlier approaches. Complete Genomics is building the world’s largest commercial human genome sequencing center to provide turnkey, outsourced complete human genome sequencing to customers worldwide. By 2010, their data center will contain approximately 60,000 processors with 30 petabytes of storage running their sequencing software on Linux clusters. Do you find this interesting and relevant to HighScalability.com?
same-blog 3 0.69931436 30 high scalability-2007-07-26-Product: AWStats a Log Analyzer
Introduction: AWStats is a free, powerful, and featureful tool that generates advanced web, streaming, FTP, or mail server statistics graphically. This log analyzer works as a CGI or from the command line and shows you all possible information your log contains in a few graphical web pages. It uses a partial information file to be able to process large log files often and quickly. It can analyze log files from all major server tools like Apache log files (NCSA combined/XLF/ELF log format or common/CLF log format), WebStar, IIS (W3C log format), and a lot of other web, proxy, WAP, and streaming servers, mail servers, and some FTP servers.
4 0.55976987 349 high scalability-2008-07-10-Can cloud computing smite down evil zombie botnet armies?
Introduction: In the more cool stuff I've never heard of before department is something called Self Cleansing Intrusion Tolerance (SCIT). Botnets are created when vulnerable computers live long enough to become infected with the will to do the evil bidding of their evil masters. Security is almost always about removing vulnerabilities (a process which to outside observers often looks like a dog chasing its tail). SCIT takes a different approach; it works on the availability angle. Something I never thought of before, but which makes a great deal of sense once I thought about it. With SCIT you stop and restart VM instances every minute (or whatever, depending on your desired window of vulnerability).... This short exposure window means worms and viruses do not have long enough to fully infect a machine and carry out a coordinated attack. A machine is up for a while. Does work. And then is torn down again only to be reborn as a clean VM with no possibility of infection (unless of course the VM
5 0.54123336 168 high scalability-2007-11-30-Strategy: Efficiently Geo-referencing IPs
Introduction: A lot of apps need to map IP addresses to locations. Jeremy Cole in On efficiently geo-referencing IPs with MaxMind GeoIP and MySQL GIS succinctly explains the many uses for such a feature: Geo-referencing IPs is, in a nutshell, converting an IP address, perhaps from an incoming web visitor, a log file, a data file, or some other place, into the name of some entity owning that IP address. There are a lot of reasons you may want to geo-reference IP addresses to country, city, etc., such as in simple ad targeting systems, geographic load balancing, web analytics, and many more applications. This is difficult to do efficiently; at least it gives me a bit of brain freeze. In the same post Jeremy nicely explains where to get the geo-referencing data, how to load data, and the performance of different approaches for IP address searching. It's a great practical introduction to the subject.
6 0.53721869 58 high scalability-2007-08-04-Product: Cacti
7 0.53304023 1450 high scalability-2013-05-01-Myth: Eric Brewer on Why Banks are BASE Not ACID - Availability Is Revenue
8 0.48674536 241 high scalability-2008-02-05-SLA monitoring
9 0.48118043 1330 high scalability-2012-09-28-Stuff The Internet Says On Scalability For September 28, 2012
10 0.47294194 573 high scalability-2009-04-16-Serving 250M quotes-day at CNBC.com with aiCache
11 0.46808779 944 high scalability-2010-11-17-Some Services are More Equal than Others
12 0.46729904 1160 high scalability-2011-12-21-In Memory Data Grid Technologies
13 0.46093494 1513 high scalability-2013-09-06-Stuff The Internet Says On Scalability For September 6, 2013
14 0.4603962 1198 high scalability-2012-02-24-Stuff The Internet Says On Scalability For February 24, 2012
15 0.45712319 42 high scalability-2007-07-30-Product: GridLayer. Utility computing for online application
16 0.45557886 39 high scalability-2007-07-30-Product: Akamai
17 0.45311642 35 high scalability-2007-07-28-Product: FastStats Log Analyzer
19 0.44774133 616 high scalability-2009-06-02-GigaSpaces Launches a New Version of its Cloud Computing Framework