high_scalability high_scalability-2007 high_scalability-2007-77 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: This JoelOnSoftware thread asks the age old question of what and how to log. The usual trace/error/warning/info advice is totally useless in a large scale distributed system. Instead, you need to log everything all the time so you can solve problems that have already happened across a potentially huge range of servers. Yes, it can be done. To see why the typical logging approach is broken, imagine this scenario: Your site has been up and running great for weeks. No problems. A foreshadowing beeper goes off at 2AM. It seems some users can no longer add comments to threads. Then you hear the debugging deathknell: it's an intermittent problem and customers are pissed. Fix it. Now. So how are you going to debug this? The monitoring system doesn't show any obvious problems or errors. You quickly post a comment and it works fine. This won't be easy. So you think. Commenting involves a bunch of servers and networks. There's the load balancer, spam filter, web server, database server,
sentIndex sentText sentNum sentScore
1 Instead, you need to log everything all the time so you can solve problems that have already happened across a potentially huge range of servers. [sent-3, score-0.7]
2 You can't deploy a new build with more logging because that build has not been tested and you have no idea when the problem will happen again anyway. [sent-24, score-0.587]
3 You need to log everything that will help you diagnose any future problem. [sent-35, score-0.618]
4 Over time systems usually evolve to the point of logging everything. [sent-43, score-0.651]
5 But the problem is the logging isn't systematic or well thought out, which leads to poor coverage and poor performance. [sent-46, score-0.627]
6 Every hop a request takes should log meta information about how long the request took to process, how big the request was, what the status of the request was. [sent-59, score-1.073]
7 System is logging everything you need to log to debug the system. [sent-64, score-1.321]
8 Developers can add more detailed log levels for their code that can be turned on and off on a module by module basis. [sent-68, score-0.783]
9 But then I make each process have a command port hosting a simple embedded web server and telnet processor so you can change debug levels and other settings on the fly through the web or telnet interface. [sent-74, score-0.718]
10 There are lots of tricks you can use to make logging fast enough that you can do it all the time: Make logging efficient from the start so you aren't afraid to use it. [sent-86, score-1.111]
11 Create a dead simple to use log library that makes logging trivial for developers. [sent-87, score-1.068]
12 Log to a separate task and let the task push out log data when it can. [sent-91, score-0.592]
13 Use a preallocated buffer pool for log messages so memory allocation is just pop and push. [sent-92, score-0.796]
14 When it's not you can use reference counted data structures and do the formatting in the logging thread. [sent-95, score-0.665]
15 Don't do any formatting before it is determined the log is needed. [sent-98, score-0.55]
16 Make the log message directly queueable to the log task so queuing doesn't take more memory allocations. [sent-104, score-0.97]
17 Tie your logging system into your monitoring system so all the logging data from every process on every host winds its way to your centralized monitoring system. [sent-112, score-1.257]
18 Add command ports to processes that make it easy to set program behaviors at run-time and view important statistics and logging information. [sent-117, score-0.58]
19 In large scale distributed systems logging data is all you have to debug most problems. [sent-119, score-0.775]
20 So log everything all the time and you may still get that call at 2AM, but at least you'll know you'll have a fighting chance to fix any problems that do come up. [sent-120, score-0.576]
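Several of the tricks listed above (a separate log task, a preallocated buffer pool, directly queueable messages, formatting deferred until the log is known to be needed, and per-module levels changeable at run time) can be combined in a few dozen lines. What follows is only a minimal sketch in Python, not the author's implementation: the names AsyncLogger, LogRecord, set_level and the sink parameter are hypothetical, and it assumes the arguments passed to log() are not mutated by the caller before the log task formats them.

```python
import logging
import queue
import threading


class LogRecord:
    """Reusable record holding the unformatted template and its arguments."""
    __slots__ = ("module", "level", "template", "args")

    def __init__(self):
        self.module = ""
        self.level = 0
        self.template = ""
        self.args = ()


class AsyncLogger:
    """Hypothetical logger: callers enqueue records, one task formats and writes."""

    def __init__(self, pool_size=1024, sink=print):
        # Preallocated pool: taking a record is a pop, returning it is a push.
        self._free = queue.SimpleQueue()
        for _ in range(pool_size):
            self._free.put(LogRecord())
        self._pending = queue.SimpleQueue()
        self._levels = {}                  # per-module thresholds
        self._default_level = logging.INFO
        self._sink = sink
        self.dropped = 0                   # approximate count, not synchronized
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def set_level(self, module, level):
        """Change one module's threshold at run time (e.g. from a command port)."""
        self._levels[module] = level

    def log(self, module, level, template, *args):
        # Check the level first so no formatting happens for unwanted logs.
        if level < self._levels.get(module, self._default_level):
            return
        try:
            rec = self._free.get_nowait()
        except queue.Empty:
            self.dropped += 1              # pool exhausted: drop, never block the caller
            return
        rec.module, rec.level, rec.template, rec.args = module, level, template, args
        self._pending.put(rec)             # the record itself is the queue item

    def _drain(self):
        # Runs in the log task: formatting and output happen off the caller's path.
        while True:
            rec = self._pending.get()
            if rec is None:                # sentinel posted by close()
                return
            try:
                text = rec.template % rec.args
            except (TypeError, ValueError):
                text = rec.template        # malformed template: emit it unformatted
            self._sink(f"{rec.module} {logging.getLevelName(rec.level)} {text}")
            rec.args = ()                  # release references before reuse
            self._free.put(rec)

    def close(self):
        """Flush everything queued so far, then stop the log task."""
        self._pending.put(None)
        self._worker.join()
```

A caller would create one AsyncLogger per process, call log("db", logging.DEBUG, "query took %d ms", 12) on the hot path, and use set_level("db", logging.DEBUG) to turn on detail for a single module. Dropping records when the pool is empty is one possible policy chosen here to keep callers from blocking; a real system might prefer to block or to grow the pool.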
wordName wordTfidf (topN-words)
[('logging', 0.529), ('log', 0.414), ('debug', 0.246), ('request', 0.148), ('formatting', 0.136), ('preallocated', 0.12), ('trace', 0.118), ('levels', 0.108), ('telnet', 0.102), ('module', 0.094), ('allocation', 0.092), ('task', 0.089), ('debugging', 0.088), ('mutex', 0.076), ('usually', 0.073), ('add', 0.073), ('diagnose', 0.072), ('system', 0.072), ('trivial', 0.07), ('sensitive', 0.07), ('need', 0.07), ('meta', 0.067), ('everything', 0.062), ('messages', 0.06), ('passed', 0.059), ('counts', 0.059), ('happen', 0.058), ('buffer', 0.057), ('dead', 0.055), ('process', 0.055), ('embedded', 0.054), ('beeper', 0.054), ('controllable', 0.054), ('foreshadowing', 0.054), ('timeso', 0.054), ('relevant', 0.054), ('happened', 0.054), ('afraid', 0.053), ('memory', 0.053), ('format', 0.051), ('hear', 0.051), ('problems', 0.051), ('command', 0.051), ('debugger', 0.051), ('drop', 0.05), ('time', 0.049), ('poor', 0.049), ('dear', 0.049), ('outliers', 0.049), ('unavoidable', 0.049)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000005 77 high scalability-2007-08-30-Log Everything All the Time
Introduction: This JoelOnSoftware thread asks the age old question of what and how to log. The usual trace/error/warning/info advice is totally useless in a large scale distributed system. Instead, you need to log everything all the time so you can solve problems that have already happened across a potentially huge range of servers. Yes, it can be done. To see why the typical logging approach is broken, imagine this scenario: Your site has been up and running great for weeks. No problems. A foreshadowing beeper goes off at 2AM. It seems some users can no longer add comments to threads. Then you hear the debugging deathknell: it's an intermittent problem and customers are pissed. Fix it. Now. So how are you going to debug this? The monitoring system doesn't show any obvious problems or errors. You quickly post a comment and it works fine. This won't be easy. So you think. Commenting involves a bunch of servers and networks. There's the load balancer, spam filter, web server, database server,
2 0.29765046 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
Introduction: How do you query hundreds of gigabytes of new data each day streaming in from over 600 hyperactive servers? If you think this sounds like the perfect battle ground for a head-to-head skirmish in the great MapReduce Versus Database War , you would be correct. Bill Boebel, CTO of Mailtrust (Rackspace's mail division), has generously provided a fascinating account of how they evolved their log processing system from an early amoeba'ic text file stored on each machine approach, to a Neandertholic relational database solution that just couldn't compete, and finally to a Homo sapien'ic Hadoop based solution that works wisely for them and has virtually unlimited scalability potential. Rackspace faced a now familiar problem. Lots and lots of data streaming in. Where do you store all that data? How do you do anything useful with it? In the first version of their system logs were stored in flat text files and had to be manually searched by engineers logging into each individual machine. T
Introduction: This is a guest post by Gordon Worley, a Software Engineer at Korrelate, where they correlate (see what they did there) online purchases to offline purchases. Several weeks ago, we came into the office one morning to find every server alarm going off. Pixel log processing was behind by 8 hours and not making headway. Checking the logs, we discovered that a big client had come online during the night and was giving us 10 times more traffic than we were originally told to expect. I wouldn’t say we panicked, but the office was certainly more jittery than usual. Over the next several hours, though, thanks both to foresight and quick thinking, we were able to scale up to handle the added load and clear the backlog to return log processing to a steady state. At Korrelate, we deploy tracking pixels, also known as beacons or web bugs, that our partners use to send us information about their users. These tiny web objects contain no visible content, but may include transparent 1 by 1 gif
4 0.27000982 30 high scalability-2007-07-26-Product: AWStats a Log Analyzer
Introduction: AWStats is a free powerful and featureful tool that generates advanced web, streaming, ftp or mail server statistics, graphically. This log analyzer works as a CGI or from command line and shows you all possible information your log contains, in few graphical web pages. It uses a partial information file to be able to process large log files, often and quickly. It can analyze log files from all major server tools like Apache log files (NCSA combined/XLF/ELF log format or common/CLF log format), WebStar, IIS (W3C log format) and a lot of other web, proxy, wap, streaming servers, mail servers and some ftp servers.
5 0.26049584 541 high scalability-2009-03-16-Product: Smart Inspect
Introduction: Smart Inspect has added quite a few features specifically tailored to high scalability and high performance environments to our tool over the years. This includes the ability to log to memory and dump log files on demand (when a crash occurs for example), special backlog queue features, a log service application for central log storage and a lot more. Additionally, our SmartInspect Console (the viewer application) makes viewing, filtering and inspecting large amounts of logging data a lot easier/practical.
6 0.23185658 449 high scalability-2008-11-24-Product: Scribe - Facebook's Scalable Logging System
7 0.21996391 815 high scalability-2010-04-27-Paper: Dapper, Google's Large-Scale Distributed Systems Tracing Infrastructure
8 0.18829516 553 high scalability-2009-04-03-Collectl interface to Ganglia - any interest?
9 0.18279076 37 high scalability-2007-07-28-Product: Web Log Storming
10 0.15734568 105 high scalability-2007-10-01-Statistics Logging Scalability
11 0.15357697 1368 high scalability-2012-12-07-Stuff The Internet Says On Scalability For December 7, 2012
12 0.15191118 937 high scalability-2010-11-09-Paper: Hyder - Scaling Out without Partitioning
13 0.14899439 1413 high scalability-2013-02-27-42 Monster Problems that Attack as Loads Increase
14 0.14660503 1196 high scalability-2012-02-20-Berkeley DB Architecture - NoSQL Before NoSQL was Cool
15 0.14114611 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
16 0.14053091 304 high scalability-2008-04-19-How to build a real-time analytics system?
17 0.13378 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
18 0.13234335 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
19 0.13102201 1425 high scalability-2013-03-18-Beyond Threads and Callbacks - Application Architecture Pros and Cons
20 0.12577944 1197 high scalability-2012-02-21-Pixable Architecture - Crawling, Analyzing, and Ranking 20 Million Photos a Day
topicId topicWeight
[(0, 0.218), (1, 0.112), (2, -0.047), (3, -0.047), (4, 0.02), (5, -0.0), (6, 0.118), (7, 0.097), (8, -0.02), (9, -0.021), (10, -0.012), (11, 0.036), (12, 0.036), (13, -0.067), (14, 0.077), (15, -0.012), (16, 0.043), (17, -0.001), (18, -0.037), (19, 0.04), (20, 0.049), (21, -0.13), (22, -0.053), (23, 0.196), (24, 0.155), (25, -0.053), (26, -0.088), (27, 0.081), (28, -0.002), (29, -0.028), (30, -0.052), (31, -0.133), (32, 0.077), (33, -0.027), (34, -0.041), (35, 0.02), (36, -0.084), (37, 0.031), (38, 0.107), (39, -0.022), (40, 0.015), (41, 0.076), (42, 0.029), (43, -0.022), (44, -0.061), (45, -0.107), (46, 0.039), (47, 0.019), (48, -0.02), (49, -0.056)]
simIndex simValue blogId blogTitle
same-blog 1 0.96659505 77 high scalability-2007-08-30-Log Everything All the Time
Introduction: This JoelOnSoftware thread asks the age old question of what and how to log. The usual trace/error/warning/info advice is totally useless in a large scale distributed system. Instead, you need to log everything all the time so you can solve problems that have already happened across a potentially huge range of servers. Yes, it can be done. To see why the typical logging approach is broken, imagine this scenario: Your site has been up and running great for weeks. No problems. A foreshadowing beeper goes off at 2AM. It seems some users can no longer add comments to threads. Then you hear the debugging deathknell: it's an intermittent problem and customers are pissed. Fix it. Now. So how are you going to debug this? The monitoring system doesn't show any obvious problems or errors. You quickly post a comment and it works fine. This won't be easy. So you think. Commenting involves a bunch of servers and networks. There's the load balancer, spam filter, web server, database server,
Introduction: This is a guest post by Gordon Worley, a Software Engineer at Korrelate, where they correlate (see what they did there) online purchases to offline purchases. Several weeks ago, we came into the office one morning to find every server alarm going off. Pixel log processing was behind by 8 hours and not making headway. Checking the logs, we discovered that a big client had come online during the night and was giving us 10 times more traffic than we were originally told to expect. I wouldn’t say we panicked, but the office was certainly more jittery than usual. Over the next several hours, though, thanks both to foresight and quick thinking, we were able to scale up to handle the added load and clear the backlog to return log processing to a steady state. At Korrelate, we deploy tracking pixels, also known as beacons or web bugs, that our partners use to send us information about their users. These tiny web objects contain no visible content, but may include transparent 1 by 1 gif
3 0.87434757 541 high scalability-2009-03-16-Product: Smart Inspect
Introduction: Smart Inspect has added quite a few features specifically tailored to high scalability and high performance environments to our tool over the years. This includes the ability to log to memory and dump log files on demand (when a crash occurs for example), special backlog queue features, a log service application for central log storage and a lot more. Additionally, our SmartInspect Console (the viewer application) makes viewing, filtering and inspecting large amounts of logging data a lot easier/practical.
4 0.86263132 449 high scalability-2008-11-24-Product: Scribe - Facebook's Scalable Logging System
Introduction: In Log Everything All the Time I advocate applications shouldn't bother logging at all. Why waste all that time and code? No, wait, that's not right. I preach logging everything all the time. Doh. Facebook obviously feels similarly, which is why they open sourced Scribe, their internal logging system, capable of logging 10s of billions of messages per day. These messages include access logs, performance statistics, actions that went to News Feed, and many others. Imagine hundreds of thousands of machines across many geographically dispersed datacenters just aching to send their precious log payload to the central repository of all knowledge. Because really, when you combine all the meta data with all the events you pretty much have a complete picture of your operations. Once in the central repository logs can be scanned, indexed, summarized, aggregated, refactored, diced, data cubed, and mined for every scrap of potentially useful information. Just imagine the log stream from a
5 0.82279605 30 high scalability-2007-07-26-Product: AWStats a Log Analyzer
Introduction: AWStats is a free powerful and featureful tool that generates advanced web, streaming, ftp or mail server statistics, graphically. This log analyzer works as a CGI or from command line and shows you all possible information your log contains, in few graphical web pages. It uses a partial information file to be able to process large log files, often and quickly. It can analyze log files from all major server tools like Apache log files (NCSA combined/XLF/ELF log format or common/CLF log format), WebStar, IIS (W3C log format) and a lot of other web, proxy, wap, streaming servers, mail servers and some ftp servers.
6 0.77646357 937 high scalability-2010-11-09-Paper: Hyder - Scaling Out without Partitioning
7 0.75480205 553 high scalability-2009-04-03-Collectl interface to Ganglia - any interest?
8 0.74327964 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
9 0.71722317 37 high scalability-2007-07-28-Product: Web Log Storming
10 0.6938237 35 high scalability-2007-07-28-Product: FastStats Log Analyzer
11 0.68894386 105 high scalability-2007-10-01-Statistics Logging Scalability
12 0.67608166 1640 high scalability-2014-04-30-10 Tips for Optimizing NGINX and PHP-fpm for High Traffic Sites
13 0.6482079 304 high scalability-2008-04-19-How to build a real-time analytics system?
14 0.64425051 36 high scalability-2007-07-28-Product: Web Log Expert
15 0.63627845 1301 high scalability-2012-08-08-3 Tips and Tools for Creating Reliable Billion Page View Web Services
16 0.62576574 570 high scalability-2009-04-15-Implementing large scale web analytics
17 0.60885936 1096 high scalability-2011-08-10-LevelDB - Fast and Lightweight Key-Value Database From the Authors of MapReduce and BigTable
18 0.60692006 1498 high scalability-2013-08-07-RAFT - In Search of an Understandable Consensus Algorithm
19 0.57701337 45 high scalability-2007-07-30-Product: SmarterStats
20 0.57198966 1196 high scalability-2012-02-20-Berkeley DB Architecture - NoSQL Before NoSQL was Cool
topicId topicWeight
[(1, 0.212), (2, 0.281), (10, 0.06), (13, 0.072), (18, 0.019), (26, 0.012), (30, 0.02), (47, 0.021), (56, 0.012), (61, 0.055), (77, 0.011), (79, 0.069), (85, 0.027), (94, 0.058)]
simIndex simValue blogId blogTitle
1 0.98408943 391 high scalability-2008-09-23-The 7 Stages of Scaling Web Apps
Introduction: By John Engates, CTO, Rackspace. Good presentation of the stages a typical successful website goes through: Stage 1 - The Beginning: Simple architecture, low complexity, no redundancy. Firewall, load balancer, a pair of web servers, database server, and internal storage. Stage 2 - More of the same, just bigger. Stage 3 - The Pain Begins: publicity hits. Use reverse proxy, cache static content, load balancers, more databases, re-coding. Stage 4 - The Pain Intensifies: caching with memcached, writes overload and replication takes too long, start database partitioning, shared storage makes sense for content, significant re-architecting for DB. Stage 5 - This Really Hurts!: rethink entire application, partition on geography, user ID, etc., create user clusters, use a hashing scheme for locating which user belongs to which cluster. Stage 6 - Getting a little less painful: scalable application and database architecture, acceptable performance, starting to add new features again, op
same-blog 2 0.97709274 77 high scalability-2007-08-30-Log Everything All the Time
Introduction: This JoelOnSoftware thread asks the age old question of what and how to log. The usual trace/error/warning/info advice is totally useless in a large scale distributed system. Instead, you need to log everything all the time so you can solve problems that have already happened across a potentially huge range of servers. Yes, it can be done. To see why the typical logging approach is broken, imagine this scenario: Your site has been up and running great for weeks. No problems. A foreshadowing beeper goes off at 2AM. It seems some users can no longer add comments to threads. Then you hear the debugging deathknell: it's an intermittent problem and customers are pissed. Fix it. Now. So how are you going to debug this? The monitoring system doesn't show any obvious problems or errors. You quickly post a comment and it works fine. This won't be easy. So you think. Commenting involves a bunch of servers and networks. There's the load balancer, spam filter, web server, database server,
3 0.97054529 664 high scalability-2009-07-29-Strategy: Devirtualize for More Vroom
Introduction: Virtualization offers a lot of benefits, but it also comes with a cost (memory, CPU, network, IO, licensing). If you are in or running a cloud then some form of virtualization may not even be an option. But if you are running your own string of servers you can choose to go without. Free will and all that. Should you or shouldn't you? In a detailed comparison the folks at 37signals found that running their Rails application servers without virtualization resulted in: "A 66% reduction in the response time while handling multiples of the traffic is beyond what I expected." As is common, 37signals runs their big database servers without virtualization. They use a scale-up approach at the database tier so extracting every bit of performance out of those servers is key. Application servers typically use a scale-out approach for scalability, which is virtualization friendly, but that says nothing about performance. Finding performance increases, especially when you are running on a d
4 0.97042894 758 high scalability-2010-01-11-Have We Reached the End of Scaling?
Introduction: This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud. Have we reached the end of scaling? That's what I asked myself one day after noticing a bunch of "The End of" headlines. We've reached The End of History because the Western liberal democracy is the "end point of humanity's sociocultural evolution and the final form of human government." We've reached The End of Science because of the "fact that there aren't going to be any obvious, cataclysmic revolutions." We've even reached The End of Theory because all answers can be found in the continuous stream of data we're collecting. And doesn't it always seem like we're at The End of the World? Motivated by the prospect of everything ending, I began to wonder: have we really reached The End of Scaling? For a while I thought this might be true. The reason I thought the End of Scaling might be near is because of the slowdown of potential articles at m
5 0.97028589 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
Introduction: How do you query hundreds of gigabytes of new data each day streaming in from over 600 hyperactive servers? If you think this sounds like the perfect battle ground for a head-to-head skirmish in the great MapReduce Versus Database War , you would be correct. Bill Boebel, CTO of Mailtrust (Rackspace's mail division), has generously provided a fascinating account of how they evolved their log processing system from an early amoeba'ic text file stored on each machine approach, to a Neandertholic relational database solution that just couldn't compete, and finally to a Homo sapien'ic Hadoop based solution that works wisely for them and has virtually unlimited scalability potential. Rackspace faced a now familiar problem. Lots and lots of data streaming in. Where do you store all that data? How do you do anything useful with it? In the first version of their system logs were stored in flat text files and had to be manually searched by engineers logging into each individual machine. T
6 0.96977907 1368 high scalability-2012-12-07-Stuff The Internet Says On Scalability For December 7, 2012
7 0.96831703 881 high scalability-2010-08-16-Scaling an AWS infrastructure - Tools and Patterns
8 0.96723974 511 high scalability-2009-02-12-MySpace Architecture
9 0.96718669 1508 high scalability-2013-08-28-Sean Hull's 20 Biggest Bottlenecks that Reduce and Slow Down Scalability
11 0.96649927 1646 high scalability-2014-05-12-4 Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO
12 0.96633482 1220 high scalability-2012-04-02-YouPorn - Targeting 200 Million Views a Day and Beyond
14 0.96595562 558 high scalability-2009-04-06-How do you monitor the performance of your cluster?
15 0.96570128 1364 high scalability-2012-11-29-Performance data for LevelDB, Berkley DB and BangDB for Random Operations
16 0.96547651 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App
17 0.96535349 1179 high scalability-2012-01-23-Facebook Timeline: Brought to You by the Power of Denormalization
18 0.96447355 904 high scalability-2010-09-21-Playfish's Social Gaming Architecture - 50 Million Monthly Users and Growing
19 0.96396756 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
20 0.96374786 1565 high scalability-2013-12-16-22 Recommendations for Building Effective High Traffic Web Software