Introduction
Logs must be thought out. An application's log design should probably be signed-off by a software architect. Enhancements to the log design, as the system evolves, should be encouraged.Checklist:
- Log Design
- Timestamps
- Always Log To Local Disk
- Use Log4j
- Log rotation and archival
- Cosmetic considerations
Log Design
There are only two things about a log that can be controlled by the application: the individual lines, and the separation into different log files.Greppable Logs
What is the end result that we want?We want a greppable log. This means log design is focused entirely on a single line of log. A SINGLE LINE. Remember that. A log entry SHOULD NEVER go over multiple-lines. The only exception is java stacktraces.
We want a greppable log. In other words, what will we be asking of the log? Probably questions like:
- When did user 555444, branch 10, last log in?
- Is our incoming connection from <soap.partner.com> receiving any messages?
- How many times did our web-services partner go dead yesterday?
- What's the longest time required to make a transaction in September?
- Did any transaction fail to complete?
Also, if your application EVER reports a problem, make sure your application logs it. It's not enough to just show a webpage or send an email. You need the log (and hopefully the stacktrace/register-dump/diagnostics).
Four problems to avoid
- Log is missing necessary information. We cannot answer, "When did user 555444, branch 10, last log in?" if the log file doesn't ever contain the information.
- Log unsuitable for grep because of redundant information. We cannot answer, "Is <soap.partner.com> recieving any messages?" if the log file mentions <soap.partner.com> for every line of log, even lines unrelated to <soap.partner.com>.
- Information split across more than one line (bad for grep). Answering, "What's the longest time required to make a billpayment in September?" or "Did any billpayments fail to complete" is impossible with grep if "billPayment start" and "billPayment done" are separate log statements. Answering might even be impossible with any tool if other log statements are present inbetween the "start" and "finish."
Instead, just log a single statement of "billPayment took 285ms" or "billPayment timed out".
- Error reported to user, but not logged. Don't write code like this! ;-)
try { // set some variable } catch ( Exception e ) { // do nothing } // if variable is null, tell user there was a problem
Separate files for different areas of interest
Log design is primarily focused on the individual line of log. However, partitioning the lines into various files can make useful information even more accessible.So far my approach is to try and segregate quieter events (that occur seldom) from noisier events. For a 100 txn/second application I wrote I created three main logs: a message log, a connection log (disconnects/reconnects), and a stacktrace log.
I don't worry about redundancy. I might log the same line in several files.
Timestamps
Logs must contain timestamps, with timezone, to the millisecond or nanosecond.2004-10-04/09:03:00.141141000/PDT*bad event 2004-10-04/09:03:20.217588221/PDT good event 2004-10-04/09:03:26.006912999/PDT another good event
Why timezone?
Answer: maybe half our servers are in Toronto. If our sysadmins decide every computer should use GMT (or PST), great, but let's give the sysadmins the option! Timezone also helps around daylight-savings changes.Why nanosecond? Why not to the second?
Answer: we want to minimize the number of lines occuring with identical timestamps. This is very important if you want to split logs into separate files covering different areas of interest (e.g. connection-history.log and messages.log). If your timestamps are to the nanosecond, you can re-merge these files together and usually not lose the order of events ("sort -m").Personally, for Java I prefer to use "yyyy-MM-dd/HH:mm:ss.SSS/zzz". (I would use nanoseconds if Java supported it).
High Volume Logging: It is probably impossible for any logging system to guarantee logs get written chronologically (in same order as the original events). In an extremely highly loaded system you will sometimes see out-of-order events in your log, usually by a few milliseconds. If you only log to the second, you cannot reestablish the chronology. Log analysis tools also need to be aware of this and not blowup when fed data that is not strictly chronological.
Always Log To Local Disk
Applications should not try to log directly to the network. Don't be tempted by the syslog dream, and be careful not to accidentally (or overtly) log to a shared directory. So many times people say, "The network is not reliable." This means it's unsuitable for direct logging. Local disk is your loyal and trustworthy friend.We do want our logs to be ultimately centralized and archived, but that task is best accomplished with cron or some other batch-oriented background process using a reliable transport (e.g. ftp, scp, nfs, smbfs). Also, be sure to compress your logs before they hit the network.
Keep a week or two of the archived files in both places (local-disk and archive server), both to make trouble shooting easier for sysadmins, and to give you some breathing room should your archival process break down temporarily. Having a couple weeks of logs on the local-disk can be quite convenient, especially when faced with a question like, "Did this problem start in the last few days?" The sysadmin will be happier if she doesn't have to always ssh into the central archive just to look at yesterday's logs.
Use Log4j 1.2.x (or a worthy competitor - e.g. Logback)
There are several logging frameworks available for java.- log4j-1.2.x (Note: the 1.3.x and 2.x versions are not production quality yet - avoid for now)
- Logback
- java.util.logging
- logKit
- For those that can't decide: commons-logging, or slf4j
Commons-logging and slf4j are interesting. The are both generalizations of logging, capable of supporting any underlying implementation. If you cannot decide on an implementation, or if you think you might want to change your implementation in the future, then commons-logging or slf4j are appropriate. Commons-logging is also good to know about, because many 3rd party tools (JBoss, HttpClient, Tomcat, etc...) use it.
I recommend skipping the generalization if possible. You will have more control over log4j if you use it directly. You cannot execute log4j's PropertyConfigurator.configure( "/etc/log4j.properties" ) at your leisure if using "commons-logging".
Programming with log4j
Every class must contain this line:private static Logger log = Logger.getLogger( MyClass.class )EVERY CLASS! I've configured IntelliJ to automatically generate that code whenever I create a new class. Some people recommend against making that line static, but Logger.getLogger() is synchronized, so non-static can degrade performance. It's easier just to use "static" and not worry about it. Normally I don't try to avoid synchronized calls, but this really is one of those rare places in Java these days, even with Java6, where synchronized can hurt, since every object's constructor will now contend for the global log4j lock.
DO NOT: create a single class "com.foo.app.Log" and run all logging calls through that! You are severely impairing log4j's flexibility if you do that.
In log4j, all loggers are named. Dots in the name "." act as hierarchy dividers. Through configuration, you can turn off "com.foo", turn on INFO for "com.foo.abc" and turn on DEBUG for "com.foo.abc.Message". But you can only do this if every logger is named after the class calling it.
4 Types of Logging
- log.error()
- log.warn()
- log.info()
- log.debug()
Logging Exceptions
If the Exception is significant, at the very minimum, always do this:try { // Code that might throw an exception... } catch (Exception e) { log.error(e, e); }If you're sending the stacktraces to a separate log file (more about this below), you can do this:
try { // Code that might throw an exception... } catch (Exception e) { log.error(e); stacktraceLog.error(e,e); }This way the error is in both files, but the actual stacktrace itself is only in the "stacktraces" logger.
Note to log4j developers: Please add the timestamp and level to each line of the stacktrace to keep it grep-friendly. I don't like doing "grep -C 15" (e.g. show 15 lines before and after)!
Performance Tweak
Before assembling your greppable log statement you may want to check log.isInfoEnabled() or log.isDebugEnabled(). This way you save on some cpu cycles:// Uses many cpu cycles: String fancyLogString = buildFancyLogString(); // Does nothing because "debug" is currently disabled in log4j.properties: log.debug( fancyLogString );Better:
if ( log.isDebugEnabled() ) { // Uses many cpu cycles: String fancyLogString = buildFancyLogString(); // Does nothing because "debug" is currently disabled in log4j.properties: log.debug( fancyLogString ); }
Separate files for different areas of interest
For each file, you'll need a separate Logger.private static Logger log = Logger.getLogger( MyClass.class ) private static Logger connectionsLog = Logger.getLogger( "connections." + MyClass.class.getName() ) private static Logger stacktracesLog = Logger.getLogger( "stacktraces." + MyClass.class.getName() ) private static Logger httpLog = Logger.getLogger( "http." + MyClass.class.getName() )
Configuration
Log4j's main disadvantage is that the documentation is confusing and hard to navigate. If you are brave, the javadoc for PatternLayout is handy. So is the manual http://logging.apache.org/log4j/docs/manual.html.I find log4j configuration much easier to deal with once you have an example:
log.dir=/var/log/foo rrd.dir=${log.dir}/rrd datestamp=yyyy-MM-dd/HH:mm:ss.SSS/zzz roll.pattern.hourly=.yyyy-MM-dd.HH roll.pattern.daily=.yyyy-MM-dd # catchAll.log -- Default catch-all. log4j.rootLogger=DEBUG, defaultLog log4j.appender.defaultLog=org.apache.log4j.DailyRollingFileAppender log4j.appender.defaultLog.DatePattern=${roll.pattern.daily} log4j.appender.defaultLog.File=${log.dir}/catchAll.log log4j.appender.defaultLog.layout=org.apache.log4j.PatternLayout log4j.appender.defaultLog.layout.ConversionPattern=%d{${datestamp}} [%t] %-5p %m%n # foo.log log4j.logger.com.foo.shared=INFO,fooLog log4j.logger.com.foo.abc=INFO,fooLog log4j.additivity.com.foo.shared=false log4j.additivity.com.foo.abc=false log4j.appender.fooLog=org.apache.log4j.DailyRollingFileAppender log4j.appender.fooLog.File=${log.dir}/foo.log log4j.appender.fooLog.DatePattern=${roll.pattern.hourly} log4j.appender.fooLog.layout=org.apache.log4j.PatternLayout log4j.appender.fooLog.layout.ConversionPattern=%d{${datestamp}}%p%m%n # fooConnections.log log4j.logger.connections.com.foo=INFO,fooConnections log4j.additivity.connections=false log4j.appender.fooConnections=org.apache.log4j.DailyRollingFileAppender log4j.appender.fooConnections.File=${log.dir}/fooConnections.log log4j.appender.fooConnections.DatePattern=${roll.pattern.daily} log4j.appender.fooConnections.layout=org.apache.log4j.PatternLayout log4j.appender.fooConnections.layout.ConversionPattern=%d{${datestamp}}%p%m%n # fooStacktraces.log log4j.logger.stacktraces.com.foo=INFO,stacktraces log4j.additivity.stacktraces=false log4j.appender.stacktraces=org.apache.log4j.DailyRollingFileAppender log4j.appender.stacktraces.File=${log.dir}/fooStacktraces.log log4j.appender.stacktraces.DatePattern=${roll.pattern.daily} log4j.appender.stacktraces.layout=org.apache.log4j.PatternLayout log4j.appender.stacktraces.layout.ConversionPattern=%d{${datestamp}}%p%m%nNotice the use of property substitution! Very handy! Also, since I use single-character level strings (* instead of ERROR), I don't have to put spaces before and after %p.
Loading Configuration
Log4j will automatically load the configuration if it is stored in a file called "log4j.properties" and is present on the classpath under "" (e.g. WEB-INF/classes/log4j.properties).I don't like that approach and prefer to load the configuration explicitly by calling:
PropertyConfigurator.configure( Config.ETC + "/log4j.properties" );This way I can reload the configuration at any time as long as my application is still running. I like to add a button to an administrative jsp, "Reload Log4J".
Dynamic Log File Location
Many people complain that Log4j forces you to hard-code the location where your logs will be kept. Actually, it is possible to dynamically choose the log-file location, especially if you use the ${log.dir} property substitution technique above. Here's how:String dynamicLog = // log directory somehow chosen... Properties p = new Properties( Config.ETC + "/log4j.properties" ); p.put( "log.dir", dynamicLog ); // overwrite "log.dir" PropertyConfigurator.configure( p );
Log rotation and archival
WARNING: Log4J has some catastrophic bugs (logs accidentally deleted) if more than one running JVM tries to rotate the same target log file. This is especially likely to happen if two instances of the same application are running in parallel in separate processes (e.g. two Tomcat instances), but both use identical log4j.properties.Logs should be rotated every hour, or every day. They should not be rotated by size (e.g. every 100MB). A log file containing only events between "13:00:00.000-13:59:59.999" is inherently organized in a way useful for humans, whereas a log file containing exactly 100MB of data is not.
Log4j handles hourly/daily log rotation very nicely:
log4j.appender.defaultLog=org.apache.log4j.DailyRollingFileAppender log4j.appender.defaultLog.DatePattern=.yyyy-MM-dd.HHThis particular config rotates the log every hour (yyyy-MM-dd.HH).
Logs should be periodically gzipped and moved to a server devoted to log-archival. This should happen every night, or every hour. Archival is also a good stage to run all logs through the unix "sort" command to reestablish chronologically in case some events were logged out-of-order.
You should be arhiving logs on QA as well as Production, so that you can make sure you got it right.
Personally I prefer gzip to bzip2. bzip2 makes smaller files, but it is a lot slower! When viewing compressed logs, use zgrep (or bzgrep). In other words, don't decompress the logs to disk. Just decompress them to standard-out and run grep against that. You may also want to look into these related commands: zcat, zdiff, zcmp, zless.
A Log Rotation Best Practice That Is Currently Not Possible
When a log file rotates (either because of size, or daily/hourly), the first line of the new log file should contain important version information about the running application:2008-06-06/09:00:00.000 APP=[FooApp-1.2.3] UPTIME=[382 Days, 10:51:11.231] VM=[Java HotSpot(TM) Client VM 1.4.2_16-b05] OS=[Windows XP 5.1 x86] CONTAINER=[Apache Tomcat/5.0.30] HEAP=[69MB/281MB 24.6%]Additional log-related investigations become possible with this at the top of every rotated log file. For example, "Did the application encounter more of bug X when we upgraded the Linux kernel?" or "How much did response time improve after we upgraded Java?" Analyses like these are impossible without a "version-line" at the top of each log file.
That version line should also print in the middle of the current log whenever the application restarts. The "version line" should be sent to all active log files. So if log4j.properties is managing 100 files, then all 100 should get this line. Personally, I would even recommend that during really quiet times logging frameworks should still create the file, and log the single version line, even if no other log events occurred during the day or hour in question. This way the absence of the file in the log archive would almost always mean "the application was shut down," rather than "the application was either shut down, or was not doing anything," as it currently means.
Unfortunately, this "version line" feature is not possible in any existing Java logging framework. Hopefully one day someone will implement it. (Why is everyone looking at me?)
Cosmetic considerations
Consider this log snippet.2004-09-28/18:37:35.138/PDT Loading FooApp configuration from /home/julius/dev/foo/webapps/foo/WEB-INF/etc/foo.xml 2004-09-28/18:37:35.938/PDT 200-HEAD 325ms 0.0kB KEY=GOOGLE DST=http://www.google.ca/ [health] 2004-09-28/18:37:35.939/PDT!400-HEAD 238ms 0.0kB KEY=DLL DST=http://192.1.2.3/cgi-bin/cgi.dll/ext/ [health] 2004-09-28/18:37:37.745/PDT*ECHOSERVER DST=socket(localhost:5555) [java.net.ConnectException: Connection refused] 2004-09-28/18:37:41.886/PDT 200-HEAD 6306ms 0.0kB KEY=SOAP DST=https-easy://198.5.6.7/ [health] 2004-09-28/18:37:43.083/PDT*Timeout 7501ms KEY=TXN DST=http://10.1.2.3:8080/ [health] 2004-09-28/18:37:47.750/PDT ECHOSERVER DST=socket(localhost:5555) [Connected!] 2004-09-28/18:39:47.429/PDT*Timeout 7502ms KEY=TXN SRC=localhost.localdomain DST=http://10.1.2.3:8080/blah [bar:xyz foo:5050] 2004-09-28/18:40:44.581/PDT!com.foo.net.TimeoutException: Timeout 7501ms KEY=TXN DST=http://10.1.2.3:8080/ [client.executeMethod() failed] 2004-09-28/18:40:51.778/PDT!404-GET 86ms 1.22kB KEY=GOOGLE SRC=localhost.localdomain DST=http://www.google.ca/blah [bar:hamburger foo:5050] 2004-09-28/18:41:03.480/PDT SOCKET 558ms 0.26kB KEY=ECHOSERVER SRC=localhost.localdomain DST=DST=socket(localhost:5555) [bar:hamburger foo:5050] 2004-09-28/18:41:06.016/PDT SOCKET 13ms 0.26kB KEY=ECHOSERVER SRC=localhost.localdomain DST=DST=socket(localhost:5555) [bar:hamburger foo:5050] 2004-09-28/18:41:08.562/PDT SOCKET 7ms 0.26kB KEY=ECHOSERVER SRC=localhost.localdomain DST=DST=socket(localhost:5555) [bar:hamburger foo:5050] 2004-09-28/18:43:53.553/PDT!com.foo.net.TimeoutException: Timeout 7502ms KEY=TXN DST=http://10.1.2.3:8080/blah [client.executeMethod() failed] 2004-09-28/22:05:12.950/PDT*ECHOSERVER DST=socket(localhost:5555) [Closed]I'm trying to keep earlier parts of the log statement aligned. Notice the "millisecond" response-time reads are aligned.
I find ! * + and [space] more convenient to work with than the verbose "INFO, DEBUG, ERROR, WARN". They are a little trickier to grep for, though, since the shell will try to expand them. I use single-quotes to stop the shell from interfering:
grep '*' foo.log Find all errors in log. grep '!' foo.log Find all warnings in log. grep '*\|!' foo.log Find all warnings AND errors in log! The pipe is the OR operator in grep, but it needs backslash protection.Here's the code to change Log4J's level strings:
package org.apache.log4j; public class LevelChanger { public static void changeLevels() { Level.DEBUG.levelStr = "_"; Level.INFO.levelStr = " "; Level.WARN.levelStr = "!"; Level.ERROR.levelStr = "*"; Level.FATAL.levelStr = "***"; } }Even though LevelChanger is code I write and package in a separate jar file, as long as I declare it part of the "org.apache.log4j", it can change the level strings.
No comments:
Post a Comment