10 Ways to Improve Your Output File

So you have written an app or log – it’s brilliant, it grabs all the data you need and runs like greased lightning. All you need to do now is ensure your output file has a nice clean format – preferably one that means Logscape does all the work for you! So here are some of my top tips.

1) Add a full time stamp to every line. You wouldn’t believe how much trouble can be caused by people logging just the time or just the date. At best, you have to struggle to get your data properly organised. At worst, you end up with a mess and data appears in the wrong place on the graph. Do it right: set the date and the time!

2) Add a time zone to that stamp. My computer will never move time zone, so surely it’ll be fine? Don’t count on it. British Summer Time changing the system time on half your servers, servers being reset to US time, data centres moving locations… All these things can and will happen. Adding the time zone to the stamp gives you a cast-iron assurance that the data will always be correct. That peace of mind is worth a few bytes.
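As a rough sketch of tips 1 and 2 in Python (the helper name `log_stamp` is made up for this example), the standard library can put the date, the time and the UTC offset into a single ISO 8601 string:

```python
from datetime import datetime, timedelta, timezone

def log_stamp(now=None):
    """Return a full ISO 8601 timestamp that includes the UTC offset."""
    if now is None:
        # astimezone() with no argument attaches the machine's local time zone
        now = datetime.now(timezone.utc).astimezone()
    return now.isoformat(timespec="milliseconds")

# A fixed moment in British Summer Time (UTC+1), used only for illustration
bst = timezone(timedelta(hours=1))
print(log_stamp(datetime(2014, 7, 1, 9, 30, 15, 250000, tzinfo=bst)))
# prints 2014-07-01T09:30:15.250+01:00
```

Because the offset travels with every line, the stamp still means the same thing after a clock change or a server move.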

3) If you’re logging, assign the level correctly. One of the first elements in your line after the time stamp should be the level. You know, these ones: WARN, ERROR, FATAL, INFO, DEBUG, TRACE. Having these correctly assigned is immensely useful for a user, since using them in searches quickly filters the messages to the required level. Be discerning about what goes into ERROR – if there are too many errors, you risk spamming the user.

4) Standardise your initial output. It doesn’t matter if you get the format from Log4J or anywhere else, but use a nice standardised output for the first few columns of all your logs. If they all follow a similar format, they’ll fit cleanly together. An ideal setup is: date, level, server, component, message

5) Why server? Just in case you move the logs from their original host to analyse them elsewhere. You say you never will… until you do and lose all reference to which host it came from.
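Here is one way the date, level, server, component, message layout could look using Python’s standard `logging` module. Treat it as a sketch, not a prescription: `HostFilter`, `make_logger` and the component name `order-router` are invented for the example, Python spells the level WARNING rather than WARN, and the `%z` offset relies on the platform’s `strftime` supporting it.

```python
import io
import logging
import socket

HOST = socket.gethostname()

class HostFilter(logging.Filter):
    """Attach the server name to every record so it survives a file move."""
    def filter(self, record):
        record.server = HOST
        return True

def make_logger(component, stream):
    """Build a logger that writes: date level server component message."""
    logger = logging.getLogger(component)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.addFilter(HostFilter())
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(server)s %(name)s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S%z",  # full date, time and UTC offset
    ))
    logger.addHandler(handler)
    return logger

out = io.StringIO()  # stands in for the real log file
log = make_logger("order-router", out)
log.warning("queue depth above threshold")
print(out.getvalue(), end="")
```

Using the same formatter string in every application keeps the first four columns identical across all your logs.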

6) Log consistent IDs across logs. This may seem obvious, but it is very important. If you want to track a PID, trade or purchase through different systems, you need to have a consistent ID on all the systems.

7) Tracking across lines requires an ID on every line. Logscape works on a line-by-line basis. This means if you only reference an ID on the first line of the log, you’ll have no way of knowing that the next 50 lines also refer to that ID. If a line refers exclusively to a trade or process with an ID, state that ID in the line.
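One way to make sure the ID lands on every line, rather than trusting yourself to type it each time, is to wrap the logger. This sketch assumes Python’s `logging.LoggerAdapter`; the class name `IdAdapter`, the key `trade_id` and the value `T-1001` are all hypothetical.

```python
import io
import logging

class IdAdapter(logging.LoggerAdapter):
    """Prefix every message with the ID held in the adapter's extra dict."""
    def process(self, msg, kwargs):
        return f"trade_id={self.extra['trade_id']} {msg}", kwargs

buf = io.StringIO()  # stands in for the real log file
base = logging.getLogger("settlement")
base.setLevel(logging.INFO)
base.propagate = False
base.addHandler(logging.StreamHandler(buf))

# Every call through the adapter carries the same ID
trade_log = IdAdapter(base, {"trade_id": "T-1001"})
trade_log.info("received")
trade_log.info("validated")
trade_log.info("booked")
print(buf.getvalue(), end="")
```

Each of the three lines can now be found with a single search on the ID, even though the calling code never repeats it.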

8) If your data output varies, use JSON. JSON is perfect for logging when you have data output that varies between lines. It’s simple and understandable and, although a little verbose, is accepted almost everywhere, like the best credit cards. It’s human readable in its native form and Logscape will automatically extract the key value pairs for you.
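A minimal sketch of one-JSON-object-per-line logging in Python follows. The function name `json_line` and the field names (`timestamp`, `level`, `component`, `event` and so on) are assumptions made for the example, not names Logscape requires.

```python
import json
from datetime import datetime, timezone

def json_line(level, component, **fields):
    """Build one log line as a single JSON object with a fixed header."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "component": component,
    }
    record.update(fields)  # the part that varies from line to line
    return json.dumps(record)

print(json_line("INFO", "checkout", event="purchase", order_id="A17", total=42.5))
print(json_line("ERROR", "checkout", event="card_declined", order_id="A18", reason="expired"))
```

The two lines carry different keys, yet both stay on a single line and start with the same standard fields.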

9) If your output never varies, use CSV. Some files are literally just outputs of figures on a schedule: task queues, memory stats. They’ll run regularly, maybe every few seconds, which can produce large volumes of data. To keep the file size down, just put the figures in CSV format and use a data type in Logscape to assign the column names by position.
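For a scheduled stats dump, something like the following would do. It is only a sketch: `write_stats` and its three columns (queue depth, heap used in MB, CPU percent) are invented here, and the column order is whatever you later describe to your data type.

```python
import csv
import io
from datetime import datetime, timezone

def write_stats(stream, queue_depth, heap_used_mb, cpu_percent):
    """Append one fixed-position CSV row: timestamp, queue, heap, cpu."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow([
        datetime.now(timezone.utc).isoformat(timespec="seconds"),
        queue_depth,
        heap_used_mb,
        cpu_percent,
    ])

out = io.StringIO()  # stands in for the real output file
write_stats(out, 12, 512, 37.5)
write_stats(out, 9, 498, 22.0)
print(out.getvalue(), end="")
```

No header row and no key names are repeated, so each sample costs only a few dozen bytes.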

10) For efficiency, calculate numbers, then report them. Logscape can indeed sum large numbers from large data sets. However, it’s inefficient to recalculate the totals every time, especially over months of data. So if you need a daily total regularly, write it out as its own line and use that for reporting.
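The pre-calculation itself can be tiny. This sketch assumes each event is a pair of an ISO 8601 stamp and an amount; `daily_totals` and the sample figures are made up for illustration.

```python
from collections import defaultdict

def daily_totals(events):
    """Sum amounts per calendar day, keyed by the date part of the stamp."""
    totals = defaultdict(float)
    for stamp, amount in events:
        # The first 10 characters of an ISO 8601 stamp are YYYY-MM-DD
        totals[stamp[:10]] += amount
    return dict(totals)

events = [
    ("2014-07-01T09:30:15+01:00", 120.0),
    ("2014-07-01T17:02:41+01:00", 80.0),
    ("2014-07-02T08:11:03+01:00", 45.5),
]
print(daily_totals(events))
# prints {'2014-07-01': 200.0, '2014-07-02': 45.5}
```

Write each total to its own summary file as one line per day (with a full time stamp, as in tip 1) and long-range reports only have to read that small file.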

I hope you find these useful! If you can think of some I’ve missed, let us know!