Visualizing UK accident data with Logscape

In my ever-onward quest to show the world how easy it is to get up and running with Logscape, today I’m going to use a Logscape Docker container to build visualisations from some publicly available CSV files in no time at all. If you’ve never used the Logscape Docker image, check out my previous blog.

Today we’re going to be analysing data made available via the gov.uk website, which offers statistics for crashes in the UK for the year of 2015. The specific dataset is available for download here.


Concatenation or Parameters? Both? What’s the top method of Java logging?

Concatenation or Parameters? Which should we use?

Now it’s undeniable: we techies love to argue about anything we can. Emacs or Vi? Tabs or spaces? Dark theme or light theme? Brackets on the method line, or the next? Those are but a few, and you can see examples of these arguments by following discussions on Twitter.

However, whilst you sit in your corner and argue Emacs or Vi (the winner is Vi, for the record), we decided to take action by looking at the top Java repositories on GitHub and settling, once and for all, which is the more widely used method of logging.

Take a guess, we dare you.

Again, looking at Twitter (do we spend too much time staring at that scrolling feed?), past polls have shown parameterized logging to have a distinct lead over its String concatenation cousin. Regardless, let’s take a look at the actual data.

The proof is in the pu-… repository.

We ingested the top Java repositories on GitHub, pruned out those using fewer than 200 log statements in the entire file, glared at the inconsiderate repos that were ruining our stats with their outliers, and then broke the logging statements down by the most common methods.

We took the results of our search, dropped them into Logscape, generated a pie chart and got….

Use of logging method across all repos

It’s fair to say we were as surprised as you. Of all the log lines we ingested and checked, a whopping 52% have no variable elements at all. They’re just plain old strings. Coming in second are parameterized statements with about 33%, then concatenation at 14%, and, trying its best to avoid notice, the less than 1% of log statements that use both.

The fact that so many messages contain no variables at all is interesting, as it means the application has no way of telling you what state it was in, only that it executed that particular log line. However, good logs vs bad logs is a discussion for another time.

So we now know that static messages with no parameters are vastly more common than any other type of message, but what’s the spread of logging styles? Do people who use static logging always log that way, or do they mix it up? That brings us to our second graph, the average breakdown of logging style on a per-repo basis, and the results are…
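If you fancy trying this on your own code, here is a rough sketch of how a single log statement could be sorted into those four buckets. This is not the search we actually ran in Logscape; the class name, the `Style` enum and both regular expressions are assumptions made up for illustration, and the heuristics will misfire on edge cases such as a `+` that lives inside a string literal.

```java
import java.util.regex.Pattern;

// Hypothetical classifier: the four styles mirror the four slices of the pie chart.
class LogStyleClassifier {
    enum Style { STATIC, PARAMETERIZED, CONCATENATION, BOTH }

    // A "{}" anchor anywhere in the statement suggests parameterized logging.
    private static final Pattern ANCHOR = Pattern.compile("\\{\\}");
    // A '+' sitting directly next to a string literal's quote suggests concatenation.
    private static final Pattern CONCAT = Pattern.compile("\"\\s*\\+|\\+\\s*\"");

    static Style classify(String statement) {
        boolean parameterized = ANCHOR.matcher(statement).find();
        boolean concatenated = CONCAT.matcher(statement).find();
        if (parameterized && concatenated) return Style.BOTH;
        if (parameterized) return Style.PARAMETERIZED;
        if (concatenated) return Style.CONCATENATION;
        return Style.STATIC; // nothing variable: a plain old string
    }

    public static void main(String[] args) {
        System.out.println(classify("LOGGER.info(\"Server started\");"));
        System.out.println(classify("LOGGER.info(\"{} just blew up\", failure);"));
        System.out.println(classify("LOGGER.info(failure + \" just blew up\");"));
        System.out.println(classify("LOGGER.debug(\"Item {} failed: \" + reason, item);"));
    }
}
```

Run over every log statement in a repo and tallied up, something along these lines would give you the per-style counts behind a chart like ours.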

Logging type by repo

…What I would say is a better look at the breakdown of statements. Parameterized takes the lead, but only by the smallest slice of pie, coming in at 42% compared to the 36% where both are used. This shows us that whilst the most commonly used format is parameterized, a similar number of devs either can’t decide, or just don’t care, and opt for whichever format suits them best. This raises the question: what is the actual difference?

Concatenation or Parameters, the who, what and where.

So, from an end user’s point of view, when they open the log file, regardless of which logging format was used, they’re going to see the same thing: a (hopefully) nicely formatted log, full of data they’re interested in. So where’s the difference? The difference lies in the code.

  • String concatenation combines strings, e.g.
     LOGGER.info(failure + " just blew up");
  • Parameterized logging uses a formatting anchor, e.g.
     LOGGER.info("{} just blew up", failure);

From a visual standpoint, they’re really not that different, but what if we look a level deeper?

The Deep Dark

So we know that parameterized logging is the most popular style on a per-repo basis, and we know that from a code perspective, the two look reasonably similar. So what is the actual difference between them? Well, it mainly comes down to how the JVM treats each statement.

For the case of concatenation, take a line such as:

LOGGER.debug("This " + item + " went wrong, with state " + state);

Regardless of the log level, the argument has to be built before debug() is even called, so the variables in this message will always be converted to Strings and concatenated. If the log level is actually currently INFO, we’ve just done all that work for a message we’re never going to use.

This can admittedly be avoided, but it makes your code even more verbose:

if(LOGGER.isDebugEnabled()) LOGGER.debug("This " + item + " went wrong, with state " + state);

Even using this as an inlined if statement, that’s plenty of visual clutter.
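If you’d like to see that wasted work for yourself, here’s a small sketch. It uses the JDK’s built-in java.util.logging as a stand-in for whatever sits behind LOGGER in your project, so `fine` plays the role of `debug` and `isLoggable(Level.FINE)` the role of `isDebugEnabled()`. The `Item` class, the counter and the method names are all made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

class EagerConcatenationDemo {
    // Counts every toString() call made on our stand-in object.
    static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();

    static class Item {
        @Override
        public String toString() {
            TO_STRING_CALLS.incrementAndGet();
            return "item-42";
        }
    }

    static final Logger LOGGER = Logger.getLogger("eager.demo");
    static {
        LOGGER.setLevel(Level.INFO); // FINE (debug-level) messages are switched off
    }

    // Unguarded concatenation: the message is built before fine() can check the level.
    static int unguarded(Item item) {
        int before = TO_STRING_CALLS.get();
        LOGGER.fine("This " + item + " went wrong");
        return TO_STRING_CALLS.get() - before;
    }

    // Guarded concatenation: the level check skips building the message entirely.
    static int guarded(Item item) {
        int before = TO_STRING_CALLS.get();
        if (LOGGER.isLoggable(Level.FINE)) {
            LOGGER.fine("This " + item + " went wrong");
        }
        return TO_STRING_CALLS.get() - before;
    }

    public static void main(String[] args) {
        Item item = new Item();
        System.out.println("unguarded toString() calls: " + unguarded(item)); // 1
        System.out.println("guarded toString() calls: " + guarded(item));     // 0
    }
}
```

Nothing is written to the log in either case, yet the unguarded call still pays for a toString() and a concatenation.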

Looking instead at parameterization, that same log message is going to look something like this:

LOGGER.debug("This {} went wrong, with state {}", item, state);

If the Logger is set to INFO, these objects will never be converted into Strings.

There are arguments for and against the visual style of the two, but the biggest fact is a fairly simple one. Using parameterization your objects will only be converted to Strings when they’re needed, saving you time and memory.
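Here’s a minimal sketch of that laziness, again assuming the JDK’s java.util.logging as a stand-in logger. Note that it uses numbered `{0}`, `{1}` placeholders rather than the `{}` anchors shown above, and that the `Item` class and counter are hypothetical helpers that exist only to count toString() calls.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

class LazyParameterDemo {
    // Counts every toString() call made on our stand-in object.
    static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();

    static class Item {
        @Override
        public String toString() {
            TO_STRING_CALLS.incrementAndGet();
            return "item-42";
        }
    }

    static final Logger LOGGER = Logger.getLogger("lazy.demo");
    static {
        LOGGER.setLevel(Level.INFO); // FINE (debug-level) messages are switched off
    }

    // The objects are handed over as-is; the logger checks the level before formatting.
    static int parameterized(Item item, String state) {
        int before = TO_STRING_CALLS.get();
        LOGGER.log(Level.FINE, "This {0} went wrong, with state {1}", new Object[] {item, state});
        return TO_STRING_CALLS.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("toString() calls: " + parameterized(new Item(), "STOPPED")); // 0
    }
}
```

With the level at INFO the logger bails out before it ever touches the parameters, with no guard clause cluttering the call site.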

Thoughts

For us, the most surprising thing to come out of our research was the sheer number of static log messages that we saw in the first graph. The second showed us that whilst parameterization has a lead, it’s not much of one. This probably reflects the fact that, in the long run, there really isn’t that much difference between the two methods. However, once you enter the realms of large-scale logging, it’s clear that parameterization can be simpler and more performant. Not making additional calls to toString, and so not spending time and resources for nought, seems a small saving now, but scale that up to a system that is potentially making that call hundreds, if not thousands, of times per minute, and you see why people prefer parameterization. The major drawback of concatenation can be avoided, but it will cost you visual clarity, and so developer time.

Hopefully, this has helped to shine some light on the matter, and persuaded you to use parameterization over concatenation. Your applications will thank me.