Visualizing UK accident data with Logscape

In my ever onward quest to show to the world how easy it is to get up and started with Logscape, today I’m going to use a Logscape docker container in order to build visualisations based off some publicly available CSV files in no time at all. If you’ve never used the Logscape docker image, then check out my previous blog.

Today we’re going to be analysing data made available via the website, which offers statistics for crashes in the UK for the year of 2015. The specific dataset is available for download here.

Continue reading

Concatenation or Parameters? Both? What’s the top method of Java logging?

Concatenation or Parameters? Which should we use?

It’s undeniable: we techies love to argue about anything we can. Emacs or Vi? Tabs or spaces? Dark theme or light theme? Brackets on the method line, or the next? To name but a few. We can even see examples of these arguments if you follow discussions on Twitter.

However, whilst you sit in your corner and argue Emacs or Vi (The winner is Vi for the record) we decided to take action by looking at the top Java repositories on GitHub and settling once and for all, which is the more used method of logging.

Take a guess, we dare you.

Again, looking at Twitter (do we spend too much time staring at that scrolling feed?), polls in the past have shown parameterized logging to have a distinct lead over its String concatenation cousin. Regardless, let’s take a look at the actual data.

The proof is in the pu-… repository.

We ingested the top Java repositories on GitHub, pruned out those using fewer than 200 log statements in the entire file, glared at the inconsiderate repos that were ruining our stats with their outliers, and then broke the logging statements down by the most common methods.

We took the results of our search, dropped them into Logscape, generated a pie chart and got….

Use of logging method across all repos

It’s fair to say we were as surprised as you. Of all the log lines we ingested and checked, a whopping 52% have no parameterized elements at all. They’re just plain old strings. Coming in second, we have parameterized statements with about 33%, then concatenation at 14%, and, trying its best to avoid notice, less than 1% of log statements use both.
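If you want to try a similar breakdown on your own code, the four categories can be approximated with a couple of regular expressions. The sketch below is a rough heuristic and not the search used to build the chart above: the class name LogStyleClassifier, the Style names and both patterns are assumptions made for illustration, and it will misjudge edge cases such as a plus sign inside a message literal.

```java
import java.util.regex.Pattern;

// Hypothetical helper: sorts a single logging statement into one of four styles.
final class LogStyleClassifier {
    enum Style { STATIC, PARAMETERIZED, CONCATENATION, BOTH }

    // A "{}" anchor anywhere in the statement (assumed to sit inside the message literal).
    private static final Pattern ANCHOR = Pattern.compile("\\{\\}");
    // A '+' directly next to a string literal boundary: "text " + x  or  x + " text".
    private static final Pattern CONCAT = Pattern.compile("\"\\s*\\+|\\+\\s*\"");

    static Style classify(String statement) {
        boolean parameterized = ANCHOR.matcher(statement).find();
        boolean concatenated = CONCAT.matcher(statement).find();
        if (parameterized && concatenated) return Style.BOTH;
        if (parameterized) return Style.PARAMETERIZED;
        if (concatenated) return Style.CONCATENATION;
        return Style.STATIC; // no variables in the message at all
    }

    public static void main(String[] args) {
        System.out.println(classify("LOGGER.info(\"Server started\");"));
        System.out.println(classify("LOGGER.debug(\"This {} went wrong\", item);"));
        System.out.println(classify("LOGGER.debug(\"This \" + item + \" went wrong\");"));
        System.out.println(classify("LOGGER.warn(\"Item {} failed: \" + reason, item);"));
    }
}
```

Counting the results per repository, rather than across every statement at once, would give the per-repo view instead of the overall one.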

The fact so many messages contain no variables at all is interesting, as it means the application has no way of telling you what state it was in, only that it executed that particular log line. However good logs vs bad logs is a discussion for another time.

So we now know that no parameters seem to be vastly more common than any other type of message, but what’s the spread of logging styles? Do people who use static logging always log that way, or do they mix it up? That brings us to our second graph, the average breakdown of logging style on a per-repo basis, and the results are…

Logging type by repo

…what I would say is a better look at the breakdown of statements. Parameterized takes the lead, but only by the smallest slice of pie, coming in at 42% compared to the 36% of statements where both are used. This shows us that whilst the most commonly used format is parameterized, a similar number of devs either can’t decide, or just don’t care, and opt for whichever format suits them best. This raises the question: what is the actual difference?

Concatenation or Parameters, the who, what and where.

So, from an end user’s point of view, when they open the log file, regardless of which logging format was used, they’re going to see the same thing: a (hopefully) nicely formatted log, full of data they’re interested in. So where’s the difference? The difference lies in the code.

  • String concatenation is combining strings, i.e. failure + " just blew up"
  • Parameterized uses a formatting anchor, i.e. "{} just blew up", failure

From a visual standpoint, they’re really not that different, but what if we look a level deeper?

The Deep Dark

So we know that Parameters are the most popular logging method, and we know that from a code perspective, they both look reasonably similar. So what is the actual difference between them? Well, it mainly comes down to how the JVM treats each statement.

For the case of concatenation, if we take a line such as:

LOGGER.debug("This " + item + " went wrong, with state " + state);

Regardless of the log level, the variables in this message will be converted to Strings and joined together before debug is even called. If the log level is actually currently INFO, we’ve just done all that work and then we’re not going to use the result.

This can admittedly be avoided, but it makes your code even more verbose:

if(LOGGER.isDebugEnabled()) LOGGER.debug("This " + item + " went wrong, with state " + state);

Even using this as an inlined if statement, that’s plenty of visual clutter.

Looking instead at parameterization, that same log message is going to look something like this:

LOGGER.debug("This {} went wrong, with state {}", item, state);

If the Logger is set to INFO, these objects will never be converted into Strings.

There are arguments for and against the visual style of the two, but the biggest fact is a fairly simple one. Using parameterization your objects will only be converted to Strings when they’re needed, saving you time and memory.
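To see the difference for yourself, here is a minimal sketch that counts toString calls for each style while debug is switched off. It assumes nothing beyond the JDK: TinyLogger is a hypothetical stand-in that mimics the {} anchor style and is not a real logging library, and Item and LazyLoggingDemo are likewise made up for the demonstration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a logger with {} anchors; not a real library class.
final class TinyLogger {
    private final boolean debugEnabled;
    TinyLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    // Concatenation style: the caller has already built the whole message.
    void debug(String message) {
        if (debugEnabled) System.out.println(message);
    }

    // Parameterized style: arguments only become Strings when debug is enabled.
    void debug(String format, Object... args) {
        if (!debugEnabled) return;
        StringBuilder out = new StringBuilder();
        int from = 0;
        for (Object arg : args) {
            int anchor = format.indexOf("{}", from);
            if (anchor < 0) break;
            out.append(format, from, anchor).append(arg); // toString happens here
            from = anchor + 2;
        }
        out.append(format.substring(from));
        System.out.println(out);
    }
}

// An object that counts how often it is converted to a String.
final class Item {
    static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();
    @Override public String toString() {
        TO_STRING_CALLS.incrementAndGet();
        return "item-42";
    }
}

final class LazyLoggingDemo {
    // toString calls made by a concatenated debug message while debug is off.
    static int concatenationCost() {
        TinyLogger logger = new TinyLogger(false);
        Item item = new Item();
        Item.TO_STRING_CALLS.set(0);
        logger.debug("This " + item + " went wrong");
        return Item.TO_STRING_CALLS.get();
    }

    // toString calls made by a parameterized debug message while debug is off.
    static int parameterizedCost() {
        TinyLogger logger = new TinyLogger(false);
        Item item = new Item();
        Item.TO_STRING_CALLS.set(0);
        logger.debug("This {} went wrong", item);
        return Item.TO_STRING_CALLS.get();
    }

    public static void main(String[] args) {
        System.out.println("concatenation toString calls: " + concatenationCost());
        System.out.println("parameterized toString calls: " + parameterizedCost());
    }
}
```

The concatenated call pays for the conversion before the logger has any chance to check the level, while the parameterized call hands the object over untouched and returns early.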


For us, the most surprising thing to come out of our research was discovering the sheer number of static log messages that we saw in the first graph. The second showed us that whilst parameterization has a lead, it’s not much of one. This probably reflects the fact that in the long run, there really isn’t that much difference between the two methods. However, once you enter the realms of large-scale logging, it’s clear that parameterization can be simpler and more performant. Skipping unnecessary calls to toString, and so not spending time and resources for nought, seems small now, but scale that up to a system that is potentially making that call hundreds, if not thousands, of times per minute, and you see why people prefer parameterization. The major drawback of concatenation can be avoided, but it will cost you visual clarity, and so developer time.

Hopefully, this has helped to shine some light on the matter, and persuaded you to use parameterization over concatenation. Your applications will thank me.



Native JSON Support with JSON in Logscape 3.2

Logscape 3.2 introduced native JSON support, meaning that when working with JSON data there’s no need for datatypes. Instead, Logscape automatically pulls the keys from your structure.

This removes the sometimes daunting configuration step, and instead lets you get straight down to business with visualising your data. With that in mind, today we’re going to be embracing our inner geek, and getting to work visualising some JSON from the game EvE Online™.


Continue reading

Logscape 3.2 Touches Down

Logscape version 3.2 is now available for public download. You can get it now from the Logscape Website.

A brief rundown of what Logscape 3.2 brings with it, and what we’re going to cover today…

  • File Explorer
  • JSON Support (Including JSON Arrays)
  • Failover Overhaul
  • Performance and Stability Changes



Continue reading

Converting Splunk Searches into Logscape

Converting Splunk searches into Logscape

Logscape and Splunk share a lot of overlap, and there is one question we get asked quite often by people looking to migrate from Splunk to Logscape.

 How do we convert Splunk searches and Workspaces into Logscape?

Unfortunately, there’s no magic cure or “just click here” style solution. Fortunately, it is significantly easier than you think.

We’re going to cover converting Splunk searches into their Logscape equivalent.

Continue reading

Advanced data analytics and use-cases in Logscape

Logscape Analytics are incredibly powerful. However, are you using them to their full potential? In this blog post we’re going to go over some of the less used analytics, show you how to use them, and hopefully inspire you to use your Logscape instance in new and exciting ways. So, without further ado, let’s get into some searches. Continue reading

Logscape Tutorials – Logscape in 10 minutes

Recently we’ve been working on creating new learning materials for the release of Logscape 3.0: materials appropriate for both the Logscape expert and an individual just picking Logscape up for the first time. The first person to be addressed by this was of course the beginner, so here’s a 10 minute introduction to the basics of Logscape 3.0.



Hopefully this helps some of our newer users, and keep an eye out for more advanced tutorials!

CSV Discovery in Logscape

New in Logscape 3.0

Logscape 3.0 introduces a new feature that makes working with CSV data easier and faster. Logscape will now automatically generate a datatype from imported CSV data, so you’ll be free to immediately build a workspace around your data rather than having to worry about setting up your datatype. Continue reading

Using Logscape with HPC, 2 of 3

Today Ben Newton returns for the second in a series of three blog articles covering his progression through building a monitoring solution for Microsoft HPC through Logscape. Today’s article covers data collection, both where the data was sourced and how he chose to format it. You can find more of Ben’s work on his Github page, or his LinkedIn.

Data Collection: Find it, mine it, record it

“Data! Data! Data!” he cried impatiently. “I can’t make bricks without clay.”

    -Sherlock Holmes, The Adventure of the Copper Beeches

Continue reading

10 Ways to Improve Your Output File

So you have written an app or log – it’s brilliant, it grabs all the data you need and runs like greased lightning. All you need to do now is ensure your output file has a nice clean format – preferably one that means Logscape does all the work for you! So here are some of my top tips.

1) Add a full time stamp to every line. You wouldn’t believe how much trouble can be caused by people using just times or dates. At best, you have to struggle to get your data properly organised. At worst, you end up with a mess and data appears in the wrong place on the graph. Do it right, set the date and time!

2) Add a time zone to that stamp. My computer will never move time-zone, surely it’ll be fine? Don’t count on it. British Summer Time changing the system time on half your servers, servers being reset to US time, data centres moving locations… All these things can and will happen. Adding the time zone to the stamp gives you a cast iron assurance that the data will always be correct. That peace of mind is worth a few bytes.
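As a small illustration of tips 1 and 2 together, here is one way an app might stamp each output line with a full date, time and zone offset. This is only a sketch using the standard java.time classes: the class name StampedLine and the exact pattern are assumptions chosen for the example, not a layout Logscape requires.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper that prefixes a message with date, time and UTC offset.
final class StampedLine {
    // Full date, time to the millisecond, and a numeric offset such as +01:00.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS xxx", Locale.ROOT);

    static String format(ZonedDateTime when, String message) {
        return STAMP.format(when) + " " + message;
    }

    public static void main(String[] args) {
        // Uses whatever zone the machine is set to, and records it in the line.
        System.out.println(format(ZonedDateTime.now(ZoneId.systemDefault()), "app started"));
    }
}
```

Because the offset travels with every line, a server that flips to British Summer Time or gets reset to US time still produces stamps that can be placed correctly on the graph.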

Continue reading