Visualizing UK accident data with Logscape

In my ever-onward quest to show the world how easy it is to get up and running with Logscape, today I’m going to use a Logscape Docker container to build visualisations from some publicly available CSV files in no time at all. If you’ve never used the Logscape Docker image, check out my previous blog post.

Today we’re going to be analysing data made available via the gov.uk website, which offers statistics for crashes in the UK for the year of 2015. The specific dataset is available for download here.


Logscape 3.2 Touches Down

Logscape version 3.2 is now available for public download; you can get it now from the Logscape website.

A brief rundown of what Logscape 3.2 brings with it, and what we’re going to cover today:

  • File Explorer
  • JSON Support (Including JSON Arrays)
  • Failover Overhaul
  • Performance and Stability Changes

 


 


Advanced data analytics and use-cases in Logscape

Introduction
Logscape’s analytics are incredibly powerful, but are you using them to their full potential? In this blog post we’re going to go over some of the less used analytics, show you how to use them, and hopefully inspire you to use your Logscape instance in new and exciting ways. So, without further ado, let’s get into some searches.

Realtime WebSocket streaming from the cloud to you: Part II

[Image: webSocket-AWS-Running]

I’ve got the ‘green-light’ and an IP allocated.

In Part 1 I built a Groovy WebSocket Server and a Java and HTML Client. In Part 2 I’ll deploy it into AWS, fire up the Clients and add the GitHub link. With WebSocket Clients, I can run Logscape in the ‘wild’ and make use of the Alert-Feed WebSocket functionality to stream data to my local servers.

AWS Deployment: Before running on the AWS server I need to find the right AMI – one with Java installed. The OpenJDK is installed on most Linux flavours, and I prefer to work with Ubuntu. In the following grab you can see where I’ve fired up the AMI instance.


Realtime WebSocket streaming from the cloud to you: Part I

This is a two-part post where in Part 1 I build the ‘spike’ using Groovy to run a WebSocketServer that streams data to HTML5 WebSocket and Java WebSocket clients. The HTML client uses the elegant Smoothie Charts (great for streaming). In Part 2 I’ll show you how to run it on Amazon’s AWS.
At the end we have a real-time feed plotting the data from the cloud; it looks something like the grab on the right.

Intelligent Log Analysis – Field Discovery

Field discovery..

.. is cool because it does most of the hard work for you. It finds system metrics, emails, IP addresses and all sorts of things that you never really realised were filling up your logs. Log analysis has never been so powerful :). It’s nice that you can add data, click on Search and see stuff. Log analysis tools keep getting smarter and smarter.

Logscape 2.1 builds on the already popular auto-field discovery by letting users add their own ‘auto-patterns’. The system is called GrokIt. I’m going to discuss the two approaches and how they work within Logscape.

Implementations:

  • Auto-Field discovery (Key-Value pairs)
  • GrokIt Pattern based discovery (Well known patterns)

Automatic Log Analysis of Key-Value pairs

With 2.0 we launched Key-Value pattern extraction. The idea is simple: whenever a recognised Key-Value pattern is found, we index the pair and make it a searchable term.

For example:    CPU:99 hostname:travisio 

OR      { "user":"john barness","ip":"128.10.8.150","action":"login" }
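To make the idea concrete, here is a minimal sketch of key-value discovery in Python. It is not Logscape’s implementation: the `KEY_VALUE` pattern and the `extract_pairs` helper are assumptions made up for illustration, and the real rules recognise more shapes than these two.

```python
import json
import re

# Hypothetical rule: a word, a colon, then a run of non-space characters.
KEY_VALUE = re.compile(r"(\w+):(\S+)")


def extract_pairs(event):
    """Return the key-value pairs found in one log event as a dict."""
    text = event.strip()
    if text.startswith("{"):
        try:
            parsed = json.loads(text)
            if isinstance(parsed, dict):
                # Treat each top-level JSON member as a searchable field.
                return {key: str(value) for key, value in parsed.items()}
        except ValueError:
            pass  # Not valid JSON, fall back to the plain key:value rule.
    return dict(KEY_VALUE.findall(text))


print(extract_pairs("CPU:99 hostname:travisio"))
print(extract_pairs('{ "user":"john barness","ip":"128.10.8.150","action":"login" }'))
```

Each returned key would become a searchable term for that event.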

Pattern based extraction (GrokIt):

With this release we have included the ability to extract known patterns such as email addresses, hostnames, log levels, paths, etc. So every time john@jj-pennies.com is seen, the data is extracted and indexed against the key (_email). The standard config file is logscape/downloads/grokit.properties

#field-name, substring match(leave blank if unavailable), and regular expression matchers that extract a single group for the value
_email::.*?([_A-Za-z0-9-\.]+@[A-Za-z0-9-]+\.[A-Za-z]{2,}).*?
_ipAddress::.*?([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*?
_exception::.*?([_A-Za-z0-9-\.]+Exception).*?
_url::.*?([A-Za-z]{4,4}://[A-Za-z.0-9]+[:0-9]{0,6}[A-Za-z/]+).*?
_level::.*?(INFO|ERROR|WARN|DEBUG|FATAL|TRACE|SEVERE|DEBUG).*?
_hour::.*?[,.\s-]([0-9]{2,2}):[0-9]{2,2}:[0-9]{2,2}[,.\s-].*?
_minute::.*?[,.\s-][0-9]{2,2}:([0-9]{2,2}):[0-9]{2,2}[,.\s-].*?
_gpath::.*?(\/[A-Za-z0-9]+\/[\/A-Za-z0-9]+).*?

Each of these patterns was chosen as the most practical for either a) seeing useful information or b) slicing your data by time (hour of day).

Each entry contains the FieldName (lhs) : Expression (rhs).
The regular expression must return a single group that contains the value (the parenthesised part of each expression above). At the bottom we reference some of the awesome regular expression tools we used for these.
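If you want to try an entry before uploading it, a rough test harness like the following can help. This is a sketch only: it uses Python’s `re` module rather than the regular expression engine Logscape runs on, and it assumes that the blank middle part of each entry is an optional substring pre-check and that each expression is matched from the start of the line. `parse_rules` and `discover_fields` are names invented here.

```python
import re

# Entries copied from the grokit.properties listing above.
GROKIT_LINES = r"""
#field-name, substring match(leave blank if unavailable), and regular expression matchers
_email::.*?([_A-Za-z0-9-\.]+@[A-Za-z0-9-]+\.[A-Za-z]{2,}).*?
_ipAddress::.*?([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*?
_level::.*?(INFO|ERROR|WARN|DEBUG|FATAL|TRACE|SEVERE|DEBUG).*?
"""


def parse_rules(text):
    """Split each 'name:substring:expression' entry into its three parts."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Only split on the first two colons; the expression may contain more.
        name, substring, expression = line.split(":", 2)
        rules.append((name, substring, re.compile(expression)))
    return rules


def discover_fields(event, rules):
    """Apply every rule to one event and collect the captured group."""
    fields = {}
    for name, substring, pattern in rules:
        if substring and substring not in event:
            continue  # Assumed meaning: cheap pre-check before the regex runs.
        match = pattern.match(event)
        if match:
            fields[name] = match.group(1)
    return fields


rules = parse_rules(GROKIT_LINES)
event = "05:16:57 ERROR login failed for john@jj-pennies.com from 128.10.8.150"
print(discover_fields(event, rules))
```

Because the two regex engines differ in places, treat a pass here as a first check and still test the pattern in Logscape itself.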

How do I configure it?

To make changes you can add or remove entries. Open your favourite text editor (vim?), make the changes and save the file (make sure you test it). Once saved, upload the file via the deployments page, from where it is replicated to all agents on the network.

Any new files being monitored will pick up the configuration change (note: it won’t happen mid-point through a file). To have the change applied retrospectively you will need to re-index the DataSource.

When is it applied?

As with anything, we have tried to make both discovery systems as fast as possible. Key-Value extraction runs at 17-20MB/s per pattern; unfortunately, the 8 supported rules cumulatively slow things down. GrokIt, or regular expression parsing, manages about 14MB/s per compiled pattern. Again, this is too slow once you account for all 8 patterns listed above.
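To see why the cumulative cost matters, here is a back-of-the-envelope calculation. It assumes the patterns run one after another over the same data, which is my simplification and not a measured Logscape figure; `sequential_throughput` is a name made up for this sketch.

```python
def sequential_throughput(per_pattern_mb_s, pattern_count):
    """Effective MB/s when every pattern scans the same data in turn."""
    # Each extra pattern adds another full pass, so the rate divides evenly.
    return per_pattern_mb_s / pattern_count


grokit = sequential_throughput(14, 8)
key_value_low = sequential_throughput(17, 8)
key_value_high = sequential_throughput(20, 8)

print(f"GrokIt, 8 patterns: {grokit} MB/s")
print(f"Key-Value, 8 rules: {key_value_low} to {key_value_high} MB/s")
```

Under that assumption the per-pattern rates drop to roughly 2MB/s overall, which is why doing the work once at index time is attractive.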

IndexTime: The easiest way to remove the performance penalty is to do the work once, and not when the user is waiting. In our case, when either of the discovery systems is enabled, a Field Database is used to store the data in its most efficient form (dictionary-oriented maps). This decouples the processing and provides reasonable search performance on attributes that are unlikely to change.

SearchTime: At search time the executor will pull in any discovered fields and make them available for that event. This provides decent performance and better system scalability.
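The split between index time and search time can be pictured with a toy store like the one below. This is purely illustrative: the `FieldDatabase` class, its methods and the (source, line number) key are assumptions of mine, and the real Field Database will look different. The point is only that each distinct value is stored once in a dictionary and events refer to it by a small id.

```python
class FieldDatabase:
    """Toy dictionary-encoded store for discovered fields (hypothetical)."""

    def __init__(self):
        self._ids = {}      # value -> small integer id
        self._values = []   # id -> value
        self._events = {}   # (source, line_no) -> {field name: value id}

    def _intern(self, value):
        # Store each distinct value once and hand back its id.
        if value not in self._ids:
            self._ids[value] = len(self._values)
            self._values.append(value)
        return self._ids[value]

    def index(self, source, line_no, fields):
        """Index time: done once per event, away from the user."""
        self._events[(source, line_no)] = {
            name: self._intern(value) for name, value in fields.items()
        }

    def lookup(self, source, line_no):
        """Search time: pull the stored fields back for one event."""
        encoded = self._events.get((source, line_no), {})
        return {name: self._values[i] for name, i in encoded.items()}


db = FieldDatabase()
db.index("app.log", 1, {"_level": "ERROR", "_email": "john@jj-pennies.com"})
db.index("app.log", 2, {"_level": "ERROR"})
print(db.lookup("app.log", 2))
```

Repeated values such as a log level cost one dictionary entry however many events carry them.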

Configurable by the DataSource

To allow better performance, we have exposed FieldDiscovery flags on the DataSource/Advanced tab. Standard Logscape sources have discovery disabled.

[Image: data-sources-discovery]

Some great regular expression tools:

https://www.debuggex.com

http://regex101.com/

Regards Neil

 

 

 

 

Logscape 2.0.4 is out!

Logscape 2.0.4 is now available for download. It features GeoIp field extraction and D3 Datamaps for visualisation. The release also updates D3 to version 3.

The release also improves Journaling reliability in some of the underlying technology (MapDB and Persistit).

Into next year we will continue to drive the product to extract more performance as well as improve the user experience by adding high-level workspace filters and linkage. In other work we are also looking to provide a light-https-agent which can be used in a ‘disconnected’ environment such as cloud or through remote network links.

Web Log Analysis with GeoIp and Logscape 2.0.4

Understanding customer profiles is key for any web business. It allows you to gain geographic insights into visitor behaviour.

Important questions can be answered:

“How many people visited our site, from which city, which country…. and what did they do?”

Logscape 2.0.4 adds support for GeoIp in 2 ways.

  1. We now include MaxMind’s GeoLite City database. It is exposed as a Groovy-bound variable that allows you to interact with their Java API and bind it to your data. See the bottom of the page for more details.
  2. GeoIP is boring, until you show it on a map! For this purpose (and being D3.js fans) we have integrated the awesome DataMaps library: http://datamaps.github.io/

Step 1: Extracting the GeoIp Country Code

First you need to extract a public IP address as a field. Then use that within the GeoIP field (Groovy script based). In the following example I’m taking a weblog ClientIP and extracting the country:

Import the Web Access Log by adding a DataSource:

94.143.249.82 - - [29/Nov/2013:05:16:57 -0800] "GET /videos/AndyCoatesInterview.webm HTTP/1.1" 206 19180929 "http://www.logscape.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36"

Mapping the ClientIP using the DataTypes page with a ‘weblog’ DataType:
[Image: weblog-datatype]

Extract the ClientIP into a country field (note the use of the geoipLookup variable):
[Image: country-field]

View the results on the DataType page:
[Image: country-table-dt]
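For readers who want to follow the logic outside Logscape, the two stages of this step can be sketched in Python. This is not the Groovy field script: the `ACCESS_LOG` pattern, the `country_for` helper and the lookup table are all assumptions, and the country code "ZZ" is a placeholder, not a real geolocation result for this address.

```python
import re

# Hypothetical pattern for the leading part of the access log line above.
ACCESS_LOG = re.compile(
    r'^(?P<client_ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
)

LINE = (
    '94.143.249.82 - - [29/Nov/2013:05:16:57 -0800] '
    '"GET /videos/AndyCoatesInterview.webm HTTP/1.1" 206 19180929 '
    '"http://www.logscape.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64)"'
)


def country_for(line, lookup):
    """Extract the client IP, then hand it to a lookup function."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None
    return lookup(match.group("client_ip"))


# Stand-in for a real GeoIP database; "ZZ" is a made-up placeholder code.
fake_database = {"94.143.249.82": "ZZ"}

parsed = ACCESS_LOG.match(LINE)
print(parsed.group("client_ip"), parsed.group("status"), parsed.group("bytes"))
print(country_for(LINE, fake_database.get))
```

In Logscape the lookup part is what the geoipLookup variable provides; here it is simply whatever function you pass in.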

Step 2: Using the Country Code in a Search

So let’s create a search and view the results in a table:

[Image: country-table-search]

Now let’s take the country field and use the chart(map) renderer:
[Image: country-map-search]

Nice! Mouse wheel zoom and drag.

Step 3: Displaying a City-hit break down using a Bubble overlay

But we aren’t finished. I’d like to show the cities as bubbles on top of the heat-mapped world view. To do this we need a City field as follows. It returns JSON, which the browser will render onto the Geo-Projection:

[Image: city-field]

The JSON field results:
[Image: country-table-dt]

The search now adds city.count() to display a count-based breakdown:
[Image: city-map]

Note the chart legend shows the distribution using a quantised heat-map range over 6 colours.
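A quantised range simply means each count falls into one of a fixed number of equal-width buckets, and each bucket gets a colour. The sketch below shows one way to do that for 6 buckets; it is an illustration under my own assumptions, the `quantise` function is invented here, and the hit counts are made-up sample numbers.

```python
def quantise(count, lowest, highest, buckets=6):
    """Return the bucket index (0 to buckets - 1) that a count falls into."""
    if highest == lowest:
        return 0  # Every value is the same, so use the first colour.
    width = (highest - lowest) / buckets
    # The maximum value would land one past the end, so clamp it.
    return min(int((count - lowest) / width), buckets - 1)


# Made-up hit counts per country code, for illustration only.
hits = {"AA": 0, "BB": 9, "CC": 10, "DD": 35, "EE": 60}
low, high = min(hits.values()), max(hits.values())
for code, count in hits.items():
    print(code, count, "-> colour", quantise(count, low, high))
```

The legend then only needs the 6 bucket boundaries, whatever the spread of the underlying counts.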

Nice!

Best Regards, Neil. 

The MaxMind GeoLite notice:

This product includes GeoLite data created by MaxMind, available from http://www.maxmind.com

It’s a great tool, btw! You can configure Logscape to use your own MaxMind GeoIP database by setting the system property -Dgeoip.db=/opt/file/GeoIp.dat

Logscape 2.0 – What’s New?

Since January we have been diving into the depths of HTML5, Scala, WebSockets and other amazing technologies that have recently emerged. Logscape 2.0 is our take on leveraging these tools to provide the most fluid, intuitive tool designed for interactive log analysis. Sure, Logscape is more than log analysis, but at the end of the day the audit trail that tells the truth about what happened comes in the form of data located on disk or in memory: usually a log file.

So, what’s new?

On our site you will see the sexy new HTML5 interface. It continues to amaze me how well this technology runs on mobile devices. In many cases we see an iPad 2 outperform a Windows i3 desktop when it comes to SVG rendering. The HTML interaction is very smooth and fluid. Testing Logscape 2.0 on the mobile platforms came with a few minor challenges, but to see it on a tablet, interact and work with data is a great feeling. The mobile web is truly becoming the new powerful interface.

Logscape 2.0 is Free – as in Beer

Log analysis tools are changing, their value propositions are changing, and so are we. You can now download and use the Logscape Manager and any number of Forwarders for free. This allows you to get started with minimum hassle, and then scale at fixed costs. More on this in a later post.

The 2.0 design semantic

We wanted to ‘bin’ the old Flex front end and create a new look. There were many lessons learned on the road and it’s not often you get a fresh start. So with that in mind, we needed to make the search page more interactive and easier to navigate. We have also adopted industry-standard visualization libraries like d3.js, while adding the ability to easily plug in new visualizations. Each form or selection allows you to quickly refine results by typing a couple of letters. Everything is click-to-edit, like an interactive document. We also wanted dashboards to be different.

[Image: d3-wheel]

Dashboards to Workspaces

We have also thrown out the ‘dashboards’ concept. They are replaced with ‘Workspaces’, the idea being that they provide a richer experience of mashed-up search visualizations. They also allow you to embed HTML content directly within the page. OK, so nothing new there, but most of the time when I look at a dashboard, I’m thinking, it’s just a single page with pretty charts. That’s great, but I need to know about these other things (what else is happening). I need other views/facets, to be able to drill into a search or a different view. You can get stuck pretty quickly. Our solution is to allow each Workspace to link to any other Workspace or Search page. Put this in the context of a page with integrated help and hyperlinking navigation, and you have the ability to provide users with ‘decision trees’ or analysis workflows. All of this flexibility brings you a bespoke semantic visualization network that drives your users down the correct paths when finding and fixing issues.

[Image: blog-Home]
Workspace – Home (link on RHS highlighted)

Links – System Runtime:
<a href="Workspace=Home – System Runtime">- System Runtime</a>

[Image: blog-Home-RT]
Workspace – System Runtime

Dynamic Field Discovery

Unstructured data can contain multiple elements of structure. It’s increasingly common to dump JSON or XML into log files, or to print Key:Value patterns such as ‘user:joe.blogs’. This data is interesting; it tells you something about system behavior. Logscape 2.0 learns about your data, so when you hit search it will dynamically pick out these fields (e.g. ‘user’) and make them searchable. From there you can quickly refine your focus to particular users or incoming IP addresses, and spot unexpected behavior without having to think about what might be contained within. Logscape will provide you with a summarized breakdown of what fields are available, in a clickable popup. Check out work/audit.log and work/vsaudit.log for how we use it ourselves. The following example shows a popup displaying summary values for ‘COMMITTED’; the values have been magically extracted from the highlighted line.

CPU:9 MemFree:183 MemUsePC:18300.00 DiskFree:109157 DiskUsePC:0.00 SwapFree:6182

[Image: blog-keyValue]

Plotting the CPU field and changing to a line chart gives the following:

[Image: blog-KV-CPU]

DataSource Wildcards

Making data searchable means it needs to be imported by adding directories, filemasks etc. We frequently find that many deployments have variations on a theme: for example, some apps might be installed on different drives, or slightly different paths, or a myriad of nested directories in a particular location. Logscape 2.0 introduces wildcards which follow standard conventions. For example, ‘*’ represents a directory name, and /*Server/ represents any directory ending with ‘Server’. For multiple directory recursion ‘**’ can be used. For example:

DataSource Path: /JBossServer/Cluster-*/JVM-*
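If you want to check which directories a wildcard path would pick up, the conventions described above can be approximated in a few lines of Python. This is a sketch of the usual meaning of ‘*’ and ‘**’, not Logscape’s own matcher, and `wildcard_to_regex` is a helper invented for the example.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard path into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # Spans any number of nested directories.
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # Stays inside a single directory name.
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


matcher = wildcard_to_regex("/JBossServer/Cluster-*/JVM-*")
for path in ("/JBossServer/Cluster-A/JVM-1",
             "/JBossServer/Cluster-A/extra/JVM-1"):
    print(path, bool(matcher.fullmatch(path)))
```

The second path is rejected because a single ‘*’ does not cross a directory boundary; a pattern using ‘**’ would accept it.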

Zoning

As we grow, so do our customers. We find more and more that it makes sense to structure deployments according to our customers’ data center design. Zoning is the ability to shape your Logscape deployment across regions, timezones, datasets, subnets, much like a network map, thereby “zoning” or clustering related sets of Agents. It is the key ingredient in scaling. We supported this in previous versions, but in Logscape 2.0 it is more intuitive. Dot-notation is now part of the agent-role and includes a hierarchical fall-up. In practice this means that you install a Manager as lab.Manager, any IndexStores as lab.uk.IndexStore, and UK-related Forwarders into the UK zone as lab.uk.Forwarder. The image below shows how Zoning allows Logscape to scale to support multiple geographic regions.
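One way to picture the hierarchical fall-up is as a walk from the most specific zone towards the root. The sketch below is my reading of the dot-notation and not Logscape’s resolution code; `zone_chain` and `nearest` are hypothetical helpers, and the agent names are the ones from the paragraph above.

```python
def zone_chain(agent_role):
    """List the zones of an agent from most specific to least specific."""
    zone_parts = agent_role.split(".")[:-1]  # Drop the role type at the end.
    return [".".join(zone_parts[:n]) for n in range(len(zone_parts), 0, -1)]


def nearest(role_type, from_agent, agents):
    """Find the closest agent of a given type, falling up zone by zone."""
    for zone in zone_chain(from_agent):
        candidate = f"{zone}.{role_type}"
        if candidate in agents:
            return candidate
    return None


agents = {"lab.Manager", "lab.uk.IndexStore", "lab.uk.Forwarder"}
print(zone_chain("lab.uk.Forwarder"))
print(nearest("IndexStore", "lab.uk.Forwarder", agents))
print(nearest("Manager", "lab.uk.Forwarder", agents))
```

Here the Forwarder finds its IndexStore in its own zone and falls up one level to reach the Manager.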

DataGroups

With multi-tenancy it is important to limit the views of Users. In Logscape 2.0 this has evolved from assigning a user a set of DataSource Tags (e.g. include myserver-logs) to the introduction of DataGroups. A DataGroup (found on the user tab), as the name implies, is a set of DataSource tags which are included or excluded. They also inherit behaviour from other sets of DataGroups. In all, this capability allows for the complete control needed in the modern enterprise. In case of lockdown scenarios there is the ability to ‘disable’ a DataGroup, which will prevent the data from being visible.

[Image: blog-DataGroup]
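As a mental model, a DataGroup can be thought of as two sets of tags plus a list of groups it inherits from. The sketch below is an illustration only: the `DataGroup` class, the rule that an exclude beats an include, and the tag names other than myserver-logs are all assumptions, not the product’s documented behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class DataGroup:
    """Hypothetical model of a DataGroup: tags in, tags out, inheritance."""
    name: str
    include: set = field(default_factory=set)
    exclude: set = field(default_factory=set)
    parents: list = field(default_factory=list)
    enabled: bool = True

    def _collect(self):
        # A disabled group contributes nothing, so its data is hidden.
        if not self.enabled:
            return set(), set()
        included, excluded = set(self.include), set(self.exclude)
        for parent in self.parents:
            parent_in, parent_out = parent._collect()
            included |= parent_in
            excluded |= parent_out
        return included, excluded

    def can_see(self, tag):
        included, excluded = self._collect()
        # Assumption: an exclusion always wins over an inclusion.
        return tag in included and tag not in excluded


ops = DataGroup("ops", include={"myserver-logs", "audit-logs"})
team = DataGroup("team", exclude={"audit-logs"}, parents=[ops])
print(team.can_see("myserver-logs"), team.can_see("audit-logs"))
```

Setting `team.enabled = False` models the lockdown case: nothing is visible through that group any more.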

Logscape Apps on GitHub

All Logscape Apps are moving to GitHub. Not everything is there yet (we’re still working on it), but we’ll finish soon. Visit apps.logscape.com

Cheers Neil.