Advanced data analytics and use-cases in Logscape

Introduction
Logscape analytics are incredibly powerful, but are you using them to their full potential? In this blog post we're going to go over some of the less-used analytics, show you how to use them, and hopefully inspire you to use your Logscape instance in new and exciting ways. So, without further ado, let's get into some searches.

10 Ways to Improve Your Output File

So you have written an app or log producer – it's brilliant, it grabs all the data you need and runs like greased lightning. All you need to do now is ensure your output file has a nice clean format – preferably one that means Logscape does all the work for you! So here are some of my top tips.

1) Add a full time stamp to every line. You wouldn't believe how much trouble can be caused by people using just times or just dates. At best, you have to struggle to get your data properly organised. At worst, you end up with a mess and data appears in the wrong place on the graph. Do it right: set the date and the time!

2) Add a time zone to that stamp. My computer will never change time zone, surely it'll be fine? Don't count on it. British Summer Time changing the system time on half your servers, servers being reset to US time, data centres moving locations… all of these things can and will happen. Adding the time zone to the stamp gives you a cast-iron assurance that the data will always be correct. That peace of mind is worth a few bytes.
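
If your app logs via log4j 1.x, both tips can be handled in one line of appender config (the appender name FILE here is just an example). The %d pattern follows SimpleDateFormat, where Z prints an RFC-822 time zone such as +0100:

log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS Z} %-5p %c - %m%n

A line produced by that pattern looks like: 2014-04-10 09:15:02,123 +0100 INFO com.example.Feed - order accepted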


Logscape 2.5 is now live!

All,

Release 2.5 includes improvements to LDAP/Active Directory management, selective UI enhancements and performance improvements. IE11 compatibility has also been improved, along with minor bug fixes.

The release notes are here.


Using Network Traffic Analysis on Cisco ASA routers to detect SpamBot Activity

Today businesses face an array of external and internal threats to their corporate network. Protecting business operations from security threats requires vigilance, experience and excellent tools.

Recently one of our customers found that alert emails from a trade capture system were being blocked by external mail servers. Further investigation showed that the company mail server had been blacklisted and all traffic from it was being marked as spam.

Fortunately the sysadmin had Logscape installed, actively monitoring all Cisco router traffic.

Outbound Connection Analysis

The Composite Blocking List (CBL) keeps a list of blacklisted mail servers with suspicious traffic. Many mail servers use this list to reject suspicious mail traffic. The first step for the sysadmin was to analyse all outbound traffic coming from within the company.

The first search he built showed all outbound mail connections recorded by the Cisco router, grouped by the outbound mail server's IP address. In a search like this you would expect to see connections to a short list of company-approved mail servers. This wasn't the case.

 

* | _type.equals(cisco-asa) dstAddress.count() dstPort.equals(25)

[Screenshot: outbound port-25 connections grouped by destination IP address]

 

On the 10th of April there is a distinctive spike in mail traffic. The blue parts of the search results represent expected mail activity, but the multi-coloured spike indicates suspicious activity.
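
To pinpoint which internal machines were generating the traffic, the natural follow-up is to group the same port-25 traffic by source rather than destination. A sketch, assuming the router events also expose a srcAddress field alongside the dstAddress and dstPort fields used above:

* | _type.equals(cisco-asa) dstPort.equals(25) srcAddress.count()

Any internal host other than the approved mail servers appearing in that breakdown is a spambot candidate.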

Intelligent Log Analysis – Field Discovery

Field discovery..

.. is cool because it does most of the hard work for you. It finds system metrics, emails, IP addresses and all sorts of things that you never really realised were filling up your logs. Log analysis has never been so powerful. It's nice that you can add data, click on Search and see stuff. Log analysis tools keep getting smarter and smarter.

Logscape 2.1 builds on the already popular auto-field discovery by providing users with the ability to add their own 'auto-patterns'. The system is called GrokIt. I'm going to discuss the two approaches and how they work within Logscape.

Implementations:

  • Auto-Field discovery (Key-Value pairs)
  • GrokIt pattern-based discovery (well-known patterns)

Automatic Log Analysis of Key-Value pairs

With 2.0 we launched Key-Value pattern extraction. The idea is simple: whenever a recognised Key-Value pattern is found, we index the pair and make both key and value searchable terms.

For example:    CPU:99 hostname:travisio 

OR      { "user":"john barness", "ip":"128.10.8.150", "action":"login" }
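
To make the idea concrete, here is a minimal Java sketch of Key-Value extraction (an illustration only, not Logscape's actual extractor; it handles just simple unquoted key:value pairs):

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueSketch {
    // A key is a word; ':' separates it from a value running to the next whitespace
    private static final Pattern KV = Pattern.compile("(\\w+):(\\S+)");

    public static Map<String, String> extract(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = KV.matcher(line);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints {CPU=99, hostname=travisio}
        System.out.println(extract("CPU:99 hostname:travisio"));
    }
}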

Pattern based extraction (GrokIt):

With this release we have included the ability to extract well-known patterns such as email addresses, hostnames, log levels, paths, etc. So every time john@jj-pennies.com is seen, the value is extracted and indexed against the key (_email). The standard config file is logscape/downloads/grokit.properties:

#field-name, substring match(leave blank if unavailable), and regular expression matchers that extract a single group for the value
_email::.*?([_A-Za-z0-9-\.]+@[A-Za-z0-9-]+\.[A-Za-z]{2,}).*?
_ipAddress::.*?([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*?
_exception::.*?([_A-Za-z0-9-\.]+Exception).*?
_url::.*?([A-Za-z]{4,4}://[A-Za-z.0-9]+[:0-9]{0,6}[A-Za-z/]+).*?
_level::.*?(INFO|ERROR|WARN|DEBUG|FATAL|TRACE|SEVERE).*?
_hour::.*?[,.\s-]([0-9]{2,2}):[0-9]{2,2}:[0-9]{2,2}[,.\s-].*?
_minute::.*?[,.\s-][0-9]{2,2}:([0-9]{2,2}):[0-9]{2,2}[,.\s-].*?
_gpath::.*?(\/[A-Za-z0-9]+\/[\/A-Za-z0-9]+).*?

Each of these patterns was chosen as being practical for either a) surfacing useful information or b) slicing your data by time (hour of day).

Each entry contains the FieldName : substring match (left blank in the entries above) : Expression.
The regular expression must return a single group that contains the value (see the bracketed capture groups above). At the bottom we reference some of the awesome regular expression tools we used to build these.
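
Because a bad expression can be expensive (or match nothing), it is worth exercising a pattern in isolation before uploading the file. A quick Java check using the _email expression copied from the config above:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrokCheck {
    public static void main(String[] args) {
        // the _email entry from grokit.properties; group(1) carries the value
        Pattern email = Pattern.compile(".*?([_A-Za-z0-9-\\.]+@[A-Za-z0-9-]+\\.[A-Za-z]{2,}).*?");
        Matcher m = email.matcher("2014-02-03 10:01:22,001 INFO mail to john@jj-pennies.com queued");
        if (m.find()) {
            System.out.println("_email = " + m.group(1)); // _email = john@jj-pennies.com
        }
    }
}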

How do I configure it?

To make changes you can add or remove entries. Open your favourite text editor (vim?), make the changes and save the file (make sure you test it). Once saved, upload the file via the Deployments page and it will be replicated to all agents on the network.

Any new files being monitored will pick up the configuration change (note: it won't happen mid-way through a file). To apply the change retrospectively you will need to re-index the DataSource.

When is it applied?

As with anything, we have tried to make both discovery systems as fast as possible. Key-Value extraction can run at 17-20MB/s per pattern, but the 8 supported rules cumulatively slow things down. GrokIt – regular expression parsing – manages about 14MB/s per compiled pattern, and again there are 8 of them (see the config above), so applying every pattern on each search would be too slow.

IndexTime: The easiest way to remove the performance penalty is to do the work once, at index time, rather than while the user is waiting. When either of the discovery systems is enabled, a Field Database is used to store the discovered data in its most efficient form (dictionary-oriented maps). This decouples the processing from the search and provides good search performance on attributes that are unlikely to change.
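
"Dictionary oriented" simply means each distinct field value is stored once and events reference it by a small integer. A toy Java illustration of the idea (not the actual Field Database implementation):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldDictionary {
    private final Map<String, Integer> ids = new HashMap<>();   // value -> id
    private final List<String> values = new ArrayList<>();      // id -> value

    // Store a value once; every event holding it keeps only the small int id.
    public int intern(String value) {
        Integer id = ids.get(value);
        if (id == null) {
            id = values.size();
            ids.put(value, id);
            values.add(value);
        }
        return id;
    }

    public String lookup(int id) {
        return values.get(id);
    }
}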

SearchTime: At search time the executor pulls in any discovered fields and makes them available for that event. This provides decent performance and better system scalability.

Configurable by the DataSource

To give you control over this performance trade-off, we have exposed FieldDiscovery flags on the DataSource/Advanced tab. Standard Logscape sources ship with discovery disabled.

[Screenshot: FieldDiscovery flags on the DataSource/Advanced tab]

Some great regular expression tools:

https://www.debuggex.com

http://regex101.com/

Regards, Neil


Log Analysis Performance: Logscape Benchmarks 2.0.3

Log Analysis Performance:

..is akin to driving the family car on holiday, hitting a big hill and struggling to get to the top. When you finally reach the peak you are told you need a bigger car, more cars, a bigger engine (or that your licence doesn't cover this much luggage!), leaving you in the state of mind: 'I'm here now, I can't go back… but I need to handle more data ($$)'.

Jokes aside – we have been putting a lot of effort into performance improvements on the latest release. Log monitoring just got faster and more affordable.

Before digging much deeper, it makes sense to cover the pain common to anything that solves data-centric problems: the combined cost of storage and processing capability. Architectural thinking over the last couple of years accepts that the golden arrow is not scale-up (bigger boxes) but the ability to scale-out (more boxes). Everything has a cost, so the limiting factors determine a suitable architecture. For example, if my licences are cheap I can use more commodity (existing) hardware; however, those disks are likely to be slow. Scaling out solves this by throwing more machines at the problem (power and maintenance costs aside) – and mind you, SSD prevalence is impending! In our case, we recognise the trade-off between server performance, disk performance, core counts and relative savings, and a 2-3 server deployment is easier to live with than a 10 server deployment.

Q: What balance do we strike between data and processing density?

To serve as a sizing guide, follow us through the benchmarking analysis we performed on the latest Logscape release. We are focusing on an IndexStore whose only purpose in life is to receive remote data, make it searchable, and execute the searches.

What to benchmark?

The answer is another question – what is the server doing? It's participating in a distributed compute environment: multiple remote servers stream data to it, and that data is persisted, indexed and searched.

The 2 most important elements are:

  1. How much data can I index per day? (WRITE)
    i.e. sustainable Index Rate in MB/s ?
  2. How fast can data be processed at search time? (READ)
    i.e. how quickly can we serve user search requests?

Data Processing Hardware

  • HP PROLIANT DL380 G5
  • DUAL INTEL XEON QUAD CORE X5460 @ 3.16GHz
  • 16GB RAM
  • 4 x 146GB 10,000RPM HDDs [RAID-0]
  • Ubuntu 12.x

Yes, the server is a bit dated, but it serves well as an industry benchmark. We named it "battlestar".

IO Subsystem performance

Using dd we can determine read and write performance in MB/s.

Write MB/s:

logscape@battlestar:~$ dd bs=100M count=10 if=/dev/zero of=test conv=fdatasync 
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 3.93265 s, 267 MB/s

Read MB/s

logscape@battlestar:~$ dd bs=100M count=10 of=/dev/zero if=test conv=fdatasync 
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 0.701993 s, 1.5 GB/s

We can convert this to IOPS using a simple disk-oriented formula – from memory, each of these disks manages about 200 IOPS.
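
As a rough sanity check on that figure: IOPS ≈ 1 / (average seek time + average rotational latency). A 10,000RPM platter completes a revolution in 6ms, so average rotational latency is about 3ms; add a typical 4ms seek and you get 1 / 0.007s ≈ 140 IOPS per disk – the same ballpark as the 200 quoted above – with RAID-0 striping that across all four spindles.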

We used a RAID-0 configuration because we wanted to ensure that disk wasn't the limiting factor. After all, we have 8 x 3.16GHz cores to milk 😉

Logscape Agent JVM Configuration

Our technology stack is Java based, using an older JRE: jdk1.7.0_07 (to be updated).

Given that we have 6 months of historic data we want to maximize the use of Heap and Off-Heap storage.

  • JVM Heap: -Xms4G -Xmx4G -XX:MaxDirectMemorySize=10G
    Heap = 4GB, off-heap indexes = 10GB
  • Data tenuring period: sysprops: -Dlog.max.slow.days=999
    (this tells the agent to treat everything as new data and process it in normal-priority threads; without this setting, long historical searches are treated as background tasks)
  • Threads: sysprops: -Dlog.search.threads=8 – use 8 processing threads; otherwise Logscape will allocate cores minus 2. The combined launch line is shown below.
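
Putting those flags together, the agent's Java launch line ends up looking something like the following (the classpath and main class are placeholders, not the real Logscape launcher):

java -Xms4G -Xmx4G -XX:MaxDirectMemorySize=10G -Dlog.max.slow.days=999 -Dlog.search.threads=8 -cp <logscape-classpath> <agent-main-class>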

It's all about the data:

We are processing two sorts of data: SysLog [4.4GB] and application logs [2.2GB] (log4j). Each data type has a different profile in terms of standard fields and discovered fields. Note: field discovery is on.

The Numbers:

Benchmark – Indexing Performance

Importing the SysLog data took 4 minutes with 40% of CPU resources allocated to the task, leaving 60% of bandwidth clear for other tasks like incoming streams and search requests.

Indexing occurs at a rate of 18.3MB/s (4.4GB in 4 minutes).

Scale this out to minutes, hours and days: at 40% CPU, 18.3MB/s works out to 1.1GB/min, 66GB/h, 1584GB/day.

Benchmark – Search Performance

What's the point of indexing 1584GB/day if you can't serve it up to users? The real challenge here is to understand where the bottleneck is in the READ process. Search use cases are as varied as the British weather.

The worst case is a brute-force ad-hoc search by a power user who wants the world!

The Search (7 days):

* |  _host.count() _agent.equals(lab.uk.IndexStore)

This will pull back everything from this particular IndexStore (there are 12 servers in this environment).

The Stats:

Using the Logscape UI as an indicator, this search took 22s to return 8,953,585 results, giving just over 400K events per second (8,953,585 / 22s ≈ 407K). If you examine logscape/work/event.log, the search performance is also audited there; in this case it records 492K events per second. The difference is due to roughly 3 seconds of search-completion coordination.

Search performance is about 500K events per second.

Performing another search, against SysLog mail-log data, we see a rate of 292K events per second.

We are currently performing more analysis, upgrading JDKs, etc. We will come back and update this post with a few screen grabs showing the tools we used as part of this process.

Regards, Neil.

All Data is Not Created Equal

The cost of analyzing log files and operational data in many companies is starting to add up. Lifting and shifting data around is expensive anyway, and per-gigabyte vendor fees are making it even more expensive.

In the past 6 months, we’ve heard from a lot of companies who’ve placed hard limits on the amount of operational data they’re willing to collect and index in their centralized log management service. Many are actively enforcing an artificial “data ceiling” to make sure that only data that’s pre-defined as highly valuable gets indexed, and everything else gets ignored.

This makes a lot of sense as a cost-control mechanism. Not all data is created equal, and some data is always going to be more valuable than other data. But it goes without saying that – as long as you have a good way to analyze it – the more data you have, the deeper and more complete a picture you can get of what's going on. Cutting apparently lower-value data out of the picture may cut your costs, but it also cuts away at your valuable insights.

With the launch of Logscape 2.0, we're inviting companies everywhere to take a more holistic approach to log file analysis and operational analytics. We're helping our customers break the data ceiling with cost-effective and massively scalable analytics for high-value AND lower-value data, using localized and centralized log management. Oh, and we're also helping them index unlimited data volumes free, so they can get up and running quickly and scale over time to analyze ALL of their operational data, not just what they can afford to collect.

So from the team at Logscape, we hope you enjoy the new release – give it a spin and let us know what you think.

We’ll be sharing more stories with you about how our customers are getting deeper insights from implementing more holistic, cost-effective and massively scalable operational analytics very soon.