Logscape’s analytics are incredibly powerful, but are you using them to their full potential? In this blog post we’re going to go over some of the less-used analytics, show you how to use them, and hopefully inspire you to use your Logscape instance in new and exciting ways. So, without further ado, let’s get into some searches.
You’re seeing spikes in your CPU usage. Is this indicative of an underlying issue? Is CPU use ramping up over time? Trend can be used to show moving averages over a series of buckets.
Trend does exactly what its name says. Using .trend() in your search produces two results, by default named ‘_10’ and ‘_20’, which represent the moving average over 10 and 20 buckets respectively. This differs from the .avg() analytic, which calculates an average within each bucket.
* | _type.equals(Unx-CPU) Cpu.trend(,AverageAcross) Cpu.max()
* | _type.equals(win-cpu) ProcessorPct.trend(,AverageAcross) ProcessorPct.max()
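To make the difference between .trend() and .avg() concrete, here is a minimal Python sketch of the arithmetic, not Logscape’s implementation. It assumes a trailing window, and it assumes the first few buckets (which have fewer than 10 or 20 predecessors) simply average whatever history is available; how Logscape handles that warm-up period is an assumption. The function names and data are hypothetical.

```python
def per_bucket_avg(buckets):
    """Average of the samples inside each bucket, in the spirit of .avg()."""
    return [sum(samples) / len(samples) for samples in buckets]


def trailing_moving_average(values, window):
    """Average of the current bucket and up to window - 1 previous buckets.

    Assumption: early buckets average whatever history is available.
    """
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def max_jump(values):
    """Largest change between consecutive buckets."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


# Hypothetical CPU samples: a slow ramp with a 30-point spike every 5th bucket.
cpu_buckets = []
for i in range(30):
    level = 20 + 0.5 * i + (30 if i % 5 == 4 else 0)
    cpu_buckets.append([level - 1, level + 1])

bucket_avg = per_bucket_avg(cpu_buckets)
trend_10 = trailing_moving_average(bucket_avg, 10)  # roughly the '_10' series
trend_20 = trailing_moving_average(bucket_avg, 20)  # roughly the '_20' series

print(f"max jump per bucket: {max_jump(bucket_avg):.2f}")
print(f"max jump in _10:     {max_jump(trend_10):.2f}")
print(f"last _10 vs last _20: {trend_10[-1]:.2f} vs {trend_20[-1]:.2f}")
```

The spikes move the per-bucket average far more than either trend line, while the shorter ‘_10’ trend sitting above the longer ‘_20’ trend shows that usage has been rising over the recent buckets.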
You’re monitoring the error rate of your applications through Log4j, but you also want to see whether processor load or memory usage increases as a result of additional errors. Rather than using multiple graphs, you can use overlays.
Overlays are Logscape’s way of combining multiple searches in one graph. You can overlay multiple graphs of the same or different types, which makes it easy to extract knowledge and spot trends. In the example below, I’m monitoring Log4j errors as well as Unix load and CPU usage.
* | _type.equals(unx-cpu) CpuUtilPct.max(,MaxCpu) chart(line)
* | _type.equals(unx-load) 1m.max(,MaxLoad) chart(line)
* | _type.equals(log4j) level.count() not(INFO) not(WARN)
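Conceptually, an overlay lines several searches up on the same time axis. The sketch below is a hypothetical Python model of that alignment, not how Logscape renders charts; it assumes each search yields a {bucket: value} mapping and that a bucket missing from one search is simply left empty (None). The series names and values are made up.

```python
def overlay(series_by_name):
    """Align several {bucket: value} series onto one shared, sorted bucket axis."""
    buckets = sorted(set().union(*(s.keys() for s in series_by_name.values())))
    # Assumption: a bucket with no result for a series is shown as None.
    rows = {name: [s.get(b) for b in buckets] for name, s in series_by_name.items()}
    return buckets, rows


# Hypothetical per-minute results from the three searches above.
searches = {
    "MaxCpu": {0: 45.0, 1: 72.0, 2: 88.0},
    "MaxLoad": {0: 1.1, 1: 2.4, 2: 3.9},
    "Log4jErrors": {1: 4, 2: 11},
}

buckets, rows = overlay(searches)
for name, values in rows.items():
    print(name, values)
```

Once every series shares the same buckets, a rise in errors can be read off against CPU and load at the same point in time.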
Using overlays, you can also perform baseline searches. In this example, I use the offset() function to compare my current CPU usage with my CPU usage exactly one hour ago.
cpu | cpu.avg(_host,0h) chart(line) _host.equals(LAB-UK-XS-UB1)
cpu | cpu.avg(_host,0h) chart(line) _host.equals(LAB-UK-XS-UB1) offset(1h)
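As a rough model of the baseline idea, the Python sketch below shifts a time series forward by one hour so that each current reading can be paired with the reading from an hour earlier. Treating offset(1h) as a simple timestamp shift is an assumption about its behaviour, and the function names and readings are hypothetical.

```python
from datetime import datetime, timedelta


def offset_series(series, offset):
    """Move every timestamp forward by `offset` (assumed model of offset(1h)),
    so the value from an hour ago lands on the current timestamp."""
    return {ts + offset: value for ts, value in series.items()}


def compare_to_baseline(series, offset):
    """Pair each current value with the baseline value from `offset` earlier."""
    baseline = offset_series(series, offset)
    return {ts: (value, baseline[ts]) for ts, value in series.items() if ts in baseline}


# Hypothetical 10-minute CPU averages for one host over two hours.
start = datetime(2024, 1, 1, 12, 0)
cpu = {start + timedelta(minutes=10 * i): 30.0 + i for i in range(13)}

pairs = compare_to_baseline(cpu, timedelta(hours=1))
for ts, (now, hour_ago) in sorted(pairs.items()):
    print(ts.strftime("%H:%M"), now, hour_ago, round(now - hour_ago, 1))
```

Only timestamps that have a reading an hour earlier get a baseline, which is why the comparison starts an hour into the data.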
You’re examining the CPU utilization in your environment again. You have hundreds of hosts, and you’re only really interested in the hosts with the highest values; .percentile(,95) will filter out everything except the records above the 95th percentile for CPU usage.
The percentile function has two forms: ‘.percentile(,)’ and ‘.percentile(,[value])’. The first shows the 1, 5, 25, 50, 75, 95 and 99 percentile bands within your data, along with the value at which each of these lines lies. If, however, you specify a value, you will instead be shown how many records lie above the percentile line you choose.
* | _type.equals(UNX-cpu) CpuUtilPct.percentile(,) chart(line)
* | _type.equals(win-cpu) ProcessorPct.percentile(,) chart(line)
* | _type.equals(UNX-cpu) CpuUtilPct.percentile(,75) chart(line)
* | _type.equals(win-cpu) ProcessorPct.percentile(,75) chart(line)
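The two forms of percentile can be sketched in Python as follows. This is a minimal model under stated assumptions: it uses linear interpolation between the closest ranks, which may not be the method Logscape uses, and the helper names and readings are hypothetical.

```python
import math

BANDS = (1, 5, 25, 50, 75, 95, 99)


def percentile(values, p):
    """Percentile by linear interpolation between closest ranks.

    Assumption: Logscape may use a different interpolation method.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty data")
    rank = (p / 100) * (len(ordered) - 1)
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def percentile_bands(values):
    """Like .percentile(,): the value at each standard band."""
    return {p: percentile(values, p) for p in BANDS}


def count_above(values, p):
    """Like .percentile(,p): how many records lie above the chosen line."""
    line = percentile(values, p)
    return sum(1 for v in values if v > line)


# Hypothetical CPU readings: 100 hosts at 1%..100%.
readings = list(range(1, 101))
print(percentile_bands(readings))
print(count_above(readings, 75))
```

With 100 evenly spread readings, a quarter of the hosts sit above the 75th percentile line, and only a handful sit above the 95th.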
Rather than showing only the warning level, or only the package generating warnings, you can concatenate the package field with the level field to see exactly which package is throwing each error, along with a count of how many errors of that type are being generated.
Searches within Logscape use data types: when you search, the fields from these data types are exposed to the search syntax, allowing you to extract value from your data. A relatively little-known feature is the ability to concatenate fields so that results are identified more uniquely, as in the search below.
* | _type.equals(log4j) package+level.count(,PackageLevel) level.not(INFO) chart(line)
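The effect of package+level.count() can be modelled in Python by counting events on a combined key. In this hypothetical sketch a tuple key stands in for Logscape’s concatenated field value, INFO events are excluded as level.not(INFO) does in the search, and the events themselves are made up.

```python
from collections import Counter


def count_by_fields(events, fields, exclude_levels=("INFO",)):
    """Count events keyed on a combination of fields, in the spirit of
    package+level.count(). A tuple key stands in for the concatenated value."""
    return Counter(
        tuple(event[f] for f in fields)
        for event in events
        if event["level"] not in exclude_levels
    )


# Hypothetical parsed Log4j events.
events = [
    {"package": "com.example.db", "level": "ERROR"},
    {"package": "com.example.db", "level": "ERROR"},
    {"package": "com.example.db", "level": "WARN"},
    {"package": "com.example.web", "level": "ERROR"},
    {"package": "com.example.web", "level": "INFO"},
]

counts = count_by_fields(events, ("package", "level"))
for (package, level), n in counts.most_common():
    print(f"{package} {level}: {n}")
```

Counting on the combined key separates the database package’s errors from its warnings, which counting on either field alone would merge together.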
You’re already monitoring your maximum CPU utilization per day, but you want to perform further analytics on the values Logscape returns for these per-day groupings.
Post-aggregation allows you to use the values returned by one Logscape function as the input to another. This is achieved by aliasing the result and then using the ‘+’ notation to access that value. For example:
Agent and cpu | cpu.max(_host,POST) +POST.max(,Max) +POST.min(,Min) +POST.avg(,Avg) chart(c3.area)
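The two stages of that search can be sketched in Python for a single bucket: first the maximum CPU per host (the POST alias), then the max, min and average over those per-host values. This is a hypothetical model of the arithmetic only; the host names and samples are made up, and it assumes Logscape repeats the same calculation for every bucket.

```python
def post_aggregate(samples):
    """Two-stage aggregation for one bucket of (host, cpu) samples.

    Stage 1 mirrors cpu.max(_host,POST): the maximum per host.
    Stage 2 mirrors +POST.max/min/avg: statistics over those per-host maxima.
    """
    per_host = {}
    for host, cpu in samples:
        per_host[host] = max(cpu, per_host.get(host, cpu))
    maxima = list(per_host.values())
    return {
        "Max": max(maxima),
        "Min": min(maxima),
        "Avg": sum(maxima) / len(maxima),
    }


# Hypothetical samples from three hosts within one bucket.
samples = [("hostA", 10), ("hostA", 90), ("hostB", 20), ("hostB", 30), ("hostC", 15)]
print(post_aggregate(samples))  # {'Max': 90, 'Min': 15, 'Avg': 45.0}
```

Here the Max of 90 sits well above the Avg of 45, the kind of gap that points to an unevenly spread workload.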
This shows you the spread of your CPU utilisation: the gap between your least and most CPU-laden machines, with the average as an indicator. In pretty much any environment, you’d expect the Max to be significantly higher than the average, which means a few boxes are straining while the rest are coasting. The larger the gap, the more unevenly your workload is spread.
Your systems report values in kilobytes, or even bytes. While this is perfectly fine for a machine to interpret, it makes your datasets hard to work with. Using Post-Eval, you can manipulate the data returned by your search; simply dividing by 1024 turns those kilobytes into much easier-to-handle megabytes.
Post-Search Evaluation is a simple yet powerful tool that allows you to modify values returned either directly by your search or by aliased analytics you have performed on your data. You can perform an operation on a single value simply by referring to that value directly, for example:
* | _type.equals(unx-ps) VSZ_KB.avg(server,mbUsed) eval(mbUsed / 1024) chart(table) buckets(1)
This divides the mbUsed value by 1024 after the search has completed. Eval also supports the ‘EACH’ keyword, which performs the stated operation on every value within your search.
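The difference between evaluating one named value and using EACH can be sketched in Python. This is a hypothetical model: it assumes the search results behave like a mapping of alias to value, and the second alias, swapUsed, is invented purely for illustration.

```python
def eval_single(results, name, operation):
    """Apply `operation` to one named result only, like eval(mbUsed / 1024)."""
    updated = dict(results)
    updated[name] = operation(updated[name])
    return updated


def eval_each(results, operation):
    """Apply `operation` to every result, in the spirit of the EACH keyword."""
    return {name: operation(value) for name, value in results.items()}


def kb_to_mb(kb):
    """Convert kilobytes to megabytes."""
    return kb / 1024


# Hypothetical aliased results, both in kilobytes ("swapUsed" is made up).
results = {"mbUsed": 204800, "swapUsed": 4096}
print(eval_single(results, "mbUsed", kb_to_mb))  # only mbUsed is converted
print(eval_each(results, kb_to_mb))              # every value is converted
```

Applying the operation to a single alias leaves the other results untouched, whereas the EACH form converts everything in one step.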
You have a reporting workspace that is kept active at all times so that anyone on the team can see it. However, because of the sheer number of machines and metrics you’re monitoring, the noise on the workspace stops it from conveying anything at a glance; instead, you have to study it closely to make sure nothing is going wrong. Boolean indicators let you designate when values appear, so if data is displayed, it’s worth knowing about.
Boolean indicators are, as the name suggests, booleans: using the eval function, you can specify conditions that must be met for a result to be shown. This is incredibly useful for alerts, as the search can be triggered on a true or false basis. Booleans work on a per-bucket basis, so it’s recommended that you specify a number of buckets that makes sense for your search. In the example below, I searched over 60 minutes and used 6 buckets.
* | _type.equals(UNX-cpu) CpuUtilPct.avg(server,AvgCpu) +AvgCpu.eval(CpuUtilPct > 10) chart(cluster) buckets(6)
* | _type.equals(win-cpu) ProcessorPct.avg(server,AvgCpu) +AvgCpu.eval(ProcessorPct > 10) chart(cluster) buckets(6)
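The per-bucket behaviour of a boolean indicator can be sketched in Python: average each server’s CPU within a bucket, then keep only the servers that meet the condition. The server names, readings and the should_alert rule are hypothetical; a real Logscape alert is configured in Logscape itself, not in code like this.

```python
def boolean_indicator(buckets, threshold):
    """For each bucket, list the servers whose average CPU exceeds `threshold`.

    buckets: list of {server: [cpu samples]} dicts, one per time bucket.
    Only servers meeting the condition are kept, so a bucket shows data
    only when it is worth knowing about.
    """
    flagged = []
    for bucket in buckets:
        flagged.append(sorted(
            server for server, cpu in bucket.items()
            if sum(cpu) / len(cpu) > threshold
        ))
    return flagged


def should_alert(flagged):
    """Hypothetical trigger rule: fire if any bucket has a true result."""
    return any(flagged)


# Hypothetical 60 minutes of data split into 6 ten-minute buckets.
buckets = [
    {"web1": [4, 6], "db1": [8, 9]},
    {"web1": [5, 7], "db1": [12, 14]},
    {"web1": [3, 4], "db1": [9, 9]},
    {"web1": [15, 25], "db1": [7, 8]},
    {"web1": [6, 6], "db1": [5, 6]},
    {"web1": [2, 3], "db1": [4, 4]},
]
flagged = boolean_indicator(buckets, threshold=10)
print(flagged)
print(should_alert(flagged))
```

Because the condition is tested per bucket, the bucket size decides how long a spike must last before it shows up, which is why choosing a sensible number of buckets matters.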
The two main tools that Logscape offers through its search syntax for anomaly detection are the ‘percentile()’ function and the ‘eval()’ function. Percentile gives you the bands of values, while ‘eval’ lets you specify your own criteria, for example ‘eval(CPU > CpuAvg+10)’.
* | _type.equals(win-netutil) ThroughputMBps.avg(_host,Thru) +Thru.eval(ThroughputMBps > 3) chart(stacked)
The above is a boolean search for environments using the WindowsApp; it triggers when network traffic spikes above 3MB/s. While booleans still have value on a workspace, their main intended use is as trigger conditions for alerts.
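A criterion such as ‘eval(CPU > CpuAvg+10)’ can be modelled in Python by flagging readings that exceed the average by more than a margin. This sketch assumes CpuAvg is the mean of all readings in the set, including the reading being tested; what Logscape averages over in practice depends on your search. The readings are hypothetical.

```python
def above_average_plus(readings, margin):
    """Flag readings more than `margin` above the mean of all readings,
    in the spirit of eval(CPU > CpuAvg+10)."""
    average = sum(readings) / len(readings)
    return [reading > average + margin for reading in readings]


# Hypothetical CPU readings for one host.
readings = [20, 22, 21, 50, 19]
flags = above_average_plus(readings, margin=10)
print(flags)  # [False, False, False, True, False]
```

Unlike a fixed threshold such as 3MB/s, a margin over the average adapts to each host’s normal level, so the same rule can work across machines with very different baselines.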
* | _type.equals(win-netutil) ThroughputMBps.percentile(,95)
The above returns results when your network throughput is above the 95th percentile, which again is useful for both workspaces and alert triggers.
Hopefully this blog post has been insightful and has shown you some new and interesting applications for Logscape analytics. Keep an eye out for future examples of how you can apply different analytics to your data.