Tuesday, April 3, 2018

Real-time baseline anomaly detection

The screen capture demonstrates the real-time baseline and anomaly detection based on industry standard sFlow streaming telemetry. The chart was generated using sFlow-RT analytics software. The blue line is an up to the second measure of traffic (measured in Bits per Second). The red and gold lines represent dynamic upper and lower limits calculated by the baseline function. The baseline function flags "high" and "low" value anomalies when values move outside the limits. In this case, a "low" value anomaly was flagged for the drop in traffic shown in the chart.

Writing Applications provides a general introduction to sFlow-RT programming. The baseline functionality is exposed through through the JavaScript API.

Create new baseline
baselineCreate(name,window,sensitivity,repeat);
Where:
  • name, name used to reference baseline.
  • window, the number of previous intervals to consider in calculating the limits.
  • sensitivity, the number of standard deviations used to calculate the limits.
  • repeat, the number of successive data points outside the limits before flagging anomaly 
In this example, baseline parameter values were window=180 (seconds), sensitivity=2, and repeat=3.

Update baseline
var status = baselineCheck(name,value);
Where:
  • status, "learning" while baseline is warming up (takes window intervals),  "normal" if value is in expected range, "low" if value is exceptionally low, "high" if value is exceptionally high.
  • value, latest value to check against baseline
The baselineCheck function is called periodically to update baseline statistics and check for anomalies.

Query baseline statistics
var {mean,variance,sdev,min,max} = baselineStatistics(name);
Note: Statistics are only available once the baseline has exited the "learning" status.

Reset baseline
baselineReset(name);
Resets the statistics and sets state to "learning"

Delete baseline
baselineDelete(name);
Delete the baseline and free up associated resources.

The sFlow-RT baseline functionality is designed to be resource efficient and to converge quickly so that large numbers of baselines can be created and updated for real-time anomaly detection.

The baseline functions work best when the variable being tracked represents the activity of a large population and is relatively stable. For example, WAN traffic is generally a good candidate for baselining since it is composed of the activity of many systems and users. On the other hand, individual host activity tends to be highly variable and not well suited to baseline monitoring.
The table from Baseline contrasts two methods of baseline calculation. The baseline functionality described in this article is an example of a temporal baseline. Cluster performance metrics describes how sFlow-RT can be used to calculate statistics from large populations of devices. These functions can be used for spatial baselining and anomaly detection, for example, by finding a virtual machine in a service pool that is behaving inconsistently when compared to its peers.

No comments:

Post a Comment