Saturday, November 23, 2013

Metric export to Graphite

Figure 1: Cluster performance metrics
Cluster performance metrics describes how sFlow-RT can be used to calculate summary metrics for cluster performance. The article includes a Python script that polls sFlow-RT's REST API and then sends metrics to Graphite. In this article, sFlow-RT's internal scripting API is used to send metrics directly to Graphite.
Figure 2: Components of sFlow-RT
The following script (graphite.js) re-implements the Python example (generating a sum of the load_one metric for a cluster of Linux machines) in JavaScript using sFlow-RT built-in functions for retrieving metrics and sending them to Graphite:
// author: Peter
// version: 1.0
// date: 11/23/2013
// description: Log metrics to Graphite

include('extras/json2.js');

// address of the Graphite collector
var graphiteServer = "10.0.0.151";
// null selects the default Graphite TCP port (2003)
var graphitePort = null;

var errors = 0;
var sent = 0;
var lastError;

setIntervalHandler(function() {
  var names = ['sum:load_one'];
  var prefix = 'linux.';
  var vals = metric('ALL',names,{os_name:['linux']});
  var metrics = {};
  for(var i = 0; i < names.length; i++) {
    metrics[prefix + names[i]] = vals[i].metricValue;
  }
  try { 
    graphite(graphiteServer,graphitePort,metrics);
    sent++;
  } catch(e) {
    errors++;
    lastError = e.message;
  }
} , 15);

setHttpHandler(function() {
  var message = { 'errors':errors,'sent':sent };
  if(lastError) message.lastError = lastError;
  return JSON.stringify(message);
});
The interval handler function runs every 15 seconds and retrieves the set of metrics in the names array (in this case just one metric, but multiple metrics could be retrieved). The names are then converted into a Graphite-friendly form (prefixing each metric with the token linux. so that they can be easily grouped) and the values are sent to the Graphite collector running on 10.0.0.151 using the default TCP port 2003. The script also keeps track of any errors and makes them available through the URL /script/graphite.js/json.
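To make the export step more concrete, the following sketch shows how a metrics object like the one built in the interval handler maps onto Graphite's plaintext protocol, where each line has the form "path value timestamp" and the timestamp is in Unix seconds. The function toGraphitePlaintext is a hypothetical helper written for illustration in plain JavaScript; it is not part of sFlow-RT, and the article does not say how the built-in graphite() function formats or rewrites names internally.

```javascript
// Hypothetical helper (not part of sFlow-RT): formats a metrics object as
// Graphite plaintext protocol lines, "<path> <value> <unix seconds>\n".
function toGraphitePlaintext(metrics, timestampSeconds) {
  var lines = [];
  for (var name in metrics) {
    if (!Object.prototype.hasOwnProperty.call(metrics, name)) continue;
    var value = metrics[name];
    // Skip anything that is not a finite number, since Graphite expects numeric values
    if (typeof value !== 'number' || !isFinite(value)) continue;
    lines.push(name + ' ' + value + ' ' + timestampSeconds);
  }
  // Each line, including the last, is terminated by a newline
  return lines.length ? lines.join('\n') + '\n' : '';
}

// Example using the metric name built by graphite.js and an arbitrary timestamp
var payload = toGraphitePlaintext({ 'linux.sum:load_one': 3.25 }, 1385251200);
console.log(JSON.stringify(payload));
```

A string like this is what a Graphite collector accepts on its plaintext TCP port, which is why the script only needs to supply the server address, the port and a map of names to values.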

The following command line argument loads the script on startup:
-Dscript.file=graphite.js
The following Graphite screen capture shows a trend of the metric:
A very large number of core and derived metrics can be collected by sFlow-RT using standard sFlow instrumentation embedded in switches, servers and applications throughout the data center. For example, Packet loss describes the importance of collecting network packet loss metrics and including them in performance dashboards.
Figure 2: Visibility and the software defined data center
While having access to all these metrics is extremely useful, not all of them need to be stored in Graphite. Using sFlow-RT to calculate and selectively export high-value metrics reduces pressure on the time series database, while still allowing any of the remaining metrics to be polled using the REST API when needed.
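As a rough illustration of selective export, the sketch below mirrors the loop in graphite.js but only forwards entries that actually carry a numeric value, so the export list stays limited to the names chosen in the script. The function selectForExport is a hypothetical helper, the second metric name (sum:load_five) is only an example, and the assumption that a result entry may lack a metricValue field is mine rather than something stated in the article.

```javascript
// Hypothetical helper (not part of sFlow-RT): pairs each requested name with
// its result entry and keeps only entries that have a numeric metricValue.
function selectForExport(names, vals, prefix) {
  var metrics = {};
  for (var i = 0; i < names.length; i++) {
    var entry = vals[i];
    // Assumption: an entry may be missing or have no value; skip it in that case
    if (!entry || typeof entry.metricValue !== 'number') continue;
    metrics[prefix + names[i]] = entry.metricValue;
  }
  return metrics;
}

// Example: the second entry has no value, so only the first is exported
var exported = selectForExport(
  ['sum:load_one', 'sum:load_five'],
  [{ metricValue: 1.5 }, {}],
  'linux.'
);
console.log(JSON.stringify(exported));
```

In the script itself, the resulting object would take the place of the metrics variable passed to graphite(), and adding or removing names from the array is all that is needed to change what gets stored.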

Finally, metrics export is only one of many applications for sFlow data, some of which have been described on this blog. The data center wide visibility provided by sFlow-RT supports orchestration tools and allows them to automatically optimize the allocation of compute, storage and application resources and the placement of loads on these resources.
