Wednesday, March 29, 2017

Arista EOS telemetry

Arista EOS switches support industry standard sFlow telemetry, enabling hardware instrumentation supported by merchant silicon to export hardware interface counters and flow data. The latest release of the open source Host sFlow agent has been ported to EOS, augmenting the telemetry with standard host CPU, memory, and disk IO metrics.

Linux as a Switch Operating System: Five Lessons Learned identifies benefits of using Linux as the basis for EOS. In this context, the Linux operating system made it easy to port the Host sFlow agent, use standard Linux package management (RPM Package Manager), and gather metrics using standard Linux APIs. A new eAPI module automatically synchronizes the Host sFlow daemon with the EOS sFlow configuration.

The following sflowtool output shows the additional metrics contributed by a Host sFlow agent installed on an Arista switch:
startDatagram =================================
datagramSize 704
unixSecondsUTC 1490843418
datagramVersion 5
agentSubId 100000
packetSequenceNo 714
sysUpTime 0
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:2
sampleSequenceNo 714
sourceId 2:1
counterBlock_tag 0:2001
counterBlock_tag 0:2010
udpInDatagrams 1459
udpNoPorts 16
udpInErrors 0
udpOutDatagrams 4765
udpRcvbufErrors 0
udpSndbufErrors 0
udpInCsumErrors 0
counterBlock_tag 0:2009
tcpRtoAlgorithm 1
tcpRtoMin 200
tcpRtoMax 120000
tcpMaxConn 4294967295
tcpActiveOpens 102
tcpPassiveOpens 100
tcpAttemptFails 0
tcpEstabResets 0
tcpCurrEstab 8
tcpInSegs 19930
tcpOutSegs 19804
tcpRetransSegs 0
tcpInErrs 0
tcpOutRsts 2
tcpInCsumErrors 0
counterBlock_tag 0:2008
icmpInMsgs 1606
icmpInErrors 0
icmpInDestUnreachs 16
icmpInTimeExcds 0
icmpInParamProbs 0
icmpInSrcQuenchs 0
icmpInRedirects 0
icmpInEchos 1590
icmpInEchoReps 0
icmpInTimestamps 0
icmpInAddrMasks 0
icmpInAddrMaskReps 0
icmpOutMsgs 0
icmpOutErrors 1606
icmpOutDestUnreachs 0
icmpOutTimeExcds 16
icmpOutParamProbs 0
icmpOutSrcQuenchs 0
icmpOutRedirects 0
icmpOutEchos 0
icmpOutEchoReps 0
icmpOutTimestamps 1590
icmpOutTimestampReps 0
icmpOutAddrMasks 0
icmpOutAddrMaskReps 0
counterBlock_tag 0:2007
ipForwarding 2
ipDefaultTTL 64
ipInReceives 24685
ipInHdrErrors 0
ipInAddrErrors 42
ipForwDatagrams 0
ipInUnknownProtos 0
ipInDiscards 0
ipInDelivers 23025
ipOutRequests 26170
ipOutDiscards 0
ipOutNoRoutes 0
ipReasmTimeout 0
ipReasmReqds 0
ipReasmOKs 0
ipReasmFails 0
ipFragOKs 4
ipFragFails 0
ipFragCreates 8
counterBlock_tag 0:2005
disk_total 1907843072
disk_free 1083969536
disk_partition_max_used 43.18
disk_reads 16549
disk_bytes_read 1337825280
disk_read_time 7420
disk_writes 412
disk_bytes_written 1159168
disk_write_time 216
counterBlock_tag 0:2004
mem_total 1938849792
mem_free 85483520
mem_shared 0
mem_buffers 106614784
mem_cached 735801344
swap_total 0
swap_free 0
page_in 830716
page_out 566
swap_in 0
swap_out 0
counterBlock_tag 0:2003
cpu_load_one 0.070
cpu_load_five 0.060
cpu_load_fifteen 0.050
cpu_proc_run 0
cpu_proc_total 221
cpu_num 1
cpu_speed 2698
cpu_uptime 17265
cpu_user 272510
cpu_nice 50
cpu_system 178050
cpu_idle 16279880
cpu_wio 550
cpuintr 461060
cpu_sintr 41840
cpuinterrupts 5458397
cpu_contexts 5338141
cpu_steal 0
cpu_guest 0
cpu_guest_nice 0
counterBlock_tag 0:2006
nio_bytes_in 8149749
nio_pkts_in 115730
nio_errs_in 0
nio_drops_in 0
nio_bytes_out 4996846
nio_pkts_out 28451
nio_errs_out 0
nio_drops_out 0
counterBlock_tag 0:2000
hostname leaf1
UUID 33-28-66-a5-82-27-43-49-a5-f1-c1-ba-cc-6c-1d-d3
machine_type 2
os_name 2
os_release 3.4.43.Ar-4170906.4180F
endSample   ----------------------
endDatagram   =================================
There are a number of additional open source and commercial sFlow collectors available.
For example, the diagram shows how new and existing cloud based or locally hosted orchestration, operations, and security tools can leverage the sFlow-RT analytics service to gain real-time visibility.

Installing Host sFlow agent on an Arista switch

The following steps download and install the Host sFlow agent on an Arista switch and direct the telemetry stream to collector

1. Install the Host sFlow agent (hsflowd)
eos# copy extension:
eos# extension hsflowd-eos-2.0.9-1.i686.rpm
eos# bash sudo service hsflowd start
eos# copy installed-extensions boot-extensions
2. Enable eAPI, see eAPI and Unix Domain Socket
eos(config)# management api http-commands
eos(config-mgmt-api-http-cmds)# protocol unix-socket
eos(config-mgmt-api-http-cmds)# no shutdown
3. Configure switch to run hsflowd on startup:
eos(config)# event-handler hsflowd
eos(config-handler-hsflowd)# trigger on-boot
eos(config-handler-hsflowd)# action bash sudo service hsflowd start
eos(config-handler-hsflowd)# delay 60
eos(config-handler-hsflowd)# asynchronous
4. Configure sFlow Introduction to Managing EOS Devices – Setting up Management
eos(config)# sflow source-interface Management1
eos(config)# sflow destination
eos(config)# sflow run
The host metrics should immediately begin to be received at the sFlow collector.

Monday, March 20, 2017


Maximum Performance from Acropolis Hypervisor and Open vSwitch describes the network architecture within a Nutanix converged infrastructure appliance - see diagram above. This article will explore how the Host sFlow agent can be deployed to enable sFlow instrumentation in the Open vSwitch (OVS)  and deliver streaming network and system telemetry from nodes in a Nutanix cluster.
This article is based on a single hardware node running Nutanix Community Edition (CE), built following the instruction in Part I: How to setup a three-node NUC Nutanix CE cluster. If you don't have hardware readily available, the article, 6 Nested Virtualization Resources To Get You Started With Community Edition, describes how to run Nutanix CE as a virtual machine.
The sFlow standard is widely supported by network equipment vendors, which combined with sFlow from each Nutanix appliance, delivers end to end visibility in the Nutanix cluster. The following screen captures from the free sFlowTrend tool are representative examples of the data available from the Nutanix appliance.
The Network > Top N chart displays the top flows traversing OVS. In this case an HTTP connection is responsible for most of the traffic. Inter-VM and external traffic flows traverse OVS and are efficiently monitored by the embedded sFlow instrumentation.
The Hosts > CPU utilization chart shows an increase in CPU utilization due to the increased traffic.
The Hosts > Disk IO shows the Write operations associated with connection.

Installing Host sFlow agent on Nutanix appliance

The following steps install Host sFlow on a Nutanix device:

First log into the Nutanix host as root.

Next, find the latest version of the Centos 7 RPM on and use the following commands to download and install the software:
rpm -ivh hsflowd-centos7-2.0.8-1.x86_64.rpm
rm hsflowd-centos7-2.0.8-1.x86_64.rpm
Edit the /etc/hsflowd.conf file to direct sFlow telemetry to collector, enable KVM monitoring (virtual machine stats), and push sFlow configuration to OVS (network stats):
sflow {
  # collectors:
  collector { ip= udpport=6343 }
  # Open vSwitch sFlow configuration:
  ovs { }
  # KVM (libvirt) hypervisor and VM monitoring:
  kvm { }
Now start the Host sFlow daemon:
systemctl enable hsflowd.service
systemctl start hsflowd.service
Data will immediately start to appear in sFlowTrend.

Wednesday, February 22, 2017


A QUIC update on Google’s experimental transport describes some of the benefits of  the QUIC (Quick UDP Internet Connections) protocol that is now the default transport when Google's Chrome browser connects to Google services (gmail, search, etc.). Given the over 50% market share of the Chrome browser (NetMarketShare) and the popularity of Google services, it is important to be aware of the QUIC protocol and to start tracking its use of network resources.

An easy way to see if you have any QUIC traffic on your network is to use the standard sFlow instrumentation built into network switches. Configure the switches to send sFlow telemetry to an sFlow collector for visibility into network traffic.

For example, use Docker to run the sFlow-RT active-flows application to analyze the sFlow data stream:
docker run -p 6343:6343/udp -p 8008:8008 -d sflow/top-flows
Access the web interface at http://localhost:8008/ and enter the following Flow Specification to monitor QUICK flows:
Note: Real-time domain name lookups describes how sFlow-RT incorporates DNS (Domain Name Service) requests in its real-time analytics pipeline so that traffic flows can be identified by domain name.

The resulting top flows table is shown in the screen capture above. The Google addresses are identifiable by the domain names (What is and it appears that all the traffic is flowing to or from Google services (as one would expect). However, it would be nice to be able to be notified of QUIC traffic that is not associated with Google since this could represent a threat.

The following quic.js script generates events for QUIC traffic to non-Google domains:

setFlowHandler(function(rec) {
Note: Writing Applications gives an overview of sFlow-RT's embedded script API. The script logs events.

Run the script using the following command:
docker run -v `pwd`/quic.js:/sflow-rt/quic.js \
-e "RTPROP=-Ddns.servers=resolv.conf -Dscript.file=quic.js" \
-p 8008:8008 -p 6343:6343/udp sflow/top-flows
The article Exporting events using syslog shows how the script could be modified export events via syslog to SIEM tools such as Logstash and Splunk.

Monday, January 23, 2017

Telegraf, InfluxDB, Chronograf, and Kapacitor

The InfluxData TICK (Telegraf, InfluxDB, Chronograf, Kapacitor) provides a full set of integrated metrics tools, including an agent to export metrics (Telegraf), a time series database to collect and store the metrics (InfluxDB), a dashboard to display metrics (Chronograf), and a data processing engine (Kapacitor). Each of the tools is open sourced and can be used together or separately.
This article will show how industry standard sFlow agents embedded within the data center infrastructure can provide Telegraf metrics to InfluxDB. The solution uses sFlow-RT as a proxy to convert sFlow metrics into their Telegraf equivalent form so that they are immediately visible through the default Chronograf dashboards (Using a proxy to feed metrics into Ganglia described a similar approach for sending metrics to Ganglia).

The following telegraf.js script instructs sFlow-RT to periodically export host metrics to InfluxDB:
var influxdb = "";

function sendToInfluxDB(msg) {
  if(!msg || !msg.length) return;
  var req = {
  req.error = function(e) {
    logWarning('InfluxDB POST failed, error=' + e);
  try { httpAsync(req); }
  catch(e) {
    logWarning('bad request ' + req.url + ' ' + e);

var metric_names = [

var ntoi;
function mVal(row,name) {
  if(!ntoi) {
    ntoi = {};
    for(var i = 0; i < metric_names.length; i++) {
      ntoi[metric_names[i]] = i;
  return row[ntoi[name]].metricValue;

setIntervalHandler(function() {
  var i,r,msg = [];
  var vals = table('ALL',metric_names);
  for(i = 0; i < vals.length; i++) {
    r = vals[i];

    // Telegraf System plugin metrics
      +' load1='+mVal(r,'load_one')
      +' uptime='+mVal(r,'uptime')+'i');

    // Telegraf CPU plugin metrics
      +' usage_user='+(mVal(r,'cpu_user')||0)
Some notes on the script:
  1. The sentToInfluxDB() function uses the Writing data using the HTTP API to POST metrics to InfluxDB.
  2. The setIntervalHandler function retrieves a table of metrics from sFlow-RT every 15 seconds and formats them to use the same names and tags as Telegraf.
  3. The script implements Telegraf System and CPU plugin functionality.
  4. Additional metrics can easily be added to proxy additional Telegraf plugins.
  5. Writing applications provides an overview of the sFlow-RT APIs.
Start gathering metrics:
docker run -v `pwd`/telegraf.js:/sflow-rt/telegraf.js \
-e "RTPROP=-Dscript.file=telegraf.js" \
-p 8008:8008 -p 6343:6343/udp sflow/sflow-rt
Accessing the Chronograf home page brings up a table of hosts with their status and CPU load:
Clicking on the leaf1 host displays a dashboard trending key performance metrics:
Pre-processing the metrics using sFlow-RT's real-time streaming analytics engine can greatly increase scaleability by selectively exporting metrics and calculating higher level summary statistics in order to reduce the amount of data logged to the time series database. The analytics pipeline can also augment the metrics with additional metadata.
For example, Collecting Docker Swarm service metrics demonstrates how sFlow-RT can monitor dynamic service pools running under Docker Swarm and write summary statistics to InfluxDB. In this case Grafana was used to build metrics dashboard instead of Chronograf.

The open source Host sFlow agent exports an extensive range of standard sFlow metrics and has been ported to a wide range of platforms. Standard metrics describes how standardization helps reduce operational complexity. The overlap between standard sFlow metrics and Telegraf base plugin metrics makes the task of proxying straightforward.
The Host sFlow agent (and sFlow agents embedded in network switches and routers) goes beyond simple metrics export to provide detailed visibility into network traffic and articles on this blog demonstrate how sFlow-RT analytics software can be configured to generate detailed traffic flow metrics that can be streamed into InfluxDB, logged (e.g. Exporting events using syslog), or trigger control actions (e.g. DDoS mitigationDocker 1.12 swarm mode elastic load balancing).