Friday, December 16, 2016

Using Ganglia to monitor Linux services

The screen capture from the Ganglia monitoring tool shows metrics for services running on a Linux host. Monitoring Linux services describes how the open source Host sFlow agent has been extended to export standard Virtual Node metrics from services running under systemd. Ganglia already supports these standard metrics and the article Using Ganglia to monitor virtual machine pools describes the configuration steps needed to enable this feature.

Thursday, December 15, 2016

Monitoring Linux services

Mainstream Linux distributions have moved to systemd to manage daemons (e.g. httpd, sshd, etc.). The diagram illustrates how systemd runs each daemon within its own container so that it can maintain tight control of the daemon's resources.

This article describes how to use the open source Host sFlow agent to gather telemetry from daemons running under systemd.

Host sFlow systemd monitoring exports a standard set of metrics for each systemd service - the sFlow Host Structures extension defines metrics for Virtual Nodes (virtual machines, containers, etc.) that are used to export Xen, KVM, Docker, and Java resource usage. Exporting the standard metrics for systemd services provides interoperability with sFlow analyzers, allowing them to report on Linux services using existing virtual node monitoring capabilities.

While running daemons within containers helps systemd maintain control of the resources, it also provides a very useful abstraction for monitoring. For example, a single service (like the Apache web server) may consist of dozens of processes. Reporting on container level metrics abstracts away the per-process details and gives a view of the total resources consumed by the service. In addition, service metadata (like the service name) provides a useful way of identifying and grouping services, for example, making it easy to report on total CPU consumed by the web service across a pool of servers.

Systemd monitoring is easy to set up.

First download and install the latest software release.

Next, enable the systemd module by adding the systemd{} line to the /etc/hsflowd.conf file:
sflow{
  collector{ ip=10.0.0.1 }
  systemd{}
}
This is a minimal configuration that sends sFlow telemetry to a collector running on host 10.0.0.1. The Host sFlow agent is capable of gathering an extensive set of network, system and application level metrics. See Configuring Host sFlow for Linux for a full set of options.

Finally, start the agent:
sudo systemctl enable hsflowd.service
sudo systemctl start hsflowd.service
For the best accuracy, enable systemd cgroup accounting by adding the following entries to the /etc/systemd/system.conf file and rebooting the server:
DefaultCPUAccounting=yes
DefaultBlockIOAccounting=yes
DefaultMemoryAccounting=yes
The Host sFlow agent will automatically detect when cgroup accounting has been enabled. However, if cgroup accounting hasn't been enabled, it is still able to compute and export statistics, although it might miss contributions from short lived processes.

Once the agents have been configured, verify that sFlow telemetry is being received at the collector using sflowtool. The simplest way to run sflowtool is using Docker:
docker run -p 6343:6343/udp sflow/sflowtool
The following output shows the statistics exported for the apache2 service:
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 50
sourceId 3:112270
counterBlock_tag 0:2103
vdsk_capacity 0
vdsk_allocation 0
vdsk_available 0
vdsk_rd_req 0
vdsk_rd_bytes 0
vdsk_wr_req 0
vdsk_wr_bytes 0
vdsk_errs 0
counterBlock_tag 0:2102
vmem_memory 16674816
vmem_maxMemory 0
counterBlock_tag 0:2101
vcpu_state 1
vcpu_cpu_mS 180
vcpu_cpuCount 0
counterBlock_tag 0:2002
parent_dsClass 2
parent_dsIndex 1
counterBlock_tag 0:2000
hostname apache2.service
UUID 92-53-c6-17-60-65-52-a2-ac-f7-76-cb-7b-63-d9-23
machine_type 3
os_name 2
os_release 4.4.0-45-generic
endSample   ----------------------
Install Host sFlow agents on all the hosts in the data center for comprehensive visibility.
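
The sflowtool output above is a stream of "key value" lines bracketed by startSample and endSample markers. As a minimal sketch (the function name parse_samples is hypothetical and not part of sflowtool or Host sFlow), the following Python groups those lines into one dictionary per sample so that per-service values such as vcpu_cpu_mS can be picked out:

```python
def parse_samples(lines):
    """Group sflowtool 'key value' lines into one dict per sample."""
    samples = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("startSample"):
            current = {}
        elif line.startswith("endSample"):
            if current is not None:
                samples.append(current)
            current = None
        elif current is not None and line:
            key, _, value = line.partition(" ")
            if key == "counterBlock_tag":
                # The tag repeats once per counter block, so keep every value.
                current.setdefault(key, []).append(value.strip())
            else:
                current[key] = value.strip()
    return samples


def service_summary(sample):
    """Pick out the fields that identify a service and its resource usage."""
    return {
        "service": sample.get("hostname"),
        "cpu_ms": int(sample.get("vcpu_cpu_mS", 0)),
        "memory_bytes": int(sample.get("vmem_memory", 0)),
    }
```

Passing the lines of the apache2 sample shown above through parse_samples and then service_summary yields the service name together with its CPU time and memory counters. All values are kept as strings by parse_samples; only service_summary converts the two counters it uses.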

Thursday, December 1, 2016

IPv6 Internet router using merchant silicon

Internet router using merchant silicon describes how a commodity white box switch can be used as a replacement for an expensive Internet router. The solution combines standard sFlow instrumentation implemented in merchant silicon with BGP routing information to selectively install only active routes into the hardware.

The article describes a simple self contained solution that uses standard APIs and should be able to run on a variety of Linux based network operating systems, including: Cumulus Linux, Dell OS10, Arista EOS, and Cisco NX-OS.

The diagram shows the elements of the solution. Standard sFlow instrumentation embedded in the merchant silicon ASIC data plane in the white box switch provides real-time information on traffic flowing through the switch. The sFlow agent is configured to send the sFlow to an instance of sFlow-RT running on the switch. The Bird routing daemon is used to handle the BGP peering sessions and to install routes in the Linux kernel using the standard netlink interface. The network operating system in turn programs the switch ASIC with the kernel routes so that packets are forwarded by the switch hardware and not by the kernel software.

The key to this solution is Bird's multi-table capabilities. The full Internet routing table learned from BGP peers is installed in a user space table that is not reflected into the kernel. A BGP session between sFlow-RT analytics software and Bird allows sFlow-RT to see the full routing table and combine it with the sFlow telemetry to perform real-time BGP route analytics and identify the currently active routes. A second BGP session allows sFlow-RT to push routes to Bird which in turn pushes the active routes to the kernel, programming the ASIC.

This article extends the previous example to add IPv6 routing. In this example, the following Bird configuration, /etc/bird/bird6.conf, was installed on the switch:
# Please refer to the documentation in the bird-doc package or BIRD User's
# Guide on http://bird.network.cz/ for more information on configuring BIRD and
# adding routing protocols.

# Change this into your BIRD router ID. It's a world-wide unique identification
# of your router, usually one of router's IPv6 addresses.
router id 10.0.0.136;

# The Kernel protocol is not a real routing protocol. Instead of communicating
# with other routers in the network, it performs synchronization of BIRD's
# routing tables with the OS kernel.
protocol kernel {
 scan time 60;
        scan time 2;
 import all;
 export all;
}

# The Device protocol is not a real routing protocol. It doesn't generate any
# routes and it only serves as a module for getting information about network
# interfaces from the kernel. 
protocol device {
 scan time 60;
}

protocol direct {
        interface "*";
}

# Create a new table (disconnected from kernel/master) for peering routes
table peers;

protocol bgp peer_65134 {
  table peers;
  igp table master;
  local as 65136;
  neighbor fc00:136::2 as 65134;
  source address fc00:136::1;
  import all;
  export all;
}

protocol bgp peer_65135 {
  table peers;
  igp table master;
  local as 65136;
  neighbor fc00:136::3 as 65135;
  source address fc00:136::1;
  import all;
  export all;
}

# Copy default route from peers table to master table
protocol pipe {
  table peers;
  peer table master;
  import none;
  export filter {
     if net ~ [ ::/0 ] then accept;
     reject;
  };
}

# Reflect peers table to sFlow-RT
protocol bgp to_sflow_rt {
  table peers;
  igp table master;
  local as 65136;
  neighbor ::1 port 1179 as 65136;
  import none;
  export all;
}

# Receive active prefixes from sFlow-RT
protocol bgp from_sflow_rt {
  local as 65136;
  neighbor fc00:136::1 port 1179 as 65136;
  import all;
  export none;
}
The open source Active Route Manager (ARM) application has been installed in sFlow-RT and the following sFlow-RT configuration, /usr/local/sflow-rt/conf.d/sflow-rt.conf, adds the IPv6 BGP route reflector and control sessions with Bird:
bgp.start=yes
arm.reflector.ip=127.0.0.1
arm.reflector.ip6=::1
arm.reflector.as=65136
arm.reflector.id=0.0.0.1
arm.sflow.ip=10.0.0.136
arm.target.ip=10.0.0.136
arm.target.ip6=fc00:136::1
arm.target.as=65136
arm.target.id=0.0.0.2
arm.target.prefixes=10000
arm.target.prefixes6=5000
Once configured, operation is entirely automatic. As soon as traffic starts flowing to a new route, the route is identified and installed in the ASIC. If the route later becomes inactive, it is automatically removed from the ASIC to be replaced with a different active route. In this case, the maximum number of IPv6 routes allowed in the ASIC has been specified as 5,000 (arm.target.prefixes6). This number can be changed to reflect the capacity of the hardware.
The Active Route Manager application has a web interface that provides up to the second visibility into the number of routes, routes installed in hardware, amount of traffic, hardware and software resource utilization etc. In addition, the sFlow-RT REST API can be used to make additional queries.
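
The idea of installing only active routes, up to a hardware limit, can be illustrated with a short Python sketch. This is not the Active Route Manager's algorithm; the function names, the traffic dictionary, and the documentation prefixes (2001:db8::/32) are assumptions made purely for illustration:

```python
def select_active_routes(traffic_by_prefix, max_routes):
    """Return the prefixes carrying the most traffic, limited to max_routes."""
    active = [(p, b) for p, b in traffic_by_prefix.items() if b > 0]
    # Busiest prefixes first; ties are broken by prefix for a stable result.
    active.sort(key=lambda item: (-item[1], item[0]))
    return [p for p, _ in active[:max_routes]]


def diff_routes(installed, wanted):
    """Return (to_add, to_remove) needed to move from installed to wanted."""
    installed_set = set(installed)
    wanted_set = set(wanted)
    return sorted(wanted_set - installed_set), sorted(installed_set - wanted_set)
```

With a limit of two routes, a prefix that stops carrying traffic drops out of the wanted set and diff_routes reports it for removal, while a newly active prefix is reported for installation.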

Thursday, November 17, 2016

Monitoring at Terabit speeds

The chart was generated from industry standard sFlow telemetry from the switches and routers comprising The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16) network. The chart shows a number of conference participants pushing the network to see how much data they can transfer, peaking at a combined bandwidth of 3 Terabits/second over a minute just before noon and sustaining over 2.5 Terabits/second for over an hour. The traffic is broken out by MAC vendor code: routed traffic can be identified by router vendor (Juniper, Brocade, etc.) and layer 2 transfers (RDMA over Converged Ethernet) are identified by host adapter vendor codes (Mellanox, Hewlett-Packard Enterprise, etc.).

From the SCinet web page, "The Fastest Network Connecting the Fastest Computers: SC16 will host the most powerful and advanced networks in the world – SCinet. Created each year for the conference, SCinet brings to life a very high-capacity network that supports the revolutionary applications and experiments that are a hallmark of the SC conference."

SC16 live real-time weathermaps provides additional demonstrations of high performance network monitoring.

Sunday, November 13, 2016

SC16 live real-time weathermaps

Connect to https://inmon.sc16.org/sflow-rt/app/sc16-weather/html/ between now and November 17th to see a real-time heat map of The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16) network.

From the SCinet web page, "The Fastest Network Connecting the Fastest Computers: SC16 will host the most powerful and advanced networks in the world – SCinet. Created each year for the conference, SCinet brings to life a very high-capacity network that supports the revolutionary applications and experiments that are a hallmark of the SC conference."

The real-time weathermap leverages industry standard sFlow instrumentation built into network switch and router hardware to provide scalable monitoring of the SCinet network. Link colors are updated every second to reflect operational status and utilization of each link.
Clicking on a link in the map pops up a 1 second resolution strip chart showing the protocol mix carried by the link.
OSiRIS (Open Storage Research Infrastructure) is a "distributed, multi-institutional storage infrastructure that lets researchers write, manage, and share data from their own computing facility locations."

Connect to http://inmon.sc16.org/sflow-rt/app/OSiRIS-weather/html/ to see an animated diagram of the SC16 OSiRIS demonstration connecting SCinet with University of Michigan, Michigan State, Wayne State, Indiana University, USGS, and Utah Cloudlab. Click on any of the links in the diagram to see traffic.
Connect to https://inmon.sc16.org/sflow-rt/app/world-map/html/ to see a real-time view of traffic from SCinet to different countries.

The SCinet real-time weathermaps were constructed using open source components (https://github.com/pphaal/sc15-weather, https://github.com/sflow-rt/svg-weather, https://github.com/sflow-rt/dashboard-example, and https://github.com/sflow-rt/world-map) running on a single instance of the sFlow-RT real-time analytics engine. See Writing Applications and download sFlow-RT to see what you can build.

Tuesday, October 18, 2016

Network performance monitoring

Today, network performance monitoring typically relies on probe devices to perform active tests and/or observe network traffic in order to try and infer performance. This article demonstrates that hosts already track network performance and that exporting host-based network performance information provides an attractive alternative to complex and expensive in-network approaches.
# tcpdump -ni eth0 tcp
11:29:28.949783 IP 10.0.0.162.ssh > 10.0.0.70.56174: Flags [P.], seq 1424968:1425312, ack 1081, win 218, options [nop,nop,TS val 2823262261 ecr 2337599335], length 344
11:29:28.950393 IP 10.0.0.70.56174 > 10.0.0.162.ssh: Flags [.], ack 1425312, win 4085, options [nop,nop,TS val 2337599335 ecr 2823262261], length 0
The host TCP/IP stack continuously measures round trip time and estimates available bandwidth for each active connection as part of its normal operation. The tcpdump output shown above highlights timestamp information that is exchanged in TCP packets to provide the accurate round trip time measurements needed for reliable high speed data transfer.
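
As a worked example, the two packets in the tcpdump output can be paired by hand: the acknowledgement echoes (ecr) the TS val carried by the data segment, so the difference between the two capture times approximates the round trip time seen by the sending host. The Python below is only a sketch of that arithmetic. The kernel uses its own clocks rather than tcpdump capture times, and the dictionary layout and function name are assumptions:

```python
from datetime import datetime, timedelta

# Values copied from the two tcpdump lines shown above.
DATA_SEGMENT = {"time": "11:29:28.949783", "ts_val": 2823262261}
ACK = {"time": "11:29:28.950393", "ts_ecr": 2823262261}


def rtt_microseconds(data_segment, ack):
    """Return the capture-time difference in microseconds, or None if the
    acknowledgement does not echo the data segment's timestamp value."""
    if ack["ts_ecr"] != data_segment["ts_val"]:
        return None
    fmt = "%H:%M:%S.%f"
    sent = datetime.strptime(data_segment["time"], fmt)
    received = datetime.strptime(ack["time"], fmt)
    # Integer division of timedeltas avoids floating point rounding.
    return (received - sent) // timedelta(microseconds=1)
```

For the capture above, rtt_microseconds(DATA_SEGMENT, ACK) gives 610 microseconds.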

The open source Host sFlow agent already makes use of Berkeley Packet Filter (BPF) capability on Linux to efficiently sample packets and provide visibility into traffic flows. Adding support for the tcp_diag kernel module allows the detailed performance metrics maintained in the Linux TCP stack to be attached to each sampled TCP packet.
enum packet_direction {
  unknown  = 0,
  received = 1,
  sent     = 2
}

/* TCP connection state */
/* Based on Linux struct tcp_info */
/* opaque = flow_data; enterprise=0; format=2209 */
struct extended_tcp_info {
  packet_direction dir;     /* Sampled packet direction */
  unsigned int snd_mss;     /* Cached effective mss, not including SACKS */
  unsigned int rcv_mss;     /* Max. recv. segment size */
  unsigned int unacked;     /* Packets which are "in flight" */
  unsigned int lost;        /* Lost packets */
  unsigned int retrans;     /* Retransmitted packets */
  unsigned int pmtu;        /* Last pmtu seen by socket */
  unsigned int rtt;         /* smoothed RTT (microseconds) */
  unsigned int rttvar;      /* RTT variance (microseconds) */
  unsigned int snd_cwnd;    /* Sending congestion window */
  unsigned int reordering;  /* Reordering */
  unsigned int min_rtt;     /* Minimum RTT (microseconds) */
}
The sFlow telemetry protocol is extensible, and the above structure was added to transport network performance metrics along with the sampled TCP packet.
startSample ----------------------
sampleType_tag 0:1
sampleType FLOWSAMPLE
sampleSequenceNo 153026
sourceId 0:2
meanSkipCount 10
samplePool 1530260
dropEvents 0
inputPort 1073741823
outputPort 2
flowBlock_tag 0:2209
tcpinfo_direction sent
tcpinfo_send_mss 1448
tcpinfo_receive_mss 536
tcpinfo_unacked_pkts 0
tcpinfo_lost_pkts 0
tcpinfo_retrans_pkts 0
tcpinfo_path_mtu 1500
tcpinfo_rtt_uS 773
tcpinfo_rtt_uS_var 137
tcpinfo_send_congestion_win 10
tcpinfo_reordering 3
tcpinfo_rtt_uS_min 0
flowBlock_tag 0:1
flowSampleType HEADER
headerProtocol 1
sampledPacketSize 84
strippedBytes 4
headerLen 66
headerBytes 08-00-27-09-5C-F7-08-00-27-B8-32-6D-08-00-45-C0-00-34-60-79-40-00-01-06-03-7E-0A-00-00-88-0A-00-00-86-84-47-00-B3-50-6C-E7-E7-D8-49-29-17-80-10-00-ED-15-34-00-00-01-01-08-0A-18-09-85-3A-23-8C-C6-61
dstMAC 080027095cf7
srcMAC 080027b8326d
IPSize 66
ip.tot_len 52
srcIP 10.0.0.136
dstIP 10.0.0.134
IPProtocol 6
IPTOS 192
IPTTL 1
IPID 31072
TCPSrcPort 33863
TCPDstPort 179
TCPFlags 16
endSample   ----------------------
The sflowtool output shown above provides an example. The tcp_info values are the tcpinfo_ fields in the flowBlock_tag 0:2209 block.
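
The extended_tcp_info structure defined earlier consists of twelve 32-bit fields. sFlow structures are XDR encoded, so each field is a 4-byte big-endian unsigned integer. The following Python is a minimal sketch of decoding such a record body; it assumes the record's tag and length header have already been removed, and the names FIELDS, DIRECTIONS and decode_tcp_info are hypothetical:

```python
import struct

# Field order follows the extended_tcp_info structure definition.
FIELDS = (
    "dir", "snd_mss", "rcv_mss", "unacked", "lost", "retrans",
    "pmtu", "rtt", "rttvar", "snd_cwnd", "reordering", "min_rtt",
)

# Values of the packet_direction enum.
DIRECTIONS = {0: "unknown", 1: "received", 2: "sent"}


def decode_tcp_info(payload):
    """Decode 48 bytes of XDR data into a dict keyed by field name."""
    values = struct.unpack("!12I", payload[:48])
    info = dict(zip(FIELDS, values))
    info["dir"] = DIRECTIONS.get(info["dir"], "unknown")
    return info
```

Packing the values from the sample above (direction sent, send MSS 1448, smoothed RTT 773 microseconds, and so on) with struct.pack("!12I", ...) and decoding the result returns the same values keyed by field name.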

Combining performance data and packet headers delivers a telemetry stream that is far more useful than either measurement on its own. There are hundreds of attributes and billions of values that can be decoded from the packet header resulting in a virtually infinite number of permutations that combine with the network performance data.

For example, the chart at the top of this article uses sFlow-RT real-time analytics software to combine telemetry from multiple hosts and generate an up to the second view of network performance, plotting round trip time by Country.

This solution leverages the TCP/IP stack to turn every host and its clients (desktops, laptops, tablets, smartphones, IoT devices, etc.) into a network performance monitoring probe - continuously streaming telemetry gathered from normal network activity.

A host-based approach to network performance monitoring is well suited to public cloud deployments, where lack of access to the physical network resources challenges in-network approaches to monitoring.
More generally, each network, host and application entity maintains state as part of its normal operation (for example, the TCP metrics in the host). However, the information is incomplete and of limited value when it is stranded within each device. The sFlow standard specifies a unified data model and efficient transport that allows each element to stream measurements and related meta-data to analytics software where the information is combined to provide a comprehensive view of performance.

Thursday, October 13, 2016

Real-time domain name lookups

Reverse DNS requests request the domain name associated with an IP address, for example providing the name google-public-dns-a.google.com for IP address 8.8.8.8.  This article demonstrates how the sFlow-RT engine incorporates domain name lookups in real-time flow analytics.

First, the dns.servers System Property is used to specify one or more DNS servers to handle the reverse lookup requests. For example, the following command uses Docker to run sFlow-RT with DNS lookups directed to server 10.0.0.1:
docker run -e "RTPROP=-Ddns.servers=10.0.0.1" \
-p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt
The following Python script dnspair.py uses the sFlow-RT REST API to define a flow and log the resulting flow records:
#!/usr/bin/env python
import requests
import json

flow = {'keys':'dns:ipsource,dns:ipdestination',
 'value':'bytes','activeTimeout':10,'log':True}
requests.put('http://localhost:8008/flow/dnspair/json',data=json.dumps(flow))
flowurl = 'http://localhost:8008/flows/json?name=dnspair&maxFlows=10&timeout=60'
flowID = -1
while True:
  r = requests.get(flowurl + "&flowID=" + str(flowID))
  if r.status_code != 200: break
  flows = r.json()
  if len(flows) == 0: continue

  flowID = flows[0]["flowID"]
  flows.reverse()
  for f in flows:
    print(json.dumps(f,indent=1))
Running the script generates the following output:
$ ./dnspair.py
{
 "value": 233370.92322668363, 
 "end": 1476234478177, 
 "name": "dnspair", 
 "flowID": 1523, 
 "agent": "10.0.0.20", 
 "start": 1476234466195, 
 "dataSource": "10", 
 "flowKeys": "xenvm11.sf.inmon.com.,dhcp20.sf.inmon.com."
}
{
 "value": 39692.88754760739, 
 "end": 1476234478177, 
 "name": "dnspair", 
 "flowID": 1524, 
 "agent": "10.0.0.20", 
 "start": 1476234466195, 
 "dataSource": "10", 
 "flowKeys": "xenvm11.sf.inmon.com.,switch.sf.inmon.com."
}
The token dns:ipsource in the flow definition is an example of a Key Function. Functions can be combined to define flow keys or in filters.
or:[dns:ipsource]:ipsource
Returns a dns name if available, otherwise the original IP address is returned
suffix:[dns:ipsource]:.:3
Returns the last 2 parts of the DNS name, e.g. xenvm11.sf.inmon.com. becomes inmon.com.

DNS results are cached by the dns: function in order to provide real-time lookups and reduce the load on the backend name server(s). Cache size and timeout settings are tunable using System Properties.
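
The effect of a bounded cache with a timeout can be sketched in Python. This is an illustration of the general technique only, not sFlow-RT's implementation; the class name, parameters and default values are assumptions, and the resolver is passed in as a function so that the sketch does not depend on any particular DNS library:

```python
import time
from collections import OrderedDict


class ReverseDnsCache:
    """Cache reverse lookups, bounded by entry count and entry age."""

    def __init__(self, resolver, max_entries=1000, ttl_seconds=300,
                 clock=time.monotonic):
        self._resolver = resolver      # function: ip string -> name
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # ip -> (expires_at, name)

    def lookup(self, ip):
        now = self._clock()
        hit = self._entries.get(ip)
        if hit is not None and hit[0] > now:
            # Fresh entry: mark as recently used and skip the name server.
            self._entries.move_to_end(ip)
            return hit[1]
        name = self._resolver(ip)
        self._entries[ip] = (now + self._ttl, name)
        self._entries.move_to_end(ip)
        # Evict least recently used entries once the size limit is exceeded.
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)
        return name
```

Repeated lookups for the same address within the timeout are answered from the cache, so the name server only sees one request per address per timeout period.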

Monday, October 10, 2016

Collecting Docker Swarm service metrics

This article demonstrates how to address the challenge of monitoring dynamic Docker Swarm deployments and track service performance metrics using existing on-premises and cloud monitoring tools like Ganglia, Graphite, InfluxDB, Grafana, SignalFX, Librato, etc.

In this example, Docker Swarm is used to deploy a simple web service on a four node cluster:
docker service create --replicas 2 -p 80:80 --name apache httpd:2.4
Next, the following script tests the agility of monitoring systems by constantly changing the number of replicas in the service:
#!/bin/bash
while true
do
  docker service scale apache=$(( ( RANDOM % 20 )  + 1 ))
  sleep 30
done
The above test is easy to set up and is a quick way to stress test monitoring systems and reveal accuracy and performance problems when they are confronted with container workloads.

Many approaches to gathering and recording metrics were developed for static environments and have a great deal of difficulty tracking rapidly changing container-based service pools without missing information, leaking resources, and slowing down. For example, each new container in Docker Swarm has a unique name, e.g. apache.16.17w67u9157wlri7trd854x6q0. Monitoring solutions that record container names, or even worse, index data by container name, will suffer from bloated databases and resulting slow queries.
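
To make the point concrete, the following Python sketch groups per-container measurements by service name so that the number of stored series depends on the number of services rather than the number of containers ever created. It assumes container names have the form service.slot.taskid, as in the example name above; the function names and the sample values are hypothetical:

```python
from collections import defaultdict


def service_name(container_name):
    """Strip the trailing slot and task id, e.g.
    apache.16.17w67u9157wlri7trd854x6q0 becomes apache."""
    return container_name.rsplit(".", 2)[0]


def summarize(samples):
    """samples: iterable of (container_name, cpu_utilization) pairs.
    Returns one summary per service, however many containers it has had."""
    groups = defaultdict(list)
    for name, cpu in samples:
        groups[service_name(name)].append(cpu)
    return {
        svc: {"replicas": len(vals), "avg_cpu": sum(vals) / len(vals)}
        for svc, vals in groups.items()
    }
```

Each scaling event creates new container names, but the keys of the summary stay the same, so a time series database indexed by service name does not grow as containers come and go.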

The solution is to insert a stream processing analytics stage in the metrics pipeline that delivers a consistent set of service level metrics to existing tools.
The asynchronous metrics export method implemented in the open source Host sFlow agent is part of the solution, sending a real-time telemetry stream to a centralized sFlow collector which is then able to deliver a comprehensive view of all services deployed on the Docker Swarm cluster.

The sFlow-RT real-time analytics engine completes the solution by converting the detailed per instance metrics into service level statistics which are in turn streamed to a time series database where they drive operational dashboards.

For example, the following swarmmetrics.js script computes cluster and service level metrics and exports them to InfluxDB:
var docker = "https://10.0.0.134:2376/services";
var certs = '/tls/';

var influxdb = "http://10.0.0.50:8086/write?db=docker"

var clustermetrics = [
  'avg:load_one',
  'max:cpu_steal',
  'sum:node_domains'
];

var servicemetrics = [
  'avg:vir_cpu_utilization',
  'avg:vir_bytes_in',
  'avg:vir_bytes_out'
];

function sendToInfluxDB(msg) {
  if(!msg || !msg.length) return;

  var req = {
    url:influxdb,
    operation:'POST',
    headers:{"Content-Type":"text/plain"},
    body:msg.join('\n')
  };
  req.error = function(e) {
    logWarning('InfluxDB POST failed, error=' + e);
  }
  try { httpAsync(req); }
  catch(e) {
    logWarning('bad request ' + req.url + ' ' + e);
  }
}

function clusterMetrics(nservices) {
  var vals = metric(
    'ALL', clustermetrics,
    {'node_domains':['*'],'host_name':['vx*host*']}
  );
  var msg = [];
  msg.push('swarm.services value='+nservices);
  msg.push('nodes value='+(vals[0].metricN || 0));
  for(var i = 0; i < vals.length; i++) {
    let val = vals[i];
    msg.push(val.metricName+' value='+ (val.metricValue || 0));
  } 
  sendToInfluxDB(msg);
}

function serviceMetrics(name, replicas) {
  var vals = metric(
    'ALL', servicemetrics,
    {'vir_host_name':[name+'\\.*'],'vir_cpu_state':['running']}
  );
  var msg = [];
  msg.push('replicas_configured,service='+name+' value='+replicas);
  msg.push('replicas_measured,service='+name+' value='+(vals[0].metricN || 0));
  for(var i = 0; i < vals.length; i++) {
    let val = vals[i];
    msg.push(val.metricName+',service='+name+' value='+(val.metricValue || 0));
  }
  sendToInfluxDB(msg);
}

setIntervalHandler(function() {
  var i, services, service, spec, name, replicas, res;
  try { services = JSON.parse(http2({url:docker, certs:certs}).body); }
  catch(e) { logWarning("cannot get " + docker + " error=" + e); }
  if(!services || !services.length) return;

  clusterMetrics(services.length);

  for(i = 0; i < services.length; i++) {
    service = services[i];
    if(!service) continue;
    spec = service["Spec"];
    if(!spec) continue;
    name = spec["Name"];
    if(!name) continue;
 
    replicas = spec["Mode"]["Replicated"]["Replicas"];
    serviceMetrics(name, replicas);
  }
},10);
Some notes on the script:
  1. Only a few representative metrics are being monitored; many more are available (see Metrics).
  2. The setIntervalHandler function is run every 10 seconds. The function queries the Docker REST API for the current list of services and then calculates summary statistics for each service. The summary statistics are then pushed to InfluxDB via a REST API call.
  3. Cluster performance metrics describes the set of summary statistics that can be calculated.
  4. Writing Applications provides additional information on sFlow-RT scripting and REST APIs.
Start gathering metrics:
docker run -v `pwd`/tls:/tls -v `pwd`/swarmmetrics.js:/sflow-rt/swarmmetrics.js \
-e "RTPROP=-Dscript.file=swarmmetrics.js" \
-p 8008:8008 -p 6343:6343/udp sflow/sflow-rt
The results are shown in the Grafana dashboard at the top of this article. The charts show 30 minutes of data. The top Replicas by Service chart compares the number of replicas configured for each service with the number of container instances that the monitoring system is tracking. The chart demonstrates that the monitoring system is accurately tracking the rapidly changing service pool and able to deliver reliable metrics. The middle Network IO by Service chart shows a brief spike in network activity whenever the number of instances in the apache service is increased. Finally, the bottom Cluster Size chart confirms that all four nodes in the Swarm cluster are being monitored.

This solution is extremely scalable. For example, increasing the size of the cluster from 4 to 1,000 nodes increases the amount of raw data that sFlow-RT needs to process to accurately calculate service metrics, but has no effect on the amount of data sent to the time series database and so there is no increase in storage requirements or query response time.
Pre-processing the stream of raw data reduces the cost of the monitoring solution, either in terms of the resources required by an on-premises monitoring solution, or the direct costs of cloud based solutions which charge per data point per minute per month. In this case the raw telemetry stream contains hundreds of thousands of potential data points per minute per host - filtering and summarizing the data reduces monitoring costs by many orders of magnitude.
This example can easily be modified to send data into any on-premises or cloud based backend, examples in this blog include: SignalFX, Librato, Graphite and Ganglia. In addition, Docker 1.12 swarm mode elastic load balancing describes how the same architecture can be used to dynamically resize service pools to meet changing demand.

Tuesday, September 27, 2016

Docker 1.12 swarm mode elastic load balancing


Docker Built-In Orchestration Ready For Production: Docker 1.12 Goes GA describes the native swarm mode feature that integrates cluster management, virtual networking, and policy based deployment of services.

This article will demonstrate how real-time streaming telemetry can be used to construct an elastic load balancing solution that dynamically adjusts service capacity to match changing demand.

Getting started with swarm mode describes the steps to configure a swarm cluster. For example, the following command issued on any of the Manager nodes deploys a web service on the cluster:
docker service create --replicas 2 -p 80:80 --name apache httpd:2.4
And the following command raises the number of containers in the service pool from 2 to 4:
docker service scale apache=4
Asynchronous Docker metrics describes how sFlow telemetry provides the real-time visibility required for elastic load balancing. The diagram shows how streaming telemetry allows the sFlow-RT controller to determine the load on the service pool so that it can use the Docker service API to automatically increase or decrease the size of the pool as demand changes. Elastic load balancing of the service pools ensures consistent service levels by adding additional resources if demand increases. In addition, efficiency is improved by releasing resources when demand drops so that they can be used by other services. Finally, global visibility into all resources and services makes it possible to load balance between services, reducing service pools for non-critical services to release resources during peak demand.

The first step is to install and configure Host sFlow agents on each of the nodes in the Docker swarm cluster. The following /etc/hsflowd.conf file configures Host sFlow to monitor Docker and send sFlow telemetry to a designated collector (in this case 10.0.0.162):
sflow {
  sampling = 400
  polling = 10
  collector { ip = 10.0.0.162 } 
  docker { }
  pcap { dev = docker0 }
  pcap { dev = docker_gwbridge }
}
Note: The configuration file is identical for all nodes in the cluster making it easy to automate the installation and configuration of sFlow monitoring using  Puppet, Chef, Ansible, etc.

Verify that the sFlow measurements are arriving at the collector node (10.0.0.162) using sflowtool:
docker run -p 6343:6343/udp sflow/sflowtool
The following elb.js script implements elastic load balancer functionality using the sFlow-RT real-time analytics engine:
var api = "https://10.0.0.134:2376";
var certs = '/tls/';
var service = 'apache';

var replicas_min = 1;
var replicas_max = 10;
var util_min = 0.5;
var util_max = 1;
var bytes_min = 50000;
var bytes_max = 100000;
var enabled = false;

function getInfo(name) {
  var info = null;
  var url = api+'/services/'+name;
  try { info = JSON.parse(http2({url:url, certs:certs}).body); }
  catch(e) { logWarning("cannot get " + url + " error=" + e); }
  return info;
}

function setReplicas(name,count,info) {
  var version = info["Version"]["Index"];
  var spec = info["Spec"];
  spec["Mode"]["Replicated"]["Replicas"]=count;
  var url = api+'/v1.24/services/'+info["ID"]+'/update?version='+version;
  try {
    http2({
      url:url, certs:certs, method:'POST',
      headers:{'Content-Type':'application/json'},
      body:JSON.stringify(spec)
    });
    // only report the change if the request did not throw
    logInfo(name+" set replicas="+count);
  }
  catch(e) { logWarning("cannot post to " + url + " error=" + e); }
}

var hostpat = service+'\\.*';
setIntervalHandler(function() {
  var info = getInfo(service);
  if(!info) return;

  var replicas = info["Spec"]["Mode"]["Replicated"]["Replicas"];
  if(!replicas) {
    logWarning("no active members for service=" + service);
    return;
  }

  var res = metric(
    'ALL', 'avg:vir_cpu_utilization,avg:vir_bytes_in,avg:vir_bytes_out',
    {'vir_host_name':[hostpat],'vir_cpu_state':['running']}
  );

  var n = res[0].metricN;

  // we aren't seeing all the containers (yet)
  if(replicas !== n) return;

  var util = res[0].metricValue;
  var bytes = res[1].metricValue + res[2].metricValue;

  if(!enabled) return;

  // load balance
  if(replicas < replicas_max && (util > util_max || bytes > bytes_max)) {
    setReplicas(service,replicas+1,info);
  }
  else if(replicas > replicas_min && util < util_min && bytes < bytes_min) {
    setReplicas(service,replicas-1,info);
  }
},2);

setHttpHandler(function(req) {
  enabled = req.query && req.query.state && req.query.state[0] === 'enabled';
  return enabled ? "enabled" : "disabled";
});
Some notes on the script:
  1. The setReplicas(name,count,info) function uses the Docker Remote API to implement functionality equivalent to the docker service scale name=count command shown earlier. The REST API is accessible at https://10.0.0.134:2376 in this example.
  2. The setIntervalHandler() function runs every 2 seconds, retrieving metrics for the service pool and scaling the number of replicas in the service up or down based on thresholds.
  3. The setHttpHandler() function exposes a simple REST API for enabling / disabling the load balancer functionality. The API can easily be extended to allow thresholds to be set, to report statistics, etc.
  4. The certificates key.pem, cert.pem, and ca.pem, which are required to authenticate API requests, must be present in the /tls/ directory.
  5. The thresholds are set to unrealistically low values for the purpose of this demonstration.
  6. The script can easily be extended to load balance multiple services simultaneously.
  7. Writing Applications provides additional information on sFlow-RT scripting.
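The scaling rule inside the interval handler can also be isolated as a pure function, which makes the thresholds easy to test outside sFlow-RT. The following is a minimal sketch in plain JavaScript (runnable under Node.js); scaleDecision() and the limits object are hypothetical names, and the threshold values are placeholders rather than the values used in the demonstration.

```javascript
// Hypothetical helper mirroring the "load balance" branch of the script above.
// Returns the desired replica count (unchanged if no scaling is needed).
function scaleDecision(replicas, util, bytes, limits) {
  // scale up if either CPU or network load exceeds its upper threshold
  if (replicas < limits.replicasMax &&
      (util > limits.utilMax || bytes > limits.bytesMax)) {
    return replicas + 1;
  }
  // scale down only if both CPU and network load are below the lower thresholds
  if (replicas > limits.replicasMin &&
      util < limits.utilMin && bytes < limits.bytesMin) {
    return replicas - 1;
  }
  return replicas;
}

// Placeholder thresholds, chosen only for illustration
var limits = {
  replicasMin: 1, replicasMax: 10,
  utilMin: 20, utilMax: 70,
  bytesMin: 50000, bytesMax: 200000
};

console.log(scaleDecision(2, 85, 10000, limits));   // high CPU
console.log(scaleDecision(2, 5, 1000, limits));     // idle
console.log(scaleDecision(10, 95, 900000, limits)); // already at the maximum
```

Scaling up on either signal but scaling down only when both are low gives the rule some hysteresis, which helps avoid oscillation when one metric hovers near a threshold.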
Run the controller:
docker run -v `pwd`/tls:/tls -v `pwd`/elb.js:/sflow-rt/elb.js \
 -e "RTPROP=-Dscript.file=elb.js" -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt
The autoscaling functionality can be enabled:
curl "http://localhost:8008/script/elb.js/json?state=enabled"
and disabled:
curl "http://localhost:8008/script/elb.js/json?state=disabled"
using the REST API exposed by the script.
The chart above shows the results of a simple test to demonstrate the elastic load balancer function. First, ab (the Apache HTTP server benchmarking tool) was used to generate load on the apache service running under Docker swarm:
ab -rt 60 -n 300000 -c 4 http://10.0.0.134/
Next, the test was repeated with the elastic load balancer enabled. The chart clearly shows that the load balancer is keeping the average network load on each container under control.
2016-09-24T00:57:10+0000 INFO: Listening, sFlow port 6343
2016-09-24T00:57:10+0000 INFO: Listening, HTTP port 8008
2016-09-24T00:57:10+0000 INFO: elb.js started
2016-09-24T01:00:17+0000 INFO: apache set replicas=2
2016-09-24T01:00:23+0000 INFO: apache set replicas=3
2016-09-24T01:00:27+0000 INFO: apache set replicas=4
2016-09-24T01:00:33+0000 INFO: apache set replicas=5
2016-09-24T01:00:41+0000 INFO: apache set replicas=6
2016-09-24T01:00:47+0000 INFO: apache set replicas=7
2016-09-24T01:00:59+0000 INFO: apache set replicas=8
2016-09-24T01:01:29+0000 INFO: apache set replicas=7
2016-09-24T01:01:33+0000 INFO: apache set replicas=6
2016-09-24T01:01:35+0000 INFO: apache set replicas=5
2016-09-24T01:01:39+0000 INFO: apache set replicas=4
2016-09-24T01:01:43+0000 INFO: apache set replicas=3
2016-09-24T01:01:45+0000 INFO: apache set replicas=2
2016-09-24T01:01:47+0000 INFO: apache set replicas=1
The sFlow-RT log shows that containers are added to the apache service to handle the increased load and removed once demand decreases.

This example relied on a small subset of the information available from the sFlow telemetry stream. In addition to container resource utilization, the Host sFlow agent exports an extensive set of metrics from the nodes in the Docker swarm cluster. If the nodes are virtual machines running in a public or private cloud, the metrics can be used to perform elastic load balancing of the virtual machine pool making up the cluster, increasing the cluster size if demand increases and reducing cluster size when demand decreases. In addition, poorly performing instances can be detected and removed from the cluster (see Stop thief! for an example).
The sFlow agents also efficiently report on traffic flowing within and between microservices running on the swarm cluster. For example, the following command:
docker run -p 6343:6343/udp -p 8008:8008 -d sflow/top-flows
launches the top-flows application to show an up to the second view of active flows in the network.

Comprehensive real-time analytics is critical to effectively managing agile container-based infrastructure. Open source Host sFlow agents provide a lightweight method of instrumenting the infrastructure that unifies network and system monitoring to deliver a full set of standard metrics to performance management applications.

Monday, September 26, 2016

Asynchronous Docker metrics

Docker allows large numbers of lightweight containers to be started and stopped within seconds, creating an agile infrastructure that can rapidly adapt to changing requirements. However, the rapidly changing population of containers poses a challenge to traditional methods of monitoring, which struggle to keep pace with the changes. For example, periodic polling methods take time to detect new containers and can miss short lived containers entirely.

This article describes how the latest version of the Host sFlow agent is able to track the performance of a rapidly changing population of Docker containers and export a real-time stream of standard sFlow metrics.
The diagram above shows the life cycle status events associated with a container. The Docker Remote API provides a set of methods that allow the Host sFlow agent to communicate with Docker to list containers and receive asynchronous container status events. The Host sFlow agent uses the events to keep track of running containers and periodically exports cpu, memory, network and disk performance counters for each container.

The diagram at the beginning of this article shows the sequence of messages, going from top to bottom, required to track a container. The Host sFlow agent first registers for container lifecycle events before asking for all the currently running containers. Later, when a new container is started, Docker immediately sends an event to the Host sFlow agent, which requests additional information (such as the container process identifier - PID) that it can use to retrieve performance counters from the operating system. Initial counter values are retrieved and exported along with container identity information as an sFlow counters message and a polling task for the new container is initiated. Container counters are periodically retrieved and exported while the container continues to run (2 polling intervals are shown in the diagram). When the Host sFlow agent receives an event from Docker indicating that the container is being stopped, it retrieves the final values of the performance counters, exports a final sFlow message, and removes the polling task for the container.
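The sequence described above can be sketched as a small event-driven tracker. This is an illustration in plain JavaScript (runnable under Node.js), not the Host sFlow agent's implementation: ContainerTracker, readCounters and exportFn are hypothetical names, and the 'start' and 'die' status values are assumed to be the relevant lifecycle events.

```javascript
// Sketch of asynchronously triggered periodic counter export (names hypothetical).
// readCounters(id) stands in for reading OS counters for a container;
// exportFn(sample) stands in for sending an sFlow counters message.
function ContainerTracker(readCounters, exportFn) {
  this.running = {};            // container id -> metadata
  this.readCounters = readCounters;
  this.exportFn = exportFn;
}

// Export one counter sample for a tracked container
ContainerTracker.prototype.exportSample = function(id) {
  this.exportFn({
    id: id,
    name: this.running[id].name,
    counters: this.readCounters(id)
  });
};

// Handle an asynchronous lifecycle event
ContainerTracker.prototype.onEvent = function(evt) {
  if (evt.status === 'start') {
    this.running[evt.id] = { name: evt.name };
    this.exportSample(evt.id);          // initial counters, sent immediately
  } else if (evt.status === 'die' && this.running[evt.id]) {
    this.exportSample(evt.id);          // final counters before removal
    delete this.running[evt.id];
  }
};

// Called once per polling interval for containers that are still running
ContainerTracker.prototype.poll = function() {
  Object.keys(this.running).forEach(this.exportSample, this);
};

// Simulated life cycle: start, two polling intervals, stop
var exported = [];
var ticks = 0;
var tracker = new ContainerTracker(
  function(id) { return { cpu: ++ticks }; },
  function(sample) { exported.push(sample); }
);
tracker.onEvent({ status: 'start', id: 'c1', name: 'web.1' });
tracker.poll();
tracker.poll();
tracker.onEvent({ status: 'die', id: 'c1' });
tracker.poll();   // nothing left to poll
console.log(exported.length + ' samples exported');
```

Because the first and last samples are triggered by events rather than by the polling timer, even a container that lives for less than one polling interval produces an initial and a final counter sample.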

This method of asynchronously triggered periodic counter export allows an sFlow collector to accurately track rapidly changing container populations in large scale deployments. The diagram only shows the sequence of events relating to monitoring a single container. Docker network visibility demonstration shows the full range of network traffic and system performance information being exported.

Detailed real-time visibility is essential for fully realizing the benefits of agile container infrastructure, providing the feedback needed to track and automatically optimize the performance of large scale microservice deployments.

Saturday, September 17, 2016

Triggered remote packet capture using filtered ERSPAN

Packet brokers are typically deployed as a dedicated network connecting network taps and SPAN/mirror ports to packet analysis applications such as Wireshark, Snort, etc.

Traditional hierarchical network designs were relatively straightforward to monitor using a packet broker since traffic flowed through a small number of core switches and so a small number of taps provided network wide visibility. The move to leaf and spine fabric architectures eliminates the performance bottleneck of core switches to deliver low latency and high bandwidth connectivity to data center applications. However, traditional packet brokers are less attractive since spreading traffic across many links with equal cost multi-path (ECMP) routing means that many more links need to be monitored.

This article will explore how the remote Selective Spanning capability in Cumulus Linux 3.0 combined with industry standard sFlow telemetry embedded in commodity switch hardware provides a cost effective alternative to traditional packet brokers.

Cumulus Linux uses iptables rules to specify packet capture sessions. For example, the following rule forwards packets with source IP 20.0.0.2 and destination IP 20.0.1.2 to a packet analyzer on host 20.0.2.2:
-A FORWARD --in-interface swp+ -s 20.0.0.2 -d 20.0.1.2 -j ERSPAN --src-ip 90.0.0.1 --dst-ip 20.0.2.2
REST API for Cumulus Linux ACLs describes a simple Python wrapper that exposes iptables through a RESTful API. For example, the following command remotely installs the capture rule on switch 10.0.0.233:
curl -H "Content-Type:application/json" -X PUT --data \
  '["[iptables]","-A FORWARD --in-interface swp+ -s 20.0.0.2 -d 20.0.1.2 -j ERSPAN --src-ip 90.0.0.1 --dst-ip 20.0.2.2"]' \
  http://10.0.0.233:8080/acl/capture1
The following command deletes the rule:
curl -X DELETE http://10.0.0.233:8080/acl/capture1
Selective Spanning makes it possible to turn every switch and port in the network into a capture device. However, it is important to carefully select which traffic to capture since the aggregate bandwidth of an ECMP fabric is measured in Terabits per second - far more traffic than can be handled by typical packet analyzers.
SDN packet broker describes an analogy for the role that sFlow plays in steering the capture network to that of a finderscope, the small wide-angle telescope used to provide an overview of the sky and guide a telescope to its target. The article goes on to describe some of the benefits of combining sFlow analytics with selective packet capture:
  1. Offload: The capture network is a limited resource, both in terms of bandwidth and in the number of flows that can be simultaneously captured. Offloading as many tasks as possible to the sFlow analyzer frees up resources in the capture network, allowing the resources to be applied where they add most value. A good sFlow analyzer delivers data center wide visibility that can address many traffic accounting, capacity planning and traffic engineering use cases. In addition, many of the packet analysis tools (such as Wireshark) can accept sFlow data directly, further reducing the cases where a full capture is required.
  2. Context: Data center wide monitoring using sFlow provides context for triggering packet capture. For example, sFlow monitoring might show an unusual packet size distribution for traffic to a particular service. Queries to the sFlow analyzer can identify the set of switches and ports involved in providing the service and identify a set of attributes that can be used to selectively capture the traffic.
  3. DDoS: Certain classes of event such as DDoS flood attacks may be too large for the capture network to handle. DDoS mitigation with Cumulus Linux frees the capture network to focus on identifying more serious application layer attacks.
The diagram at the top of this article shows an example of using sFlow to target selective capture of traffic to blacklisted addresses. In this example sFlow-RT is used to perform real-time sFlow analytics. The following emerging.js script instructs sFlow-RT to download the Emerging Threats blacklist and identify any local hosts that are communicating with addresses in the blacklist. A full packet capture is triggered when a potentially compromised host is detected:
var wireshark = '10.0.0.70';
var idx=0;
function capture(localIP,remoteIP,agent) {
  var acl = [
    '[iptables]',
    '# emerging threat capture',
    '-A FORWARD --in-interface swp+ -s '+localIP+' -d '+remoteIP 
    +' -j ERSPAN --src-ip '+agent+' --dst-ip '+wireshark,
    '-A FORWARD --in-interface swp+ -s '+remoteIP+' -d '+localIP 
    +' -j ERSPAN --src-ip '+agent+' --dst-ip '+wireshark
  ];
  var id = 'emrg'+idx++;
  logWarning('capturing '+localIP+' rule '+id+' on '+agent);
  http('http://'+agent+':8080/acl/'+id,
        'PUT','application/json',JSON.stringify(acl));
}

var groups = {};
function loadGroup(name,url) {
  try {
    var res, cidrs = [], str = http(url);
    var reg = /^(\d{1,3}\.){3}\d{1,3}(\/\d{1,2})?$/mg;
    while((res = reg.exec(str)) != null) cidrs.push(res[0]);
    if(cidrs.length > 0) groups[name]=cidrs;
  } catch(e) {
    logWarning("failed to load " + url + ", " + e);
  }
}

loadGroup('compromised',
  'https://rules.emergingthreats.net/blockrules/compromised-ips.txt');
loadGroup('block',
  'https://rules.emergingthreats.net/fwrules/emerging-Block-IPs.txt');
setGroups('emerging',groups);

setFlow('emerging',
  {keys:'ipsource,ipdestination,group:ipdestination:emerging',value:'frames',
   log:true,flowStart:true});

setFlowHandler(function(rec) {
  var [localIP,remoteIP,group] = rec.flowKeys.split(',');
  try { capture(localIP,remoteIP,rec.agent); }
  catch(e) { logWarning("failed to capture " + e); }
});
Some comments about the script:
  1. The script uses sFlow telemetry to identify the potentially compromised host and the location (agent) observing the traffic.
  2. The location information is required so that the capture rule can be installed on a switch that is in the traffic path.
  3. The application has been simplified for clarity. In production, the blacklist information would be periodically updated and the capture sessions would be tracked so that they can be deleted when they are no longer required.
  4. Writing Applications provides an introduction to sFlow-RT's API.
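The session tracking mentioned in the comments above could look something like the following sketch. It is plain JavaScript (runnable under Node.js) and makes several assumptions: CaptureSessions, installFn and removeFn are hypothetical names standing in for the PUT and DELETE calls to the switch ACL REST API, and the idle timeout is an arbitrary choice.

```javascript
// Sketch of capture-session bookkeeping (names and timeout are assumptions).
// installFn(id, localIP, remoteIP, agent) stands in for the HTTP PUT of the ACL;
// removeFn(id, agent) stands in for the HTTP DELETE.
function CaptureSessions(timeoutMs, installFn, removeFn) {
  this.timeoutMs = timeoutMs;
  this.installFn = installFn;
  this.removeFn = removeFn;
  this.sessions = {};   // "localIP,remoteIP,agent" -> {id, agent, lastSeen}
  this.idx = 0;
}

// Record a detection; install a rule only if this flow is not already captured
CaptureSessions.prototype.seen = function(localIP, remoteIP, agent, now) {
  var key = [localIP, remoteIP, agent].join(',');
  var s = this.sessions[key];
  if (s) { s.lastSeen = now; return s.id; }
  var id = 'emrg' + this.idx++;
  this.installFn(id, localIP, remoteIP, agent);
  this.sessions[key] = { id: id, agent: agent, lastSeen: now };
  return id;
};

// Remove rules for flows that have not been seen within the timeout
CaptureSessions.prototype.expire = function(now) {
  var self = this;
  Object.keys(this.sessions).forEach(function(key) {
    var s = self.sessions[key];
    if (now - s.lastSeen > self.timeoutMs) {
      self.removeFn(s.id, s.agent);
      delete self.sessions[key];
    }
  });
};

// Simulated usage with a 60 second idle timeout (times in milliseconds)
var log = [];
var sessions = new CaptureSessions(
  60000,
  function(id, localIP, remoteIP, agent) { log.push('PUT ' + id + ' on ' + agent); },
  function(id, agent) { log.push('DELETE ' + id + ' on ' + agent); }
);
sessions.seen('10.0.0.162', '198.51.100.9', '10.0.0.253', 0);
sessions.seen('10.0.0.162', '198.51.100.9', '10.0.0.253', 30000); // refresh only
sessions.expire(60000);   // 30 s idle: rule kept
sessions.expire(100000);  // 70 s idle: rule removed
console.log(log.join('\n'));
```

Keying sessions on the address pair and agent prevents duplicate rules from being installed when the same flow is reported more than once.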
Configure sFlow on the Cumulus switches to stream telemetry to a host running Docker. Next, log into the host and run the following command in a directory containing the emerging.js script:
docker run -v "$PWD/emerging.js":/sflow-rt/emerging.js \
 -e "RTPROP=-Dscript.file=emerging.js" -p 6343:6343/udp sflow/sflow-rt
Note: Deploying analytics as a Docker service is a convenient method of packaging and running sFlow-RT. However, you can also download and install sFlow-RT as a package.

Once the software is running, you should see output similar to the following:
2016-09-17T22:19:16+0000 INFO: Listening, sFlow port 6343
2016-09-17T22:19:16+0000 INFO: Listening, HTTP port 8008
2016-09-17T22:19:16+0000 INFO: emerging.js started
2016-09-17T22:19:44+0000 WARNING: capturing 10.0.0.162 rule emrg0 on 10.0.0.253
The last line shows that traffic from host 10.0.0.162 to a blacklisted address has been detected and that a Selective Spanning session has been configured on switch 10.0.0.253 to capture packets and send them to the host running Wireshark (10.0.0.70) for further analysis.

Wednesday, August 17, 2016

Real-time web analytics

The diagram shows a typical scale out web service with a load balancer distributing requests among a pool of web servers. The sFlow HTTP Structures standard is supported by commercial load balancers, including F5 and A10, and open source load balancers and web servers, including HAProxy, NGINX, Apache, and Tomcat.
The simplest way to try out the examples in this article is to download sFlow-RT and install the Host sFlow agent and Apache mod-sflow instrumentation on a Linux web server.

The following sFlow-RT metrics report request rates based on the standard sFlow HTTP counters:
  • http_method_option
  • http_method_get
  • http_method_head
  • http_method_post
  • http_method_put
  • http_method_delete
  • http_method_trace
  • http_method_connect
  • http_method_other
  • http_status_1xx
  • http_status_2xx
  • http_status_3xx
  • http_status_4xx
  • http_status_5xx
  • http_status_other
  • http_requests
In addition, mod-sflow exports the following standard thread pool metrics:
  • workers_active
  • workers_idle
  • workers_max
  • workers_utilization
  • req_delayed
  • req_dropped
Cluster performance metrics describes how sFlow-RT's REST API is used to compute summary statistics for a pool of servers. For example, the following query calculates the cluster wide total request rates:
http://localhost:8008/metric/ALL/sum:http_method_get,sum:http_method_post/json
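To make the sum: prefix in the query above concrete, the following sketch shows how a cluster-wide statistic can be derived from per-server values. It is illustrative only: summarize() is a hypothetical helper rather than part of sFlow-RT, the per-server rates are made up, and the result shape is modeled on the metricN and metricValue fields that sFlow-RT scripts read from metric query results (metricName is an assumed field name).

```javascript
// Illustrative only: compute a cluster-wide statistic from per-server values.
function summarize(stat, name, values) {
  var n = values.length;
  var total = values.reduce(function(a, b) { return a + b; }, 0);
  var value;
  switch (stat) {
    case 'sum': value = total; break;
    case 'avg': value = n > 0 ? total / n : 0; break;
    case 'max': value = Math.max.apply(null, values); break;
    default: throw new Error('unsupported statistic ' + stat);
  }
  return { metricName: stat + ':' + name, metricN: n, metricValue: value };
}

// Hypothetical GET request rates (requests per second) from three web servers
var getRates = [12.5, 7.5, 10];

console.log(summarize('sum', 'http_method_get', getRates));
console.log(summarize('avg', 'http_method_get', getRates));
console.log(summarize('max', 'http_method_get', getRates));
```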
More interesting is that the sFlow telemetry stream also includes randomly sampled HTTP request records with the following attributes:
  • protocol
  • serveraddress
  • serveraddress6
  • serverport
  • clientaddress
  • clientaddress6
  • clientport
  • proxyprotocol
  • proxyserveraddress
  • proxyserveraddress6
  • proxyserverport
  • proxyclientaddress
  • proxyclientaddress6
  • proxyclientport
  • httpmethod
  • httpprotocol
  • httphost
  • httpuseragent
  • httpxff
  • httpauthuser
  • httpmimetype
  • httpurl
  • httpreferer
  • httpstatus
  • bytes
  • req_bytes
  • resp_bytes
  • duration
  • requests
The sFlow-RT analytics pipeline is programmable. Defining Flows describes how to compute additional metrics based on the sampled requests. For example, the following flow definition creates a new metric called image_bytes that tracks the volume of image data in HTTP responses as a bytes/second value calculated over a 10 second window:
setFlow('image_bytes', {value:'resp_bytes',t:10,filter:'httpmimetype~image/.*'});
The new metric can be queried in exactly the same way as the counter based metrics above, e.g.:
http://localhost:8008/metric/ALL/sum:image_bytes/json
The uri: function is used to extract parts of the httpurl or httpreferer URL fields. The following attributes can be extracted:
  • normalized
  • scheme
  • user
  • authority
  • host
  • port
  • path
  • file
  • extension
  • query
  • fragment
  • isabsolute
  • isopaque
For example, the following flow definition creates a metric called games_reqs that tracks the requests/second hitting the URL path with prefix /games:
setFlow('games_reqs', {value:'requests',t:10,filter:'uri:httpurl:path~/games/.*'});
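The effect of extracting the path and applying the /games/.* pattern can be approximated outside sFlow-RT. The sketch below uses the standard URL class available in Node.js; pathMatches() is a hypothetical helper, the placeholder base URL exists only so that relative URLs can be parsed, and it assumes that the pattern must match the entire path.

```javascript
// Approximation of a uri:httpurl:path filter (helper name is hypothetical).
// Assumes the pattern must match the whole path, not just a substring.
function pathMatches(httpurl, pattern) {
  // a placeholder base allows relative URLs such as "/games/animals.php"
  var path = new URL(httpurl, 'http://placeholder.invalid').pathname;
  return new RegExp('^(?:' + pattern + ')$').test(path);
}

console.log(pathMatches('/games/animals.php?level=2', '/games/.*'));
console.log(pathMatches('/sales/buy.php', '/games/.*'));
console.log(pathMatches('http://example.com/games/', '/games/.*'));
```

Filtering on the extracted path rather than the raw URL means that query strings and host names do not affect the match.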
Define flow keys to identify slowest requests, most popular URLs, etc. For example, the following definition tracks the top 5 longest duration requests:
setFlow('slow_reqs', {keys:'httpurl',value:'duration',t:10,n:5});
The following query retrieves the result:
$ curl "http://localhost:8008/activeflows/ALL/slow_reqs/json?maxFlows=5"
[
 {
  "dataSource": "3.80",
  "flowN": 1,
  "value": 117009.24305622398,
  "agent": "10.0.0.150",
  "key": "/login.php"
 },
 {
  "dataSource": "3.80",
  "flowN": 1,
  "value": 7413.476263017302,
  "agent": "10.0.0.150",
  "key": "/games/animals.php"
 },
 {
  "dataSource": "3.80",
  "flowN": 1,
  "value": 4486.286259806839,
  "agent": "10.0.0.150",
  "key": "/games/puzzles.php"
 },
 {
  "dataSource": "3.80",
  "flowN": 1,
  "value": 2326.33482623333,
  "agent": "10.0.0.150",
  "key": "/sales/buy.php"
 },
 {
  "dataSource": "3.80",
  "flowN": 1,
  "value": 276.3486100676183,
  "agent": "10.0.0.150",
  "key": "/index.php"
 }
]
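The JSON result is straightforward to post-process. As a small sketch in plain JavaScript (runnable under Node.js), the following takes an abbreviated copy of the response above and lists the URLs whose value exceeds a threshold; keysAbove() and the threshold of 4000 are hypothetical choices for illustration.

```javascript
// Abbreviated copy of the activeflows response shown above
var flows = [
  { agent: '10.0.0.150', key: '/login.php', value: 117009.24305622398 },
  { agent: '10.0.0.150', key: '/games/animals.php', value: 7413.476263017302 },
  { agent: '10.0.0.150', key: '/games/puzzles.php', value: 4486.286259806839 },
  { agent: '10.0.0.150', key: '/sales/buy.php', value: 2326.33482623333 },
  { agent: '10.0.0.150', key: '/index.php', value: 276.3486100676183 }
];

// Return the flow keys with a value at or above minValue, largest first
function keysAbove(flows, minValue) {
  return flows
    .filter(function(f) { return f.value >= minValue; })
    .sort(function(a, b) { return b.value - a.value; })
    .map(function(f) { return f.key; });
}

console.log(keysAbove(flows, 4000));
```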
Sampled records are a useful complement to counter based metrics, making it possible to disaggregate counts and identify root causes. For example, suppose a spike in errors is identified through the http_status_4xx or http_status_5xx metrics. The following flow definition breaks out the most frequent failed requests by specific URL and error code:
setFlow('err_reqs', {keys:'httpurl,httpstatus',value:'requests',t:10,n:5,
  filter:'range:httpstatus:400=true'});
Finally, the real-time HTTP analytics don't exist in isolation. The diagram shows how the sFlow-RT real-time analytics engine receives a continuous telemetry stream from sFlow instrumentation built into network, server and application infrastructure and delivers analytics through APIs. The engine can easily be integrated with a wide variety of on-site and cloud orchestration, DevOps and Software Defined Networking (SDN) tools.

Thursday, August 11, 2016

Network and system analytics as a Docker service

The diagram shows how new and existing cloud based or locally hosted orchestration, operations, and security tools can leverage the sFlow-RT analytics service to gain real-time visibility. Network visibility with Docker describes how to install open source sFlow agents to monitor network activity in a Docker environment in order to gain visibility into Docker Microservices.

The sFlow-RT analytics software is now on Docker Hub, making it easy to deploy real-time sFlow analytics as a Docker service:
docker run -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt
Configure standard sFlow Agents to stream telemetry to the analyzer and retrieve analytics using the REST API on port 8008.

Increase memory from default 1G to 2G:
docker run -e "RTMEM=2G" -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt
Set System Property to enable country lookups when Defining Flows:
docker run -e "RTPROP=-Dgeo.country=resources/config/GeoIP.dat" -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt
Run sFlow-RT Application. Drop the -d option while developing an application to see output of logging commands and use control-c to stop the container.
docker run -v /Users/pp/my-app:/sflow-rt/app/my-app -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt
A simple Dockerfile can be used to generate a new image that includes the application:
FROM sflow/sflow-rt:latest
COPY /Users/pp/my-app /sflow-rt/app/my-app
Similarly, a Dockerfile can be used to generate a new image from published applications. Any required System Properties can also be set in the Dockerfile.
FROM sflow/sflow-rt:latest
ENV RTPROP="-Dgeo.country=resources/config/GeoIP.dat"
RUN /sflow-rt/get-app.sh sflow-rt top-flows
This solution is extremely scalable: a single sFlow-RT instance can monitor thousands of servers and the network devices connecting them.

Wednesday, July 20, 2016

Internet router using Cumulus Linux

Internet router using merchant silicon describes how an inexpensive white box switch running Linux can be used to replace a much costlier Internet router. This article will describe the steps needed to install the software on an x86 based white box switch running Cumulus Linux 3.0.

First, add the Debian Jessie repository:
sudo sh -c 'echo "deb http://ftp.us.debian.org/debian jessie main contrib" > \
/etc/apt/sources.list.d/deb.list'
Next, install Host sFlow, Java, and Bird:
sudo apt-get update
sudo apt-get install hsflowd
sudo apt-get install unzip
sudo apt-get install default-jre-headless
sudo apt-get install bird
Install sFlow-RT (the latest version is available at sFlow-RT.com):
wget http://www.inmon.com/products/sFlow-RT/sflow-rt_2.0-1116.deb
sudo dpkg -i sflow-rt_2.0-1116.deb
Increase the default virtual memory limit for sflowrt (the limit needs to be greater than 1/3 of the RAM on the system for the Java virtual machine to start, see Giant Bug: Cannot run java with a virtual mem limit (ulimit -v)):
sudo sh -c 'echo "sflowrt soft as 2000000" > \
/etc/security/limits.d/99-sflowrt.conf'
Note: Maximum Java heap memory has a default of 1G and is controlled by settings in /usr/local/sflow-rt/conf.d/sflow-rt.jvm file.

Install the Active Route Manager application:
sudo sh -c "/usr/local/sflow-rt/get-app.sh sflow-rt active-routes"
Cumulus Networks, sFlow and data center automation describes how to configure the sFlow agent (hsflowd). The sFlow collector address should be set to 127.0.0.1.

Finally, configure Bird and sFlow-RT as described in Internet router using merchant silicon.

The instructions were tested on a Cumulus VX virtual machine, but should work on physical switches. Cumulus VX is free and provides a convenient way to try out Cumulus Linux and create virtual networks to test configurations.

If you are going to experiment with the solution on Cumulus VX then the following command is needed to enable sFlow traffic monitoring:
sudo iptables -I FORWARD -j NFLOG --nflog-group 1 --nflog-prefix SFLOW
On physical switches the sFlow agent automatically configures packet sampling in the ASIC and is able to monitor all packets (not just the routed packets captured by the iptables command above).

Monday, July 18, 2016

World map

World Map has been released on GitHub, https://github.com/sflow-rt/world-map. The application displays an up to the second view of traffic as animated bubbles overlaid on a world map.

Download and install sFlow-RT to run the world-map application. Enable the System Property, geo.country=resources/config/GeoIP.dat, to allow the application to identify countries based on IP addresses.

Friday, July 15, 2016

Internet router using merchant silicon

SDN router using merchant silicon top of rack switch and Dell OS10 SDN router demo discuss how an inexpensive white box switch running Linux can be used to replace a much costlier Internet router. The key to this solution is the observation that, while the full Internet routing table of over 600,000 routes is too large to fit in white box switch hardware, only a small fraction of the routes carry most of the traffic. Traffic analytics allows the active routes to be identified and installed in the hardware.

This article describes a simple self contained solution that uses standard APIs and should be able to run on a variety of Linux based network operating systems, including: Cumulus Linux, Dell OS10, Arista EOS, and Cisco NX-OS. The distinguishing feature of this solution is its real-time response: where previous solutions respond to changes in traffic within minutes or hours, this solution updates hardware routes within seconds.

The diagram shows the elements of the solution. Standard sFlow instrumentation embedded in the merchant silicon ASIC data plane in the white box switch provides real-time information on traffic flowing through the switch. The sFlow agent is configured to send the sFlow to an instance of sFlow-RT running on the switch. The Bird routing daemon is used to handle the BGP peering sessions and to install routes in the Linux kernel using the standard netlink interface. The network operating system in turn programs the switch ASIC with the kernel routes so that packets are forwarded by the switch hardware and not by the kernel software.

The key to this solution is Bird's multi-table capabilities. The full Internet routing table learned from BGP peers is installed in a user space table that is not reflected into the kernel. A BGP session between sFlow-RT and Bird allows sFlow-RT to see the full routing table and combine it with the sFlow telemetry to perform real-time BGP route analytics and identify the currently active routes. A second BGP session allows sFlow-RT to push routes to Bird which in turn pushes the active routes to the kernel, programming the ASIC.

In this example, the following Bird configuration, /etc/bird/bird.conf, was installed on the switch:
# Please refer to the documentation in the bird-doc package or BIRD User's
# Guide on http://bird.network.cz/ for more information on configuring BIRD and
# adding routing protocols.

# Change this into your BIRD router ID. It's a world-wide unique identification
# of your router, usually one of router's IPv4 addresses.
router id 10.0.0.136;

# The Kernel protocol is not a real routing protocol. Instead of communicating
# with other routers in the network, it performs synchronization of BIRD's
# routing tables with the OS kernel.
protocol kernel {
  learn;
  scan time 2;
  import all;
  export all;   # Actually insert routes into the kernel routing table
}

# The Device protocol is not a real routing protocol. It doesn't generate any
# routes and it only serves as a module for getting information about network
# interfaces from the kernel. 
protocol device {
  scan time 60;
}

protocol direct {
  interface "*";
}

# Create a new table (disconnected from kernel/master) for peering routes
table peers;

# Create BGP sessions with peers
protocol bgp peer_65134 {
  table peers;
  igp table master;
  local as 65136;
  neighbor 10.0.0.134 as 65134;
  import all;
  export all;
}

protocol bgp peer_65135 {
  table peers;
  igp table master;
  local as 65136;
  neighbor 10.0.0.135 as 65135;
  import all;
  export all;
}

# Copy default route from peers table to master table
protocol pipe {
  table peers;
  peer table master;
  import none;
  export filter {
     if net ~ [ 0.0.0.0/0 ] then accept;
     reject;
  };
}

# Reflect peers table to sFlow-RT
protocol bgp to_sflow_rt {
  table peers;
  igp table master;
  local as 65136;
  neighbor 127.0.0.1 port 1179 as 65136;
  import none;
  export all;
}

# Receive active prefixes from sFlow-RT
protocol bgp from_sflow_rt {
  local as 65136;
  neighbor 10.0.0.136 port 1179 as 65136;
  import all;
  export none;
}
The open source Active Route Manager (ARM) application has been installed in sFlow-RT and the following sFlow-RT configuration, /usr/local/sflow-rt/conf.d/sflow-rt.conf, enables the BGP route reflector and control sessions with Bird:
bgp.start=yes
arm.reflector.ip=127.0.0.1
arm.reflector.as=65136
arm.reflector.id=0.0.0.1
arm.sflow.ip=10.0.0.136
arm.target.ip=10.0.0.136
arm.target.as=65136
arm.target.id=0.0.0.2
arm.target.prefixes=10000
Once configured, operation is entirely automatic. As soon as traffic starts flowing to a new route, the route is identified and installed in the ASIC. If the route later becomes inactive, it is automatically removed from the ASIC to be replaced with a different active route. In this case, the maximum number of routes allowed in the ASIC has been specified as 10,000. This number can be changed to reflect the capacity of the hardware.
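The idea of keeping only the busiest prefixes in hardware can be illustrated with a short sketch. This is not the Active Route Manager's actual algorithm; it is a simplified model in plain JavaScript (runnable under Node.js) in which selectActiveRoutes() and diffRoutes() are hypothetical helpers, the traffic rates are invented, and the prefixes come from documentation address ranges.

```javascript
// Pick the prefixes to install: highest traffic first, up to the hardware limit.
// Prefixes with a rate at or below minRate are treated as inactive.
function selectActiveRoutes(rates, maxPrefixes, minRate) {
  return Object.keys(rates)
    .filter(function(p) { return rates[p] > minRate; })
    .sort(function(a, b) { return rates[b] - rates[a]; })
    .slice(0, maxPrefixes);
}

// Work out which routes to add to and remove from the hardware table
function diffRoutes(installed, wanted) {
  return {
    add: wanted.filter(function(p) { return installed.indexOf(p) === -1; }),
    remove: installed.filter(function(p) { return wanted.indexOf(p) === -1; })
  };
}

// Invented byte rates per prefix
var rates = {
  '192.0.2.0/24': 5000,
  '198.51.100.0/24': 0,
  '203.0.113.0/24': 120000,
  '100.64.0.0/10': 800
};

// Hardware limited to 2 routes for the purpose of the example
var wanted = selectActiveRoutes(rates, 2, 0);
var changes = diffRoutes(['198.51.100.0/24', '203.0.113.0/24'], wanted);
console.log('wanted: ' + wanted.join(', '));
console.log('add: ' + changes.add.join(', '));
console.log('remove: ' + changes.remove.join(', '));
```

Traffic to prefixes that are not selected is still forwarded, since the default route remains installed and covers any destination without a more specific hardware route.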
The Active Route Manager application has a web interface that provides up to the second visibility into the number of routes, routes installed in hardware, amount of traffic, hardware and software resource utilization etc. In addition, the sFlow-RT REST API can be used to make additional queries.

Wednesday, July 6, 2016

Network, host, and application monitoring for Amazon EC2

Microservices describes how visibility into network traffic is the key to monitoring, managing and securing applications that are composed of large numbers of communicating services running in virtual machines or containers.

Amazon Virtual Private Cloud (VPC) Flow Logs can be used to monitor network traffic:
However, there are limitations on the types of traffic that are logged, a 10-15 minute delay in accessing flow records, and costs associated with using VPC and storing the logs in CloudWatch (currently $0.50 per GB ingested, $0.03 per GB archived per month, and possible additional Data Transfer OUT charges).

In addition, collecting basic host metrics at 1 minute granularity using CloudWatch is an additional $3.50 per instance per month.
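As a rough worked example of how these charges add up, the following sketch applies the prices quoted above to a hypothetical deployment. The instance count and log volumes are invented, monthlyCost() is a hypothetical helper, and Data Transfer OUT charges are ignored.

```javascript
// Prices as quoted in the text above; usage figures below are hypothetical.
var PRICES = {
  ingestPerGB: 0.50,                  // CloudWatch log ingestion
  archivePerGBMonth: 0.03,            // CloudWatch log archive
  hostMetricsPerInstanceMonth: 3.50   // 1 minute host metrics
};

// Estimate the monthly charge, excluding any Data Transfer OUT costs
function monthlyCost(instances, gbIngested, gbArchived) {
  var flowLogs = gbIngested * PRICES.ingestPerGB +
                 gbArchived * PRICES.archivePerGBMonth;
  var hostMetrics = instances * PRICES.hostMetricsPerInstanceMonth;
  return { flowLogs: flowLogs, hostMetrics: hostMetrics, total: flowLogs + hostMetrics };
}

// 20 instances generating 100 GB of flow logs per month, all of it archived
var cost = monthlyCost(20, 100, 100);
console.log('flow logs:    $' + cost.flowLogs.toFixed(2));
console.log('host metrics: $' + cost.hostMetrics.toFixed(2));
console.log('total:        $' + cost.total.toFixed(2));
```

With these assumed figures the flow logs cost $53 and the host metrics $70, for a total of $123 per month.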

The open source Host sFlow agent offers an alternative:
  1. Lightweight, requiring minimal CPU and memory on EC2 instances.
  2. Real-time, providing up to the second network visibility.
  3. Efficient, exporting an extensive set of host metrics every 10-60 seconds (configurable).
This article will demonstrate how to install Host sFlow on an Amazon Linux instance:
$ cat /etc/issue
Amazon Linux AMI release 2016.03
The following commands build the latest version of the Host sFlow agent from sources:
yum install libcap-devel libpcap-devel
git clone https://github.com/sflow/host-sflow
cd host-sflow
make
sudo make install
You can also make an RPM package (make rpm) so that the Host sFlow agent can be installed on additional EC2 instances without compiling.

Edit the Host sFlow configuration file, /etc/hsflowd.conf, to specify an sFlow collector, sampling rate, polling interval, and interface(s) to monitor:
sflow {
  agent=eth0
  DNSSD=off
  polling=20
  sampling=400
  collector { ip = 10.117.46.49 }
  pcap { dev=eth0 }
}
Note: The same configuration file can be used for all EC2 instances.
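The sampling=400 setting means that, on average, 1 in 400 packets is sampled, so an analyzer scales the sampled counts by the sampling rate to estimate totals. The following sketch shows the arithmetic in plain JavaScript (runnable under Node.js); the helper names and the sample counts are hypothetical.

```javascript
// Estimate the total number of packets from the number of samples received
function estimateTotal(samples, samplingRate) {
  return samples * samplingRate;
}

// Estimate packets per second over a measurement interval
function estimateRate(samples, samplingRate, seconds) {
  return estimateTotal(samples, samplingRate) / seconds;
}

// Hypothetical measurement: 150 samples received in 10 seconds at 1-in-400
console.log(estimateTotal(150, 400) + ' packets');
console.log(estimateRate(150, 400, 10) + ' packets per second');
```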

Finally, start the Host sFlow daemon:
sudo service hsflowd start
The above steps are easily automated using Puppet, Chef, Ansible, etc. to deploy Host sFlow agents on all your EC2 instances.

There are a variety of open source and commercial software packages listed on sFlow.org that can be used to analyze the telemetry stream. The sFlow-RT analyzer has APIs that provide similar functionality to the Amazon VPC and CloudWatch APIs, but with sub-second response times.
The diagram shows how the sFlow-RT real-time analytics engine receives a continuous telemetry stream from sFlow instrumentation built into network, server and application infrastructure and delivers analytics through APIs. The engine can easily be integrated with a wide variety of on-site and cloud orchestration, DevOps and Software Defined Networking (SDN) tools.

Download and install sFlow-RT in an EC2 instance. The following articles provide examples of integrations:
Industry standard sFlow is easily deployed, highly scalable, and provides a low cost, low latency alternative to Amazon VPC flow logging for gaining visibility into EC2 microservice deployments. Using sFlow for visibility allows a common monitoring technology to be used in public, private and hybrid cloud deployments, and extends visibility into physical and virtual networks.

Friday, July 1, 2016

Real-time BGP route analytics

The diagram shows how sFlow-RT real-time analytics software can combine BGP route information and sFlow telemetry to generate route analytics. Merging sFlow traffic with BGP route data significantly enhances both data streams:
  1. sFlow real-time traffic data identifies active BGP routes
  2. BGP path attributes are available in flow definitions
The following example demonstrates how to configure sFlow / BGP route analytics. In this example, the switch IP address is 10.0.0.253, the router IP address is 10.0.0.254, and the sFlow-RT address is 10.0.0.162.

Setup

First download sFlow-RT. Next create a configuration file, bgp.js, in the sFlow-RT home directory with the following contents:
var reflectorIP  = '10.0.0.254';
var myAS         = '65162';
var myID         = '10.0.0.162';
var sFlowAgentIP = '10.0.0.253';

// allow BGP connection from reflectorIP
bgpAddNeighbor(reflectorIP,myAS,myID);

// direct sFlow from sFlowAgentIP to reflectorIP routing table
// calculate a 60 second moving average byte rate for each route
bgpAddSource(sFlowAgentIP,reflectorIP,60,'bytes');
The following sFlow-RT System Properties load the configuration file and enable BGP:
  • script.file=bgp.js
  • bgp.start=yes
Start sFlow-RT and the following log lines will confirm that BGP has been enabled and configured:
$ ./start.sh 
2016-06-28T13:14:34-0700 INFO: Listening, BGP port 1179
2016-06-28T13:14:35-0700 INFO: Listening, sFlow port 6343
2016-06-28T13:14:35-0700 INFO: Starting the Jetty [HTTP/1.1] server on port 8008
2016-06-28T13:14:35-0700 INFO: Starting com.sflow.rt.rest.SFlowApplication application
2016-06-28T13:14:35-0700 INFO: Listening, http://localhost:8008
2016-06-28T13:14:36-0700 INFO: bgp.js started
2016-06-28T13:14:36-0700 INFO: bgp.js stopped
Configure the switch (10.0.0.253) to send sFlow to the sFlow-RT instance (10.0.0.162); see Switch configurations for vendor specific configurations. Check the sFlow-RT /agents/html page to verify that sFlow telemetry is being received from the agent.

Next, configure the router (10.0.0.254) to reflect BGP routes to the sFlow-RT instance (10.0.0.162):
router bgp 65254
 bgp router-id 10.0.0.254
 neighbor 10.0.0.162 remote-as 65162
 neighbor 10.0.0.162 port 1179
 neighbor 10.0.0.162 timers connect 30
 neighbor 10.0.0.162 route-reflector-client
 neighbor 10.0.0.162 activate
The following sFlow-RT log entry confirms that a BGP session has been established:
2016-06-28T13:20:17-0700 INFO: BGP open 10.0.0.254 53975

Query active routes

The following cURL command uses the REST API to identify the top 5 IPv4 prefixes ranked by traffic (measured in bytes/second):
curl "http://10.0.0.162:8008/bgp/topprefixes/10.0.0.254/json?maxPrefixes=5"
{
 "as": 65254,
 "direction": "destination",
 "id": "10.0.0.254",
 "learnedPrefixesAdded": 691838,
 "learnedPrefixesRemoved": 0,
 "nPrefixes": 691838,
 "pushedPrefixesAdded": 0,
 "pushedPrefixesRemoved": 0,
 "startTime": 1467322582093,
 "state": "established",
 "topPrefixes": [
  {
   "aspath": "NNNN-NNNN-NNNNN-NNNNN",
   "localpref": 100,
   "med": 1,
   "nexthop": "NNN.NNN.NNN.N",
   "origin": "IGP",
   "prefix": "NN.NNN.NN.0/24",
   "value": 9.735462342126082E7
  },
  {
   "aspath": "NNN-NNNN",
   "localpref": 100,
   "med": 1,
   "nexthop": "NNN.NNN.NNN.N",
   "origin": "IGP",
   "prefix": "NN.NNN.NNN.0/24",
   "value": 7.347515546153101E7
  },
  {
   "aspath": "NNNN-NNNNNN-NNNNN",
   "localpref": 100,
   "med": 1,
   "nexthop": "NNN.NNN.NNN.N",
   "origin": "IGP",
   "prefix": "NN.NNN.NN.N/24",
   "value": 4.26137765317916E7
  },
  {
   "aspath": "NNNN-NNNN-NNNN",
   "localpref": 100,
   "med": 1,
   "nexthop": "NNN.NNN.NNN.N",
   "origin": "IGP",
   "prefix": "NNN.NN.NNN.0/24",
   "value": 2.6633190792947102E7
  },
  {
   "aspath": "NNNN-NNN-NNNNN",
   "localpref": 100,
   "med": 10001,
   "nexthop": "NNN.NNN.NNN.NN",
   "origin": "IGP",
   "prefix": "NN.NNN.NNN.0/24",
   "value": 1.5500941476103483E7
  }
 ],
 "valuePercentCoverage": 71.38452058755995,
 "valueTopPrefixes": 2.55577687683634E8,
 "valueTotal": 3.5802956380458355E8
}
In addition to returning the top prefixes, the query returns information about the amount of traffic covered by these prefixes. In this case, the valuePercentCoverage of 71.38 indicates that 71.38% of the traffic is covered by the top 5 prefixes.
Note: Identifying numeric digits have been substituted with the letter N to protect privacy.
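The coverage figure can be reproduced from the valueTopPrefixes and valueTotal fields in the result. The following Python sketch shows the arithmetic using the two values from the output above; the function name is hypothetical:

```python
import json

def coverage_percent(result):
    """Percentage of total traffic accounted for by the listed prefixes."""
    total = result["valueTotal"]
    if total == 0:
        # no traffic measured yet, so avoid dividing by zero
        return 0.0
    return 100.0 * result["valueTopPrefixes"] / total

# the two fields copied from the query result shown above
sample = json.loads(
    '{"valueTopPrefixes": 2.55577687683634E8,'
    ' "valueTotal": 3.5802956380458355E8}'
)
print(round(coverage_percent(sample), 2))
```

A script could repeat the query with a larger maxPrefixes value until the coverage reaches a chosen target.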
Additional arguments can be used to refine the top prefixes query:
  • maxPrefixes, maximum number of prefixes in the result 
  • minValue, only include entries with a value greater than the threshold
  • direction, specify "ingress" for traffic arriving from remote networks and "egress" for traffic destined for remote networks
  • minPrefix, exclude shorter prefixes, e.g. minPrefix=1 would exclude 0.0.0.0/0.
  • includeCovered, set to "true" to also include prefixes that are covered by the top prefix, but wouldn't otherwise make the list. For example, if 10.1.0.0/16 was included, then 10.1.3.0/24 would also be included if it were in the set of prefixes advertised by the router.
  • pruneCovered, set to "true" to eliminate covered prefixes that share the same next hop.
IPv6 prefixes can be queried using /bgp/topprefixes6/{router}/json, which takes the same arguments as the topprefixes query shown above.
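When several arguments are combined, it can be convenient to assemble the query URL in code. The following Python sketch builds the URL for either address family; the function name is hypothetical, and the argument names are passed through unchecked, so they need to match the list above:

```python
from urllib.parse import urlencode

def top_prefixes_url(base, router, ipv6=False, **args):
    """Build a top prefixes query URL for the given router.

    base is the sFlow-RT address, e.g. 'http://10.0.0.162:8008'.
    Keyword arguments become query parameters (sorted by name so
    that the resulting URL is predictable).
    """
    path = "topprefixes6" if ipv6 else "topprefixes"
    url = "%s/bgp/%s/%s/json" % (base, path, router)
    query = urlencode(sorted(args.items()))
    return url + "?" + query if query else url

print(top_prefixes_url("http://10.0.0.162:8008", "10.0.0.254",
                       maxPrefixes=5, minPrefix=1))
```

The resulting string can be passed to curl or to requests.get().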

Writing Applications describes how to build analytics driven controller applications using sFlow-RT's REST and embedded JavaScript APIs. For example, SDN router using merchant silicon top of rack switch, White box Internet router PoC, and Active Route Manager demonstrate how real-time identification of active routes can be used to efficiently manage limited hardware resources in commodity white box switches in order to handle a full Internet routing table of over 600,000 routes.

Defining Flows

The following flow attributes learned from the BGP session are merged with sFlow data received from switch 10.0.0.253:
  • ipsourcemaskbits
  • ipdestinationmaskbits
  • bgpnexthop
  • bgpnexthop6
  • bgpas
  • bgpsourceas
  • bgpsourcepeeras
  • bgpdestinationas
  • bgpdestinationpeeras
  • bgpdestinationaspath
  • bgpcommunities
  • bgplocalpref
The sFlow-RT /flowkeys/html page can be queried to verify that the attributes have been merged and to see the full set of attributes that are available from the sFlow feed.

Writing Applications describes how to program sFlow-RT flow caches, using the flow keys to select and identify traffic flows. For example, the following Python script uses the REST API to identify the source networks associated with a UDP amplification DDoS attack:
#!/usr/bin/env python
import requests
import json

# DNS port
reflector_port = '53'
max_pps = 100000

rest = 'http://localhost:8008'

# define flow
flow = {'keys':'mask:ipsource,bgpsourceas',
 'filter':'udpsourceport='+reflector_port,
 'value':'frames'}
requests.put(rest+'/flow/ddos/json',data=json.dumps(flow))

# set threshold
threshold = {'metric':'ddos', 'value': max_pps, 'byFlow':True}
requests.put(rest+'/threshold/ddos/json',data=json.dumps(threshold))

# tail event log
eventurl = rest+'/events/json?thresholdID=ddos&maxEvents=10&timeout=60'
eventID = -1
while True:
  r = requests.get(eventurl + "&eventID=" + str(eventID))
  if r.status_code != 200: break
  events = r.json()
  if len(events) == 0: continue

  # remember the ID of the first event in the batch, then print in reverse order
  eventID = events[0]["eventID"]
  events.reverse()
  for e in events:
    print(e['flowKey'])
Running the script generates a log of the source network and AS number that exceed 100,000 packets per second of DNS response traffic (again, identifying numeric digits have been substituted with the letter N to protect privacy):
$ ./ddos.py 
NNN.NNN.0.0/13,NNNN
NNN.NNN.NNN.NNN/27,NNNN
NNN.NN.NNN.NNN/28,NNNNN
NNN.NNN.NN.0/24,NNNNN
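Each line of output is a flow key containing the two fields named in the flow definition, separated by a comma. The following Python sketch groups the reported source networks by AS number. The function names are hypothetical, the sample keys use documentation address ranges and private AS numbers rather than real data, and the code assumes the AS field is either numeric or empty:

```python
import ipaddress
from collections import defaultdict

def parse_ddos_key(flow_key):
    """Split a 'network,as' flow key into (ip_network, AS number or None)."""
    network, asn = flow_key.split(",")
    # strict=False tolerates host bits set in the address part
    net = ipaddress.ip_network(network, strict=False)
    return net, (int(asn) if asn else None)

def group_by_as(flow_keys):
    """Map each source AS number to the list of networks reported for it."""
    groups = defaultdict(list)
    for key in flow_keys:
        net, asn = parse_ddos_key(key)
        groups[asn].append(net)
    return dict(groups)

# made-up sample keys in the same format as the script output
keys = ["198.51.100.0/24,64500", "203.0.113.16/28,64500", "192.0.2.0/24,64501"]
for asn, nets in sorted(group_by_as(keys).items()):
    print(asn, [str(n) for n in nets])
```

Grouping by AS makes it easier to see whether an attack is coming from a handful of networks or is widely distributed.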
A variation on the script can be used to identify large "Elephant" flows and their destination AS paths (showing the list of networks that packets traverse en route to their destination):
#!/usr/bin/env python
import requests
import json

max_Bps = 1000000000/8

rest = 'http://localhost:8008'

# define flow
flow = {
 'keys':'ipsource,ipdestination,tcpsourceport,tcpdestinationport,bgpdestinationaspath',
 'value':'bytes'}
requests.put(rest+'/flow/elephant/json',data=json.dumps(flow))

# set threshold
threshold = {'metric':'elephant', 'value': max_Bps, 'byFlow':True}
requests.put(rest+'/threshold/elephant/json',data=json.dumps(threshold))

# tail event log
eventurl = rest+'/events/json?thresholdID=elephant&maxEvents=10&timeout=60'
eventID = -1
while True:
  r = requests.get(eventurl + "&eventID=" + str(eventID))
  if r.status_code != 200: break
  events = r.json()
  if len(events) == 0: continue

  # remember the ID of the first event in the batch, then print in reverse order
  eventID = events[0]["eventID"]
  events.reverse()
  for e in events:
    print(e['flowKey'])
Running the script generates real-time notification of the Elephant flows (flows exceeding 1Gbit/s) along with their destination AS paths:
$ ./elephant.py 
NNN.NN.NN.NNN,NNN.NNN.NN.NN,60789,25,NNNNN
NNN.NN.NNN.NN,NNN.NN.NN.NNN,443,38016,NNNNN-NNNNN-NNNNN-NNNNN
NN.NNN.NNN.NNN,NNN.NNN.NN.NN,37030,10059,NNNN-NNN-NNNN
NNN.NN.NN.NNN,NN.NN.NNN.NNN,34611,25,NNNN
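The fields in each flow key appear in the same order as the keys in the flow definition. The following Python sketch splits a key into named fields and breaks the AS path into a list of AS numbers. The names are hypothetical, the sample key is made up, and the code assumes the path is a simple hyphen-separated sequence of numbers as in the output above:

```python
from collections import namedtuple

# field order follows the 'keys' entry in the flow definition
ElephantFlow = namedtuple("ElephantFlow", "src dst sport dport aspath")

def parse_elephant_key(flow_key):
    """Split an elephant flow key into addresses, ports and AS path."""
    src, dst, sport, dport, aspath = flow_key.split(",")
    # an empty path (no matching route) becomes an empty list
    hops = [int(a) for a in aspath.split("-") if a]
    return ElephantFlow(src, dst, int(sport), int(dport), hops)

flow = parse_elephant_key("192.0.2.10,198.51.100.20,60789,25,64500-64501")
print(flow.dst, flow.dport, flow.aspath)
```

The last entry in the list is the AS that originates the destination prefix, which is often the most useful value for reporting.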
SDN and large flows describes how a small number of Elephant flows typically consume most of the bandwidth, even though they are greatly outnumbered by small (Mice) flows. Dynamic policy based routing can be targeted at Elephant flows to significantly improve performance and manage network resources. Leaf and spine traffic engineering using segment routing and SDN and WAN optimization using real-time traffic analytics are two examples.
Finally, the real-time BGP analytics don't exist in isolation. The diagram shows how the sFlow-RT real-time analytics engine receives a continuous telemetry stream from sFlow instrumentation built into network, server and application infrastructure, and delivers analytics through APIs that can easily be integrated with a wide variety of on-site and cloud orchestration, DevOps and Software Defined Networking (SDN) tools.