Monday, June 19, 2017

Remotely Triggered Black Hole (RTBH) Routing

The screen shot demonstrates real-time distributed denial of service (DDoS) mitigation. Automatic mitigation was disabled for the first simulated attack (shown on the left of the chart).  The attack reaches a sustained packet rate of 1000 packets per second for a period of 60 seconds. Next, automatic mitigation was enabled and a second attack launched. This time, as soon as the traffic crosses the threshold (the horizontal red line), a BGP remote trigger message is sent to router, which immediately drops the traffic.
The diagram shows the test setup. The network was built out of freely available components: CumulusVX switches and Ubuntu 16.04 servers running under VirtualBox.

The following configuration is installed on the ce-router:
router bgp 65141
 bgp router-id
 neighbor remote-as 65140
 neighbor port 1179
 neighbor remote-as 65141
 address-family ipv4 unicast
  neighbor allowas-in
  neighbor route-map blackhole-in in
ip community-list standard blackhole permit 65535:666
route-map blackhole-in permit 20
 match community blackhole
 match ip address prefix-len 32
 set ip next-hop
The ce-router peers with the upstream service provider router (sp-router as well as with sFlow-RT.  A route-map is used to filter updates from sFlow-RT (, matching the well-known blackhole community 65535:666, and setting the null route next-hop The route-map also ensures that only /32 prefixes are accepted. In production, the route-map should also filter out the addresses of critical infrastructure (router IP addresses etc.).

An additional route-map will typically be required to select the blackhole routes to propagate upstream, re-mapping the community to meet the upstream service provider's policy, e.g. Hurricane Electric uses community 6939:666.

In addition, the ce-router is configured to send sFlow to the controller (, see Switch configurations.

Run the DDoS mitigation application on server using docker:
docker run --net=host -p 6343:6343/udp -p 8008:8008 -p 1179:1179 -e "RTPROP=-Dddos_blackhole.router=" sflow/ddos-blackhole
Alternatively, an RPM or DEB packaged version of sFlow-RT can be downloaded and installed on the server and the ddos-blackhole application can be installed. For example, on Ubuntu:
dpkg -i sflow-rt_2.0-1195.deb
/usr/local/sflow-rt/ sflow-rt ddos-blackhole
Edit the configuration file, /usr/local/sflow-rt/conf.d/sflow-rt.conf:
Next, start the daemon:
service sflow-rt start
In either case, the remainder of the configuration is handled through the web interface, accessible via

Click on the Settings tab and upload the following IP Address Groups file groups.json:
 "external": [
 "private": [
 "multicast": [
 "web": [
The groups identify addresses that are external (possible attackers) and local (possible targets). By default, traffic to the externalprivate, multicast and exclude groups will not trigger actions. Any additional group names, in this case web, are blackhole candidates.

Note: Only traffic from the external group to backhole candidate groups (web) is shown on the Charts tab (and considered for DDoS detection and mitigation).

The following command on sp-host simulates an ICMP flood attack on ce-host:
ping -f
The following messages should appear in the sFlow-RT logs:
2017-06-17T00:17:39+0000 INFO: Listening, BGP port 1179
2017-06-17T00:17:40+0000 INFO: Listening, sFlow port 6343
2017-06-17T00:17:40+0000 INFO: Listening, HTTP port 8008
2017-06-17T00:17:40+0000 INFO: app/ddos-blackhole/scripts/ddos.js started
2017-06-17T00:17:40+0000 INFO: app/ddos-blackhole/scripts/stats.js started
2017-06-17T00:17:46+0000 INFO: BGP open 41692
2017-06-17T00:18:13+0000 INFO: DDoS blocking
2017-06-17T00:20:25+0000 INFO: DDoS allowing
The screen capture at the top of this article shows that the time between the attack being launched and successfully blocked is just a few of seconds.

Wednesday, March 29, 2017

Arista EOS telemetry

Arista EOS switches support industry standard sFlow telemetry, enabling hardware instrumentation supported by merchant silicon to export hardware interface counters and flow data. The latest release of the open source Host sFlow agent has been ported to EOS, augmenting the telemetry with standard host CPU, memory, and disk IO metrics.

Linux as a Switch Operating System: Five Lessons Learned identifies benefits of using Linux as the basis for EOS. In this context, the Linux operating system made it easy to port the Host sFlow agent, use standard Linux package management (RPM Package Manager), and gather metrics using standard Linux APIs. A new eAPI module automatically synchronizes the Host sFlow daemon with the EOS sFlow configuration.

The following sflowtool output shows the additional metrics contributed by a Host sFlow agent installed on an Arista switch:
startDatagram =================================
datagramSize 704
unixSecondsUTC 1490843418
datagramVersion 5
agentSubId 100000
packetSequenceNo 714
sysUpTime 0
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:2
sampleSequenceNo 714
sourceId 2:1
counterBlock_tag 0:2001
counterBlock_tag 0:2010
udpInDatagrams 1459
udpNoPorts 16
udpInErrors 0
udpOutDatagrams 4765
udpRcvbufErrors 0
udpSndbufErrors 0
udpInCsumErrors 0
counterBlock_tag 0:2009
tcpRtoAlgorithm 1
tcpRtoMin 200
tcpRtoMax 120000
tcpMaxConn 4294967295
tcpActiveOpens 102
tcpPassiveOpens 100
tcpAttemptFails 0
tcpEstabResets 0
tcpCurrEstab 8
tcpInSegs 19930
tcpOutSegs 19804
tcpRetransSegs 0
tcpInErrs 0
tcpOutRsts 2
tcpInCsumErrors 0
counterBlock_tag 0:2008
icmpInMsgs 1606
icmpInErrors 0
icmpInDestUnreachs 16
icmpInTimeExcds 0
icmpInParamProbs 0
icmpInSrcQuenchs 0
icmpInRedirects 0
icmpInEchos 1590
icmpInEchoReps 0
icmpInTimestamps 0
icmpInAddrMasks 0
icmpInAddrMaskReps 0
icmpOutMsgs 0
icmpOutErrors 1606
icmpOutDestUnreachs 0
icmpOutTimeExcds 16
icmpOutParamProbs 0
icmpOutSrcQuenchs 0
icmpOutRedirects 0
icmpOutEchos 0
icmpOutEchoReps 0
icmpOutTimestamps 1590
icmpOutTimestampReps 0
icmpOutAddrMasks 0
icmpOutAddrMaskReps 0
counterBlock_tag 0:2007
ipForwarding 2
ipDefaultTTL 64
ipInReceives 24685
ipInHdrErrors 0
ipInAddrErrors 42
ipForwDatagrams 0
ipInUnknownProtos 0
ipInDiscards 0
ipInDelivers 23025
ipOutRequests 26170
ipOutDiscards 0
ipOutNoRoutes 0
ipReasmTimeout 0
ipReasmReqds 0
ipReasmOKs 0
ipReasmFails 0
ipFragOKs 4
ipFragFails 0
ipFragCreates 8
counterBlock_tag 0:2005
disk_total 1907843072
disk_free 1083969536
disk_partition_max_used 43.18
disk_reads 16549
disk_bytes_read 1337825280
disk_read_time 7420
disk_writes 412
disk_bytes_written 1159168
disk_write_time 216
counterBlock_tag 0:2004
mem_total 1938849792
mem_free 85483520
mem_shared 0
mem_buffers 106614784
mem_cached 735801344
swap_total 0
swap_free 0
page_in 830716
page_out 566
swap_in 0
swap_out 0
counterBlock_tag 0:2003
cpu_load_one 0.070
cpu_load_five 0.060
cpu_load_fifteen 0.050
cpu_proc_run 0
cpu_proc_total 221
cpu_num 1
cpu_speed 2698
cpu_uptime 17265
cpu_user 272510
cpu_nice 50
cpu_system 178050
cpu_idle 16279880
cpu_wio 550
cpuintr 461060
cpu_sintr 41840
cpuinterrupts 5458397
cpu_contexts 5338141
cpu_steal 0
cpu_guest 0
cpu_guest_nice 0
counterBlock_tag 0:2006
nio_bytes_in 8149749
nio_pkts_in 115730
nio_errs_in 0
nio_drops_in 0
nio_bytes_out 4996846
nio_pkts_out 28451
nio_errs_out 0
nio_drops_out 0
counterBlock_tag 0:2000
hostname leaf1
UUID 33-28-66-a5-82-27-43-49-a5-f1-c1-ba-cc-6c-1d-d3
machine_type 2
os_name 2
os_release 3.4.43.Ar-4170906.4180F
endSample   ----------------------
endDatagram   =================================
There are a number of additional open source and commercial sFlow collectors available.
For example, the diagram shows how new and existing cloud based or locally hosted orchestration, operations, and security tools can leverage the sFlow-RT analytics service to gain real-time visibility.

Installing Host sFlow agent on an Arista switch

The following steps download and install the Host sFlow agent on an Arista switch and direct the telemetry stream to collector

1. Install the Host sFlow agent (hsflowd)
eos# copy extension:
eos# extension hsflowd-eos-2.0.9-1.i686.rpm
eos# bash sudo service hsflowd start
eos# copy installed-extensions boot-extensions
2. Enable eAPI, see eAPI and Unix Domain Socket
eos(config)# management api http-commands
eos(config-mgmt-api-http-cmds)# protocol unix-socket
eos(config-mgmt-api-http-cmds)# no shutdown
3. Configure switch to run hsflowd on startup:
eos(config)# event-handler hsflowd
eos(config-handler-hsflowd)# trigger on-boot
eos(config-handler-hsflowd)# action bash sudo service hsflowd start
eos(config-handler-hsflowd)# delay 60
eos(config-handler-hsflowd)# asynchronous
4. Configure sFlow Introduction to Managing EOS Devices – Setting up Management
eos(config)# sflow source-interface Management1
eos(config)# sflow destination
eos(config)# sflow run
The host metrics should immediately begin to be received at the sFlow collector.

Monday, March 20, 2017


Maximum Performance from Acropolis Hypervisor and Open vSwitch describes the network architecture within a Nutanix converged infrastructure appliance - see diagram above. This article will explore how the Host sFlow agent can be deployed to enable sFlow instrumentation in the Open vSwitch (OVS)  and deliver streaming network and system telemetry from nodes in a Nutanix cluster.
This article is based on a single hardware node running Nutanix Community Edition (CE), built following the instruction in Part I: How to setup a three-node NUC Nutanix CE cluster. If you don't have hardware readily available, the article, 6 Nested Virtualization Resources To Get You Started With Community Edition, describes how to run Nutanix CE as a virtual machine.
The sFlow standard is widely supported by network equipment vendors, which combined with sFlow from each Nutanix appliance, delivers end to end visibility in the Nutanix cluster. The following screen captures from the free sFlowTrend tool are representative examples of the data available from the Nutanix appliance.
The Network > Top N chart displays the top flows traversing OVS. In this case an HTTP connection is responsible for most of the traffic. Inter-VM and external traffic flows traverse OVS and are efficiently monitored by the embedded sFlow instrumentation.
The Hosts > CPU utilization chart shows an increase in CPU utilization due to the increased traffic.
The Hosts > Disk IO shows the Write operations associated with connection.

Installing Host sFlow agent on Nutanix appliance

The following steps install Host sFlow on a Nutanix device:

First log into the Nutanix host as root.

Next, find the latest version of the Centos 7 RPM on and use the following commands to download and install the software:
rpm -ivh hsflowd-centos7-2.0.8-1.x86_64.rpm
rm hsflowd-centos7-2.0.8-1.x86_64.rpm
Edit the /etc/hsflowd.conf file to direct sFlow telemetry to collector, enable KVM monitoring (virtual machine stats), and push sFlow configuration to OVS (network stats):
sflow {
  # collectors:
  collector { ip= udpport=6343 }
  # Open vSwitch sFlow configuration:
  ovs { }
  # KVM (libvirt) hypervisor and VM monitoring:
  kvm { }
Now start the Host sFlow daemon:
systemctl enable hsflowd.service
systemctl start hsflowd.service
Data will immediately start to appear in sFlowTrend.

Wednesday, February 22, 2017


A QUIC update on Google’s experimental transport describes some of the benefits of  the QUIC (Quick UDP Internet Connections) protocol that is now the default transport when Google's Chrome browser connects to Google services (gmail, search, etc.). Given the over 50% market share of the Chrome browser (NetMarketShare) and the popularity of Google services, it is important to be aware of the QUIC protocol and to start tracking its use of network resources.

An easy way to see if you have any QUIC traffic on your network is to use the standard sFlow instrumentation built into network switches. Configure the switches to send sFlow telemetry to an sFlow collector for visibility into network traffic.

For example, use Docker to run the sFlow-RT active-flows application to analyze the sFlow data stream:
docker run -p 6343:6343/udp -p 8008:8008 -d sflow/top-flows
Access the web interface at http://localhost:8008/ and enter the following Flow Specification to monitor QUICK flows:
Note: Real-time domain name lookups describes how sFlow-RT incorporates DNS (Domain Name Service) requests in its real-time analytics pipeline so that traffic flows can be identified by domain name.

The resulting top flows table is shown in the screen capture above. The Google addresses are identifiable by the domain names (What is and it appears that all the traffic is flowing to or from Google services (as one would expect). However, it would be nice to be able to be notified of QUIC traffic that is not associated with Google since this could represent a threat.

The following quic.js script generates events for QUIC traffic to non-Google domains:

setFlowHandler(function(rec) {
Note: Writing Applications gives an overview of sFlow-RT's embedded script API. The script logs events.

Run the script using the following command:
docker run -v `pwd`/quic.js:/sflow-rt/quic.js \
-e "RTPROP=-Ddns.servers=resolv.conf -Dscript.file=quic.js" \
-p 8008:8008 -p 6343:6343/udp sflow/top-flows
The article Exporting events using syslog shows how the script could be modified export events via syslog to SIEM tools such as Logstash and Splunk.

Monday, January 23, 2017

Telegraf, InfluxDB, Chronograf, and Kapacitor

The InfluxData TICK (Telegraf, InfluxDB, Chronograf, Kapacitor) provides a full set of integrated metrics tools, including an agent to export metrics (Telegraf), a time series database to collect and store the metrics (InfluxDB), a dashboard to display metrics (Chronograf), and a data processing engine (Kapacitor). Each of the tools is open sourced and can be used together or separately.
This article will show how industry standard sFlow agents embedded within the data center infrastructure can provide Telegraf metrics to InfluxDB. The solution uses sFlow-RT as a proxy to convert sFlow metrics into their Telegraf equivalent form so that they are immediately visible through the default Chronograf dashboards (Using a proxy to feed metrics into Ganglia described a similar approach for sending metrics to Ganglia).

The following telegraf.js script instructs sFlow-RT to periodically export host metrics to InfluxDB:
var influxdb = "";

function sendToInfluxDB(msg) {
  if(!msg || !msg.length) return;
  var req = {
  req.error = function(e) {
    logWarning('InfluxDB POST failed, error=' + e);
  try { httpAsync(req); }
  catch(e) {
    logWarning('bad request ' + req.url + ' ' + e);

var metric_names = [

var ntoi;
function mVal(row,name) {
  if(!ntoi) {
    ntoi = {};
    for(var i = 0; i < metric_names.length; i++) {
      ntoi[metric_names[i]] = i;
  return row[ntoi[name]].metricValue;

setIntervalHandler(function() {
  var i,r,msg = [];
  var vals = table('ALL',metric_names);
  for(i = 0; i < vals.length; i++) {
    r = vals[i];

    // Telegraf System plugin metrics
      +' load1='+mVal(r,'load_one')
      +' uptime='+mVal(r,'uptime')+'i');

    // Telegraf CPU plugin metrics
      +' usage_user='+(mVal(r,'cpu_user')||0)
Some notes on the script:
  1. The sentToInfluxDB() function uses the Writing data using the HTTP API to POST metrics to InfluxDB.
  2. The setIntervalHandler function retrieves a table of metrics from sFlow-RT every 15 seconds and formats them to use the same names and tags as Telegraf.
  3. The script implements Telegraf System and CPU plugin functionality.
  4. Additional metrics can easily be added to proxy additional Telegraf plugins.
  5. Writing applications provides an overview of the sFlow-RT APIs.
Start gathering metrics:
docker run -v `pwd`/telegraf.js:/sflow-rt/telegraf.js \
-e "RTPROP=-Dscript.file=telegraf.js" \
-p 8008:8008 -p 6343:6343/udp sflow/sflow-rt
Accessing the Chronograf home page brings up a table of hosts with their status and CPU load:
Clicking on the leaf1 host displays a dashboard trending key performance metrics:
Pre-processing the metrics using sFlow-RT's real-time streaming analytics engine can greatly increase scaleability by selectively exporting metrics and calculating higher level summary statistics in order to reduce the amount of data logged to the time series database. The analytics pipeline can also augment the metrics with additional metadata.
For example, Collecting Docker Swarm service metrics demonstrates how sFlow-RT can monitor dynamic service pools running under Docker Swarm and write summary statistics to InfluxDB. In this case Grafana was used to build metrics dashboard instead of Chronograf.

The open source Host sFlow agent exports an extensive range of standard sFlow metrics and has been ported to a wide range of platforms. Standard metrics describes how standardization helps reduce operational complexity. The overlap between standard sFlow metrics and Telegraf base plugin metrics makes the task of proxying straightforward.
The Host sFlow agent (and sFlow agents embedded in network switches and routers) goes beyond simple metrics export to provide detailed visibility into network traffic and articles on this blog demonstrate how sFlow-RT analytics software can be configured to generate detailed traffic flow metrics that can be streamed into InfluxDB, logged (e.g. Exporting events using syslog), or trigger control actions (e.g. DDoS mitigationDocker 1.12 swarm mode elastic load balancing).

Friday, December 16, 2016

Using Ganglia to monitor Linux services

The screen capture from the Ganglia monitoring tool shows metrics for services running on a Linux host. Monitoring Linux services describes how the open source Host sFlow agent has been extended to export standard Virtual Node metrics from services running under systemd. Ganglia already supports these standard metrics and the article Using Ganglia to monitor virtual machine pools describes the configuration steps needed to enable this feature.

Thursday, December 15, 2016

Monitoring Linux services

Mainstream Linux distributions have moved to systemd to manage daemons (e.g. httpd, sshd, etc.). The diagram illustrates how systemd runs each daemon within its own container so that it can maintain tight control of the daemon's resources.

This article describes how to use the open source Host sFlow agent to gather telemetry from daemons running under systemd.

Host sFlow systemd monitoring exports a standard set of metrics for each systemd service - the sFlow Host Structures extension defines metrics for Virtual Nodes (virtual machines, containers, etc.) that are used to export Xen, KVM, Docker, and Java resource usage. Exporting the standard metrics for systemd services provides interoperability with sFlow analyzers, allowing them to report on Linux services using existing virtual node monitoring capabilities.

While running daemons within containers helps systemd maintain control of the resources, it also provides a very useful abstraction for monitoring. For example, a single service (like the Apache web server) may consist of dozens of processes. Reporting on container level metrics abstracts away the per-process details and gives a view of the total resources consumed by the service. In addition, service metadata (like the service name) provides a useful way of identifying and grouping services, for example, making it easy to report on total CPU consumed by the web service across a pool of servers.

Systemd monitoring is easy to set up.

First download and install the latest software release.

Next, enable the systemd module by adding the highlighted line in the /etc/hsflowd.conf file:
  collector{ ip= }
This is a minimal configuration that sends sFlow telemetry to a collector running on host The Host sFlow agent is capable of gathering an extensive set of network, system and application level metrics. See Configuring Host sFlow for Linux for a full set of options.

Finally, start the agent:
sudo systemctl enable hsflowd.service
sudo systemctl start hsflowd.service
For the best accuracy, enable systemd cgroup accounting by adding the following entries to the /etc/systemd/system.conf file and rebooting the server:
The Host sFlow agent will automatically detect when cgroup accounting has been enabled. However, if cgroup accounting hasn't been enabled, it is still able to compute and export statistics, although it might miss contributions from short lived processes.

Once the agents have been configured, verify that sFlow telemetry is being received at the collector using sflowtool. The simplest way to run sflowtool is using Docker:
docker run -p 6343:6343/udp sflow/sflowtool
The following output shows the statistics exported for the apache2 service:
startSample ----------------------
sampleType_tag 0:2
sampleSequenceNo 50
sourceId 3:112270
counterBlock_tag 0:2103
vdsk_capacity 0
vdsk_allocation 0
vdsk_available 0
vdsk_rd_req 0
vdsk_rd_bytes 0
vdsk_wr_req 0
vdsk_wr_bytes 0
vdsk_errs 0
counterBlock_tag 0:2102
vmem_memory 16674816
vmem_maxMemory 0
counterBlock_tag 0:2101
vcpu_state 1
vcpu_cpu_mS 180
vcpu_cpuCount 0
counterBlock_tag 0:2002
parent_dsClass 2
parent_dsIndex 1
counterBlock_tag 0:2000
hostname apache2.service
UUID 92-53-c6-17-60-65-52-a2-ac-f7-76-cb-7b-63-d9-23
machine_type 3
os_name 2
os_release 4.4.0-45-generic
endSample   ----------------------
Install Host sFlow agents on all the hosts in the data center for comprehensive visibility.