Tuesday, October 17, 2017

Arista EOS CloudVision

Arista EOS® CloudVision® provides a centralized point of visibility, configuration and control for Arista devices. The CloudVision controller is available as a virtual machine or physical appliance.

Fabric Visibility on Arista EOS Central describes how to use industry standard sFlow instrumentation in Arista switches to deliver real-time flow analytics. This article describes the steps needed to integrate flow analytics into CloudVision.

Log into the CloudVision node and run the following cvp_install_fabricview.sh script as root:
# Install Fabric View on CloudVision Portal (CVP)

VER=`wget -qO - http://inmon.com/products/sFlow-RT/latest.txt`
wget http://www.inmon.com/products/sFlow-RT/sflow-rt-$VER.noarch.rpm
rpm --nodeps -ivh sflow-rt-$VER.noarch.rpm
/usr/local/sflow-rt/get-app.sh sflow-rt fabric-view

ln -s /cvpi/jdk/bin/java /usr/bin/java

sed -i '/^# http.hostname=/s/^# //' /usr/local/sflow-rt/conf.d/sflow-rt.conf
echo "http.html.redirect=./app/fabric-view/html/" >> /usr/local/sflow-rt/conf.d/sflow-rt.conf

cat <<EOT > /etc/nginx/conf.d/locations/sflow-rt.https.conf
location /sflow-rt/ {
  auth_request /aeris/auth;
  proxy_buffering off;
  proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
  proxy_set_header X-Forwarded-Prefix /sflow-rt/;
  proxy_set_header Host \$host;
  proxy_pass http://localhost:8008/;
  proxy_redirect ~^http://[^/]+(/.+)\$ /sflow-rt\$1;

systemctl restart nginx.service

firewall-cmd --zone public --add-port=6343/udp --permanent
firewall-cmd --reload

systemctl enable sflow-rt.service
systemctl start sflow-rt.service

wget http://www.inmon.com/products/sFlow-RT/cvp-eapi-topology.py
chmod +x cvp-eapi-topology.py

echo "configure and run cvp-eapi-topology.py"
Edit the cvp-api-topology.py script to specify CVP_USER and CVP_PASSWORD (and EAPI_USER and EAPI_PASSWORD if they differ). Now run the script to discover the physical topology and post it to Fabric View:
Note: The script needs to be run any time the physical topology changes, or you can run the script periodically using cron.

Flow analytics requires sFlow to be enabled on all the switches, sending the data to the CloudVision node. This can be accomplished using a CloudVision configlet to push the configuration to switches. For example the following configuration enables sFlow on all switch ports and sends the data to CloudVision node
sflow sample 20000
sflow polling-interval 30
sflow destination
sflow source-interface Management1
sflow run
Optionally, follow the steps in Arista EOS telemetry to enhance the sFlow telemetry stream from the switches with detailed CPU, memory, disk, and host network statistics.

Finally, access the Fabric View web interface at https://cloudvision/sflow-rt/ using your CloudVision login credentials.
Fabric View is an open source application running on top of the sFlow-RT analytics engine. The Fabric View software can easily be modified to add new capabilities, e.g. Black hole detection.

A number of applications are available for sFlow-RT. Writing Applications describes how to use sFlow-RT's APIs to extend or modify existing applications or develop new applications. In addition, there are also many sFlow-RT related articles on this blog. For example, Arista eAPI describes how to automatically push controls based on flow measurements, describing automated DDoS mitigation as a use case. Other use cases include: traffic engineering, traffic accounting, anomaly detection, intrusion detection, FIB optimization, targeted packet capture etc.

Wednesday, September 27, 2017

Real-time visibility and control of campus networks

Many of the examples on this blog describe network visibility driven control of data center networks. However, campus networks face many similar challenges and the availability of industry standard sFlow telemetry and RESTful control APIs in campus switches make it possible to apply feedback control.

HPE Aruba has an extensive selection of campus switches that combine programmatic control via a REST API with hardware sFlow support:
  • Aruba 2530 
  • Aruba 2540 
  • Aruba 2620
  • Aruba 2930F
  • Aruba 2930M
  • Aruba 3810
  • Aruba 5400R
  • Aruba 8400
 This article presents an example of implementing quota controls using HPE Aruba switches.
Typically, a small number of hosts are responsible for the majority of traffic on the network: identifying those hosts, and applying controls to their traffic to prevent them from unfairly dominating, ensures fair access to all users.

Peer-to-peer protocols (P2P) pose some unique challenges:
  • P2P protocols make use of very large numbers of connections in order to quickly transfer data. The large number of connections allows a P2P user to obtain a disproportionate amount of network bandwidth; even a small number of P2P users (less than 0.5% of users) can consume over 90% of the network bandwidth.
  • P2P protocols (and users) are very good at getting through access control lists (acl) by using non-standard ports, using port 80 (web) etc. Trying to maintain an effective filter to identify P2P traffic is a challenge and the resulting complex rule sets consume significant resources in devices attempting to perform classification.
In this example, all switches are configured to stream sFlow telemetry to an instance of the sFlow-RT real-time sFlow analyzer running a quota controller. When a host exceeds its traffic quota, a REST API call is made to the host's access switch, instructing the switch to mark the host's traffic as low priority. Marking the traffic ensures that if congestion occurs elsewhere in the network, typically on the Internet access links, priority queuing will cause marked packets can be dropped, reducing the bandwidth consumed by the marked host. The quota controller continues to monitor the host and the marking action is removed when the host's traffic returns to acceptable levels.

A usage quota is simply a limit on the amount of traffic that a user is allowed to generate in a given amount of time. Usage quotas have a number of attributed that make them an effective means of managing P2P activity:
  • A simple usage quota is easy to maintain and enforce and encourages users to be more responsible in their use of shared resources.
  • Since quota based controls are interested in the overall amount of traffic that a host generates and not the specific type of traffic, they don't encourage users to tailor P2P application setting to bypass access control rules and so their traffic is easier to monitor.
  • A quota system can be implemented using standard network hardware, without the addition of a "traffic shaping" appliance that can become a bottleneck and point of failure.
The following quota.js script implements the quota controller functionality:
var user = 'manager';
var password = 'manager';
var scheme = 'http://';
var mbps = 10;
var interval = 10;
var timeout = 20;
var dscp = '10';
var groups = {'ext':[''],'inc':[''],'exc':['']};

function runCmd(agent,cmd) {
  var headers = {'Content-Type':'application/json','Accept':'application/json'};
  // create session
  var auth = http2({
    body: JSON.stringify({userName:user, password:password})

  // make request
  var resp = http2({
    body: JSON.stringify({cmd:cmd})
  var result = base64Decode(JSON.parse(resp.body).result_base64_encoded);

  // end session
  var end = http2({
  return result;


setFlow('src', {

setThreshold('quota', {
  byFlow: true,

var controls = {};
setEventHandler(function(evt) {
  var ip = evt.flowKey;
  if(controls[ip]) return;

  var agent = evt.agent;
  var ds = evt.dataSource;
  controls[ip] = {agent:agent,ds:ds,time:Date.now()};
  logInfo('mark '+ip+' agent '+agent);
  try { runCmd(agent,'qos device-priority '+ip+' dscp '+dscp); }
  catch(e) { logWarning('runCmd error ' + e); }
}, ['quota']);

setIntervalHandler(function() {
  for(var ip in controls) {
    var ctl = controls[ip];
    if(thresholdTriggered('quota',ctl.agent,ctl.ds+'.src',ip)) continue;

    logInfo('unmark '+ip+' agent '+ctl.agent);
    try { runCmd(ctl.agent,'no qos device-priority '+ip); }
    catch(e) { logWarning('runCmd error ' + e); }
    delete controls[ip];
Some notes on the script:
  • Writing Applications provides an overview of the sFlow-RT scripting API.
  • HPE ArubaOS-Switch REST API and JSON Schema Reference Guide 16.03 describes the REST API calls in the runCmds() function.
  • The groups variable defines groups of IP addresses, identifying external addresses (ext), addresses included as candidates for the quota controller (inc), and addresses excluded from quota controls (exc).
  • Defining Flows describes the arguments to the setFlow() function. In this case, calculating a 10 second moving average of traffic from local sources (inc) to external destinations (ext).
  • The setThreshold() function creates a threshold that triggers if the moving average of traffic from an address exceeds 10Mbit/s. The thresholds are only applied to 10M, 100M and 1G access ports, ensuring that controls are only applied to access layer switches and not to 10G ports on aggregation and core switches.
  • The setEventHandler() function processes the events, keeping track of existing controls and implementing new DSCP marking rules.
  • The setIntervalHandler() function runs periodically and removes marking rules when they are not longer required (after the threshold timeout has expired).
  • The ArubaOS commands to add / remove DSCP marking for the host are highlighted in blue.
The easiest way to try out the script is to use Docker to run sFlow-RT:
docker run -v $PWD/cli.js:/sflow-rt/quota.js -e "RTPROP=-Dscript.file=quota.js" -p 6343:6343/udp -p 8008:8008 sflow/sflow-rt
2017-09-27T02:32:14+0000 INFO: Listening, sFlow port 6343
2017-09-27T02:32:14+0000 INFO: Listening, HTTP port 8008
2017-09-27T02:32:14+0000 INFO: quota.js started
2017-09-27T02:33:53+0000 INFO: mark agent
2017-09-27T02:34:25+0000 INFO: unmark agent
The output indicates that a control marking traffic from was added to edge switch and removed just over 30 seconds later.
The screen capture using Flow Trend to monitor the core switches shows the controller in action. The marking rule is added as soon as traffic from exceeds the 10Mbps quota for 10 seconds (the DSCP marking shown in the chart changes from red be(0) to blue af11(10)). The marking rule is removed once the traffic returns to normal.
The quota settings in the demonstration were aggressive. In practice the threshold settings would be higher and the timeouts longer in order to minimize control churn. The trend above was collected from a university network of approximately 20,000 users that implemented a similar control scheme. In this case, the quota controller was able to consistently mark 50% of the traffic as low priority. Between 10 and 20 controls were in place at any given time and the controller made around 10 control changes an hour (adding or removing a control). During busy periods, congestion on the campus Internet access links was eliminated since marked traffic could be discarded when necessary.

Implementing usage quotas is just one example of applying measurement based control to a campus network. Other interesting applications include:
Real-time measurement and programmatic APIs are becoming standard features of campus switches, allowing visibility driven automatic control to adapt the network to changing network demands and security threats.

Wednesday, September 20, 2017

Flow Trend

The open source sflow-rt/flow-trend project displays a real-time trend chart of network traffic that updates every second. Defining Flows describes how to break out traffic by different traffic attributes, including: addresses, ports, VLANs, protocols, countries, DNS names, etc.
docker run -p 6343:6343/udp -p 8008:8008 sflow/flow-trend
The simplest way to run the software is using the docker. Configure network devices to send standard sFlow telemetry to Flow Trend. Access the web user interface on port 8008.

Sunday, September 10, 2017

Real-time traffic visualization using Netflix Vizceral

The open source sflow-rt/vizceral project demonstrates how real-time sFlow network telemetry can be presented using Netflix Vizceral. The central dot represents the Internet (all non-local addresses). The surrounding dots represents addresses grouped into sites, data centers, buildings etc. The animated particle flows represent packet flows with colors indicating packet type: TCP/UDP shown in blue, ICMP shown in yellow, and all other traffic in red.
Click on a node to zoom in to show packets flowing up and down the protocol stack. Press the ESC key to unzoom.

The simplest way to run the software is to use the pre-built Docker image:
docker run -p 6343:6343/udp -p 8008:8008 sflow/vizceral
The Docker image also contains demo data based on Netflix's public cloud infrastructure:
docker run -e "RTPROP=-Dviz.demo=yes" -p 8008:8008 sflow/vizceral
In this case, the detailed view shows messages flowing between microservices running in the Amazon public cloud. Similar visibility could be obtained by deploying Host sFlow agents with associated modules for web and application servers and modifying sflow/vizceral to present the application transaction flows. In private data centers, sFlow support in load balancers  (F5, A10) provides visibility into interactions between application tiers. See Microservices for more information on using sFlow to instrument microservice architectures.
Collecting Docker Swarm service metrics describes how meta data about services running on Docker Swarm can be combined with sFlow telemetry to generate service level metrics. A similar approach could be taken to display Docker Swarm service interactions using Vizceral. Using network visibility to measure flows between services greatly simplifies the monitoring task, avoiding the challenge of adding instrumentation to each container.

Tuesday, September 5, 2017

Troubleshooting connectivity problems in leaf and spine fabrics

Introducing data center fabric, the next-generation Facebook data center network describes the benefits of moving to a leaf and spine network architecture. The diagram shows how the leaf and spine architecture creates many paths between each pair of hosts. Multiple paths increase available bandwidth and resilience against the loss of a link or a switch. While most networks don't have the scale requirements of Facebook, smaller scale leaf and spine designs deliver high bandwidth, low latency, networking to support cloud workloads (e.g. vSphere, OpenStack, Docker, Hadoop, etc.).

Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.

Fortunately, industry standard sFlow monitoring technology is built into the commodity switch hardware used to build leaf and spine networks. Enabling sFlow telemetry on all the switches in the network provides centralized, real-time, visibility into network traffic.
Fabric View describes an open source application running on the sFlow-RT real-time analytics engine. The Fabric View application provides an overview of the health of the entire leaf and spine fabric, tracking flows and counters on all links and summarizing information in a set of fabric level metrics and dashboards. In addition, Black hole detection describes how to detect routing anomalies in the fabric using the forwarding information included in the sFlow telemetry stream.

The sFlow sampling mechanism implemented in the switches is a highly scaleable method of passively collecting traffic information. However,  analyzing failed connections can be a challenge since very few packets are generated and the chance of sampling these packets is small. The traditional tools used to diagnose connectivity issues, ping and traceroute, are of limited value in a leaf and spine network since they only test a single path and are likely to miss the path that is experiencing difficulties.

An alternative method of addressing the multi-path tracing problem is to enable filtered packet capture on each switch, programming the filters to capture the packets of interest. However, this method can be slow and complex since every switch needs to be configured for each test and the switch configurations need to be cleared after the test has been completed.

This article explores how the hping3 tool can be used with sFlow to trace packet paths across the fabric and detect where they are being lost. The following Python script, trace.py, uses sFlow-RT's REST API to program a flow to watch for a specific flow and print the links that it traverses:
#!/usr/bin/env python
import argparse
import requests
import json
import signal
from random import randint

def sig_handler(signal,frame):
signal.signal(signal.SIGINT, sig_handler)

parser = argparse.ArgumentParser()
parser.add_argument('filter', help='sFlow-RT flow filter, e.g. "ipsource="')
args = parser.parse_args()

rt = 'http://localhost:8008'
name = 'trace' + str(randint(0,10000))

flow = {'keys':'link:inputifindex','value':'frames',

flowurl = rt+'/flows/json?maxFlows=100&timeout=60&name='+name
flowID = -1
while 1 == 1:
  r = requests.get(flowurl+'&flowID='+str(flowID))
  if r.status_code != 200: break
  flows = r.json()
  if len(flows) == 0: continue

  flowID = flows[0]["flowID"]
  for f in flows:
    print f['flowKeys']
Note: See RESTflow for a description of the sFlow-RT REST API.

First run the following Python script, supplying a filter to select the packets of interest:
./trace.py 'ipsource='
Note: Identifying characteristics of failed connections may be inferable from application error logs. Otherwise, running packet capture on the affected host (tcpdump/wireshark) can identify the network attributes of interest.

Next, log into the host that is having connectivity problems and generate traffic matching the flow:
sudo hping3 -c 100000 -i u100 --udp -k -s 1111 -p 53
Note: The above command sends 100,000 packets at a rate of 1 packet every 100 microseconds (i.e. at a rate of 10,000 packets per second).  Select a packet rate that will not disturb production traffic on the network and make sure to send enough packets so that at least one packet will be sampled on each link. For example, for 10G links the packet sampling rate should be around 1-in-10,000 so generating 100,000 packets means that there is a 99.995% chance that a link carrying the flow will generate at least 1 sample (the probability is easily calculated using the Binomal distribution, see Wolfram Alpha).

The trace.py script will start printing links traversed by the flow immediately they are detected (typically in less than a second after starting the test):
./trace.py 'ipsource='
The above example traced the single path traversed by a specific connection. To explore all paths, drop the source port and hping3 will cycle through source ports and the traffic should be visible on all the equal cost paths (provided that a layer 4 hash function has been selected by the switches).

Drop the source port from the trace.py filter:
./trace.py 'ipsource='
Drop the -k and -s options from the hping3 command:
sudo hping3  -c 100000 -i u100 --udp -p 53
The open source trace-flow application is a graphical version of the trace.py script written using sFlow-RT's JavaScript API (see Writing Applications). The screen capture above displayed the path for the test traffic within a second of the start of test.

Continuous network-wide monitoring of leaf and spine networks using sFlow leverages the capabilities of commodity switch hardware and provides centralized visibility that simplifies network operation and trouble shooting.

Sunday, August 27, 2017

Cumulus Linux 3.4 REST API

The latest Cumulus Linux 3.4 release include a REST API. This article will demonstrate how the REST API can be used to automatically deploy traffic controls based on real-time sFlow telemetry. DDoS mitigation with Cumulus Linux describes how sFlow-RT can detect Distributed Denial of Service (DDoS) attacks in real-time and deploy automated controls.

The following ddos.js script is modified to use the REST API to send Network Command Line Utility - NCLU commands to add and remove ACLs, see Installing and Managing ACL Rules with NCLU:
var user = "cumulus";
var password = "CumulusLinux!";
var thresh = 10000;
var block_minutes = 1;


setThreshold('attack',{metric:'udp_target', value:thresh, byFlow:true, timeout:10});

function restCmds(agent,cmds) {
  for(var i = 0; i < cmds.length; i++) {
    let msg = {cmd:cmds[i]};

var controls = {};
var id = 0;
setEventHandler(function(evt) {
  var key = evt.agent + ',' + evt.flowKey;
  if(controls[key]) return;

  var ifname = metric(evt.agent,evt.dataSource+".ifname")[0].metricValue;
  if(!ifname) return;

  var now = (new Date()).getTime();
  var name = 'ddos'+id++;
  var [ip,port] = evt.flowKey.split(',');
  var cmds = [
    'add acl ipv4 '+name+' drop udp source-ip any source-port '+port+' dest-ip '+ip+' dest-port any',
    'add int '+ifname+' acl ipv4 '+name+' inbound',
  controls[key] = {time:now, target: ip, port: port, agent:evt.agent, metric:evt.dataSource+'.'+evt.metric, key:evt.flowKey, name:name};
  try { restCmds(evt.agent, cmds); }
  catch(e) { logSevere('failed to add ACL, '+e); }
  logInfo('block target='+ip+' port='+port+' agent=' + evt.agent); 

setIntervalHandler(function() {
  var now = (new Date()).getTime();
  for(var key in controls) {
    if(now - controls[key].time < 1000 * 60 * block_minutes) continue;
    var ctl = controls[key];
    if(thresholdTriggered('attack',ctl.agent,ctl.metric,ctl.key)) continue;

    delete controls[key];
    var cmds = [
      'del acl ipv4 '+ctl.name,
    try { restCmds(ctl.agent,cmds); }
    catch(e) { logSevere('failed to remove ACL, ' + e); }
    logInfo('allow target='+ctl.target+' port='+ctl.port+' agent='+ctl.agent);
The quickest way test the script is to use docker to run sFlow-RT:
docker run -v $PWD/ddos.js:/sflow-rt/ddos.js \
-e "RTPROP=-Dscript.file=ddos.js -Dhttp.timeout.read=60000" \
-p 6343:6343/udp -p 8008:8008 sflow/sflow-rt
This solution can be tested using freely available software. The setup shown at the top of this article was constructed using a Cumulus VX virtual machine running on VirtualBox.  The Attacker and Target virtual machines are Linux virtual machines used to simulate the DDoS attack.

DNS amplification attack can be simulated using hping3. Run the following command on the Attacker host:
sudo hping3 --flood --udp -k -s 53
Run tcpdump on the Target host to see if the attack is getting through:
sudo tcpdump -i eth1 udp port 53
Each time an attack is launched a new ACL will be added that matches the attack signature and drops the traffic. The ACL is kept in place for at least block_minutes and removed once the attack ends. The following sFlow-RT log messages show the results:
2017-08-26T17:01:24+0000 INFO: Listening, sFlow port 6343
2017-08-26T17:01:24+0000 INFO: Listening, HTTP port 8008
2017-08-26T17:01:24+0000 INFO: ddos.js started
2017-08-26T17:03:07+0000 INFO: block target= port=53 agent=
2017-08-26T17:03:49+0000 INFO: allow target= port=53 agent=
REST API for Cumulus Linux ACLs describes the acl_server daemon that was used in the original article. The acl_server daemon is optimized for real-time performance, supporting use cases in which multiple traffic controls need to be quickly added and removed, e.g  DDoS mitigation, marking large flows, ECMP load balancing, packet brokers.

A key benefit of the openness of Cumulus Linux is that you can install software to suite your use case, other examples include: BGP FlowSpec on white box switchInternet router using Cumulus LinuxTopology discovery with Cumulus LinuxBlack hole detection, and Docker networking with IPVLAN and Cumulus Linux.

Thursday, July 13, 2017

Linux 4.11 kernel extends packet sampling support

Linux 4.11 on Linux Kernel Newbies describes the features added in the April 30, 2017 release. Of particular interest is the new netlink sampling channel:
Introduce psample, a general way for kernel modules to sample packets, without being tied to any specific subsystem. This netlink channel can be used by tc, iptables, etc. and allow to standardize packet sampling in the kernel commit
The psample netlink channel delivers sampled packet headers along with associated metadata from the Linux kernel to user space. The psample fields map directly into sFlow Version 5 sampled_header export structures:

netlink psamplesFlowDescription
PSAMPLE_ATTR_IIFINDEXinputInterface packet was received on.
PSAMPLE_ATTR_OIFINDEXoutputInterface packet was sent on.
PSAMPLE_ATTR_GROUP_SEQdropsNumber of times that the sFlow agent detected that a packet marked to be sampled was dropped due to lack of resources. Agent calculates drops by tracking discontinuities in PSAMPLE_ATTR_GROUP_SEQ
PSAMPLE_ATTR_SAMPLE_RATEsampling_rateThe Sampling Rate specifies the ratio of packets observed at the Data Source to the samples generated. For example a sampling rate of 100 specifies that, on average, 1 sample will be generated for every 100 packets observed.
PSAMPLE_ATTR_ORIGSIZEframe_lengthOriginal length of packet before sampling
PSAMPLE_ATTR_DATAheader<>Header bytes

Linux is widely used for switch network operating systems, including: Arista EOS, Cumulus Linux, Dell OS10, OpenSwitch, SONiC, and Open Network Linux. The adoption of Linux by network vendors and cloud providers is driving increased support for switch hardware by the Linux kernel community.

Hardware support for sFlow packet sampling is widely implemented in switch ASICs, including: Broadcom, Mellanox, Intel, Marvell, Barefoot Networks, Cavium, and Innovium. A standard Linux interface to ASIC sampling simplifies the implementation of sFlow agents (e.g. Host sFlow) and ensures consistent behavior across hardware platforms to deliver real-time network-wide visibility using industry standard sFlow protocol.