Friday, February 2, 2018

OpenSwitch

OpenSwitch is a Linux Foundation project providing an open source white box control plane running on a standard Linux distribution. The diagram above shows the OpenSwitch architecture.

This article describes how to enable industry standard sFlow telemetry using the open source Host sFlow agent. The Host sFlow agent uses Control Plane Services (CPS) to configure sFlow instrumentation in the hardware and gather metrics. CPS in turn uses the Open Compute Project (OCP) Switch Abstraction Interface (SAI) as a vendor independent method of configuring the hardware. Hardware support for sFlow is a standard feature supported by Network Processing Unit (NPU) vendors (Barefoot, Broadcom, Cavium, Innovium, Intel, Marvell, Mellanox, etc.) and vendor neutral sFlow configuration is part of the SAI.

Installing and configuring Host sFlow agent

Installing the software is simple. Log into the switch and type the following commands:
wget --no-check-certificate https://github.com/sflow/host-sflow/releases/download/v2.0.13-3/hsflowd-opx_2.0.13-3_amd64.deb
sudo dpkg -i hsflowd-opx_2.0.13-3_amd64.deb
The sFlow agent requires very little configuration, automatically monitoring all switch ports using the following default settings:

Link SpeedSampling RatePolling Interval
1 Gbit/s1-in-1,00030 seconds
10 Gbit/s1-in-10,00030 seconds
25 Gbit/s1-in-25,00030 seconds
40 Gbit/s1-in-40,00030 seconds
50 Gbit/s1-in-50,00030 seconds
100 Gbit/s1-in-100,00030 seconds

Note: The default settings ensure that large flows (defined as consuming 10% of link bandwidth) are detected within approximately 1 second - see Large flow detection

Edit the /etc/hsflowd.conf file to specify the address of an sFlow analyzer (10.0.0.1):
sflow {
  collector { ip = 10.0.0.1 }
}
Monitoring Linux services describes how configure Host sFlow to include detailed telemetry for all services running on OpenSwitch:
  • bind9.service
  • cron.service
  • dbus.service
  • getty@tty1.service
  • getty@tty2.service
  • getty@tty3.service
  • getty@tty4.service
  • getty@tty5.service
  • getty@tty6.service
  • hsflowd.service
  • lldpd.service
  • networking.service
  • opx-alms.service
  • opx-cps.service
  • opx-front-panel-ports.service
  • opx-ip.service
  • opx-monitor-phy-media.service
  • opx-nas-shell.service
  • opx-nas.service
  • opx-nbmgr.service
  • opx-phy-media-config.service
  • opx-tmpctl.service
  • polkitd.service
  • redis-server.service
  • rsyslog.service
  • snmpd.service
  • ssh.service
  • systemd-journald.service
  • systemd-logind.service
  • systemd-udevd.service

Finally, start the Host sFlow agent:
sudo systemctl enable hsflowd
sudo systemctl start hsflowd
Using the Host sFlow agent to monitor Linux servers and switches provides a consistent set of measurements end-to-end, particularly for cloud infrastructure such as OpenStack and Docker where the network extends into the servers in the form of virtual switches and routers.

Collecting and analyzing sFlow

Visibility and the software defined data center describes the general architecture of sFlow monitoring. Standard sFlow agents embedded within the elements of the infrastructure stream essential performance metrics to management tools, ensuring that every resource in a dynamic cloud infrastructure is immediately detected and continuously monitored.

The Host sFlow agent on OpenSwitch streams standard Linux performance statistics in addition to the interface counters and packet samples that you would typically get from a networking device.
Note: Enhanced visibility into host performance is particularly important on open switch platforms since they may be running a number of user installed services that can stress the limited CPU, memory and IO resources.
For example, the following sflowtool output shows the raw data contained in an sFlow datagram:
startDatagram =================================
datagramSourceIP 10.0.0.59
datagramSize 1332
unixSecondsUTC 1516946395
datagramVersion 5
agentSubId 100000
agent 10.0.0.59
packetSequenceNo 340132
sysUpTime 17479000
samplesInPacket 7
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 876
sourceId 2:1
counterBlock_tag 0:2001
counterBlock_tag 0:2005
disk_total 8102721536
disk_free 5178248192
disk_partition_max_used 37.77
disk_reads 25339
disk_bytes_read 562041856
disk_read_time 25380
disk_writes 3192551
disk_bytes_written 28776890368
disk_write_time 1043712
counterBlock_tag 0:2004
mem_total 2107891712
mem_free 142082048
mem_shared 0
mem_buffers 155873280
mem_cached 1611935744
swap_total 0
swap_free 0
page_in 184268
page_out 9367478
swap_in 0
swap_out 0
counterBlock_tag 0:2003
cpu_load_one 0.010
cpu_load_five 0.030
cpu_load_fifteen 0.000
cpu_proc_run 2
cpu_proc_total 167
cpu_num 2
cpu_speed 2699
cpu_uptime 3541814
cpu_user 3336490
cpu_nice 0
cpu_system 5479320
cpu_idle 2754958964
cpu_wio 168960
cpuintr 160
cpu_sintr 2717250
cpuinterrupts 656232310
cpu_contexts 1704273704
cpu_steal 0
cpu_guest 0
cpu_guest_nice 0
counterBlock_tag 0:2006
nio_bytes_in 267777
nio_pkts_in 4210
nio_errs_in 0
nio_drops_in 0
nio_bytes_out 2104528
nio_pkts_out 2227
nio_errs_out 0
nio_drops_out 0
counterBlock_tag 0:2000
hostname opx2_vm
UUID 40-d4-8b-d5-6b-29-4e-4a-be-48-d6-55-8d-f6-81-73
machine_type 3
os_name 2
os_release 3.16.0-4-amd64
endSample   ----------------------
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 876
sourceId 0:44
counterBlock_tag 0:1005
ifName e101-001-0
counterBlock_tag 0:1
ifIndex 3
networkType 6
ifSpeed 0
ifDirection 2
ifStatus 0
ifInOctets 0
ifInUcastPkts 0
ifInMulticastPkts 0
ifInBroadcastPkts 0
ifInDiscards 0
ifInErrors 0
ifInUnknownProtos 4294967295
ifOutOctets 0
ifOutUcastPkts 0
ifOutMulticastPkts 0
ifOutBroadcastPkts 0
ifOutDiscards 0
ifOutErrors 0
ifPromiscuousMode 0
endSample   ----------------------
startSample ----------------------
sampleType_tag 0:1
sampleType FLOWSAMPLE
sampleSequenceNo 1022129
sourceId 0:7
meanSkipCount 128
samplePool 130832512
dropEvents 0
inputPort 7
outputPort 10
flowBlock_tag 0:1
flowSampleType HEADER
headerProtocol 1
sampledPacketSize 1518
strippedBytes 4
headerLen 128
headerBytes 6C-64-1A-00-04-5E-E8-E7-32-77-E2-B5-08-00-45-00-05-DC-63-06-40-00-40-06-9E-21-0A-64-0A-97-0A-64-14-96-9A-6D-13-89-4A-0C-4A-42-EA-3C-14-B5-80-10-00-2E-AB-45-00-00-01-01-08-0A-5D-B2-EB-A5-15-ED-48-B7-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35
dstMAC 6c641a00045e
srcMAC e8e73277e2b5
IPSize 1500
ip.tot_len 1500
srcIP 10.100.10.151
dstIP 10.100.20.150
IPProtocol 6
IPTOS 0
IPTTL 64
TCPSrcPort 39533
TCPDstPort 5001
TCPFlags 16
endSample   ----------------------
Note: The Linux host metrics (red), network interface counters (green), and packet sample information (blue) have been highlighted.

sflowtool has a number of additional uses:
  • Verifying that sFlow is being received correctly at the destination
  • Converting binary sFlow data into ASCII for scripted analysis (Python, Perl etc.)
  • Converting sFlow into IPFIX/NetFlow
  • Converting sFlow into PCAP format for use with tcpdump, Wireshark, etc.
  • Replicate sFlow streams for multiple collectors
  • Source code for sFlow decoder that can be used to build custom sFlow analyzer
In addition to sflowtool, there are many other open source and commercial sFlow collectors listed on sFlow.org.

A key feature of sFlow telemetry is the low latency network-wide visibility that is possible because of the stateless nature of the measurements. Comprehensive real-time visibility is an essential building block that provides feedback for operations, automation, and control. Articles on this blog use the sFlow-RT analyzer to demonstrate use cases for real-time telemetry, including:
The programmability of an open Linux network operating system combined with real-time visibility is transformative, providing the foundation services necessary for solutions that automatically adapt the network to changing demands.

Friday, January 26, 2018

Intranet DDoS attacks

As on a Darkling Plain: Network Survival in an Age of Pervasive DDoS talk by Steinthor Bjarnason at the recent NANOG 71 conference. The talk discusses the threat that the proliferation of network connected devices in enterprises create when they are used to launch denial of service attacks. Last year's Mirai attacks are described, demonstrating the threat posed by mixed mode attacks where a compromised host is used to infect large numbers devices on the corporate network.
The first slide from the talk shows a denial attack launched against an external target, launched from infected video surveillance cameras scattered throughout the the enterprise network. The large volume of traffic fills up external WAN link and overwhelms stateful firewalls.
The second slide shows an attack targeting critical internal services that can have been identified by reconnaissance from the compromised devices. In addition, scanning activity associated with reconnaissance for additional devices can itself overload internal resources and cause outages.

In both cases, most of the critical activity occurs behind the corporate firewall, making it extremely challenging to detect and mitigate these threats.

The talk discusses a number of techniques that service providers use to secure their networks that enterprises will need to adopt in order to meet this challenge. In particular, "utilizing flow telemetry to analyze external and internal traffic. This is necessary for attack detection, classification and traceback."

Instrumentation needs to be built into every network device in order to provide the comprehensive visibility required to address these challenges. sFlow is a scaleable streaming telemetry solution built into a wide variety of devices, from low cost edge switches to high end chassis routers. Network vendors that support sFlow include: A10, Aerohive, AlexalA, ALUe, Allied Telesis, Arista, Aruba, Big Switch, Brocade, Cisco, Cumulus, DCN, Dell, D-Link, Edge-Core, Enterasys, Extreme, F5, Fortinet, HPE, Hitachi, Huawei, IBM, IP Infusion, Juniper, NEC, Netgear, OpenSwitch, Open vSwitch, Oracle, Pica8, Plexxi, Pluribus, Proxim, Quanta, Silicom, SMC, ZTE, and ZyXEL.

Selecting devices that support sFlow simplifies operations by ensuring that the visibility needed to effectively manage the network is integrated into the fabric and deployed pervasively. Attempting to add visibility later is complex, expensive, and results in limited coverage.

There are a number of examples of DDoS mitigation using sFlow on this blog. While many of the examples focus on external DDoS attacks, the techniques are equally applicable to the internal network.

Thursday, November 16, 2017

RESTful control of Cumulus Linux ACLs

The diagram above shows how the Cumulus Linux 3.4 HTTP API can be extended to include the functionality described in REST API for Cumulus Linux ACLs. Fast programmatic control of Cumulus Linux ACLs addresses a number of interesting use cases, including: DDoS mitigationElephant flow marking, and Triggered remote packet capture using filtered ERSPAN.

The Github pphaal/acl_server project INSTALL page describes how to install the acl_server daemon and configure the NGINX web server front end for the Cumulus Linux REST API to include the acl_server functions. The integration ensures that the same access controls configured for the REST API apply to the acl_server functions, which appear under the /acl/ path.

The following examples demonstrate the REST API.

Create an ACL

curl -X PUT -H 'Content-Type:application/json' --data '["[iptables]","-A FORWARD --in-interface swp+ -d 10.10.100.10 -p udp --sport 53 -j DROP"]' -k -u 'cumulus:CumulusLinux!' https://10.0.0.52:8080/acl/ddos1
ACLs are sent as a JSON encoded array of strings. Each string will be written as a line in a file stored under /etc/cumulus/acl/policy.d/ - See Cumulus Linux: Netfilter - ACLs. For example, the rule above will be written to the file 50rest-ddos1.rules with the following content:
[iptables]
-A FORWARD --in-interface swp+ -d 10.10.100.10 -p udp --sport 53 -j DROP
This iptables rule blocks all traffic from UDP port 53 (DNS) to host 10.10.100.10. This is the type of rule that might be inserted to block a DNS amplification attack.

Retrieve an ACL

curl -k -u 'cumulus:CumulusLinux!' https://10.0.0.52:8080/acl/ddos1
Returns the result:
[
 "[iptables]", 
 "-A FORWARD --in-interface swp+ -d 10.10.100.10 -p udp --sport 53 -j DROP"
]

List ACLs

curl -k -u 'cumulus:CumulusLinux!' https://10.0.0.52:8080/acl/
Returns the result:
[
 "ddos1"
]

Delete an ACL

curl -X DELETE -k -u 'cumulus:CumulusLinux!' https://10.0.0.52:8080/acl/ddos1

Delete all ACLs

curl -X DELETE -k -u 'cumulus:CumulusLinux!' https://10.0.0.52:8080/acl/
Note: this doesn't delete all the ACLs, just the ones created using the REST API. All default ACLs or manually created ACLs are inaccessible through the REST API.

Errors

The acl_server batches and compiles changes after the HTTP requests complete. Batching has the benefit of increasing throughput and reducing request latency, but makes it difficult to track compilation errors since they are reported later. The acl_server catches the output and status when running cl-acltool and attaches an HTTP Warning header to subsequent requests to indicate that the last compilation failed:
HTTP/1.1 200 OK
Server: nginx/1.6.2
Date: Thu, 16 Nov 2017 21:38:03 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
Keep-Alive: timeout=300
Accept: application/json
Warning: 199 - "check lasterror"
The output of cl-acltool can be retrieved:
curl -k -u 'cumulus:CumulusLinux!' https://10.0.0.52:8080/acl/lasterror
Returns the result:
{"returncode": 255, "lines": [...]}
The REST API is intended to be used by automation systems and so syntax problems with the ACLs they generate should be rare and are the result of a software bug. A controller using this API should check responses for the presence of the last error Warning, log the lasterror information so that the problem can be debugged, and finally delete all the rules created through the REST API to restore the system to its default state.


While this REST API could be used as a convenient way to manually push an ACL to a switch, the API is intended to be part of automation solutions that combine real-time traffic analytics with automated control. Cumulus Linux includes standard sFlow measurement support, delivering real-time network wide visibility to drive solutions that include: DDoS mitigation, enforcing black lists, marking large flows, ECMP load balancing, packet brokers etc.

The acl_server functionality demonstrates the value of the open Linux environment exposed by Cumulus Linux, making it easy to extend the platform using standard Linux tools in order to address operational requirements. Download the free Cumulus Linux VX virtual machine to try these examples yourself.

Monday, November 13, 2017

Real-time WiFi heat map

Real-time Wifi-Traffic Heatmap (source code GitHub: cod3monk/showfloor-heatmap) displays real-time WiFi traffic from SC17 (The International Conference for High Performance Computing, Networking, Storage and Analysis, November 12-17, 2017). Click on the link to see live data.

The Cisco Wireless access points in the conference network don't currently support sFlow, however, the access points are connected to Juniper EX switches which stream sFlow telemetry to an instance of sFlow-RT analytics software that provides real-time usage metrics for the heat map.

Wireless describes the additional visibility delivered by sFlow capable wireless access points, including: air time, channel, retransmissions, receive / transmit speeds, power, signal to noise ratio, etc. With sFlow enabled wireless access points, additional information could be layered on the heat map. The sFlow.org web site lists network products and vendors that support the sFlow standard.

Tuesday, October 17, 2017

Arista EOS CloudVision

Arista EOS® CloudVision® provides a centralized point of visibility, configuration and control for Arista devices. The CloudVision controller is available as a virtual machine or physical appliance.


Fabric Visibility on Arista EOS Central describes how to use industry standard sFlow instrumentation in Arista switches to deliver real-time flow analytics. This article describes the steps needed to integrate flow analytics into CloudVision.

Log into the CloudVision node and run the following cvp_install_fabricview.sh script as root:
#!/bin/sh
# Install Fabric View on CloudVision Portal (CVP)

VER=`wget -qO - http://inmon.com/products/sFlow-RT/latest.txt`
wget http://www.inmon.com/products/sFlow-RT/sflow-rt-$VER.noarch.rpm
rpm --nodeps -ivh sflow-rt-$VER.noarch.rpm
/usr/local/sflow-rt/get-app.sh sflow-rt fabric-view

ln -s /cvpi/jdk/bin/java /usr/bin/java

sed -i '/^# http.hostname=/s/^# //' /usr/local/sflow-rt/conf.d/sflow-rt.conf
echo "http.html.redirect=./app/fabric-view/html/" >> /usr/local/sflow-rt/conf.d/sflow-rt.conf

cat <<EOT > /etc/nginx/conf.d/locations/sflow-rt.https.conf
location /sflow-rt/ {
  auth_request /aeris/auth;
  proxy_buffering off;
  proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
  proxy_set_header X-Forwarded-Prefix /sflow-rt/;
  proxy_set_header Host \$host;
  proxy_pass http://localhost:8008/;
  proxy_redirect ~^http://[^/]+(/.+)\$ /sflow-rt\$1;
}
EOT

systemctl restart nginx.service

firewall-cmd --zone public --add-port=6343/udp --permanent
firewall-cmd --reload

systemctl enable sflow-rt.service
systemctl start sflow-rt.service

wget http://www.inmon.com/products/sFlow-RT/cvp-eapi-topology.py
chmod +x cvp-eapi-topology.py

echo "configure and run cvp-eapi-topology.py"
Edit the cvp-api-topology.py script to specify CVP_USER and CVP_PASSWORD (and EAPI_USER and EAPI_PASSWORD if they differ). Now run the script to discover the physical topology and post it to Fabric View:
./cvp-eapi-topology.ph
Note: The script needs to be run any time the physical topology changes, or you can run the script periodically using cron.

Flow analytics requires sFlow to be enabled on all the switches, sending the data to the CloudVision node. This can be accomplished using a CloudVision configlet to push the configuration to switches. For example the following configuration enables sFlow on all switch ports and sends the data to CloudVision node 10.0.0.98:
sflow sample 20000
sflow polling-interval 30
sflow destination 10.0.0.98
sflow source-interface Management1
sflow run
Optionally, follow the steps in Arista EOS telemetry to enhance the sFlow telemetry stream from the switches with detailed CPU, memory, disk, and host network statistics.

Finally, access the Fabric View web interface at https://cloudvision/sflow-rt/ using your CloudVision login credentials.
Fabric View is an open source application running on top of the sFlow-RT analytics engine. The Fabric View software can easily be modified to add new capabilities, e.g. Black hole detection.

A number of applications are available for sFlow-RT. Writing Applications describes how to use sFlow-RT's APIs to extend or modify existing applications or develop new applications. In addition, there are also many sFlow-RT related articles on this blog. For example, Arista eAPI describes how to automatically push controls based on flow measurements, describing automated DDoS mitigation as a use case. Other use cases include: traffic engineering, traffic accounting, anomaly detection, intrusion detection, FIB optimization, targeted packet capture etc.

Wednesday, September 27, 2017

Real-time visibility and control of campus networks

Many of the examples on this blog describe network visibility driven control of data center networks. However, campus networks face many similar challenges and the availability of industry standard sFlow telemetry and RESTful control APIs in campus switches make it possible to apply feedback control.

HPE Aruba has an extensive selection of campus switches that combine programmatic control via a REST API with hardware sFlow support:
  • Aruba 2530 
  • Aruba 2540 
  • Aruba 2620
  • Aruba 2930F
  • Aruba 2930M
  • Aruba 3810
  • Aruba 5400R
  • Aruba 8400
 This article presents an example of implementing quota controls using HPE Aruba switches.
Typically, a small number of hosts are responsible for the majority of traffic on the network: identifying those hosts, and applying controls to their traffic to prevent them from unfairly dominating, ensures fair access to all users.

Peer-to-peer protocols (P2P) pose some unique challenges:
  • P2P protocols make use of very large numbers of connections in order to quickly transfer data. The large number of connections allows a P2P user to obtain a disproportionate amount of network bandwidth; even a small number of P2P users (less than 0.5% of users) can consume over 90% of the network bandwidth.
  • P2P protocols (and users) are very good at getting through access control lists (acl) by using non-standard ports, using port 80 (web) etc. Trying to maintain an effective filter to identify P2P traffic is a challenge and the resulting complex rule sets consume significant resources in devices attempting to perform classification.
In this example, all switches are configured to stream sFlow telemetry to an instance of the sFlow-RT real-time sFlow analyzer running a quota controller. When a host exceeds its traffic quota, a REST API call is made to the host's access switch, instructing the switch to mark the host's traffic as low priority. Marking the traffic ensures that if congestion occurs elsewhere in the network, typically on the Internet access links, priority queuing will cause marked packets can be dropped, reducing the bandwidth consumed by the marked host. The quota controller continues to monitor the host and the marking action is removed when the host's traffic returns to acceptable levels.

A usage quota is simply a limit on the amount of traffic that a user is allowed to generate in a given amount of time. Usage quotas have a number of attributed that make them an effective means of managing P2P activity:
  • A simple usage quota is easy to maintain and enforce and encourages users to be more responsible in their use of shared resources.
  • Since quota based controls are interested in the overall amount of traffic that a host generates and not the specific type of traffic, they don't encourage users to tailor P2P application setting to bypass access control rules and so their traffic is easier to monitor.
  • A quota system can be implemented using standard network hardware, without the addition of a "traffic shaping" appliance that can become a bottleneck and point of failure.
The following quota.js script implements the quota controller functionality:
var user = 'manager';
var password = 'manager';
var scheme = 'http://';
var mbps = 10;
var interval = 10;
var timeout = 20;
var dscp = '10';
var groups = {'ext':['0.0.0.0/0'],'inc':['10.0.0.0/8'],'exc':['10.1.0.0/16']};

function runCmd(agent,cmd) {
  var headers = {'Content-Type':'application/json','Accept':'application/json'};
  // create session
  var auth = http2({
    url:scheme+agent+'/rest/v3/login-sessions',
    operation:'post',
    headers:headers,
    body: JSON.stringify({userName:user, password:password})
  });
  headers['Cookie']=JSON.parse(auth.body).cookie;

  // make request
  var resp = http2({
    url:scheme+agent+'/rest/v3/cli',
    operation:'post',
    headers:headers,
    body: JSON.stringify({cmd:cmd})
  });
  var result = base64Decode(JSON.parse(resp.body).result_base64_encoded);

  // end session
  var end = http2({
    url:scheme+agent+'/rest/v3/login-sessions',
    operation:'delete',
    headers:headers
  });
  return result;
}

setGroups('site',groups);

setFlow('src', {
  keys:'ipsource',
  value:'bytes',
  filter:'direction=ingress&group:ipsource:site=inc&group:ipdestination:site=ext',
  t:interval
});

setThreshold('quota', {
  metric:'src',
  value:mbps*1000000/8,
  byFlow: true,
  timeout:timeout,
  filter:{ifspeed:[10000000,100000000,1000000000]}
});

var controls = {};
setEventHandler(function(evt) {
  var ip = evt.flowKey;
  if(controls[ip]) return;

  var agent = evt.agent;
  var ds = evt.dataSource;
  controls[ip] = {agent:agent,ds:ds,time:Date.now()};
  logInfo('mark '+ip+' agent '+agent);
  try { runCmd(agent,'qos device-priority '+ip+' dscp '+dscp); }
  catch(e) { logWarning('runCmd error ' + e); }
}, ['quota']);

setIntervalHandler(function() {
  for(var ip in controls) {
    var ctl = controls[ip];
    if(thresholdTriggered('quota',ctl.agent,ctl.ds+'.src',ip)) continue;

    logInfo('unmark '+ip+' agent '+ctl.agent);
    try { runCmd(ctl.agent,'no qos device-priority '+ip); }
    catch(e) { logWarning('runCmd error ' + e); }
    delete controls[ip];
  }
});
Some notes on the script:
  • Writing Applications provides an overview of the sFlow-RT scripting API.
  • HPE ArubaOS-Switch REST API and JSON Schema Reference Guide 16.03 describes the REST API calls in the runCmds() function.
  • The groups variable defines groups of IP addresses, identifying external addresses (ext), addresses included as candidates for the quota controller (inc), and addresses excluded from quota controls (exc).
  • Defining Flows describes the arguments to the setFlow() function. In this case, calculating a 10 second moving average of traffic from local sources (inc) to external destinations (ext).
  • The setThreshold() function creates a threshold that triggers if the moving average of traffic from an address exceeds 10Mbit/s. The thresholds are only applied to 10M, 100M and 1G access ports, ensuring that controls are only applied to access layer switches and not to 10G ports on aggregation and core switches.
  • The setEventHandler() function processes the events, keeping track of existing controls and implementing new DSCP marking rules.
  • The setIntervalHandler() function runs periodically and removes marking rules when they are not longer required (after the threshold timeout has expired).
  • The ArubaOS commands to add / remove DSCP marking for the host are highlighted in blue.
The easiest way to try out the script is to use Docker to run sFlow-RT:
docker run -v $PWD/cli.js:/sflow-rt/quota.js -e "RTPROP=-Dscript.file=quota.js" -p 6343:6343/udp -p 8008:8008 sflow/sflow-rt
2017-09-27T02:32:14+0000 INFO: Listening, sFlow port 6343
2017-09-27T02:32:14+0000 INFO: Listening, HTTP port 8008
2017-09-27T02:32:14+0000 INFO: quota.js started
2017-09-27T02:33:53+0000 INFO: mark 10.0.0.70 agent 10.0.0.232
2017-09-27T02:34:25+0000 INFO: unmark 10.0.0.70 agent 10.0.0.232
The output indicates that a control marking traffic from 10.0.0.70 was added to edge switch 10.0.0.232 and removed just over 30 seconds later.
The screen capture using Flow Trend to monitor the core switches shows the controller in action. The marking rule is added as soon as traffic from 10.0.0.70 exceeds the 10Mbps quota for 10 seconds (the DSCP marking shown in the chart changes from red be(0) to blue af11(10)). The marking rule is removed once the traffic returns to normal.
The quota settings in the demonstration were aggressive. In practice the threshold settings would be higher and the timeouts longer in order to minimize control churn. The trend above was collected from a university network of approximately 20,000 users that implemented a similar control scheme. In this case, the quota controller was able to consistently mark 50% of the traffic as low priority. Between 10 and 20 controls were in place at any given time and the controller made around 10 control changes an hour (adding or removing a control). During busy periods, congestion on the campus Internet access links was eliminated since marked traffic could be discarded when necessary.

Implementing usage quotas is just one example of applying measurement based control to a campus network. Other interesting applications include:
Real-time measurement and programmatic APIs are becoming standard features of campus switches, allowing visibility driven automatic control to adapt the network to changing network demands and security threats.

Wednesday, September 20, 2017

Flow Trend

The open source sflow-rt/flow-trend project displays a real-time trend chart of network traffic that updates every second. Defining Flows describes how to break out traffic by different traffic attributes, including: addresses, ports, VLANs, protocols, countries, DNS names, etc.
docker run -p 6343:6343/udp -p 8008:8008 sflow/flow-trend
The simplest way to run the software is using the docker. Configure network devices to send standard sFlow telemetry to Flow Trend. Access the web user interface on port 8008.