Monday, December 10, 2018

sFlow to JSON

The latest version of sflowtool can convert sFlow datagrams into JSON, making it easy to write scripts to process the standard sFlow telemetry streaming from devices in the network.

Download and compile the latest version of sflowtool:
git clone https://github.com/sflow/sflowtool.git
cd sflowtool/
./boot.sh 
./configure 
make
sudo make install
The -J option formats the JSON output to be human readable:
$ sflowtool -J
{
 "datagramSourceIP":"10.0.0.162",
 "datagramSize":"396",
 "unixSecondsUTC":"1544241239",
 "localtime":"2018-12-07T19:53:59-0800",
 "datagramVersion":"5",
 "agentSubId":"0",
 "agent":"10.0.0.231",
 "packetSequenceNo":"1068783",
 "sysUpTime":"1338417874",
 "samplesInPacket":"2",
 "samples":[
  {
   "sampleType_tag":"0:2",
   "sampleType":"COUNTERSSAMPLE",
   "sampleSequenceNo":"148239",
   "sourceId":"0:3",
   "elements":[
    {
     "counterBlock_tag":"0:1",
     "ifIndex":"3",
     "networkType":"6",
     "ifSpeed":"1000000000",
     "ifDirection":"1",
     "ifStatus":"3",
     "ifInOctets":"4162076356",
     "ifInUcastPkts":"16312256",
     "ifInMulticastPkts":"187789",
     "ifInBroadcastPkts":"2566",
     "ifInDiscards":"0",
     "ifInErrors":"0",
     "ifInUnknownProtos":"0",
     "ifOutOctets":"2115351089",
     "ifOutUcastPkts":"7087570",
     "ifOutMulticastPkts":"4453258",
     "ifOutBroadcastPkts":"6141715",
     "ifOutDiscards":"0",
     "ifOutErrors":"0",
     "ifPromiscuousMode":"0"
    },
    {
     "counterBlock_tag":"0:2",
     "dot3StatsAlignmentErrors":"0",
     "dot3StatsFCSErrors":"0",
     "dot3StatsSingleCollisionFrames":"0",
     "dot3StatsMultipleCollisionFrames":"0",
     "dot3StatsSQETestErrors":"0",
     "dot3StatsDeferredTransmissions":"0",
     "dot3StatsLateCollisions":"0",
     "dot3StatsExcessiveCollisions":"0",
     "dot3StatsInternalMacTransmitErrors":"0",
     "dot3StatsCarrierSenseErrors":"0",
     "dot3StatsFrameTooLongs":"0",
     "dot3StatsInternalMacReceiveErrors":"0",
     "dot3StatsSymbolErrors":"0"
    }
   ]
  },
  {
   "sampleType_tag":"0:1",
   "sampleType":"FLOWSAMPLE",
   "sampleSequenceNo":"11791",
   "sourceId":"0:3",
   "meanSkipCount":"2000",
   "samplePool":"34185160",
   "dropEvents":"0",
   "inputPort":"3",
   "outputPort":"10",
   "elements":[
    {
     "flowBlock_tag":"0:1",
     "flowSampleType":"HEADER",
     "headerProtocol":"1",
     "sampledPacketSize":"102",
     "strippedBytes":"0",
     "headerLen":"104",
     "headerBytes":"0C-AE-4E-98-0B-89-05-B6-D8-D9-A2-66-80-00-54-00-00-45-08-12-04-00-04-10-4A-FB-A0-00-00-BC-A0-00-00-EF-80-00-DE-B1-E7-26-00-20-75-04-B0-C5-00-00-00-00-96-01-20-00-00-00-00-00-01-11-21-31-41-51-61-71-81-91-A1-B1-C1-D1-E1-F1-02-12-22-32-42-52-62-72-82-92-A2-B2-C2-D2-E2-F2-03-13-23-33-43-53-63-73-1A-1D-4D-76-00-00",
     "dstMAC":"0cae4e980b89",
     "srcMAC":"05b6d8d9a266",
     "IPSize":"88",
     "ip.tot_len":"84",
     "srcIP":"10.0.0.203",
     "dstIP":"10.0.0.254",
     "IPProtocol":"1",
     "IPTOS":"0",
     "IPTTL":"64",
     "IPID":"8576",
     "ICMPType":"8",
     "ICMPCode":"0"
    },
    {
     "flowBlock_tag":"0:1001",
     "extendedType":"SWITCH",
     "in_vlan":"1",
     "in_priority":"0",
     "out_vlan":"1",
     "out_priority":"0"
    }
   ]
  }
 ]
}
The output shows the JSON representation of a single sFlow datagram containing one counter sample and one flow sample.

The -j option formats the JSON output as a single line per datagram, making the output easy to parse in scripts. For example, the following Python script, flow.py, runs sflowtool and parses the JSON output:
#!/usr/bin/env python

import subprocess
from json import loads

# run sflowtool and read its output as text, one JSON datagram per line
p = subprocess.Popen(
  ['/usr/local/bin/sflowtool','-j'],
  stdout=subprocess.PIPE,
  stderr=subprocess.STDOUT,
  universal_newlines=True
)
lines = iter(p.stdout.readline,'')
for line in lines:
  datagram = loads(line)
  localtime = datagram["localtime"]
  samples = datagram["samples"]
  for sample in samples:
    sampleType = sample["sampleType"]
    elements = sample["elements"]
    if sampleType == "FLOWSAMPLE":
      for element in elements:
        tag = element["flowBlock_tag"]
        # 0:1 is the sampled packet header element
        if tag == "0:1":
          try:
            src = element["srcIP"]
            dst = element["dstIP"]
            pktsize = element["sampledPacketSize"]
            print("%s %s %s %s" % (localtime,src,dst,pktsize))
          except KeyError:
            # not an IP packet, skip it
            pass
Running the script prints flow records showing time, source, destination and number of bytes:
$ ./flow.py 
2018-12-07T20:53:06-0800 10.0.0.70 10.0.0.238 110
2018-12-07T20:53:06-0800 10.0.0.70 10.0.0.238 70
2018-12-07T20:53:06-0800 10.0.0.70 10.0.0.238 70
2018-12-07T20:53:06-0800 10.0.0.238 10.0.0.70 90
The script can easily be modified to add additional fields, push data into an SIEM tool (e.g. Logstash), push counter data into a time series database (e.g. InfluxDB), or perform additional analysis in Python. For example, the following script builds on the example, downloading the Emerging Threats compromised address list and logging any flows that match the list:
#!/usr/bin/env python

import subprocess
from json import loads
from requests import get

# load the address list as text so entries compare equal to the JSON strings
blacklist = set()
r = get('https://rules.emergingthreats.net/blockrules/compromised-ips.txt')
for line in r.text.splitlines():
  blacklist.add(line.strip())

# run sflowtool and read its output as text, one JSON datagram per line
p = subprocess.Popen(
  ['/usr/local/bin/sflowtool','-j'],
  stdout=subprocess.PIPE,
  stderr=subprocess.STDOUT,
  universal_newlines=True
)
lines = iter(p.stdout.readline,'')
for line in lines:
  datagram = loads(line)
  localtime = datagram["localtime"]
  samples = datagram["samples"]
  for sample in samples:
    sampleType = sample["sampleType"]
    elements = sample["elements"]
    if sampleType == "FLOWSAMPLE":
      for element in elements:
        tag = element["flowBlock_tag"]
        if tag == "0:1":
          try:
            src = element["srcIP"]
            dst = element["dstIP"]
            if src in blacklist or dst in blacklist:
              print("%s %s %s" % (localtime,src,dst))
          except KeyError:
            # not an IP packet, skip it
            pass
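Pushing counter data into a time series database, as suggested above, starts with pulling the interface counters out of each datagram. The following is a minimal sketch, assuming the field names shown in the -J output earlier. The function name interface_counters and the abbreviated sample line are illustrative and are not part of sflowtool:

```python
import json

def interface_counters(datagram):
    """Yield one dict per generic interface counter block in a datagram."""
    for sample in datagram.get("samples", []):
        if sample.get("sampleType") != "COUNTERSSAMPLE":
            continue
        for element in sample.get("elements", []):
            # "0:1" is the tag of the interface counter block in the output above
            if element.get("counterBlock_tag") != "0:1":
                continue
            yield {
                "time": int(datagram["unixSecondsUTC"]),
                "agent": datagram["agent"],
                # sflowtool emits every value as a JSON string, so convert
                "ifIndex": int(element["ifIndex"]),
                "ifInOctets": int(element["ifInOctets"]),
                "ifOutOctets": int(element["ifOutOctets"]),
            }

# One abbreviated line in the style of `sflowtool -j`, values copied from the example
line = ('{"unixSecondsUTC":"1544241239","agent":"10.0.0.231","samples":['
        '{"sampleType":"COUNTERSSAMPLE","elements":['
        '{"counterBlock_tag":"0:1","ifIndex":"3",'
        '"ifInOctets":"4162076356","ifOutOctets":"2115351089"}]}]}')

records = list(interface_counters(json.loads(line)))
print(records)
```

The counters are cumulative, so a database or dashboard would normally compute rates from the difference between successive records for the same agent and ifIndex.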
The open source Host sFlow agent provides a convenient means of experimenting with sFlow if you don't have access to network devices. The Host sFlow agent is also a simple way to gather real-time telemetry from public cloud virtual machine instances where access to the physical network infrastructure is not permitted.

Finally, for advanced sFlow analytics, try sFlow-RT, a real-time analytics engine that exposes a REST API.

Thursday, November 15, 2018

Mininet, ONOS, and segment routing

Leaf and spine traffic engineering using segment routing and SDN, and CORD: Open-source spine-leaf Fabric, describe a demonstration at the 2015 Open Networking Summit using the ONOS SDN controller and a physical network of 8 switches.

This article will describe how to emulate a leaf and spine network using Mininet and configure the ONOS segment routing application to provide equal cost multi-path (ECMP) routing of flows across the fabric. The Mininet Dashboard application running on the sFlow-RT real-time analytics platform is used to provide visibility into traffic flows across the emulated network.

First, run ONOS using Docker:
docker run --name onos --rm -p 6653:6653 -p 8181:8181 -d onosproject/onos
Use the graphical interface, http://onos:8181, to enable the OpenFlow Provider Suite, Network Config Host Provider, Network Config Link Provider, and Segment Routing applications. The screen shot above shows the resulting set of enabled services.

Next, install sFlow-RT and the Mininet Dashboard application on the host running Mininet:
wget https://inmon.com/products/sFlow-RT/sflow-rt.tar.gz
tar -xvzf sflow-rt.tar.gz
./sflow-rt/get-app.sh sflow-rt mininet-dashboard
Start sFlow-RT:
./sflow-rt/start.sh
Download the sr.py script:
wget https://raw.githubusercontent.com/sflow-rt/onos-sr/master/sr.py
Start Mininet:
sudo env ONOS=10.0.0.73 mn --custom sr.py,sflow-rt/extras/sflow.py \
--link tc,bw=10 --topo=sr '--controller=remote,ip=$ONOS,port=6653'
The sr.py script is used to create a leaf and spine topology in Mininet and send the network configuration to the ONOS controller. The sflow.py script enables sFlow monitoring of the switches and sends the network topology to sFlow-RT.

The leaf and spine topology will appear in the ONOS web interface.
The topology will also appear in the Mininet Dashboard application:
Run an iperf test using the Mininet cli:
mininet> iperf h1 h3
The path that the traffic takes is highlighted on the Mininet Dashboard topology:
In this case the traffic flowed between leaf1 and leaf2 via spine1. Since ONOS segment routing uses equal cost multi-path routing, subsequent iperf tests may take the alternative path via spine2.
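The reason different iperf runs can land on different spines is that ECMP typically selects a next hop from a hash of the flow's addresses and ports. The Python 3 sketch below illustrates the general idea only. It is not the hash used by ONOS or Open vSwitch, and the addresses, ports, and the ecmp_next_hop function are made up for illustration:

```python
import hashlib

def ecmp_next_hop(src_ip, dst_ip, protocol, src_port, dst_port, next_hops):
    """Pick one of several equal cost next hops from a hash of the flow 5-tuple."""
    key = "%s,%s,%s,%s,%s" % (src_ip, dst_ip, protocol, src_port, dst_port)
    digest = hashlib.sha256(key.encode("ascii")).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

spines = ["spine1", "spine2"]

# The same flow always hashes to the same spine, keeping its packets in order
first = ecmp_next_hop("10.0.1.1", "10.0.2.1", 6, 40000, 5001, spines)
again = ecmp_next_hop("10.0.1.1", "10.0.2.1", 6, 40000, 5001, spines)

# New connections use new source ports, so they spread across both spines
paths = {ecmp_next_hop("10.0.1.1", "10.0.2.1", 6, port, 5001, spines)
         for port in range(40000, 40064)}
print(first, again, sorted(paths))
```

Because the choice is per flow rather than per packet, a single large flow stays on one path, which is why the dashboard highlights one spine per test.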
Switch to the Charts tab to see traffic trend charts. In this case, the trend charts show the results of six iperf tests. The Traffic chart shows the top flows and the Topology charts show the busy links and the network diameter.

See Writing Applications for an introduction to programming sFlow-RT's analytics engine. Mininet flow analytics provides a simple example of detecting large (elephant) flows.

Wednesday, November 14, 2018

Real-time visibility at 400 Gigabits/s

The chart above demonstrates real-time, up to the second, flow monitoring on a 400 gigabit per second link. The chart shows that the traffic is composed of four, roughly equal, 100 gigabit per second flows.

The data was gathered from The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18) being held this week in Dallas. The conference network, SCinet, is described as the fastest and most powerful network in the world.
This year, the SCinet network includes recently announced 400 gigabit switches from Arista Networks, see Arista Introduces 400 Gigabit Platforms. Each switch delivers 32 400G ports in a 1U form factor.
NRE-36 University of Southern California network topology for SuperComputing 2018
The switches are part of a 400G demonstration network connecting the USC, Caltech and StarLight booths. The chart shows traffic on a link connecting the USC and Caltech booths.

Providing the visibility needed to manage large scale high speed networks is a significant challenge. In this example, line rate traffic of 80 million packets per second is being monitored on the 400G port. The maximum packet rate for 64 byte packets on a 400 Gigabit, full duplex, link is approximately 1.2 billion packets per second (600 million in each direction). Monitoring all 32 ports requires a solution that can handle over 38 billion packets per second.
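The packet rate figures can be reproduced with a short calculation. This sketch assumes standard Ethernet framing, where each 64 byte frame also occupies 20 bytes on the wire for the preamble, start of frame delimiter, and inter-frame gap. The variable names are illustrative:

```python
LINK_BPS = 400e9        # 400 Gbit/s in each direction
FRAME_BYTES = 64        # minimum Ethernet frame size
OVERHEAD_BYTES = 20     # preamble and start delimiter (8) plus inter-frame gap (12)
PORTS = 32              # 400G ports per switch

pps_per_direction = LINK_BPS / ((FRAME_BYTES + OVERHEAD_BYTES) * 8)
pps_full_duplex = 2 * pps_per_direction
pps_switch = PORTS * pps_full_duplex

print("per direction: %.0f million packets/s" % (pps_per_direction / 1e6))
print("full duplex:   %.2f billion packets/s" % (pps_full_duplex / 1e9))
print("32 ports:      %.1f billion packets/s" % (pps_switch / 1e9))
```

The results, roughly 595 million, 1.19 billion, and 38.1 billion packets per second, agree with the rounded figures quoted above.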

In this case, industry standard sFlow instrumentation built into the Broadcom Tomahawk 3 ASICs in the Arista switches provides line rate visibility. Real-time sFlow telemetry from all ports on all switches in the network streams to a central sFlow analyzer that provides network wide visibility. The overall bandwidth capacity delivered to SC18 exhibitors is 9.322 terabits per second.
The chart was generated using the open source Flow Trend application running on sFlow-RT. The sFlow-RT analytics software takes streaming sFlow telemetry from all the devices in the network, providing real-time visibility to orchestration, DevOps and SDN systems.

Wednesday, October 3, 2018

Ryu measurement based control

ONOS measurement based control describes how real-time streaming telemetry can be used to automatically trigger SDN controller actions. The article uses DDoS mitigation as an example.

This article recreates the demonstration using the Ryu SDN framework and emulating a network using Mininet. Install both pieces of software on a Linux server or virtual machine in order to follow this example.

Start Ryu with the simple_switch and ryu.app.ofctl_rest applications loaded:
ryu-manager ryu.app.simple_switch,ryu.app.ofctl_rest
Note: The simple_switch and ofctl_rest scripts are part of a standard Ryu installation.
This demonstration uses the sFlow-RT real-time analytics engine to process standard sFlow streaming telemetry from the network switches.

Download sFlow-RT:
wget https://inmon.com/products/sFlow-RT/sflow-rt.tar.gz
tar -xvzf sflow-rt.tar.gz
Install the Mininet Dashboard application:
sflow-rt/get-app.sh sflow-rt mininet-dashboard
The following script, ryu.js, implements the DDoS mitigation function described in the previous article:
var ryu = '127.0.0.1';
var controls = {};

setFlow('udp_reflection',
 {keys:'ipdestination,udpsourceport',value:'frames'});
setThreshold('udp_reflection_attack',
 {metric:'udp_reflection',value:100,byFlow:true,timeout:2});

setEventHandler(function(evt) {
 // don't consider inter-switch links
 var link = topologyInterfaceToLink(evt.agent,evt.dataSource);
 if(link) return;

 // get port information
 var port = topologyInterfaceToPort(evt.agent,evt.dataSource);
 if(!port) return;

 // need OpenFlow info to create Ryu filtering rule
 if(!port.dpid || !port.ofport) return;

 // we already have a control for this flow
 if(controls[evt.flowKey]) return;

 var [ipdestination,udpsourceport] = evt.flowKey.split(',');
 var msg = {
  priority:40000,
  dpid:parseInt(port.dpid,16),
  match: {
   in_port:port.ofport,
   dl_type:0x800,
   nw_dst:ipdestination+'/32',
   nw_proto:17,
   tp_src:udpsourceport 
  }
 };

 var resp = http2({
  url:'http://'+ryu+':8080/stats/flowentry/add',
  headers:{'Content-Type':'application/json','Accept':'application/json'},
  operation:'post',
  body: JSON.stringify(msg)
 });

 controls[evt.flowKey] = {
  time:Date.now(),
  threshold:evt.thresholdID,
  agent:evt.agent,
  metric:evt.dataSource+'.'+evt.metric,
  msg:msg
 };

 logInfo("blocking " + evt.flowKey);
},['udp_reflection_attack']);

setIntervalHandler(function() {
 var now = Date.now();
 for(var key in controls) {
  let rec = controls[key];

  // keep control for at least 10 seconds
  if(now - rec.time < 10000) continue;
  // keep control if threshold still triggered
  if(thresholdTriggered(rec.threshold,rec.agent,rec.metric,key)) continue;

  var resp = http2({
   url:'http://'+ryu+':8080/stats/flowentry/delete',
   headers:{'Content-Type':'application/json','Accept':'application/json'},
   operation:'post',
   body: JSON.stringify(rec.msg)
  });

  delete controls[key];

  logInfo("unblocking " + key);
 }
});
Some notes on the script:
  1. The Ryu ryu.app.ofctl_rest is used to add/remove filters that block the DDoS traffic
  2. The udp_reflection flow definition is designed to detect UDP amplification attacks, e.g. DNS amplification attacks
  3. Controls are applied to the switch port where traffic enters the network
  4. The controls structure is used to keep track of state associated with deployed configuration changes so that they can be undone
  5. The setIntervalHandler() function is used to automatically release controls after 10 seconds - the timeout is short for the purposes of demonstration, in practical deployments the timeout would be much longer, typically measured in hours
  6. For simplicity, this script is missing the error handling needed for production use.
  7. See Writing Applications for more information.
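The two pieces of logic in ryu.js, building the filter and deciding when to release it, can be sketched in Python to make them easier to test in isolation. This is an illustrative restatement of the script above, not part of Ryu or sFlow-RT. The function names, the still_triggered callback, and the example addresses are assumptions:

```python
def build_block_rule(dpid_hex, ofport, ipdestination, udpsourceport):
    """Build the flow entry message that ryu.js posts to /stats/flowentry/add."""
    # No actions are listed, mirroring the message built in ryu.js
    return {
        "priority": 40000,
        "dpid": int(dpid_hex, 16),
        "match": {
            "in_port": ofport,
            "dl_type": 0x800,                 # IPv4
            "nw_dst": ipdestination + "/32",
            "nw_proto": 17,                   # UDP
            "tp_src": udpsourceport,
        },
    }

def controls_to_release(controls, now_ms, still_triggered, hold_ms=10000):
    """Return flow keys whose controls are old enough and no longer triggered."""
    release = []
    for key, rec in controls.items():
        if now_ms - rec["time"] < hold_ms:
            continue                          # keep control for at least hold_ms
        if still_triggered(key):
            continue                          # threshold still triggered
        release.append(key)
    return release

# Two hypothetical controls, installed at t=0 ms and t=8000 ms
controls = {
    "10.0.0.3,53": {"time": 0,
                    "msg": build_block_rule("0000000000000002", 1, "10.0.0.3", "53")},
    "10.0.0.4,53": {"time": 8000,
                    "msg": build_block_rule("0000000000000002", 1, "10.0.0.4", "53")},
}
released = controls_to_release(controls, 12000, lambda key: False)
print(released)
```

Keeping the original message with each control, as the script does, means the same body can be posted to the delete URL when the control is released.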
Run the following command to start sFlow-RT and run the ryu.js script:
./sflow-rt/start.sh -Dscript.file=../ryu.js
We are going to use hping3 to simulate a DDoS attack, so install the software using the following command:
sudo apt install hping3
Next, start Mininet:
sudo mn --custom sflow-rt/extras/sflow.py --link tc,bw=10 \
--controller=remote,ip=127.0.0.1 --topo tree,depth=2,fanout=2
Generate normal traffic between hosts h1 and h3:
mininet> iperf h1 h3
The weathermap view shows the flow crossing the network from switch s2 to s3 via s1.
Generate an attack:
mininet> h1 hping3 --flood --udp -k -s 53 h3
The weathermap view verifies that the attack has been successfully blocked since none of the traffic is seen traversing the network.

The chart at the top of this article shows the iperf test followed by the simulated attack. The top chart shows the top flows entering the network, showing the DNS amplification attack traffic in blue. The middle chart shows traffic broken out by switch port. Here, the blue line shows the attack traffic arriving at switch s2 port s2-eth1 while the red line shows that only a small amount of traffic is forwarded to switch s3 port s3-eth3 before the attack is blocked at switch s2 by the controller.

Mininet with Ryu and sFlow-RT is a great way to rapidly develop and test SDN applications, avoiding the time and expense involved in setting up a physical network. The application is easily moved from the Mininet virtual network to a physical network since it is based on the same industry standard sFlow telemetry generated by physical switches. In this case, commodity switch hardware can be used to cost effectively detect and filter massive (100s of Gbit/s) DDoS attacks.

Note: Northbound Networks Zodiac GX is an inexpensive gigabit switch that provides a convenient way to transition from an emulated Mininet environment to a physical network handling real traffic.

Monday, October 1, 2018

Systemd traffic marking

Monitoring Linux services describes how the open source Host sFlow agent exports metrics from services launched using systemd, the default service manager on most recent Linux distributions. In addition, the Host sFlow agent efficiently samples network traffic using Linux kernel capabilities: PCAP/BPF, nflog, and ulog.

This article describes a recent extension to the Host sFlow systemd module, mapping sampled traffic to the individual services that generate or consume it. The ability to color traffic by application greatly simplifies service discovery and service dependency mapping, making it easy to see how services communicate in a multi-tier application architecture.

The following /etc/hsflowd.conf file configures the Host sFlow agent, hsflowd, to sample packets on interface eth0, monitor systemd services and mark the packet samples, and track TCP performance:
sflow {
  collector { ip = 10.0.0.70 }
  pcap { dev = eth0 }
  systemd { markTraffic = on }
  tcp { }
}
The diagram above illustrates how the Host sFlow agent is able to efficiently monitor and classify traffic. In this case both the Host sFlow agent and an Apache web server are running as services managed by systemd. A network connection, shown in red, has been made to the HTTP service. Configuring the pcap module to monitor interface eth0 on the server programs a Berkeley Packet Filter (BPF) that randomly samples packets in the Linux kernel and provides copies (shown as the dotted red line) to the Host sFlow agent. In addition, the Host sFlow agent queries systemd to obtain a list of running services and the resources allocated to them. Further Linux kernel tables are queried to identify the network sockets that were opened by each service.

The Host sFlow agent then attaches an additional record to exported packet samples to indicate the services generating or consuming the packets:
/* Traffic source/sink entity reference */
/* opaque = flow_data; enterprise = 0; format = 2210 */
/* Set Data source to all zeroes if unknown */
struct extended_entities {
 sflow_data_source_expanded src_ds;    /* Data Source associated with
                                          packet source */
 sflow_data_source_expanded dst_ds;    /* Data Source associated with
                                          packet destination */
}
Note: The data source references point to the performance metrics exported by the systemd module, see Monitoring Linux services.
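A collector could decode the body of this record along the following lines. This is a sketch only: it assumes that sflow_data_source_expanded is a pair of 32-bit unsigned integers (source type, then source index) as in the sFlow version 5 specification, that fields are encoded in network byte order, and that a "type:index" string is a convenient way to print the result. The function name and example values are made up:

```python
import struct

def decode_extended_entities(payload):
    """Decode an extended_entities record body into (src_ds, dst_ds)."""
    src_type, src_index, dst_type, dst_index = struct.unpack(">IIII", payload[:16])

    def data_source(ds_type, ds_index):
        # all zeroes means the data source is unknown
        if ds_type == 0 and ds_index == 0:
            return None
        return "%d:%d" % (ds_type, ds_index)

    return data_source(src_type, src_index), data_source(dst_type, dst_index)

# Made-up example: known source entity, unknown destination entity
payload = struct.pack(">IIII", 2, 100005, 0, 0)
print(decode_extended_entities(payload))
```

The decoded data source is the key that lets an analyzer join a sampled packet with the counters exported for the corresponding service.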

Finally, enabling the tcp module adds delay, retransmit, loss, and reordering information to the sampled packet, see Network performance monitoring.
The screen capture above shows network traffic colored by service name. The chart colors traffic associated with the httpd.service in blue, remote login traffic associated with the sshd.service in red, BGP traffic associated with the bird.service in gold, and traffic to the inmsfd.service in green.

The chart was generated using the open source Flow Trend application running on the sFlow-RT real-time analytics platform. The chart is the result of the following flow definition:
host:[or:dssource:dsdestination]:vir_host_name
The host: function is used to join information from the sampled flow with telemetry reported for each of the services. Additional keys can be added to the flow definition to break out the traffic by network addresses, quality of service, or any of the many properties reported by sFlow, see Defining Flows for additional information.

Networking on the host has been referred to as the "Goldilocks Zone" because the host provides context that is unavailable in network switches and routers. The sFlow standard defines measurements that network, host, and application entities send in a continuous telemetry stream to analytics software that can combine the data to provide a comprehensive end-to-end view of activity, see sFlow Host Structures.

Tuesday, September 25, 2018

Microsoft Office 365

Office 365 IP Address and URL Web service describes a simple REST API that can be used to query for the IP address ranges associated with Microsoft Office 365 servers.

This information is extremely useful, allowing traffic analytics software to combine telemetry obtained from network devices with information obtained using the Microsoft REST API in order to identify the clients, links, and devices carrying the traffic, as well as any issues, such as link errors and congestion, that may be impacting performance.
The sFlow-RT analytics engine is programmable and includes a REST client that can be used to query the Microsoft API and combine the information with industry standard sFlow telemetry from network devices. The following script, office365.js, provides a simple example:
var api = 'https://endpoints.office.com/endpoints/worldwide';

function uuidv4() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
    var r = Math.random() * 16 | 0, v = c == 'x' ? r : (r & 0x3 | 0x8);
    return v.toString(16);
  });
}

var reqid = uuidv4();

function updateAddressMap() {
  var res, i, ips, id, groups;
  try { res = http(api+'?clientrequestid='+reqid); }
  catch(e) { logWarning('request failed ' + e); }
  if(res == null) return;
  res = JSON.parse(res);
  groups = {};
  for(i = 0; i < res.length; i++) {
    ips = res[i].ips;
    id = res[i].id;
    if(ips && id) groups[id] = ips;
  }
  setGroups('o365',groups); 
}

updateAddressMap();

setIntervalHandler(function() {
  updateAddressMap(); 
},60*60*24);
Note: See Writing Applications for an introduction to sFlow-RT's scripting API.
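To make the address mapping concrete, the Python 3 sketch below builds the same id to address range table from an already parsed API response and then looks up an address in it. It is an illustration of the idea rather than a description of how sFlow-RT matches groups internally: the classify function returns the first matching group, and the sample entries use documentation address ranges, not real Office 365 prefixes:

```python
import ipaddress

def build_groups(endpoints):
    """Map endpoint set id -> list of CIDR strings, as office365.js does."""
    groups = {}
    for entry in endpoints:
        ips = entry.get("ips")
        ident = entry.get("id")
        if ips and ident:
            groups[ident] = ips
    return groups

def classify(address, groups):
    """Return the id of the first group containing address, else None."""
    ip = ipaddress.ip_address(address)
    for ident, cidrs in groups.items():
        for cidr in cidrs:
            net = ipaddress.ip_network(cidr, strict=False)
            if ip.version == net.version and ip in net:
                return ident
    return None

# Made-up entries in the shape the script expects (an id plus an optional ips list)
sample = [
    {"id": 1, "ips": ["192.0.2.0/24", "2001:db8::/32"]},
    {"id": 2, "urls": ["example.invalid"]},
    {"id": 3, "ips": ["198.51.100.0/25"]},
]
groups = build_groups(sample)
print(classify("192.0.2.10", groups), classify("198.51.100.200", groups))
```

Entries without an ips list, such as URL-only endpoint sets, are skipped, which matches the check in updateAddressMap().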

The chart at the top of this page demonstrates how the address information can be used.  The screen capture shows real-time, up to the second, traffic flowing from Microsoft Office servers to local hosts. The open source Flow Trend application shown is easily launched using Docker:
docker run -v $PWD/office365.js:/sflow-rt/office365.js \
-e "RTPROP=-Dscript.file=office365.js" -p 6343:6343/udp -p 8008:8008 \
sflow/flow-trend
The application web interface is accessed on port 8008.

Type the following expression in the Keys: field to define the flow:
group:ipsource:o365,ipdestination
Note: See Defining Flows for details.

The sFlow-RT engine can be programmed to use the classified flow information in a variety of ways: pushing control actions to orchestration tools (e.g. OpenStack, Mesos, Docker Swarm, etc.) or SDN controllers (e.g. OpenDaylight, ONOS, Faucet, etc.), generating metrics for DevOps tools (e.g. InfluxDB, Prometheus, etc.), and reporting policy violations to an SIEM tool (e.g. Splunk, Logstash, etc.).

Thursday, August 30, 2018

Northbound Networks Zodiac GX

Mininet is widely used to emulate software defined networks (SDNs). Mininet flow analytics describes how standard sFlow telemetry, from the Open vSwitch instances used by Mininet to emulate the network, provides feedback to an SDN controller, allowing the controller to adapt the network to changing traffic, for example, to mitigate a distributed denial of service (DDoS) attack.

Northbound Networks Zodiac GX is an inexpensive open source software based switch that is ideal for experimenting with software defined networking (SDN) in a physical network setting. The small fanless package makes the switch an attractive option for desktop use. The Zodiac GX is also based on Open vSwitch, making it easy to take SDN control strategies developed on Mininet and apply them to a physical network.
Enabling sFlow on the Zodiac GX is easy: navigate to the System>Startup page and add the following line to the end of the startup script (before the exit 0 line):
ovs-vsctl -- --id=@sflow create sflow agent=$OVS_BR target=$IP_CONTROLLER_1 sampling=100 polling=10 -- set bridge $OVS_BR sflow=@sflow
Reboot the switch for the changes to take effect.

Use sflowtool to verify that sFlow is arriving at the controller host and to examine the contents of the telemetry stream. Running sflowtool using Docker is a simple alternative to building the software from source:
docker run --rm -p 8008:8008 -p 6343:6343/udp sflow/sflowtool
The text output from sflowtool can be piped into scripts to perform basic sFlow analysis.
A graphical sFlow analyzer performs the analysis tasks for you. The screen shot above shows sFlowTrend, a free sFlow analyzer that displays traffic trends. The software can be downloaded and installed or run using Docker:
docker run --rm -p 6343:6343/udp -p 8087:8087 -p 8443:8443 sflow/sflowtrend
The sFlowTrend charts update every minute. This is generally fast enough for human consumption, but real-time, up to the second, visibility is critical for SDN use cases.
The screen shot from Flow Trend shows an up to the second view of traffic. The spike in traffic is due to a 4K video being streamed from YouTube. The following command runs the software:
docker run --rm -p 6343:6343/udp -p 8008:8008 sflow/flow-trend
Flow Trend is an application running on the sFlow-RT real-time analytics platform.
Applications running on the sFlow-RT platform deliver real-time visibility to SDN, DevOps and Orchestration stacks, enabling new classes of performance aware applications such as load balancing, DDoS mitigation, and workload placement.

RYU provides a framework that can be used to develop SDN applications in Python. For example, the following command runs the simple learning bridge application that ships with RYU:
docker run -it --rm -p 6633:6633 osrg/ryu ryu-manager --verbose ryu/ryu/app/simple_switch_13.py
As soon as the switch connects to the controller, you should see a flurry of events as the controller programs flows on the Zodiac GX switch.

Faucet is an SDN controller for production networks implemented using RYU. Before we can use Faucet, we need to gather basic OpenFlow information from the switch.
$ ssh -t admin@10.0.0.230 "sudo ovs-ofctl show ovslan"
admin@10.0.0.230's password: 
Password: 
OFPT_FEATURES_REPLY (xid=0x2): dpid:000044d1fa6291b2
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(eth0.1): addr:44:d1:fa:62:91:b2
     config:     0
     state:      STP_FORWARD
     current:    1GB-FD AUTO_NEG
     speed: 1000 Mbps now, 0 Mbps max
 2(eth0.2): addr:44:d1:fa:62:91:b2
     config:     0
     state:      STP_FORWARD
     current:    1GB-FD AUTO_NEG
     speed: 1000 Mbps now, 0 Mbps max
 3(eth0.3): addr:44:d1:fa:62:91:b2
     config:     0
     state:      STP_FORWARD
     current:    1GB-FD AUTO_NEG
     speed: 1000 Mbps now, 0 Mbps max
 4(eth0.4): addr:44:d1:fa:62:91:b2
     config:     0
     state:      STP_FORWARD
     current:    1GB-FD AUTO_NEG
     speed: 1000 Mbps now, 0 Mbps max
 5(eth0.5): addr:44:d1:fa:62:91:b2
     config:     0
     state:      STP_FORWARD
     current:    1GB-FD AUTO_NEG
     speed: 1000 Mbps now, 0 Mbps max
 LOCAL(ovslan): addr:44:d1:fa:62:91:b2
     config:     0
     state:      0
     current:    1GB-FD AUTO_NEG
     speed: 1000 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
Connection to 10.0.0.230 closed.
Update August 30, 2018: The only piece of information needed to construct the faucet config file below is the switch dpid. The Zodiac GX uses the MAC address of the switch as the dpid, so you can simply read the MAC address printed on a label on the bottom of the switch.
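If you are reading the MAC address from the label, it needs to be padded to a 64-bit value to match the dpid reported by ovs-ofctl above. A small helper, with a name chosen here for illustration, shows the conversion:

```python
def mac_to_dp_id(mac):
    """Convert a MAC address string to a 64-bit datapath id in hex."""
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12:
        raise ValueError("expected a 48-bit MAC address: %r" % mac)
    int(digits, 16)  # raises ValueError if the digits are not hexadecimal
    # pad the 48-bit MAC address with leading zeroes to 64 bits
    return "0x" + digits.rjust(16, "0")

print(mac_to_dp_id("44:d1:fa:62:91:b2"))   # prints 0x000044d1fa6291b2
```

The result matches the dpid shown in the OFPT_FEATURES_REPLY line of the ovs-ofctl output.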
Now create a directory called faucet that contains the initial Faucet configuration file, faucet.yaml:
vlans:
    office:
        vid: 100
        description: "office network"

dps:
    zodiac:
        dp_id: 0x000044d1fa6291b2
        hardware: "ZodiacGX"
        interfaces:
            1:
                name: "eth0.1"
                description: "port1"
                native_vlan: office
            2:
                name: "eth0.2"
                description: "port2"
                native_vlan: office
            3:
                name: "eth0.3"
                description: "port3"
                native_vlan: office
            4:
                name: "eth0.4"
                description: "port4"
                native_vlan: office
            5:
                name: "eth0.5"
                description: "port5"
                native_vlan: office
            0xfffffffe:
                name: "ovslan"
                description: "local"
                native_vlan: office
Now run Faucet:
docker run -it --rm -v $PWD/faucet/:/etc/faucet/ -v $PWD/faucet/:/var/log/faucet/ -p 6633:6653 -p 9302:9302 faucet/faucet
As soon as the switch connects to the controller, you should see events logged to the faucet.log file in the same directory as the faucet.yaml configuration file.

Faucet Documentation describes how to extend the configuration to include firewall, routing, segmentation, and network function virtualization (NFV) rules.

The next step is integrating sFlow analytics with the controller. Writing Applications describes how to write sFlow-RT applications using the REST API and the embedded JavaScript API. The document includes Python examples that could be embedded in RYU controller applications. Alternatively, sFlow-RT's embedded HTTP client can be used to push control actions to an SDN controller, see ONOS measurement based control for an example.

The sFlow telemetry stream contains detailed Open vSwitch performance metrics in addition to flow and interface counter data. The sFlow-RT analytics pipeline can be programmed to generate and push statistics to time series databases and dashboards, see Prometheus and Grafana and InfluxDB and Grafana.

Exporting events using syslog describes how sFlow-RT can be programmed to detect and report on traffic anomalies, sending events to Security Information and Event Management (SIEM) tools, using Splunk and Logstash as examples.

The sflow/sflow-rt Docker image provides a convenient means of developing and deploying sFlow-RT applications alongside the SDN controllers demonstrated in this article.

An important benefit of sFlow telemetry is that it decouples monitoring from the control plane. You are free to change SDN controllers, use distributed routing / switching protocols, move between network operating systems, or build your own control plane while maintaining the same level of visibility. Industry standard sFlow is widely supported by vendors, including: A10, Aerohive, ALUe, Allied Telesis, Arista, Aruba, Big Switch, Broadcom, Cisco, Cumulus, Dell, D-Link, Edge-Core, Extreme, F5, Fortinet, Huawei, IP Infusion, Juniper, Mellanox, Netgear, OpenSwitch, Pica8, Proxim, Quanta, SMC, ZTE, and ZyXEL.

Monday, August 20, 2018

RDMA over Converged Ethernet (RoCE)

RDMA over Converged Ethernet is a network protocol that allows remote direct memory access (RDMA) over an Ethernet network. One of the benefits of running RDMA over Ethernet is the visibility provided by standard sFlow instrumentation embedded in the commodity Ethernet switches used to build data center leaf and spine networks where RDMA is most prevalent.

The sFlow telemetry stream includes packet headers, sampled at line rate by the switch hardware. Hardware packet sampling allows the switch to monitor traffic at line rate on all ports, keeping up with the high speed data transfers associated with RoCE.

The diagram above shows the packet headers associated with RoCEv1 and RoCEv2 packets. Decoding the InfiniBand Global Routing Header (IB GRH) and InfiniBand Base Transport Header (IB BTH) allows an sFlow analyzer to report in detail on RoCE traffic.
The sFlow-RT real-time analytics engine recently added support for RoCE by decoding InfiniBand Global Routing and InfiniBand Base Transport fields. The screen capture of the sFlow-RT Flow-Trend application shows traffic associated with a RoCEv2 connection between two hosts, 10.10.2.22 and 10.10.2.52. The traffic consists of SEND and ACK messages exchanged as part of a reliable connection (RC).
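
To make the decoding step more concrete, the following JavaScript sketch (runnable with Node.js) extracts the main fields from the 12-byte InfiniBand Base Transport Header that follows the UDP header in a RoCEv2 packet (UDP destination port 4791). It is an illustration only and is not taken from sFlow-RT: the function name decodeBTH, the partial operation table, and the sample bytes are all hypothetical.

```javascript
// Transport type is carried in the top 3 bits of the opcode byte.
const TRANSPORTS = ['RC', 'UC', 'RD', 'UD'];
// Partial table of operations (low 5 bits of the opcode), enough for SEND/ACK.
const OPERATIONS = {
  0: 'SEND First', 1: 'SEND Middle', 2: 'SEND Last', 4: 'SEND Only',
  17: 'Acknowledge'
};

// Decode a 12-byte Base Transport Header from an array of byte values.
function decodeBTH(bytes) {
  if (bytes.length < 12) throw new Error('BTH is 12 bytes long');
  const opcode = bytes[0];
  return {
    transport: TRANSPORTS[opcode >> 5] || 'other',
    operation: OPERATIONS[opcode & 0x1f] || 'op ' + (opcode & 0x1f),
    partitionKey: (bytes[2] << 8) | bytes[3],
    destQP: (bytes[5] << 16) | (bytes[6] << 8) | bytes[7],
    ackRequest: (bytes[8] & 0x80) !== 0,
    psn: (bytes[9] << 16) | (bytes[10] << 8) | bytes[11]
  };
}

// Hypothetical header bytes: an RC SEND Only message to queue pair 18, PSN 42.
const sendOnly = decodeBTH(
  Uint8Array.from([0x04, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x12, 0x80, 0x00, 0x00, 0x2a])
);
console.log(sendOnly.transport, sendOnly.operation, sendOnly.destQP, sendOnly.psn);
```

An analyzer could use fields like these as flow keys, for example to separate SEND traffic from acknowledgements on a reliable connection.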

The standard sFlow instrumentation provides comprehensive network wide visibility into RoCE and all other applications sharing the network resources. Real-time visibility is an essential part of automating networks, providing the feedback needed to ensure that resources are efficiently allocated and rapidly identifying overloaded resources so that remediation action can be taken before significant service degradation occurs.

Thursday, July 19, 2018

ExtremeXOS 22.5.1 adds support for Broadcom ASIC table utilization statistics

ExtremeXOS 22.5.1 is now available! describes added support in sFlow for "New data structures to support reporting on hardware/table utilization statistics." The feature is available on Summit X450-G2, X460-G2, X670-G2, X770, and ExtremeSwitching X440-G2, X870, X620, X690 series switches.

Figure 1 shows the packet processing pipeline of a Broadcom ASIC. The pipeline consists of a number of linked hardware tables providing bridging, routing, access control list (ACL), and ECMP forwarding group functions. Operations teams need to be able to proactively monitor table utilizations in order to avoid performance problems associated with table exhaustion.

Broadcom's sFlow specification, sFlow Broadcom Switch ASIC Table Utilization Structures, leverages the industry standard sFlow protocol to offer scalable, multi-vendor, network wide visibility into the utilization of these hardware tables.

The following output from the open source sflowtool command line utility shows the raw table measurements (this is in addition to the extensive set of measurements already exported via sFlow by ExtremeXOS):
bcm_asic_host_entries 4
bcm_host_entries_max 8192
bcm_ipv4_entries 0
bcm_ipv4_entries_max 0
bcm_ipv6_entries 0
bcm_ipv6_entries_max 0
bcm_ipv4_ipv6_entries 9
bcm_ipv4_ipv6_entries_max 16284
bcm_long_ipv6_entries 3
bcm_long_ipv6_entries_max 256
bcm_total_routes 10
bcm_total_routes_max 32768
bcm_ecmp_nexthops 0
bcm_ecmp_nexthops_max 2016
bcm_mac_entries 3
bcm_mac_entries_max 32768
bcm_ipv4_neighbors 4
bcm_ipv6_neighbors 0
bcm_ipv4_routes 0
bcm_ipv6_routes 0
bcm_acl_ingress_entries 842
bcm_acl_ingress_entries_max 4096
bcm_acl_ingress_counters 68
bcm_acl_ingress_counters_max 4096
bcm_acl_ingress_meters 18
bcm_acl_ingress_meters_max 8192
bcm_acl_ingress_slices 3
bcm_acl_ingress_slices_max 8
bcm_acl_egress_entries 36
bcm_acl_egress_entries_max 512
bcm_acl_egress_counters 36
bcm_acl_egress_counters_max 1024
bcm_acl_egress_meters 18
bcm_acl_egress_meters_max 512
bcm_acl_egress_slices 2
bcm_acl_egress_slices_max 2
The sflowtool output is useful for troubleshooting and is easy to parse with scripts.
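
As a rough illustration of such a script, the JavaScript sketch below (runnable with Node.js) parses name/value lines in the format shown above and computes the utilization of each table that has a matching _max counter. The function names parseCounters and tableUtilization are hypothetical, and pairing counters by the _max suffix is an assumption based on the naming in this output; counters without a matching pair are ignored.

```javascript
// Parse "name value" lines, as printed by sflowtool, into an object of numbers.
function parseCounters(text) {
  const counters = {};
  for (const line of text.split('\n')) {
    const [name, value] = line.trim().split(/\s+/);
    if (name && value !== undefined && !Number.isNaN(Number(value))) {
      counters[name] = Number(value);
    }
  }
  return counters;
}

// Pair each "<name>" counter with its "<name>_max" counter and report the
// utilization as a percentage. Tables with a zero maximum are skipped.
function tableUtilization(counters) {
  const result = {};
  for (const [name, max] of Object.entries(counters)) {
    if (!name.endsWith('_max') || max === 0) continue;
    const base = name.slice(0, -'_max'.length);
    if (base in counters) result[base] = (100 * counters[base]) / max;
  }
  return result;
}

// A few lines copied from the output above.
const sample = [
  'bcm_mac_entries 3',
  'bcm_mac_entries_max 32768',
  'bcm_acl_ingress_entries 842',
  'bcm_acl_ingress_entries_max 4096',
  'bcm_acl_egress_slices 2',
  'bcm_acl_egress_slices_max 2',
  'bcm_ipv4_entries 0',
  'bcm_ipv4_entries_max 0'
].join('\n');

const utilization = tableUtilization(parseCounters(sample));
for (const [table, pct] of Object.entries(utilization)) {
  console.log(table + ' ' + pct.toFixed(1) + '%');
}
```

For the values shown, the ingress ACL table is roughly 20.6% full, while the egress ACL slices are fully used, which is the kind of condition an operations team would want to be alerted to.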

A convenient way to run sflowtool is to use Docker:
docker run -p 6343:6343/udp sflow/sflowtool

Ethernet Fabric Visibility

Ethernet Fabrics: Extreme Networks ExtremeFabric
Leaf and spine fabrics are challenging to monitor. The fabric spreads traffic across all the switches and links in order to maximize bandwidth. Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.

Fabric View solves the visibility challenge by using the industry standard sFlow instrumentation built into data center switches. Fabric View represents the fabric as if it were a single large chassis switch, treating each leaf switch as a line card and the spine switches as the backplane. The result is an intuitive tool that is easily understood by anyone familiar with traditional networks.

A demonstration can be run using Docker:
docker run --entrypoint /sflow-rt/run_demo.sh -p 8008:8008 sflow/fabric-view
Access the web interface on port 8008.
The first chart shows the largest TCP/UDP flows traversing the fabric (calculated from a continuous stream of packet samples received from all the switches in the fabric). The chart updates every second, providing a real-time view of traffic crossing the fabric.
The last two charts are based on the hardware/table utilization statistics that are now implemented in ExtremeXOS, trending the maximum utilization of each table across all the switches in the fabric.
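
The idea of trending the maximum utilization across the fabric can be sketched in a few lines of JavaScript. This is only an illustration of the aggregation, not Fabric View's code: the function name maxUtilization, the switch names, and the utilization values are hypothetical.

```javascript
// Given per-switch utilization percentages, keep the highest value per table.
function maxUtilization(bySwitch) {
  const max = {};
  for (const tables of Object.values(bySwitch)) {
    for (const [table, pct] of Object.entries(tables)) {
      if (!(table in max) || pct > max[table]) max[table] = pct;
    }
  }
  return max;
}

// Hypothetical switches and values, for illustration only.
const fabric = {
  leaf1: { mac: 12.5, acl_ingress: 20.6 },
  leaf2: { mac: 40.0, acl_ingress: 3.1 },
  spine1: { mac: 1.2, acl_ingress: 0.0 }
};
console.log(maxUtilization(fabric));
```

Reporting the maximum rather than the average means a single switch approaching table exhaustion is visible even when the rest of the fabric is lightly loaded.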

sFlow-RT

Fabric View is one of a number of applications developed for sFlow-RT. Other examples, described in articles on this blog, include DDoS mitigation and Internet routing using top of rack switches.
The sFlow-RT analytics engine receives a continuous telemetry stream from sFlow Agents embedded in network devices, hosts and applications and converts it into actionable metrics, accessible through APIs. Applications can be external, written in any language that supports HTTP/REST calls, or internal, using sFlow-RT's embedded JavaScript/ECMAScript.

Monday, July 16, 2018

Visualizing real-time network traffic flows at scale

Particle has been released on GitHub, https://github.com/sflow-rt/particle. The application is a real-time visualization of network traffic in which particles flow between hosts arranged around the edges of the screen. Particle colors represent different types of traffic.

Particles provide an intuitive representation of network packets transiting the network from source to destination. The animation slows time so that the particle takes 10 seconds (instead of milliseconds) to transit the network. Groups of particles traveling the same path represent flows of packets between the hosts. Particle size and frequency are used to indicate the intensity of the traffic flowing on a path.

Particles don't follow straight lines, instead following quadratic Bézier curves around the center of the screen. Warping particle paths toward the center of the screen ensures that all paths are of similar length and visible - even if the start and end points are on the same axis.
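
The effect of warping paths toward the center can be seen with a small JavaScript sketch of a quadratic Bézier curve. This is not Particle's source code: the function name bezierPoint, the normalized 0 to 1 screen coordinates, and the choice of the exact screen center as the control point are assumptions made for illustration.

```javascript
// Point on a quadratic Bezier curve at parameter t (0 = start, 1 = end):
// B(t) = (1-t)^2 * P0 + 2(1-t)t * P1 + t^2 * P2
function bezierPoint(p0, p1, p2, t) {
  const a = (1 - t) * (1 - t);
  const b = 2 * (1 - t) * t;
  const c = t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x,
    y: a * p0.y + b * p1.y + c * p2.y
  };
}

// Screen coordinates normalized to 0..1; control point assumed at the center.
const center = { x: 0.5, y: 0.5 };
const start = { x: 0, y: 0.2 }; // two hosts on the same (West) edge
const end = { x: 0, y: 0.8 };
const mid = bezierPoint(start, center, end, 0.5);
console.log(mid);
```

A straight line between these two hosts would run along the edge of the screen at x = 0. With the curve, the midpoint of the path sits a quarter of the way across the screen, so the flow remains visible.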

The example above is from a site with over 500 network switches carrying hundreds of Gigabits of traffic. Internet, Customer, Site and Datacenter hosts have been assigned to the North, East, South and West sides respectively.
The screen is updated 60 times per second for smooth animation. Active flow metrics are updated every second. Hovering over the screen freezes the animation, highlights the nearest particle, and displays details.

To try out the software, first create a configuration file to label axes and assign addresses for your network.
particle.axisN=Internet
particle.cidrN=0.0.0.0/0
particle.axisS=Site
particle.cidrS=10.1.1.0/24,10.1.2.0/24
particle.axisE=Datacenter
particle.cidrE=10.2.0.0/16
particle.axisW=Remote
particle.cidrW=10.3.0.0/16
The particle.conf file above provides an example.

The simplest way to run the software is to use the pre-built Docker image.
docker run -p 8008:8008 -p 6343:6343/udp \
-v $PWD/particle.conf:/sflow-rt/particle.conf \
-e "RTPROP=-Dsystem.propertyFiles=particle.conf" \
sflow/particle
Access the web interface on port 8008.
The Docker image also contains a random simulation of flows to demonstrate the software:
docker run -e "RTPROP=-Dparticle.demo=yes" \
-p 6343:6343/udp -p 8008:8008 sflow/particle
This particle visualization was inspired by experiments with Vizceral, see Real-time traffic visualization using Netflix Vizceral. Vizceral focuses on interactions between layered microservices.

Visualizing network traffic poses unique challenges that needed to be addressed. In these examples, the North (Internet) axis (0.0.0.0/0) represents over 4 billion hosts, a number far greater than the number of pixels available on the screen. Instead of trying to represent each host individually, hosts are assigned a position proportional to their location in the range. For example, host 128.0.0.0 is assigned a position half way along the axis. Assigning fixed positions to each host ensures that traffic between the hosts will always take the same path across the screen, making it easier to recognize patterns and identify changes.
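
The proportional placement can be sketched in JavaScript as follows. This is a minimal illustration for a single IPv4 CIDR per axis, not Particle's implementation: the function names ipToInt and axisPosition are hypothetical, and how Particle lays out an axis that has several CIDRs is not covered here.

```javascript
// Convert a dotted-quad IPv4 address to an unsigned integer.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Return the position of an address along an axis as a fraction between
// 0 (start of the CIDR range) and 1 (end), or null if it is outside the range.
function axisPosition(ip, cidr) {
  const [base, bits] = cidr.split('/');
  const size = Math.pow(2, 32 - Number(bits));
  const start = Math.floor(ipToInt(base) / size) * size;
  const offset = ipToInt(ip) - start;
  if (offset < 0 || offset >= size) return null;
  return offset / size;
}

console.log(axisPosition('128.0.0.0', '0.0.0.0/0'));
console.log(axisPosition('10.2.128.0', '10.2.0.0/16'));
console.log(axisPosition('10.3.0.1', '10.2.0.0/16'));
```

Because the position depends only on the address and the range, an address in the middle of a range, such as 128.0.0.0 in 0.0.0.0/0, always maps to the middle of its axis.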

Chances are you have network equipment that supports sFlow telemetry since the standard is widely supported by vendors, including: A10, Aerohive, ALUe, Allied Telesis, Arista, Aruba, Big Switch, Cisco, Cumulus, Dell, D-Link, Edge-Core, Extreme, F5, Fortinet, Huawei, IP Infusion, Juniper, Netgear, OpenSwitch, Pica8, Proxim, Quanta, SMC, ZTE, and ZyXEL. Give Particle a try and see how traffic flows on your network.

Wednesday, July 11, 2018

sFlow available on Juniper PTX series routers


sFlow functionality introduced on the PTX1000 and PTX10000 platforms—Starting in Junos OS Release 18.2R1, the PTX1000 and PTX10000 routers support sFlow, a network monitoring protocol for high-speed networks. With sFlow, you can continuously monitor tens of thousands of ports simultaneously. The mechanism used by sFlow is simple, not resource intensive, and accurate.  - New and Changed Features

The recent article, sFlow available on Juniper MX series routers, describes how Juniper is extending sFlow support to include routers to provide visibility across their entire range of switching and routing products.

Universal support for industry standard sFlow as a base Junos feature reduces the operational complexity and cost of network visibility for enterprises and service providers. Real-time streaming telemetry from campus switches, routers, and data center switches, provides centralized, real-time, end-to-end visibility needed to troubleshoot, optimize, and account for network usage.

Analytics software is a critical factor in realizing the full benefits of sFlow monitoring. Choosing an sFlow analyzer discusses important factors to consider when selecting from the range of open source and commercial sFlow analysis tools.

Monday, April 9, 2018

SDKLT

Logical Table Software Development Kit (SDKLT) is a new, powerful, and feature rich Software Development Kit (SDK) for Broadcom switches. SDKLT provides a new approach to switch configuration using Logical Tables.

Building the Demo App describes how to get started using a simulated Tomahawk device. Included is a CLI that can be used to explore tables. For example, the following CLI output shows the attributes of the sFlow packet sampling table:
BCMLT.0> lt list -d MIRROR_PORT_ENCAP_SFLOW
MIRROR_PORT_ENCAP_SFLOW
  Description: The MIRROR_PORT_ENCAP_SFLOW logical table is used to specify
               per-port sFlow encapsulation sample configuration.
  11 fields (1 key-type field):
    SAMPLE_ING_FLEX_RATE
        Description: Sample ingress flex sFlow packet if the generated sFlow random
                     number is greater than the threshold. A lower threshold leads to
                     higher sampling frequency.
    SAMPLE_EGR_RATE
        Description: Sample egress sFlow packet if the generated sFlow random number is
                     greater than the threshold. A lower threshold leads to
                     higher sampling frequency.
    SAMPLE_ING_RATE
        Description: Sample ingress sFlow packet if the generated sFlow random number is
                     greater than the threshold. A lower threshold leads to
                     higher sampling frequency.
    SAMPLE_ING_FLEX_MIRROR_INSTANCE
        Description: Enable to copy ingress flex sFlow packet samples to the ingress
                     mirror member using the sFlow mirror instance configuration.
    SAMPLE_ING_FLEX_CPU
        Description: Enable to copy ingress flex sFlow packet samples to CPU.
    SAMPLE_ING_MIRROR_INSTANCE
        Description: Enable to copy ingress sFlow packet samples to the ingress
                     mirror member using the sFlow mirror instance configuration.
    SAMPLE_ING_CPU
        Description: Enable to copy ingress sFlow packet samples to CPU.
    SAMPLE_ING_FLEX
        Description: Enable to sample ingress port-based flex sFlow packets.
    SAMPLE_EGR
        Description: Enable to sample egress port-based sFlow packets.
    SAMPLE_ING
        Description: Enable to sample ingress port-based sFlow packets.
    PORT_ID
        Description: Logical port ID.
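
The relationship between the threshold fields described above and the familiar 1-in-N sampling rate can be illustrated with a short JavaScript sketch. The sketch assumes the random number is uniformly distributed over 0 to 2^bits - 1 and that a packet is sampled when the number is strictly greater than the threshold. The width of the random number is not given in the CLI output, so bits is a hypothetical parameter, and the function names are also hypothetical.

```javascript
// ASSUMPTION: the random number is uniform over 0 .. 2^bits - 1.
// Probability that a packet is sampled, i.e. that random > threshold.
function samplingProbability(threshold, bits) {
  const range = Math.pow(2, bits);
  // Values strictly greater than threshold: threshold + 1 .. range - 1
  return (range - 1 - threshold) / range;
}

// Threshold that gives an average sampling rate of approximately 1-in-n.
function thresholdForRate(n, bits) {
  const range = Math.pow(2, bits);
  return Math.max(0, Math.round(range * (1 - 1 / n)) - 1);
}

// With a hypothetical 16-bit random number, compare two sampling rates.
const t1024 = thresholdForRate(1024, 16);
const t2 = thresholdForRate(2, 16);
console.log(t1024, samplingProbability(t1024, 16));
console.log(t2, samplingProbability(t2, 16));
```

Under these assumptions the 1-in-2 threshold is lower than the 1-in-1024 threshold, consistent with the description that a lower threshold leads to a higher sampling frequency.
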
SDKLT is a part of the OpenNSL suite, which enables the development of open network operating system projects, including: Open Network Linux, OpenSwitch, and SONiC.
The network operating system bridges the gap between applications (BGP, SNMP, sFlow, etc.) and the low level hardware capabilities accessed through the SDK. For example, OpenSwitch describes how the open source Host sFlow agent uses Control Plane Services (CPS) and Open Compute Project (OCP) Switch Abstraction Interface (SAI) to configure hardware packet sampling via vendor specific SDKs (such as OpenNSL).

Friday, April 6, 2018

sFlow available on Juniper MX series routers

sFlow support on MX Series devices—Starting in Junos OS Release 18.1R1, you can configure sFlow technology (as a sFlow agent) on a MX Series device, to continuously monitor traffic at wire speed on all interfaces simultaneously. The sFlow technology is a monitoring technology for high-speed switched or routed networks.  - New and Changed Features

Understanding How to Use sFlow Technology for Network Monitoring on a MX Series Router lists the following benefits of sFlow Technology on a MX Series Router:
  • sFlow can be used by software tools like a network analyzer to continuously monitor tens of thousands of switch or router ports simultaneously.
  • Since sFlow uses network sampling (forwarding one packet from ‘n’ number of total packets) for analysis, it is not resource intensive (for example processing, memory and more). The sampling is done at the hardware application-specific integrated circuits (ASICs) and hence it is simple and more accurate.
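
The way an analyzer turns 1-in-n packet samples into traffic estimates can be sketched in JavaScript. The scaling step follows directly from the sampling rate, and the error bound is the commonly cited sFlow packet sampling approximation (percentage error at 95% confidence of at most 196 * sqrt(1/c) for c samples). The function names and the example numbers are hypothetical.

```javascript
// Scale the number of samples by the sampling rate to estimate total frames.
function estimateFrames(samples, samplingRate) {
  return samples * samplingRate;
}

// Approximate upper bound on the percentage error (95% confidence) of an
// estimate based on the given number of samples.
function percentError(samples) {
  return 196 * Math.sqrt(1 / samples);
}

// Hypothetical example: 10,000 samples collected at a 1-in-1,000 sampling rate.
console.log(estimateFrames(10000, 1000));
console.log(percentError(10000).toFixed(2) + '%');
```

In this example the estimate is 10 million frames with an error of under 2%. The accuracy depends on the number of samples collected rather than on the total amount of traffic, which is why sampling scales to high-speed interfaces.
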
With the addition of the MX series routers, Juniper now supports sFlow across its entire product range:
Universal support for industry standard sFlow as a base Junos feature reduces the operational complexity and cost of network visibility for enterprises and service providers. Real-time streaming telemetry from campus switches, routers, and data center switches, provides centralized, real-time, end-to-end visibility needed to troubleshoot, optimize, and account for network usage.
Analytics software is a critical factor in realizing the full benefits of sFlow monitoring. Choosing an sFlow analyzer discusses important factors to consider when selecting from the range of open source and commercial sFlow analysis tools.

Thursday, April 5, 2018

ONOS measurement based control

ONOS traffic analytics describes how to run the ONOS SDN controller with a virtual network created using Mininet. The article also showed how to monitor network traffic using industry standard sFlow instrumentation available in Mininet and in physical switches.
This article uses the same ONOS / Mininet test bed to demonstrate how sFlow-RT real-time flow analytics can be used to push controls to the network through the ONOS REST API.  Leaf and spine traffic engineering using segment routing and SDN used real-time flow analytics to load balance an ONOS controlled physical network. In this example, we will use ONOS to filter DDoS attack traffic on a Mininet virtual network.

The following sFlow-RT script, ddos.js, detects DDoS attacks and programs ONOS filter rules to block the attacks:
var user = 'onos';
var password = 'rocks';
var onos = '192.168.123.1';
var controls = {};

setFlow('udp_reflection',
 {keys:'ipdestination,udpsourceport',value:'frames'});
setThreshold('udp_reflection_attack',
 {metric:'udp_reflection',value:100,byFlow:true,timeout:2});

setEventHandler(function(evt) {
 // don't consider inter-switch links
 var link = topologyInterfaceToLink(evt.agent,evt.dataSource);
 if(link) return;

 // get port information
 var port = topologyInterfaceToPort(evt.agent,evt.dataSource);
 if(!port) return;

 // need OpenFlow info to create ONOS filtering rule
 if(!port.dpid || !port.ofport) return;

 // we already have a control for this flow
 if(controls[evt.flowKey]) return;

 var [ipdestination,udpsourceport] = evt.flowKey.split(',');
 var msg = {
  flows: [
   {
    priority:4000,
    timeout:0,
    isPermanent:true,
    deviceId:'of:'+port.dpid,
    treatment:[],
    selector: {
     criteria: [
      {type:'IN_PORT',port:port.ofport},
      {type:'ETH_TYPE',ethType:'0x800'},
      {type:'IPV4_DST',ip:ipdestination+'/32'},
      {type:'IP_PROTO',protocol:'17'},
      {type:'UDP_SRC',udpPort:udpsourceport} 
     ]
    }
   }
  ]
 };

 var resp = http2({
  url:'http://'+onos+':8181/onos/v1/flows?appId=ddos',
  headers:{'Content-Type':'application/json','Accept':'application/json'},
  operation:'post',
  user:user,
  password:password,
  body: JSON.stringify(msg)
 });

 var {deviceId,flowId} = JSON.parse(resp.body).flows[0];
 controls[evt.flowKey] = {
  time:Date.now(),
  threshold:evt.thresholdID,
  agent:evt.agent,
  metric:evt.dataSource+'.'+evt.metric,
  deviceId:deviceId,
  flowId:flowId
 };

 logInfo("blocking " + evt.flowKey);
},['udp_reflection_attack']);

setIntervalHandler(function() {
 var now = Date.now();
 for(var key in controls) {
   let rec = controls[key];

   // keep control for at least 10 seconds
   if(now - rec.time < 10000) continue;
   // keep control if threshold still triggered
   if(thresholdTriggered(rec.threshold,rec.agent,rec.metric,key)) continue;

   var resp = http2({
    url:'http://'+onos+':8181/onos/v1/flows/'
        +encodeURIComponent(rec.deviceId)+'/'+encodeURIComponent(rec.flowId),
    headers:{'Accept':'application/json'},
    operation:'delete',
    user:user,
    password:password
   });

   delete controls[key];

   logInfo("unblocking " + key);
 }
});
Some notes on the script:
  1. The ONOS REST API is used to add/remove filters that block the DDoS traffic.
  2. The controller address, 192.168.123.1, can be found on the ONOS Cluster Nodes web page.
  3. The udp_reflection flow definition is designed to detect UDP amplification attacks, e.g. DNS amplification attacks.
  4. Controls are applied to the switch port where traffic enters the network.
  5. The controls structure is used to keep track of state associated with deployed configuration changes so that they can be undone.
  6. The setIntervalHandler() function is used to automatically release controls once they are at least 10 seconds old and the threshold is no longer triggered. The timeout is short for the purposes of demonstration; in practical deployments the timeout would be much longer, measured in hours.
  7. For simplicity, this script is missing the error handling needed for production use. 
  8. See Writing Applications for more information.
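
As one example of the error handling mentioned in the notes, the response from ONOS could be validated before it is used. The following plain JavaScript sketch is an illustration rather than part of ddos.js: the function name parseFlowResponse and the sample response body are hypothetical, and the only assumption about the response format is the flows array with deviceId and flowId fields that the script above already relies on.

```javascript
// Extract the identifiers of the first flow from the body of an ONOS
// "add flows" response. Returns null instead of throwing if the body
// cannot be parsed or does not contain the expected fields.
function parseFlowResponse(body) {
  let parsed;
  try {
    parsed = JSON.parse(body);
  } catch (e) {
    return null;
  }
  const flow = parsed && Array.isArray(parsed.flows) ? parsed.flows[0] : null;
  if (!flow || !flow.deviceId || !flow.flowId) return null;
  return { deviceId: flow.deviceId, flowId: flow.flowId };
}

// Hypothetical response bodies, for illustration only.
const good = '{"flows":[{"deviceId":"of:0000000000000002","flowId":"54043196562728105"}]}';
console.log(parseFlowResponse(good));
console.log(parseFlowResponse('Service Unavailable'));
```

In the event handler, a null result could be used to skip recording the control, so that a failed request does not leave an entry in the controls structure that can never be removed.
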
We are going to use hping3 to simulate a DDoS attack, so install the software using the following command:
sudo apt install hping3
Run the following command to start sFlow-RT and run the ddos.js script:
env RTPROP=-Dscript.file=ddos.js ./start.sh
Next, start Mininet with ONOS:
sudo mn --custom ~/onos/tools/dev/mininet/onos.py,sflow-rt/extras/sflow.py \
--link tc,bw=10 --controller onos,1 --topo tree,2,2
Generate normal traffic between hosts h1 and h3:
mininet-onos> iperf h1 h3
The weathermap view above shows the flow crossing the network from switch s2 to s3 via s1.
Next, launch the simulated DNS amplification attack from h1 to h3:
mininet-onos> h1 hping3 --flood --udp -k -s 53 h3
The weathermap view verifies that the attack has been successfully blocked since none of the traffic is seen traversing the network.

The chart at the top of this article shows the iperf test followed by the simulated attack. The top chart shows the top flows entering the network, showing the DNS amplification attack traffic in blue. The middle chart shows traffic broken out by switch port. Here, the blue line shows the attack traffic arriving at switch s2 port s2-eth1 while the orange line shows that only a small amount of traffic is forwarded to switch s3 port s3-eth3 before the attack is blocked at switch s2 by the controller.

Mininet with ONOS and sFlow-RT is a great way to rapidly develop and test SDN applications, avoiding the time and expense involved in setting up a physical network. The application is easily moved from the Mininet virtual network to a physical network since it is based on the same industry standard sFlow telemetry generated by physical switches. In this case, the application uses commodity switch hardware to cost effectively detect and filter massive (100's of Gbit/s) DDoS attacks.

Wednesday, April 4, 2018

ONOS traffic analytics

Open Network Operating System (ONOS) is "a software defined networking (SDN) OS for service providers that has scalability, high availability, high performance, and abstractions to make it easy to create applications and services." The open source project is hosted by the Linux Foundation.

Mininet and onos.py workflow describes how to run ONOS using the Mininet network emulator. Mininet allows virtual networks to be quickly constructed and is a simple way to experiment with ONOS. In addition, Mininet flow analytics describes how to enable industry standard sFlow streaming telemetry in Mininet, providing a simple way to monitor traffic in the ONOS controlled network.

For example, the following command creates a Mininet network, controlled by ONOS, and monitored using sFlow:
sudo mn --custom ~/onos/tools/dev/mininet/onos.py,sflow-rt/extras/sflow.py \
--link tc,bw=10 --controller onos,1 --topo tree,2,2
The screen capture above shows the network topology in the ONOS web user interface.
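
The --topo tree,2,2 option requests a tree of depth 2 with a fanout of 2. The following JavaScript sketch enumerates the switches, hosts and links in such a tree to show what the command builds. It is an approximation written for illustration, not Mininet code: the function name treeTopology is hypothetical, and the depth-first numbering of nodes is an assumption intended to mirror the names seen in this article (s1 at the root, hosts h1 to h4 at the leaves).

```javascript
// Build a tree of switches with hosts at the leaves, numbering nodes in
// depth-first order.
function treeTopology(depth, fanout) {
  const switches = [];
  const hosts = [];
  const links = [];
  function addNode(level) {
    if (level === depth) {
      const host = 'h' + (hosts.length + 1);
      hosts.push(host);
      return host;
    }
    const sw = 's' + (switches.length + 1);
    switches.push(sw);
    for (let i = 0; i < fanout; i++) {
      const child = addNode(level + 1);
      links.push([sw, child]);
    }
    return sw;
  }
  addNode(0);
  return { switches, hosts, links };
}

const topo = treeTopology(2, 2);
console.log(topo.switches.join(' '));
console.log(topo.hosts.join(' '));
console.log(topo.links.map(l => l.join('-')).join(' '));
```

The result has three switches and four hosts, with h1 and h2 attached to s2 and h3 and h4 attached to s3, so traffic between h1 and h3 has to cross the root switch s1.
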
Install Mininet dashboard to visualize the network traffic. The screen capture above shows a large flow over the same topology being displayed by ONOS, see Mininet weathermap for more examples.

In this case, the traffic was created by the following Mininet command:
mininet-onos> iperf h1 h3
The screen capture above shows top flows, busiest switch ports, and the diameter of the network topology.


The Mininet dashboard is a simple application running on the sFlow-RT analytics platform. For a more realistic example, watch the demonstration of SDN leaf and spine traffic engineering recorded at the Open Networking Summit. In the demonstration, a redundant pair of ONOS controllers implement segment routing, using OpenFlow 1.3 to control an eight switch leaf and spine network of commodity switches. Real-time flow analytics drives the dashboards in the demonstration and triggers load balancing of flows across the fabric. Leaf and spine traffic engineering using segment routing and SDN provides a more detailed explanation.

Mininet with ONOS and sFlow-RT is a great way to rapidly develop and test SDN applications, avoiding the time and expense involved in setting up a physical network.

Tuesday, April 3, 2018

Real-time baseline anomaly detection

The screen capture demonstrates the real-time baseline and anomaly detection based on industry standard sFlow streaming telemetry. The chart was generated using sFlow-RT analytics software. The blue line is an up to the second measure of traffic (measured in Bits per Second). The red and gold lines represent dynamic upper and lower limits calculated by the baseline function. The baseline function flags "high" and "low" value anomalies when values move outside the limits. In this case, a "low" value anomaly was flagged for the drop in traffic shown in the chart.

Writing Applications provides a general introduction to sFlow-RT programming. The baseline functionality is exposed through the JavaScript API.

Create new baseline
baselineCreate(name,window,sensitivity,repeat);
Where:
  • name, name used to reference baseline.
  • window, the number of previous intervals to consider in calculating the limits.
  • sensitivity, the number of standard deviations used to calculate the limits.
  • repeat, the number of successive data points outside the limits before flagging an anomaly.
In this example, baseline parameter values were window=180 (seconds), sensitivity=2, and repeat=3.

Update baseline
var status = baselineCheck(name,value);
Where:
  • status, "learning" while baseline is warming up (takes window intervals),  "normal" if value is in expected range, "low" if value is exceptionally low, "high" if value is exceptionally high.
  • value, latest value to check against baseline
The baselineCheck function is called periodically to update baseline statistics and check for anomalies.

Query baseline statistics
var {mean,variance,sdev,min,max} = baselineStatistics(name);
Note: Statistics are only available once the baseline has exited the "learning" status.

Reset baseline
baselineReset(name);
Resets the statistics and sets state to "learning".

Delete baseline
baselineDelete(name);
Delete the baseline and free up associated resources.
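
To show how the window, sensitivity and repeat parameters interact, here is a minimal stand-alone JavaScript sketch of a temporal baseline. It is not the sFlow-RT implementation, whose internal algorithm is not described in this article. The function names createBaseline and checkBaseline are hypothetical, and the choices to use a simple sliding window and to leave out-of-range values out of the statistics are assumptions made to keep the example short.

```javascript
// Create baseline state: a sliding window of recent in-range values.
function createBaseline(window, sensitivity, repeat) {
  return { window, sensitivity, repeat, values: [], run: 0, last: 'normal' };
}

// Check a value against the baseline and update the baseline statistics.
function checkBaseline(b, value) {
  // Warm up until the window is full.
  if (b.values.length < b.window) {
    b.values.push(value);
    return 'learning';
  }
  const n = b.values.length;
  const mean = b.values.reduce((sum, v) => sum + v, 0) / n;
  const variance = b.values.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / n;
  const sdev = Math.sqrt(variance);

  // Limits are the mean plus or minus "sensitivity" standard deviations.
  let direction = 'normal';
  if (value > mean + b.sensitivity * sdev) direction = 'high';
  else if (value < mean - b.sensitivity * sdev) direction = 'low';

  if (direction === 'normal') {
    // In-range values update the window; out-of-range values do not.
    b.run = 0;
    b.values.push(value);
    b.values.shift();
  } else {
    // Count successive points outside the limits in the same direction.
    b.run = direction === b.last ? b.run + 1 : 1;
  }
  b.last = direction;
  return b.run >= b.repeat ? direction : 'normal';
}

// Small window so the example is easy to follow; values are hypothetical.
const baseline = createBaseline(5, 2, 3);
const statuses = [100, 102, 98, 101, 99, 100, 10, 10, 10].map(v => checkBaseline(baseline, v));
console.log(statuses.join(' '));
```

With these inputs the first five values are used for learning, the sixth is normal, and the drop to 10 is only reported as "low" on its third successive occurrence, which is how the repeat parameter suppresses isolated spikes.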

The sFlow-RT baseline functionality is designed to be resource efficient and to converge quickly so that large numbers of baselines can be created and updated for real-time anomaly detection.

The baseline functions work best when the variable being tracked represents the activity of a large population and is relatively stable. For example, WAN traffic is generally a good candidate for baselining since it is composed of the activity of many systems and users. On the other hand, individual host activity tends to be highly variable and not well suited to baseline monitoring.
The table from Baseline contrasts two methods of baseline calculation. The baseline functionality described in this article is an example of a temporal baseline. Cluster performance metrics describes how sFlow-RT can be used to calculate statistics from large populations of devices. These functions can be used for spatial baselining and anomaly detection, for example, by finding a virtual machine in a service pool that is behaving inconsistently when compared to its peers.
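
A spatial baseline can be sketched in plain JavaScript by comparing each member of a pool against the statistics of the pool as a whole. This is an illustration of the idea rather than sFlow-RT code: the function name findOutliers, the virtual machine names, and the CPU utilization values are hypothetical.

```javascript
// Return the names of pool members whose value is more than "sensitivity"
// standard deviations away from the mean of the pool.
function findOutliers(pool, sensitivity) {
  const names = Object.keys(pool);
  const mean = names.reduce((sum, n) => sum + pool[n], 0) / names.length;
  const variance = names.reduce((sum, n) => sum + (pool[n] - mean) ** 2, 0) / names.length;
  const sdev = Math.sqrt(variance);
  if (sdev === 0) return [];
  return names.filter(n => Math.abs(pool[n] - mean) > sensitivity * sdev);
}

// Hypothetical CPU utilization (%) for virtual machines in a service pool.
const cpu = { vm1: 20, vm2: 22, vm3: 19, vm4: 21, vm5: 20, vm6: 23, vm7: 18, vm8: 90 };
console.log(findOutliers(cpu, 2));
```

Here vm8 stands out from its peers even though no historical data is available, which is the main difference from the temporal baseline described in this article.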