Wednesday, May 1, 2013

Software defined analytics

Figure 1: Performance aware software defined networking
Software defined networking (SDN) separates the network Data Plane and Control Plane, permitting external software to monitor and control network resources. Open Southbound APIs like sFlow and OpenFlow are an essential part of this separation, connecting network devices to external controllers, which in turn present high level Open Northbound APIs to SDN applications.

This article demonstrates the architectural similarities between how OpenFlow and sFlow are configured and used within an SDN stack. Developers working in the SDN field are likely already familiar with configuring and using OpenFlow, and it is hoped that this comparison helps show how to incorporate sFlow measurement technology to create performance aware SDN solutions such as load balancing, DDoS protection and packet brokers.

OpenFlow and sFlow

In this example, Open vSwitch, Floodlight and sFlow-RT are used to demonstrate how switches are configured to use the OpenFlow and sFlow protocols to communicate with the centralized control plane. Next, representative Northbound REST API calls are used to illustrate how control plane software presents network wide visibility and control functionality to SDN applications.

1. Connect switches to control plane

Configure each switch to connect to the OpenFlow controller:
ovs-vsctl set-controller br0 tcp:10.0.0.1:6633
Similarly, configure each switch to send measurements to the sFlow analyzer:
ovs-vsctl -- --id=@sflow create sflow agent=eth0 target=\"10.0.0.1:6343\" sampling=1000 polling=20 -- set bridge br0 sflow=@sflow
2. REST APIs for network wide visibility and control

The following command uses the Floodlight static flow pusher API to set up a forwarding path:
curl -d '{"switch": "00:00:00:00:00:00:00:01", "name":"flow-mod-1", "cookie":"0", "priority":"32768", "ingressport":"1","active":"true", "actions":"output=2"}' http://10.0.0.1:8080/wm/core/staticflowentrypusher/json
The following command uses sFlow-RT's flow API to set up monitoring of TCP flows across all switches:
curl -H "Content-Type:application/json" -X PUT --data "{keys:'ipsource,ipdestination,tcpsourceport,tcpdestinationport', value:'bytes'}" http://10.0.0.1:8008/flow/tcp/json
Next, the following command finds the top TCP flow currently in progress anywhere in the network:
curl http://10.0.0.1:8008/metric/ALL/tcp/json
[{
 "agent": "10.0.0.30",
 "dataSource": "2",
 "metricN": 14,
 "metricName": "incoming",
 "metricValue": 3.4061718002956964E7,
 "topKeys": [{
  "key": "10.0.0.52,10.0.0.54,80,52577",
  "updateTime": 1367092118446,
  "value": 3.4061718002956964E7
 }],
 "updateTime": 1367092118446
}]
The response doesn't just identify the flow, HTTP packets from the web server 10.0.0.52 to the client 10.0.0.54; it also identifies the switch and port carrying the traffic, information that would allow the OpenFlow controller to take action to rate limit, tap, re-route or block this traffic.
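This is the point where visibility and control come together. The following Python sketch, using only the standard library, is a hypothetical illustration of how an SDN application might close the loop: it polls the same sFlow-RT metric endpoint and, when the top flow exceeds a threshold, asks Floodlight to install a blocking rule. The threshold value, the switch DPID, the Floodlight match field names and the use of an empty actions string to mean drop are all assumptions made for illustration, not part of the configuration shown above.

import json
import time
import urllib.request

SFLOW_RT   = 'http://10.0.0.1:8008'
FLOODLIGHT = 'http://10.0.0.1:8080'
THRESHOLD  = 10000000  # bytes per second, an arbitrary value for illustration

def post_json(url, obj):
    # Send a JSON body to a REST endpoint with an HTTP POST
    req = urllib.request.Request(url, data=json.dumps(obj).encode('utf-8'),
                                 headers={'Content-Type': 'application/json'})
    return urllib.request.urlopen(req).read()

while True:
    # Query the top TCP flow seen anywhere in the network (same call as above)
    metrics = json.loads(urllib.request.urlopen(
        SFLOW_RT + '/metric/ALL/tcp/json').read().decode())
    for metric in metrics:
        for top in metric.get('topKeys', []):
            if top['value'] < THRESHOLD:
                continue
            src, dst, sport, dport = top['key'].split(',')
            # Hypothetical control action: install a drop rule for the flow.
            # The DPID and match field names are assumptions; an empty
            # 'actions' string is treated here as meaning drop.
            rule = {'switch': '00:00:00:00:00:00:00:01',
                    'name': 'block-' + src + '-' + dst,
                    'priority': '32768',
                    'ether-type': '0x800',
                    'src-ip': src,
                    'dst-ip': dst,
                    'active': 'true',
                    'actions': ''}
            post_json(FLOODLIGHT + '/wm/core/staticflowentrypusher/json', rule)
    time.sleep(5)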
Figure 2: Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX
The flexible Software Defined Analytics (SDA) functionality shown in this example is possible because the sFlow architecture shifts analytic functions to external software, relying on minimal core measurements embedded in the switch hardware data plane to deliver wire-speed performance. The simplicity and openness of the sFlow standard have resulted in widespread adoption in merchant silicon and by switch vendors.

In contrast, measurement technologies such as Cisco NetFlow and IPFIX perform traffic analysis using specialized hardware on the switch.  Configuring the hardware measurement features can be complex: for example, monitoring TCP flows using Cisco's Flexible NetFlow requires the following CLI commands:

1. define a flow record
flow record tcp-analysis
match transport tcp destination-port
match transport tcp source-port
match ipv4 destination address
match ipv4 source address
collect counter bytes
2. specify the collector
flow exporter export-to-server
destination 10.0.0.1
transport udp 9985
template data timeout 60
3. define a flow cache
flow monitor my-flow-monitor
record tcp-analysis
exporter export-to-server
cache timeout active 60
4. enable flow monitoring on each switch interface:
interface Ethernet 1/0
ip flow monitor my-flow-monitor input
...
interface Ethernet 1/48
ip flow monitor my-flow-monitor input
5. For network wide visibility, go to step 1 and repeat for each switch in the network

The architecture of on-switch flow analysis and the configuration example above point to a number of limitations of this approach to monitoring, particularly in the context of software defined networking:
  1. Flexible NetFlow is complex to configure, see Complexity Kills.
  2. Configuration changes to switches are typically limited to infrequent maintenance windows making it difficult to deploy new measurements.
  3. Each flow cache (step 3 in the Flexible NetFlow configuration) consumes significant on-switch memory, limiting the number of simultaneous flow measurements that can be made, and taking memory that could be used for additional forwarding rules.
  4. Hardware differences mean that measurements are inconsistent between vendors, or even between different products from the same vendor, see Snowflakes, IPFIX, NetFlow and sFlow.
  5. Adding support for new protocols, like GRE, VXLAN etc. involves upgrading switch firmware and may require new hardware.
What about using OpenFlow counters to drive analytics? Since maintaining OpenFlow counters relies on switch hardware to decode packets and track flows, OpenFlow based traffic measurement shares many of the same limitations described for NetFlow/IPFIX, see Hey, You Darned Counters! Get Off My ASIC!

On the other hand, software defined analytics based on the sFlow standard is highly scalable and extremely flexible. For example, adding an additional flow definition to report on tunneled traffic across the data center involves a single additional REST API call:
curl -H "Content-Type:application/json" -X PUT --data "{keys:'stack,ipsource,ipdestination,ipsource.1,ipdestination.1', value:'bytes'}" http://10.0.0.1:8008/flow/stack/json
The following command retrieves the top tunneled flow:
curl http://10.0.0.1:8008/metric/ALL/stack/json
[{
 "agent": "10.0.0.253",
 "dataSource": "3",
 "metricN": 6,
 "metricName": "stack",
 "metricValue": 74663.29589986047,
 "topKeys": [{
  "key": "eth.ip.gre.ip.tcp,10.0.0.151,10.0.0.152,10.0.201.1,10.0.201.2",
  "updateTime": 1367096917146,
  "value": 74663.29589986047
 }],
 "updateTime": 1367096917146
}]
The result shows that the top tunneled flow currently traversing the network is a TCP connection in a GRE tunnel between inner addresses 10.0.201.1 and 10.0.201.2.
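Because the flow key is simply a comma separated list of the fields named in the flow definition, an application can split it apart to recover the encapsulation stack and the outer and inner addresses. A minimal Python sketch using the key from the response above:

# Field order matches the flow definition:
# stack, ipsource, ipdestination, ipsource.1, ipdestination.1
key = 'eth.ip.gre.ip.tcp,10.0.0.151,10.0.0.152,10.0.201.1,10.0.201.2'
stack, outer_src, outer_dst, inner_src, inner_dst = key.split(',')
print('encapsulation:', stack)                        # eth.ip.gre.ip.tcp
print('tunnel endpoints:', outer_src, '->', outer_dst)
print('inner flow:', inner_src, '->', inner_dst)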
Note: Monitoring and controlling tunneled traffic is an important use case since tunnels are widely used for network virtualization and IPv6 migration, see Tunnels and Down the rabbit hole.
Perhaps the greatest limitation of on-switch flow analysis is that the measurements are delayed on the switch, held in the flow cache until a timeout expires, so they arrive too late to be actionable by SDN applications, see Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX. Centralized flow analysis liberates measurements from the devices to deliver real-time network wide analytics that support new classes of performance aware SDN application such as load balancing, DDoS protection and packet brokers.
