Saturday, January 3, 2015

Hybrid OpenFlow ECMP testbed

SDN fabric controller for commodity data center switches describes how the real-time visibility and hybrid control capabilities of commodity data center switches can be used to automatically adapt the network to changing traffic patterns and optimize performance. The article identifies hybrid OpenFlow as a critical component of the solution, allowing SDN to be combined with proven distributed routing protocols (e.g. BGP, ISIS, OSPF) to deliver scalable, production-ready solutions that fully leverage the capabilities of commodity hardware.

This article will take the example of large flow marking that has been demonstrated using physical switches and show how Mininet can be used to emulate hybrid control of data center networks and deliver realistic results.
The article Elephant Detection in Virtual Switches & Mitigation in Hardware describes a demonstration by VMware and Cumulus Networks that shows how real-time detection and marking of large "Elephant" flows can dramatically improve application response time for small, latency-sensitive "Mouse" flows without impacting the throughput of the Elephants - see Marking large flows for additional background.
Performance optimizing hybrid OpenFlow controller demonstrated how hybrid OpenFlow can be used to mark Elephant flows on a top of rack switch. However, building test networks with physical switches to test the controller with realistic topologies is expensive and time consuming.

Mininet offers an attractive alternative, providing a lightweight network emulator that can be run in a virtual machine on a laptop and realistically simulate network topologies. In this example, Mininet will be used to emulate the four switch leaf and spine network shown in the diagram at the top of this article.

The sFlow-RT SDN controller includes a script that configures Mininet to emulate ECMP leaf and spine fabrics with hybrid OpenFlow capable switches. To run the emulation, copy the script from the sFlow-RT extras directory to your Mininet system and run the following command to create the leaf and spine network:
sudo ./ --collector= --controller= --topofile
There are a few points to note about the emulation:
  1. While physical networks might have link speeds ranging from 1Gbit/s to 100Gbit/s, the emulation scales link speeds down to 10Mbit/s so that they can be emulated in software.
  2. The sFlow sampling rate is scaled proportionally - see Large flow detection
  3. A pair of OpenFlow 1.3 tables is used to emulate normal ECMP forwarding and hybrid OpenFlow overrides
  4. Linux Traffic Control (tc) commands are used to emulate hardware priority queueing based on Differentiated Services Code Point (DSCP) class, mapping DSCP class 8 to a lower priority or "less than best effort" queue.
  5. The script posts the topology as a JSON file under the default Apache document root so that it can be retrieved remotely by an SDN controller
  6. In this example the sFlow-RT controller is running on host - change the address to match your setup.
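The proportional scaling in points 1 and 2 can be sketched as a small calculation. The baseline of 1-in-10,000 sampling on a 10Gbit/s link is an assumption chosen for illustration, and the function name is hypothetical; neither is taken from the emulation script.

```javascript
// Hypothetical helper: scale the sFlow sampling rate in proportion to link speed.
// Assumed baseline (not from the emulation script): 1-in-10000 on a 10Gbit/s link.
const BASE_LINK_BPS = 10e9;
const BASE_SAMPLING_N = 10000;

function scaledSamplingRate(linkBps) {
  // Keep the ratio of sampling interval to link speed constant.
  // Never sample less often than 1-in-1.
  return Math.max(1, Math.round(BASE_SAMPLING_N * linkBps / BASE_LINK_BPS));
}

// Emulated 10Mbit/s link
console.log('1-in-' + scaledSamplingRate(10e6));
```

Under this assumed baseline, slowing the links by a factor of 1000 also reduces the sampling interval by a factor of 1000, so a large flow should be detected in roughly the same time as on the physical network.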
The following script runs the ping command to test response time and plots the results as a simple text-based bar chart:
ping $1 | awk -v SCALE=$SCALE 'BEGIN {FS="[= ]"; } NF==11 { n = $10 * SCALE; bar = ""; while(n >= 1) { bar = bar "*"; n--} print bar " " $10 " ms" }'
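For readers less familiar with awk, the same logic can be sketched in JavaScript. The function name pingBar and the sample ping line are illustrative assumptions; the field splitting mirrors the awk field separator above.

```javascript
// Sketch of the awk logic: split a ping reply on '=' or ' ',
// take the 10th field as the response time, and draw one '*' per scaled unit.
function pingBar(line, scale) {
  const fields = line.split(/[= ]/);
  // Only ping reply lines split into exactly 11 fields (awk: NF==11).
  if (fields.length !== 11) return null;
  const ms = parseFloat(fields[9]); // awk fields are 1-based, so $10 is index 9
  const stars = '*'.repeat(Math.max(0, Math.floor(ms * scale)));
  return stars + ' ' + fields[9] + ' ms';
}

// Hypothetical ping reply line
console.log(pingBar('64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=2.5 ms', 2));
```

Lines that are not ping replies (for example the header and summary lines) do not split into 11 fields and are skipped, just as in the awk version.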
Open an xterm on host h1 and run the command:
./pingtest 10
Next type the following command at the Mininet prompt to generate a large flow:
iperf h2 h3
The following screen capture shows the result of the iperf test:
The reported throughput of around 10Mbit/s shows that the traffic is saturating the emulated 10Mbit/s links.

The following screen capture shows the ping results during the iperf test.
The ping test clearly shows the impact that the Elephant flow is having on response time. In addition, the increased response times of around 3ms are consistent with the values in the VMware / Cumulus Networks charts shown earlier.

The following sFlow-RT mark.js script implements an SDN controller that marks Elephant flows:

// get topology from Mininet

// Define large flow as greater than 1Mbits/sec for 1 second or longer
var bytes_per_second = 1000000/8, duration_seconds = 1;

// define TCP flow cache
 {keys:'ipsource,ipdestination,tcpsourceport,tcpdestinationport', filter:'direct
  value:'bytes', t:duration_seconds}

// set threshold identifying Elephant flows
setThreshold('elephant', {metric:'tcp', value:bytes_per_second, byFlow:true, tim

// set OpenFlow marking rule when Elephant is detected
var idx = 0;
setEventHandler(function(evt) {
 if(topologyInterfaceToLink(evt.agent,evt.dataSource)) return;
 var port = ofInterfaceToPort(evt.agent,evt.dataSource);
 if(port) {
  var dpid = port.dpid;
  var id = "mark" + idx++;
  var k = evt.flowKey.split(',');
  var rule= {
    match:{in_port: port.port, dl_type:2048, ip_proto:6, nw_src:k[0], nw_dst:k[1], tcp_src:k[2], tcp_dst:k[3]},
    actions:["set_ip_dscp=8","output=normal"], priority:1000, idleTimeout:5
About the script:
  1. The included leafandspine-hybrid.js script emulates hybrid OpenFlow by rewriting the NORMAL OpenFlow action to jump to the table that contains the ECMP forwarding rules. 
  2. The script assumes that Mininet emulation is running on host Modify the address in the setTopology() function for your setup.
  3. The setFlow() function instructs the controller to build a flow cache to track TCP connections.
  4. The setThreshold() function defines Elephant flows as TCP connections that exceed 10% of the link's bandwidth (in this case 1Mbit/second) for 1 second or more.
  5. The setEventHandler() function processes each Elephant flow notification and applies an OpenFlow marking rule to the ingress port on the edge switch where the traffic enters the fabric.
  6. The OpenFlow rules have an idleTimeout of 5 seconds, ensuring that they are automatically deleted by the switch when the flow ends.
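The threshold arithmetic and the rule built in the event handler can be sketched as two standalone functions. The function names are hypothetical and this runs outside sFlow-RT; the match fields, actions, priority and idleTimeout are copied from the script above, and the sample flow key and port number are made up for illustration.

```javascript
// Hypothetical helper: threshold in bytes/second for a given percentage of link speed.
function elephantThresholdBytesPerSecond(linkBitsPerSecond, percent) {
  return linkBitsPerSecond * percent / 100 / 8;
}

// Hypothetical helper: build the marking rule from an ingress port number and a
// flow key of the form 'ipsource,ipdestination,tcpsourceport,tcpdestinationport'.
function buildMarkRule(inPort, flowKey) {
  const k = flowKey.split(',');
  return {
    match: {
      in_port: inPort, dl_type: 2048, ip_proto: 6, // IPv4, TCP
      nw_src: k[0], nw_dst: k[1], tcp_src: k[2], tcp_dst: k[3]
    },
    actions: ['set_ip_dscp=8', 'output=normal'], // mark, then forward normally
    priority: 1000,
    idleTimeout: 5 // seconds of inactivity before the switch removes the rule
  };
}

// 10% of an emulated 10Mbit/s link
console.log(elephantThresholdBytesPerSecond(10000000, 10));
console.log(JSON.stringify(buildMarkRule(1, '10.0.0.1,10.0.0.2,40000,5001')));
```

The first call gives the same value as the bytes_per_second variable in the script (1000000/8).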
Modify the sFlow-RT script to include the following settings:
Start sFlow-RT:
Repeat the iperf test.
The iperf results show that throughput of large flows is unaffected by the controller.
The screen capture shows the controller actions. The controller installs an OpenFlow rule as soon as the large flow is detected, setting the ip_dscp value to 8 and outputting the packets to the normal ECMP forwarding pipeline. The marked packets are treated as lower priority than the ping packets. Since the ping packets aren't stuck behind the deep queues caused by the iperf tests, the reported response times should be unaffected by the large flow.
The ping test confirms that with the controller running, response times are unaffected by Elephant flows, an approximately 10 times improvement in response time that is consistent with the results shown for a physical switch in the VMware / Cumulus charts shown earlier.
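
The effect of marking can be illustrated with a toy model of strict priority queueing. This is a simplified sketch, not the tc configuration used by the emulation script: it assumes two queues, with DSCP 8 packets placed in the lower priority queue and served only when the other queue is empty. The packet names are made up.

```javascript
// Toy model: packets marked DSCP 8 go to the low priority queue.
function classify(packet) {
  return packet.dscp === 8 ? 'low' : 'normal';
}

// Return the order in which a backlog of queued packets would be transmitted
// under strict priority: everything in 'normal' first, then 'low'.
function drain(packets) {
  const queues = { normal: [], low: [] };
  for (const p of packets) queues[classify(p)].push(p);
  return queues.normal.concat(queues.low).map(p => p.name);
}

const arrivals = [
  { name: 'iperf-1', dscp: 8 },
  { name: 'iperf-2', dscp: 8 },
  { name: 'ping-1', dscp: 0 },
  { name: 'iperf-3', dscp: 8 }
];
console.log(drain(arrivals));
```

In this model the ping packet is sent ahead of the marked iperf packets even though it arrived after two of them, which is why its response time no longer depends on the depth of the Elephant flow's queue.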

More broadly, hybrid OpenFlow provides an effective way to deliver SDN solutions in production, using OpenFlow to enhance the performance of existing networks. In addition to large flow marking, other cases described on this blog include: DDoS mitigation, enforcing black lists, ECMP load balancing, and packet brokers.

Increasingly, vendors recognize the critical importance of hybrid OpenFlow in delivering practical SDN solutions - HP proposes hybrid OpenFlow discussion at Open Daylight design forum. The article Super NORMAL offers some suggestions for enhancing hybrid OpenFlow to address additional use cases, reduce operational complexity and increase reliability in production settings.

Finally, the sFlow measurement standard is critical to unlocking the full potential of hybrid OpenFlow. Support for sFlow is built into commodity switch hardware, providing cost-effective visibility into traffic on production networks. The comprehensive real-time traffic analytics delivered by sFlow allow an SDN controller to effectively target actions, managing the limited hardware resources on the switches, to enhance network performance and security.


  1. Hi,

    I added a call to getOfRule(dpid,id) on the line after setOfRule in this script.
    But I cannot get any rules by calling http://localhost:8008/of/rule/0000000000000003/mark1/json although the large flow is marked.
    Do you have any suggestions?


    1. The rule has an idleTimeout of 5 seconds, causing the rule to be automatically removed by the switch within 5 seconds of the flow ending (and also automatically removed from sFlow-RT). Each subsequent Elephant flow rule has a different id mark2, mark3, ... making it difficult to read back rules via the REST API.

  2. Hi,
    I'd like to know whether the sFlow analyzer can be used as an application classifier for application-aware engineering in an SDN environment. Can it classify which flow is video, which flow is SSH, and so on?
    May Thu.

    1. You can classify traffic in a variety of ways. There are a large number of flow attributes that sFlow-RT extracts from packet headers, see Defining Flows. sFlow-RT can also incorporate domain name, geo location, address grouping, web server, and Host sFlow metadata from a variety of sources, e.g. Blacklists and Open Virtual Network (OVN)
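
As a minimal illustration of classifying by packet header attributes, the sketch below labels a flow by its TCP ports, using the same flow key layout as the script in the article (ipsource,ipdestination,tcpsourceport,tcpdestinationport). The port table and function name are assumptions for illustration and are not part of sFlow-RT; port numbers alone cannot identify applications such as video.

```javascript
// Hypothetical port-to-label table; 5001 is the default iperf (version 2) port.
const PORT_LABELS = { 22: 'ssh', 80: 'http', 443: 'https', 5001: 'iperf' };

// Label a flow by destination port, falling back to source port, then 'other'.
function classifyFlow(flowKey) {
  const [src, dst, sport, dport] = flowKey.split(',');
  return PORT_LABELS[dport] || PORT_LABELS[sport] || 'other';
}

console.log(classifyFlow('10.0.0.1,10.0.0.2,40000,22'));
```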