Monday, June 3, 2013

Flow collisions

Controlling large flows with OpenFlow describes how to build a performance aware software defined networking testbed using Mininet. This article describes how to use Mininet to create a minimal multi-path topology to experiment with load balancing of large flows.

Mininet 2.0 (a.k.a. Mininet HiFi) includes support for link bandwidth limits, making it an attractive platform to explore network performance. Mininet 2.0 has been used to reproduce results from a number of significant research papers; see Reproducible Network Experiments Using Container-Based Emulation.
The diagram shows the basic leaf and spine topology that we will be constructing. The two leaf switches (s1 and s2) are connected by two spine switches (s3 and s4). The two paths between the leaf (top of rack) switches are shown in red and blue in the diagram.

Note: Scaling of network characteristics is important when building performance models in Mininet. While production networks may have 10Gbit/s links, it is not realistic to expect the software emulation to faithfully model such high speed links. Scaling the speeds down makes it possible to emulate the links in software while still preserving the basic characteristics of the network. In this case, we will scale down by a factor of 1000, using 10Mbit/s links instead of 10Gbit/s links. The settings for the monitoring system need to be scaled in the same way: the article SDN and large flows recommends a 1-in-10,000 sampling rate to detect large flows on 10Gbit/s links, so a sampling rate of 1-in-10 gives the same response time on the 10Mbit/s links in the emulation.
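
The scaling relationship is easy to express in code. The following sketch (a standalone helper, not part of the testbed script) scales the sampling rate linearly with link speed from the 1-in-10,000 at 10Gbit/s baseline:
# Sketch: scale the sFlow sampling rate with link speed so that large flows
# are detected with the same response time as on a production link.
# Baseline (from SDN and large flows): 1-in-10,000 at 10Gbit/s.
BASELINE_LINK_BPS = 10e9   # 10Gbit/s production link
BASELINE_SAMPLING = 10000  # 1-in-10,000

def sampling_rate(link_bps):
    # sampling rate scales linearly with link speed (never below 1-in-1)
    return max(1, int(BASELINE_SAMPLING * link_bps / BASELINE_LINK_BPS))

print(sampling_rate(10e6))   # emulated 10Mbit/s link -> 10 (1-in-10)
print(sampling_rate(100e9))  # 100Gbit/s link -> 100000 (1-in-100,000)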

The following Python script builds the network and enables sFlow on the switches:
#!/usr/bin/env python

from mininet.net  import Mininet
from mininet.node import RemoteController
from mininet.link import TCLink
from mininet.cli  import CLI
from mininet.util import quietRun

c = RemoteController('c',ip='127.0.0.1')
net = Mininet(link=TCLink)

# Add hosts and switches
leftHost1  = net.addHost('h1',ip='10.0.0.1',mac='00:04:00:00:00:01')
leftHost2  = net.addHost('h2',ip='10.0.0.2',mac='00:04:00:00:00:02')
rightHost1 = net.addHost('h3',ip='10.0.0.3',mac='00:04:00:00:00:03')
rightHost2 = net.addHost('h4',ip='10.0.0.4',mac='00:04:00:00:00:04')

leftSwitch     = net.addSwitch('s1')
rightSwitch    = net.addSwitch('s2')
leftTopSwitch  = net.addSwitch('s3')
rightTopSwitch = net.addSwitch('s4')

# Add links
# set link speeds to 10Mbit/s
linkopts = dict(bw=10)
net.addLink(leftHost1,  leftSwitch,    **linkopts )
net.addLink(leftHost2,  leftSwitch,    **linkopts )
net.addLink(rightHost1, rightSwitch,   **linkopts )
net.addLink(rightHost2, rightSwitch,   **linkopts )
net.addLink(leftSwitch, leftTopSwitch, **linkopts )
net.addLink(leftSwitch, rightTopSwitch,**linkopts )
net.addLink(rightSwitch,leftTopSwitch, **linkopts )
net.addLink(rightSwitch,rightTopSwitch,**linkopts )

# Start
net.controllers = [ c ]
net.build()
net.start()

# Enable sFlow
quietRun('ovs-vsctl -- --id=@sflow create sflow agent=eth0 target=127.0.0.1 sampling=10 polling=20 -- set bridge s1 sflow=@sflow -- set bridge s2 sflow=@sflow -- set bridge s3 sflow=@sflow -- set bridge s4 sflow=@sflow')

# CLI
CLI( net )

# Clean up
net.stop()
First start Floodlight and sFlow-RT, then run the script to build the network. The topology can then be viewed by accessing the Floodlight user interface (http://xx.xx.xx.xx:8080/ui/index.html) and clicking on the Topology tab.
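
The switches can also be verified from the command line by querying Floodlight's REST API. The following sketch assumes Floodlight's default REST port (8080) on the local host and the Python requests library; the exact fields in the response depend on the Floodlight version:
# Sketch: list the datapath IDs of the switches Floodlight has discovered.
import requests

url = 'http://127.0.0.1:8080/wm/core/controller/switches/json'
for switch in requests.get(url).json():
    print(switch.get('dpid'))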

Next, open xterm windows on the four hosts: h1, h2, h3 and h4. The following commands, typed into each window, generate a sequence of iperf tests from h1 to h2 and a separate set of tests between h3 and h4.

h2:
iperf -s
h4:
iperf -s
h1:
while true; do iperf -c 10.0.0.2 -i 60 -t 60; sleep 20; done
h3:
while true; do iperf -c 10.0.0.4 -i 60 -t 60; sleep 30; done
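
Alternatively, the same traffic can be generated from within the Mininet script itself, avoiding the xterm windows. A minimal sketch, added to the script above just before the CLI( net ) line:
# Sketch: start the iperf servers and clients programmatically.
h1, h2, h3, h4 = net.get('h1', 'h2', 'h3', 'h4')

# iperf servers on h2 and h4
h2.cmd('iperf -s &')
h4.cmd('iperf -s &')

# repeated 60 second tests from h1 and h3, staggered as in the xterm commands
h1.cmd('while true; do iperf -c 10.0.0.2 -i 60 -t 60; sleep 20; done &')
h3.cmd('while true; do iperf -c 10.0.0.4 -i 60 -t 60; sleep 30; done &')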

Run the following commands to install flow metrics in sFlow-RT to track traffic from each host:
curl -H "Content-Type:application/json" -X PUT --data "{value:'bytes',filter:'ipsource=10.0.0.1'}" http://localhost:8008/flow/h1/json
curl -H "Content-Type:application/json" -X PUT --data "{value:'bytes',filter:'ipsource=10.0.0.2'}" http://localhost:8008/flow/h2/json
curl -H "Content-Type:application/json" -X PUT --data "{value:'bytes',filter:'ipsource=10.0.0.3'}" http://localhost:8008/flow/h3/json
curl -H "Content-Type:application/json" -X PUT --data "{value:'bytes',filter:'ipsource=10.0.0.4'}" http://localhost:8008/flow/h4/json
The following sFlow-RT page shows the traffic for hosts h1 and h3:
The chart shows that each flow reaches a peak of around 1.2Mbytes/s (10Mbits/s), demonstrating that Mininet is emulating the 10Mbit/s links in the configuration. The chart also shows that there is no interaction between the flows, which is expected since shortest path flows between h1 and h2 are restricted to s1 and flows between h3 and h4 are restricted to s2.
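
The expected peak follows directly from the link speed, e.g.:
# 10Mbit/s link, fully used by a single flow
link_bps = 10e6
peak_bytes_per_sec = link_bps / 8  # 1.25e6, i.e. the ~1.2Mbytes/s in the chart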

Note: If the emulator can't keep up with the number or speed of links, you might see spurious interactions between the traffic streams. It is a good idea to run some tests like the one above to verify the emulation before moving on to more complex scenarios.

The next experiment generates flows across the spine switches.

h3:
iperf -s
h4:
iperf -s
h1:
while true; do iperf -c 10.0.0.3 -i 60 -t 60; sleep 20; done
h2:
while true; do iperf -c 10.0.0.4 -i 60 -t 60; sleep 30; done

The following sFlow-RT chart shows the traffic for hosts h1 and h2.
This chart clearly shows the effect of flow collisions on performance: when the flows collide, each achieves only half the throughput.

While the topology has full cross-sectional bandwidth between all pairs of hosts, Floodlight's shortest path forwarding algorithm places all flows between a pair of switches on the same path, resulting in collisions. However, even if ECMP routing were used, the chance of a hash collision between two simultaneous flows in this two-path network would be 50%.
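
The 50% figure can be checked with a quick simulation: with two equal cost paths, two simultaneous flows collide whenever they hash to the same path. The sketch below assumes each flow is placed independently and uniformly at random:
# Sketch: probability that n simultaneous flows hashed uniformly onto
# k equal cost paths produce at least one collision.
import random

def collision_probability(n_flows, n_paths, trials=100000):
    collisions = 0
    for _ in range(trials):
        paths = [random.randrange(n_paths) for _ in range(n_flows)]
        if len(set(paths)) < n_flows:
            collisions += 1
    return float(collisions) / trials

print(collision_probability(2, 2))  # ~0.5 for the two-path topology above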

Note: The paper, Hedera: Dynamic Flow Scheduling for Data Center Networks describes the effect of collisions on large scale ECMP fabrics and the results of the paper have been reproduced using Mininet, see Reproducible Network Experiments Using Container-Based Emulation.

The IETF Operations and Management Area Working Group (OPSAWG) recently adopted Mechanisms for Optimal LAG/ECMP Component Link Utilization in Networks as a working group draft. The draft mentions sFlow as a method for detecting large flows and the charts in this article demonstrate that sFlow's low latency traffic measurements provide clear signals that can be used to quickly detect collisions, allowing an SDN controller to re-route the colliding flows.
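
As a rough illustration, the sketch below defines a threshold on the h1 flow metric in sFlow-RT and polls the events REST API for notifications. The threshold value (1Mbytes/s, roughly 80% of a 10Mbit/s link) and the polling loop are illustrative assumptions; a complete application would go on to re-route the colliding flow through the controller.
# Sketch: flag large flows from h1 using an sFlow-RT threshold and the
# events REST API. A real controller application would re-route the flow.
import json
import requests

rt = 'http://localhost:8008'
threshold = {'metric': 'h1', 'value': 1000000}  # bytes/s, ~80% of a 10Mbit/s link
requests.put(rt + '/threshold/h1/json', data=json.dumps(threshold),
             headers={'Content-Type': 'application/json'})

eventID = -1
while True:
    events = requests.get(rt + '/events/json?maxEvents=10&timeout=60&eventID=%d'
                          % eventID).json()
    if len(events) == 0:
        continue
    eventID = events[0]['eventID']  # events are returned newest first
    for e in events:
        print('large flow detected: %s' % json.dumps(e))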

Note: The draft mentions NetFlow as a possible measurement technology; however, the article Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX demonstrates that flow monitoring delays measurements, limiting their value for SDN control applications. It should also be noted that OpenFlow's metering mechanism shares the same architectural limitations as NetFlow/IPFIX. In addition, polling mechanisms for retrieving metrics are slower and less scalable as a method for detecting large flows, see Measurement delay, counters vs. packet samples.

There are a number of additional factors that make sFlow attractive as the measurement component in a load balancing solution. The sFlow standard is widely supported by switch vendors and the measurements scale to 100Gbit/s and beyond (detecting large flows on a 100Gbit/s link with the same responsiveness shown in this testbed requires a sampling rate of only 1-in-100,000). In addition, sFlow monitoring scales to large numbers of devices (a single instance of sFlow-RT can easily monitor tens of thousands of switch ports), providing the measurements needed to load balance large ECMP fabrics. Finally, every sFlow capable device provides the full set of flow identification attributes described in the draft, including: source MAC address, destination MAC address, VLAN ID, IP protocol, IP source address, IP destination address, flow label (IPv6 only), TCP/UDP source port, TCP/UDP destination port and tunnels (GRE, VxLAN, NVGRE).

Load balancing large flows is only one of a number of promising applications for performance aware software defined networking. Other applications include traffic engineering, denial of service (DoS) mitigation and packet brokers.
