Friday, June 12, 2015

Leaf and spine traffic engineering using segment routing and SDN

The short, 3-minute video is a live demonstration showing how software defined networking (SDN) can be used to orchestrate the measurement and control capabilities of commodity data center switches to automatically load balance traffic on a 4-leaf, 4-spine, 10 Gigabit leaf and spine network.
The diagram shows the physical layout of the demonstration rack. The four logical racks with their servers and leaf switches are combined in a single physical rack, along with the spine switches and SDN controllers. All the links in the data plane are 10G and sFlow has been enabled on every switch and link with the following settings: a packet sampling rate of 1-in-8192 and a counter polling interval of 20 seconds. The switches have been configured to send the sFlow data to sFlow-RT analytics software running on Controller 1.

The switches are also configured to enable OpenFlow 1.3 and connect to multiple controllers in the redundant ONOS SDN controller cluster running on Controller 1 and Controller 2.
The charts from The Nature of Datacenter Traffic: Measurements & Analysis show data center traffic measurements published by Microsoft. Most traffic flows are of short duration. However, combined they consume less bandwidth than a much smaller number of large flows with durations ranging from 10 seconds to 100 seconds. The large number of small flows are often referred to as "Mice" and the small number of large flows as "Elephants."

This demonstration focuses on the Elephant flows since they consume most of the bandwidth. The iperf load generator is used to generate two streams of back-to-back 10 GByte transfers, each of which should take around 8 seconds to complete over the 10Gbit/s leaf and spine network (10 GBytes = 80 Gbits, or roughly 8 seconds at line rate).
while true; do iperf -B -c -n 10000M; done
while true; do iperf -B -c -n 10000M; done
These two independent streams of connections from switch 103 to 104 drive the demo.
The HTML 5 dashboard queries sFlow-RT's REST API to extract and display real-time flow information.
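The dashboard's queries can be reproduced directly. The following Python sketch defines a flow metric keyed on the TCP 4-tuple and retrieves the largest active flows; the endpoint paths follow sFlow-RT's published REST API, but the host, port, and the flow name "tcp" are assumptions for illustration.

```python
import json
import urllib.request

# sFlow-RT's default REST port is 8008; host/port are assumptions here.
RT = "http://localhost:8008"

def bits_per_sec(bytes_per_sec):
    """sFlow-RT reports a 'bytes' flow value in bytes/s; convert to bits/s."""
    return bytes_per_sec * 8

def define_flow(name="tcp"):
    """Define a flow metric keyed on the TCP 4-tuple."""
    spec = {"keys": "ipsource,ipdestination,tcpsourceport,tcpdestinationport",
            "value": "bytes"}
    req = urllib.request.Request(RT + "/flow/%s/json" % name,
                                 data=json.dumps(spec).encode(),
                                 headers={"Content-Type": "application/json"},
                                 method="PUT")
    urllib.request.urlopen(req)

def top_flows(name="tcp", max_flows=5):
    """Return the largest currently active flows across all agents."""
    url = RT + "/activeflows/ALL/%s/json?maxFlows=%d" % (name, max_flows)
    with urllib.request.urlopen(url) as r:
        return json.load(r)

# Usage (requires a running sFlow-RT instance):
#   define_flow()
#   for f in top_flows():
#       print(f["key"], bits_per_sec(f["value"]))
```

A 10Gbit/s link carrying a single flow would show up here as a flow value of about 1.25 GBytes/s.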

The dashboard shows a topological view of the leaf and spine network in the top left corner. Highlighted "busy" links have a utilization of over 70% (i.e. 7Gbit/s). The topology shows flows taking independent paths from 103 to 104 (via spines 105 and 106). The links are highlighted in blue to indicate that the utilization on each link is driven by a single large flow. The chart immediately under the topology trends the number of busy links. The most recent point, to the far right of the chart, has a value of 4 and is colored blue, recording that 4 blue links are shown in the topology.

The bottom chart trends the total traffic entering the network broken out by flow. The current throughput is just under 20Gbit/s and is comprised of two roughly equal flows.

The ONOS controller configures the switches to forward packets using Equal Cost Multi-Path (ECMP) routing. There are four equal cost (hop count) paths from leaf switch 103 to leaf switch 104 (via spine switches 105, 106, 107 and 108). The switch hardware selects between paths based on a hash function calculated over selected fields in the packet headers (e.g. source and destination IP addresses and source and destination TCP ports), for example:
index = hash(packet fields) % group.size
selected_physical_port = group[index]
Hash based load balancing works well for large numbers of Mice flows, but is less suitable for the Elephant flows. The hash function may assign multiple Elephant flows to the same path resulting in congestion and poor network performance.
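The path selection pseudocode above can be made concrete with a short sketch. Here a CRC stands in for the switch ASIC's (unspecified) hash function, and the IP addresses are illustrative:

```python
import zlib

SPINES = [105, 106, 107, 108]  # ECMP group: four equal cost spine paths

def select_path(src_ip, dst_ip, src_port, dst_port):
    """index = hash(packet fields) % group.size; return group[index]."""
    key = ("%s,%s,%d,%d" % (src_ip, dst_ip, src_port, dst_port)).encode()
    return SPINES[zlib.crc32(key) % len(SPINES)]

# The hash is deterministic, so every packet of a flow follows the same
# path, but two distinct Elephant flows may land on the same spine by
# chance: with four equal paths, any given pair of flows collides with
# probability 1/4.
path1 = select_path("10.0.1.10", "10.0.2.10", 53163, 5001)
path2 = select_path("10.0.1.10", "10.0.2.10", 53164, 5001)
```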
This screen shot shows the effect of a collision between flows. Both flows have been assigned the same path via spine switch 105. The analytics software has determined that there are multiple large flows on the pair of busy links and indicates this by coloring the highlighted links yellow. The most recent point, to the far right of the upper trend chart, has a value of 2 and is colored yellow, recording that 2 yellow links are shown in the topology.

Notice that the bottom chart shows that the total throughput has dropped to 10Gbit/s and that each of the flows is limited to 5Gbit/s - halving the throughput and doubling the time taken to complete the data transfer.

The dashboard demonstrates that the sFlow-RT analytics engine has all the information needed to characterize the problem - identifying the busy links and the large flows. What is needed is a way to take action and direct one of the flows onto a different path across the network.

This is where the segment routing functionality of the ONOS SDN controller comes into its own. The controller implements Source Packet Routing in Networking (SPRING) as the method of ECMP forwarding and provides a simple REST API for specifying paths across the network and assigning traffic to those paths.

In this example, the traffic is colliding because both flows are following a path running through spine switch 105. Paths from leaf 103 to 104 via spines 106, 107 or 108 have available bandwidth.

The following REST operation instructs the segment routing module to build a path from 103 via 106 to 104:
curl -H "Content-Type: application/json" -X POST http://localhost:8181/onos/segmentrouting/tunnel -d '{"tunnel_id":"t1", "label_path":[103,106,104]}'
Once the tunnel has been defined, the following REST operation assigns one of the colliding flows to the new path:
curl -H "Content-Type: application/json" -X POST http://localhost:8181/onos/segmentrouting/policy -d '{"policy_id":"p1", "priority":1000, "src_ip":"", "dst_ip":"", "proto_type":"TCP", "src_tp_port":53163, "dst_tp_port":5001, "policy_type":"TUNNEL_FLOW", "tunnel_id":"t1"}'
However, manually implementing these controls isn't feasible since there is a constant stream of flows that would require policy changes every few seconds.
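The decision such an automated controller has to make can be sketched as a simple function: given the large flows observed on each spine path, find a collision and an idle spine to move one flow to. The function name and data shape here are illustrative, not sFlow-RT's actual embedded API; the chosen reroute would then be installed with tunnel and policy REST calls like those shown above.

```python
SPINES = [105, 106, 107, 108]  # four equal cost spine paths

def plan_reroute(flows_by_spine):
    """flows_by_spine maps spine -> list of large flow identifiers.
    Returns (flow, target_spine) for a single reroute, or None if no
    collision exists or no idle spine is available."""
    idle = [s for s in SPINES if not flows_by_spine.get(s)]
    for spine, flows in flows_by_spine.items():
        if len(flows) > 1 and idle:
            # Leave one flow in place; move a colliding flow to an
            # idle spine.
            return flows[1], idle[0]
    return None

# Example: both Elephant flows collided onto spine 105, leaving the
# other three spines idle.
plan = plan_reroute({105: ["flowA", "flowB"]})
# plan == ("flowB", 106)
```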
The final screen shot shows the result of enabling the Flow Accelerator application on sFlow-RT. Flow Accelerator watches for collisions and automatically applies and removes segment routing policies as required to separate Elephant flows. In this case, the table on the top right of the dashboard shows that a single policy has been installed, sending one of the flows via spine 107.

The controller has been running for about half the interval shown in the two trend charts (approximately two and a half minutes). To the left you can see frequent, long-lasting collisions and the consequent dips in throughput. To the right you can see that more of the links are kept busy and flows experience consistent throughput.
Traffic analytics are a critical component of this demonstration. Why does this demonstration use sFlow? Could NetFlow/JFlow/IPFIX/OpenFlow etc. be used instead? The above diagram illustrates the basic architectural difference between sFlow and other common flow monitoring technologies. For this use case the key difference is that with sFlow real-time data from the entire network is available in a central location (the sFlow-RT analytics software), allowing the traffic engineering application to make timely load balancing decisions based on complete information. Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX presents experimental data to demonstrate the difference in responsiveness between sFlow and the other flow monitoring technologies. OK, but what about using hardware packet counters periodically pushed via sFlow, or polled using SNMP or OpenFlow? Here again, measurement delay limits the usefulness of the counter information for SDN applications, see Measurement delay, counters vs. packet samples. Fortunately, the requirement for sFlow is not limiting since support for standard sFlow measurement is built into most vendor and white box hardware - see Drivers for growth.

Finally, the technologies presented in this demonstration have broad applicability beyond the leaf and spine use case. Elephant flows dominate data center, campus, wide area, and wireless networks (see SDN and large flows). In addition, segment routing is applicable to wide area networks as was demonstrated by an early version of the ONOS controller (Prototype & Demo Videos). The demonstration illustrates that the integration of real-time sFlow analytics into SDN solutions enables fundamentally new use cases that drive SDN to a new level - optimizing networks rather than simply provisioning them.


  1. Hi,
    I would like to use this tutorial for my study. So, please give any references for the segment routing topology's python file and network-config.json file that run on this tutorial.
    Best Regards,

  2. hello peter, is there any file or script for this that can be used for reference?

    1. Unfortunately, there isn't a complete version I can share. However, the Fabric Metrics application is based on the demonstration and has hooks for implementing control schemes.