Thursday, June 25, 2015

WAN optimization using real-time traffic analytics

The TATA Consultancy Services white paper, Actionable Intelligence in the SDN Ecosystem: Optimizing Network Traffic through FRSA, demonstrates how real-time traffic analytics and SDN can be combined to perform real-time traffic engineering of large flows across a WAN infrastructure.
The architecture being demonstrated is shown in the diagram (this diagram has been corrected - the diagram in the white paper incorrectly states that sFlow-RT analytics software uses a REST API to poll the nodes in the topology. In fact, the nodes stream telemetry using the widely supported, industry standard sFlow protocol, providing real-time visibility and scalability that would be difficult to achieve using polling - see Push vs Pull).

The load balancing application receives real-time notifications of large flows from the sFlow-RT analytics software and programs the SDN Controller (in this case OpenDaylight) to push forwarding rules to the switches to direct the large flows across a specific path. Flow Aware Real-time SDN Analytics (FRSA) provides an overview of the basic ideas behind large flow traffic engineering that inspired this use case.
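One way an application might receive these notifications is through sFlow-RT's REST API. The following Python sketch defines a large flow signature, sets a threshold, and long polls for events; the sFlow-RT address, flow name, 1Gbit/s threshold, and event field names are assumptions chosen for illustration, and programming the controller is left as a comment.

import requests

RT = 'http://sflow-rt:8008'   # sFlow-RT REST API; hostname and port are assumptions

# define a flow signature that tracks TCP transfers by 4-tuple
flow = {'keys': 'ipsource,ipdestination,tcpsourceport,tcpdestinationport',
        'value': 'bytes'}
requests.put(RT + '/flow/elephant/json', json=flow)

# fire an event when a single flow exceeds 1Gbit/s (125,000,000 bytes/second)
threshold = {'metric': 'elephant', 'value': 125000000, 'byFlow': True, 'timeout': 4}
requests.put(RT + '/threshold/elephant/json', json=threshold)

eventID = -1
while True:
    # long poll for new threshold events
    r = requests.get(RT + '/events/json',
                     params={'thresholdID': 'elephant', 'eventID': eventID,
                             'maxEvents': 10, 'timeout': 60})
    events = r.json()
    if not events:
        continue
    eventID = max(e['eventID'] for e in events)
    for e in events:
        # hand off to controller logic here, e.g. program the SDN controller
        # to pin the reported flow to a less loaded path
        print('large flow', e.get('flowKey'), 'reported by', e.get('agent'))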

While OpenDaylight is used in this example, an interesting alternative for this use case would be the ONOS SDN controller running the Segment Routing application. ONOS is specifically designed with carriers in mind and segment routing is a natural fit for the traffic engineering task described in this white paper.
Leaf and spine traffic engineering using segment routing describes a demonstration combining real-time analytics and SDN control in a data center context. The demonstration was part of the recent 2015 Open Networking Summit (ONS) conference Showcase and presented in the talk, CORD: FABRIC An Open-Source Leaf-Spine L3 Clos Fabric, by Saurav Das.

Sunday, June 21, 2015

Optimizing software defined data center

The recent Fortune magazine article, Software-defined data center market to hit $77.18 billion by 2020, starts with the quote "Data centers are no longer just about all the hardware gear you can stitch together for better operations. There’s a lot of software involved to squeeze more performance out of your hardware, and all that software is expected to contribute to a burgeoning new market dubbed the software-defined data center."

The recent ONS2015 Keynote from Google's Amin Vahdat describes how Google builds large scale software defined data centers. The presentation is well worth watching in its entirety since Google has a long history of advancing distributed computing with technologies that have later become mainstream.
There are a number of points in the presentation that relate to the role of networking in the performance of cloud applications. Amin states, "Networking is at this inflection point and what computing means is going to be largely determined by our ability to build great networks over the coming years. In this world data center networking in particular is a key differentiator."

This slide shows the large pools of storage and compute connected by the data center network that are used to deliver data center services. Amin states that the dominant costs are compute and storage and that the network can be relatively inexpensive.
In Overall Data Center Costs James Hamilton breaks down the monthly costs of running a data center and puts the cost of network equipment at 8% of the overall cost.
However, Amin goes on to explain why networking has a disproportionate role in the overall value delivered by the data center.
The key to an efficient data center is balance. If a resource is scarce, then other resources are left idle and this increases costs and limits the overall value of the data center. Amin goes on to state, "Typically the resource that is most scarce is the network."
The need to build large scale high-performance networks has driven Google to build networks with the following properties:
  • Leaf and Spine (Clos) topology
  • Merchant silicon based switches (white box / brite box / bare metal)
  • Centralized control (SDN)
The components and topology of the network are shown in the following slide.
Here again Google is leading the overall network market transition to inexpensive leaf and spine networks built using commodity hardware.

Google is not alone in leading this trend. Facebook has generated significant support for the Open Compute Project (OCP), which publishes open source designs for data center equipment, including merchant silicon based leaf and spine switches. A key OCP project is the Open Network Install Environment (ONIE), which allows third party software to be installed on the network equipment. ONIE separates hardware from software and has spawned a number of innovative networking software companies, including Cumulus Networks, Big Switch Networks, Pica8, and Pluribus Networks. Open network hardware and the related ecosystem of software are entering the mainstream as leading vendors such as Dell and HP deliver open networking hardware, software and support to enterprise customers.
The ONS2015 keynote from AT&T's John Donovan describes the economic drivers for AT&T's transition to open networking and compute architectures.
John discusses the rapid move from legacy TDM (Time Division Multiplexing) technologies to commodity Ethernet, explaining that "video now makes up the majority of traffic on our network." This is a fundamental shift for AT&T and John states that "We plan to virtualize and control more than 75% of our network using cloud infrastructure and a software defined architecture."

John mentions the CORD (Central Office Re-architected as a Datacenter) project which proposes an architecture very similar to Google's, consisting of a leaf and spine network built using open merchant silicon based hardware connecting commodity servers and storage. A prototype of the CORD leaf and spine network was shown as part of the ONS2015 Solutions Showcase.
ONS2015 Solutions Showcase: Open-source spine-leaf Fabric
Leaf and spine traffic engineering using segment routing and SDN describes a live demonstration presented in the ONS2015 Solutions Showcase. The demonstration shows how centralized analytics and control can be used to optimize the performance of commodity leaf and spine networks handling the large "Elephant" flows that typically comprise most traffic on the network (for example, video streams - see SDN and large flows for a general discussion).

Getting back to the Fortune article, it is clear that the move to open commodity network, server, and storage hardware shifts value from hardware to the software solutions that optimize performance. The network in particular is a critical resource that constrains overall performance, and network optimization solutions can provide disproportionate benefits by eliminating bottlenecks that constrain compute and storage and limit the value delivered by the data center.

Friday, June 12, 2015

Leaf and spine traffic engineering using segment routing and SDN


The short 3 minute video is a live demonstration showing how software defined networking (SDN) can be used to orchestrate the measurement and control capabilities of commodity data center switches to automatically load balance traffic on a 4 leaf, 4 spine, 10 Gigabit leaf and spine network.
The diagram shows the physical layout of the demonstration rack. The four logical racks with their servers and leaf switches are combined in a single physical rack, along with the spine switches and SDN controllers. All the links in the data plane are 10G and sFlow has been enabled on every switch and link with the following settings: a packet sampling rate of 1-in-8192 and a counter polling interval of 20 seconds. The switches have been configured to send the sFlow data to the sFlow-RT analytics software running on Controller 1.
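As a quick sanity check, sFlow-RT's REST API can be queried to confirm that all eight switches are streaming telemetry. A minimal Python sketch, assuming sFlow-RT's default port of 8008 on Controller 1:

import requests

RT = 'http://controller1:8008'  # sFlow-RT on Controller 1; address and port are assumptions

# list the switches currently streaming sFlow to the analytics engine
agents = requests.get(RT + '/agents/json').json()
print('%d agents sending sFlow:' % len(agents), sorted(agents.keys()))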

The switches are also configured to enable OpenFlow 1.3 and connect to multiple controllers in the redundant ONOS SDN controller cluster running on Controller 1 and Controller 2.
The charts from The Nature of Datacenter Traffic: Measurements & Analysis show data center traffic measurements published by Microsoft. Most traffic flows are short duration. However, combined they consume less bandwidth than a much smaller number of large flows with durations ranging from 10 seconds to 100 seconds. The large number of small flows are often referred to as "Mice" and the small number of large flows as "Elephants."

This demonstration focuses on the Elephant flows since they consume most of the bandwidth. The iperf load generator is used to generate two streams of back-to-back 10Gbyte transfers that should each take around 8 seconds to complete over the 10Gbit/s leaf and spine network.
while true; do iperf -B 10.200.3.32 -c 10.200.3.42 -n 10000M; done
while true; do iperf -B 10.200.3.33 -c 10.200.3.43 -n 10000M; done
These two independent streams of connections from switch 103 to 104 drive the demo.
The HTML 5 dashboard queries sFlow-RT's REST API to extract and display real-time flow information.
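For example, a dashboard script might poll sFlow-RT along the following lines. This Python sketch assumes a flow named 'elephant' (keyed on the TCP 4-tuple, with bytes as the value) has already been defined in sFlow-RT; the address, metric name, query parameters, and response field names are assumptions for illustration.

import requests

RT = 'http://controller1:8008'  # sFlow-RT on Controller 1; address and port are assumptions

# top flow records, summed across all switches, for the 'elephant' flow definition
top = requests.get(RT + '/activeflows/ALL/elephant/json',
                   params={'maxFlows': 10, 'aggMode': 'sum'}).json()
for f in top:
    print(f['key'], f['value'])   # flow key and bytes per second

# per-interface utilization metrics, used to highlight links busier than 70%
links = requests.get(RT + '/dump/ALL/ifoututilization/json').json()
busy = [l for l in links if l.get('metricValue', 0) > 70]
print(len(busy), 'busy links')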

The dashboard shows a topological view of the leaf and spine network in the top left corner. Highlighted "busy" links have a utilization of over 70% (i.e. 7Gbit/s). The topology shows flows taking independent paths from 103 to 104 (via spines 105 and 106). The links are highlighted in blue to indicate that the utilization on each link is driven by a single large flow. The chart immediately under the topology trends the number of busy links. The most recent point, to the far right of the chart, has a value of 4 and is colored blue, recording that 4 blue links are shown in the topology.

The bottom chart trends the total traffic entering the network broken out by flow. The current throughput is just under 20Gbit/s and is comprised of two roughly equal flows.

The ONOS controller configures the switches to forward packets using Equal Cost Multi-Path (ECMP) routing. There are four equal cost (hop count) paths from leaf switch 103 to leaf switch 104 (via spine switches 105, 106, 107 and 108). The switch hardware selects between paths based on a hash function calculated over selected fields in the packets (e.g. source and destination IP addresses and source and destination TCP ports), for example:
index = hash(packet fields) % group.size
selected_physical_port = group[index]
Hash based load balancing works well for large numbers of Mice flows, but is less suitable for the Elephant flows. The hash function may assign multiple Elephant flows to the same path resulting in congestion and poor network performance.
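To see why collisions are likely, the following toy simulation (Python's built-in hash() stands in for the switch's hardware hash, and the ephemeral port range is an assumption) shows two concurrent flows landing on the same spine roughly a quarter of the time when there are four equal cost paths:

import random

SPINES = [105, 106, 107, 108]   # four equal cost paths from leaf 103 to leaf 104

def ecmp_path(src_ip, dst_ip, src_port, dst_port):
    # stand-in for the switch's hardware hash over selected packet fields
    return SPINES[hash((src_ip, dst_ip, src_port, dst_port)) % len(SPINES)]

# two concurrent elephant flows with random ephemeral source ports
collisions = 0
trials = 100000
for _ in range(trials):
    p1 = ecmp_path('10.200.3.32', '10.200.3.42', random.randint(32768, 61000), 5001)
    p2 = ecmp_path('10.200.3.33', '10.200.3.43', random.randint(32768, 61000), 5001)
    if p1 == p2:
        collisions += 1

print('collision rate: %.2f' % (collisions / float(trials)))  # roughly 0.25 with 4 paths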
This screen shot shows the effect of a collision between flows. Both flows have been assigned the same path via spine switch 105. The analytics software has determined that there are multiple large flows on the pair of busy links and indicates this by coloring the highlighted links yellow. The most recent point, to the far right of the upper trend chart, has a value of 2 and is colored yellow, recording that 2 yellow links are shown in the topology.

Notice that the bottom chart shows that the total throughput has dropped to 10Gbit/s and that each of the flows is limited to 5Gbit/s - halving the throughput and doubling the time taken to complete the data transfer.

The dashboard demonstrates that the sFlow-RT analytics engine has all the information needed to characterize the problem - identifying busy links and the large flows. What is needed is a way to take action to direct one of the flows on a different path across the network.

This is where the segment routing functionality of the ONOS SDN controller comes into its own. The controller implements Source Packet Routing in Networking (SPRING) as the method of ECMP forwarding and provides a simple REST API for specifying paths across the network and assigning traffic to those paths.

In this example, the traffic is colliding because both flows are following a path running through spine switch 105. Paths from leaf 103 to 104 via spines 106, 107 or 108 have available bandwidth.

The following REST operation instructs the segment routing module to build a path from 103 via 106 to 104:
curl -H "Content-Type: application/json" -X POST http://localhost:8181/onos/segmentrouting/tunnel -d '{"tunnel_id":"t1", "label_path":[103,106,104]}'
Once the tunnel has been defined, the following REST operation assigns one of the colliding flows to the new path:
curl -H "Content-Type: application/json" -X POST http://localhost:8181/onos/segmentrouting/policy -d '{"policy_id":"p1", "priority":1000, "src_ip":"10.200.3.33/32", "dst_ip":"10.200.4.43/32", "proto_type":"TCP", "src_tp_port":53163, "dst_tp_port":5001, "policy_type":"TUNNEL_FLOW", "tunnel_id":"t1"}'
However, manually implementing these controls isn't feasible since there is a constant stream of flows that would require policy changes every few seconds.
The final screen shot shows the result of enabling the Flow Accelerator application on sFlow-RT. Flow Accelerator watches for collisions and automatically applies and removes segment routing policies as required to separate Elephant flows. In this case, the table on the top right of the dashboard shows that a single policy has been installed, sending one of the flows via spine 107.
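A simplified sketch of the kind of logic Flow Accelerator automates, wrapping the ONOS segment routing REST calls shown above in a reusable Python function (this is illustrative, not the actual Flow Accelerator implementation; collision detection is assumed to have already identified the flow to move and the spine to move it to):

import requests

ONOS = 'http://localhost:8181/onos/segmentrouting'  # REST endpoints shown above

def reroute(flow, via_spine, tunnel_id, policy_id):
    # build a tunnel through the chosen spine, e.g. 103 -> 107 -> 104
    requests.post(ONOS + '/tunnel',
                  json={'tunnel_id': tunnel_id,
                        'label_path': [103, via_spine, 104]})
    # steer the flow's 5-tuple onto the tunnel
    requests.post(ONOS + '/policy',
                  json={'policy_id': policy_id, 'priority': 1000,
                        'src_ip': flow['src_ip'], 'dst_ip': flow['dst_ip'],
                        'proto_type': 'TCP',
                        'src_tp_port': flow['src_port'],
                        'dst_tp_port': flow['dst_port'],
                        'policy_type': 'TUNNEL_FLOW',
                        'tunnel_id': tunnel_id})

# e.g. move one of the colliding iperf flows via spine 107
# reroute({'src_ip': '10.200.3.33/32', 'dst_ip': '10.200.4.43/32',
#          'src_port': 53163, 'dst_port': 5001},
#         via_spine=107, tunnel_id='t1', policy_id='p1')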

The controller has been running for about half the interval shown in the two trend charts (approximately two and a half minutes). To the left you can see frequent long collisions and consequent dips in throughput. To the right you can see that more of the links are kept busy and flows experience consistent throughput.
Traffic analytics are a critical component of this demonstration. Why does this demonstration use sFlow? Could NetFlow/JFlow/IPFIX/OpenFlow etc. be used instead? The above diagram illustrates the basic architectural difference between sFlow and other common flow monitoring technologies. For this use case the key difference is that with sFlow real-time data from the entire network is available in a central location (the sFlow-RT analytics software), allowing the traffic engineering application to make timely load balancing decisions based on complete information. Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX presents experimental data to demonstrate the difference in responsiveness between sFlow and the other flow monitoring technologies. OK, but what about using hardware packet counters periodically pushed via sFlow, or polled using SNMP or OpenFlow? Here again, measurement delay limits the usefulness of the counter information for SDN applications, see Measurement delay, counters vs. packet samples. Fortunately, the requirement for sFlow is not limiting since support for standard sFlow measurement is built into most vendor and white box hardware - see Drivers for growth.

Finally, the technologies presented in this demonstration have broad applicability beyond the leaf and spine use case. Elephant flows dominate data center, campus, wide area, and wireless networks (see SDN and large flows). In addition, segment routing is applicable to wide area networks, as was demonstrated by an early version of the ONOS controller (Prototype & Demo Videos). The demonstration illustrates that integrating real-time sFlow analytics into SDN solutions enables fundamentally new use cases that drive SDN to a new level - optimizing networks rather than simply provisioning them.