|Figure 1: Hash collision on a link aggregation group|
The draft describes the challenge of managing long lived connections in the context of service provider backbones, but similar problems occur in the data center where long lived storage connections (iSCSI/FCoE) and network virtualization tunnels (VxLAN, NVGRE, STT, GRE etc) are responsible for a significant fraction of data center traffic.
The challenge posed by long lived flows is best illustrated by a specific example. The article, Link aggregation, provides a basic overview of LAG/MLAG topologies. Figure 1 shows a detailed view of an aggregation group consisting of 4 links connecting Switch A and Switch C and is used to illustrate the problem posed by long lived flows.
To ensure that packets in a flow arrive in order at their destination, Switch C computes a hash function over selected fields in the packets (e.g. L2: source and destination MAC address, L3: source and destination IP address, or L4: source and destination IP address + source and destination TCP/UDP ports) and picks a link based on the value of the hash, e.g.
index = hash(packet fields) % linkgroup.size selected_link = linkgroup[index]Hash based load balancing is easily implemented in hardware and works reasonably well, as long as traffic consists of large numbers of short duration flows, since the hash function randomly distributes flows across the members of the group. However, consider the problem posed by the two long lived high traffic flows, shown in the diagram as Packet Flow 1 and Packet Flow 2. There is a 1 in 4 chance that the two flows will be assigned the same group member and they will share this link for their lifetime.
|Figure 2: from article Link aggregation|
Collisions between high traffic flows can result in chronic performance problems poorly balanced load on the link. The link carrying colliding flows may become overloaded and experiencing packet loss and delay while other links in the group may be lightly loaded with plenty of spare capacity. The challenge is identifying links with flow collisions and changing the path selection used by the switches in order to use spare capacity in the link group.
|Figure 3: Elements of an SDN stack to load balance aggregation groups|
- Measurement - The sFlow standard provides multi-vendor, scaleable, low latency monitoring of the entire network infrastructure.
- Analytics - The sFlow-RT real-time analytics engine receives the sFlow measurements and rapidly identifies large flows. In addition, the analytics engine provides the detailed information into link aggregation topology and health needed to choose alternate paths.
- SDN application - The SDN application implements a load balancing algorithm, immediately responding to large flows with commands to the OpenFlow controller.
- Controller - The OpenFlow controller translates high level instructions to re-route flows into low level OpenFlow commands.
- Configuration - The OpenFlow protocol provides a fast, programatic, means for the controller to re-configuring forwarding in the network devices.
|Figure 4: Load balance link aggregation group by moving large flow|
Load balancing is a continuous process, traffic carried by each of the long lived flows is continually changing and different flows will collide at different times and in different places. To be effective, the the control system needs to have pervasive visibility into traffic and control of switch forwarding. Selecting switches that support both the OpenFlow and sFlow standards creates a solid foundation for deploying performance aware software defined networking solutions like the one described in this article.
Load balancing isn't just an issue for link aggregation (LAG/MLAG) topologies, the same issues occur with equal cost multi-path routing (ECMP) and WAN traffic optimization. Other applications for performance aware SDN include denial of service mitigation, multi-tenant performance isolation and workload placement.