Tuesday, March 12, 2013

ECMP load balancing

Figure 1: Examples of ECMP collisions resulting in reduced bisection bandwidth
(from Hedera: Dynamic Flow Scheduling for Data Center Networks)
The paper Hedera: Dynamic Flow Scheduling for Data Center Networks describes the impact of colliding flows on the effective cross-sectional (bisection) bandwidth of an ECMP fabric. The paper gives an example showing that effective cross-sectional bandwidth can be reduced by between 20% and 60%, depending on the number of simultaneous flows per host.
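The size of the penalty can be illustrated with a deliberately simplified model. This is an assumption for illustration, not the Hedera paper's methodology. Suppose k long-lived large flows are hashed independently and uniformly onto k equal-cost uplinks, each flow can fill a link on its own, and flows that share a link split it. Aggregate throughput is then proportional to the number of links carrying at least one flow, whose expected value is k(1-(1-1/k)^k). The Python sketch below computes this and checks it by simulation; the function names are hypothetical.

```python
import random


def expected_used_links(flows: int, links: int) -> float:
    """Expected number of links carrying at least one flow when each flow
    is hashed independently and uniformly onto one of `links` paths."""
    return links * (1 - (1 - 1 / links) ** flows)


def simulate_used_links(flows: int, links: int,
                        trials: int = 10000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity (random hashing model)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # The set of distinct links chosen by this trial's flows.
        total += len({rng.randrange(links) for _ in range(flows)})
    return total / trials


if __name__ == "__main__":
    for k in (2, 4, 8, 16):
        used = expected_used_links(k, k)
        sim = simulate_used_links(k, k)
        # Fraction of ideal throughput under the simplified model.
        print(f"{k} flows on {k} links: model {used / k:.0%}, simulated {sim / k:.0%}")
```

For k = 4 the model gives about 2.73 busy links out of 4 (roughly 68% of ideal throughput), and the fraction tends towards 1-1/e (about 63%) as k grows. Downstream collisions, which this model ignores, add further loss.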

Figure 1 illustrates the two types of collision that can occur: local collisions, when large flows converge on an uplink as they are forwarded from source aggregation switches to the core, and downstream collisions, when large flows converge on a downlink from a core switch to the target aggregation switch. Optimizing forwarding paths for large flows is an interesting challenge that requires end-to-end visibility across the fabric. An aggregation switch could use local visibility to avoid collisions on its uplinks by selecting a different core switch (e.g. Agg 0 can choose to forward the colliding flow through Core 1). However, there is only one downlink from a core switch to each aggregation switch, so the choice of core switch made upstream fixes the downlink, and avoiding downstream collisions is not a local decision (e.g. the collision on the downlink from Core 2 can be avoided if Agg 2 sends the flow via Core 3, but Agg 2 has no local view of that downlink's load).
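The difference between local and end-to-end decisions can be sketched in Python. The model below is hypothetical and only loosely based on Figure 1: each aggregation switch has one link to each core switch listed in `uplinks`, and a flow from `src` to `dst` can use any core switch that both connect to. A simple greedy heuristic (not the Global First Fit or Simulated Annealing schedulers evaluated in the Hedera paper) places each large flow on the core switch whose busier link, uplink or downlink, carries the fewest large flows so far.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (source aggregation switch, destination aggregation switch)
Flow = Tuple[str, str]


def place_flows(flows: List[Flow], uplinks: Dict[str, List[str]]) -> List[str]:
    """Greedily assign each large flow to a core switch, preferring paths
    whose uplink and downlink do not already carry a large flow."""
    load: Counter = Counter()  # large flows per directed link
    choices = []
    for src, dst in flows:
        # Core switches reachable from src that also have a downlink to dst.
        candidates = [c for c in uplinks[src] if c in uplinks[dst]]
        if not candidates:
            raise ValueError(f"no core switch connects {src} and {dst}")
        # A path is as busy as its busier link; ties go to the first candidate.
        best = min(candidates,
                   key=lambda c: max(load[(src, c)], load[(c, dst)]))
        load[(src, best)] += 1   # uplink src -> core
        load[(best, dst)] += 1   # downlink core -> dst
        choices.append(best)
    return choices


if __name__ == "__main__":
    # Hypothetical fabric: three aggregation switches sharing two core switches.
    uplinks = {"AggA": ["Core2", "Core3"],
               "AggB": ["Core2", "Core3"],
               "AggC": ["Core2", "Core3"]}
    # Two large flows from different sources converge on AggC.
    print(place_flows([("AggA", "AggC"), ("AggB", "AggC")], uplinks))
```

Seen locally, AggB has two idle uplinks and could pick either; because the placement also tracks the downlink from Core2 to AggC, the second flow is routed via Core3, which is the kind of decision that needs the end-to-end view described above.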
Figure 2: Performance aware software defined networking
Figure 2 shows the architecture described in the earlier articles Performance aware software defined networking, Load balancing LAG/ECMP groups, and SDN and large flows. This architecture uses the standard sFlow monitoring embedded in most vendors' switches to continuously monitor all the links in the fabric. The sFlow-RT analytics engine rapidly detects large flows, providing end-to-end visibility to the SDN load balancing application. The load balancer communicates with an OpenFlow controller, or with a vendor-supplied fabric controller's REST API, to implement globally optimal forwarding decisions that avoid collisions and significantly increase the effective bandwidth of the fabric.
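The large-flow detection step can be sketched as well. sFlow agents export 1-in-N packet samples, so scaling each sampled frame's length by its sampling rate gives an estimate of the traffic that sample represents. The Python sketch below aggregates samples over a fixed interval and flags flows whose estimated rate exceeds a fraction of link speed. The `PacketSample` record, the 5-tuple flow key and the 10% default threshold are assumptions for illustration; this is not the sFlow-RT API, which performs this kind of analysis continuously.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical flow key: src IP, dst IP, IP protocol, src port, dst port.
FlowKey = Tuple[str, str, int, int, int]


@dataclass
class PacketSample:
    key: FlowKey
    frame_bytes: int     # length of the sampled frame
    sampling_rate: int   # N in the 1-in-N sampling rate


def large_flows(samples: List[PacketSample], interval_s: float,
                link_bps: float, threshold: float = 0.1) -> Dict[FlowKey, float]:
    """Return {flow: estimated bits/s} for flows whose estimated rate over
    the interval exceeds `threshold` times the link speed."""
    est_bits: Dict[FlowKey, float] = defaultdict(float)
    for s in samples:
        # Each sample stands in for roughly `sampling_rate` packets.
        est_bits[s.key] += s.frame_bytes * 8 * s.sampling_rate
    limit = threshold * link_bps
    return {k: bits / interval_s for k, bits in est_bits.items()
            if bits / interval_s > limit}


if __name__ == "__main__":
    big = ("10.0.0.1", "10.0.1.1", 6, 40000, 80)
    small = ("10.0.0.2", "10.0.1.2", 6, 40001, 80)
    samples = ([PacketSample(big, 1500, 1000)] * 200 +
               [PacketSample(small, 1500, 1000)] * 5)
    # One-second interval on a hypothetical 10 Gbit/s link.
    print(large_flows(samples, interval_s=1.0, link_bps=10e9))
```

With 1-in-1000 sampling, 200 samples of 1500-byte frames in one second estimate about 2.4 Gbit/s, above the 1 Gbit/s limit, while the 5-sample flow (about 60 Mbit/s) is ignored. Flows flagged this way are the candidates a load balancer would then place on collision-free paths.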
