Saturday, October 11, 2014

Super NORMAL

The article, HP proposes hybrid OpenFlow discussion at Open Daylight design forum, describes some of the benefits of integrated hybrid OpenFlow and explains why the OpenDaylight community would be a good venue for addressing operational and multi-vendor interoperability issues relating to hybrid OpenFlow.

HP's slide presentation from the design forum, OpenFlow-hybrid Mode, gives an overview of hybrid mode OpenFlow and its benefits. Hybrid mode leverages the proven scalability and operational robustness of existing distributed control mechanisms and complements them with centralized SDN control. This combination is compelling, and a number of vendors have released hybrid OpenFlow support, including Alcatel Lucent Enterprise, Brocade, Extreme, Hewlett-Packard, Mellanox, and Pica8. HP's presentation goes on to propose enhancements to the OpenDaylight controller to support hybrid OpenFlow agents.

InMon recently built a hybrid OpenFlow controller. Based on that experience, this article discusses how integrated hybrid mode is currently implemented on switches, examines the resulting operational issues, and proposes an agent profile for hybrid OpenFlow designed to reduce operational complexity, particularly for traffic engineering use cases such as DDoS mitigation, large flow marking, and large flow steering on ECMP/LAG networks.

Mechanisms for Optimizing LAG/ECMP Component Link Utilization in Networks, an IETF Internet Draft authored by Brocade, Dell, Huawei, Tata, and ZTE, discusses the benefits and operational challenges of the flow steering use case. In particular:
6.2. Handling Route Changes
Large flow rebalancing must be aware of any changes to the FIB.  In cases where the nexthop of a route no longer points to the LAG, or to an ECMP group, any PBR entries added as described in Section 4.4.1 and 4.4.2 must be withdrawn in order to avoid the creation of forwarding loops.
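The withdrawal rule quoted above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not code from the draft: the `PbrEntry` type, the `handle_route_change` function and the FIB representation (each prefix mapped to the member links of the LAG/ECMP group its next hop points to, or `None` when it no longer points to such a group) are all hypothetical. The sketch also withdraws an entry whose pinned link has dropped out of the group, which is slightly stricter than the draft's wording.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

@dataclass
class PbrEntry:
    """A policy-based routing entry that pins one large flow to a member link."""
    flow: Tuple[str, str, int, int, int]  # (src IP, dst IP, protocol, src port, dst port)
    prefix: str                           # FIB prefix covering the flow's destination
    member_link: str                      # LAG/ECMP member the flow is pinned to

# Hypothetical FIB view: prefix -> members of the LAG/ECMP group the route's
# next hop points to, or None when the next hop is no longer such a group.
Fib = Dict[str, Optional[FrozenSet[str]]]

def handle_route_change(entries: List[PbrEntry], fib: Fib):
    """Split PBR entries into those still safe to keep and those to withdraw."""
    kept, withdrawn = [], []
    for entry in entries:
        group = fib.get(entry.prefix)  # a missing prefix means the route was removed
        if group is not None and entry.member_link in group:
            kept.append(entry)
        else:
            # Leaving this entry installed could create a forwarding loop.
            withdrawn.append(entry)
    return kept, withdrawn
```

A switch would run a check like this on every FIB update, which is exactly the kind of bookkeeping an external controller struggles to do accurately.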
The essential feature of hybrid OpenFlow is that it leverages existing routing, switching and link state mechanisms to handle traffic without controller intervention. The controller only needs to install rules when it wants to override the default behavior. However, hybrid OpenFlow as currently implemented does not fully integrate with the on-switch control plane. The result is complex and unpredictable behavior that is hard to align with the forwarding policy established by the on-switch control plane (BGP, IS-IS, LACP, etc.), particularly when steering flows.

In order to best understand the challenges, it is worth taking a look at the architecture of an OpenFlow agent.
Figure 1: OpenFlow 1.3 switch
Figure 1 shows the functional elements of an OpenFlow 1.3 agent. Multiple tables in the data plane are exposed through OpenFlow to the OpenFlow controller. Packets entering the switch pass from table to table, matching against different packet headers. If there is no match, the packet is discarded; if there is a match, the associated set of actions is applied to the packet, typically forwarding it to a specific egress port on the switch. The key to hybrid OpenFlow is the NORMAL action, described in the OpenFlow 1.3 specification as:
Optional: NORMAL: Represents the traditional non-OpenFlow pipeline of the switch (see 5.1). Can be used only as an output port and processes the packet using the normal pipeline. If the switch cannot forward packets from the OpenFlow pipeline to the normal pipeline, it must indicate that it does not support this action.
With integrated hybrid OpenFlow, the agent is given a low-priority default rule that matches all packets and sends them to the NORMAL port (i.e. applies the forwarding rules determined by the switch's control plane). Vendors have chosen two ways to install this rule:
  1. Explicit: The controller is responsible for installing the default NORMAL rule when the switch connects to it.
  2. Implicit: The switch is configured to operate in integrated hybrid mode and behaves as if the default NORMAL rule was installed.
HP's OpenDaylight presentation describes enhancements to the OpenDaylight controller required to support the explicit hybrid OpenFlow configuration:
The controller would send a default rule which tells the switch to forward packets to the NORMAL port. This rule delegates the forwarding decision to the controlled switches, but it means that the controller would receive ZERO packet_in messages if no other rules were pushed. For this reason, we’d put this rule at priority 0 in the last hardware OF table of the pipeline. Without this rule, the default behavior for OF 1.0 is to steal to the controller and the default behavior for OF 1.3 is to drop all packets.
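To make the role of the priority-0 default rule concrete, here is a toy single-table lookup in Python. It is a simplified model rather than a switch implementation: it ignores multiple tables and instructions, and `FlowEntry` and `lookup` are hypothetical names. It shows that once the match-all NORMAL rule is present, any packet not claimed by a higher-priority rule falls through to normal forwarding instead of the version-specific table-miss behavior described in the quote.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FlowEntry:
    priority: int
    action: str                                             # e.g. "NORMAL", "OUTPUT=2"
    match: Dict[str, object] = field(default_factory=dict)  # {} matches all packets

def lookup(table: List[FlowEntry], packet: Dict[str, object], of_version: str = "1.3") -> str:
    """Return the action of the highest-priority entry that matches the packet."""
    hits = [e for e in table
            if all(packet.get(k) == v for k, v in e.match.items())]
    if hits:
        return max(hits, key=lambda e: e.priority).action
    # Table miss with no default rule:
    # OpenFlow 1.0 sends the packet to the controller, OpenFlow 1.3 drops it.
    return "CONTROLLER" if of_version == "1.0" else "DROP"
```

Adding `FlowEntry(priority=0, action="NORMAL")` to an empty table changes the result for every unmatched packet from a drop (or Packet-In) to normal forwarding, while a higher-priority rule such as one matching `tcp_dst` 80 still overrides it.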
Note: The article Integrated hybrid OpenFlow control of HP switches provides a simple example demonstrating integration between InMon's controller and HP switches.

Explicit configuration requires that the controller understand each vendor's forwarding pipeline and deploy an appropriate default rule. The implicit method supported by other vendors (e.g. Brocade, Alcatel Lucent Enterprise) is much simpler, since the vendor takes responsibility for applying the default NORMAL rule at the appropriate point in the pipeline.

The implicit method also has a number of operational advantages:
  1. The rule exists at startup: In the implicit case, the switch forwards normally before it connects to a controller and continues to forward packets if the controller is down or fails. In the explicit case, the switch drops all traffic on startup and continues to drop traffic if it can't connect to the controller and receive the NORMAL rule.
  2. The rule cannot be deleted: In the implicit case, the default NORMAL rule isn't visible to the controller and can't be accidentally deleted (which would disable all forwarding on the switch). In the explicit case, the OpenFlow controller must add the rule, and an SDN application may accidentally delete it.
  3. The agent knows it's in hybrid mode: In the implicit case, the switch is responsible for adding the default rule and so knows it's in hybrid mode. In the explicit case, the switch would need to examine the rules that the controller had inserted and try to infer the correct behavior. As we'll see later, the switch must be able to differentiate between hybrid mode and pure OpenFlow mode in order to trigger more intelligent behavior.
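The first two advantages can be illustrated with a small Python model. `HybridSwitch` and its methods are hypothetical, and the model assumes OpenFlow 1.3 table-miss behavior (drop) whenever no default rule is present.

```python
class HybridSwitch:
    """Toy model contrasting explicit and implicit default NORMAL rules."""

    def __init__(self, implicit: bool):
        self.implicit = implicit
        self.controller_rules = []  # (priority, action) pairs pushed by the controller

    def connect_to_controller(self):
        # Explicit mode: the controller must push the priority-0 NORMAL rule.
        if not self.implicit:
            self.controller_rules.append((0, "NORMAL"))

    def delete_all_flows(self):
        # An SDN application clearing the table only sees controller rules.
        self.controller_rules.clear()

    def unmatched_packet_action(self):
        """What happens to a packet that no controller rule overrides."""
        if self.implicit:
            return "NORMAL"  # built-in default, invisible to the controller
        if (0, "NORMAL") in self.controller_rules:
            return "NORMAL"
        return "DROP"        # OpenFlow 1.3 table-miss behavior
```

In this model the explicit switch drops traffic both before it connects and after a careless flow flush, while the implicit switch forwards normally throughout.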
However, even in the implicit case, integrated hybrid OpenFlow as currently implemented presents significant challenges. The main problem is that the demarcation of responsibility between the NORMAL forwarding logic and the OpenFlow controller isn't clearly specified. Consider, for example, this use case described in Mechanisms for Optimizing LAG/ECMP Component Link Utilization in Networks:
Within a LAG/ECMP group, the member component links with least average port utilization are identified.  Some large flow(s) from the heavily loaded component links are then moved to those lightly-loaded member component links using a policy-based routing (PBR) rule in the ingress processing element(s) in the routers.
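The selection step in this quote could look like the following Python sketch. It is not taken from the draft or from the OpenDaylight proposal: `pick_flow_to_move` is a hypothetical helper that assumes equal-capacity member links, utilization and large-flow rates that have already been measured, and at most one flow moved per call.

```python
from typing import Dict, List, Optional, Tuple

def pick_flow_to_move(
    link_util: Dict[str, float],
    large_flows: Dict[str, List[Tuple[str, float]]],
) -> Optional[Tuple[str, str, str]]:
    """Choose one large flow to move from the busiest member link to the idlest.

    link_util:   member link -> average utilization (fraction of capacity)
    large_flows: member link -> [(flow id, rate as a fraction of capacity)]
    Returns (flow id, from link, to link), or None if no move helps.
    """
    busiest = max(link_util, key=link_util.get)
    idlest = min(link_util, key=link_util.get)
    gap = link_util[busiest] - link_util[idlest]
    # A flow of rate r leaves the idlest link at util[idlest] + r, which stays
    # below util[busiest] only if r < gap; larger flows just move the hot spot.
    movable = [(rate, fid) for fid, rate in large_flows.get(busiest, []) if rate < gap]
    if busiest == idlest or not movable:
        return None
    _, fid = max(movable)  # move the largest flow that still helps
    return fid, busiest, idlest
```

The result only says which flow should move where; actually moving it is the PBR (or OpenFlow) rule discussed next, and that is where the hybrid integration problems appear.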
Figure 2, from the OpenDaylight Dynamic Flow Management proposal, expands on the SDN controller architecture for global large flow load balancing:
Figure 2: Large Flow Global Load Balancing
Suppose that the controller has detected a large flow collision and constructs the following OpenFlow rule to direct one of the flows to a different port:
{
  node: {id: '00:00:00:00:00:00:00:01', type: 'OF'},
  etherType: '0x0800',
  nwSrc: '10.0.0.1', nwDst: '10.1.10.2',
  protocol: '6', tpSrc: '42344', tpDst: '80',
  actions: ['OUTPUT=2']
}
The rule will fail to have the desired effect because the NORMAL control plane in this network is ECMP routing. To send the packet on port 2 so that it reaches its destination without interfering with the NORMAL forwarding protocols, the layer 2 headers must be rewritten (setting the VLAN to port 2's VLAN, the destination MAC address to the next-hop router's MAC address, and the source MAC address to port 2's MAC address) and the IP TTL must be decremented:
{
  node: {id: '00:00:00:00:00:00:00:01', type: 'OF'},
  etherType: '0x0800',
  nwSrc: '10.0.0.1', nwDst: '10.1.10.2',
  protocol: '6', tpSrc: '42344', tpDst: '80',
  actions: [
    'setDlSrc=00:04:00:00:00:02',  // port 2's MAC address
    'setDlDst=00:04:00:00:02:02',  // next-hop router's MAC address
    'setVLAN=1',                   // port 2's VLAN
    'decNwTTL',                    // packet is routed, so decrement IP TTL
    'OUTPUT=2'
  ]
}
These additional actions require information that the NORMAL control plane already has but that is difficult for the SDN controller to obtain. Things get even more complicated once routing and link state are taken into account: the selected port may not represent a valid route, or the link may be down. In addition, routes may change, and a rule that was once valid may become invalid and must then be removed (see 6.2. Handling Route Changes above).
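The dependency is easy to see if you write down what a controller would need in order to generate the second rule. In the Python sketch below, `EgressPort`, `routed_output_actions` and the two lookup tables are hypothetical stand-ins for interface configuration and ARP/neighbor state; the action strings reuse the notation of the rules above. Every input is something the switch's own control plane already maintains and keeps current, whereas an external controller would have to mirror it. The sketch deliberately does not check route validity or link state.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class EgressPort:
    mac: str   # the router's own MAC address on this port
    vlan: int  # VLAN configured on this port

def routed_output_actions(port: int, ports: Dict[int, EgressPort],
                          next_hops: Dict[int, str]) -> List[str]:
    """Actions needed to send a routed flow out of `port` without breaking it.

    ports and next_hops stand in for interface configuration and ARP/neighbor
    state that the switch's NORMAL control plane already maintains.
    """
    if port not in next_hops:
        raise ValueError(f"no next-hop router known on port {port}")
    egress = ports[port]
    return [
        f"setDlSrc={egress.mac}",        # source MAC: the egress port's MAC
        f"setDlDst={next_hops[port]}",   # destination MAC: next-hop router
        f"setVLAN={egress.vlan}",        # VLAN of the egress port
        "decNwTTL",                      # the packet is being routed
        f"OUTPUT={port}",
    ]
```

With port 2 configured as `EgressPort("00:04:00:00:00:02", 1)` and a next hop of `00:04:00:00:02:02`, this reproduces the action list of the second rule.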

Exposing hardware details makes sense if the external controller is responsible for all forwarding decisions (i.e. a pure OpenFlow environment). However, in a hybrid environment the NORMAL control plane is already populating the tables and the external controller should not need to concern itself with the hardware details.
Figure 3: Super NORMAL hybrid OpenFlow switch
Figure 3 proposes an alternative model for implementing integrated hybrid OpenFlow. It is referred to as "Super NORMAL" because it recognizes that the switch's forwarding agent is already managing the physical resources in the data plane, and that the goal of integrated hybrid OpenFlow is integration with the forwarding agent, not direct control of the forwarding hardware. In this model, the forwarding agent exposes a single OpenFlow table with keys and actions that can be composed with the existing control plane. In essence, the OpenFlow protocol is used to manage forwarding policy, expressed as an OpenFlow table, that the forwarding agent reads and uses to influence forwarding behavior.
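A minimal Python sketch of the composition this model implies is shown below. It illustrates our own assumptions rather than any vendor implementation: `super_normal_forward`, the rule tuple layout and the `normal_ports` callback are hypothetical, and a real agent would compile the policy table into ASIC ACL entries rather than evaluate it per packet. The key design choice is that an OUTPUT rule can only select among the ports that NORMAL forwarding would already use.

```python
from typing import Callable, Dict, List, Set, Tuple, Union

Action = Union[str, Tuple[str, int]]             # "NORMAL", "DROP" or ("OUTPUT", port)
PolicyRule = Tuple[int, Dict[str, object], Action]  # (priority, match, action)

def super_normal_forward(packet: Dict[str, object],
                         normal_ports: Callable[[Dict[str, object]], Set[int]],
                         policy: List[PolicyRule]) -> Set[int]:
    """Return the egress ports the packet may use; an empty set means drop.

    normal_ports(packet) is the forwarding agent's own decision, e.g. every
    member of the LAG or ECMP group the destination resolves to.
    """
    candidates = normal_ports(packet)
    hits = [r for r in policy
            if all(packet.get(k) == v for k, v in r[1].items())]
    if not hits:
        return candidates                 # implicit default NORMAL, no Packet-In
    action = max(hits, key=lambda r: r[0])[2]
    if action == "DROP":
        return set()
    if action == "NORMAL":
        return candidates
    _, port = action
    # OUTPUT only narrows NORMAL's choice; a port NORMAL would not use is ignored.
    return {port} if port in candidates else candidates
```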
Figure 4: SDN fabric controller for commodity data center switches
This model fits well with the hardware architecture of the merchant silicon ASICs used in most current generation data center switches, shown in Figure 4. The NORMAL control plane populates most of the tables in the ASIC, and the forwarding agent can apply OpenFlow rules to the ACL Policy Flow Table to override default behavior. Many existing OpenFlow implementations are already very close to this model but lack the integration needed to compose OpenFlow rules with their forwarding logic. The following enhancements to the hybrid OpenFlow agent would greatly improve the utility of hybrid OpenFlow:
  1. Implement implicit default NORMAL behavior
  2. Never generate Packet-In events (a natural result of implementing 1. above)
  3. Support NORMAL output action
  4. Expose a single table with matches and actions that are valid and compose with the configured forwarding protocol(s)
  5. Reject rules that are not valid options according to the NORMAL control plane:
    • if the NORMAL output would send a packet to a LAG and the specified port is not a member of the LAG, then the rule must be rejected.
    • if the NORMAL output would send a packet to an ECMP group and the specified port is not a member of the group, then the rule must be rejected.
    • if the specified port is down, then the rule must be rejected.
    • if the rule cannot be fully implemented in the hardware data plane, then the rule must be rejected.
  6. Remove rules that are no longer valid and send a flow removed message to the controller. A rule is no longer valid if it would now be rejected (e.g. if a port goes down, rules directing traffic to that port must be removed immediately).
  7. Automatically add any details needed to forward the traffic (e.g. rewrite the source and destination MAC addresses and decrement the IP TTL if the packet is being routed).
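Items 5 and 6 amount to running one validity check when a rule is installed and again whenever the control plane's state changes. The Python sketch below shows that idea. `SwitchState`, `PolicyRule`, `SuperNormalTable` and the simplified per-prefix group table are hypothetical; a real agent would take this state from its own FIB, LAG and link tables and would send OpenFlow flow removed messages rather than appending to a list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class SwitchState:
    """Hypothetical snapshot of state the NORMAL control plane already owns."""
    port_up: Dict[int, bool]
    groups: Dict[str, Set[int]]   # dest prefix -> LAG/ECMP members NORMAL uses
    hw_actions: Set[str] = field(default_factory=lambda: {"OUTPUT", "NORMAL", "DROP"})

@dataclass
class PolicyRule:
    rule_id: int
    prefix: str                   # destination prefix the rule's match covers
    action: str                   # e.g. "OUTPUT", "NORMAL", "DROP"
    port: Optional[int] = None    # only used by OUTPUT

def rejection_reason(rule: PolicyRule, state: SwitchState) -> Optional[str]:
    """Apply the checks in item 5; None means the rule is acceptable."""
    if rule.action not in state.hw_actions:
        return "cannot be fully implemented in hardware"
    if rule.action != "OUTPUT":
        return None
    members = state.groups.get(rule.prefix)
    if members is not None and rule.port not in members:
        return "port is not a member of the LAG/ECMP group"
    if not state.port_up.get(rule.port, False):
        return "port is down"
    return None

class SuperNormalTable:
    def __init__(self, state: SwitchState):
        self.state = state
        self.rules: Dict[int, PolicyRule] = {}
        self.flow_removed: List[int] = []  # stands in for flow removed messages

    def add(self, rule: PolicyRule) -> Optional[str]:
        reason = rejection_reason(rule, self.state)
        if reason is None:
            self.rules[rule.rule_id] = rule
        return reason

    def revalidate(self) -> None:
        """Item 6: call whenever ports, LAGs or routes change."""
        for rule_id, rule in list(self.rules.items()):
            if rejection_reason(rule, self.state) is not None:
                del self.rules[rule_id]
                self.flow_removed.append(rule_id)
```

Using the same check for admission and removal keeps the two behaviors consistent: a rule stays installed exactly as long as it would still be accepted.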
Hybrid control of forwarding is the most complex operation and requires Super NORMAL functionality. Simpler operations, such as blocking traffic or QoS marking, are easily handled by the DROP and NORMAL output actions, and solutions based on hybrid OpenFlow have already been demonstrated.
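For contrast with the steering rule earlier, the Python sketch below builds blocking and marking rules in the same notation. The `DROP`, `NORMAL` and `setDscp` action strings and the helper functions are hypothetical, chosen to mirror the earlier examples rather than any particular controller's API. Neither rule needs MAC addresses, VLANs or TTL handling, because the forwarding decision stays with the NORMAL control plane.

```python
from typing import Dict, List

def _rule(node_id: str, match: Dict[str, str], actions: List[str]) -> dict:
    # Same shape as the flow rules shown earlier in this article.
    return {"node": {"id": node_id, "type": "OF"}, "etherType": "0x0800",
            **match, "actions": actions}

def block_source(node_id: str, src_ip: str) -> dict:
    """DDoS mitigation: drop everything from one source address."""
    return _rule(node_id, {"nwSrc": src_ip}, ["DROP"])

def mark_flow(node_id: str, src_ip: str, dst_ip: str, dscp: int) -> dict:
    """QoS marking: re-mark the flow, then let NORMAL forwarding deliver it."""
    return _rule(node_id, {"nwSrc": src_ip, "nwDst": dst_ip},
                 [f"setDscp={dscp}", "NORMAL"])
```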
Understanding the architectural differences between hybrid and pure OpenFlow implementations is essential to getting the most out of each approach to SDN. Pure OpenFlow is still an immature technology with limited applications. Hybrid OpenFlow, on the other hand, works well with commodity switch hardware, leverages mature control plane protocols, and delivers added value in production networks.