Sunday, March 31, 2013

Pragmatic software defined networking

Figure 1: Fabric: A Retrospective on Evolving SDN
The article, Fabric: A Retrospective on Evolving SDN, makes the case for a two-tier software defined networking (SDN) architecture comprising a smart edge and an efficient core.

Smart edge

Figure 2: Network virtualization (credit Brad Hedlund)
Network Virtualization: a next generation modular platform for the data center virtual network describes the elements of the Nicira/VMware network virtualization platform, which represents the current state of the art in smart edges. Figure 2 shows the architectural elements of the solution, all of which are implemented at the network edge. The purple box, labelled Any Network, maps to the Fabric Elements core shown in Figure 1.

Through a process of network function virtualization (NFV), layer 3-7 components such as routers, load balancers and firewalls are abstracted as services that can be implemented in virtual machines or as physical devices, linked together in a virtual topology by tunnels across the physical network.

The Open vSwitch (OVS) is a critical component in this architecture, providing a number of key features (see the configuration sketch after the list):
  1. Flexibility - the virtual switch is implemented in software, allowing a rapidly evolving set of advanced features to be developed that would be difficult, time consuming, and expensive to replicate in hardware.
  2. Configuration - the OVSDB configuration protocol allows the controller to coordinate configuration among the virtual switches.
  3. OpenFlow - allows centralized control of the complex forwarding policies needed to create virtual networks.
  4. Monitoring - including NetFlow and sFlow to provide visibility into traffic flows.
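
To make the list above concrete, the following is a minimal sketch of how the controller-facing features might be enabled on a single OVS bridge from Python by shelling out to ovs-vsctl. The bridge name (br0), controller address and sFlow collector address are illustrative values, not taken from the article.

```python
# Minimal sketch: point an Open vSwitch bridge at a central OpenFlow
# controller and enable sFlow monitoring. Addresses and names are examples.
import subprocess

def ovs_vsctl(*args):
    """Run an ovs-vsctl command and raise if it fails."""
    subprocess.run(["ovs-vsctl"] + list(args), check=True)

# Feature 3: hand forwarding control to a central OpenFlow controller.
ovs_vsctl("set-controller", "br0", "tcp:10.0.0.1:6633")

# Feature 4: enable sFlow so the controller can see traffic flows.
ovs_vsctl("--", "--id=@s", "create", "sflow",
          "agent=eth0", 'target="10.0.0.1:6343"',
          "sampling=64", "polling=20",
          "--", "set", "bridge", "br0", "sflow=@s")
```
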
Figure 2 claims that any network can be used to carry traffic between the edge switches. While technically true, it is clear from the diagram that East-West traffic (traffic between vSwitches and between top of rack switches) dominates. The shared physical network must meet the bandwidth and latency requirements of the overlaid virtual network in order for network virtualization to be a viable solution.
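
One reason the physical network matters is encapsulation overhead. The small calculation below assumes a VXLAN-style overlay adding roughly 50 bytes of outer Ethernet/IP/UDP/tunnel headers per packet; the exact figure depends on the encapsulation actually used.

```python
# Rough illustration (assumed VXLAN-style overlay, ~50 bytes of outer headers):
# the physical fabric carries the tunnel overhead on top of the tenant load.
inner_frame = 1500          # bytes of inner frame per packet (illustrative)
overhead = 50               # outer headers added by the encapsulation
efficiency = inner_frame / (inner_frame + overhead)
print(f"{efficiency:.1%}")  # ~96.8% of physical bandwidth carries tenant data
```
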

Efficient core

The architecture in figure 1 simplifies the core by shifting complex classification tasks to the network edge. The core fabric is left with the task of efficiently managing physical resources in order to deliver low latency, high bandwidth connectivity between edge switches.

Current fabrics make use of distributed protocols and mechanisms such as Transparent Interconnection of Lots of Links (TRILL), Link Aggregation (LAG/MLAG), Multiprotocol Label Switching (MPLS) and Equal Cost Multi-Path Routing (ECMP) to control forwarding paths. To deliver improved efficiency, the feature set of the SDN/OpenFlow control plane needs to address the requirements of traffic engineering.

Centralized configuration mechanisms (e.g. NETCONF) are useful for provisioning the fabrics, and the distributed control planes provide the robust, high performance switching needed at the core. However, there are important classes of traffic that are poorly handled by the distributed control plane and would be more efficiently routed by a central SDN controller, see SDN and large flows. A hybrid approach combining the best elements of existing hardware, distributed control planes and OpenFlow offers a solution.
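
The sketch below (not from the article) illustrates why purely distributed forwarding struggles with large flows: ECMP-style hashing selects a path from the packet's 5-tuple without regard to flow size, so two elephant flows can land on the same link while other equal-cost links sit idle. The hash function and path count here are stand-ins for whatever a real switch implements in hardware.

```python
# Illustrative sketch (not a real switch implementation): distributed ECMP
# picks a path by hashing the flow's 5-tuple, so it is oblivious to flow size.
# Large flows that hash to the same index share one uplink while others sit
# idle; steering rules from a central controller can break such collisions.
import zlib

NUM_PATHS = 4  # e.g. four equal-cost uplinks

def ecmp_path(src_ip, dst_ip, proto, src_port, dst_port):
    """Pick an uplink index from a hash of the flow's 5-tuple."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % NUM_PATHS

# Each flow always follows the same path, regardless of its size:
print(ecmp_path("10.0.0.1", "10.0.1.1", 6, 33333, 5001))
print(ecmp_path("10.0.0.2", "10.0.1.2", 6, 44444, 5001))
```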
Figure 3: Hybrid Programmable Forwarding Planes
Figure 3 shows two models for hybrid OpenFlow deployment, allowing OpenFlow to be used in conjunction with existing routing protocols. The Ships-in-the-Night model divides the switch into two, allocating selected ports to external OpenFlow control and leaving the remaining ports to the internal control plane. It is not clear how useful this model is, other than for experimentation. For production use cases (e.g. the top of rack (ToR) case shown in Figure 2, where the switch is used to virtualize network services for a rack of physical servers) a pure OpenFlow switch is much simpler and likely to provide a more robust solution.

The Integrated hybrid model is much more interesting since it can be used to combine the best attributes of OpenFlow and existing distributed routing protocols to deliver a robust solution. The OpenFlow 1.3.1 specification includes support for the integrated hybrid model by defining the NORMAL action:
Optional: NORMAL: Represents the traditional non-OpenFlow pipeline of the switch (see 5.1). Can be used only as an output port and processes the packet using the normal pipeline. If the switch cannot forward packets from the OpenFlow pipeline to the normal pipeline, it must indicate that it does not support this action.
Hybrid solutions leverage the full capabilities of vendor and merchant silicon, which efficiently supports distributed forwarding protocols. In addition, most switch and merchant silicon vendors embed support for the sFlow standard, allowing the fabric controller to rapidly detect large flows and apply OpenFlow forwarding rules to steer the flows and optimize performance. The articles Load balancing LAG/ECMP groups and ECMP load balancing describe hybrid control strategies for increasing the performance of switch fabrics.
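
A minimal sketch of the integrated hybrid pattern, assuming an OVS-style switch managed with ovs-ofctl: a lowest-priority rule hands all traffic to the switch's normal (embedded) pipeline via the NORMAL action, and the controller adds a higher-priority rule only when it needs to steer a detected large flow. Bridge name, addresses and port numbers are illustrative.

```python
# Sketch of the integrated hybrid pattern on an OVS-style switch.
import subprocess

def add_flow(bridge, spec):
    """Install an OpenFlow rule with ovs-ofctl."""
    subprocess.run(["ovs-ofctl", "add-flow", bridge, spec], check=True)

# Default: fall back to the embedded control plane (the NORMAL action).
add_flow("br0", "priority=0,actions=NORMAL")

# Steer one detected large flow (illustrative match) out a specific uplink.
add_flow("br0",
         "tcp,nw_src=10.0.0.1,nw_dst=10.0.1.1,tp_dst=5001,"
         "priority=100,actions=output:2")
```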

Existing switching silicon is often criticized for the limited size of the hardware forwarding tables, supporting too few general match OpenFlow forwarding rules to be useful in production settings. However, consider that SDN and large flows defines a large flow as a flow that consumes 10% of a link's bandwidth. Using this definition, a 48 port switch would require a maximum of 480 general match rules in order to steer all large flows, well within the capabilities of current hardware (see OpenFlow Switching Performance: Not All TCAM Is Created Equal).
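
Restating that sizing argument as a back-of-the-envelope calculation:

```python
# With a large flow defined as one consuming at least 10% of a link's
# bandwidth, each link can carry at most 10 such flows, so a 48 port switch
# needs at most 480 steering rules.
ports = 48
large_flow_fraction = 0.10
max_large_flows_per_link = int(1 / large_flow_fraction)   # 10
max_rules = ports * max_large_flows_per_link              # 480
print(max_rules)
```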
Figure 4: Mad Max supercharger switch
Among the advantages of a hybrid solution is that dependency on the central fabric controller is limited: if the controller fails, the switches fall back to embedded forwarding and network connectivity is maintained. The hybrid SDN controller can be viewed as a supercharger that boosts the performance of existing networks by finding global optimizations that are inaccessible to the distributed control planes, hence the gratuitous picture from Mad Max in Figure 4. For the uninitiated, the red switch activates a supercharger that dramatically improves the car's performance.

In data centers, the network is a small part of overall costs and is often seen as unimportant. However, network bottlenecks idle expensive, power hungry servers and reduce overall data center performance and throughput. Improving the performance of the network increases server throughput and delivers increased ROI.

Network performance problems are an insidious cost for most organizations because of the split between networking and compute teams: network managers don't see the impact of network congestion on server throughput, and application development and operations teams (DevOps) don't have visibility into how application performance is being constrained by the network. The article, Network virtualization, management silos and missed opportunities, discusses how these organizational problems risk being transferred into current cloud orchestration frameworks. What is needed is a framework for coordinating between layers in order to achieve optimal performance.

Coordinated control

Figure 5: Virtual and physical packet paths
Figure 5 shows a virtual network in the upper layer and maps its paths onto the physical network below. The network virtualization architecture is not aware of the topology of the underlying physical network, so the physical locations of virtual machines and the resulting packet paths are unlikely to bear any relationship to their logical relationships, resulting in an inefficient "spaghetti" of traffic flows.

Note: A popular term for this type of inefficient traffic path is a hairpin or a traffic trombone; however, these terms imply a singular mistake rather than a systematic problem resulting in a marching band of trombones. The term spaghetti routing has been around a long time and conjures the image of a chaotic tangle that is more appropriate to the traffic patterns that result from the willful lack of locality awareness in current network virtualization frameworks.

In practice there is considerable structure to network traffic that can be exploited by the controller:
  1. Traffic within the group of virtual machines belonging to a tenant is much greater than traffic between different tenants.
  2. Traffic between hosts within scale-out clusters and between clusters is highly structured.
Note: Plexxi refers to the structured communication patterns as affinities, see Traffic Patterns and Affinities.
Figure 6: Cloud operating system
System boundary describes how extending the span of control to include network, server and compute resources provides new opportunities for increasing efficiency. Figure 6 extends the controller hierarchy from Figure 1 to include the compute controller responsible for virtual machine placement and adds an overarching cloud operating system with APIs connecting to the compute, edge and fabric controllers. This architecture allows for coordination of resources between the compute, edge and core subsystems. For example, the cloud operating system can use topology information learned from the Fabric controller to direct the Compute controller to move virtual machines in order to disentangle the spaghetti of traffic flows. Proactively, the cloud operating system can take into account the locations and communication patterns of a tenant's existing virtual machines and find the optimal location when asked to create a new virtual machine.
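
As a rough illustration of the kind of decision the cloud operating system could make, the following hypothetical sketch picks a host for a new virtual machine by minimizing traffic-weighted distance to the tenant's existing virtual machines. The data structures and host names are invented for the example and do not correspond to any of the cited systems.

```python
# Hypothetical locality-aware placement sketch, not an API from any cited
# system. Given expected traffic between a new VM and existing VMs, plus a
# fabric-supplied distance between hosts, pick the cheapest candidate host.

def place_vm(affinity, vm_host, candidates, distance):
    """affinity:   {existing_vm: expected traffic rate with the new VM}
       vm_host:    {existing_vm: host it currently runs on}
       candidates: hosts with spare capacity
       distance:   {(host_a, host_b): network distance, e.g. hop count}"""
    def cost(host):
        return sum(rate * distance[(host, vm_host[vm])]
                   for vm, rate in affinity.items())
    return min(candidates, key=cost)

# Example: prefer the host closest (in traffic-weighted hops) to the VMs the
# new VM will talk to most.
affinity = {"vm1": 800, "vm2": 50}
vm_host = {"vm1": "hostA", "vm2": "hostC"}
dist = {("hostA", "hostA"): 0, ("hostA", "hostC"): 4,
        ("hostB", "hostA"): 2, ("hostB", "hostC"): 2}
print(place_vm(affinity, vm_host, ["hostA", "hostB"], dist))  # -> hostA
```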

Note: Plexxi puts an interesting spin on topology optimization, using optical wave division multiplexing (WDM) in their top of rack switches to dynamically create topologies matched to traffic affinities, see Affinity Networking for Data Centers and Clouds.

A comprehensive measurement system is an essential component of an efficient cloud architecture, providing feedback to the controllers so that they can optimize resource allocation. The sFlow standard addresses the requirement for pervasive visibility by embedding instrumentation within physical and virtual network devices and in the servers and applications making use of the network. The main difference between the architecture shown in Figure 6 and current cloud orchestration architectures like OpenStack is the inclusion of feedback paths, the upward arrows, that allow the lower layer controllers and the cloud operating system to be performance and location aware when making load placement decisions, see Network virtualization, management silos and missed opportunities.
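
The feedback paths can be pictured as a simple control loop. The sketch below is structural only: every function passed in is a placeholder for a measurement (sFlow analysis), steering or placement component, and none of the names correspond to a specific product API.

```python
# Structural sketch of the feedback loop implied by the upward arrows in
# Figure 6. All callables are placeholders supplied by the surrounding system.
import time

def control_loop(link_utilization, large_flows, steer_flow, report_hotspot,
                 interval=5):
    while True:
        util = link_utilization()            # e.g. derived from sFlow counters
        for flow in large_flows():           # e.g. from sFlow packet samples
            if util.get(flow["link"], 0.0) > 0.8:
                steer_flow(flow)             # short term: reroute the flow
        for link, load in util.items():
            if load > 0.9:
                report_hotspot(link)         # long term: let the cloud OS
                                             # adjust virtual machine placement
        time.sleep(interval)
```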
