sFlow: October 2009

Sunday, October 25, 2009

Probes

The RMON (Remote MONitoring) standard was developed in the early 1990's to standardize network monitoring devices (usually referred to as "probes"). At the time, Ethernet LANs consisted of coax cables that were shared by a number of hosts. Repeaters were used to connect the cables and extend the network. In this environment, a single RMON probe would see all the traffic on the shared network, providing complete network visibility.

Multi-port switches started to become popular in the mid 1990's, and SPAN/mirror ports were added to switches to continue to allow probe-based monitoring. The increasing number of ports per switch and the increasing port speeds have made the use of probes a challenge. The need for embedded instrumentation was becoming clear.

In the late 1990's, Cisco introduced the NetFlow protocol, embedding L3-4 monitoring in routers, and in 2001 the sFlow protocol was introduced, embedding L2-7 monitoring in switches. Interest in network visibility has accelerated the adoption of the sFlow standard among switch vendors, further limiting the role of probes.

The chart clearly shows the trend toward embedded monitoring. Google Insight for Search was used to trend the popularity of the search terms sFlow, NetFlow, RMON and Probe compared to overall searches relating to Network Monitoring & Management. The sFlow and NetFlow lines track closely and exceed general interest in Network Monitoring & Management, indicating that they are increasingly important topics. The RMON and Probe lines track closely with each other and show a rapid decline compared to Network Monitoring & Management, indicating declining interest in probes as monitoring shifts from probes to embedded instrumentation.

Current trends toward data center convergence increase the need for visibility and control. Complete network visibility is likely to involve both NetFlow and sFlow. NetFlow provides visibility into routers, while sFlow extends visibility into the increasingly important switching layer, including virtual servers, blade servers and edge switches.

Note: The overall downward trend in all the lines results from the increasing population of Internet users. As more people use the Internet, the proportion of Internet users interested in any one topic is diluted. This is particularly true of technical topics. In the past, the technical barriers to using the Internet skewed the user population and resulted in more searches relating to technical topics. Now that everyone is online, the majority of searches relate to more populist topics.
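The dilution effect described in the note is simple arithmetic. A minimal Python sketch, assuming (as a simplification) that each line on the chart is a term's share of total searches; the search counts are made up purely for illustration:

```python
# Hypothetical numbers, chosen only to illustrate dilution: the number of
# technical searches stays constant while total search volume grows.
def relative_interest(topic_searches, total_searches):
    """Share of all searches that mention the topic (what the chart is assumed to plot)."""
    return topic_searches / total_searches

technical_searches = 1_000                    # constant every year (hypothetical)
total_searches = [100_000, 200_000, 400_000]  # growing user population (hypothetical)

shares = [relative_interest(technical_searches, t) for t in total_searches]
print(shares)  # [0.01, 0.005, 0.0025]
```

Even with unchanged absolute interest, the plotted share halves each time total volume doubles, which is why every line trends down.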

Wednesday, October 21, 2009

Network visibility

The chart, created using Google News Timeline, trends the growing interest in "network visibility" from 1990 to 2008. The dot-com bubble (2000) shows up as a clear spike, followed by a plateau. Starting in 2005, interest has accelerated with strong annual gains.

The sFlow standard, designed to provide network-wide visibility, was introduced in 2002 and the rapid growth in multi-vendor support for sFlow closely tracks this chart.

Finally, current trends toward data center convergence and virtualization create new management challenges that continue to drive interest in the visibility and control made possible by sFlow.

Tuesday, October 20, 2009

802.1aq and TRILL

There are a number of drivers increasing demand for bandwidth in the data center:

Multi-core processors and blade servers greatly increase computational density, creating a corresponding demand for bandwidth.
Networked storage increases demand for bandwidth.
Virtualization and server consolidation ensure that servers are fully utilized, further increasing demand for bandwidth.

Virtual machine mobility (e.g. VMWare vMotion, Citrix XenMotion or Xen Live Migration) requires a large, flat layer 2 network so that machines can be moved without reconfiguring their network settings. The increasing size of the layer 2 network, combined with the increasing demand for bandwidth, challenges the scalability of current Ethernet switching technologies.

The diagram illustrates the problem. Currently, Ethernet uses the spanning tree protocol to determine forwarding paths. The tree structure forces traffic to the network core, creating a bottleneck. In addition, the tree structure doesn't allow traffic to flow on backup links, further limiting usable bandwidth. An alternative forwarding technique, used by routers, is to select shortest paths through the network. Shortest path forwarding allows traffic flows to bypass the core, reducing the bottleneck. The added benefit of shortest path forwarding is its ability to make use of all the links, including backup links, further increasing capacity.
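The contrast between tree forwarding and shortest path forwarding can be shown on a tiny topology. The Python sketch below is a simplification: it approximates the spanning tree as a breadth-first tree rooted at the core switch (with equal link costs, the spanning tree protocol also builds a shortest-path tree toward the root bridge), and the switch names and links are hypothetical.

```python
from collections import deque

# Hypothetical topology: one core switch and three edge switches, plus two
# edge-to-edge links that a spanning tree would block.
LINKS = [
    ("core", "e1"), ("core", "e2"), ("core", "e3"),
    ("e1", "e2"), ("e2", "e3"),
]

def adjacency(links):
    """Build an undirected neighbor list from a list of links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj

def bfs_parents(adj, root):
    """Breadth-first search from root; the parent pointers form a tree."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return parents

def spanning_tree(links, root):
    """Simplified spanning tree: keep only the BFS tree links rooted at `root`."""
    parents = bfs_parents(adjacency(links), root)
    return [(p, c) for c, p in parents.items() if p is not None]

def shortest_path(links, src, dst):
    """Unweighted shortest path using only the given links."""
    parents = bfs_parents(adjacency(links), src)
    path = []
    while dst is not None:
        path.append(dst)
        dst = parents[dst]
    return path[::-1]

tree = spanning_tree(LINKS, root="core")
print("links usable by spanning tree:", len(tree), "of", len(LINKS))
print("tree path e1->e2:", shortest_path(tree, "e1", "e2"))
print("shortest path e1->e2:", shortest_path(LINKS, "e1", "e2"))
```

Here the tree uses only 3 of the 5 links and sends e1 to e2 traffic through the core (`['e1', 'core', 'e2']`), while shortest path forwarding uses the direct link (`['e1', 'e2']`).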

There are two emerging standards for shortest path forwarding in switches:

TRILL (Transparent Interconnect of Lots of Links)
IEEE 802.1aq (Shortest Path Bridging)

Both TRILL and 802.1aq use IS-IS (Intermediate System to Intermediate System) routing to select shortest paths through the network. The protocols are very similar, but are being proposed by two different standards bodies: TRILL is being developed by IETF and 802.1aq by the IEEE.
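IS-IS is a link-state protocol: each switch advertises its links and their metrics, and every switch runs a shortest-path-first (Dijkstra) computation over the resulting database. A minimal Python sketch of that SPF step, assuming a hypothetical four-switch topology with made-up metrics; it ignores equal-cost multipath and the specific tie-breaking rules a real implementation would apply.

```python
import heapq

def spf(lsdb, source):
    """Dijkstra shortest-path-first over a link-state database.

    lsdb maps switch -> {neighbor: metric}. Returns (dist, next_hop), where
    dist[n] is the total metric to reach n and next_hop[n] is the neighbor
    of `source` to forward through.
    """
    dist, next_hop = {}, {}
    heap = [(0, source, None)]
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in dist:
            continue  # already settled at a lower or equal cost
        dist[node] = cost
        next_hop[node] = hop
        for nbr, metric in lsdb[node].items():
            if nbr not in dist:
                # The first hop is the neighbor itself when leaving the source.
                heapq.heappush(heap, (cost + metric, nbr, nbr if node == source else hop))
    return dist, next_hop

# Hypothetical link-state database (metrics are illustrative only).
LSDB = {
    "core": {"e1": 10, "e2": 10, "e3": 10},
    "e1": {"core": 10, "e2": 10},
    "e2": {"core": 10, "e1": 10, "e3": 5},
    "e3": {"core": 10, "e2": 5},
}

dist, next_hop = spf(LSDB, "e1")
print(dist)      # {'e1': 0, 'core': 10, 'e2': 10, 'e3': 15}
print(next_hop)  # {'e1': None, 'core': 'core', 'e2': 'e2', 'e3': 'e2'}
```

Traffic from e1 to e3 is forwarded via e2 (total metric 15) rather than through the core (total metric 20), which is the core-bypassing behavior described for shortest path bridging.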

It is surprising to see the IETF working on a LAN bridging protocol. The IETF is responsible for Internet protocols (TCP/IP, routing) and the IEEE is responsible for LAN protocols (802.11, Ethernet, bridging/switching). Adopting the IEEE 802.1aq standard makes the most sense, since it will ensure interoperability with the IEEE data center bridging standards being developed to support FCoE and facilitate data center convergence.

Finally, while more efficient network topologies will help increase network capacity, the days of relying on network over-provisioning are over. Much tighter control of bandwidth is going to be required in order to cope with converged data center workloads. Selecting switches that support the sFlow standard provides the visibility and control needed to manage the increasing demand for bandwidth.

Saturday, October 17, 2009

SR-IOV

The diagram (source Intel: Virtual Machine Direct Connect (VMDc)) illustrates the close relationship between the network adapter and the virtual switch in providing networking to virtual machines. Currently most virtual server systems use a software virtual switch (vSwitch) to share access to the network adapter among the virtual machines (VMs).

The Single Root I/O Virtualization (SR-IOV) standard being implemented by 10G network adapter vendors provides hardware support for virtualization, allowing virtual machines to directly share the network adapter without involving the software virtual switch. Hardware sharing improves performance and frees CPU cycles to be used by the virtual machines. Software is still needed to configure and manage the switching function on the network adapter and integrate it with the management of the virtual machines. Virtual switch software is evolving, offloading performance-critical functions to the network adapter, while continuing to provide the management interface.
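The split between a software control path and a hardware data path can be sketched as a toy model. This Python class is hypothetical and deliberately simplified: it assumes the adapter sorts frames to virtual functions (VFs) by destination MAC address only, that unmatched frames go to the physical function (PF) for software handling, and it ignores VLANs, broadcast and multicast.

```python
class AdapterSwitch:
    """Toy model of the switching function on an SR-IOV adapter (hypothetical).

    Management software programs MAC-to-VF entries (control path); per-frame
    delivery is just a table lookup (data path), with no software involved.
    """

    PF = "pf"  # assumed default: frames that match no entry go to the physical function

    def __init__(self):
        self._table = {}

    # Control path: run by the virtual switch / hypervisor software.
    def assign(self, mac, vf):
        self._table[mac.lower()] = vf

    def release(self, mac):
        self._table.pop(mac.lower(), None)

    # Data path: per-frame lookup performed by the adapter.
    def deliver(self, dst_mac):
        return self._table.get(dst_mac.lower(), self.PF)


nic = AdapterSwitch()
nic.assign("52:54:00:00:00:01", "vf0")  # VM 1
nic.assign("52:54:00:00:00:02", "vf1")  # VM 2
print(nic.deliver("52:54:00:00:00:02"))  # vf1
print(nic.deliver("52:54:00:00:00:99"))  # pf
```

When a VM migrates away, the management software calls `release`, and frames for that MAC fall back to the PF path.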

Maintaining visibility and control of the edge is a critical component of an effective data center management strategy. Since virtual switches provide the first layer of switching in a virtualized environment, they comprise the network edge. Integrating the virtual switches into the overall network management system is essential.

Previously, the role of the VEPA protocol in integrating software virtual switches with hardware switches was discussed. VEPA support in the network adapter offers integration between a hardware switch and the network adapter. Most switch vendors support sFlow for traffic monitoring, and the combination of sFlow and VEPA would provide visibility and control of virtual machine traffic.

Ultimately, the evolving functionality of the network adapter/virtual switch is likely to deliver the visibility, performance, security and quality of service capabilities needed from the network edge. This trend is illustrated by the Open vSwitch roadmap, which includes support for both sFlow and SR-IOV, along with OpenFlow for control of the edge.

Wednesday, October 14, 2009

VEPA

A virtual switch is a software component of a virtual server, providing network connectivity to virtual machines (VMs). The challenge with virtual switches is integrating them into the rest of the network in order to maintain visibility and control.

The diagram shows how the emerging VEPA (Virtual Ethernet Port Aggregator) standard addresses this challenge by ensuring that packets from the virtual machines connected to the virtual switch (shown in green) also pass through an adjacent hardware switch (Bridge). In a blade server, the adjacent hardware switch would be the blade switch. If stand-alone servers are used, then the adjacent hardware switch would be the top of rack switch.

Passing traffic through the hardware switch offloads tasks such as rate limiting and access control lists (ACLs), simplifying the virtual switch and freeing CPU cycles that can be used by the virtual machines.
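The difference between a conventional virtual switch and VEPA comes down to where VM-to-VM frames are switched. In the Python sketch below (server layout and names are hypothetical), a conventional virtual Ethernet bridge (VEB) switches local traffic internally, while VEPA sends every frame to the adjacent bridge. For VM-to-VM traffic, the bridge must send the frame back out the port it arrived on ("reflective relay"), something an ordinary bridge will not do.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    src: str
    dst: str

# Hypothetical server: two VMs sharing one uplink to the adjacent bridge.
LOCAL_VMS = {"vm1", "vm2"}

def veb_path(frame):
    """Conventional virtual switch (VEB): VM-to-VM frames never leave the server."""
    if frame.dst in LOCAL_VMS:
        return ["vswitch", frame.dst]
    return ["vswitch", "bridge", frame.dst]

def vepa_path(frame, reflective_relay=True):
    """VEPA: every frame from a VM is sent to the adjacent bridge first.

    Returns None when the bridge lacks reflective relay, since it would not
    forward a frame back out its ingress port.
    """
    path = ["vepa", "bridge"]
    if frame.dst in LOCAL_VMS:
        if not reflective_relay:
            return None
        return path + ["vepa", frame.dst]  # hairpinned back to the same server
    return path + [frame.dst]

f = Frame("vm1", "vm2")
print(veb_path(f))   # ['vswitch', 'vm2']
print(vepa_path(f))  # ['vepa', 'bridge', 'vepa', 'vm2']
```

Because "bridge" appears on every VEPA path, the hardware switch's ACLs, rate limits and sFlow sampling see VM-to-VM traffic that a VEB would have switched out of sight.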

The sFlow standard is widely supported by switch vendors. Selecting blade switches and top of rack switches with sFlow and VEPA support will offer visibility and control of the network edge.

Thursday, October 8, 2009

Network edge

InMon's quota controller brings together many of the topics that have been discussed on this blog, clearly illustrating the role of network-wide visibility in achieving effective control of the network.

The diagram shows the basic elements: a centralized sFlow analyzer receives sFlow data from every switch in the network, producing a real-time, network-wide view of traffic and accurately tracking the network topology. A centralized controller enforces management policies by automatically applying configuration settings to the edge switches in order to control traffic. For more information, see Controlling Traffic for a detailed description of InMon's controller and its application to peer-to-peer (P2P) traffic control.
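A central analyzer can build a network-wide traffic view from packet samples because each sFlow sample reports the 1-in-N sampling rate in effect, so a sampled frame stands for roughly N frames. A minimal Python sketch of that scaling, using a simplified, hypothetical sample record rather than a real sFlow datagram decoder, and assuming samples are taken only on ingress at edge ports so that no packet is counted twice:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class FlowSample:
    agent: str          # switch that exported the sample
    sampling_rate: int  # 1-in-N packet sampling rate reported with the sample
    src_host: str       # source address decoded from the sampled header
    frame_length: int   # length in bytes of the original sampled frame

def estimate_bytes_by_host(samples):
    """Scale each sampled frame by its sampling rate and sum per source host.

    Assumes samples come only from edge ports on ingress, so a packet is never
    counted by more than one switch.
    """
    totals = defaultdict(int)
    for s in samples:
        totals[s.src_host] += s.frame_length * s.sampling_rate
    return dict(totals)

samples = [  # hypothetical samples from two edge switches
    FlowSample("edge1", 512, "10.0.0.1", 1500),
    FlowSample("edge1", 512, "10.0.0.1", 1500),
    FlowSample("edge2", 1024, "10.0.1.7", 64),
]
print(estimate_bytes_by_host(samples))  # {'10.0.0.1': 1536000, '10.0.1.7': 65536}
```

The result is an estimate whose accuracy improves with the number of samples collected; different switches can use different sampling rates because each sample carries its own.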

Generally, this level of control is only possible because of the timely and complete picture of the network state that sFlow monitoring provides. In control engineering terms, sFlow makes the network observable, an essential prerequisite for control.

An accurate picture of the network state allows controls to be targeted where they will be most effective and have the least impact on other traffic: the edge of the network. The alternative, measurement and control at the network core (using the core switches and routers, or channeling traffic through shared control points such as firewalls and traffic shapers), can result in serious performance problems as busy core devices become overloaded by additional measurement and control tasks. In addition, control at the core is ineffective if the traffic doesn't cross the core. On the other hand, all traffic crosses the edge, and control at the edge is scalable since the number of edge devices grows with the network, providing additional measurement (sFlow) and control capacity as the network grows.
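Targeting controls at the edge requires knowing where each host connects. The Python sketch below assumes a hypothetical location table of the kind an sFlow analyzer could derive from which edge switch port a host's traffic is sampled on, and it places a control only on the edge port of each host that exceeds a quota. How the control is actually applied (ACL, rate limit) is left out.

```python
# Hypothetical host locations: host -> (edge switch, port).
HOST_LOCATION = {
    "10.0.0.1": ("edge1", 3),
    "10.0.0.2": ("edge1", 4),
    "10.0.1.7": ("edge2", 12),
}

def edge_controls(usage_bytes, quota_bytes, location=HOST_LOCATION):
    """Return {edge_switch: [(port, host), ...]} for hosts over quota.

    Controls go on the edge port where each offending host connects, never on
    a core device, so each edge switch only handles its own hosts.
    """
    actions = {}
    for host, used in sorted(usage_bytes.items()):
        if used > quota_bytes and host in location:
            switch, port = location[host]
            actions.setdefault(switch, []).append((port, host))
    return actions

usage = {"10.0.0.1": 1_536_000, "10.0.0.2": 20_000, "10.0.1.7": 900_000}
print(edge_controls(usage, quota_bytes=500_000))
# {'edge1': [(3, '10.0.0.1')], 'edge2': [(12, '10.0.1.7')]}
```

The work is spread across edge switches in proportion to the hosts they serve, which is the scaling argument made above: adding edge switches adds control capacity.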

Interestingly, the centralized visibility into switched traffic that sFlow provides is being paralleled by a move toward centralized control of switches (see OpenFlow). The combination of centralized visibility and centralized control of network traffic paths has the potential to revolutionize data center networking, delivering the performance, scalability and control needed to build large, converged data centers.

In order to achieve visibility and control in the data center, it is essential to ensure that the edge is fully observable and controllable. Data center convergence is shifting the network edge to include components of blade servers and virtual servers. Finally, the Open vSwitch project is interesting because it will offer visibility (sFlow) and control (OpenFlow) at the edge of the virtualized data center (currently including support for Xen/XenServer, KVM, and VirtualBox).

Monday, October 5, 2009

Management silos

Currently, most organizations split the management of the data center among different groups: networking, storage, systems and possibly regulatory compliance. Enterprise applications require resources from each of these functional areas and a failure in any of these areas can have a significant impact on the business. The strategy of splitting management responsibilities by functional area has worked because these functional areas have traditionally been loosely coupled and the data center environment has been relatively static.

The trend towards convergence of computing, storage and networking in order to create a more dynamic and efficient infrastructure makes these different functions richly dependent on each other, forcing a change in management strategy. For example, server virtualization means that a small change made by the systems group could have a major effect on network bandwidth. Networked storage accounts for a significant and growing proportion of overall network bandwidth, again making the network vulnerable to changes made by the storage group. The recent Gmail outage illustrates the complex relationships between the elements needed to maintain services in a converged environment.

Convergence and interdependence between the resources in a converged data center require a cross-functional approach to management in order to ensure successful operations. The diagram shows the migration from management silos, in which each group monitors its own resources using its own management tools (but has very limited visibility into the other components of the data center), to an integrated management strategy, in which all the components in the data center are monitored by a single measurement system that provides shared visibility into all the elements of the data center. Data center wide visibility is critical, ensuring that each group is aware of its impact on shared resources, eliminating finger pointing, and providing the information needed to take control of the data center.

The sFlow measurement technology, built into most vendors' network equipment, ensures data center wide visibility of all resources, including switches, storage, servers, blade servers and virtual servers. As networks, systems and storage converge, the visibility provided by sFlow in the network provides an increasingly complete picture of all aspects of data center operations, delivering the converged visibility needed to manage the converged data center.

Data center control

Feedback control is a powerful technique for managing complex systems. Most people are familiar with examples of feedback control, even if they don't have a clear understanding of the underlying theory. For example, anti-lock brakes use feedback control to ensure that a driver can quickly stop the car without skidding and losing control. Sensors attached to each wheel send measurements to a controller that adjusts the amount of braking force applied to each wheel. If too much force is applied to a wheel and it starts to skid, then the sensor detects that the wheel is locking. The controller reduces brake force, reapplying it when it detects that the wheel is turning again.

In the data center, many tools are available to deploy configuration changes. What is often missing are the sensors to provide the data center wide visibility needed to determine where configuration changes (controls) should be applied and to assess the effectiveness of the changes. The sFlow sensors, built into network devices from most network vendors, in combination with an sFlow analyzer, provide the feedback that is essential to maintain effective control over the data center.
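The anti-lock brake logic (back off when the sensor reports a problem, reapply once it clears) maps directly onto a data center control loop driven by sFlow measurements. The Python sketch below is a hypothetical on/off controller with hysteresis: the thresholds and the measurement sequence are made up, and the control being switched (for example, a rate limit at an edge port) is left abstract.

```python
def control_step(measured_bps, limited, high_bps, low_bps):
    """One iteration of an on/off feedback controller with hysteresis.

    measured_bps: latest traffic estimate from the sensors (e.g. scaled sFlow samples)
    limited: whether the control (e.g. a rate limit) is currently applied
    Returns the new control state. Two thresholds keep the controller from
    flapping when traffic hovers near a single limit.
    """
    if not limited and measured_bps > high_bps:
        return True   # apply the control
    if limited and measured_bps < low_bps:
        return False  # traffic has subsided: remove the control
    return limited    # otherwise leave things as they are

# Hypothetical sequence of per-interval measurements for one host (bits/s).
measurements = [40e6, 120e6, 90e6, 60e6, 20e6, 70e6]
state = False
history = []
for m in measurements:
    state = control_step(m, state, high_bps=100e6, low_bps=30e6)
    history.append(state)
print(history)  # [False, True, True, True, False, False]
```

Each new measurement both decides whether to act and shows whether the previous action worked, which is the feedback that the sensors provide.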

Saturday, October 3, 2009

Virtual servers

Convergence blurs the traditional line between the servers and the network. In order to maintain visibility and control, it is important to identify and monitor all the switches in the converged network. Previously, the importance of maintaining visibility while migrating to blade servers was discussed. Maintaining visibility while virtualizing servers creates similar challenges.

The diagram shows the migration of multiple stand-alone servers connected to a stand-alone switch to a single server running multiple virtual machines. In this transition, where did the switch go? Popular virtual server systems such as VMWare® and Xen® make use of software "virtual switches" to connect virtual machines together and to the network. Using sFlow to monitor the virtual switches ensures that the benefits of virtualization can be realized without losing the visibility into network traffic that is essential for network troubleshooting, traffic accounting and security.

Currently, software probes are required to monitor virtual switches. However, the approach of using software probes has similar limitations to using probes to monitor physical switches: probes have limited performance and the installation and configuration of probes adds complexity to the task of managing the network. To provide a truly scalable solution, visibility must be an integral part of every switch, physical or virtual. The need for visibility is evident to virtualization vendors and virtual switches will soon be available with built-in sFlow support.

Multi-vendor support for sFlow ensures that all the layers in the network (virtual switches, blade switches, top of rack switches and core switches) can be monitored using a single technology. Convergence to high speed switched Ethernet unifies LAN and SAN connectivity, and convergence to sFlow for traffic monitoring provides the network-wide visibility needed to manage and control the unified network.

Friday, October 2, 2009

Gmail outage

The Thursday, September 24th service outage with Google Gmail was widely reported (see Google Gmail Users Hit With Another Service Disruption, The Wall Street Journal).

On Friday, September 25th, Google published an incident report, Google Apps Incident Report, that describes some of the factors leading to the failure. The report makes interesting reading, concluding that the root cause was a high load on the Contacts service and that this load was the result of a combination of the following:

A network issue in a data center, which caused additional load on the Contacts service
A very high utilization of the Contacts service
An update to Gmail that inadvertently increased the load on the Contacts service

This incident demonstrates the complex dependencies between the networking and computing components in a cloud computing environment. Data center wide visibility helps avoid this type of collapse, discovering dependencies and identifying capacity problems early enough to allow proactive action to be taken. When a service failure does happen, visibility is critical for quickly identifying the problem and targeting the controls needed to mitigate the failure.

Thursday, October 1, 2009

Blade servers

The need for network visibility in the data center and the challenge posed by converged networking and networked storage can be managed if all the switches in the data center include the sFlow monitoring standard that most vendors support.

The diagram shows the migration from stand-alone servers connected to a stand-alone switch to a blade server (collapsing the discrete servers into blades within a common chassis). In this transition, where did the switch go? Typically a blade server also encloses a blade switch that provides network connectivity to each of the blades, connecting them to each other and to the rest of the data center. The management of the blade switch may be integrated into the blade server manager and the blade switch may not be described as a switch, but it serves the function of a switch (providing connectivity by directing Ethernet packets).

Convergence blurs the traditional line between the servers and the network and it is important to identify and monitor all the switches in order to maintain visibility and control of the network. In the case of a blade server, select a blade switch with sFlow in order to manage the growing demand for bandwidth that comes with data center consolidation. Ask your blade server vendor about sFlow and select a networking solution that provides the performance, visibility and control needed to successfully operate a converged data center.