sFlow: June 2009

Friday, June 26, 2009

Sampling rates

A previous posting discussed the scalability and accuracy of packet sampling and the advantages of packet sampling for network-wide visibility.

Selecting a suitable packet sampling rate is an important part of configuring sFlow on a switch. The table gives suggested values that should work well for general traffic monitoring in most networks. However, if traffic levels are unusually high, a lower sampling rate (a larger N in "1 in N") may be used, e.g. 1 in 5000 instead of 1 in 2000 for 10Gb/s links.
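To see why a busy link can tolerate a larger N, the rough arithmetic below estimates how many samples per second a link produces at a given "1 in N" rate. It also estimates the approximate 95% relative error after collecting c samples, using the commonly cited packet-sampling bound of 196 × sqrt(1/c) percent. The function names, the 50% utilization and the 500-byte average packet size are assumptions chosen for illustration, not recommended values.

```python
import math

def samples_per_second(link_bps: float, utilization: float,
                       avg_packet_bytes: float, sampling_n: int) -> float:
    """Expected packet samples per second for a '1 in N' sampling rate."""
    packets_per_second = link_bps * utilization / (8 * avg_packet_bytes)
    return packets_per_second / sampling_n

def error_bound_percent(num_samples: int) -> float:
    """Approximate 95% relative error bound: 196 * sqrt(1/c) percent."""
    return 196 * math.sqrt(1 / num_samples)

# Assumed example: 10Gb/s link at 50% utilization with 500-byte average packets.
for n in (2000, 5000):
    rate = samples_per_second(10e9, 0.5, 500, n)
    print(f"1 in {n}: {rate:.0f} samples/s, "
          f"~{error_bound_percent(int(rate * 60)):.1f}% error over 60 s")
```

Under these assumptions, even 1 in 5000 still yields a few hundred samples per second on the link. A minute of data then keeps the estimated error under a couple of percent.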

Configure sFlow monitoring on all interfaces on the switch for full visibility. Packet sampling is implemented in hardware, so all the interfaces can be monitored with very little overhead.

Finally, select a suitable counter polling interval so that link utilizations can be accurately tracked. Generally, the polling interval should be set so that counters are exported at least twice as often as the data will be reported (see the Nyquist-Shannon sampling theorem for an explanation). For example, to trend utilization with one-minute granularity, select a polling interval of between 20 and 30 seconds. Don't be concerned about setting relatively short polling intervals; counter polling with sFlow is very efficient, allowing more frequent polling with less overhead than is possible with SNMP.
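The rule of thumb above, and the utilization calculation it supports, can be sketched as follows. The sketch assumes 64-bit octet counters, like those carried in sFlow's generic interface counters. The function names are hypothetical, and the modulo handles a counter wrapping around once between polls.

```python
def max_polling_interval(report_interval_s: float) -> float:
    """Longest polling interval that still exports counters at least
    twice per reporting interval (the Nyquist rule of thumb)."""
    return report_interval_s / 2

def utilization(prev_octets: int, curr_octets: int,
                interval_s: float, if_speed_bps: int) -> float:
    """Link utilization (0..1) from two successive 64-bit octet counters."""
    delta = (curr_octets - prev_octets) % 2**64  # tolerate a single wrap
    return delta * 8 / (interval_s * if_speed_bps)

print(max_polling_interval(60))  # 30.0
# 1.875 GB transferred in 30 s on a 1 Gb/s link is 50% utilization
print(utilization(1000, 1000 + 1_875_000_000, 30, 1_000_000_000))  # 0.5
```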

Tuesday, June 23, 2009

sFlow MIB

Configuring switches through the switch command line interface (CLI) can be complex and time consuming, especially when monitoring needs to be configured on every switch in order to achieve network-wide visibility.

The sFlow MIB provides a way for an sFlow analysis application to use SNMP to automatically configure sFlow settings on the switches that it wants to monitor. Since the sFlow MIB is an optional part of the sFlow standard, not all sFlow capable switches can be configured using SNMP. However, HP ProCurve and Alcatel-Lucent switches support the sFlow MIB, making it easy to quickly try out sFlow monitoring using the free sFlowTrend application. The screen capture above shows the sFlowTrend setting needed to enable SNMP configuration of sFlow.

Many traffic analyzers that claim sFlow support do not support the sFlow MIB. If you have switches that support the sFlow MIB, then selecting an analyzer that supports the sFlow MIB will ensure a successful deployment.

Future posts on this blog will describe the configuration commands needed to enable sFlow on other vendors' switches.

Tuesday, June 16, 2009

Trying out sFlow

If you are interested in network-wide visibility and want to start experimenting with sFlow, take a look at your network and see if any of the switches are sFlow capable. Most switch vendors support sFlow, including: Brocade, Hewlett-Packard, Juniper Networks, Extreme Networks, Force10 Networks, 3Com, D-Link, Alcatel-Lucent, H3C, Hitachi, NEC AlaxalA, Allied Telesis and Comtec (for a complete list of switches, see sFlow.org).

If you don't already have switches with sFlow support, consider purchasing a switch to experiment with. There are a number of inexpensive switches with sFlow support (check the list of switches on sFlow.org); alternatively, you may be able to pick up a used switch on eBay.

Finally, the open source Host sFlow agent can be used to monitor host traffic and traffic between virtual machines on a virtual server (Xen®, VMware®, KVM).

Once you have access to a source of sFlow data, you will need an sFlow analyzer. The sFlowTrend application (shown above) is a free, purpose-built sFlow analyzer that will allow you to try out the full range of sFlow functionality, including:

decoding and filtering on data from packet headers (including VLANs, priorities, MAC addresses, Ethernet types, as well as TCP/IP fields)
accurate analysis, trending and reporting of packet samples
trending of sFlow counters
support for sFlow MIB to automatically configure sFlow on switches
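sFlow packet samples carry the leading bytes of each sampled packet's header, so decoding is the analyzer's job. As a rough illustration of the first decoding step, the sketch below extracts MAC addresses, an optional 802.1Q VLAN ID and priority, and the Ethernet type. It assumes the header bytes have already been pulled out of the sFlow datagram, and it handles at most one VLAN tag. The class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional

@dataclass
class EthernetHeader:
    dst_mac: str
    src_mac: str
    vlan_id: Optional[int]   # None if the frame is untagged
    priority: Optional[int]  # 802.1p priority (PCP), None if untagged
    ethertype: int

def format_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)

def decode_ethernet(header: bytes) -> EthernetHeader:
    """Decode an Ethernet header with at most one 802.1Q tag."""
    if len(header) < 14:
        raise ValueError("truncated Ethernet header")
    dst, src = header[0:6], header[6:12]
    (ethertype,) = struct.unpack("!H", header[12:14])
    vlan_id = priority = None
    if ethertype == 0x8100:  # 802.1Q tag present
        if len(header) < 18:
            raise ValueError("truncated 802.1Q tag")
        tci, ethertype = struct.unpack("!HH", header[14:18])
        priority = tci >> 13      # top 3 bits of the tag control info
        vlan_id = tci & 0x0FFF    # low 12 bits
    return EthernetHeader(format_mac(dst), format_mac(src),
                          vlan_id, priority, ethertype)

frame = bytes.fromhex("ffffffffffff" "001122334455" "8100" "a00a" "0800")
print(decode_ethernet(frame))
```

For the sample frame, the tag value 0xa00a decodes to priority 5 and VLAN 10, and the inner Ethernet type is 0x0800 (IPv4). A real analyzer would continue into the IP and TCP/UDP headers in the same way.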

Many traffic analyzers claim support for sFlow, but provide only partial support. It is worth starting with sFlowTrend to see the full capabilities of sFlow and to gain experience with sFlow monitoring before evaluating larger scale solutions.

Future posts on this blog will use sFlowTrend to demonstrate how sFlow monitoring can be used to solve common network problems. Downloading a copy of sFlowTrend will allow you to try the different strategies on your own network.

Saturday, June 6, 2009

Choosing an sFlow analyzer

sFlow achieves network-wide visibility by shifting complexity away from the switches to the sFlow analysis application. Simplifying the monitoring task for the switch makes it possible to implement sFlow in hardware, providing wire-speed performance, without increasing the cost of the switch. However, the shift of complexity to the sFlow analysis application makes the selection of the sFlow analyzer a critical factor in realizing the full benefits of sFlow monitoring.

To illustrate some of the features that you should look for in an sFlow analyzer, consider the following basic question, "Which hosts are generating the most traffic on the network?" The chart provides information that answers the question, displaying the top traffic sources and the amount of traffic that they generate. In order to generate this chart, the sFlow analyzer needs to support the following features:

Since the busiest hosts in the network could be anywhere, the sFlow analyzer needs to monitor every link in the network to accurately generate the chart.
Traffic may traverse a number of monitored switch ports; in the example above, traffic between hosts A and B is monitored by 10 switch ports. In order to correctly report on the amount of traffic by host, the sFlow analyzer needs to combine data from the different switch ports in a way that correctly calculates the traffic totals and avoids under- or over-counting.
The sFlow analyzer must fully support sFlow's packet sampling mechanism in order to accurately calculate traffic volumes.
Notice that the chart contains IPv4, IPv6 and MAC addresses. The sFlow analyzer needs to be able to decode packet headers and report on all the protocols in use on the network, including layer 2 and layer 3 traffic. Traffic on local area networks (LANs) is much more diverse than routed wide area network (WAN) traffic. In addition to the normal TCP/IP traffic seen on the WAN, LAN traffic can include multicast, broadcast, service discovery (Bonjour), host configuration (DHCP), printing, backup and storage traffic not typically seen on the WAN.
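The post does not say how an analyzer should combine data from many ports. One simple approach is sketched below as an assumption, not as how sFlowTrend or any other product works. Each packet is counted only at a designated set of edge ports where traffic enters the network, and each counted sample is scaled by its sampling rate to estimate bytes per source. The `PacketSample` fields, the `edge_ports` set and the function name are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class PacketSample:
    agent: str          # switch that exported the sample
    port: int           # interface the packet was sampled on
    src: str            # source address: MAC, IPv4 or IPv6 as text
    frame_bytes: int    # length of the sampled frame
    sampling_rate: int  # N in "1 in N"

def top_sources(samples, edge_ports, limit=10):
    """Estimate bytes per source, counting each packet at one edge port only."""
    totals = defaultdict(float)
    for s in samples:
        if (s.agent, s.port) not in edge_ports:
            continue  # sampled again further along its path; skip to avoid double counting
        totals[s.src] += s.frame_bytes * s.sampling_rate  # scale up by N
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]

edge = {("sw1", 1), ("sw2", 1)}
samples = [
    PacketSample("sw1", 1, "10.0.0.1", 1500, 1000),
    PacketSample("core", 5, "10.0.0.1", 1500, 1000),  # same traffic, core port
    PacketSample("sw2", 1, "fe80::1", 100, 1000),
    PacketSample("sw2", 1, "00:11:22:33:44:55", 64, 2000),
]
print(top_sources(samples, edge))
```

Mixing IPv4, IPv6 and MAC sources in one table mirrors the chart described above. The edge-port rule is only one way to avoid over-counting and assumes every host attaches through a monitored edge port.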

When selecting an sFlow analyzer, try to arrange an evaluation and test the product on a full scale production network. Evaluating scalability and accuracy is not something that is easily performed in a test lab.

Monday, June 1, 2009

Accuracy and packet loss

Traffic records are often lost:

A switch must reliably perform its primary function of forwarding packets, so if there is any contention for resources in the switch, measurement records will be discarded.
There will inevitably be some loss of measurement records as they are transferred over the network from the switches to the traffic analyzer. Again, measurement traffic is a low priority and may be discarded if the network is busy.
Finally, a traffic analyzer may lose traffic records if large numbers of switches are being monitored and records are arriving faster than they can be processed.

The chart shows the effect of lost records on the accuracy of sFlow and NetFlow monitoring:

NetFlow has no mechanism to compensate for lost records. If NetFlow records are lost then traffic will be underreported. The greater the number of records lost, the lower the reported traffic. The bursty and unpredictable traffic produced by NetFlow monitoring increases the likelihood that NetFlow records will be lost. The loss of even one NetFlow record can significantly affect accuracy since a single flow record may summarize a large transfer of data and represent a substantial fraction of the overall network traffic.
sFlow's packet sampling mechanism treats record loss as a decrease in the sampling probability. The sFlow records contain information that allows the traffic analyzer to measure the effective sampling rate, compensate for the lost records, and generate corrected values. Each sFlow record represents a single packet event, and large flows of traffic will generate many sFlow records. Thus, the loss of an sFlow record does not represent a significant loss of data and doesn't affect the overall accuracy of traffic measurements.
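In sFlow version 5, each flow sample carries a sequence number and a `sample_pool` count of the packets that could have been sampled. The sketch below uses these fields to compensate for lost records. Dividing the growth in the pool by the number of samples actually received gives the effective sampling rate, and gaps in the sequence numbers count the lost records. The `FlowSample` class and `estimate_traffic` function are hypothetical. The sketch assumes samples from a single data source and ignores 32-bit counter wrap for brevity.

```python
from dataclasses import dataclass

@dataclass
class FlowSample:
    sequence_number: int  # increments by one for every sample the switch generates
    sample_pool: int      # packets that could have been sampled so far
    frame_bytes: int      # length of the sampled packet

def estimate_traffic(received: list[FlowSample]) -> tuple[float, int, float]:
    """Return (effective sampling rate, records lost, estimated bytes)."""
    ordered = sorted(received, key=lambda s: s.sequence_number)
    if len(ordered) < 2:
        raise ValueError("need at least two samples from the same data source")
    first, last = ordered[0], ordered[-1]
    covered = ordered[1:]  # samples received after the first one
    effective_rate = (last.sample_pool - first.sample_pool) / len(covered)
    lost = (last.sequence_number - first.sequence_number) - len(covered)
    est_bytes = sum(s.frame_bytes for s in covered) * effective_rate
    return effective_rate, lost, est_bytes

# Simulated source: 1 in 100 sampling, 1000-byte packets, samples 3, 4 and 7 lost.
samples = [FlowSample(seq, 100 * seq, 1000) for seq in range(10) if seq not in (3, 4, 7)]
print(estimate_traffic(samples))  # (150.0, 3, 900000.0)
```

Scaling the six surviving samples by the configured rate of 100 would give 600,000 bytes, underreporting by a third. Using the effective rate of 150 recovers the 900,000 bytes carried by the 900 packets in the pool.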

Underreporting traffic, particularly during peak periods, is a serious problem for troubleshooting, congestion management and traffic engineering applications. For usage-based billing applications, underreported traffic represents lost revenue.

When monitoring using NetFlow and sFlow to achieve network-wide visibility, situating the traffic analyzer near the NetFlow sources will help reduce the loss of flow records and improve accuracy.