Sunday, May 8, 2011

NetFlow lite

Netflow-lite is a recently released packet sampling technology for the Cisco Catalyst 4948E Ethernet switch. The technology is described in Configuring Netflow-lite and at first glance Netflow-lite appears to be random 1-in-N packet sampling, exporting sampled packets as IPFIX or Netflow V9 records.

However, looking more closely, the following description raises some questions:

The packet sampling mechanism tries to achieve random 1-in-N sampling. The accuracy of the algorithm is dependent on the size of the packets arriving at a given interface. To tune the relative accuracy of the algorithm, use the average-packet-size parameter. The whole system supports a maximum of 200 monitors.

The system automatically determines the average packet size at an interface based on observation of input traffic and uses that value in rate DBL sampling.

The acronym DBL refers to Cisco's Dynamic Buffer Limiting, a technology for congestion avoidance (see Quality of Service on Cisco Catalyst 4500 Series Supervisor Engines, page 16, for a description). It is clear that the 4948E doesn't have hardware support for 1-in-N packet sampling. Instead, an estimate of average packet size is required so that hardware designed to monitor data rates can be used to select packets to sample.
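Cisco's documentation does not describe the algorithm in detail, but the behavior it implies can be sketched. The following is a hypothetical, simplified model (not Cisco's actual implementation): treat the configured 1-in-N packet rate as a byte threshold of N × average-packet-size and sample whichever packet crosses the threshold.

```python
def make_byte_rate_sampler(n, avg_packet_size):
    """Hypothetical model: approximate 1-in-n packet sampling using a
    byte threshold, as rate-monitoring hardware might. A packet is
    sampled whenever the running byte count crosses n * avg_packet_size."""
    threshold = n * avg_packet_size
    byte_count = 0

    def observe(packet_size):
        nonlocal byte_count
        byte_count += packet_size
        if byte_count >= threshold:
            byte_count -= threshold
            return True   # sample this packet
        return False

    return observe

# If the average-packet-size estimate (1000 bytes here) is wrong and
# the link actually carries 64-byte packets, the effective rate is far
# from the configured 1-in-100:
observe = make_byte_rate_sampler(100, 1000)
samples = sum(observe(64) for _ in range(1_000_000))
print(samples)  # 640 samples from 1,000,000 packets: ~1-in-1563, not 1-in-100
```

Under this model the sampling rate is only correct when packets actually match the average-packet-size estimate, which is why the Cisco documentation describes accuracy as dependent on packet size.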

This mechanism raises red flags regarding Netflow-lite accuracy, particularly in the data center top-of-rack settings where 4948E switches are intended to be deployed:
  1. Average packet size. Average packet sizes are highly variable. For example, network storage traffic generates large data packets (as large as 9000 byte jumbo frames) in one direction and small acknowledgement packets (as small as 64 bytes) in the other. Further complicating matters, traffic patterns change quickly, for example, shifting from storage read operations to writes. Thus, the estimate of average packet size from one interval is likely to be a poor estimate of the packet size in the next interval, resulting in large errors in the Netflow-lite measurements. Finally, each class of traffic has its own characteristic packet size distribution. Relying on a single average packet size estimate to control sampling is likely to bias results for classes of traffic that have smaller or larger average packet sizes.
  2. Time. The rate-based mechanism of DBL introduces time into the sampling process (since rates are a measure of traffic over some time interval). Time-based sampling methods yield inaccurate results for many typical network traffic patterns.
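The per-class bias described above can be illustrated with a small simulation. Assuming the same hypothetical byte-threshold model sketched earlier (an illustrative simplification, not Cisco's documented algorithm), a packet's chance of being sampled is roughly proportional to its size, so large packets dominate the samples:

```python
import random

random.seed(1)
# A 50/50 mix, by packet count, of 9000-byte jumbo frames and 64-byte ACKs
packets = [9000] * 50_000 + [64] * 50_000
random.shuffle(packets)

avg = sum(packets) / len(packets)   # 4532 bytes
threshold = 100 * avg               # target: 1-in-100 packet sampling
byte_count = 0
samples = {9000: 0, 64: 0}
for size in packets:
    byte_count += size
    if byte_count >= threshold:
        byte_count -= threshold
        samples[size] += 1          # the packet crossing the threshold is sampled

total = samples[9000] + samples[64]
print(samples[9000] / total)  # jumbo share of samples is ~0.99, not 0.5
```

Even though jumbo frames and ACKs are equally common by packet count, nearly all samples are jumbo frames: selection by byte count over-represents large packets, badly skewing any per-flow packet estimates derived from the samples.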
Before relying on Netflow-lite measurements, it is prudent to run tests to verify accuracy in a production setting. Testing accuracy using a traffic generator is likely to mask errors that will be apparent when monitoring the irregular traffic patterns seen in a production setting.
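One simple way to run such a test is to scale the sampled packet count by the configured sampling rate and compare the result against the interface's packet counter (e.g. from SNMP) over the same interval. The helper below is an illustrative sketch; the function name and setup are assumptions, not part of any product:

```python
def sampling_error(sampled_packets, sampling_rate, counter_packets):
    """Relative error of a scaled-up sample count against an interface
    packet counter covering the same measurement interval."""
    estimated = sampled_packets * sampling_rate
    return (estimated - counter_packets) / counter_packets

# e.g. 1,000 samples at a claimed 1-in-100 rate, while the interface
# counter advanced by 150,000 packets, implies a 33% undercount:
print(sampling_error(1_000, 100, 150_000))  # -0.333...
```

Repeating this comparison under real production traffic, rather than a traffic generator's regular patterns, is what exposes the errors described above.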

Netflow-lite exposes an important difference between IPFIX/NetFlow and sFlow. While IPFIX/NetFlow specify the format of data transmitted from a measurement device to a collector, they do not specify how the measurements should be made. The result is that seemingly identical data from different vendors' switches (or even different models of switch from a single vendor) can represent measurements made using very different methodologies. These differences make it very difficult to rely on the measurements or compare data between different devices. In contrast, sFlow standardizes how traffic is measured, ensuring that every device supporting the sFlow standard performs sampling in a standard way, yielding accurate and consistent results.

The article, Complexity kills, describes how sFlow's standard measurements simplify large scale monitoring of data center traffic. Accurate traffic measurements are increasingly important as convergence and virtualization place greater demands on network capacity. For additional information, the Data center convergence, visibility and control presentation describes the critical role that measurement plays in managing costs and optimizing performance.
