Monday, June 1, 2009

Accuracy and packet loss


Traffic records can be lost at several points between the switch and the traffic analyzer:
  1. A switch must reliably perform its primary function of forwarding packets, so if there is any contention for resources in the switch, measurement records will be discarded.
  2. There will inevitably be some loss of measurement records as they are transferred over the network from the switches to the traffic analyzer. Again, measurement traffic has low priority and may be discarded if the network is busy.
  3. Finally, a traffic analyzer may lose traffic records if a large number of switches is being monitored and records arrive faster than they can be processed.
The chart shows the effect of lost records on the accuracy of sFlow and NetFlow monitoring:
  1. NetFlow has no mechanism to compensate for lost records. If NetFlow records are lost, traffic will be underreported: the more records lost, the lower the reported traffic. The bursty and unpredictable traffic produced by NetFlow monitoring increases the likelihood that NetFlow records will be lost. The loss of even one NetFlow record can significantly affect accuracy, since a single flow record may summarize a large transfer of data and represent a substantial fraction of the overall network traffic.
  2. sFlow's packet sampling mechanism treats record loss as a decrease in the sampling probability. The sFlow records contain information that allows the traffic analyzer to measure the effective sampling rate, compensate for the lost records, and generate corrected values. Each sFlow record represents a single sampled packet, so a large flow of traffic will generate a number of sFlow records. Thus, the loss of an sFlow record does not represent a significant loss of data and doesn't affect the overall accuracy of traffic measurements.
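To make the contrast concrete, here is a minimal Python sketch under simplifying assumptions: `FlowSample`, `netflow_total`, and `sflow_estimate` are hypothetical names, and only the field names `sequence_number` and `sample_pool` are borrowed from the sFlow version 5 flow sample structure. One way an analyzer can measure the effective sampling rate is to divide the growth in `sample_pool` (packets that could have been sampled) by the number of samples that actually arrived, then scale the received samples by that rate. The traffic figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FlowSample:
    # Hypothetical, simplified view of an sFlow flow sample; the names
    # follow sFlow v5 fields, but a real sample carries much more.
    sequence_number: int  # increments by one for every sample generated
    sample_pool: int      # cumulative packets that could have been sampled
    frame_length: int     # size in bytes of the sampled packet


def netflow_total(flow_byte_counts: list[int]) -> int:
    # NetFlow: each record is one flow's byte count. A lost record simply
    # disappears from the sum, so loss can only lower the total.
    return sum(flow_byte_counts)


def sflow_estimate(samples: list[FlowSample], pool_at_start: int) -> float:
    # sFlow: the growth in sample_pool says how many packets the interval
    # covered, however many samples arrived. Dividing by the samples received
    # gives the effective sampling rate, which rises when samples are lost.
    if not samples:
        raise ValueError("no samples received in this interval")
    packets_covered = max(s.sample_pool for s in samples) - pool_at_start
    effective_rate = packets_covered / len(samples)
    return effective_rate * sum(s.frame_length for s in samples)


# Illustrative traffic: 1,000 packets of 1,000 bytes (1,000,000 bytes total).
flows = [900_000, 50_000, 50_000]  # one large flow and two small ones
samples = [FlowSample(i, 10 * i, 1_000) for i in range(1, 101)]  # 1-in-10

# Lose the single large NetFlow record and 20 of the 100 sFlow samples.
netflow_seen = netflow_total(flows[1:])
sflow_seen = sflow_estimate([s for s in samples if s.sequence_number % 5 != 1], 0)

print(netflow_seen)  # 100000
print(sflow_seen)    # 1000000.0
```

Because every packet in this example is the same size, the sFlow correction recovers the total exactly; with real traffic it is a statistical estimate whose error grows as fewer samples arrive. The sketch also assumes the last sample of the interval arrives, since losing it would shorten the window measured by `sample_pool`.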
Underreporting traffic, particularly during peak periods, is a serious problem for troubleshooting, congestion management and traffic engineering applications. For usage-based billing applications, underreported traffic represents lost revenue.

When combining NetFlow and sFlow monitoring to achieve network-wide visibility, placing the traffic analyzer near the NetFlow sources will help reduce the loss of flow records and improve accuracy.

4 comments:

  1. Regarding your comment "NetFlow has no mechanism to compensate for lost records": are you familiar with NetFlow's FlowSequence number, which indicates missed flow records?

    Both technologies have their merits. Seeing as this is sflow.com, it would be great if you focused on sFlow and why it is great rather than on why it is better than NetFlow. I, for one, like both, and I don't believe that one is better than the other, or that this blog will be unbiased in its opinions when comparing the two.

  2. I am familiar with NetFlow sequence numbers. NetFlow sequence numbers let you detect that data has been lost, but they don't provide a way to compensate for the lost records. Understanding that NetFlow accuracy is sensitive to packet loss is an important consideration when configuring NetFlow. Care should be taken to check for lost NetFlow records, and NetFlow configuration settings may need to be tuned to minimize loss.
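    As a sketch of the detection side, the following Python assumes NetFlow version 5 datagrams from a single exporter and engine, arriving in order. In v5 the header's flow_sequence field counts flows, so the next datagram should carry the previous sequence number plus the previous record count; any gap is the number of records lost. The names count_lost_records and make_header, and the test traffic, are hypothetical. Note that the result is only a count: the byte totals carried by the missing records cannot be recovered from it.

```python
import struct

# NetFlow v5 export header: 24 bytes in network byte order.
# version, count, sys_uptime, unix_secs, unix_nsecs, flow_sequence,
# engine_type, engine_id, sampling_interval
V5_HEADER = struct.Struct("!HHIIIIBBH")


def count_lost_records(datagrams: list[bytes]) -> int:
    """Count flow records missing from an in-order stream of v5 datagrams."""
    lost = 0
    expected = None  # flow_sequence we expect on the next datagram
    for dgram in datagrams:
        fields = V5_HEADER.unpack_from(dgram)
        version, count, flow_sequence = fields[0], fields[1], fields[5]
        if version != 5:
            raise ValueError("not a NetFlow v5 datagram")
        if expected is not None:
            # flow_sequence is a 32-bit counter, so allow for wraparound.
            # Reordered or duplicated datagrams would break this arithmetic.
            lost += (flow_sequence - expected) % 2**32
        expected = (flow_sequence + count) % 2**32
    return lost


def make_header(count: int, flow_sequence: int) -> bytes:
    # Hypothetical test helper: timestamps and engine fields are zeroed.
    return V5_HEADER.pack(5, count, 0, 0, 0, flow_sequence, 0, 0, 0)


# Three of four 30-record datagrams arrive; the one at sequence 60 was lost.
stream = [make_header(30, 0), make_header(30, 30), make_header(30, 90)]
print(count_lost_records(stream))  # 30
```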

    I agree that both NetFlow and sFlow have their merits, but they are very different technologies and have different strengths and weaknesses. Understanding the differences is important if you want to pick the best solution for a given application. If you want to monitor thousands of 10G Ethernet switch ports in a data center then sFlow is clearly the best choice. If you want to monitor routed DS3 links and collect billing data then NetFlow may well be the best choice.

    Arguing about whether sFlow or NetFlow is better makes as much sense as arguing whether switching or routing is better, or whether Ethernet or IP is better. In practice people use all of the technologies because each solves a distinct problem and plays an important role in the overall network design.

  3. If you want to read about the differences between NetFlow and sFlow, I would check out this blog: http://www.networkworld.com/community/node/29117

  4. Thank you for the link. I read the article with interest and have the following comments:

    The blog entry contains a limited comparison of sFlow and NetFlow data from a single link (the topic of network-wide scalability isn't discussed). The article does demonstrate some important differences: sFlow reports on both IP and non-IP traffic, whereas NetFlow is IP-only. It also demonstrates the time-smearing effect of NetFlow monitoring, which delays measurements and distorts utilization trends.
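    The time-smearing effect can be illustrated with a small Python sketch, assuming each flow record is exported when the flow ends and carries only a start time, an end time (both in seconds), and a byte count. bin_at_export credits all of a flow's bytes to the interval in which its record is exported, while bin_spread spreads them over the flow's lifetime on the assumption of a constant rate; both names and the five-minute transfer are hypothetical.

```python
from collections import defaultdict


def bin_at_export(flows, bin_secs):
    """Credit each flow's bytes to the bin in which its record is exported."""
    bins = defaultdict(float)
    for start, end, nbytes in flows:
        bins[int(end // bin_secs)] += nbytes
    return dict(bins)


def bin_spread(flows, bin_secs):
    """Spread each flow's bytes over the bins it was active in (constant rate)."""
    bins = defaultdict(float)
    for start, end, nbytes in flows:
        duration = end - start
        if duration <= 0:
            bins[int(end // bin_secs)] += nbytes
            continue
        b = int(start // bin_secs)
        while b * bin_secs < end:
            # Overlap between the flow and bin b, in seconds.
            overlap = min(end, (b + 1) * bin_secs) - max(start, b * bin_secs)
            bins[b] += nbytes * overlap / duration
            b += 1
    return dict(bins)


# One five-minute transfer of 30,000,000 bytes, exported when it ends.
flows = [(0, 300, 30_000_000)]
print(bin_at_export(flows, 60))  # {5: 30000000.0}
print(bin_spread(flows, 60))     # 6000000.0 in each of bins 0 through 4
```

    The export-time view shows nothing for five minutes and then a single spike, which is the delay and distortion described above. Spreading the bytes evenly is itself an assumption, so neither view reproduces the true utilization of a bursty flow.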

    Unfortunately, problems with the experimental setup make it hard to draw any conclusions about accuracy from the results. Problems with the methodology, some of which were identified in the article, include the lack of an accurate reference (how can you compare measurement technologies if you don't know the correct answer?), problems with the NetFlow and sFlow configuration in the switches (flow timeouts, sampling rates), and, finally, problems with the scaling of sampled data (see the comments at the end of the article).
