Monday, February 14, 2022

UDP vs TCP for real-time streaming telemetry

This article compares UDP and TCP and their suitability for transporting real-time network telemetry. The results demonstrate that poor throughput and high message latency in the face of packet loss make TCP unsuitable for providing visibility during congestion events. We demonstrate that the use of UDP transport by the sFlow telemetry standard overcomes the limitations of TCP to deliver robust real-time visibility during extreme traffic events, when visibility is most needed.
Summary of the AWS Service Event in the Northern Virginia (US-EAST-1) Region, "This congestion immediately impacted the availability of real-time monitoring data for our internal operations teams, which impaired their ability to find the source of congestion and resolve it." December 10th, 2021

The data in these charts was created using Mininet to simulate packet loss in a simple network. If you are interested in replicating these results, Multipass describes how to run Mininet on your laptop.

sudo mn --link tc,loss=5

For example, the above command simulates a simple network consisting of two hosts connected by a switch. A packet loss rate of 5% is configured for each link.

Simple Python scripts running on the simulated hosts were used to simulate transfer of network telemetry.

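#!/usr/bin/env python3

import socket
import time
import struct

HOST = ''     # address of the receiving host
PORT = 65432

buf = bytearray(1000)
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
  s.connect((HOST, PORT))
  while True:
    # write the current time into the first 8 bytes of the message
    now = time.time()
    buf[0:8] = struct.pack('>d',now)
    # send the whole message, then pause before building the next one
    s.sendall(buf)
    time.sleep(0.01)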

The above script builds a 1000-byte timestamped message, sends the message, sleeps for 0.01 seconds, and repeats these steps in a continuous loop. This level of message traffic is typical of what you might see from an sFlow agent embedded in a datacenter switch on a moderately busy network.

#!/usr/bin/env python3

import socket
import time
import struct

HOST = ''
PORT = 65432 

def recv_msg(sock,n):
  # read exactly n bytes, or return None if the sender closes the connection
  data = bytearray()
  while len(data) < n:
    packet = sock.recv(n - len(data))
    if not packet:
      return None
    data.extend(packet)
  return data

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
  s.bind((HOST, PORT))
  s.listen()
  conn, addr = s.accept()
  with conn:
    while True:
      data = recv_msg(conn,1000)
      if not data:
        break
      # the first 8 bytes hold the sender's timestamp
      time_sent = struct.unpack('>d',data[0:8])[0]
      time_now = time.time()
      print("%f %f" % (time_now,time_now - time_sent), flush=True)

The above script accepts a connection from the sender, reads messages, and computes latency using the encoded sender timestamp.
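The scripts shown above use TCP; the UDP versions used for the comparison are not listed. As a rough sketch only, a UDP equivalent might look like the following, assuming the same 1000-byte message format and port. The helper names make_message, send_message and receive_message are hypothetical and not taken from the original scripts.

```python
#!/usr/bin/env python3
# Sketch of a UDP variant of the sender and receiver (an assumption, not the author's scripts)

import socket
import struct
import time

PORT = 65432      # same port as the TCP scripts
MSG_SIZE = 1000   # same message size as the TCP scripts

def make_message(now):
  # build a message with the send time encoded in the first 8 bytes
  buf = bytearray(MSG_SIZE)
  struct.pack_into('>d', buf, 0, now)
  return buf

def send_message(sock, addr):
  # each message is sent as a single datagram; lost datagrams are never retransmitted
  sock.sendto(make_message(time.time()), addr)

def receive_message(sock):
  # read one datagram and return (time_now, delay) using the encoded sender timestamp
  data, _ = sock.recvfrom(MSG_SIZE)
  time_sent = struct.unpack('>d', data[0:8])[0]
  time_now = time.time()
  return time_now, time_now - time_sent
```

A sender would create a socket.SOCK_DGRAM socket and call send_message every 0.01 seconds; a receiver would bind a SOCK_DGRAM socket to the port and print the result of receive_message in a loop. Because each datagram carries one complete message, there is no need for the recv_msg reassembly loop used with TCP.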

The average delay is a constant 160 µs for UDP. TCP message delay exceeds 1 second average at 14% packet loss and averages 30 seconds at 20% packet loss.

The maximum measurement delay chart tells a dire story. While UDP never sees a message delay greater than 8 ms, maximum TCP message delay reaches 14 minutes at 20% packet loss!
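Statistics such as the average and maximum delay can be derived from the receiver's output, where each line holds the receive time and the measured delay. The following is a minimal sketch of that calculation, assuming the output has been captured as a list of lines; the function name summarize is hypothetical and this is not necessarily how the charts were produced.

```python
import statistics

def summarize(lines):
  # each line is "<time_now> <delay>" as printed by the receiver script
  times, delays = [], []
  for line in lines:
    fields = line.split()
    if len(fields) != 2:
      continue  # skip blank or malformed lines
    times.append(float(fields[0]))
    delays.append(float(fields[1]))
  if not delays:
    return None
  # message rate over the interval between the first and last message received
  duration = times[-1] - times[0]
  rate = (len(times) - 1) / duration if duration > 0 else 0.0
  return statistics.mean(delays), max(delays), rate
```

Running this once per configured loss rate would give one point per chart for average delay, maximum delay, and messages per second.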

Both TCP and UDP achieve a throughput of 96 messages per second at 0% packet loss. UDP throughput declines linearly with packet loss, as one would expect, dropping to 77 messages per second at 20% packet loss. TCP throughput stays at 96 messages per second until the packet loss rate reaches 10%, where it drops to 90 messages per second. At a packet loss of 12%, TCP and UDP throughput are equal at 85 messages per second. Beyond that point, TCP throughput drops off rapidly and falls to 10 messages per second at 20% packet loss.
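The linear decline in UDP throughput follows from the fact that a lost datagram is simply not delivered. As a back-of-the-envelope sketch, assuming each message travels in one datagram, losses are independent, and the loss rate is the end-to-end figure (the function name expected_udp_rate is illustrative):

```python
def expected_udp_rate(base_rate, loss):
  # messages per second that arrive when a fraction `loss` of datagrams is dropped
  return base_rate * (1.0 - loss)

# 96 messages per second sent, at several loss rates
for loss_pct in (0, 5, 10, 20):
  print(loss_pct, round(expected_udp_rate(96, loss_pct / 100)))
```

At 20% loss this simple model gives about 77 messages per second, in line with the measured UDP result.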

The results demonstrate that TCP-based telemetry fails to provide adequate visibility in the face of packet loss, with a steep drop in throughput and a dramatic increase in delay that make the measurements useless for real-time troubleshooting and automated control. UDP-based telemetry, on the other hand, provides consistently low message delay and maintains high throughput at the cost of moderate message loss.

The sFlow standard is designed for use with UDP, specifying measurements that degrade gracefully in the presence of expected packet loss:

  1. Monotonic counters: For example, the total number of packets and bytes sent and received on an interface are sent, leaving it up to the sFlow analyzer to compute packet rates and interface utilizations. If a message is lost there is no need for it to be retransmitted since the next message will contain the current value of the counter.
  2. Packet samples: sFlow's packet sampling mechanism treats record loss as a decrease in the sampling probability. The sFlow records contain information that allows the traffic analyzer to measure the effective sampling rate, compensate for the packet loss, and generate corrected values. Each sFlow record represents a single packet event and large flows of traffic will generate a number of sFlow records. Thus, the loss of an sFlow record does not represent a significant loss of data and doesn't affect the overall accuracy of traffic measurements.
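Both properties can be illustrated with a little arithmetic. The sketch below is not sFlow decoding code: it assumes the analyzer already has counter readings as (timestamp, value) pairs and knows how many packets could have been sampled over an interval, and all function and parameter names are illustrative. Counter wrap is ignored for simplicity.

```python
def counter_rate(prev, curr):
  # rate between two readings of a monotonic counter, each (timestamp_seconds, value);
  # the readings do not need to be consecutive, so a lost message only widens the interval
  (t0, c0), (t1, c1) = prev, curr
  return (c1 - c0) / (t1 - t0)

def effective_sampling_rate(pool_delta, samples_received):
  # packets that could have been sampled divided by the samples that actually arrived
  return pool_delta / samples_received

def estimate_packets(samples_received, sampling_rate):
  # scale the received samples up to an estimate of total packets
  return samples_received * sampling_rate

# counter readings at t=0 and t=20; the reading at t=10 was lost in transit
print(counter_rate((0, 1000), (20, 3000)))

# 1,000,000 packets observed by the agent, but only 800 samples reached the analyzer
rate = effective_sampling_rate(1_000_000, 800)
print(rate, estimate_packets(800, rate))
```

In the second case the analyzer scales by the effective rate of 1 in 1250 rather than the configured rate, so the estimate of total packets is unaffected by the lost records.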

In addition, recent extensions to the sFlow protocol enhance visibility into sources of delay and packet loss in the network; see "Transit delay and queueing" and "Real-time trending of dropped packets". As we have demonstrated, services running in the datacenter that rely on TCP will be severely impacted by packet loss, and so the ability to monitor and control packet loss is essential for maintaining service levels.

The following articles discuss how sFlow relates to, and complements, emerging telemetry standards:

Deploying an sFlow monitoring solution is straightforward since sFlow is widely implemented in datacenter equipment from vendors including: A10, Arista, Aruba, Cisco, Edge-Core, Extreme, Huawei, Juniper, NEC, Netgear, Nokia, NVIDIA, Quanta, and ZTE. In addition, the open source Host sFlow agent makes it easy to extend visibility beyond the physical network into the compute infrastructure.

The diagram shows the high level architecture for real-time datacenter-wide sFlow analytics. Telemetry continuously streams from all network, host and application instances to a central analyzer which constructs a real-time view of performance, identifying hot spots, and driving automated remediation. In this case the sFlow-RT analytics engine continuously monitors tens of thousands of industry standard sFlow telemetry sources and responds within seconds to performance challenges.

Finally, a couple of popular use cases give an idea of the solutions that can be built on sFlow real-time telemetry.

DDoS protection quickstart guide describes how to deploy sFlow along with BGP RTBH/Flowspec to automatically detect and mitigate DDoS flood attacks. The use of sFlow provides sub-second visibility into network traffic during the periods of high packet loss experienced during a DDoS attack. The result is a system that can reliably detect and respond to attacks in real-time.

Flow metrics with Prometheus and Grafana describes how to import user defined flow metrics into the Prometheus time series database and build real-time dashboards using Grafana. Here again, reliable, low latency measurements during periods of packet loss ensure that operators have the information needed to find the root cause and quickly respond.


  1. Hello Peter,

    How did you generate these graphs?


    1. I analysed the logs to compute the statistics for each loss rate and created the charts using Numbers on a Mac.