Monday, March 22, 2021

In-band Network Telemetry (INT)

The recent addition of in-band streaming telemetry (INT) measurements to the sFlow industry standard simplifies deployment by addressing the operational challenges of in-band monitoring.

The diagram shows the basic elements of In-band Network Telemetry (INT) in which the ingress switch is programmed to insert a header containing measurements to packets entering the network. Each switch in the path is programmed to append additional measurements to the packet header. The egress switch is programmed to remove the header so that the packet can be delivered to its destination. The egress switch is responsible for processing the measurements or sending them on to analytics software.

There are currently two competing specifications for in-band telemetry:

  1. In-band Network Telemetry (INT) Dataplane Specification
  2. Data Fields for In-situ OAM

Common telemetry attributes from both standards include:

  1. node id
  2. ingress port
  3. egress port
  4. transit delay (egress timestamp - ingress timestamp)
  5. queue depth

Visibility into network forwarding performance is very useful, however, there are practical issues that should be considered with the in-band telemetry approach for collecting the measurements:

  1. Transporting measurement headers is complex with different encapsulations for each transport protocol:  Geneve, VxLAN, GRE, UDP, TCP etc.
  2. Addition of headers increases the size of packets and risks causing traffic to be dropped downstream due to maximum transmission unit (MTU) restrictions.
  3. The number of measurements that can be added by each switch and the number of switches adding measurements in the path needs to be limited.
  4. In-band telemetry cannot be incrementally deployed. Ideally, all devices need to participate, or at a minimum, the ingress and egress devices need to be in-band telemetry aware.
  5. In-band telemetry transports data from the data plane to the control/management planes, providing a potential attack surface that could be exploited by crafting malicious packets with fake measurement headers.
  6. There is no standard mechanism for transporting measurements from the egress switch for analysis.
  7. There is no data model to link in-band telemetry to other sources of data (NETCONF, SNMP, etc.)

The sFlow Transit Delay Structures extension addresses these issues by defining how the in-band network telemetry attributes can be exported in real-time using the industry standard sFlow protocol.

The sFlow architecture, shown at the top of this article, provides an out of band alternative for transporting the per packet forwarding plane measurements. The switch ASIC attaches performance measurements as metadata to sampled packets sent to the sFlow Agent instead of adding the measurements to the egress packet. The sFlow Agent immediately forwards the additional packet metadata as part of the standard sFlow telemetry stream to a central sFlow analyzer. The sFlow Analyzer provides a real-time view of the performance of the entire network.

Using sFlow as the telemetry transport has a number of benefits:

  1. Simple to deploy since there is no modification of packets (no issues with encapsulations, MTU, number of measurements, path length, incremental deployment, etc.)
  2. Extensibility of sFlow protocol allows additional forwarding plane measurements to augment existing sFlow measurements, fully integrating the new measurements with sFlow data exported from other switches in the network (Arista, Aruba, Cisco, Dell, Huawei, Juniper, etc.)
  3. sFlow's is a unidirectional telemetry transport protocol originates from the device management plane, can be sent out of band, limiting possible attack surfaces.
  4. Measurements are delivered in real-time directly to the sFlow Analyzer.
  5. sFlow data model links telemetry to external data (SNMP, NETCONF, OpenConfig, etc.)

Transit delay and queueing describes the new sFlow measurements in more detail and demonstrates a working implementation. The instrumentation to support these measurements is widely available in current generation network ASICs. If you are interested in visibility into network performance, ask your network vendor about their plans to implement the sFlow Transit Delay Structures extension.

No comments:

Post a Comment