Thursday, July 25, 2024

Dropped packet notifications with Arista Networks

Visibility into dropped packets is essential for Artificial Intelligence/Machine Learning (AI/ML) workloads, where a single dropped packet can stall large scale computational tasks, idling millions of dollars worth of GPU/CPU resources, and delaying the completion of business critical workloads. Enabling real-time sFlow telemetry provides the observability into traffic flows and packet drops needed to effectively manage these networks.

The availability of the Arista EOS 4.31.4M maintenance release brings sFlow dropped packet monitoring (previously demonstrated using the 4.30.1F feature release - see SC23 Dropped packet visibility demonstration) to production networks, see EOS Life Cycle Policy
sflow sampling 50000
sflow polling-interval 20
sflow vrf mgmt destination 203.0.113.100
sflow vrf mgmt source-interface Management0
sflow run
The above Arista EOS commands enable sFlow counter polling and packet sampling on all ports, sending the sFlow telemetry to the sFlow analyzer at 203.0.113.100
flow tracking mirror-on-drop
  sample limit 100 pps
  !
  tracker SFLOW
    exporter SFLOW
      format sflow
      collector sflow
      local interface Management0
  no shutdown
The above commands add sFlow Dropped Packet Notification Structures to the sFlow telemetry feed using Broadcom Mirror on Drop (MoD) instrumentation. Broadcom implements mirror-on-drop in Jericho 2, Trident 3, and Tomahawk 3, or later ASICs.
In this example, the sFlow-RT real-time analytics engine receives sFlow telemetry from switches/routers and creates metrics to drive the real-time Grafana dashboard shown at the top of the article. Deploy real-time network dashboards using Docker compose describes how to quickly deploy a monitoring stack consisting of sFlow-RT, a Prometheus time series database, and Grafana dashboards.

No comments:

Post a Comment