Visibility into dropped packets is essential for Artificial Intelligence/Machine Learning (AI/ML) workloads. In these workloads, a single dropped packet can stall large-scale computational tasks, leave millions of dollars' worth of GPU/CPU resources idle, and delay the completion of business-critical workloads. Real-time sFlow telemetry provides the visibility into traffic flows and packet drops needed to manage these networks effectively.
The Arista EOS 4.31.4M maintenance release brings sFlow dropped packet monitoring to production networks (see EOS Life Cycle Policy). The feature was previously demonstrated using the 4.30.1F feature release (see SC23 Dropped packet visibility demonstration).

    sflow sample 50000
    sflow polling-interval 20
    sflow vrf mgmt destination 203.0.113.100
    sflow vrf mgmt source-interface Management0
    sflow run

The above Arista EOS commands enable sFlow counter polling (every 20 seconds) and packet sampling (1-in-50,000 packets) on all ports, and send the sFlow telemetry to the sFlow analyzer at 203.0.113.100 through the Management0 interface in the mgmt VRF.
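To make the sampling and polling settings concrete, here is a minimal sketch of the arithmetic an analyzer applies to this telemetry. It assumes the 1-in-50,000 sampling rate and 20-second polling interval configured above; the function names and the input numbers are hypothetical illustrations, not part of sFlow-RT or EOS. Sampled packets are scaled up by the sampling rate, and polled counters (for example an interface discard counter) are turned into rates by differencing successive readings.

```python
# Minimal sketch of sFlow scaling arithmetic (hypothetical helper names).
# Assumes the settings from the EOS configuration above.

SAMPLING_RATE = 50000      # "sflow sample 50000": 1-in-50,000 packets
POLLING_INTERVAL_S = 20    # "sflow polling-interval 20": counters every 20 s


def estimate_packets(samples: int, sampling_rate: int = SAMPLING_RATE) -> int:
    """Each sampled packet stands in for roughly `sampling_rate` packets."""
    return samples * sampling_rate


def estimate_bits_per_second(sampled_bytes: int, interval_s: float,
                             sampling_rate: int = SAMPLING_RATE) -> float:
    """Scale the total frame bytes seen in samples to an estimated bit rate."""
    return sampled_bytes * sampling_rate * 8 / interval_s


def counter_rate(prev: int, curr: int,
                 interval_s: float = POLLING_INTERVAL_S,
                 width_bits: int = 32) -> float:
    """Per-second rate from two successive counter polls.

    The modulus handles a single wrap of a fixed-width counter
    (e.g. a 32-bit discard counter).
    """
    delta = (curr - prev) % (1 << width_bits)
    return delta / interval_s


if __name__ == "__main__":
    print(estimate_packets(3))                    # 150000
    print(estimate_bits_per_second(1500, 1.0))    # 600000000.0
    print(counter_rate(100, 300))                 # 10.0
```

The design choice worth noting is that sampling trades accuracy for scalability: three samples in an interval only tell you that roughly 150,000 packets passed, so estimates become reliable only as sample counts grow. Polled counters, by contrast, are exact totals, which is why sFlow combines both.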
    flow tracking mirror-on-drop
       sample limit 100 pps
       !
       tracker SFLOW
          exporter SFLOW
             format sflow
             collector sflow
             local interface Management0
       no shutdown

The above commands add sFlow Dropped Packet Notification Structures to the sFlow telemetry feed using Broadcom Mirror on Drop (MoD) instrumentation, with dropped packet samples limited to 100 packets per second. Broadcom implements Mirror on Drop in Jericho 2, Trident 3, Tomahawk 3, and later ASICs.

In this example, the sFlow-RT real-time analytics engine receives sFlow telemetry from the switches/routers and creates metrics that drive the real-time Grafana dashboard shown at the top of the article. Deploy real-time network dashboards using Docker compose describes how to quickly deploy a monitoring stack consisting of sFlow-RT, a Prometheus time series database, and Grafana dashboards.
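Because dropped packet notifications travel in the same sFlow datagrams as ordinary packet and counter samples, a collector can tell them apart by each sample's data format. The following is a minimal sketch, not sFlow-RT's implementation, that walks the headers of one sFlow version 5 datagram and counts sample types. It assumes the sFlow v5 datagram layout (XDR, big-endian) and that discarded packet samples use enterprise 0, format 5, as we read the sFlow dropped packet notification specification. It skips sample bodies rather than decoding drop reasons; the function name `sample_types` is hypothetical.

```python
import struct
from collections import Counter

# (enterprise, format) -> name; format 5 is assumed to be the
# discarded packet sample from the sFlow drop notification extension.
FORMAT_NAMES = {
    (0, 1): "flow_sample",
    (0, 2): "counter_sample",
    (0, 3): "flow_sample_expanded",
    (0, 4): "counter_sample_expanded",
    (0, 5): "discarded_packet",
}
AGENT_ADDR_LEN = {1: 4, 2: 16}  # address type 1 = IPv4, 2 = IPv6


def sample_types(datagram: bytes) -> Counter:
    """Count sample record types in one sFlow v5 datagram (headers only)."""
    offset = 0

    def u32() -> int:
        nonlocal offset
        (value,) = struct.unpack_from("!I", datagram, offset)
        offset += 4
        return value

    version = u32()
    if version != 5:
        raise ValueError(f"unsupported sFlow version {version}")
    addr_type = u32()
    if addr_type not in AGENT_ADDR_LEN:
        raise ValueError(f"unknown agent address type {addr_type}")
    offset += AGENT_ADDR_LEN[addr_type]  # skip agent IP address
    u32()  # sub-agent id
    u32()  # datagram sequence number
    u32()  # agent uptime in milliseconds
    num_samples = u32()

    counts = Counter()
    for _ in range(num_samples):
        data_format = u32()
        length = u32()
        # data_format packs a 20-bit enterprise and a 12-bit format
        enterprise, fmt = data_format >> 12, data_format & 0xFFF
        name = FORMAT_NAMES.get((enterprise, fmt), f"{enterprise}:{fmt}")
        counts[name] += 1
        offset += length  # skip the opaque sample body
    return counts
```

A collector could use a count like this to track how many drop notifications each switch exports per interval; a production decoder would go on to parse each discarded packet sample for the drop reason and the packet header.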