Monday, July 28, 2025

Linux packet sampling using eBPF

Linux 6.11+ kernels provide TCX attachment points for eBPF programs to efficiently examine packets as they ingress and egress the host. The latest version of the open source Host sFlow agent includes support for TCX packet sampling, streaming industry standard sFlow telemetry to a central collector for network-wide visibility. For example, Deploy real-time network dashboards using Docker compose describes how to quickly set up a Prometheus database and use Grafana to build network dashboards.

static __always_inline void sample_packet(struct __sk_buff *skb, __u8 direction) {
    __u32 key = skb->ifindex;
    __u32 *rate = bpf_map_lookup_elem(&sampling, &key);
    if (!rate || (*rate > 0 && bpf_get_prandom_u32() % *rate != 0))
        return;

    struct packet_event_t pkt = {};
    pkt.timestamp = bpf_ktime_get_ns();
    pkt.ifindex = skb->ifindex;
    pkt.sampling_rate = *rate;
    pkt.ingress_ifindex = skb->ingress_ifindex;
    pkt.routed_ifindex = direction ? 0 : get_route(skb);
    pkt.pkt_len = skb->len;
    pkt.direction = direction;

    __u32 hdr_len = skb->len < MAX_PKT_HDR_LEN ? skb->len : MAX_PKT_HDR_LEN;
    if (hdr_len > 0 && bpf_skb_load_bytes(skb, 0, pkt.hdr, hdr_len) < 0)
        return;
    bpf_perf_event_output(skb, &events, BPF_F_CURRENT_CPU, &pkt, sizeof(pkt));
}

SEC("tcx/ingress")
int tcx_ingress(struct __sk_buff *skb) {
    sample_packet(skb, 0);

    return TCX_NEXT;
}

SEC("tcx/egress")
int tcx_egress(struct __sk_buff *skb) {
    sample_packet(skb, 1);

    return TCX_NEXT;
}

The sample.bpf.c file is compiled into eBPF code that the Host sFlow mod_epcap.c module uses to tap packets on selected interfaces. The highlighted code uses the bpf_get_prandom_u32() function to randomly select packets using the configured sampling rate for the interface. Once a packet is selected to be sampled, the packet header and selected metadata are captured and sent to the Host sFlow agent as a performance event via the bpf_perf_event_output() call. The ability to perform the sampling action in the kernel dramatically reduces the overhead associated with network traffic monitoring, since only the small fraction of sampled packets need to be transferred to the user space Host sFlow agent.

static __always_inline __u32 get_route(struct __sk_buff *skb) {
    __u32 key = 0;
    __u32 *routing_enabled = bpf_map_lookup_elem(&routing, &key);
    if (!routing_enabled || !*routing_enabled)
        return 0;

    if (skb->pkt_type != PACKET_HOST)
        return 0;

    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return 0;
    __u32 proto = bpf_ntohs(eth->h_proto);
    if (proto == ETH_P_IP) {
        struct iphdr *ip = data + sizeof(*eth);
        if ((void *)(ip + 1) > data_end)
            return 0;
        struct bpf_fib_lookup fib = {0};
        fib.family      = AF_INET;
        fib.ipv4_src    = ip->saddr;
        fib.ipv4_dst    = ip->daddr;
        fib.tos         = ip->tos;
        fib.l4_protocol = ip->protocol;
        fib.sport       = 0;
        fib.dport       = 0;
        fib.tot_len     = bpf_ntohs(ip->tot_len);
        fib.ifindex     = skb->ifindex;
        long rc = bpf_fib_lookup(skb, &fib, sizeof(fib), 0);
        if (rc != BPF_FIB_LKUP_RET_SUCCESS)
            return 0;
        return fib.ifindex;
    } else if (proto == ETH_P_IPV6) {
        struct ipv6hdr *ipv6 = data + sizeof(*eth);
        if ((void *)(ipv6 + 1) > data_end)
            return 0;
        struct bpf_fib_lookup fib = {0};
        fib.family      = AF_INET6;
        __builtin_memcpy(fib.ipv6_src, &ipv6->saddr, sizeof(ipv6->saddr));
        __builtin_memcpy(fib.ipv6_dst, &ipv6->daddr, sizeof(ipv6->daddr));
        fib.flowinfo    = *(__be32 *)ipv6 & bpf_htonl(0x0FFFFFFF);
        fib.l4_protocol = ipv6->nexthdr;
        fib.sport       = 0;
        fib.dport       = 0;
        fib.tot_len     = bpf_ntohs(ipv6->payload_len);
        fib.ifindex     = skb->ifindex;
        long rc = bpf_fib_lookup(skb, &fib, sizeof(fib), 0);
        if (rc != BPF_FIB_LKUP_RET_SUCCESS)
            return 0;
        return fib.ifindex;
    }
    return 0;
}

If the Linux host has been configured as a router, then bpf_fib_lookup() is used to determine the forwarding decision (egress port) for sampled ingress packets.

Note: The mod_pcap.c module works on older Linux kernels and uses traditional BPF to perform random packet sampling. The main advantage of the mod_epcap module is its ability to add additional metadata to each sampled packet.

Friday, July 11, 2025

Tracing network packets with eBPF and pwru

pwru (packet, where are you?) is an open source tool from Cilium that uses eBPF instrumentation in recent Linux kernels to trace network packets through the kernel.

In this article we will use Multipass to create a virtual machine to experiment with pwru. Multipass is a command line tool for running Ubuntu virtual machines on Mac or Windows. Multipass uses the native virtualization capabilities of the host operating system to simplify the creation of virtual machines.

multipass launch --name=ebpf noble
multipass exec ebpf -- sudo apt update
multipass exec ebpf -- sudo apt -y install git clang llvm make libbpf-dev flex bison golang
multipass exec ebpf -- git clone https://github.com/cilium/pwru.git
multipass exec ebpf --working-directory pwru -- make
multipass exec ebpf -- sudo ./pwru/pwru -h
Run the commands above to create the virtual machine and build pwru from sources.
multipass exec ebpf -- sudo ./pwru/pwru port https
Run pwru to trace https traffic on the virtual machine.
multipass exec ebpf -- curl https://sflow-rt.com
In a second window, run the above command to generate an https request from the virtual machine.
SKB                CPU PROCESS          NETNS      MARK/x        IFACE       PROTO  MTU   LEN   TUPLE FUNC
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0               0         0x0000 1500  60    192.168.66.3:47460->54.190.130.38:443(tcp) ip_local_out
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0               0         0x0000 1500  60    192.168.66.3:47460->54.190.130.38:443(tcp) __ip_local_out
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0               0         0x0800 1500  60    192.168.66.3:47460->54.190.130.38:443(tcp) ip_output
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  60    192.168.66.3:47460->54.190.130.38:443(tcp) nf_hook_slow
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  60    192.168.66.3:47460->54.190.130.38:443(tcp) apparmor_ip_postroute
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  60    192.168.66.3:47460->54.190.130.38:443(tcp) ip_finish_output
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  60    192.168.66.3:47460->54.190.130.38:443(tcp) __ip_finish_output
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  60    192.168.66.3:47460->54.190.130.38:443(tcp) ip_finish_output2
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  60    192.168.66.3:47460->54.190.130.38:443(tcp) neigh_resolve_output
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  60    192.168.66.3:47460->54.190.130.38:443(tcp) eth_header
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  60    192.168.66.3:47460->54.190.130.38:443(tcp) skb_push
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) __dev_queue_xmit
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) qdisc_pkt_len_init
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) netdev_core_pick_tx
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) sch_direct_xmit
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) validate_xmit_skb_list
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) validate_xmit_skb
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) netif_skb_features
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) passthru_features_check
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) skb_network_protocol
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) skb_csum_hwoffload_help
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) skb_checksum_help
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) skb_ensure_writable
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) validate_xmit_xfrm
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) dev_hard_start_xmit
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) start_xmit
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) skb_clone_tx_timestamp
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  74    192.168.66.3:47460->54.190.130.38:443(tcp) xmit_skb
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  86    192.168.66.3:47460->54.190.130.38:443(tcp) skb_to_sgvec
0xffff9fc40335a0e8 0   ~r/bin/curl:8966 4026531840 0             ens3:2      0x0800 1500  86    192.168.66.3:47460->54.190.130.38:443(tcp) __skb_to_sgvec
The pwru output provides a detailed trace of each packet through the Linux stack, identifying the application, namespace, interface, and kernel functions traversed by the packet. Type Ctrl+C to stop the trace.
multipass exec ebpf -- sudo ufw default allow
multipass exec ebpf -- sudo ufw deny out to any port https
multipass exec ebpf -- sudo ufw enable
Now create a firewall rule to block outgoing https traffic and repeat the test.
SKB                CPU PROCESS          NETNS      MARK/x        IFACE       PROTO  MTU   LEN   TUPLE FUNC
0xffff970bc5f568e8 0   ~r/bin/curl:7138 4026531840 0               0         0x0000 1500  60    192.168.67.5:38932->54.190.130.38:443(tcp) ip_local_out
0xffff970bc5f568e8 0   ~r/bin/curl:7138 4026531840 0               0         0x0000 1500  60    192.168.67.5:38932->54.190.130.38:443(tcp) __ip_local_out
0xffff970bc5f568e8 0   ~r/bin/curl:7138 4026531840 0               0         0x0800 1500  60    192.168.67.5:38932->54.190.130.38:443(tcp) nf_hook_slow
0xffff970bc5f568e8 0   ~r/bin/curl:7138 4026531840 0               0         0x0800 1500  60    192.168.67.5:38932->54.190.130.38:443(tcp) kfree_skb_reason(SKB_DROP_REASON_NETFILTER_DROP)
0xffff970bc5f568e8 0   ~r/bin/curl:7138 4026531840 0               0         0x0800 1500  60    192.168.67.5:38932->54.190.130.38:443(tcp) skb_release_head_state
0xffff970bc5f568e8 0   ~r/bin/curl:7138 4026531840 0               0         0x0800 0     60    192.168.67.5:38932->54.190.130.38:443(tcp) tcp_wfree
0xffff970bc5f568e8 0   ~r/bin/curl:7138 4026531840 0               0         0x0800 0     60    192.168.67.5:38932->54.190.130.38:443(tcp) skb_release_data
0xffff970bc5f568e8 0   ~r/bin/curl:7138 4026531840 0               0         0x0800 0     60    192.168.67.5:38932->54.190.130.38:443(tcp) kfree_skbmem
This time, the curl command hangs and the pwru trace shows that packets are being dropped in the Linux firewall.
multipass exec ebpf -- sudo ufw disable
Disable the firewall.

There are additional multipass commands available to manage the virtual machine.

multipass shell ebpf
Connect to the virtual machine and access a command shell.
multipass stop ebpf
Stop the virtual machine.
multipass start ebpf
Start the virtual machine.
multipass delete ebpf
multipass purge
Delete the virtual machine.

Thursday, July 10, 2025

AI Metrics with InfluxDB Cloud

The InfluxDB AI Metrics dashboard shown above tracks performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks using NVIDIA Collective Communication Library (NCCL) operations for inter-GPU communications: AllReduce, Broadcast, Reduce, AllGather, and ReduceScatter.

The metrics include:

  • Total Traffic Total traffic entering fabric
  • Operations Total RoCEv2 operations broken out by type
  • Core Link Traffic Histogram of load on fabric links
  • Edge Link Traffic Histogram of load on access ports
  • RDMA Operations Total RDMA operations
  • RDMA Bytes Average RDMA operation size
  • Credits Average number of credits in RoCEv2 acknowledgements
  • Period Detected period of compute / exchange activity on fabric (in this case just over 0.5 seconds)
  • Congestion Total ECN / CNP congestion messages
  • Errors Total ingress / egress errors
  • Discards Total ingress / egress discards
  • Drop Reasons Packet drop reasons
This article shows how to integrate with InfluxDB Cloud instead of running the services locally.

Note: InfluxDB Cloud has a free service tier that can be used to test this example.

Save the following compose.yml file on a system running Docker.

configs:
  config.telegraf:
    content: |
      [agent]
        interval = '15s'
        round_interval = true
        omit_hostname = true
      [[outputs.influxdb_v2]]
        urls = ['https://<INFLUXDB_CLOUD_INSTANCE>.cloud2.influxdata.com']
        token = '<INFLUXDB_CLOUD_TOKEN>'
        organization = '<INFLUXDB_CLOUD_USER>'
        bucket = 'sflow'
      [[inputs.prometheus]]
        urls = ['http://sflow-rt:8008/app/ai-metrics/scripts/metrics.js/prometheus/txt']
        metric_version = 1

networks:

  monitoring:
    driver: bridge

services:

  sflow-rt:
    image: sflow/ai-metrics
    container_name: sflow-rt
    restart: unless-stopped
    ports:
      - '6343:6343/udp'
      - '8008:8008'
    networks:
      - monitoring

  telegraf:
    image: telegraf:alpine
    container_name: telegraf
    restart: unless-stopped
    configs:
      - source: config.telegraf
        target: /etc/telegraf/telegraf.conf
    depends_on:
      - sflow-rt
    networks:
      - monitoring
Use the Load Data menu to create an sflow bucket, create an API TOKEN to upload data, and find the TELEGRAF INFLUXDB OUTPUT PLUGIN settings. Navigate to the Dashboards menu and create a new dashboard by importing ai_metrics.json.
Edit the highlighted outputs.influxdb_v2 Telegraf settings (INFLUXDB_CLOUD_INSTANCE, INFLUXDB_CLOUD_TOKEN, and INFLUXDB_CLOUD_USER) to match those provided by InfluxDB Cloud.
docker compose up -d
Run the command above to start streaming metrics to InfluxDB Cloud.
Enable sFlow on all switches in the cluster (leaf and spine) using the recommended settings. Enable sFlow dropped packet notifications to populate the drop reasons metric, see Dropped packet notifications with Arista Networks, NVIDIA Cumulus Linux 5.11 for AI / ML, and Dropped packet notifications with Cisco 8000 Series Routers for examples.

Note: Tuning Performance describes how to optimize settings for very large clusters.

Industry standard sFlow telemetry is uniquely suited to monitoring AI workloads. The sFlow agents leverage instrumentation built into switch ASICs to stream randomly sampled packet headers and metadata in real time. Sampling provides a scalable method of monitoring the large numbers of 400G/800G links found in AI fabrics. Export of packet headers allows the sFlow collector to decode the InfiniBand Base Transport headers to extract operations and RDMA metrics. The Dropped Packet extension uses Mirror-on-Drop (MoD) / What Just Happened (WJH) capabilities in the ASIC to include the packet header, location, and reason for EVERY dropped packet in the fabric.

Talk to your switch vendor about their plans to support the Transit delay and queueing extension. This extension provides visibility into queue depth and switch transit delay using instrumentation built into the ASIC.

A network topology is required to generate the analytics, see Topology for a description of the JSON file and instructions for generating topologies from Graphviz DOT format, NVIDIA NetQ, Arista eAPI, and NetBox.
Use the Topology Status dashboard to verify that the topology is consistent with the sFlow telemetry and fully monitored. The Locate tab can be used to locate network addresses to access switch ports.

Note: If any gauges indicate an error, click on the gauge to get specific details.

Congratulations! The configuration is now complete and you should see charts in the AI Metrics application Traffic tab. In addition, the AI Metrics dashboard at the top of this page should start to populate with data.

Getting Started provides an introduction to sFlow-RT, describes how to browse metrics and traffic flows using tools included in the Docker image, and links to information on creating applications using sFlow-RT APIs.

Saturday, June 14, 2025

AI network performance monitoring using containerlab

AI Metrics is available on GitHub. The application provides performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks using NVIDIA Collective Communication Library (NCCL) operations for inter-GPU communications: AllReduce, Broadcast, Reduce, AllGather, and ReduceScatter.

The screen capture is from a containerlab topology that emulates an AI compute cluster connected by a leaf and spine network. The metrics include:

  • Total Traffic Total traffic entering fabric
  • Operations Total RoCEv2 operations broken out by type
  • Core Link Traffic Histogram of load on fabric links
  • Edge Link Traffic Histogram of load on access ports
  • RDMA Operations Total RDMA operations
  • RDMA Bytes Average RDMA operation size
  • Credits Average number of credits in RoCEv2 acknowledgements
  • Period Detected period of compute / exchange activity on fabric (in this case just over 0.5 seconds)
  • Congestion Total ECN / CNP congestion messages
  • Errors Total ingress / egress errors
  • Discards Total ingress / egress discards
  • Drop Reasons Packet drop reasons

Note: Clicking on peaks in the charts shows values at that time.

This article gives step-by-step instructions to run the demonstration.

git clone https://github.com/sflow-rt/containerlab.git
cd containerlab
./run-clab
Run the above commands to download the sflow-rt/containerlab GitHub project and run Containerlab on a system with Docker installed. Docker Desktop is a convenient way to run the labs on a laptop.
containerlab deploy -t rocev2.yml

Start the 3 stage leaf and spine emulation.

The initial launch may take a couple of minutes as the container images are downloaded for the first time. Once the images are downloaded, the topology deploys in a few seconds.
./topo.py clab-rocev2
Run the command above to send the topology to the AI Metrics application and connect to http://localhost:8008/app/ai-metrics/html/ to access the dashboard shown at the top of this article.
docker exec -it clab-rocev2-h1 hping3 exec rocev2.tcl 172.16.2.2 10000 500 100
Run the command above to simulate RoCEv2 traffic between h1 and h2 during a training run. The run consists of 100 cycles; each cycle involves an exchange of 10000 packets followed by a 500 millisecond pause (simulating GPU compute time). You should immediately see charts updating in the dashboard.
Clicking on the wave symbol in the top right of the Period chart pulls up a fast-updating Periodicity chart that shows how sub-second variability in the traffic due to the exchange / compute cycles is used to compute the period shown in the dashboard, in this case a period of just under 0.7 seconds.
Connect to the included sflow-rt/trace-flow application, http://localhost:8008/app/trace-flow/html/
suffix:stack:.:1=ibbt
Enter the filter above and press Submit. The instant a RoCEv2 flow is generated, its path should be shown. See Defining Flows for information on traffic filters.
docker exec -it clab-rocev2-leaf1 vtysh
The routers in this topology run the open source FRRouting daemon used in data center switches with NVIDIA Cumulus Linux or SONiC network operating systems. Type the command above to access the CLI on the leaf1 switch.
leaf1# show running-config
For example, type the above command in the leaf1 CLI to see the running configuration.
Building configuration...

Current configuration:
!
frr version 10.2.3_git
frr defaults datacenter
hostname leaf1
log stdout
!
interface eth3
 ip address 172.16.1.1/24
 ipv6 address 2001:172:16:1::1/64
exit
!
router bgp 65001
 bgp bestpath as-path multipath-relax
 bgp bestpath compare-routerid
 neighbor fabric peer-group
 neighbor fabric remote-as external
 neighbor fabric description Internal Fabric Network
 neighbor fabric capability extended-nexthop
 neighbor eth1 interface peer-group fabric
 neighbor eth2 interface peer-group fabric
 !
 address-family ipv4 unicast
  redistribute connected route-map HOST_ROUTES
 exit-address-family
 !
 address-family ipv6 unicast
  redistribute connected route-map HOST_ROUTES
  neighbor fabric activate
 exit-address-family
exit
!
route-map HOST_ROUTES permit 10
 match interface eth3
exit
!
ip nht resolve-via-default
!
end

The switch is using BGP to establish equal cost multi-path (ECMP) routes across the fabric - this is very similar to a configuration you would expect to find in a production network. Containerlab provides a great environment to experiment with topologies and configuration options before putting them in production.

The results from this containerlab project are very similar to those observed in production networks, see Comparing AI / ML activity from two production networks. Follow the instructions in AI Metrics with Prometheus and Grafana to deploy this solution to monitor production GPU cluster traffic.

Monday, June 9, 2025

AI Metrics with Grafana Cloud

The Grafana AI Metrics dashboard shown above tracks performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks using NVIDIA Collective Communication Library (NCCL) operations for inter-GPU communications: AllReduce, Broadcast, Reduce, AllGather, and ReduceScatter.

The metrics include:

  • Total Traffic Total traffic entering fabric
  • Operations Total RoCEv2 operations broken out by type
  • Core Link Traffic Histogram of load on fabric links
  • Edge Link Traffic Histogram of load on access ports
  • RDMA Operations Total RDMA operations
  • RDMA Bytes Average RDMA operation size
  • Credits Average number of credits in RoCEv2 acknowledgements
  • Period Detected period of compute / exchange activity on fabric (in this case just over 0.5 seconds)
  • Congestion Total ECN / CNP congestion messages
  • Errors Total ingress / egress errors
  • Discards Total ingress / egress discards
  • Drop Reasons Packet drop reasons
AI Metrics with Prometheus and Grafana describes how to stand up an analytics stack with Prometheus and Grafana to track performance metrics for an AI/ML GPU cluster. This article shows how to integrate with Prometheus and Grafana hosted in the cloud, Grafana Cloud, instead of running the services locally.

Note: Grafana Cloud has a free service tier that can be used to test this example.

Save the following compose.yml file on a system running Docker.

configs:
  config.alloy:
    content: |
      prometheus.scrape "prometheus" {
        targets = [{
          __address__ = "sflow-rt:8008",
        }]
        forward_to   = [prometheus.remote_write.grafanacloud.receiver]
        metrics_path = "/app/ai-metrics/scripts/metrics.js/prometheus/txt"
        scrape_interval = "10s"
      }

      prometheus.remote_write "grafanacloud" {
        endpoint {
          url  = "https://<Your Grafana Cloud Prometheus Instance>/api/prom/push"
          basic_auth {
            username = "<Your Grafana.com User ID>"
            password = "<Your Grafana.com API Token>"
          }
        }
      }

networks:

  monitoring:
    driver: bridge

services:

  sflow-rt:
    image: sflow/ai-metrics
    container_name: sflow-rt
    restart: unless-stopped
    ports:
      - '6343:6343/udp'
      - '8008:8008'
    networks:
      - monitoring

  alloy:
    image: grafana/alloy
    container_name: alloy
    restart: unless-stopped
    configs:
      - source: config.alloy
        target: /etc/alloy/config.alloy
    depends_on:
      - sflow-rt
    networks:
      - monitoring
Find the settings needed to upload metrics to Prometheus by clicking on the Send Metrics button in your Grafana Cloud account.
Edit the highlighted prometheus.remote_write endpoint settings (url, username and password) to match those provided by Grafana Cloud.
docker compose up -d
Run the command above to start streaming metrics to Grafana Cloud. Click on the Grafana Launch button to access Grafana and add the AI Metrics dashboard (ID: 23255).
Enable sFlow on all switches in the cluster (leaf and spine) using the recommended settings. Enable sFlow dropped packet notifications to populate the drop reasons metric, see Dropped packet notifications with Arista Networks, NVIDIA Cumulus Linux 5.11 for AI / ML, and Dropped packet notifications with Cisco 8000 Series Routers for examples.

Note: Tuning Performance describes how to optimize settings for very large clusters.

Industry standard sFlow telemetry is uniquely suited to monitoring AI workloads. The sFlow agents leverage instrumentation built into switch ASICs to stream randomly sampled packet headers and metadata in real time. Sampling provides a scalable method of monitoring the large numbers of 400G/800G links found in AI fabrics. Export of packet headers allows the sFlow collector to decode the InfiniBand Base Transport headers to extract operations and RDMA metrics. The Dropped Packet extension uses Mirror-on-Drop (MoD) / What Just Happened (WJH) capabilities in the ASIC to include the packet header, location, and reason for EVERY dropped packet in the fabric.

Talk to your switch vendor about their plans to support the Transit delay and queueing extension. This extension provides visibility into queue depth and switch transit delay using instrumentation built into the ASIC.

A network topology is required to generate the analytics, see Topology for a description of the JSON file and instructions for generating topologies from Graphviz DOT format, NVIDIA NetQ, Arista eAPI, and NetBox.
Use the Topology Status dashboard to verify that the topology is consistent with the sFlow telemetry and fully monitored. The Locate tab can be used to locate network addresses to access switch ports.

Note: If any gauges indicate an error, click on the gauge to get specific details.

Congratulations! The configuration is now complete and you should see charts in the AI Metrics application Traffic tab. In addition, the AI Metrics Grafana dashboard at the top of this page should start to populate with data.

Flow metrics with Prometheus and Grafana and Dropped packet metrics with Prometheus and Grafana describe how to define additional flow-based metrics to incorporate in Grafana dashboards.

Getting Started provides an introduction to sFlow-RT, describes how to browse metrics and traffic flows using tools included in the Docker image, and links to information on creating applications using sFlow-RT APIs.

Monday, May 5, 2025

Multi-vendor support for dropped packet notifications


The sFlow Dropped Packet Notification Structures extension was published in October 2020. Extending sFlow to provide visibility into dropped packets offers significant benefits for network troubleshooting, providing real-time network wide visibility into the specific packets that were dropped as well as the reason the packet was dropped. This visibility instantly reveals the root cause of drops and the impacted connections. Packet discard records complement sFlow's existing counter polling and packet sampling mechanisms and share a common data model so that all three sources of data can be correlated, for example, packet sampling reveals the top consumers of bandwidth on a link, helping to get to the root cause of congestion related packet drops reported for the link.

Today the following network operating systems include support for the drop notification extension in their sFlow agent implementations:

Two additional sFlow dropped packet notification implementations are in the pipeline and should be available later this year:

If your network vendor is on the list, follow the instructions in the linked articles to try out drop monitoring, if not, ask about your vendor's plans to implement the sFlow drop notification extension.

Monday, April 21, 2025

AI Metrics with Prometheus and Grafana

The Grafana AI Metrics dashboard shown above tracks performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks using NVIDIA Collective Communication Library (NCCL) operations for inter-GPU communications: AllReduce, Broadcast, Reduce, AllGather, and ReduceScatter.

The metrics include:

  • Total Traffic Total traffic entering fabric
  • Operations Total RoCEv2 operations broken out by type
  • Core Link Traffic Histogram of load on fabric links
  • Edge Link Traffic Histogram of load on access ports
  • RDMA Operations Total RDMA operations
  • RDMA Bytes Average RDMA operation size
  • Credits Average number of credits in RoCEv2 acknowledgements
  • Period Detected period of compute / exchange activity on fabric (in this case just over 0.5 seconds)
  • Congestion Total ECN / CNP congestion messages
  • Errors Total ingress / egress errors
  • Discards Total ingress / egress discards
  • Drop Reasons Packet drop reasons

This article gives step-by-step instructions to set up the dashboard in a production environment.

git clone https://github.com/sflow-rt/prometheus-grafana.git
cd prometheus-grafana
env RT_IMAGE=ai-metrics ./start.sh

The easiest way to get started is to use Docker, see Deploy real-time network dashboards using Docker compose, and deploy the sflow/ai-metrics image bundling the AI Metrics application to generate metrics.

scrape_configs:
  - job_name: 'sflow-rt-ai-metrics'
    metrics_path: /app/ai-metrics/scripts/metrics.js/prometheus/txt
    scheme: http
    static_configs:
      - targets: [ 'sflow-rt:8008' ]
Follow the directions in Deploy real-time network dashboards using Docker compose to add the above Prometheus scrape task to retrieve the metrics and add the Grafana AI Metrics dashboard (ID: 23255).
Enable sFlow on all switches in the cluster (leaf and spine) using the recommended settings. Enable sFlow dropped packet notifications to populate the drop reasons metric, see Dropped packet notifications with Arista Networks, NVIDIA Cumulus Linux 5.11 for AI / ML, and Dropped packet notifications with Cisco 8000 Series Routers for examples.

Note: Tuning Performance describes how to optimize settings for very large clusters.

Industry standard sFlow telemetry is uniquely suited to monitoring AI workloads. The sFlow agents leverage instrumentation built into switch ASICs to stream randomly sampled packet headers and metadata in real time. Sampling provides a scalable method of monitoring the large numbers of 400G/800G links found in AI fabrics. Export of packet headers allows the sFlow collector to decode the InfiniBand Base Transport headers to extract operations and RDMA metrics. The Dropped Packet extension uses Mirror-on-Drop (MoD) / What Just Happened (WJH) capabilities in the ASIC to include the packet header, location, and reason for EVERY dropped packet in the fabric.

Talk to your switch vendor about their plans to support the Transit delay and queueing extension. This extension provides visibility into queue depth and switch transit delay using instrumentation built into the ASIC.

A network topology is required to generate the analytics, see Topology for a description of the JSON file and instructions for generating topologies from Graphviz DOT format, NVIDIA NetQ, Arista eAPI, and NetBox.
Use the Topology Status dashboard to verify that the topology is consistent with the sFlow telemetry and fully monitored. The Locate tab can be used to locate network addresses to access switch ports.

Note: If any gauges indicate an error, click on the gauge to get specific details.

Congratulations! The configuration is now complete and you should see charts in the AI Metrics application Traffic tab. In addition, the AI Metrics Grafana dashboard at the top of this page should start to populate with data.

Flow metrics with Prometheus and Grafana and Dropped packet metrics with Prometheus and Grafana describe how to define additional flow-based metrics to incorporate in Grafana dashboards.

Getting Started provides an introduction to sFlow-RT, describes how to browse metrics and traffic flows using tools included in the Docker image, and links to information on creating applications using sFlow-RT APIs.