Monday, February 23, 2026

Real-time visualization of AI / ML traffic matrix

Heatmap is available on GitHub. The application provides a real-time traffic matrix visualization of end-to-end traffic flowing across an Ethernet fabric. Each axis represents an ordered list of network addresses. The x-axis is a flow source and the y-axis is a flow destination.

For example, the Heatmap above comes from a large high performance compute cluster running a mixture of tasks. Traffic is concentrated along the diagonal, indicating that the job scheduler is packing related tasks in racks so that most traffic is confined to the rack.

Note: Live Dashboards links to a number of dashboards showing live traffic, including the Heatmap above.

The next Heatmap shows a very different traffic pattern: RoCEv2 traffic generated by GPUs performing an NCCL AllReduce/AllGather collective operation using a ring algorithm. During the collective operation, each GPU sends data to its immediate neighbor (modulo the number of GPUs) in a logical ring, resulting in two nearly continuous lines on either side of the diagonal: one for forward traffic, and the other for return traffic associated with each flow.

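The ring pattern described above can be sketched in a few lines of JavaScript. This is only an illustration, not code from the Heatmap application: the function name ringCells is hypothetical, and it assumes the GPUs appear on each axis in the same order as their position in the ring.

```javascript
// Hypothetical helper: list the traffic matrix cells lit up by a ring collective.
// Assumes GPU i sits at index i on both axes of the heatmap.
function ringCells(numGpus) {
  var cells = [];
  for (var i = 0; i < numGpus; i++) {
    var next = (i + 1) % numGpus;        // immediate neighbor, modulo the number of GPUs
    cells.push({src: i, dst: next});     // forward traffic
    cells.push({src: next, dst: i});     // return traffic for the same flow
  }
  return cells;
}

var cells = ringCells(8);
```

Under these assumptions every cell is one step off the diagonal, apart from the wrap-around pair between the last and first GPU, which would land in opposite corners of the matrix.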
The final example comes from a large data center hosting a mix of front end workloads. Unlike the backend networks, this network combines internal (East/West) traffic with external (North/South) traffic flows. The internal traffic flows are contained in the central grid. The surrounding borders display external traffic.

The full range of IP addresses (0.0.0.0 - 255.255.255.255) is displayed on the heatmap using a piecewise linear scaling function. A start and end address identify internal traffic and map to values in the central grid; addresses outside this range are scaled to fit in the border insets.
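One way to picture such a piecewise linear scaling is the sketch below. It is not taken from the Heatmap source: the function names ipToInt and scaleAddress, the border parameter (the fraction of the axis given to each inset), and the example range are all assumptions made for illustration.

```javascript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit number.
function ipToInt(ip) {
  return ip.split('.').reduce(function(acc, octet) {
    return acc * 256 + parseInt(octet, 10);
  }, 0);
}

// Hypothetical piecewise linear scale mapping an address to a position in [0,1]:
//   addresses below start      -> [0, border)           (lower inset)
//   addresses in [start, end]  -> [border, 1 - border]  (central grid)
//   addresses above end        -> (1 - border, 1]       (upper inset)
function scaleAddress(ip, start, end, border) {
  var MAX = 4294967295; // 255.255.255.255
  var a = ipToInt(ip), s = ipToInt(start), e = ipToInt(end);
  if (a < s) return border * a / s;
  if (a > e) return 1 - border + border * (a - e) / (MAX - e);
  if (e === s) return 0.5; // degenerate single-address range
  return border + (1 - 2 * border) * (a - s) / (e - s);
}

// Example: treat 10.0.0.0 - 10.0.255.255 as internal, with 10% insets.
var pos = scaleAddress('10.0.128.0', '10.0.0.0', '10.0.255.255', 0.1);
```

The three segments meet at the start and end addresses, so the mapping stays continuous while giving the internal range most of the axis.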

Representing the traffic matrix in the form of a heat map scales well to very large networks and provides real-time insight into shifting traffic patterns as workloads change. The industry standard sFlow instrumentation in data center switches used to construct the traffic matrix also scales to the large number of switches and 400/800G port speeds found in AI/ML backend networks.

Tuesday, January 13, 2026

Exporting events to Loki

Grafana Loki is an open source log aggregation system inspired by Prometheus. While it is possible to use Loki with Grafana Alloy, a simpler approach is to send logs directly using the Loki HTTP API.

The following example modifies the ddos-protect application to use sFlow-RT's httpAsync() function to send events to Loki's HTTP API.

var lokiPort = getSystemProperty("ddos_protect.loki.port") || '3100';
var lokiPush = getSystemProperty("ddos_protect.loki.push") || '/loki/api/v1/push';
var lokiHost = getSystemProperty("ddos_protect.loki.host");

function sendEvent(action,attack,target,group,protocol) {
  if(lokiHost) {
    var url = 'http://'+lokiHost+':'+lokiPort+lokiPush;
    var lokiEvent = {
      streams: [
        {
          stream: {
            service_name: 'ddos-protect'
          },
          values: [[
            Date.now()+'000000',
            action+" "+attack+" "+target+" "+group+" "+protocol,
            {
              detected_level: action == 'release' ? 'INFO' : 'WARN',
              action: action,
              attack: attack,
              ip: target,
              group: group,
              protocol: protocol
            }
          ]]
        }
      ]
    };
    httpAsync({
      url: url,
      headers: {'Content-Type':'application/json'},
      operation: 'POST',
      body: JSON.stringify(lokiEvent),
      success: (response) => { 
        if (200 != response.status) {
          logWarning("DDoS Loki status " + response.status);
        }
      },
      error: (error) => {
        logWarning("DDoS Loki error " + error);
      }
    });
  }

  if(syslogHosts.length === 0) return;

  var msg = {app:'ddos-protect',action:action,attack:attack,ip:target,group:group,protocol:protocol};
  syslogHosts.forEach(function(host) {
    try {
      syslog(host,syslogPort,syslogFacility,syslogSeverity,msg);
    } catch(e) {
      logWarning('DDoS cannot send syslog to ' + host);
    }
  });
}

The Loki-specific code (the three lokiPort, lokiPush, and lokiHost settings and the if(lokiHost) block) extends the existing scripts/ddos.js script to add Loki support.

Add a panel to integrate the Loki log into the Grafana sFlow-RT DDoS Protect dashboard as shown at the top of this page.
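
The timestamp in the values array of the script above is built by string concatenation, Date.now()+'000000', because Loki's push API expects the Unix epoch time in nanoseconds as a string. The sketch below, using a hypothetical helper named toLokiTimestamp and an arbitrary example value, illustrates why appending zeros is preferable to multiplying.

```javascript
// Hypothetical helper: convert milliseconds since the epoch into the
// nanosecond timestamp string used in a Loki push request.
function toLokiTimestamp(ms) {
  return String(ms) + '000000';
}

var ms = 1768300000000; // arbitrary example value, milliseconds since the epoch
var ts = toLokiTimestamp(ms);

// Multiplying instead would produce a value beyond Number.MAX_SAFE_INTEGER
// (2^53 - 1), where JavaScript numbers can no longer represent every integer.
var exceedsSafeRange = ms * 1e6 > Number.MAX_SAFE_INTEGER;
```

Keeping the value as a string avoids any loss of precision in the low-order digits.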

DDoS protection quickstart guide describes how to set up a DDoS mitigation solution using sFlow-RT.