Monday, November 25, 2024

Topology aware flow analytics with NVIDIA NetQ

NVIDIA Cumulus Linux 5.11 for AI / ML describes how NVIDIA 400/800G Spectrum-X switches combined with the latest Cumulus Linux release deliver enhanced real-time telemetry that is particularly relevant to the AI / machine learning workloads that Spectrum-X switches are designed to handle.

This article shows how to extract topology from an NVIDIA fabric in order to perform advanced fabric-aware analytics, for example, detecting flow collisions, tracing flow paths, and de-duplicating traffic.

In this example, we will use NVIDIA NetQ, a highly scalable, modern network operations toolset that provides visibility, troubleshooting, and validation of your Cumulus and SONiC fabrics in real time.

netq show lldp json
The NetQ Link Layer Discovery Protocol (LLDP) service simplifies the task of gathering neighbor data from switches in the network. The json option makes the output easy to process with a Python script such as lldp-rt.py.

The simplest way to try the sFlow-RT real-time analytics engine is to use the pre-built sflow/topology Docker image, which packages sFlow-RT with additional applications that are useful for monitoring network topologies.

docker run -p 6343:6343/udp -p 8008:8008 sflow/topology
Configure Cumulus Linux to stream sFlow telemetry to sFlow-RT on UDP port 6343 (the default for sFlow).
netq show lldp json | ./lldp-rt.py http://sflow-rt:8008/topology/json
The above command puts it all together, taking LLDP data from NetQ, converting it to sFlow-RT format, and posting the fabric topology to the sFlow-RT REST API.
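
The conversion step can be illustrated with a minimal sketch. This is not the lldp-rt.py script itself; it is an independent example written under several assumptions: that each NetQ LLDP record carries hostname, ifname, peerHostname, and peerIfname fields, that the records may be wrapped in an object under an "lldp" key, and that sFlow-RT accepts the topology as an HTTP PUT of a JSON object of the form {"links": {name: {node1, port1, node2, port2}}}. The link names (L1, L2, ...) are arbitrary. Check the field names against the output of your NetQ release before relying on it.

```python
#!/usr/bin/env python3
# Sketch only: field names and the "lldp" wrapper key are assumptions,
# not taken from the lldp-rt.py script referenced in the article.
import json
import sys
import urllib.request


def lldp_to_topology(records):
    """Build a {"links": {...}} topology from LLDP neighbor records.

    LLDP reports each link once from each end, so the two endpoints are
    sorted into a canonical key and repeated sightings are skipped.
    """
    links = {}
    seen = set()
    for rec in records:
        local = (rec["hostname"], rec["ifname"])
        peer = (rec["peerHostname"], rec["peerIfname"])
        key = tuple(sorted((local, peer)))
        if key in seen:
            continue
        seen.add(key)
        (node1, port1), (node2, port2) = key
        links[f"L{len(links) + 1}"] = {
            "node1": node1, "port1": port1,
            "node2": node2, "port2": port2,
        }
    return {"links": links}


def put_topology(url, topology):
    """Send the topology to the given URL as JSON (assumes HTTP PUT)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(topology).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Runs only when invoked as a script with a target URL argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    data = json.load(sys.stdin)
    # Accept either a bare list of records or an object wrapping them.
    records = data.get("lldp", []) if isinstance(data, dict) else data
    put_topology(sys.argv[1], lldp_to_topology(records))
```

Saved under a hypothetical name such as lldp_sketch.py, the sketch would sit in the same position in the pipeline as the script above: netq show lldp json | ./lldp_sketch.py http://sflow-rt:8008/topology/json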
Access the sFlow-RT web interface on port 8008. The Topology application includes a dashboard to verify that all the nodes and links in the topology are fully covered by the sFlow telemetry stream.

Getting Started is a step-by-step guide to sFlow-RT applications, APIs, and community support.