Thursday, November 17, 2022

SC22 SCinet network monitoring

The data shown in the chart was gathered from The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22) being held this week in Dallas. The conference network, SCinet, is described as the fastest and most powerful network on Earth, connecting the SC community to the world. The charts provide an up to the second view of overall SCinet traffic, with the lower chart showing total traffic hitting a sustained 8Tbps.
The poster shows the topology of the SCinet network. Constructing the charts requires monitoring flow data from 5,852 switch/router ports, totaling 162Tbps of bandwidth, with sub-second latency.
The chart was generated using industry standard streaming sFlow telemetry from switches and routers in the SCinet network. An instance of the sFlow-RT real-time analytics engine computes the flow metrics shown in the charts.
Most of the load was due to large 400Gbit/s, 200Gbit/s and 100Gbit/s flows that were part of the Network Research Exhibition. The chart above shows that 10 large flows are responsible for 1.5Tbps of traffic.
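As a rough illustration of the approach (not the actual SC22 configuration), a short sFlow-RT script along the following lines is enough to track the largest address-pair flows so that their combined bandwidth can be trended; setFlow() and the ipsource/ipdestination keys are the same calls used in the articles referenced below, and the flow name is illustrative.

// Sketch only: break traffic out by source/destination address pair so
// that total and top flow bandwidth can be trended. The byte rate is
// multiplied by 8 to give bits per second when charted or exported.
setFlow('sc22_traffic', {
  keys: 'ipsource,ipdestination',
  value: 'bytes'
});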
Scientific network tags (scitags) describes how IPv6 flow labels allow network flow analytics to identify network traffic associated with bulk scientific data transfers.
RDMA network visibility shows how to monitor bulk data transfers that use Remote Direct Memory Access (RDMA).

Wednesday, November 16, 2022

RDMA network visibility

The Remote Direct Memory Access (RDMA) data shown in the chart was gathered from The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22) being held this week in Dallas. The conference network, SCinet, is described as the fastest and most powerful network on Earth, connecting the SC community to the world.
Resilient Distributed Processing and Reconfigurable Networks is one of the demonstrations using SCinet - Location: Booth 2847 (StarLight). The planned SC22 focus is on RDMA-enabled data movement and dynamic network control:
  1. RDMA Tbps performance over global distance for timely Terabyte bulk data transfers (goal << 1 min Tbyte transfer on N by 400G network).
  2. Dynamic shifting of processing and network resources from one location/path/system to another (in response to demand and availability).
The real-time chart at the top of this page shows an up to the second view of RDMA traffic (broken out by source, destination, and RDMA operation).
The chart was generated using industry standard streaming sFlow telemetry from switches and routers in the SCinet network. An instance of the sFlow-RT analytics engine computes the RDMA flow metrics shown in the chart. RESTflow describes how sFlow disaggregates the traditional NetFlow / IPFIX analytics pipeline to offer flexible, scalable, low latency flow measurements. Flow metrics with Prometheus and Grafana describes how metrics can be stored in a time series database for use in operational dashboards.
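As a sketch of how such a breakdown can be expressed (this is not the configuration used at SC22), a flow definition along the following lines groups traffic by source, destination, and operation. setFlow() is the sFlow-RT call shown in the DDoS detection article below; the operation key name here is a hypothetical placeholder rather than a documented sFlow-RT field.

// Sketch only: break RDMA traffic out by address pair and operation.
// 'rdmaoperation' is a hypothetical placeholder key, not a documented
// sFlow-RT flow key.
setFlow('rdma_traffic', {
  keys: 'ipsource,ipdestination,rdmaoperation',
  value: 'bytes'
});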

Real-time traffic analytics transforms network monitoring from reporting on the past to observing and acting on the present to automate troubleshooting and traffic engineering, e.g. Leaf and spine traffic engineering using segment routing and SDN and DDoS protection quickstart guide.

Tuesday, November 15, 2022

Scientific network tags (scitags)

The data shown in the chart was gathered from The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22) being held this week in Dallas. The conference network, SCinet, is described as the fastest and most powerful network on Earth, connecting the SC community to the world. The chart shows data generated as part of the Packet Marking for Networked Scientific Workflows demonstration using SCinet - Booth 2847 (StarLight).

Scientific network tags (scitags) is an initiative promoting identification of the science domains and their high-level activities at the network level. Participants include dCache, ESnet, GÉANT, Internet2, Jisc, NORDUnet, OFTS, OSG, RNP, RUCIO, StarLight, and XRootD.

This article will demonstrate how industry standard sFlow telemetry streaming from switches and routers can be used to report on science domain activity in real-time using the sFlow-RT analytics engine.

The scitags initiative makes use of the IPv6 packet header to mark traffic. Experiment and activity identifiers are encoded in the IPv6 Flow label field. Identifiers are published in an online registry in the form of a JSON document, https://www.scitags.org/api.json.
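The scitags.js script later in this article reads the experiments array from this registry: each experiment has an expName and expId, and each of its activities has an activityName and activityId. The fragment below sketches that shape using hypothetical names and ids (it is not an excerpt from the live registry).

{
  "experiments": [
    {
      "expName": "hypothetical-experiment",
      "expId": 2,
      "activities": [
        { "activityName": "hypothetical-activity", "activityId": 9 }
      ]
    }
  ]
}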

One might expect IPFIX / NetFlow to be a possible alternative to sFlow for scitags reporting. However, with NetFlow/IPFIX the network devices summarize the traffic before exporting flow records that contain only the fields they decode in firmware, and leading vendors such as Arista, Cisco, and Juniper do not currently include the IPv6 flow label as a field that can be exported. A firmware/hardware update would be needed to access the data, and the same roadblock may repeat wherever IPv6 is carried over a new tunnel encapsulation, or for any other new field that may be requested.

On the other hand, the sFlow protocol disaggregates the flow analytics pipeline: devices stream raw packet headers and metadata in real-time to an external analyzer, which decodes the packets and builds flow records - see RESTflow for more information. This means that visibility into scitags traffic is available today from every sFlow capable device released over the last 20 years with no vendor involvement - the only requirement is an sFlow collector that decodes IPv6 packet headers. Vendors supporting sFlow include: A10, Arista, Aruba, Cisco, Edge-Core, Extreme, Huawei, Juniper, NEC, NVIDIA, Netgear, Nokia, Quanta, and ZTE.

Finally, real-time visibility is a key benefit of using sFlow. The IPFIX / NetFlow flow cache on the router adds significant delay to measurements (anything from 30 seconds to 30 minutes for long lived science flows based on the active timeout setting). With sFlow, data is immediately exported by the router, allowing the sFlow analyzer to present an up to the second view of traffic. Real-time traffic analytics transforms network monitoring from reporting on the past to observing and acting on the present to automate troubleshooting and traffic engineering, e.g. Leaf and spine traffic engineering using segment routing and SDN and DDoS protection quickstart guide.

// Reverse the bit order of val within an n-bit field
function reverseBits(val,n) {
  var bits = val.toString(2).padStart(n, '0');
  var reversed = bits.split('').reverse().join('');
  return parseInt(reversed,2);
}

// Build the 20-bit IPv6 flow label: bit-reversed experiment id in bits 17-9,
// activity id in bits 7-2; the remaining bits are left free for entropy
function flowlabel(expId,activityId) {
  return (reverseBits(expId,9) << 9) + (activityId << 2);
}

// Fetch the scitags registry and rebuild the sFlow-RT 'scitag' map
function updateMap() {
  var tags, parsed;
  try {
    tags = http('https://www.scitags.org/api.json');
    parsed = JSON.parse(tags);
  } catch(e) {
    logWarning('SCITAGS http get failed ' + e);
    return;
  }
  var experiments = parsed && parsed.experiments;
  if(!experiments) return;
  var map = {};
  experiments.forEach(function(experiment) {
    var expName = experiment.expName;
    var expId = experiment.expId;
    var activities = experiment.activities;
    activities.forEach(function(activity) {
      var activityName = activity.activityName;
      var activityId = activity.activityId;
      var key = (expName + '.' + activityName).replace(/ /g,"_");
      map[key] = [ flowlabel(expId,activityId) ];
    });
  });

  setMap('scitag',map);
}

updateMap();
setIntervalHandler(updateMap,600);

The above scitags.js script periodically queries the registry and creates an sFlow-RT map from flow label to registry entry. See Writing Applications for more information on the script.

docker run --rm -v $PWD/scitags.js:/sflow-rt/scitags.js \
-p 8008:8008 -p 6343:6343/udp sflow/prometheus -Dscript.file=scitags.js

Use the above command to run sFlow-RT with the scitags.js script using the pre-built sflow/prometheus image.

map:[bits:ip6flowlabel:261884]:scitag

Defining Flows describes how to program sFlow-RT's flow analytics engine. The example above shows how to use the bits: function to mask out the Entropy bits from the ip6flowlabel and extract the Activity and Experiment bits (00111111111011111100 binary is 261884 in decimal). The masked value is used as a key in the scitag map built by the scitags.js script.
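As a quick sanity check of the mask value, a sketch based on the field layout in the flowlabel() function above (bit-reversed experiment id in bits 17-9, activity id in bits 7-2):

// Set the experiment bits (17-9) and activity bits (7-2) to 1, leaving
// the entropy bits at 0; the result is the mask used in the flow key.
var scitagMask = (0x1FF << 9) | (0x3F << 2);
// scitagMask === 261884, i.e. 0b00111111111011111100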

The Browse Flows trend above shows a network traffic flow identified by its scitag value.

iperf3 -c 2001:172:16:2::2 --flowlabel 65572

The ESnet iperf3 tool was used to generate the IPv6 traffic with the configured flow label shown in the chart.
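For reference, the flow label value can be reproduced from the encoding in scitags.js above: whatever the actual registry assignments are, the hypothetical experiment id 2 and activity id 9 used in the earlier registry sketch map to exactly this value.

// Worked check using the flowlabel() function defined in scitags.js above:
// reverseBits(2,9) = 128, so (128 << 9) + (9 << 2) = 65536 + 36 = 65572
flowlabel(2, 9);   // 65572, the value passed to iperf3 --flowlabel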

Flow metrics with Prometheus and Grafana describes how to export flow analytics to a time series database for use in operational dashboards.

  - job_name: 'sflow-rt-scitag-bps'
    metrics_path: /app/prometheus/scripts/export.js/flows/ALL/txt
    static_configs:
      - targets: ['127.0.0.1:8008']
    params:
      metric: ['scitag_networks_bps']
      key: ['ip6source','ip6destination','map:[bits:ip6flowlabel:261884]:scitag']
      label: ['src','dst','scitag']
      value: ['bytes']
      scale: ['8']
      aggMode: ['sum']
      minValue: ['1000']
      maxFlows: ['100']
For example, the Prometheus scrape job above collects the data shown in the Browse Flows chart.
The chart above shows a Grafana dashboard displaying the scitag flow data.

Thursday, September 15, 2022

Low latency flow analytics


Real-time analytics on network flow data with Apache Pinot describes LinkedIn's flow ingestion and analytics pipeline for sFlow and IPFIX exports from network devices. The solution uses Apache Kafka message queues to connect LinkedIn's InFlow flow analyzer with the Apache Pinot datastore to support low latency queries. The article describes the scale of the monitoring system, "InFlow receives 50k flows per second from over 100 different network devices on the LinkedIn backbone and edge devices," and states, "InFlow requires storage of tens of TBs of data with a retention of 30 days." The article concludes, "Following the successful onboarding of flow data to a real-time table on Pinot, freshness of data improved from 15 mins to 1 minute and query latencies were reduced by as much as 95%."
The sFlow-RT real-time analytics engine provides a faster, simpler, more scalable alternative for flow monitoring. sFlow-RT radically simplifies the measurement pipeline, combining flow collection, enrichment, and analytics in a single programmable stage. Removing pipeline stages improves data freshness — flow measurements represent an up to the second view of traffic flowing through the monitored network devices. The improvement from minute to sub-second data freshness enhances automation use cases such as automated DDoS mitigation, load balancing, and auto-scaling (see Delay and stability).
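The pattern is a single script that combines measurement, analysis, and action. The sketch below shows the skeleton using the setFlow(), setThreshold(), and setEventHandler() calls described in the DDoS detection article below; the flow name, threshold value, and action are illustrative placeholders rather than a recommended configuration.

// Skeleton sketch: define a flow metric, threshold it, and act on events
// with sub-second latency. Name, threshold, and action are placeholders.
setFlow('elephant', { keys:'ipsource,ipdestination', value:'bytes' });
setThreshold('elephant', { metric:'elephant', value:125000000, byFlow:true, timeout:2 });
setEventHandler(function(event) {
  // A production script would trigger mitigation, load balancing, or
  // auto-scaling here; this sketch just logs the flow.
  logInfo('large flow ' + event.flowKey);
}, ['elephant']);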

An essential step in improving data freshness is eliminating IPFIX and migrating to an sFlow only network monitoring system. Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX describes how the on-device flow cache component of IPFIX/NetFlow measurements adds an additional stage (and additional latency, anywhere from 1 to 30 minutes) to the flow analytics pipeline. On the other hand, sFlow is stateless, eliminates the flow cache, and guarantees sub-second freshness. Moving from IPFIX/NetFlow is straightforward as most of the leading router vendors now include sFlow support, see Real-time flow telemetry for routers. In addition, sFlow is widely supported in switches, making it possible to efficiently monitor every device in the data center.

Removing pipeline stages increases scalability and reduces the cost of the solution. A single instance of sFlow-RT can monitor tens of thousands of network devices and over a million flows per second, replacing large numbers of scale-out pipeline instances. Removing the need for distributed message queues improves resilience by decoupling the flow monitoring system from the network so that it can reliably deliver the visibility needed to manage network congestion "meltdown" events — see Who monitors the monitoring systems?

Introducing a programmable flow analytics stage before storing the data significantly reduces storage requirements (and improves query latency) since only metrics of interest will be computed and stored (see Flow metrics with Prometheus and Grafana).

The LinkedIn paper mentions that they are developing an eBPF Skyfall agent for monitoring hosts. The open source Host sFlow agent extends sFlow to hosts and can be deployed at scale today, see Real-time Kubernetes cluster monitoring example.

docker run -p 8008:8008 -p 6343:6343/udp sflow/prometheus
Trying out sFlow-RT is easy. For example, run the above command to start the analyzer using the pre-built sflow/prometheus Docker image. Configure sFlow agents to stream telemetry to the analyzer and access the web interface on port 8008 (see Getting Started). The default settings should work for most small to moderately sized networks. See Tuning Performance for tips on optimizing performance for larger sites.

Wednesday, August 31, 2022

DDoS Sonification

Sonification presents data as sounds instead of visual charts. One of the best known examples of sonification is the representation of radiation level as a click rate in a Geiger counter. This article describes ddos-sonify, an experiment to see if sound can be usefully employed to represent information about Distributed Denial of Service (DDoS) attacks. The DDoS attacks and BGP Flowspec responses testbed was used to create the video demonstration at the top of this page in which a series of simulated DDoS attacks are detected and mitigated. Play the video to hear the results.

The software uses the Tone.js library to control Web Audio sound generation functionality in a web browser.

var voices = {};
var loop;
var loopInterval = '4n';
$('#sonify').click(function() {
  if($(this).prop("checked")) {
    voices.synth = new Tone.PolySynth(Tone.Synth).toDestination();
    voices.metal = new Tone.PolySynth(Tone.MetalSynth).toDestination();
    voices.pluck = new Tone.PolySynth(Tone.PluckSynth).toDestination();
    voices.membrane = new Tone.PolySynth(Tone.MembraneSynth).toDestination();
    voices.am = new Tone.PolySynth(Tone.AMSynth).toDestination();
    voices.fm = new Tone.PolySynth(Tone.FMSynth).toDestination();
    voices.duo = new Tone.PolySynth(Tone.DuoSynth).toDestination();
    Tone.Transport.bpm.value=80;
    loop = new Tone.Loop((now) => {
      sonify(now);
    },loopInterval).start(0);
    Tone.Transport.start();
  } else {
    loop.stop();
    loop.dispose();
    Tone.Transport.stop();
  }
});
Clicking on the Convert charts to sound checkbox on the web page initializes the different sound synthesizers that will be used to create sounds and starts a timed loop that periodically calls the sonify() function to convert the current values of each of the metrics into sounds.
var metrics = [
  {name:'top-5-ip-flood', threshold:'threshold_ip_flood', voice:'synth'},
  {name:'top-5-ip-fragmentation', threshold:'threshold_ip_fragmentation', voice:'duo'},
  {name:'top-5-icmp-flood', threshold:'threshold_icmp_flood', voice:'pluck'},
  {name:'top-5-udp-flood', threshold:'threshold_udp_flood', voice:'membrane'},
  {name:'top-5-udp-amplification', threshold:'threshold_udp_amplification', voice:'metal'},
  {name:'top-5-tcp-flood', threshold:'threshold_tcp_flood', voice:'am'},
  {name:'top-5-tcp-amplification', threshold:'threshold_tcp_amplification', voice:'fm'}
];
var notes = ['C4','D4','E4','F4','G4','A4','B4','C5'];
function sonify(now) {
  var sounds = {};
  var max = {};
  metrics.forEach(function(metric) {
    let vals = db.trend.trends[metric.name];
    let topn = vals[vals.length - 1];
    let thresh = db.trend.values[metric.threshold];
    let chord = sounds[metric.voice];
    if(!chord) {
      chord = {};
      sounds[metric.voice] = chord;
    }
    for(var key in topn) {
      let [tgt,group,port] = key.split(',');
      let note = notes[port % notes.length];
      chord[note] = Math.max(chord[note] || 0, Math.min(1,topn[key] / thresh));
      max[metric.voice] = Math.max(max[metric.voice] || 0, chord[note]);
    };
  });
  var interval = Tone.Time(loopInterval).toSeconds();
  var delay = 0;
  for(let voice in sounds) {
    let synth = voices[voice];
    let chord = sounds[voice];
    let maxval = max[voice];
    if(maxval) {
      let volume = Math.min(0,(maxval - 1) * 20);
      synth.volume.value=volume;
      let note_array = [];
      for(let note in chord) {
        let val = chord[note];
        if((val / maxval) < 0.7) continue;
        note_array.push(note);
      }
      let duration = Tone.Time(maxval*interval).quantize('64n');
      if(duration > 0) synth.triggerAttackRelease(note_array,duration,now+delay);
    }
    delay += Tone.Time('16n').toSeconds();
  }
}
The metrics array identifies individual DDoS metrics and their related thresholds and associates them with a sound (voice). The sonify() function retrieves current values of each of the metrics and scales them by their respective threshold. Each metric value is mapped to a musical note based on the TCP/UDP port used in the attack. Different attack types are mapped to different voices, for example, a udp_amplification attack will have a metallic sound while a udp_flood attack will have a percussive sound. Volume and duration of notes are proportional to the intensity of the attack.

The net effect in a production network is a quiet rhythm of instruments. When a DDoS attack occurs, the notes associated with the particular attack become much louder and drown out the background sounds. Over time it is possible to recognize the distinct sound of each type of DDoS attack.

Tuesday, August 23, 2022

NVIDIA ConnectX SmartNICs

NVIDIA ConnectX SmartNICs offer best-in-class network performance, serving low-latency, high-throughput applications with one, two, or four ports at 10, 25, 40, 50, 100, 200, and up to 400 gigabits per second (Gb/s) Ethernet speeds.

This article describes how to use the instrumentation built into ConnectX SmartNICs for data center wide network visibility. Real-time network telemetry for automation provides some background, giving an overview of the sFlow industry standard with an example of troubleshooting a high performance GPU compute cluster.

Linux as a network operating system describes how standard Linux APIs are used in NVIDIA Spectrum switches to monitor data center network performance. Linux Kernel Upstream Release Notes v5.19 describes recent driver enhancements for ConnectX SmartNICs that extend visibility to servers for end-to-end visibility into the performance of high performance distributed compute infrastructure.

The open source Host sFlow agent uses standard Linux APIs to configure instrumentation in switches and hosts, streaming the resulting measurements to analytics software in real-time for comprehensive data center wide visibility.

Packet sampling provides detailed visibility into traffic flowing across the network. Hardware packet sampling makes it possible to monitor 400 gigabits per second interfaces on the server at line rate with minimal CPU/memory overhead.
psample { group=1 egress=on }
dent { sw=off switchport=^eth[0-9]+$ }
The above Host sFlow configuration entries enable packet sampling on the host. Linux 4.11 kernel extends packet sampling support describes the Linux PSAMPLE netlink channel used by the network adapter to send packet samples to the Host sFlow agent. The dent module automatically configures hardware packet sampling on network interfaces matching the switchport pattern (eth0, eth1, .. in this example) using the Linux tc-sample API, directing the packet samples to the specified psample group.
Visibility into dropped packets offers significant benefits for network troubleshooting, providing real-time network-wide visibility into the specific packets that were dropped as well as the reason the packet was dropped. This visibility instantly reveals the root cause of drops and the impacted connections.
dropmon { group=1 start=on sw=on hw=on }
The above Host sFlow configuration entry enables hardware dropped packet monitoring on the network adapter hardware.
sflow {
  collector { ip=10.0.0.1 }
  psample { group=1 egress=on }
  dropmon { group=1 start=on sw=on hw=on }
  dent { sw=off switchport=^eth[0-9]+$ }
}
The above /etc/hsflowd.conf file shows a complete configuration. The centralized collector, 10.0.0.1, receives sFlow from all the servers and switches in the network to provide comprehensive end-to-end visibility.
The sFlow-RT analytics engine receives and analyzes the sFlow telemetry stream, providing real-time analytics to visibility and automation systems (e.g. Flow metrics with Prometheus and Grafana).

Tuesday, August 9, 2022

DDoS detection with advanced real-time flow analytics

The diagram shows two high bandwidth flows of traffic to the Customer Network, the first (shown in blue) is a bulk transfer of data to a big data application, and the second (shown in red) is a distributed denial of service (DDoS) attack in which large numbers of compromised hosts attempt to flood the link connecting the Customer Network to the upstream Transit Provider. Industry standard sFlow telemetry from the customer router streams to an instance of the sFlow-RT real-time analytics engine which is programmed to detect (and mitigate) the DDoS attack.

This article builds on the Docker testbed to demonstrate how advanced flow analytics can be used to separate the two types of traffic and detect the DDoS attack.

docker run --rm -d -e "COLLECTOR=host.docker.internal" -e "SAMPLING=100" \
--net=host -v /var/run/docker.sock:/var/run/docker.sock:ro \
--name=host-sflow sflow/host-sflow
First, start a Host sFlow agent using the pre-built sflow/host-sflow image to generate the sFlow telemetry that would stream from the switches and routers in a production deployment. 
setFlow('ddos_amplification', {
  keys:'ipdestination,udpsourceport',
  value: 'frames',
  values: ['count:ipsource']
});
setThreshold('ddos_amplification', {
  metric:'ddos_amplification',
  value: 10000,
  byFlow:true,
  timeout: 2
});
setEventHandler(function(event) {
  var [ipdestination,udpsourceport] = event.flowKey.split(',');
  var [sourcecount] = event.values;
  if(sourcecount === 1) {
    logInfo("bulk transfer to " + ipdestination);
  } else {
    logInfo("DDoS port " + udpsourceport + " against " + ipdestination);
  }
},['ddos_amplification']);
The ddos.js script above provides a simple demonstration of sFlow-RT's advanced flow analytics. The setFlow() function defines a flow signature for detecting UDP amplification attacks, identifying the targeted IP address and the amplification protocol. In addition to the primary value of frames per second, a secondary value counting the number of ipsource addresses has been included. The setThreshold() function causes an event to be generated whenever a flow exceeds 10,000 frames per second. Finally, the setEventHandler() function defines how the events will be processed. See Writing Applications for more information on developing sFlow-RT applications.
docker run --rm -v $PWD/ddos.js:/sflow-rt/ddos.js \
-p 8008:8008 -p 6343:6343/udp --name sflow-rt \
sflow/prometheus -Dscript.file=ddos.js
Start sFlow-RT using the pre-built sflow/prometheus image.
docker run --rm -it sflow/hping3 --flood --udp -k \
-p 443 host.docker.internal
In a separate window, simulate a bulk transfer using the pre-built sflow/hping3 image (use CTRL+C to stop the traffic).
2022-08-09T00:03:20Z INFO: bulk transfer to 192.168.65.2
The transfer will be immediately detected and logged in the sFlow-RT window.
docker run --rm -it sflow/hping3 --flood --udp -k \
--rand-source -s 53 host.docker.internal
Simulate a UDP amplification attack.
2022-08-09T00:05:19Z INFO: DDoS port 53 against 192.168.65.2
The attack will be immediately detected and logged in the sFlow-RT window.

The open source sFlow-RT ddos-protect application is a full featured DDoS mitigation solution that uses the advanced flow analytics features described in this article to detect a wide range of volumetric attacks. In addition, ddos-protect can automatically mitigate attacks using BGP remotely triggered blackhole (RTBH) or BGP Flowspec actions. DDoS protection quickstart guide describes how to test, deploy, and monitor the DDoS mitigation solution with examples using Arista, Cisco, and Juniper routers.