Thursday, September 15, 2022

Low latency flow analytics


Real-time analytics on network flow data with Apache Pinot describes LinkedIn's flow ingestion and analytics pipeline for sFlow and IPFIX exports from network devices. The solution uses Apache Kafka message queues to connect LinkedIn's InFlow flow analyzer with the Apache Pinot datastore to support low latency queries. The article describes the scale of the monitoring system: InFlow receives 50k flows per second from over 100 different network devices on the LinkedIn backbone and edge, and requires storage of tens of TBs of data with a retention of 30 days. The article concludes, "Following the successful onboarding of flow data to a real-time table on Pinot, freshness of data improved from 15 mins to 1 minute and query latencies were reduced by as much as 95%."
The sFlow-RT real-time analytics engine provides a faster, simpler, more scalable alternative for flow monitoring. sFlow-RT radically simplifies the measurement pipeline, combining flow collection, enrichment, and analytics in a single programmable stage. Removing pipeline stages improves data freshness — flow measurements represent an up to the second view of traffic flowing through the monitored network devices. The improvement from minute to sub-second data freshness enhances automation use cases such as automated DDoS mitigation, load balancing, and auto-scaling (see Delay and stability).

An essential step in improving data freshness is eliminating IPFIX and migrating to an sFlow-only network monitoring system. Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX describes how the on-device flow cache component of IPFIX/NetFlow measurements adds an additional stage (and additional latency, anywhere from 1 to 30 minutes) to the flow analytics pipeline. sFlow, on the other hand, is stateless, eliminates the flow cache, and guarantees sub-second freshness. Moving from IPFIX/NetFlow is straightforward since most of the leading router vendors now include sFlow support, see Real-time flow telemetry for routers. In addition, sFlow is widely supported in switches, making it possible to efficiently monitor every device in the data center.

Removing pipeline stages increases scalability and reduces the cost of the solution. A single instance of sFlow-RT can monitor tens of thousands of network devices and over a million flows per second, replacing large numbers of scale-out pipeline instances. Removing the need for distributed message queues improves resilience by decoupling the flow monitoring system from the network so that it can reliably deliver the visibility needed to manage network congestion "meltdown" events — see Who monitors the monitoring systems?

Introducing a programmable flow analytics stage before storing the data significantly reduces storage requirements (and improves query latency) since only metrics of interest will be computed and stored (see Flow metrics with Prometheus and Grafana).
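For example, a short sFlow-RT script can reduce the raw packet samples to a small set of traffic metrics that are cheap to export and store. The sketch below is illustrative only (the flow name and keys are arbitrary examples, not taken from the LinkedIn pipeline) and uses the same setFlow() API shown later in this blog:

setFlow('site_traffic', {
  keys: 'ipsource,ipdestination',
  value: 'bytes'
});

Only the resulting metric values need to be retained in a time series database, rather than every enriched flow record.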

The LinkedIn paper mentions that they are developing an eBPF Skyfall agent for monitoring hosts. The open source Host sFlow agent extends sFlow to hosts and can be deployed at scale today, see Real-time Kubernetes cluster monitoring example.

docker run -p 8008:8008 -p 6343:6343/udp sflow/prometheus
Trying out sFlow-RT is easy. For example, run the above command to start the analyzer using the pre-built sflow/prometheus Docker image. Configure sFlow agents to stream telemetry to the analyzer and access the results through the web interface on port 8008 (see Getting Started). The default settings should work for most small to moderately sized networks. See Tuning Performance for tips on optimizing performance for larger sites.
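For hosts running the open source Host sFlow agent, pointing telemetry at the analyzer is a one line change. The /etc/hsflowd.conf sketch below is a minimal example that assumes the analyzer is reachable at 10.0.0.1; substitute the address of your sFlow-RT instance:

sflow {
  collector { ip=10.0.0.1 }
}

Switch and router agents are configured with the same collector address through the vendor CLI.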

Wednesday, August 31, 2022

DDoS Sonification

Sonification presents data as sounds instead of visual charts. One of the best known examples of sonification is the representation of radiation level as a click rate in a Geiger counter. This article describes ddos-sonify, an experiment to see if sound can be usefully employed to represent information about Distributed Denial of Service (DDoS) attacks. The DDoS attacks and BGP Flowspec responses testbed was used to create the video demonstration at the top of this page in which a series of simulated DDoS attacks are detected and mitigated. Play the video to hear the results.

The software uses the Tone.js library to control Web Audio sound generation functionality in a web browser.

var voices = {};
var loop;
var loopInterval = '4n';
$('#sonify').click(function() {
  if($(this).prop("checked")) {
    voices.synth = new Tone.PolySynth(Tone.Synth).toDestination();
    voices.metal = new Tone.PolySynth(Tone.MetalSynth).toDestination();
    voices.pluck = new Tone.PolySynth(Tone.PluckSynth).toDestination();
    voices.membrane = new Tone.PolySynth(Tone.MembraneSynth).toDestination();
    voices.am = new Tone.PolySynth(Tone.AMSynth).toDestination();
    voices.fm = new Tone.PolySynth(Tone.FMSynth).toDestination();
    voices.duo = new Tone.PolySynth(Tone.DuoSynth).toDestination();
    Tone.Transport.bpm.value=80;
    loop = new Tone.Loop((now) => {
      sonify(now);
    },loopInterval).start(0);
    Tone.Transport.start();
  } else {
    loop.stop();
    loop.dispose();
    Tone.Transport.stop();
  }
});
Clicking on the Convert charts to sound checkbox on the web page initializes the different sound synthesizers that will be used to create sounds and starts a timed loop that periodically calls the sonify() function to convert the current values of each of the metrics into sounds.
var metrics = [
  {name:'top-5-ip-flood', threshold:'threshold_ip_flood', voice:'synth'},
  {name:'top-5-ip-fragmentation', threshold:'threshold_ip_fragmentation', voice:'duo'},
  {name:'top-5-icmp-flood', threshold:'threshold_icmp_flood', voice:'pluck'},
  {name:'top-5-udp-flood', threshold:'threshold_udp_flood', voice:'membrane'},
  {name:'top-5-udp-amplification', threshold:'threshold_udp_amplification', voice:'metal'},
  {name:'top-5-tcp-flood', threshold:'threshold_tcp_flood', voice:'am'},
  {name:'top-5-tcp-amplification', threshold:'threshold_tcp_amplification', voice:'fm'}
];
var notes = ['C4','D4','E4','F4','G4','A4','B4','C5'];
function sonify(now) {
  var sounds = {};
  var max = {};
  metrics.forEach(function(metric) {
    let vals = db.trend.trends[metric.name];
    let topn = vals[vals.length - 1];
    let thresh = db.trend.values[metric.threshold];
    let chord = sounds[metric.voice];
    if(!chord) {
      chord = {};
      sounds[metric.voice] = chord;
    }
    for(var key in topn) {
      let [tgt,group,port] = key.split(',');
      let note = notes[port % notes.length];
      chord[note] = Math.max(chord[note] || 0, Math.min(1,topn[key] / thresh));
      max[metric.voice] = Math.max(max[metric.voice] || 0, chord[note]);
    };
  });
  var interval = Tone.Time(loopInterval).toSeconds();
  var delay = 0;
  for(let voice in sounds) {
    let synth = voices[voice];
    let chord = sounds[voice];
    let maxval = max[voice];
    if(maxval) {
      let volume = Math.min(0,(maxval - 1) * 20);
      synth.volume.value=volume;
      let note_array = [];
      for(let note in chord) {
        let val = chord[note];
        if((val / maxval) < 0.7) continue;
        note_array.push(note);
      }
      let duration = Tone.Time(maxval*interval).quantize('64n');
      if(duration > 0) synth.triggerAttackRelease(note_array,duration,now+delay);
    }
    delay += Tone.Time('16n').toSeconds();
  }
}
The metrics array identifies individual DDoS metrics and their related thresholds and associates them with a sound (voice). The sonify() function retrieves current values of each of the metrics and scales them by their respective threshold. Each metric value is mapped to a musical note based on the TCP/UDP port used in the attack. Different attack types are mapped to different voices, for example, a udp_amplification attack will have a metallic sound while a udp_flood attack will have a percussive sound. Volume and duration of notes are proportional to the intensity of the attack.

The net effect in a production network is a quiet rhythm of instruments. When a DDoS attack occurs, the notes associated with the particular attack become much louder and drown out the background sounds. Over time it is possible to recognize the distinct sound of each type of DDoS attack.

Tuesday, August 23, 2022

NVIDIA ConnectX SmartNICs

NVIDIA ConnectX SmartNICs offer best-in-class network performance, serving low-latency, high-throughput applications with one, two, or four ports at 10, 25, 40, 50, 100, 200, and up to 400 gigabits per second (Gb/s) Ethernet speeds.

This article describes how to use the instrumentation built into ConnectX SmartNICs for data center wide network visibility. Real-time network telemetry for automation provides some background, giving an overview of the sFlow industry standard with an example of troubleshooting a high performance GPU compute cluster.

Linux as a network operating system describes how standard Linux APIs are used in NVIDIA Spectrum switches to monitor data center network performance. Linux Kernel Upstream Release Notes v5.19 describes recent driver enhancements for ConnectX SmartNICs that extend visibility to servers for end-to-end visibility into the performance of high performance distributed compute infrastructure.

The open source Host sFlow agent uses standard Linux APIs to configure instrumentation in switches and hosts, streaming the resulting measurements to analytics software in real-time for comprehensive data center wide visibility.

Packet sampling provides detailed visibility into traffic flowing across the network. Hardware packet sampling makes it possible to monitor 400 gigabits per second interfaces on the server at line rate with minimal CPU/memory overhead.
psample { group=1 egress=on }
dent { sw=off switchport=^eth[0-9]+$ }
The above Host sFlow configuration entries enable packet sampling on the host. Linux 4.11 kernel extends packet sampling support describes the Linux PSAMPLE netlink channel used by the network adapter to send packet samples to the Host sFlow agent. The dent module automatically configures hardware packet sampling on network interfaces matching the switchport pattern (eth0, eth1, ... in this example) using the Linux tc-sample API, directing the packet samples to be sent to the specified psample group.
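The rule programmed by the dent module is roughly equivalent to the manually entered tc command shown below. This is an illustration for a single interface; the 1-in-10000 sampling rate is an arbitrary example and skip_sw requests that sampling be performed in the adapter hardware:

tc filter add dev eth0 ingress matchall skip_sw action sample rate 10000 group 1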
Visibility into dropped packets offers significant benefits for network troubleshooting, providing real-time network-wide visibility into the specific packets that were dropped as well as the reason each packet was dropped. This visibility instantly reveals the root cause of drops and the impacted connections.
dropmon { group=1 start=on sw=on hw=on }
The above Host sFlow configuration entry enables hardware dropped packet monitoring on the network adapter hardware.
sflow {
  collector { ip=10.0.0.1 }
  psample { group=1 egress=on }
  dropmon { group=1 start=on sw=on hw=on }
  dent { sw=off switchport=^eth[0-9]+$ }
}
The above /etc/hsflowd.conf file shows a complete configuration. The centralized collector, 10.0.0.1, receives sFlow from all the servers and switches in the network to provide comprehensive end-to-end visibility.
The sFlow-RT analytics engine receives and analyzes the sFlow telemetry stream, providing real-time analytics to visibility and automation systems (e.g. Flow metrics with Prometheus and Grafana).
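For example, a single REST query can summarize interface health across every monitored device. The query below is a sketch that assumes sFlow-RT's default REST port (8008) and standard interface counter metric names:

curl http://10.0.0.1:8008/metric/ALL/max:ifinutilization,max:ifindiscards,max:ifinerrors/json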

Tuesday, August 9, 2022

DDoS detection with advanced real-time flow analytics

The diagram shows two high bandwidth flows of traffic to the Customer Network, the first (shown in blue) is a bulk transfer of data to a big data application, and the second (shown in red) is a distributed denial of service (DDoS) attack in which large numbers of compromised hosts attempt to flood the link connecting the Customer Network to the upstream Transit Provider. Industry standard sFlow telemetry from the customer router streams to an instance of the sFlow-RT real-time analytics engine which is programmed to detect (and mitigate) the DDoS attack.

This article builds on the Docker testbed to demonstrate how advanced flow analytics can be used to separate the two types of traffic and detect the DDoS attack.

docker run --rm -d -e "COLLECTOR=host.docker.internal" -e "SAMPLING=100" \
--net=host -v /var/run/docker.sock:/var/run/docker.sock:ro \
--name=host-sflow sflow/host-sflow
First, start a Host sFlow agent using the pre-built sflow/host-sflow image to generate the sFlow telemetry that would stream from the switches and routers in a production deployment. 
setFlow('ddos_amplification', {
  keys:'ipdestination,udpsourceport',
  value: 'frames',
  values: ['count:ipsource']
});
setThreshold('ddos_amplification', {
  metric:'ddos_amplification',
  value: 10000,
  byFlow:true,
  timeout: 2
});
setEventHandler(function(event) {
  var [ipdestination,udpsourceport] = event.flowKey.split(',');
  var [sourcecount] = event.values;
  if(sourcecount === 1) {
    logInfo("bulk transfer to " + ipdestination);
  } else {
    logInfo("DDoS port " + udpsourceport + " against " + ipdestination);
  }
},['ddos_amplification']);
The ddos.js script above provides a simple demonstration of sFlow-RT's advanced flow analytics. The setFlow() function defines a flow signature for detecting UDP amplification attacks, identifying the targeted IP address and the amplification protocol. In addition to the primary value of frames per second, a secondary value counting the number of ipsource addresses has been included. The setThreshold() function causes an event to be generated whenever a flow exceeds 10,000 frames per second. Finally, the setEventHandler() function defines how the events will be processed. See Writing Applications for more information on developing sFlow-RT applications.
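The same flow definition can also be pushed to a running instance using sFlow-RT's REST API instead of being loaded from a script. The curl command below is a sketch of the equivalent REST call, assuming sFlow-RT is listening on localhost port 8008:

curl -X PUT -H "Content-Type: application/json" \
-d '{"keys":"ipdestination,udpsourceport","value":"frames","values":["count:ipsource"]}' \
http://localhost:8008/flow/ddos_amplification/json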
docker run --rm -v $PWD/ddos.js:/sflow-rt/ddos.js \
-p 8008:8008 -p 6343:6343/udp --name sflow-rt \
sflow/prometheus -Dscript.file=ddos.js
Start sFlow-RT using the pre-built sflow/prometheus image.
docker run --rm -it sflow/hping3 --flood --udp -k \
-p 443 host.docker.internal
In a separate window, simulate a bulk transfer using the pre-built sflow/hping3 image (use CTRL+C to stop the traffic).
2022-08-09T00:03:20Z INFO: bulk transfer to 192.168.65.2
The transfer will be immediately detected and logged in the sFlow-RT window.
docker run --rm -it sflow/hping3 --flood --udp -k \
--rand-source -s 53 host.docker.internal
Simulate a UDP amplification attack.
2022-08-09T00:05:19Z INFO: DDoS port 53 against 192.168.65.2
The attack will be immediately detected and logged in the sFlow-RT window.

The open source sFlow-RT ddos-protect application is a full featured DDoS mitigation solution that uses the advanced flow analytics features described in this article to detect a wide range of volumetric attacks. In addition, ddos-protect can automatically mitigate attacks using BGP remotely triggered blackhole (RTBH) or BGP Flowspec actions. DDoS protection quickstart guide describes how to test, deploy, and monitor the DDoS mitigation solution with examples using Arista, Cisco, and Juniper routers.

Friday, July 1, 2022

SR Linux in Containerlab

This article uses Containerlab to emulate a simple network and experiment with Nokia SR Linux and sFlow telemetry. Containerlab provides a convenient method of emulating network topologies and configurations before deploying into production on physical switches.

curl -O https://raw.githubusercontent.com/sflow-rt/containerlab/master/srlinux.yml

Download the Containerlab topology file.

containerlab deploy -t srlinux.yml

Deploy the topology.

docker exec -it clab-srlinux-h1 traceroute 172.16.2.2

Run traceroute on h1 to verify path to h2.

traceroute to 172.16.2.2 (172.16.2.2), 30 hops max, 46 byte packets
 1  172.16.1.1 (172.16.1.1)  2.234 ms  *  1.673 ms
 2  172.16.2.2 (172.16.2.2)  0.944 ms  0.253 ms  0.152 ms

Results show path to h2 (172.16.2.2) via router interface (172.16.1.1).

docker exec -it clab-srlinux-switch sr_cli

Access SR Linux command line on switch.

Using configuration file(s): []
Welcome to the srlinux CLI.
Type 'help' (and press <ENTER>) if you need any help using this.
--{ + running }--[  ]--                                                                                                                                                              
A:switch#

SR Linux CLI describes how to use the interface.

A:switch# show system sflow status

Get status of sFlow telemetry.

-------------------------------------------------------------------------
Admin State            : enable
Sample Rate            : 10
Sample Size            : 256
Total Samples          : <Unknown>
Total Collector Packets: 0
-------------------------------------------------------------------------
  collector-id     : 1
  collector-address: 172.100.100.5
  network-instance : default
  source-address   : 172.100.100.6
  port             : 6343
  next-hop         : <Unresolved>
-------------------------------------------------------------------------
-------------------------------------------------------------------------
--{ + running }--[  ]--

The output shows settings and operational status for sFlow configured on the switch.

Connect to the sFlow-RT Metric Browser application, http://localhost:8008/app/browse-metrics/html/index.html?metric=ifinoctets.

docker exec -it clab-srlinux-h1 iperf3 -c 172.16.2.2

Run an iperf3 throughput test between h1 and h2.

Connecting to host 172.16.2.2, port 5201
[  5] local 172.16.1.2 port 38522 connected to 172.16.2.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   675 KBytes  5.52 Mbits/sec   54   22.6 KBytes       
[  5]   1.00-2.00   sec   192 KBytes  1.57 Mbits/sec   18   12.7 KBytes       
[  5]   2.00-3.00   sec   255 KBytes  2.09 Mbits/sec   16   1.41 KBytes       
[  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    9   1.41 KBytes       
[  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec   18   1.41 KBytes       
[  5]   5.00-6.00   sec   255 KBytes  2.08 Mbits/sec   17   1.41 KBytes       
[  5]   6.00-7.00   sec   191 KBytes  1.56 Mbits/sec   10   8.48 KBytes       
[  5]   7.00-8.00   sec   191 KBytes  1.56 Mbits/sec    7   7.07 KBytes       
[  5]   8.00-9.00   sec   191 KBytes  1.56 Mbits/sec    7   8.48 KBytes       
[  5]   9.00-10.00  sec   191 KBytes  1.56 Mbits/sec   10   7.07 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.09 MBytes  1.75 Mbits/sec  166             sender
[  5]   0.00-10.27  sec  1.59 MBytes  1.30 Mbits/sec                  receiver

iperf Done.

The iperf3 test results show that the SR Linux dataplane implementation in the Docker container has severely limited forwarding performance, making the image unsuitable for emulating network traffic. In addition, the emulated forwarding plane does not support the packet sampling mechanism available on hardware switches that allows sFlow to provide real-time visibility into traffic flowing through the switch.

For readers interested in performance monitoring, the https://github.com/sflow-rt/containerlab repository provides examples showing performance monitoring of data center leaf / spine fabrics, EVPN visibility, and DDoS mitigation using Containerlab. These examples use Linux as a network operating system. In this case, the containers running on Containerlab use the Linux dataplane for maximum performance. On physical switch hardware the Linux kernel can offload dataplane forwarding to the switch ASIC for line rate performance.

containerlab destroy -t srlinux.yml

When you are finished, run the above command to stop the containers and free the resources associated with the emulation.

Thursday, June 2, 2022

Using Ixia-c to test RTBH DDoS mitigation

Remote Triggered Black Hole Scenario describes how to use the Ixia-c traffic generator to simulate a DDoS flood attack. Ixia-c supports the Open Traffic Generator API that is used in the article to program two traffic flows: the first representing normal user traffic (shown in blue) and the second representing attack traffic (shown in red).

The article goes on to demonstrate the use of remotely triggered black hole (RTBH) routing to automatically mitigate the simulated attack. The chart above shows traffic levels during two simulated attacks. The DDoS mitigation controller is disabled during the first attack. Enabling the controller for the second attack causes the attack traffic to be dropped the instant it crosses the threshold.

The diagram shows the Containerlab topology used in the Remote Triggered Black Hole Scenario lab (which can run on a laptop). The Ixia traffic generator's eth1 interface represents the Internet and its eth2 interface represents the Customer Network being attacked. Industry standard sFlow telemetry from the Customer router, ce-router, streams to the DDoS mitigation controller (running an instance of DDoS Protect). When the controller detects a denial of service attack it pushes a control via BGP to the ce-router, which in turn pushes the control upstream to the service provider router, pe-router, which drops the attack traffic to prevent flooding of the ISP Circuit that would otherwise disrupt access to the Customer Network.

Arista, Cisco, and Juniper have added sFlow support to their BGP routers, see Real-time flow telemetry for routers, making it straightforward to take this solution from the lab to production. Support for Open Traffic Generator API across a range of platforms makes it possible to develop automated tests in the lab environment and apply them to production hardware.

Tuesday, April 26, 2022

BGP Remotely Triggered Blackhole (RTBH)

DDoS attacks and BGP Flowspec responses describes how to simulate and mitigate common DDoS attacks. This article builds on the previous examples to show how BGP Remotely Triggered Blackhole (RTBH) controls can be applied in situations where BGP Flowspec is not available, or is unsuitable as a mitigation response.
docker run --rm -it --privileged --network host --pid="host" \
  -v /var/run/docker.sock:/var/run/docker.sock -v /run/netns:/run/netns \
  -v ~/clab:/home/clab -w /home/clab \
  ghcr.io/srl-labs/clab bash
Start Containerlab.
curl -O https://raw.githubusercontent.com/sflow-rt/containerlab/master/ddos.yml
Download the Containerlab topology file.
sed -i "s/\\.ip_flood\\.action=filter/\\.ip_flood\\.action=drop/g" ddos.yml
Change mitigation policy for IP Flood attacks from Flowspec filter to RTBH.
containerlab deploy -t ddos.yml
Deploy the topology.
Access the DDoS Protect screen at http://localhost:8008/app/ddos-protect/html/
docker exec -it clab-ddos-attacker hping3 \
--flood --rawip -H 47 192.0.2.129
Launch an IP Flood attack. The DDoS Protect dashboard shows that as soon as the ip_flood attack traffic reaches the threshold a control is implemented and the attack traffic is immediately dropped. The entire process between the attack being launched, detected, and mitigated happens within a second, ensuring minimal impact on network capacity and services.
docker exec -it clab-ddos-sp-router vtysh -c "show running-config"
See sp-router configuration.
Building configuration...

Current configuration:
!
frr version 8.2.2_git
frr defaults datacenter
hostname sp-router
no ipv6 forwarding
log stdout
!
ip route 203.0.113.2/32 Null0
!
interface eth2
 ip address 198.51.100.1/24
exit
!
router bgp 64496
 bgp bestpath as-path multipath-relax
 bgp bestpath compare-routerid
 neighbor fabric peer-group
 neighbor fabric remote-as external
 neighbor fabric description Internal Fabric Network
 neighbor fabric ebgp-multihop 255
 neighbor fabric capability extended-nexthop
 neighbor eth1 interface peer-group fabric
 no neighbor eth1 capability extended-nexthop
 !
 address-family ipv4 unicast
  redistribute connected route-map HOST_ROUTES
  neighbor fabric route-map RTBH in
 exit-address-family
 !
 address-family ipv4 flowspec
  neighbor fabric activate
 exit-address-family
exit
!
bgp community-list standard BLACKHOLE seq 5 permit blackhole
!
route-map HOST_ROUTES permit 10
 match interface eth2
exit
!
route-map RTBH permit 10
 match community BLACKHOLE
 set ip next-hop 203.0.113.2
exit
!
route-map RTBH permit 20
exit
!
ip nht resolve-via-default
!
end
The configuration creates a null route for 203.0.113.2/32 and rewrites the next-hop address to 203.0.113.2 for routes that are marked with the BGP blackhole community.
docker exec -it clab-ddos-sp-router vtysh -c "show ip route"
Show the forwarding state on sp-router.
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [0/0] via 172.100.100.1, eth0, 12:36:08
C>* 172.100.100.0/24 is directly connected, eth0, 12:36:08
B>* 192.0.2.0/24 [20/0] via fe80::a8c1:abff:fe32:b21e, eth1, weight 1, 12:36:03
B>  192.0.2.129/32 [20/0] via 203.0.113.2 (recursive), weight 1, 00:00:04
  *                         unreachable (blackhole), weight 1, 00:00:04
C>* 198.51.100.0/24 is directly connected, eth2, 12:36:08
S>* 203.0.113.2/32 [1/0] unreachable (blackhole), weight 1, 12:36:08
Traffic to the victim IP address, 192.0.2.129, is directed to the 203.0.113.2 next-hop, where it is discarded before it can saturate the link to the customer router, ce-router.

Monday, April 4, 2022

Real-time flow telemetry for routers

The last few years have seen leading router vendors add support for sFlow monitoring technology that has long been the industry standard for switch monitoring. Router implementations of sFlow include:
  • Arista 7020R Series Routers, 7280R Series Routers, 7500R Series Routers, 7800R3 Series Routers
  • Cisco 8000 Series Routers, ASR 9000 Series Routers, NCS 5500 Series Routers
  • Juniper ACX Series Routers, MX Series Routers, PTX Series Routers
  • Huawei NetEngine 8000 Series Routers
Broad support of sFlow in both switching and routing platforms ensures comprehensive end-to-end monitoring of traffic, see sFlow.org Network Equipment for a list of vendors and products.
Note: Most routers also support Cisco Netflow/IPFIX. Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX describes why you should choose sFlow if you are interested in real-time monitoring and control applications.
DDoS mitigation is a popular use case for sFlow telemetry in routers. The combination of sFlow for real-time DDoS detection with BGP RTBH / Flowspec mitigation on routing platforms makes for a compelling solution.
DDoS protection quickstart guide describes how to deploy sFlow along with BGP RTBH/Flowspec to automatically detect and mitigate DDoS flood attacks. The use of sFlow provides sub-second visibility into network traffic during the periods of high packet loss experienced during a DDoS attack. The result is a system that can reliably detect and respond to attacks in real-time.

The following articles provide detailed configuration examples:
  • Real-time DDoS mitigation using BGP RTBH and FlowSpec (Arista)
  • DDoS Mitigation with Cisco, sFlow, and BGP Flowspec
  • DDoS Mitigation with Juniper, sFlow, and BGP Flowspec

The examples above make use of the sFlow-RT real-time analytics platform, which provides real-time visibility to drive Software Defined Networking (SDN), DevOps and Orchestration tasks.

Tuesday, March 22, 2022

DDoS attacks and BGP Flowspec responses

This article describes how to use the Containerlab DDoS testbed to simulate a variety of flood attacks and observe the automated mitigation actions designed to eliminate the attack traffic.

docker run --rm -it --privileged --network host --pid="host" \
  -v /var/run/docker.sock:/var/run/docker.sock -v /run/netns:/run/netns \
  -v ~/clab:/home/clab -w /home/clab \
  ghcr.io/srl-labs/clab bash
Start Containerlab.
curl -O https://raw.githubusercontent.com/sflow-rt/containerlab/master/ddos.yml
Download the Containerlab topology file.
containerlab deploy -t ddos.yml
Deploy the topology and access the DDoS Protect screen at http://localhost:8008/app/ddos-protect/html/
docker exec -it clab-ddos-sp-router vtysh -c "show bgp ipv4 flowspec detail"

At any time, run the command above to see the BGP Flowspec rules installed on the sp-router. Simulate the volumetric attacks using hping3.

Note: While the hping3 --rand-source option to generate packets with random source addresses would create a more authentic DDoS attack simulation, the option is not used in these examples because the victim's responses to the attack packets (ICMP Port Unreachable) would be sent back to the random addresses and may leak out of the Containerlab test network. Instead, varying source / destination ports are used to create entropy in the attacks.

When you are finished trying the examples below, run the following command to stop the containers and free the resources associated with the emulation.

containerlab destroy -t ddos.yml

Moving the DDoS mitigation solution from Containerlab to production is straightforward since sFlow and BGP Flowspec are widely available in routing platforms. The articles Real-time DDoS mitigation using BGP RTBH and FlowSpec, DDoS Mitigation with Cisco, sFlow, and BGP Flowspec, and DDoS Mitigation with Juniper, sFlow, and BGP Flowspec provide configuration examples for Arista, Cisco, and Juniper routers respectively.

IP Flood

IP packets by target address and protocol. This is a catch-all signature that will match any volumetric attack against a protected address. Setting a high threshold for the generic flood signature allows the other, more targeted, signatures to trigger first and provide a more nuanced response.
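In a standalone deployment of the DDoS Protect application, signature thresholds are adjusted through application settings passed to sFlow-RT. The command below is a sketch only: the ddos_protect.ip_flood.threshold property name is an assumption based on the application's settings naming convention and should be verified against the DDoS Protect documentation:

docker run --rm -p 8008:8008 -p 6343:6343/udp \
sflow/ddos-protect -Dddos_protect.ip_flood.threshold=1000000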

docker exec -it clab-ddos-attacker hping3 \
--flood --rawip -H 47 192.0.2.129
Launch simulated IP flood attack against 192.0.2.129 using IP protocol 47 (GRE).
BGP flowspec entry: (flags 0x498)
	Destination Address 192.0.2.129/32
	IP Protocol = 47 
	FS:rate 0.000000
	received for 00:00:12
	not installed in PBR

Resulting Flowspec entry in sp-router. The filter blocks all IP protocol 47 (GRE) traffic to the targeted address 192.0.2.129.

IP Fragmentation

IP fragments by target address and protocol. There should be very few fragmented packets on a well configured network and the attack is typically designed to exhaust host resources, so a low threshold can be used to quickly mitigate these attacks.

docker exec -it clab-ddos-attacker hping3 \
--flood -f  -p ++1024 192.0.2.129
Launch simulated IP fragmentation attack against 192.0.2.129.
BGP flowspec entry: (flags 0x498)
	Destination Address 192.0.2.129/32
	IP Protocol = 6 
	Packet Fragment = 2
	FS:rate 0.000000
	received for 00:00:10
	not installed in PBR

Resulting Flowspec entry in sp-router. The filter blocks fragmented packets to the targeted address 192.0.2.129.

ICMP Flood

ICMP packets by target address and type. Examples include Ping Flood and Smurf attacks. 

docker exec -it clab-ddos-attacker hping3 \
--flood --icmp -C 0 192.0.2.129
Launch simulated ICMP flood attack using ICMP type 0 (Echo Reply) packets against 192.0.2.129.
BGP flowspec entry: (flags 0x498)
	Destination Address 192.0.2.129/32
	IP Protocol = 1 
	ICMP Type = 0 
	FS:rate 0.000000
	received for 00:00:13
	not installed in PBR

Resulting Flowspec entry in sp-router. The filter blocks ICMP type 0 packets to the targeted address 192.0.2.129.

UDP Flood

UDP packets by target address and destination port. A UDP flood attack can be designed to overload the targeted service or exhaust resources on middlebox devices. It can also trigger a flood of ICMP Destination Port Unreachable responses from the targeted host to the (often spoofed) UDP packet source addresses.

docker exec -it clab-ddos-attacker hping3 \
--flood --udp -p 53 192.0.2.129
Launch simulated UDP flood attack against port 53 (DNS) on 192.0.2.129.
BGP flowspec entry: (flags 0x498)
	Destination Address 192.0.2.129/32
	IP Protocol = 17 
	Destination Port = 53 
	FS:rate 0.000000
	received for 00:00:13
	not installed in PBR

Resulting Flowspec entry in sp-router. The filter blocks UDP packets with destination port 53 to the targeted address 192.0.2.129.

UDP Amplification

UDP packets by target address and source port. Examples include DNS, NTP, SSDP, SNMP, Memcached, and CharGen reflection attacks. 

docker exec -it clab-ddos-attacker hping3 \
--flood --udp -k -s 53 -p ++1024 192.0.2.129

Launch simulated UDP amplification attack using port 53 (DNS) as an amplifier to target 192.0.2.129.

BGP flowspec entry: (flags 0x498)
	Destination Address 192.0.2.129/32
	IP Protocol = 17 
	Source Port = 53 
	FS:rate 0.000000
	received for 00:00:43
	not installed in PBR

Resulting Flowspec entry in sp-router. The filter blocks UDP packets with source port 53 to the targeted address 192.0.2.129.

TCP Flood

TCP packets by target address and destination port. A TCP flood attack can also trigger a flood of ICMP Destination Port Unreachable responses from the targeted host to the (often spoofed) TCP packet source addresses.

This signature does not look at TCP flags, for example to identify SYN flood attacks, since Flowspec filters are stateless and filtering all packets with the SYN flag set would effectively block all connections to the target host. However, this control can help mitigate large volume TCP flood attacks (through the use of a limit or redirect Flowspec action) so that the traffic doesn't overwhelm layer 4 mitigation running on load balancers or hosts, using SYN cookies for example.

docker exec -it clab-ddos-attacker hping3 \
--flood -p 80 192.0.2.129

Launch simulated TCP flood attack against port 80 (HTTP) on 192.0.2.129.

BGP flowspec entry: (flags 0x498)
	Destination Address 192.0.2.129/32
	IP Protocol = 6 
	Destination Port = 80 
	FS:rate 0.000000
	received for 00:00:17
	not installed in PBR

Resulting Flowspec entry in sp-router. The filter blocks TCP packets with destination port 80 to the targeted address 192.0.2.129.

TCP Amplification

TCP SYN-ACK packets by target address and source port. In this case filtering on the TCP flags is very useful, effectively blocking the reflection attack while allowing connections to the target host. Recent examples target vulnerable middlebox devices to amplify the TCP reflection attack.

docker exec -it clab-ddos-attacker hping3 \
--flood -k -s 80 -p ++1024 -SA 192.0.2.129
Launch simulated TCP amplification attack using port 80 (HTTP) as an amplifier to target 192.0.2.129.
BGP flowspec entry: (flags 0x498)
	Destination Address 192.0.2.129/32
	IP Protocol = 6 
	Source Port = 80 
	TCP Flags = 18
	FS:rate 0.000000
	received for 00:00:11
	not installed in PBR

Resulting Flowspec entry in sp-router. The filter blocks TCP packets with source port 80 to the targeted address 192.0.2.129.

Wednesday, March 16, 2022

Containerlab DDoS testbed

Real-time telemetry from a 5 stage Clos fabric describes lightweight emulation of realistic data center switch topologies using Containerlab. This article extends the testbed to experiment with distributed denial of service (DDoS) detection and mitigation techniques described in Real-time DDoS mitigation using BGP RTBH and FlowSpec.
docker run --rm -it --privileged --network host --pid="host" \
  -v /var/run/docker.sock:/var/run/docker.sock -v /run/netns:/run/netns \
  -v ~/clab:/home/clab -w /home/clab \
  ghcr.io/srl-labs/clab bash
Start Containerlab.
curl -O https://raw.githubusercontent.com/sflow-rt/containerlab/master/ddos.yml
Download the Containerlab topology file.
containerlab deploy -t ddos.yml
Finally, deploy the topology.
Connect to the web interface, http://localhost:8008. The sFlow-RT dashboard verifies that telemetry is being received from 1 agent (the Customer Network, ce-router, in the diagram above). See the sFlow-RT Quickstart guide for more information.
Now access the DDoS Protect application at http://localhost:8008/app/ddos-protect/html/. The BGP chart at the bottom right verifies that the BGP connection has been established so that controls can be sent to the Customer Router, ce-router.
docker exec -it clab-ddos-attacker hping3 --flood --udp -k -s 53 192.0.2.129
Start a simulated DNS amplification attack using hping3.
The udp_amplification chart shows that traffic matching the attack signature has crossed the threshold. The Controls chart shows that a control blocking the attack is Active.
Clicking on the Controls tab shows a list of the active rules. In this case the target of the attack, 192.0.2.129, and the amplification protocol, port 53 (DNS), have been identified.
docker exec -it clab-ddos-sp-router vtysh -c "show bgp ipv4 flowspec detail"
The above command inspects the BGP Flowspec rules on the Service Provider router, sp-router.
BGP flowspec entry: (flags 0x498)
	Destination Address 192.0.2.129/32
	IP Protocol = 17 
	Source Port = 53 
	FS:rate 0.000000
	received for 00:01:41
	not installed in PBR

Displayed  1 flowspec entries
The output verifies that the filtering rule to block the DDoS attack has been received by the Transit Provider router, sp-router, where it can block the traffic and protect the customer network. However, the not installed in PBR message indicates that the filter hasn't been installed since the FRRouting software being used for this demonstration currently doesn't have the required functionality. Once FRRouting adds support for filtering using Linux tc flower, it will be possible to use BGP Flowspec to block attacks at line rate on commodity white box hardware, see  Linux as a network operating system.
containerlab destroy -t ddos.yml
When you are finished, run the above command to stop the containers and free the resources associated with the emulation.

Moving the DDoS mitigation solution from Containerlab to production is straightforward since sFlow and BGP Flowspec are widely available in routing platforms. The articles Real-time DDoS mitigation using BGP RTBH and FlowSpec, DDoS Mitigation with Cisco, sFlow, and BGP Flowspec, and DDoS Mitigation with Juniper, sFlow, and BGP Flowspec provide configuration examples for Arista, Cisco, and Juniper routers respectively.

Monday, March 14, 2022

Real-time EVPN fabric visibility

Real-time telemetry from a 5 stage Clos fabric describes lightweight emulation of realistic data center switch topologies using Containerlab. This article builds on the example to demonstrate visibility into Ethernet Virtual Private Network (EVPN) traffic as it crosses a routed leaf and spine fabric.
docker run --rm -it --privileged --network host --pid="host" \
  -v /var/run/docker.sock:/var/run/docker.sock -v /run/netns:/run/netns \
  -v ~/clab:/home/clab -w /home/clab \
  ghcr.io/srl-labs/clab bash
Start Containerlab.
curl -O https://raw.githubusercontent.com/sflow-rt/containerlab/master/evpn3.yml
Download the Containerlab topology file.
containerlab deploy -t evpn3.yml
Finally, deploy the topology.
docker exec -it clab-evpn3-leaf1 vtysh -c "show running-config"
See configuration of leaf1 switch.
Building configuration...

Current configuration:
!
frr version 8.1_git
frr defaults datacenter
hostname leaf1
no ipv6 forwarding
log stdout
!
router bgp 65001
 bgp bestpath as-path multipath-relax
 bgp bestpath compare-routerid
 neighbor fabric peer-group
 neighbor fabric remote-as external
 neighbor fabric description Internal Fabric Network
 neighbor fabric capability extended-nexthop
 neighbor eth1 interface peer-group fabric
 neighbor eth2 interface peer-group fabric
 !
 address-family ipv4 unicast
  network 192.168.1.1/32
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor fabric activate
  advertise-all-vni
 exit-address-family
exit
!
ip nht resolve-via-default
!
end
The loopback address on the switch, 192.168.1.1/32, is advertised to neighbors so that the VxLAN tunnel endpoint is known to switches in the fabric. The address-family l2vpn evpn setting exchanges bridge tables across BGP connections so that they operate as a single virtual bridge.
docker exec -it clab-evpn3-h1 ping -c 3 172.16.10.2
Ping h2 from h1
PING 172.16.10.2 (172.16.10.2): 56 data bytes
64 bytes from 172.16.10.2: seq=0 ttl=64 time=0.346 ms
64 bytes from 172.16.10.2: seq=1 ttl=64 time=0.466 ms
64 bytes from 172.16.10.2: seq=2 ttl=64 time=0.152 ms

--- 172.16.10.2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.152/0.321/0.466 ms
The results verify that there is layer 2 connectivity between the two hosts.
docker exec -it clab-evpn3-leaf1 vtysh -c "show evpn vni"
List the Virtual Network Identifiers (VNIs) on leaf1.
VNI        Type VxLAN IF              # MACs   # ARPs   # Remote VTEPs  Tenant VRF                           
10         L2   vxlan10               2        0        1               default
We can see one virtual network, VNI 10.
docker exec -it clab-evpn3-leaf1 vtysh -c "show evpn mac vni 10"
Show the virtual bridge table for VNI 10 on leaf1.
Number of MACs (local and remote) known for this VNI: 2
Flags: N=sync-neighs, I=local-inactive, P=peer-active, X=peer-proxy
MAC               Type   Flags Intf/Remote ES/VTEP            VLAN  Seq #'s
aa:c1:ab:25:7f:a2 remote       192.168.1.2                          0/0
aa:c1:ab:25:76:ee local        eth3                                 0/0
The MAC address, aa:c1:ab:25:76:ee, is reported as locally attached to port eth3. The MAC address, aa:c1:ab:25:7f:a2, is reported as remotely accessible through the VxLAN tunnel to 192.168.1.2, the loopback address for leaf2.
The screen capture shows a real-time view of traffic flowing across the network during an iperf3 test. Connect to the sFlow-RT Flow Browser application, http://localhost:8008/app/browse-flows/html/, or click here for a direct link to a chart with the settings shown above.

The chart shows VxLAN encapsulated Ethernet packets routed across the leaf and spine fabric. The inner and outer addresses are shown, allowing the flow to be traced end-to-end. See Defining Flows for more information.

docker exec -it clab-evpn3-h1 iperf3 -c 172.16.10.2
Each of the hosts in the network has an iperf3 server, so running the above command will test bandwidth between h1 and h2. The flow should immediately appear in the Flow Browser chart.
containerlab destroy -t evpn3.yml
When you are finished, run the above command to stop the containers and free the resources associated with the emulation.

Moving the monitoring solution from Containerlab to production is straightforward since sFlow is widely implemented in datacenter equipment from vendors including: A10, Arista, Aruba, Cisco, Edge-Core, Extreme, Huawei, Juniper, NEC, Netgear, Nokia, NVIDIA, Quanta, and ZTE.

Monday, March 7, 2022

Who monitors the monitoring systems?

Adrian Cockroft poses an interesting question in Who monitors the monitoring systems? He states, "The first thing that would be useful is to have a monitoring system that has failure modes which are uncorrelated with the infrastructure it is monitoring. ... I don’t know of a specialized monitor-of-monitors product, which is one reason I wrote this blog post."

This article offers a response, describing how to introduce an uncorrelated monitor-of-monitors into the data center to provide real-time visibility that survives when the primary monitoring systems fail.

Summary of the AWS Service Event in the Northern Virginia (US-EAST-1) Region, December 10th, 2021: "This congestion immediately impacted the availability of real-time monitoring data for our internal operations teams, which impaired their ability to find the source of congestion and resolve it."

Standardizing on a small set of communication primitives (gRPC, Thrift, Kafka, etc.) simplifies the creation of large scale distributed services. The communication primitives abstract the physical network to provide reliable communication to support distributed services running on compute nodes. Monitoring is typically regarded as a distributed service that is part of the compute infrastructure, relying on agents on compute nodes to transmit measurements to scale out analysis, storage, automation, and presentation services.

System wide failures occur when the network abstraction fails and the limitations of the physical network infrastructure intrude into the overlaid monitoring, compute, storage, and application services. Physical network service failure is the Achilles heel of distributed compute infrastructure since the physical network is the glue that ties the infrastructure together.

However, the physical network is also a solution to the monitoring problem, providing an independent uncorrelated point of observation that has visibility into all the physical network resources and the services that run over them.

Reinventing Facebook’s data center network describes the architecture of a data center network. The network is a large distributed system made up of thousands of hardware switches and the tens of thousands of optical links that connect them. The network can route trillions of packets per second and has a capacity measured in Petabits/second.

The industry standard sFlow measurement system, implemented by all major data center network equipment vendors, is specifically designed to address the challenge of monitoring large scale switched networks. The sFlow specification embeds standard instrumentation into the hardware of each switch, making monitoring an integral part of the function of the switch, and ensuring that monitoring capacity scales out as network size increases.

UDP vs TCP for real-time streaming telemetry describes how sFlow uses UDP to transport measurements from the network devices to the analytics software in order to maintain real-time visibility during network congestion events. The graphs from the article demonstrate that the different transport characteristics of UDP and TCP decouple the physical network monitoring system from the primary monitoring systems which typically depend on TCP services.

The diagram at the top of this article shows the elements of an sFlow monitoring system designed to provide independent visibility into large scale data center network infrastructure.

Each network device can be configured to send sFlow telemetry to at least two destinations. In the diagram two independent instances of the sFlow-RT real-time analytics engine are deployed on two separate compute nodes, each receiving its own stream of telemetry from all devices in the network. Since both sFlow-RT instances receive the same telemetry stream, they independently maintain copies of the network state that can be separately queried, ensuring availability in the event of a node failure.
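Where the Host sFlow agent is used on servers or white box switches, sending the same telemetry to two analyzers is simply a matter of listing two collectors. The /etc/hsflowd.conf sketch below uses placeholder addresses for the two sFlow-RT nodes:

sflow {
  collector { ip=10.0.0.1 }
  collector { ip=10.0.0.2 }
}

Hardware switch CLIs typically accept more than one sFlow collector address in the same way.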

Note: Tolerance to packet loss allows sFlow to be sent in band to reduce load on out-of-band management networks without significant loss of visibility.

Scalability is ensured by sFlow's scale out hardware-based instrumentation. Leveraging the collective capability of the distributed instrumentation allows each sFlow-RT instance to monitor the current state of the entire data center network.

Tuning Performance describes how to optimize sFlow-RT performance for large scale monitoring.

The compute nodes hosting sFlow-RT instances should be physically separate from the production compute service and have an independent network with out-of-band access so that measurements are available in cases where network performance issues have disrupted the production compute service.

This design is intentionally minimalist in order to reliably deliver real-time visibility during periods of extreme network congestion when all other systems fail.

In Failure Modes and Continuous Resilience, Adrian Cockroft states, "The first technique is the most generally useful. Concentrate on rapid detection and response. In the end, when you’ve done everything you can do to manage failures you can think of, this is all you have left when that weird complex problem that no-one has ever seen before shows up!"

With the state of the network stored in memory, each sFlow-RT instance provides an up-to-the-second view of network performance with query response times measured in milliseconds. Queries are optimized to quickly diagnose network problems (a sample query follows the list below):

  • Detect network congestion
  • Identify traffic driving congestion
  • Locate control points to mitigate congestion
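The sketch below illustrates the kind of query involved, using sFlow-RT's REST API (default port 8008) to retrieve the maximum interface utilization currently reported anywhere in the network as a first step in detecting congestion; ifinutilization is a standard sFlow-RT interface counter metric:

curl http://localhost:8008/metric/ALL/max:ifinutilization/json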
Real-time traffic analytics can drive automated remediation for common classes of problem to rapidly restore network service. Automated responses include:
  • Filtering out DDoS traffic
  • Shedding non-essential services
  • Re-routing traffic
  • Shaping traffic
Of course, reliable control actions require an out-of-band management network so that controls can be deployed even if the production network is failing. Once physical network congestion problems have been addressed, restored monitoring services will provide visibility into production services and allow normal operations to resume. Real-time physical network visibility allows automated control actions to resolve physical network congestion problems within seconds so that there is negligible impact on the production systems.

Physical network visibility has utility beyond addressing extreme congestion events: feeding physical network flow analytics into production monitoring systems augments visibility and provides useful information for intrusion detection, capacity planning, workload placement, autoscaling, and service optimization. In addition, comparing the independently generated flow metrics profiling production services with the metrics generated by the production monitoring systems provides an early warning of monitoring system reliability problems. Finally, the low latency of network flow metrics makes them a leading indicator of performance that can be used to improve the agility of production monitoring and control.

Getting started with sFlow is easy. Ideally, enable sFlow monitoring on the switches in a pod and install an instance of sFlow-RT to see traffic in your own environment. Alternatively, Real-time telemetry from a 5 stage Clos fabric describes a lightweight emulation of realistic data center switch topologies using Docker and Containerlab that allows you to experiment with real-time sFlow telemetry and analytics before deploying into production.

Tuesday, March 1, 2022

DDoS Mitigation with Cisco, sFlow, and BGP Flowspec

DDoS protection quickstart guide shows how sFlow streaming telemetry and BGP RTBH/Flowspec are combined by the DDoS Protect application running on the sFlow-RT real-time analytics engine to automatically detect and block DDoS attacks.

This article discusses how to deploy the solution in a Cisco environment. Cisco has a long history of supporting BGP Flowspec on their routing platforms and has recently added support for sFlow, see Cisco 8000 Series Routers, Cisco ASR 9000 Series Routers, and Cisco NCS 5500 Series Routers.

First, IOS-XR doesn't provide a way to connect to the non-standard BGP port (1179) that sFlow-RT uses by default. Allowing sFlow-RT to open the standard BGP port (179) requires that the service be given additional Linux capabilities.

docker run --rm --net=host --sysctl net.ipv4.ip_unprivileged_port_start=0 \
sflow/ddos-protect -Dbgp.port=179

The above command launches the prebuilt sflow/ddos-protect Docker image. Alternatively, if sFlow-RT has been installed as a deb / rpm package, then the required permissions can be added to the service.

sudo systemctl edit sflow-rt.service

Type the above command to edit the service configuration and add the following lines:

[Service]
AmbientCapabilities=CAP_NET_BIND_SERVICE

Next, edit the sFlow-RT configuration file for the DDoS Protect application:

sudo vi /usr/local/sflow-rt/conf.d/ddos-protect.conf

and add the line:

bgp.port=179

Finally, restart sFlow-RT:

sudo systemctl restart sflow-rt

The application is now listening for BGP connections on TCP port 179.

Now configure the router to send sFlow telemetry to sFlow-RT. The following commands configure an IOS-XR based router to sample packets at 1-in-20,000 and stream telemetry to an sFlow analyzer (192.127.0.1) on UDP port 6343.

flow exporter-map SF-EXP-MAP-1
 version sflow v5
 !
 packet-length 1468
 transport udp 6343
 source GigabitEthernet0/0/0/1
 destination 192.127.0.1
 dfbit set
!

Configure the sFlow analyzer address in an exporter-map.

flow monitor-map SF-MON-MAP
 record sflow
 sflow options
  extended-router
  extended-gateway
  if-counters polling-interval 300
  input ifindex physical
  output ifindex physical
 !
 exporter SF-EXP-MAP-1
!

Configure sFlow options in a monitor-map.

sampler-map SF-SAMP-MAP
 random 1 out-of 20000
!

Define the sampling rate in a sampler-map.

interface GigabitEthernet0/0/0/3
 flow datalinkframesection monitor-map SF-MON-MAP sampler SF-SAMP-MAP ingress

Enable sFlow on each interface for complete visibility into network traffic.

Also configure a BGP Flowspec session with sFlow-RT.

router bgp 1
  bgp router-id 3.3.3.3
  address-family ipv4 unicast
  address-family ipv4 flowspec
    route-policy AcceptAll in
    validation disable
    !
  !
  neighbor 25.2.1.11
    remote-as 1
    update-source loopback0
    address-family ipv4 flowspec
    !
  !
route-policy AcceptAll
  done
  end-policy
  !
flowspec
  local-install interface-all
  !

The above configuration establishes the BGP Flowspec session with sFlow-RT.

Real-time DDoS mitigation using BGP RTBH and FlowSpec describes how to simulate a DDoS UDP amplification attack in order to test the automated detection and control functionality.  

RP/0/RP0/CPU0:ASR9000#show flowSpec afi-all
Tue Jan 25 08:17:30.791 UTC

AFI: IPv4

 Flow      :Dest:192.0.2.129/32,DPort:=53/2
  Actions   :Traffic-rate: 0 bps (bgp.1)
RP/0/RP0/CPU0:ASR9000#

Command line output from the router shown above verifies that a Flowspec control blocking the amplification attack has been received. The control will remain in place for 60 minutes (the configured timeout), after which it will be automatically withdrawn. If the attack is still in progress it will be immediately detected and the control reapplied.

DDoS Protect can mitigate a wide range of common attacks, including: NTP, DNS, Memcached, SNMP, and SSDP amplification attacks; IP, UDP, ICMP and TCP flood attacks; and IP fragmentation attacks. Mitigation options include: remote triggered black hole (RTBH), filtering, rate limiting, DSCP marking, and redirection. IPv6 is fully supported in detection and mitigation of each of these attack types.

Monday, February 28, 2022

Topology aware fabric analytics

Real-time telemetry from a 5 stage Clos fabric describes how to emulate and monitor the topology shown above using Containerlab and sFlow-RT. This article extends the example to demonstrate how topology awareness enhances analytics.
docker run --rm -it --privileged --network host --pid="host" \
  -v /var/run/docker.sock:/var/run/docker.sock -v /run/netns:/run/netns \
  -v ~/clab:/home/clab -w /home/clab \
  ghcr.io/srl-labs/clab bash
Start Containerlab.
curl -O https://raw.githubusercontent.com/sflow-rt/containerlab/master/clos5.yml
Download the Containerlab topology file.
sed -i "s/prometheus/topology/g" clos5.yml
Change the sFlow-RT image from sflow/prometheus to sflow/topology in the Containerlab topology. The sflow/topology image packages sFlow-RT with useful applications that combine topology awareness with analytics.
containerlab deploy -t clos5.yml
Deploy the topology.
curl -O https://raw.githubusercontent.com/sflow-rt/containerlab/master/clos5.json
Download the sFlow-RT topology file.
curl -X PUT -H "Content-Type: application/json" -d @clos5.json \
http://localhost:8008/topology/json
Post the topology to sFlow-RT.
Connect to the sFlow-RT Topology application, http://localhost:8008/app/topology/html/. The dashboard confirms that all the links and nodes in the topology are streaming telemetry. There is currently no traffic on the network, so none of the nodes in the topology are sending flow data.
docker exec -it clab-clos5-h1 iperf3 -c 172.16.4.2
Generate traffic. You should see the Nodes No Flows number drop as switches in the traffic path stream flow data. In a production network traffic flows through all the switches and so all switches will be streaming flow telemetry.
Click on the Locate tab and enter the address 172.16.4.2. Click submit to find the location of the address. Topology awareness allows sFlow-RT to examine streaming flow data from all the links in the network and determine switch ports and MAC addresses associated with the point of ingress into the fabric.
Connect to the sFlow-RT Fabric Metrics application, http://localhost:8008/app/fabric-metrics/html/. The Traffic charts identify each iperf3 test as a large (elephant) flow - see SDN and large flows for a discussion of the importance of large flow visibility. The Busy Links chart reports the number of links in the topology with a utilization exceeding a threshold (in this case 80% utilization).
The Ports tab contains charts showing switch ports with the highest ingress/egress utilization, packet discards, and packet errors.
Connect to the sFlow-RT Trace Flow application,  http://localhost:8008/app/trace-flow/html/.
ipsource=172.16.1.2
Enter a traffic filter and click Submit.
docker exec -it clab-clos5-h1 iperf3 -c 172.16.4.2
Run an iperf3 test and the path the traffic takes across the network topology will be immediately displayed.
Connect to the sFlow-RT Flow Browser application, http://localhost:8008/app/browse-flows/html/index.html?keys=ipsource&value=bps. Run a sequence of iperf3 tests and the traffic will be immediately shown. Defining Flows describes flow attributes that can be added to the query to examine inner / outer tunneled traffic, VxLAN, MPLS, GRE, etc.

Trace Flow demonstrated that flow telemetry is streamed by each switch port along the path from source to destination host to deliver end-to-end visibility. Simply summing the data would overestimate the size of the flow. Adding topology information allows sFlow-RT to intelligently combine data, eliminating duplicate measurements, to provide accurate measurements of traffic crossing the fabric.

containerlab destroy -t clos5.yml
When you are finished, run the above command to stop the containers and free the resources associated with the emulation.

Moving the monitoring solution from Containerlab to production is straightforward since sFlow is widely implemented in datacenter equipment from vendors including: A10, Arista, Aruba, Cisco, Edge-Core, Extreme, Huawei, Juniper, NEC, Netgear, Nokia, NVIDIA, Quanta, and ZTE. In addition, the open source Host sFlow agent makes it easy to extend visibility beyond the physical network into the compute infrastructure.

Monday, February 21, 2022

Real-time telemetry from a 5 stage Clos fabric

CONTAINERlab described how to use FRRouting and Host sFlow in a Docker container to emulate switches in a Clos (leaf/spine) fabric. The recently released open source project, https://github.com/sflow-rt/containerlab, simplifies and automates the steps needed to build and monitor topologies.
docker run --rm -it --privileged --network host --pid="host" \
  -v /var/run/docker.sock:/var/run/docker.sock -v /run/netns:/run/netns \
  -v ~/clab:/home/clab -w /home/clab \
  ghcr.io/srl-labs/clab bash
Run the above command to start Containerlab if you already have Docker installed; the ~/clab directory will be created to persist settings. Otherwise, Installation provides detailed instructions for a variety of platforms.
curl -O https://raw.githubusercontent.com/sflow-rt/containerlab/master/clos5.yml
Next, download the topology file for the 5 stage Clos fabric shown at the top of this article.
containerlab deploy -t clos5.yml
Finally, deploy the topology.
Note: The 3 stage Clos topology, clos3.yml, described in the previous article is also available.
The initial launch may take a couple of minutes as the container images are downloaded for the first time. Once the images are downloaded, the topology deploys in around 10 seconds.
An instance of the sFlow-RT real-time analytics engine receives industry standard sFlow telemetry from all the switches in the topology; each switch is configured to send sFlow to the sFlow-RT instance. In this case, Containerlab is running the pre-built sflow/prometheus image, which packages sFlow-RT with useful applications for exploring the data.
Connect to the web interface, http://localhost:8008. The sFlow-RT dashboard verifies that telemetry is being received from 10 agents (the 10 switches in the Clos fabric). See the sFlow-RT Quickstart guide for more information.
The screen capture shows a real-time view of traffic flowing across the network during a series of iperf3 tests. Click on the sFlow-RT Apps menu and select the browse-flows application, or click here for a direct link to a chart with the settings shown above.
docker exec -it clab-clos5-h1 iperf3 -c 172.16.4.2
Each of the hosts in the network has an iperf3 server, so running the above command will test bandwidth between h1 and h4.
docker exec -it clab-clos5-leaf1 vtysh
Linux with open source routing software (FRRouting) is an accessible alternative to vendor routing stacks (no registration / license required, no restriction on copying means you can share images on Docker Hub, no need for virtual machines).  FRRouting is popular in production network operating systems (e.g. Cumulus Linux, SONiC, DENT, etc.) and the VTY shell provides an industry standard CLI for configuration, so labs built around FRR allow realistic network configurations to be explored.
containerlab destroy -t clos5.yml
When you are finished, run the above command to stop the containers and free the resources associated with the emulation.

Moving the monitoring solution from Containerlab to production is straightforward since sFlow is widely implemented in datacenter equipment from vendors including: A10, Arista, Aruba, Cisco, Edge-Core, Extreme, Huawei, Juniper, NEC, Netgear, Nokia, NVIDIA, Quanta, and ZTE. In addition, the open source Host sFlow agent makes it easy to extend visibility beyond the physical network into the compute infrastructure.