Tuesday, October 20, 2020

Docker DDoS testbed


Docker testbed describes how to use Docker Desktop to build a test network to experiment with real-time sFlow streaming telemetry and analytics. This article extends the testbed to experiment with distributed denial of service (DDoS) detection and mitigation techniques described in Real-time DDoS mitigation using BGP RTBH and FlowSpec.

Start a Host sFlow agent using the pre-built sflow/host-sflow image:
docker run --rm -d -e "COLLECTOR=host.docker.internal" -e "SAMPLING=10" \
--net=host -v /var/run/docker.sock:/var/run/docker.sock:ro \
--name=host-sflow sflow/host-sflow
Start ExaBGP using the pre-built sflow/exabgp image. ExaBGP connects to the sFlow-RT analytics software and displays BGP RTBH / Flowspec controls sent by sFlow-RT:
docker run --rm sflow/exabgp
In a second terminal window, start an instance of the sFlow-RT analytics software using the pre-built sflow/ddos-protect image:
GW=`docker network inspect bridge -f '{{range .IPAM.Config}}{{.Gateway}}{{end}}'`

SUBNET=`docker network inspect bridge -f '{{range .IPAM.Config}}{{.Subnet}}{{end}}'`

docker run --rm -p 6343:6343/udp -p 8008:8008 -p 1179:1179 --name=sflow-rt \
sflow/ddos-protect -Dddos_protect.router=$GW -Dddos_protect.as=65001 \
-Dddos_protect.enable.flowspec=yes -Dddos_protect.group.local=$SUBNET \
-Dddos_protect.mode=automatic \
-Dddos_protect.udp_amplification.action=filter \
-Dddos_protect.udp_amplification.threshold=5000
Open the sFlow-RT dashboard at http://localhost:8008/
The sFlow Agents gauge confirms that sFlow is being received from the Host sFlow agent. Now access the DDoS Protect application at http://localhost:8008/app/ddos-protect/html/index.html
The BGP chart at the bottom right verifies that the BGP connection has been established so that controls can be sent to ExaBGP, which will display them in its terminal window.
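The same check can be made from the command line. As a quick sketch, assuming the default port mapping used above, the sFlow-RT REST API lists the agents it is currently receiving telemetry from:
curl -s http://localhost:8008/agents/json
The Host sFlow agent's address should appear in the returned JSON once telemetry is flowing.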

Finally, hping3 can be used to generate simulated DDoS attacks. Start a simulated DNS amplification attack using the pre-built sflow/hping3 image:
GW=`docker network inspect bridge -f '{{range .IPAM.Config}}{{.Gateway}}{{end}}'`

docker run --rm sflow/hping3 --flood --udp -k -a 198.51.100.1 -s 53 $GW
The attack shows up immediately in the DDoS Protect dashboard, http://localhost:8008/app/ddos-protect/html/index.html
The udp_amplification chart shows the traffic rising to cross the threshold and trigger the control shown in the Controls chart.
The ExaBGP log shows the Flowspec rule that was sent to block the attack, filtering traffic to 172.17.0.1/32 with UDP source port 53. Type CTRL+C in the hping3 window to end the attack.
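The threshold events that trigger controls can also be retrieved from the sFlow-RT REST API, for example (the maxEvents parameter limits the number of entries returned):
curl -s 'http://localhost:8008/events/json?maxEvents=10'
When you have finished experimenting, type CTRL+C in the ExaBGP and sFlow-RT terminal windows and stop the detached Host sFlow agent container:
docker stop host-sflow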

This testbed provides a convenient way to become familiar with the tools to automatically mitigate DDoS attacks. The following articles provide additional information on moving the solution into production: Real-time DDoS mitigation using BGP RTBH and FlowSpec, Pushing BGP Flowspec rules to multiple routers, and Monitoring DDoS mitigation. Real-time network and system metrics as a service provides background on the sFlow-RT analytics platform running the DDoS Protect application.

Wednesday, October 7, 2020

Broadcom Mirror on Drop (MoD)

Networking Field Day 23 included a presentation by Bhaskar Chinni describing Broadcom's Mirror-on-Drop (MOD) capability. MOD-capable hardware can generate a notification whenever a packet is dropped by the ASIC, reporting the packet header and the reason the packet was dropped. MOD is supported by Trident 3, Tomahawk 3, and Jericho 2 or later ASICs, which are used in popular switches widely deployed in data centers.

The recently published sFlow Dropped Packet Notification Structures specification adds drop notifications to industry-standard sFlow telemetry export, complementing the existing push-based counter and packet sampling measurements. The inclusion of drop monitoring in sFlow will allow the benefits of MOD to be fully realized, ensuring consistent end-to-end visibility into dropped packets across multiple vendors and network operating systems.

Using Advanced Telemetry to Correlate GPU and Network Performance Issues demonstrates how packet drop notifications from NVIDIA Mellanox switches form part of an integrated sFlow telemetry stream that provides the system-wide observability needed to drive automation.

MOD instrumentation on Broadcom-based switches provides the foundation network vendors need to integrate the functionality with sFlow agents and add dropped packet notifications to their telemetry streams.

Tuesday, October 6, 2020

Using Advanced Telemetry to Correlate GPU and Network Performance Issues


The image above was captured from the recent talk Using Advanced Telemetry to Correlate GPU and Network Performance Issues [A21870], presented at the NVIDIA GTC conference. The talk includes a demonstration of monitoring a high performance GPU compute cluster in real time. The real-time dashboard provides an up-to-the-second view of key performance metrics for the cluster.

This diagram shows the elements of the GPU compute cluster that was demonstrated. Cumulus Linux running on the switches reduces operational complexity by allowing the same Linux operating system to run on the network devices as on the compute servers. sFlow telemetry is generated by the open source Host sFlow agent that runs on the servers and the switches, using standard Linux APIs to enable instrumentation and gather measurements. On the switches, the measurements are offloaded to the ASIC to provide line-rate monitoring.

Telemetry from all the switches and servers in the cluster is streamed to an sFlow-RT analyzer, which builds a real-time view of performance that can be used to drive operational dashboards and automation.
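The dashboard shown in the talk is specific to that cluster, but the underlying data can be pulled from sFlow-RT's REST API. As a minimal sketch, assuming sFlow-RT is listening on its default port 8008 (the exact names under which the NVML GPU counters appear aren't listed in the talk, so the standard load_one host metric is used as a stand-in), the maximum one-minute load average across all agents can be retrieved with:
curl -s http://localhost:8008/metric/ALL/max:load_one/json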

The Real-time GPU and network telemetry dashboard combines measurements from all the devices to provide a view of cluster performance. Each of the three charts demonstrates a different type of measurement in the sFlow telemetry stream:
  1. GPU Utilization is based on sFlow's counter push mechanism, exporting NVIDIA Management Library (NVML) counters. This chart trends buffer, memory, and execution utilization of the GPUs in the cluster.
  2. Network Traffic is based on sFlow's random packet sampling mechanism, supported by the Linux kernel on servers, and offloaded to the Mellanox ASIC on the switches. This chart trends the top network flows crossing the network (a flow definition sketch follows this list).
  3. Network Drops is based on sFlow's recently added dropped packet notification mechanism, see Using sFlow to monitor dropped packets. This chart trends dropped packet source and destination addresses and the reason the packet was dropped.
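The Network Traffic chart depends on a flow definition programmed into sFlow-RT. The following is a minimal sketch using the REST API, not the exact definition used in the talk; the flow name pair and the choice of keys are illustrative:
curl -s -X PUT -H "Content-Type:application/json" \
-d '{"keys":"ipsource,ipdestination","value":"bytes"}' \
http://localhost:8008/flow/pair/json

curl -s 'http://localhost:8008/activeflows/ALL/pair/json?maxFlows=5'
The first command asks sFlow-RT to decode packet samples and track byte rates by source / destination address pair; the second retrieves the current top flows.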
The cluster is running a video transcoding workload in which video is streamed across the network to a GPU where it is transcoded, and the result is returned. A normal transcoding task is shown on the left, where the charts show an increase in GPU and network activity and zero dropped packets. A failed transcoding task is shown in the middle: here the GPU activity is low, there is no network activity, and there is a sequence of packets dropped by an access control list (ACL). Removing the ACL fixes the problem, which is confirmed by the new data shown on the right of the trend charts.

The sFlow data model integrates the three telemetry streams: counters, packet samples, and drop notifications. Each type of data is useful on its own, but together they provide the system-wide observability needed to drive automation.