Tuesday, June 15, 2021

DDoS mitigation using a Linux switch

Linux as a network operating system describes the benefits of using standard Linux as a network operating system for hardware switches. A key benefit is that the behavior of the physical network can be efficiently emulated using standard Linux virtual machines and/or containers.

In this article, CONTAINERlab is used to create a simple testbed for developing a real-time DDoS mitigation controller. The solution is highly scalable: each hardware switch can monitor and filter terabits per second of traffic, and a single controller instance can monitor and control hundreds of switches.

Create test network

The following ddos.yml file specifies the testbed topology (shown in the screen shot at the top of this article):

name: ddos
topology:
  nodes:
    router:
      kind: linux
      image: sflow/frr
    attacker:
      kind: linux
      image: sflow/hping3
    victim:
      kind: linux
      image: alpine:latest
  links:
    - endpoints: ["router:swp1","attacker:eth1"]
    - endpoints: ["router:swp2","victim:eth1"]

Run the following command to start the emulation:

sudo containerlab deploy ddos.yml

Configure interfaces on router:

interface swp1
 ip address 192.168.1.1/24
!
interface swp2
 ip address 192.168.2.1/24
!

Configure attacker interface:

ip addr add 192.168.1.2/24 dev eth1
ip route add 192.168.2.0/24 via 192.168.1.1

Configure victim interface:

ip addr add 192.168.2.2/24 dev eth1
ip route add 192.168.1.0/24 via 192.168.2.1
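
Before testing connectivity, the addressing plan can be sanity-checked offline. The following is a minimal Python sketch, using only the standard ipaddress module. The check_plan helper is hypothetical and not part of containerlab or FRR. It confirms that each host's next hop is on the host's own subnet and is a router interface address, and that each static route points at a subnet the router serves.

```python
import ipaddress

# Addresses from the router, attacker and victim configuration above.
ROUTER = {"swp1": "192.168.1.1/24", "swp2": "192.168.2.1/24"}
HOSTS = {
    # host: (interface address, destination prefix, next hop)
    "attacker": ("192.168.1.2/24", "192.168.2.0/24", "192.168.1.1"),
    "victim": ("192.168.2.2/24", "192.168.1.0/24", "192.168.2.1"),
}

def check_plan(router, hosts):
    """Return a list of problems in the addressing plan (empty if none)."""
    router_ifaces = [ipaddress.ip_interface(a) for a in router.values()]
    router_ips = {i.ip for i in router_ifaces}
    router_nets = {i.network for i in router_ifaces}
    problems = []
    for name, (addr, dest, via) in hosts.items():
        iface = ipaddress.ip_interface(addr)
        gw = ipaddress.ip_address(via)
        if gw not in iface.network:
            problems.append(f"{name}: next hop {gw} not on {iface.network}")
        if gw not in router_ips:
            problems.append(f"{name}: next hop {gw} is not a router address")
        if ipaddress.ip_network(dest) not in router_nets:
            problems.append(f"{name}: route to {dest} not served by router")
    return problems

print(check_plan(ROUTER, HOSTS))  # []
```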

Verify connectivity between the attacker and the victim:

sudo docker exec -it clab-ddos-attacker ping 192.168.2.2
PING 192.168.2.1 (192.168.2.1): 56 data bytes
64 bytes from 192.168.2.1: seq=0 ttl=64 time=0.069 ms

Install visibility and control applications on router

The advantage of using Linux as a network operating system is that you can develop and install software to tailor the network to address specific requirements. In this case, for DDoS mitigation, we need real-time visibility to detect DDoS attacks and real-time control to filter out the attack traffic.

Open a shell on router:

sudo docker exec -it clab-ddos-router sh

Install and configure Host sFlow agent:

apk --update add build-base linux-headers openssl-dev dbus-dev gcc git
git clone https://github.com/sflow/host-sflow.git
cd host-sflow
make FEATURES="DENT"
make install

Edit /etc/hsflowd.conf:

sflow {
  agent = eth0
  collector { ip=172.20.20.1 udpport=6343 }
  dent { sw=on switchport=swp.* }
}

Note: On a hardware switch, set sw=off to offload packet sampling to hardware.

Start hsflowd:

hsflowd

Download and run the tc_server Python script, which adds and removes tc flower filters through a REST API:

wget https://raw.githubusercontent.com/sflow-rt/tc_server/master/tc_server
nohup python3 tc_server > /dev/null &

The following command shows the Linux tc filters used in this example:

# tc filter show dev swp1 ingress
filter protocol all pref 1 matchall chain 0 
filter protocol all pref 1 matchall chain 0 handle 0x1 
  not_in_hw
	action order 1: sample rate 1/10000 group 1 trunc_size 128 continue
	 index 3 ref 1 bind 1

filter protocol ip pref 14 flower chain 0 
filter protocol ip pref 14 flower chain 0 handle 0x1 
  eth_type ipv4
  ip_proto udp
  dst_ip 192.168.2.2
  src_port 53
  not_in_hw
	action order 1: gact action drop
	 random type none pass val 0
	 index 1 ref 1 bind 1

The output shows the standard Linux tc-matchall and tc-flower filters used to monitor and drop traffic on the router. The Host sFlow agent automatically installs a matchall rule on each interface in order to sample packets. The tc_server script adds and removes flower filters to drop unwanted traffic. On a hardware router, the filters are offloaded by the Linux switchdev driver to the router ASIC for line rate performance.
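
The tc_server source isn't shown here, so the following is only a hypothetical Python sketch of how a JSON filter such as {"ip_proto":"udp","dst_ip":"192.168.2.2","src_port":"53"} could map onto a tc-flower command that yields a drop rule like the one listed above. The flower_drop_cmd name, the restricted set of match keys, and the use of the REST id as the filter preference are assumptions.

```python
import shlex

# Flower match keys handled by this sketch (a small subset of tc-flower).
# ip_proto comes first because tc requires it before src_port/dst_port.
MATCH_KEYS = ("ip_proto", "dst_ip", "src_ip", "dst_port", "src_port")

def flower_drop_cmd(dev, pref, filt):
    """Build argv for a tc-flower ingress drop rule (hypothetical helper)."""
    argv = ["tc", "filter", "add", "dev", dev, "ingress",
            "pref", str(pref), "protocol", "ip", "flower"]
    # Optional flag, e.g. {'skip_sw': 'skip_sw'}, to require hardware offload.
    if "skip_sw" in filt:
        argv.append("skip_sw")
    for key in MATCH_KEYS:
        if key in filt:
            argv += [key, str(filt[key])]
    argv += ["action", "drop"]
    return argv

cmd = flower_drop_cmd("swp1", 14,
                      {"ip_proto": "udp", "dst_ip": "192.168.2.2", "src_port": "53"})
print(shlex.join(cmd))
```

On the router, the argv list could be passed to subprocess.run(cmd, check=True); the sketch itself only builds the command, so it can be tried anywhere.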

Test REST API

Add filter:

curl -X PUT -H "Content-Type: application/json" \
--data '{"ip_proto":"udp","dst_ip":"192.168.2.2","src_port":"53"}' \
http://clab-ddos-router:8081/swp1/10

Show filters:

curl http://clab-ddos-router:8081/swp1

Remove filter:

curl -X DELETE http://clab-ddos-router:8081/swp1/10
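
The same three calls can be made from Python with the standard urllib.request module. This is a minimal sketch that mirrors the curl commands above. The helper names are hypothetical, and because the format of the "show filters" response isn't described here, the sketch only builds the requests and does not parse any replies.

```python
import json
import urllib.request

# Base URL of the tc_server REST API on the router (as used with curl above).
BASE = "http://clab-ddos-router:8081"

def add_filter_request(port, fid, filt):
    """PUT /<port>/<id> with a JSON filter, like the curl 'Add filter' step."""
    return urllib.request.Request(
        f"{BASE}/{port}/{fid}",
        data=json.dumps(filt).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def show_filters_request(port):
    """GET /<port>, like the curl 'Show filters' step."""
    return urllib.request.Request(f"{BASE}/{port}", method="GET")

def remove_filter_request(port, fid):
    """DELETE /<port>/<id>, like the curl 'Remove filter' step."""
    return urllib.request.Request(f"{BASE}/{port}/{fid}", method="DELETE")

req = add_filter_request("swp1", 10,
                         {"ip_proto": "udp", "dst_ip": "192.168.2.2", "src_port": "53"})
print(req.get_method(), req.full_url)  # PUT http://clab-ddos-router:8081/swp1/10
# Sending it requires the running testbed: urllib.request.urlopen(req)
```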

Build an automated DDoS mitigation controller

The following sFlow-RT ddos.js script automatically detects and drops UDP amplification attacks:

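// minimum time (minutes) to keep a filter in place
var block_minutes = 1;
// attack threshold in packets per second
var thresh = 10000;

// track packet rate by destination IP address and UDP source port
setFlow('udp_target',{keys:'ipdestination,udpsourceport',value:'frames'});

// raise an 'attack' event when any udp_target flow exceeds thresh
setThreshold('attack',{metric:'udp_target', value:thresh, byFlow:true, timeout:2});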

var id = 10;
var controls = {};
setEventHandler(function(evt) {
  var key = evt.flowKey;
  if(controls[key]) return;

  var prt = ifName(evt.agent,evt.dataSource);
  if(!prt) return;

  var [dst_ip,src_port] = key.split(',');
  var filter = {
    // uncomment following line for hardware routers
    // 'skip_sw':'skip_sw',
    'ip_proto':'udp',
    'dst_ip':dst_ip,
    'src_port':src_port
  };
  var url = 'http://'+evt.agent+':8081/'+prt+'/'+id++;
  try {
    http(url,'put','application/json',JSON.stringify(filter));
  } catch(e) {
    logWarning(url + ' put failed');
  }
  controls[key] = {time:evt.timestamp, evt:evt, url:url};
  logInfo('block ' + evt.flowKey);
},['attack']);

setIntervalHandler(function(now) {
  for(var key in controls) {
    var control = controls[key];
    if(now - control.time < 1000 * 60 * block_minutes) continue;
    var evt = control.evt;
    if(thresholdTriggered(evt.thresholdID,evt.agent,evt.dataSource+'.'+evt.metric,evt.flowKey)) {
      // attack is ongoing - keep control
      continue;
    }
    try {
      http(control.url,'delete');
    } catch(e) {
      logWarning(control.url + ' delete failed');
    }
    delete controls[key];
    logInfo('allow '+control.evt.flowKey);
  }
});

See Writing Applications for more information on the script.
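
To make the block/allow lifecycle explicit, here is a Python sketch that models the logic of ddos.js outside sFlow-RT. The MitigationController class and its methods are hypothetical. The real script relies on sFlow-RT's setEventHandler, setIntervalHandler and thresholdTriggered functions, and the still_attacking callback below stands in for thresholdTriggered.

```python
BLOCK_MINUTES = 1

class MitigationController:
    """Hypothetical model of the ddos.js block/allow logic (no sFlow-RT)."""

    def __init__(self, block_minutes=BLOCK_MINUTES, first_id=10):
        self.block_ms = 1000 * 60 * block_minutes
        self.next_id = first_id
        self.controls = {}  # flow key -> {"time": ms, "id": filter id}

    def on_attack(self, flow_key, timestamp_ms):
        """Called when the threshold fires; returns the new filter id or None."""
        if flow_key in self.controls:
            return None  # already blocked, ignore repeated events
        fid = self.next_id
        self.next_id += 1
        self.controls[flow_key] = {"time": timestamp_ms, "id": fid}
        return fid

    def on_interval(self, now_ms, still_attacking):
        """Release controls older than the block period whose attack stopped.

        Returns the list of released flow keys.
        """
        released = []
        for key, control in list(self.controls.items()):
            if now_ms - control["time"] < self.block_ms:
                continue  # minimum block period not yet elapsed
            if still_attacking(key):
                continue  # attack is ongoing, keep the filter
            del self.controls[key]
            released.append(key)
        return released

ctl = MitigationController()
print(ctl.on_attack("192.168.2.2,53", 0))        # 10
print(ctl.on_attack("192.168.2.2,53", 500))      # None (duplicate event)
print(ctl.on_interval(30_000, lambda k: False))  # [] (still within block period)
print(ctl.on_interval(61_000, lambda k: False))  # ['192.168.2.2,53']
```

Keeping the minimum block period separate from the "still attacking" check means a filter is never removed while the threshold is still triggered, which avoids flapping filters during a sustained attack.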

Run the controller script on the CONTAINERlab host using the sFlow-RT real-time analytics engine:

sudo docker run --network=host -v $PWD/ddos.js:/sflow-rt/ddos.js \
sflow/prometheus -Dscript.file=ddos.js

Verify that sFlow is being received by checking the sFlow-RT status page, http://containerlab_ip:8008/

Test controller

Monitor for attack traffic on the victim:

sudo docker exec -it clab-ddos-victim sh
apk --update add tcpdump
tcpdump -n -i eth1 udp port 53

Start attack:

sudo docker exec -it clab-ddos-attacker \
hping3 --flood --udp -k -s 53 --rand-source 192.168.2.2

There should be a brief flurry of packets seen at the victim before the controller detects and blocks the attack. The entire period between launching the attack and the attack traffic being blocked is under a second.

Thursday, May 20, 2021

Linux as a network operating system


NVIDIA Linux Switch enables any standard Linux distribution to be used as the operating system on NVIDIA Spectrum™ switches. Unlike Linux-based network operating systems, where you are limited to a specific version of Linux and control of the hardware is restricted to vendor-specific software modules, Linux Switch allows you to install an unmodified version of your favorite Linux distribution along with familiar Linux monitoring and orchestration tools.

The key to giving Linux control of the switch hardware is the switchdev module - a standard part of recent Linux kernels. Linux switchdev is an in-kernel driver model for switch devices which offload the forwarding (data) plane from the kernel. Integrating switch ASIC drivers in the Linux kernel makes switch ports appear as additional Linux network interfaces that can be configured and managed using standard Linux tools.

The mlxsw wiki provides instructions for installing Linux using ONIE or PXE boot on Mellanox switch hardware, for example, on NVIDIA® Spectrum®-3 based SN4000 series switches, providing 1G - 400G port speeds to handle scale-out data center applications.

Major benefits of using standard Linux as the switch operating system include:

  • no licensing fees, feature restrictions, or license management complexity associated with proprietary network operating systems
  • large ecosystem of open source and commercial software available for Linux
  • software updates and security patches available through Linux distribution
  • install same Linux distribution on the switches and servers to reduce operational complexity and leverage existing expertise
  • run instances of the Linux distribution as virtual machines or containers to test configurations and develop automation scripts
  • standard Linux APIs, and availability of Linux developers, lowers the barrier to customization, making it possible to tailor network behavior to address application / business requirements

The switchdev driver for NVIDIA Spectrum ASICs exposes advanced dataplane instrumentation through standard Linux APIs. This article will explore how the open source Host sFlow agent uses the standard Linux APIs to stream real-time telemetry from the ASIC using industry standard sFlow.

The diagram shows the elements of the solution. Host sFlow agents installed on servers and switches stream sFlow telemetry to an instance of the sFlow-RT real-time analytics engine. The analytics provide a comprehensive, up to the second, view of performance to drive automation.

Note: If you are unfamiliar with sFlow, or want to hear about the latest developments, Real-time network telemetry for automation provides an overview and includes a demonstration of monitoring and troubleshooting network and system performance of a GPU cluster.

Download the latest Host sFlow agent sources:

git clone https://github.com/sflow/host-sflow.git

INSTALL.Linux provides information on compiling Host sFlow on Linux. The following instructions assume a DEB based distribution (Debian, Ubuntu):

cd host-sflow/
make deb FEATURES=DENT

It isn't necessary to install development tools on the switch. All major Linux distributions are available as Docker images. Select a Docker image that matches the operating system version on the switch and use it to build the package.

Copy the resulting hsflowd package to the switch and install:

sudo dpkg -i hsflowd_2.0.34-3_amd64.deb

Next, edit the /etc/hsflowd.conf file to configure the agent:

sflow {
  collector { ip=10.0.0.1 }
  systemd { }
  psample { group=1 egress=on }
  dropmon { group=1 start=on sw=off hw=on }
  dent { sw=off switchport=swp.* }
}

In this case, 10.0.0.1 is the address of the sFlow collector and swp.* is a regular expression used to identify front panel switch ports. The systemd{} module monitors services running on the switch (see Monitoring Linux services). The psample{} module receives randomly sampled packets from the switch ASIC (see Linux 4.11 kernel extends packet sampling support). The dropmon{} module receives dropped packet notifications (see Using sFlow to monitor dropped packets). The dent{} module automatically configures packet sampling of traffic on front panel switch ports (see Packet Sampling).
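
To see which interfaces a switchport pattern such as swp.* would select, here is a small Python sketch. It assumes the pattern has to match the whole interface name; hsflowd's exact matching semantics are not described here, so treat this as an approximation. The interface list is illustrative.

```python
import re

def front_panel_ports(ifnames, pattern=r"swp.*"):
    """Return interfaces whose full name matches the switchport pattern."""
    rx = re.compile(pattern)
    return [name for name in ifnames if rx.fullmatch(name)]

# Illustrative interface names on a switch.
ifaces = ["lo", "eth0", "swp1", "swp2", "swp10s0", "bond0"]
print(front_panel_ports(ifaces))  # ['swp1', 'swp2', 'swp10s0']
```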

Note: The same configuration file can be used for every switch in the network, making configuration of the agents easy to automate.

Enable and start the agent.

sudo systemctl enable hsflowd.service
sudo systemctl start hsflowd.service

Finally, use the pre-built sflow/prometheus Docker image to start a copy of the sFlow-RT real-time analytics software on the collector host (10.0.0.1):

docker run -p 8008:8008 -p 6343:6343/udp -d sflow/prometheus

The web interface is accessible on port 8008.

The included Metric Browser application lets you explore the metrics that are being streamed. The chart updates in real time as data arrives and, in this case, identifies the interface in the network with the greatest utilization. The standard set of metrics exported by the Host sFlow agent includes interface counters as well as host CPU, memory, disk and service performance metrics. Metrics lists the set of available metrics.

The included Flow Browser application provides an up to the second view of traffic flows. Defining Flows describes the fields that can be used to break out traffic.

Note: The NVIDIA Spectrum 2/3 ASIC includes packet transit delay, selected queue and queue depth with each sampled packet. This information is delivered via the Linux PSAMPLE netlink channel to the Host sFlow agent and included in the sFlow telemetry. These fields are accessible when defining flows in sFlow-RT. See Transit delay and queueing for details.

The included Discard Browser is used to explore packets that are being dropped in the network.

Note: The NVIDIA Spectrum 2/3 ASIC includes instrumentation to capture dropped packets and the reason they were dropped. The information is delivered via the Linux drop_monitor netlink channel to the Host sFlow agent and included in the sFlow telemetry. See Real-time trending of dropped packets for more information.

The included Prometheus application exports metrics to the Prometheus time series database where they can be used to drive Grafana dashboards (e.g. sFlow-RT Countries and Networks, sFlow-RT Health, and sFlow-RT Network Interfaces).

Linux as a network operating system is an exciting advancement if you are interested in simplifying network and system management. Using the Linux networking APIs as a common abstraction layer on servers and switches makes it possible to manage network and compute infrastructure as a unified system.

Monday, May 3, 2021

Cisco 8000 Series routers


Cisco 8000 Series routers are "400G optimized platforms that scale from 10.8 Tbps to 260 Tbps." The routers are built around Cisco Silicon One™ ASICs. The Silicon One ASIC includes the instrumentation needed to support industry standard sFlow real-time streaming telemetry.
Note: The Cisco 8000 Series routers also support Cisco NetFlow. Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX describes why you should choose sFlow if you are interested in real-time monitoring and control applications.
The following commands configure a Cisco 8000 series router to sample packets at 1-in-20,000 and stream telemetry to an sFlow analyzer (192.127.0.1) on UDP port 6343.
flow exporter-map SF-EXP-MAP-1
 version sflow v5
 !
 packet-length 1468
 transport udp 6343
 source GigabitEthernet0/0/0/1
 destination 192.127.0.1
 dfbit set
!

Configure the sFlow analyzer address in an exporter-map.

flow monitor-map SF-MON-MAP
 record sflow
 sflow options
  extended-router
  extended-gateway
  if-counters polling-interval 300
  input ifindex physical
  output ifindex physical
 !
 exporter SF-EXP-MAP-1
!

Configure sFlow options in a monitor-map.

sampler-map SF-SAMP-MAP
 random 1 out-of 20000
!

Define the sampling rate in a sampler-map.

interface GigabitEthernet0/0/0/3
 flow datalinkframesection monitor-map SF-MON-MAP sampler SF-SAMP-MAP ingress

Enable sFlow on each interface for complete visibility into network traffic.
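
With random 1-in-N sampling, each sample represents roughly N packets. This is how an analyzer scales sampled data back up to traffic totals. The Python sketch below uses the 1-in-20,000 rate configured above. The error estimate applies the approximation commonly cited for sFlow packet sampling (95% confidence error of about 196 * sqrt(1/samples) percent), and the sample count is illustrative, not a measurement.

```python
import math

def estimate_packets(samples, sampling_rate):
    """Scale a count of 1-in-N packet samples up to an estimated packet count."""
    return samples * sampling_rate

def error_percent(samples):
    """Approximate 95% confidence error (%) for an estimate from `samples` samples."""
    return 196 * math.sqrt(1 / samples)

# Illustrative: 50 samples collected in one second at 1-in-20,000 sampling.
print(estimate_packets(50, 20000))  # 1000000 (about 1M packets per second)
print(round(error_percent(50), 1))  # 27.7
```

Accuracy depends on the number of samples, not on the traffic volume, so the error shrinks as samples accumulate over longer intervals or as traffic on the link grows.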

The above configuration instructions are for IOS-XR. Cisco goes SONiC on Cisco 8000 describes Cisco's support for the open source SONiC network operating system. SONiC describes how sFlow is implemented and configured on SONiC.

The diagram shows the general architecture of an sFlow monitoring deployment. All the switches stream sFlow telemetry to a central sFlow analyzer for network-wide visibility. Host sFlow agents installed on servers can extend visibility into the compute infrastructure and provide network visibility from virtual machines in the public cloud. In this instance, the sFlow-RT real-time analyzer provides an up to the second view of performance that can be used to drive operational dashboards and network automation. The recommended sFlow configuration settings are optimized for real-time monitoring of the large scale networks targeted by Cisco 8000 routers.

docker run -p 8008:8008 -p 6343:6343/udp sflow/prometheus

Getting started with sFlow-RT is very simple, for example, the above command uses the pre-built sflow/prometheus Docker image to start analyzing sFlow. Real-time DDoS mitigation using BGP RTBH and FlowSpec, Monitoring leaf and spine fabric performance, and Flow metrics with Prometheus and Grafana describe additional use cases for real-time sFlow analytics.

Note: There is a wide range of options for sFlow analysis. See sFlow Collectors for a list of open source and commercial software.

Cisco first introduced sFlow support in the Nexus 3000 Series in 2012. Today, there is a range of Cisco products that include sFlow support. The inclusion of sFlow instrumentation in Silicon One is likely to expand support across the range of upcoming products based on these ASICs. The broad support for sFlow by Cisco and other leading vendors (e.g. A10, Arista, Aruba, Cumulus, Edge-Core, Extreme, Huawei, Juniper, NEC, Netgear, Nokia, Quanta, and ZTE) makes sFlow an attractive option for multi-vendor network performance monitoring, particularly for those interested in real-time monitoring and automation.

Monday, April 5, 2021

CONTAINERlab

CONTAINERlab is a Docker orchestration tool for creating virtual network topologies. This article describes how to build and monitor the leaf and spine topology shown above.

Note: Docker testbed describes a simple testbed for experimenting with sFlow analytics using Docker Desktop, but it doesn't have the ability to construct complex topologies. 

multipass launch --cpus 2 --mem 4G --name containerlab
multipass shell containerlab

The above commands use the multipass command line tool to create an Ubuntu virtual machine and open shell access.

sudo apt update
sudo apt -y install docker.io
bash -c "$(curl -sL https://get-clab.srlinux.dev)"

Type the above commands into the shell to install CONTAINERlab.

Note: Multipass describes how to build a Mininet network emulator to experiment with software defined networking.

name: test
topology:
  nodes:
    leaf1:
      kind: linux
      image: sflow/frr
    leaf2:
      kind: linux
      image: sflow/frr
    spine1:
      kind: linux
      image: sflow/frr
    spine2:
      kind: linux
      image: sflow/frr
    h1:
      kind: linux
      image: alpine:latest
    h2:
      kind: linux
      image: alpine:latest
  links: 
    - endpoints: ["leaf1:eth1","spine1:eth1"]
    - endpoints: ["leaf1:eth2","spine2:eth1"]
    - endpoints: ["leaf2:eth1","spine1:eth2"]
    - endpoints: ["leaf2:eth2","spine2:eth2"]
    - endpoints: ["h1:eth1","leaf1:eth3"]
    - endpoints: ["h2:eth1","leaf2:eth3"]

The test.yml file shown above specifies the topology. In this case we are using FRRouting (FRR) containers for the leaf and spine switches and Alpine Linux containers for the two hosts.

sudo containerlab deploy --topo test.yml

The above command creates the virtual network and starts containers for each of the network nodes.

sudo containerlab inspect --name test

Type the command above to list the container instances in the topology.

The table shows each of the containers and the assigned IP addresses.

sudo docker exec -it clab-test-leaf1 vtysh

Type the command above to run the FRR VTY shell so that the switch can be configured.

leaf1# show running-config 
Building configuration...

Current configuration:
!
frr version 7.5_git
frr defaults datacenter
hostname leaf1
log stdout
!
interface eth3
 ip address 172.16.1.1/24
!
router bgp 65006
 bgp router-id 172.20.20.6
 bgp bestpath as-path multipath-relax
 bgp bestpath compare-routerid
 neighbor fabric peer-group
 neighbor fabric remote-as external
 neighbor fabric description Internal Fabric Network
 neighbor fabric capability extended-nexthop
 neighbor eth1 interface peer-group fabric
 neighbor eth2 interface peer-group fabric
 !
 address-family ipv4 unicast
  network 172.16.1.0/24
 exit-address-family
!
route-map ALLOW-ALL permit 100
!
ip nht resolve-via-default
!
line vty
!
end

The BGP configuration for leaf1 is shown above.

Note: We are using BGP unnumbered to simplify the configuration so peers are automatically discovered.

The other switches, leaf2, spine1, and spine2, have similar configurations.

Next we need to configure the hosts.

sudo docker exec -it clab-test-h1 sh

Open a shell on h1.

ip addr add 172.16.1.2/24 dev eth1
ip route add 172.16.2.0/24 via 172.16.1.1

Configure networking on h1. The other host, h2, has a similar configuration.

sudo docker exec -it clab-test-h1 ping 172.16.2.2
PING 172.16.2.2 (172.16.2.2): 56 data bytes
64 bytes from 172.16.2.2: seq=0 ttl=61 time=0.928 ms
64 bytes from 172.16.2.2: seq=1 ttl=61 time=0.160 ms
64 bytes from 172.16.2.2: seq=2 ttl=61 time=0.201 ms

Use ping to verify that there is connectivity between h1 and h2.

apk add iperf3

Install iperf3 on h1 and h2.

iperf3 -s --bind 172.16.2.2

Run an iperf3 server on h2.

iperf3 -c 172.16.2.2
Connecting to host 172.16.2.2, port 5201
[  5] local 172.16.1.2 port 52066 connected to 172.16.2.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.41 GBytes  12.1 Gbits/sec    0   1.36 MBytes       
[  5]   1.00-2.00   sec  1.41 GBytes  12.1 Gbits/sec    0   1.55 MBytes       
[  5]   2.00-3.00   sec  1.44 GBytes  12.4 Gbits/sec    0   1.55 MBytes       
[  5]   3.00-4.00   sec  1.44 GBytes  12.3 Gbits/sec    0   2.42 MBytes       
[  5]   4.00-5.00   sec  1.46 GBytes  12.6 Gbits/sec    0   3.28 MBytes       
[  5]   5.00-6.00   sec  1.42 GBytes  12.2 Gbits/sec    0   3.28 MBytes       
[  5]   6.00-7.00   sec  1.44 GBytes  12.4 Gbits/sec    0   3.28 MBytes       
[  5]   7.00-8.00   sec  1.28 GBytes  11.0 Gbits/sec    0   3.28 MBytes       
[  5]   8.00-9.00   sec  1.40 GBytes  12.0 Gbits/sec    0   3.28 MBytes       
[  5]   9.00-10.00  sec  1.25 GBytes  10.7 Gbits/sec    0   3.28 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  13.9 GBytes  12.0 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  13.9 GBytes  12.0 Gbits/sec                  receiver

Run an iperf3 test on h1.

Now that we have a working test network, it's time to add some monitoring.

We will be installing sFlow agents on the switches and hosts that will stream telemetry to sFlow-RT analytics software which will provide a real-time network-wide view of performance.

sudo docker exec -it clab-test-leaf1 sh

Open a shell on leaf1.

apk --update add libpcap-dev build-base linux-headers gcc git
git clone https://github.com/sflow/host-sflow.git
cd host-sflow/
make FEATURES="PCAP"
make install

Install Host sFlow agent on leaf1.

Note: The steps above could be included in a Dockerfile in order to create an image with built-in instrumentation.

vi /etc/hsflowd.conf

Edit the Host sFlow configuration file.

sflow {
  polling = 30
  sampling = 400
  collector { ip = 172.20.20.1 }
  pcap { dev = eth1 }
  pcap { dev = eth2 }
  pcap { dev = eth3 }
}

The above settings enable packet sampling on interfaces eth1, eth2 and eth3.

sudo docker exec -d clab-test-leaf1 /usr/sbin/hsflowd -d

Start the Host sFlow agent on leaf1.

Install and run Host sFlow agents on the remaining switches and hosts, leaf2, spine1, spine2, h1, and h2.

sudo docker run --rm -d -p 6343:6343/udp -p 8008:8008 --name sflow-rt sflow/prometheus

Use the pre-built sflow/prometheus container to start an instance of sFlow-RT to collect and analyze the telemetry.

multipass list

List the multipass virtual machines.

containerlab            Running           192.168.64.7     Ubuntu 20.04 LTS
                                          172.17.0.1
                                          172.20.20.1

Use a web browser to connect to the sFlow-RT web interface, in this case at http://192.168.64.7:8008

The sFlow-RT dashboard verifies that telemetry is being received from 6 agents (the four switches and two hosts).

The screen capture shows a real-time view of traffic flowing across the network during an iperf3 test. 

The chart shows that the traffic flows via spine2. Repeated tests showed that traffic never took the path via spine1, indicating that the ECMP hash function was not taking the TCP ports into account.

sudo docker exec clab-test-leaf1 sysctl -w net.ipv4.fib_multipath_hash_policy=1
sudo docker exec clab-test-leaf2 sysctl -w net.ipv4.fib_multipath_hash_policy=1

We are using a newer Linux kernel, so running the above commands changes the hashing algorithm to include the layer 4 headers, see Celebrating ECMP in Linux — part one and Celebrating ECMP in Linux — part two.
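
A toy Python model shows why the path never changed. With the default fib_multipath_hash_policy=0, the kernel hashes only on layer 3 (source and destination addresses), so every connection between h1 and h2 maps to the same next hop. Policy 1 adds the protocol and layer 4 ports, so connections with different source ports can be spread across both spines. The SHA-256 based hash here is a stand-in and not the kernel's hash function.

```python
import hashlib

NEXT_HOPS = ["spine1", "spine2"]

def pick_next_hop(src, dst, sport, dport, proto=6, policy=0):
    """Pick an ECMP next hop with a toy hash (policy 0: L3 only, 1: adds L4)."""
    if policy == 0:
        key = f"{src}|{dst}"  # addresses only
    else:
        key = f"{src}|{dst}|{proto}|{sport}|{dport}"  # adds protocol and ports
    digest = hashlib.sha256(key.encode()).digest()
    return NEXT_HOPS[int.from_bytes(digest[:4], "big") % len(NEXT_HOPS)]

# 100 TCP connections from h1 to the iperf3 server on h2, each with its own source port.
ports = range(40000, 40100)
l3 = {pick_next_hop("172.16.1.2", "172.16.2.2", p, 5201, policy=0) for p in ports}
l4 = {pick_next_hop("172.16.1.2", "172.16.2.2", p, 5201, policy=1) for p in ports}
print(sorted(l3), sorted(l4))
```

Under the layer 3 policy the set of chosen spines always has one member, whatever the source ports; under the layer 4 policy both spines end up in use.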

Topology describes how knowledge of network topology can be used to enhance the analytics capabilities of sFlow-RT.

{
  "links": {
    "link1": { "node1":"leaf1","port1":"eth1","node2":"spine1","port2":"eth1"},
    "link2": { "node1":"leaf1","port1":"eth2","node2":"spine2","port2":"eth1"},
    "link3": { "node1":"leaf2","port1":"eth1","node2":"spine1","port2":"eth2"},
    "link4": { "node1":"leaf2","port1":"eth2","node2":"spine2","port2":"eth2"}
  }
}

The links specification in the test.yml file can easily be converted into sFlow-RT's JSON format.
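
A minimal Python sketch of that conversion follows. It assumes the links section has already been loaded from test.yml (for example with a YAML parser) into a list of endpoint pairs. It also assumes that only switch-to-switch links belong in the topology, which matches the JSON above. The function name is hypothetical.

```python
import json

# links section of test.yml, as a Python list of endpoint pairs
LINKS = [
    ["leaf1:eth1", "spine1:eth1"],
    ["leaf1:eth2", "spine2:eth1"],
    ["leaf2:eth1", "spine1:eth2"],
    ["leaf2:eth2", "spine2:eth2"],
    ["h1:eth1", "leaf1:eth3"],
    ["h2:eth1", "leaf2:eth3"],
]
SWITCHES = {"leaf1", "leaf2", "spine1", "spine2"}

def to_sflow_rt_topology(links, switches):
    """Convert containerlab endpoint pairs into sFlow-RT's links JSON."""
    topology = {"links": {}}
    for a, b in links:
        node1, port1 = a.split(":")
        node2, port2 = b.split(":")
        if node1 not in switches or node2 not in switches:
            continue  # skip host links, as in the example JSON
        name = f"link{len(topology['links']) + 1}"
        topology["links"][name] = {"node1": node1, "port1": port1,
                                   "node2": node2, "port2": port2}
    return topology

print(json.dumps(to_sflow_rt_topology(LINKS, SWITCHES), indent=2))
```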

CONTAINERlab is a very promising tool for efficiently emulating complex networks. CONTAINERlab supports Nokia SR Linux, Juniper vMX, Cisco IOS XRv9k and Arista vEOS, as well as Linux containers. Many of the proprietary network operating systems are only delivered as virtual machines and Vrnetlab integration makes it possible for CONTAINERlab to run these virtual machines. However, virtual machine nodes require considerably more resources than simple containers.

Linux with open source routing software (FRRouting) is an accessible alternative to vendor routing stacks (no registration / license required, no restriction on copying means you can share images on Docker Hub, no need for virtual machines).  FRRouting is popular in production network operating systems (e.g. Cumulus Linux, SONiC, DENT, etc.) and the VTY shell provides an industry standard CLI for configuration, so labs built around FRR allow realistic network configurations to be explored.

Monday, March 22, 2021

In-band Network Telemetry (INT)

The recent addition of in-band streaming telemetry (INT) measurements to the sFlow industry standard simplifies deployment by addressing the operational challenges of in-band monitoring.

The diagram shows the basic elements of In-band Network Telemetry (INT) in which the ingress switch is programmed to insert a header containing measurements into packets entering the network. Each switch in the path is programmed to append additional measurements to the packet header. The egress switch is programmed to remove the header so that the packet can be delivered to its destination. The egress switch is responsible for processing the measurements or sending them on to analytics software.

There are currently two competing specifications for in-band telemetry:

  1. In-band Network Telemetry (INT) Dataplane Specification
  2. Data Fields for In-situ OAM

Common telemetry attributes from both standards include:

  1. node id
  2. ingress port
  3. egress port
  4. transit delay (egress timestamp - ingress timestamp)
  5. queue depth

Visibility into network forwarding performance is very useful, however, there are practical issues that should be considered with the in-band telemetry approach for collecting the measurements:

  1. Transporting measurement headers is complex with different encapsulations for each transport protocol:  Geneve, VxLAN, GRE, UDP, TCP etc.
  2. Addition of headers increases the size of packets and risks causing traffic to be dropped downstream due to maximum transmission unit (MTU) restrictions.
  3. The number of measurements that can be added by each switch and the number of switches adding measurements in the path need to be limited.
  4. In-band telemetry cannot be incrementally deployed. Ideally, all devices need to participate, or at a minimum, the ingress and egress devices need to be in-band telemetry aware.
  5. In-band telemetry transports data from the data plane to the control/management planes, providing a potential attack surface that could be exploited by crafting malicious packets with fake measurement headers.
  6. There is no standard mechanism for transporting measurements from the egress switch for analysis.
  7. There is no data model to link in-band telemetry to other sources of data (NETCONF, SNMP, etc.)
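
Issues 2 and 3 can be illustrated with a back-of-envelope Python sketch of how per-hop metadata grows a packet. The 12-byte base header and 20 bytes of metadata per hop are hypothetical sizes chosen for illustration. They are not values taken from either specification.

```python
MTU = 1500         # bytes, common Ethernet IP MTU
BASE_HEADER = 12   # hypothetical in-band telemetry header size (bytes)
PER_HOP = 20       # hypothetical metadata bytes appended by each switch

def int_packet_size(original_len, hops):
    """Packet size after the ingress header and per-hop metadata are added."""
    return original_len + BASE_HEADER + PER_HOP * hops

def max_original_size(hops, mtu=MTU):
    """Largest original packet that still fits in the MTU after `hops` hops."""
    return mtu - BASE_HEADER - PER_HOP * hops

print(int_packet_size(1460, 5))  # 1572, exceeds a 1500 byte MTU
print(max_original_size(5))      # 1388
```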

The sFlow Transit Delay Structures extension addresses these issues by defining how the in-band network telemetry attributes can be exported in real-time using the industry standard sFlow protocol.

The sFlow architecture, shown at the top of this article, provides an out of band alternative for transporting the per packet forwarding plane measurements. The switch ASIC attaches performance measurements as metadata to sampled packets sent to the sFlow Agent instead of adding the measurements to the egress packet. The sFlow Agent immediately forwards the additional packet metadata as part of the standard sFlow telemetry stream to a central sFlow analyzer. The sFlow Analyzer provides a real-time view of the performance of the entire network.

Using sFlow as the telemetry transport has a number of benefits:

  1. Simple to deploy since there is no modification of packets (no issues with encapsulations, MTU, number of measurements, path length, incremental deployment, etc.)
  2. Extensibility of sFlow protocol allows additional forwarding plane measurements to augment existing sFlow measurements, fully integrating the new measurements with sFlow data exported from other switches in the network (Arista, Aruba, Cisco, Dell, Huawei, Juniper, etc.)
  3. sFlow is a unidirectional telemetry transport protocol that originates from the device management plane and can be sent out of band, limiting possible attack surfaces.
  4. Measurements are delivered in real-time directly to the sFlow Analyzer.
  5. sFlow data model links telemetry to external data (SNMP, NETCONF, OpenConfig, etc.)

Transit delay and queueing describes the new sFlow measurements in more detail and demonstrates a working implementation. The instrumentation to support these measurements is widely available in current generation network ASICs. If you are interested in visibility into network performance, ask your network vendor about their plans to implement the sFlow Transit Delay Structures extension.

Wednesday, March 17, 2021

Transit delay and queueing


The recently finalized sFlow Transit Delay Structures extension provides visibility into the performance of packet forwarding in a switch or router using the industry standard sFlow protocol.

The diagram provides a logical representation of packet forwarding. A packet is received at an Ingress Port; the packet header is examined and a forwarding decision is made to add the packet to one of the queues associated with an Egress Port; finally, the packet is removed from the queue and sent out the Egress Port to be received by the next device in the chain.

The time between receiving a packet at the Ingress Port and sending it out the Egress Port is the packet's transit delay. The transit delay is affected by the time it takes to make the forwarding decision and the time the packet spends in the queue. Identifying the specific queue selected and the number of bytes already in the queue fills out the set of performance metrics for the forwarding decision. The sFlow Transit Delay Structures extension adds these performance metrics to the metadata associated with each packet sample.

The following output from sflowtool shows the data contained in a packet sample:

startSample ----------------------
sampleType_tag 0:1
sampleType FLOWSAMPLE
sampleSequenceNo 91159
sourceId 0:2216
meanSkipCount 400
samplePool 36463600
dropEvents 0
inputPort 2215
outputPort 2216
flowBlock_tag 0:1036
extendedType egress_queue
egress_queue_id 7
flowBlock_tag 0:1040
extendedType queue_depth
queue_depth_bytes 11354112
flowBlock_tag 0:1039
extendedType transit_delay
transit_delay_nS 839660224
flowBlock_tag 0:1
flowSampleType HEADER
headerProtocol 1
sampledPacketSize 1446
strippedBytes 4
headerLen 128
headerBytes 98-03-9B-8F-B5-CC-98-03-9B-94-C7-D5-08-00-45-16-05-94-12-C7-00-00-FE-11-B8-43-C0-00-02-02-C6-33-64-02-30-39-D4-31-05-80-D7-1D-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42-42
dstMAC 98039b8fb5cc
srcMAC 98039b94c7d5
IPSize 1428
ip.tot_len 1428
srcIP 192.0.2.2
dstIP 198.51.100.2
IPProtocol 17
IPTOS 22
IPTTL 254
IPID 50962
UDPSrcPort 12345
UDPDstPort 54321
UDPBytes 1408
endSample   ----------------------

The forwarding performance information is carried in the inputPort, outputPort, egress_queue_id, queue_depth_bytes, and transit_delay_nS fields, which describe the performance observed by the sampled packet. The sampled packet header allows performance to be reported by specific hosts, protocols, ports, connections, etc.
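As a rough illustration of how these fields can be consumed from a script, the Python sketch below parses sflowtool's line-oriented "key value" text for one sample and decodes the Ethernet, IPv4, and UDP headers from the first 42 bytes of the headerBytes shown above. It assumes one sample at a time and an untagged IPv4 packet; the function names are hypothetical and not part of sflowtool.

```python
import ipaddress
import struct


def parse_sflowtool_sample(text):
    """Parse one sample of sflowtool text output ("key value" per line).

    Repeated keys such as flowBlock_tag keep only their last value, which is
    fine for the single-valued fields used here."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value.strip()
    return fields


def decode_ipv4_udp(header_hex):
    """Decode Ethernet/IPv4/UDP fields from sflowtool's dash-separated hex."""
    raw = bytes.fromhex(header_hex.replace("-", ""))
    (ethertype,) = struct.unpack("!H", raw[12:14])
    if ethertype != 0x0800:
        raise ValueError("not an untagged IPv4 packet")
    ip = raw[14:]
    ihl = (ip[0] & 0x0F) * 4                  # IPv4 header length in bytes
    hdr = {
        "tos": ip[1],
        "tot_len": struct.unpack("!H", ip[2:4])[0],
        "ttl": ip[8],
        "proto": ip[9],
        "src": str(ipaddress.IPv4Address(ip[12:16])),
        "dst": str(ipaddress.IPv4Address(ip[16:20])),
    }
    if hdr["proto"] == 17:                    # UDP: source port, destination port, length
        hdr["sport"], hdr["dport"], hdr["udp_len"] = struct.unpack("!HHH", ip[ihl:ihl + 6])
    return hdr


# Performance fields and the first 42 header bytes copied from the sample above.
SAMPLE = """
inputPort 2215
outputPort 2216
egress_queue_id 7
queue_depth_bytes 11354112
transit_delay_nS 839660224
headerBytes 98-03-9B-8F-B5-CC-98-03-9B-94-C7-D5-08-00-45-16-05-94-12-C7-00-00-FE-11-B8-43-C0-00-02-02-C6-33-64-02-30-39-D4-31-05-80-D7-1D
"""

sample = parse_sflowtool_sample(SAMPLE)
delay_ms = int(sample["transit_delay_nS"]) / 1e6
header = decode_ipv4_udp(sample["headerBytes"])
print(sample["inputPort"], sample["outputPort"], sample["egress_queue_id"], delay_ms, header)
```

The decoded addresses, ports, TOS, and TTL agree with the fields sflowtool prints for the same sample, which makes a convenient sanity check.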

Linux 4.11 kernel extends packet sampling support describes the Linux PSAMPLE interface used by the Host sFlow agent to receive packet samples. PSAMPLE has been extended in the Linux 5.13 kernel to add the performance metrics needed to populate the sFlow Transit Delay Structures: PSAMPLE_ATTR_OUT_TC (egress queue), PSAMPLE_ATTR_OUT_TC_OCC (egress queue depth), and PSAMPLE_ATTR_LATENCY (transit delay). The Host sFlow agent now exports the performance data when it is available. The decoded sFlow record above was generated by Host sFlow running on a hardware switch and shows measurements made by the switch ASIC.
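The mapping from the three new PSAMPLE attributes to the sFlow structures can be sketched as follows. This is not the Host sFlow agent's code: it assumes the netlink attributes have already been decoded into a hypothetical PsampleMeta object, and it emits a structure only when the driver supplied the corresponding attribute. The structure tags (enterprise 0, formats 1036, 1040, and 1039) are the ones shown in the sflowtool output above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PsampleMeta:
    """Hypothetical holder for already-decoded PSAMPLE attributes (None = absent)."""
    out_tc: Optional[int] = None      # PSAMPLE_ATTR_OUT_TC: egress queue
    out_tc_occ: Optional[int] = None  # PSAMPLE_ATTR_OUT_TC_OCC: egress queue depth (bytes)
    latency: Optional[int] = None     # PSAMPLE_ATTR_LATENCY: transit delay (nanoseconds)


def transit_delay_records(meta):
    """Build the sFlow extension records that the available attributes can fill."""
    records = []
    if meta.out_tc is not None:
        records.append({"tag": 1036, "type": "egress_queue", "egress_queue_id": meta.out_tc})
    if meta.out_tc_occ is not None:
        records.append({"tag": 1040, "type": "queue_depth", "queue_depth_bytes": meta.out_tc_occ})
    if meta.latency is not None:
        records.append({"tag": 1039, "type": "transit_delay", "transit_delay_nS": meta.latency})
    return records


full = transit_delay_records(PsampleMeta(out_tc=7, out_tc_occ=11354112, latency=839660224))
partial = transit_delay_records(PsampleMeta(latency=1500))  # a driver that reports latency only
print([r["type"] for r in full], [r["type"] for r in partial])
```

Treating each attribute as optional mirrors the behavior described above: a driver that reports only some of the measurements still produces useful telemetry.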

PSAMPLE and the Host sFlow agent are becoming the standard for sFlow monitoring of Linux-based operating systems such as Cumulus Linux, DENT, and SONiC. As ASIC vendors add these measurements to the PSAMPLE support in their device drivers, the measurements will automatically be included in the sFlow telemetry.

Support for the new extensions has also been added to the sFlow-RT real-time analytics engine. The open source sFlow-RT Flow Browser application shown in the screen shot above displays a real-time, up to the second, view of traffic based on the packet sample telemetry streaming from network devices (switches, routers, and hosts).

In the chart above, the value being plotted has been changed from Bits per Second and is now displaying flows with the highest transit delay (in nanoseconds). The specific device, ingress port, egress port, and egress queue are also identified. 
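Ranking the worst transit delay by device, ingress port, egress port, egress queue, and flow amounts to a grouped maximum over decoded samples. The sketch below assumes samples are dictionaries with sflowtool-style field names plus an agent address; the agent address and all but the first sample are hypothetical, and this is not how the Flow Browser itself computes the chart.

```python
from collections import defaultdict


def worst_transit_delay(samples, n=5):
    """Return the n groups with the largest transit delay seen.

    Groups by (agent, inputPort, outputPort, egress_queue_id, srcIP, dstIP);
    samples without transit_delay_nS (not reported by the ASIC) are skipped."""
    worst = defaultdict(int)
    for s in samples:
        if "transit_delay_nS" not in s:
            continue
        key = (s["agent"], s["inputPort"], s["outputPort"],
               s["egress_queue_id"], s["srcIP"], s["dstIP"])
        worst[key] = max(worst[key], s["transit_delay_nS"])
    return sorted(worst.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical samples; the first reuses the values decoded above.
samples = [
    {"agent": "10.0.0.1", "inputPort": 2215, "outputPort": 2216, "egress_queue_id": 7,
     "srcIP": "192.0.2.2", "dstIP": "198.51.100.2", "transit_delay_nS": 839660224},
    {"agent": "10.0.0.1", "inputPort": 2215, "outputPort": 2216, "egress_queue_id": 7,
     "srcIP": "192.0.2.2", "dstIP": "198.51.100.2", "transit_delay_nS": 120000},
    {"agent": "10.0.0.1", "inputPort": 2215, "outputPort": 2217, "egress_queue_id": 0,
     "srcIP": "192.0.2.3", "dstIP": "198.51.100.3", "transit_delay_nS": 5000},
    {"agent": "10.0.0.1", "inputPort": 2215, "outputPort": 2217, "egress_queue_id": 0,
     "srcIP": "192.0.2.3", "dstIP": "198.51.100.3"},  # no transit delay reported
]

for key, delay in worst_transit_delay(samples):
    print(key, delay)
```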

In the chart above, queue depth (in bytes) is displayed, showing that the nearly 12 Mbytes queue depth is responsible for the transit delay seen in the previous chart.
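A quick consistency check on that claim: if almost all of the transit delay is time spent waiting in the queue, the queue depth divided by the delay gives the rate at which the queue was draining. The sketch below applies this to the sample shown earlier. The constant drain rate and the neglect of forwarding-decision time are simplifying assumptions, not measured facts.

```python
def implied_drain_rate_bps(queue_depth_bytes, transit_delay_ns):
    """Bits per second needed to drain queue_depth_bytes in transit_delay_ns.

    Assumes the transit delay is entirely queueing delay and that the queue
    drains at a constant rate (both simplifications)."""
    if transit_delay_ns <= 0:
        raise ValueError("transit delay must be positive")
    return queue_depth_bytes * 8 / (transit_delay_ns / 1e9)


rate = implied_drain_rate_bps(11354112, 839660224)  # queue_depth_bytes, transit_delay_nS
print(f"{rate / 1e6:.1f} Mbit/s")  # prints "108.2 Mbit/s"
```

The arithmetic is consistent with a queue draining at roughly 108 Mbit/s; whether that matches the port speed depends on how the queue is scheduled, which the sample alone does not reveal.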

If the queue is full and the packet is dropped, the sFlow Dropped Packet Notification Structures extension allows the sFlow agent to report details of the dropped packet. Using sFlow to monitor dropped packets describes how the Host sFlow agent uses the Linux drop_monitor interface to implement the extension.

In the final chart above, the open source sFlow-RT Discard Browser application displays a sequence of packets being dropped by a switch as a host attempts, and fails, to establish a TCP connection. The reason for dropping the packets (an access control list) as well as device and ingress port where the packets were dropped are captured.

Transit delay and dropped packet monitoring leverage advanced instrumentation in the latest generation of network ASICs to provide valuable insight into network performance. Integration with industry standard streaming sFlow telemetry provides real-time network-wide visibility into traffic, performance, and errors.

Tuesday, March 9, 2021

InfluxDB 2.0 released


InfluxData advances possibilities of time series data with general availability of InfluxDB 2.0 announced the production release of InfluxDB 2.0. This article demonstrates how to import sFlow data into InfluxDB 2.0 using sFlow-RT in order to provide visibility into network traffic.

Real-time network and system metrics as a service describes how to use Docker Desktop to replay previously captured sFlow data. Follow the instructions in the article to start an instance of sFlow-RT.

Create a directory for InfluxDB to use to store data and configuration settings:
mkdir data
Now start InfluxDB using the pre-built influxdb image:
docker run --rm --name=influxdb -p 8086:8086 \
-v $PWD/data:/var/lib/influxdb2 influxdb:alpine \
--nats-max-payload-bytes=10000000

Note: sFlow-RT collects metrics from all the sFlow agents embedded in switches, routers, and servers. The default value of nats-max-payload-bytes (1048576) may be too small to hold all the metrics returned when sFlow-RT is queried. The error "nats: maximum payload exceeded" in the InfluxDB logs indicates that the limit needs to be increased. In this example, the value has been increased to 10000000.
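Before raising the limit, it can help to measure how large the sFlow-RT response actually is. A minimal sketch using the Python standard library, assuming sFlow-RT is reachable on localhost:8008 from where the script runs (from inside the InfluxDB container the host name would be host.docker.internal, as in the scraper URLs); the helper names are hypothetical.

```python
import urllib.request

DEFAULT_NATS_MAX_PAYLOAD = 1048576  # default nats-max-payload-bytes quoted above


def response_size(url, timeout=10):
    """Fetch url and return the size of the response body in bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def payload_ok(size_bytes, limit=DEFAULT_NATS_MAX_PAYLOAD):
    """True if a scrape of size_bytes fits within the NATS payload limit."""
    return size_bytes <= limit
```

For example, pass the result of response_size("http://localhost:8008/prometheus/metrics/ALL/ALL/txt") to payload_ok(); a False result means the scrape would exceed the default limit and nats-max-payload-bytes needs to be raised.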

Now access the InfluxDB web interface at http://localhost:8086/

The screen capture above shows three scrapers configured in InfluxDB 2.0:
  1. sflow-analyzer
    URL: http://host.docker.internal:8008/prometheus/analyzer/txt
  2. sflow-metrics
    URL: http://host.docker.internal:8008/prometheus/metrics/ALL/ALL/txt
  3. sflow-flow-src-dst
    URL: http://host.docker.internal:8008/app/prometheus/scripts/export.js/flows/ALL/txt?metric=flow_src_dst_bps&key=ipsource,ipdestination&value=bytes&aggMode=max&maxFlows=100&minValue=1000&scale=8
The first collects metrics about the performance of the sFlow-RT analytics engine, the second collects all the metrics exported by the sFlow agents, and the third collects a flow metric, flow_src_dst_bps, reporting traffic between source and destination IP addresses.
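The third scraper URL is easier to read, and to adapt, when built from its parameters. The sketch below reproduces it with Python's urllib.parse; the parameter names come straight from the URL above, while the function name and its defaults are illustrative. Passing safe="," keeps the comma in the key list unescaped so the result matches the URL as written.

```python
from urllib.parse import urlencode

SFLOW_RT = "http://host.docker.internal:8008"


def flow_export_url(metric, keys, value="bytes", agg_mode="max",
                    max_flows=100, min_value=1000, scale=8, base=SFLOW_RT):
    """Build a Prometheus export URL for a flow metric served by sFlow-RT."""
    params = {
        "metric": metric,         # name given to the exported metric
        "key": ",".join(keys),    # flow key fields
        "value": value,
        "aggMode": agg_mode,
        "maxFlows": max_flows,
        "minValue": min_value,
        "scale": scale,           # presumably bytes to bits, matching the _bps name
    }
    return base + "/app/prometheus/scripts/export.js/flows/ALL/txt?" + urlencode(params, safe=",")


print(flow_export_url("flow_src_dst_bps", ["ipsource", "ipdestination"]))
```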
InfluxDB 2.0 now includes the data exploration and dashboard building capabilities that were previously in the separate Chronograf application. The screen capture above shows a simple chart trending the flow metric.

Monday, March 1, 2021

DDoS Mitigation with Juniper, sFlow, and BGP Flowspec

Real-time DDoS mitigation using BGP RTBH and FlowSpec, DDoS protection of local address space, Pushing BGP Flowspec rules to multiple routers, Monitoring DDoS mitigation, and Docker DDoS testbed demonstrate how sFlow and BGP Flowspec are combined by the DDoS Protect application running on the sFlow-RT real-time analytics engine to automatically detect and block DDoS attacks.

This article discusses how to deploy the DDoS Protect application in a Juniper Networks environment. Juniper has a long history of supporting BGP Flowspec on its routing platforms and has added support for sFlow to its entire product range, see sFlow available on Juniper MX series routers.

First, Junos doesn't provide a way to connect to the non-standard BGP port (1179) that sFlow-RT uses by default. Allowing sFlow-RT to open the standard BGP port (179), a privileged port, requires giving the service additional permissions.

docker run --rm --net=host --sysctl net.ipv4.ip_unprivileged_port_start=0 \
sflow/ddos-protect -Dbgp.port=179

The above command launches the prebuilt sflow/ddos-protect Docker image. Alternatively, if sFlow-RT has been installed as a deb / rpm package, then the required permissions can be added to the service.

sudo systemctl edit sflow-rt.service
Type the above command to edit the service configuration and add the following lines:
[Service]
AmbientCapabilities=CAP_NET_BIND_SERVICE
Next, edit the sFlow-RT configuration file for the DDoS Protect application:
sudo vi /usr/local/sflow-rt/conf.d/ddos-protect.conf
and add the line:
bgp.port=179
Finally, restart sFlow-RT:
sudo systemctl restart sflow-rt
The application is now listening for BGP connections on TCP port 179.
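One quick way to confirm this from the host is a plain TCP connection attempt. A minimal sketch using only the Python standard library; the helper name is hypothetical, and a successful connection only shows that something is accepting connections on the port, not that the BGP session will be established.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, ...
        return False


print("BGP port open:", is_listening("127.0.0.1", 179))
```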

Now configure the router to send sFlow telemetry to sFlow-RT - see Junos: sFlow Monitoring Technology
set protocols sflow collector 192.168.65.2 udp-port 6343
set protocols sflow polling-interval 20
set protocols sflow sample-rate ingress 1000
set protocols sflow interfaces ge-0/0/0
set protocols sflow interfaces ge-0/0/1
...
For example, the above commands enable sFlow monitoring on a Juniper MX router. See sFlow-RT Agents for recommended sFlow configuration settings.
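The sample-rate and polling-interval settings determine how much telemetry each interface generates. The back-of-the-envelope sketch below estimates packet samples per second for a given load, counter samples per minute, and the usual scale-up of a sampled count. It assumes random 1-in-N sampling, ignores Ethernet framing overhead, and uses a hypothetical fully loaded 1 Gbit/s interface carrying 1500-byte packets.

```python
def samples_per_second(link_bps, utilization, avg_packet_bytes, sampling_rate):
    """Expected flow samples/s with 1-in-sampling_rate packet sampling."""
    packets_per_second = link_bps * utilization / (avg_packet_bytes * 8)
    return packets_per_second / sampling_rate


def counter_samples_per_minute(interfaces, polling_interval_s):
    """Counter samples/min when each interface is polled every polling_interval_s."""
    return interfaces * 60 / polling_interval_s


def estimate_packets(sampled_packets, sampling_rate):
    """Scale a sampled packet count up to an estimate of the total packet count."""
    return sampled_packets * sampling_rate


# sample-rate ingress 1000 and polling-interval 20, as configured above
print(samples_per_second(1e9, 1.0, 1500, 1000))  # about 83.3 samples/s
print(counter_samples_per_minute(2, 20))         # ge-0/0/0 and ge-0/0/1: 6.0
print(estimate_packets(250, 1000))               # 250000
```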

Also configure a BGP Flowspec session with sFlow-RT - see Junos: Multiprotocol BGP.
policy-options {
    policy-statement ACCEPT_ALL {
        from protocol bgp;
        then accept;
    }
}
routing-options {
    router-id 1.1.1.1;
    autonomous-system 65000;
}
protocols {
    bgp {
        group sflow-rt {
            type internal;
            local-address 172.17.0.2;
            family inet {
                unicast;
                flow {
                    no-validate ACCEPT_ALL;
                }
            }
            family inet6 {
                unicast;
                flow {
                    no-validate ACCEPT_ALL;
                }
            }
            neighbor 192.168.65.2 {
                import ACCEPT_ALL;
                peer-as 65000;
            }
        }
    }
}
The above configuration establishes the BGP Flowspec session with sFlow-RT.

Real-time DDoS mitigation using BGP RTBH and FlowSpec describes how to simulate a DDoS UDP amplification attack in order to test the automated detection and control functionality.  
root@07358a106c21> show route table inetflow.0 detail    

inetflow.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
192.0.2.129,*,proto=17,srcport=53/term:N/A (1 entry, 0 announced)
        *BGP    Preference: 170/-101
                Next hop type: Fictitious, Next hop index: 0
                Address: 0x55653aae979c
                Next-hop reference count: 1
                Next hop: 
                State: <Active Int Ext SendNhToPFE>
                Local AS: 65000 Peer AS: 65000
                Age: 6 
                Validation State: unverified 
                Task: BGP_65000.192.168.65.2
                AS path: I 
                Communities: traffic-rate:0:0
                Accepted
                Localpref: 100
                Router ID: 0.6.6.6
Command line output from the router shown above verifies that a Flowspec control blocking the amplification attack has been received. The control will remain in place for 60 minutes (the configured timeout), after which it will be automatically withdrawn. If the attack is still in progress it will be immediately detected and the control reapplied.
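When checking rules like this from a script, it helps to pull apart the route name and the traffic-rate community. The sketch below handles only the format shown above, assuming the destination prefix comes first, then the source prefix, then key=value match terms, followed by /term:. The traffic-rate interpretation follows the BGP Flowspec specification (RFC 8955), where the rate is in bytes per second and a rate of 0 means discard. Function names are hypothetical.

```python
def parse_flow_route(name):
    """Split a Junos inetflow.0 route name into its match fields."""
    match, _, term = name.partition("/term:")
    destination, source, *terms = match.split(",")
    rule = {"destination": destination, "source": source, "term": term}
    for item in terms:
        key, _, value = item.partition("=")
        rule[key] = value
    return rule


def describe_traffic_rate(community):
    """Interpret a 'traffic-rate:<asn>:<rate>' community (rate in bytes/s)."""
    _, _asn, rate_text = community.split(":")
    rate = float(rate_text)
    return "discard" if rate == 0 else f"limit to {rate:g} bytes/s"


print(parse_flow_route("192.0.2.129,*,proto=17,srcport=53/term:N/A"))
print(describe_traffic_rate("traffic-rate:0:0"))
```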

DDoS Protect can mitigate a wide range of common attacks, including: NTP, DNS, Memcached, SNMP, and SSDP amplification attacks; IP, UDP, ICMP and TCP flood attacks; and IP fragmentation attacks. Mitigation options include: remote triggered black hole (RTBH), filtering, rate limiting, DSCP marking, and redirection. IPv6 is fully supported in detection and mitigation of each of these attack types.

Monday, January 25, 2021

Topology


Real-time network and system metrics as a service describes how to use data captured from the network shown above to explore the functionality of sFlow-RT real-time analytics software. This article builds on the previous article to show how knowledge of network topology can be used to enhance analytics; see Topology for documentation.

First, follow the instructions in the previous article and start an instance of sFlow-RT using the captured sFlow data.
curl -O https://raw.githubusercontent.com/sflow-rt/fabric-view/master/demo/topology.json
Then, download the topology file for the example.
curl -X PUT -H "Content-Type: application/json" -d @topology.json \
http://localhost:8008/topology/json
Install the topology using the sFlow-RT REST API.
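The same REST calls can be made from a program. A minimal Python sketch that prepares a PUT of a JSON document to sFlow-RT, equivalent to the curl command above; the helper name is hypothetical and only the standard library is used. The request is built separately from sending it so that it can be inspected first.

```python
import json
import urllib.request

SFLOW_RT = "http://localhost:8008"


def put_json_request(path, document, base=SFLOW_RT):
    """Build a PUT request carrying document as JSON (send it with urlopen)."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(base + path, data=body, method="PUT",
                                  headers={"Content-Type": "application/json"})


req = put_json_request("/flow/srcdst/json", {"keys": "ipsource,ipdestination", "value": "bytes"})
print(req.get_method(), req.full_url)
```

To install the topology, load topology.json with json.load() and send put_json_request("/topology/json", topology) with urllib.request.urlopen(); the same helper can define the srcdst flow used below.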
curl http://localhost:8008/topology/json
Retrieve the topology.
{
 "version": 0,
 "links": {
  "L1": {
   "node2": "spine1",
   "node1": "leaf1",
   "port1": "swp1s0",
   "port2": "swp49"
  },
  "L2": {
   "node2": "spine1",
   "node1": "leaf1",
   "port1": "swp1s1",
   "port2": "swp50"
  },
  "L3": {
   "node2": "spine2",
   "node1": "leaf1",
   "port1": "swp1s2",
   "port2": "swp51"
  },
  "L4": {
   "node2": "spine2",
   "node1": "leaf1",
   "port1": "swp1s3",
   "port2": "swp52"
  },
  "L5": {
   "node2": "spine2",
   "node1": "leaf2",
   "port1": "swp1s0",
   "port2": "swp49"
  },
  "L6": {
   "node2": "spine2",
   "node1": "leaf2",
   "port1": "swp1s1",
   "port2": "swp50"
  },
  "L7": {
   "node2": "spine1",
   "node1": "leaf2",
   "port1": "swp1s2",
   "port2": "swp51"
  },
  "L8": {
   "node2": "spine1",
   "node1": "leaf2",
   "port1": "swp1s3",
   "port2": "swp52"
  }
 }
}
The JSON topology structure contains the eight links connecting the leaf and spine switches in the diagram, identifying the nodes and ports associated with each link.
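A quick way to sanity-check a topology file before posting it is to count links per switch and per switch pair, and to make sure no port appears in more than one link. The sketch assumes the structure shown above (a links object whose entries name node1/port1 and node2/port2); the function name is hypothetical.

```python
from collections import Counter


def link_summary(topology):
    """Count links per node and per unordered node pair, and list reused ports."""
    per_node, per_pair, ports = Counter(), Counter(), Counter()
    for link in topology["links"].values():
        a, b = link["node1"], link["node2"]
        per_node[a] += 1
        per_node[b] += 1
        per_pair[tuple(sorted((a, b)))] += 1
        ports[(a, link["port1"])] += 1
        ports[(b, link["port2"])] += 1
    duplicates = [p for p, n in ports.items() if n > 1]  # each port should appear once
    return per_node, per_pair, duplicates


# The first two links from the topology above.
example = {"links": {
    "L1": {"node1": "leaf1", "port1": "swp1s0", "node2": "spine1", "port2": "swp49"},
    "L2": {"node1": "leaf1", "port1": "swp1s1", "node2": "spine1", "port2": "swp50"},
}}
print(link_summary(example))
```

For the full file, pass the result of json.load() on topology.json; for the example network every leaf should show two links to each spine.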
curl -H "Content-Type:application/json" -X PUT \
--data '{"keys":"ipsource,ipdestination","value":"bytes"}' \
http://localhost:8008/flow/srcdst/json
Now define the srcdst flow metric described in the previous article.
curl "http://localhost:8008/activeflows/TOPOLOGY/srcdst/json?aggMode=edge"
Knowledge of topology opens up additional options when querying for flows. For example, the above command only considers devices that are part of the topology and sums flows entering edge device access ports, i.e. traffic entering the leaf switches from the servers.
[
 {
  "flowN": 1,
  "value": 248800.14028768288,
  "key": "10.4.3.2,10.4.4.2"
 },
 {
  "flowN": 1,
  "value": 176879.3798722214,
  "key": "10.4.1.2,10.4.2.2"
 },
 {
  "flowN": 1,
  "value": 526.0366052656848,
  "key": "10.4.4.2,10.4.3.2"
 },
 {
  "flowN": 1,
  "value": 375.06686598182193,
  "key": "10.4.2.2,10.4.1.2"
 }
]
The result accurately reports the amount of traffic being exchanged between the servers, discarding duplicate data reported as traffic flows traverse the links between switches.
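The de-duplication can be pictured with a toy model. In the sketch below, each observation is a flow rate reported for traffic arriving on one switch port; ports that appear in the topology's links are inter-switch ports, and all other ports on topology nodes are treated as access ports. Summing everywhere counts a flow once per hop, while the edge sum counts it only where it enters the fabric. This is a conceptual illustration with hypothetical numbers and an assumed access port name (swp32s0), not sFlow-RT's implementation.

```python
from collections import defaultdict


def inter_switch_ports(topology):
    """(node, port) pairs that connect two switches according to the topology."""
    ports = set()
    for link in topology["links"].values():
        ports.add((link["node1"], link["port1"]))
        ports.add((link["node2"], link["port2"]))
    return ports


def aggregate(observations, topology, mode):
    """Combine (node, port, flow_key, value) observations using 'sum', 'max' or 'edge'."""
    isl = inter_switch_ports(topology)
    nodes = {node for node, _port in isl}
    totals = defaultdict(float)
    for node, port, key, value in observations:
        if mode == "sum":
            totals[key] += value
        elif mode == "max":
            totals[key] = max(totals[key], value)
        elif mode == "edge":
            if node in nodes and (node, port) not in isl:  # ingress on an access port
                totals[key] += value
        else:
            raise ValueError(f"unknown mode {mode!r}")
    return dict(totals)


topology = {"links": {
    "L1": {"node1": "leaf1", "port1": "swp1s0", "node2": "spine1", "port2": "swp49"},
    "L8": {"node1": "leaf2", "port1": "swp1s3", "node2": "spine1", "port2": "swp52"},
}}
flow = "10.4.1.2,10.4.2.2"
observations = [("leaf1", "swp32s0", flow, 1000.0),  # enters the fabric from the server
                ("spine1", "swp49", flow, 1020.0),   # same traffic, leaf1 -> spine1
                ("leaf2", "swp1s3", flow, 990.0)]    # same traffic, spine1 -> leaf2
for mode in ("sum", "max", "edge"):
    print(mode, aggregate(observations, topology, mode))
```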
-Dbrowse-flows.agents=TOPOLOGY -Dbrowse-flows.aggMode=edge
Adding the above arguments to the end of the command line used to start sFlow-RT configures the Flow Browser application to use the topology de-duplication method.
Click on the link below to plot a graph of the top IP Protocols using the browse-flows application (screen capture shown above):
http://localhost:8008/app/browse-flows/html/index.html?keys=ipprotocol&value=bps
Note: No data will be shown until the topology is posted to sFlow-RT.
function print(label,obj) {
  logInfo(label+"="+JSON.stringify(obj));
}

setFlow('protocol',{keys:'ipprotocol',value:'bytes'});

setIntervalHandler(function() {
  print("locate_mac",topologyLocateHostMac('000AF725C062'));
  print("locate_ip",topologyLocateHostIP('10.4.3.2'));
  print("flow_max",activeFlows('ALL','protocol',5,0,'max'));
  print("flow_sum",activeFlows('ALL','protocol',5,0,'sum'));
  print("flow_edge",activeFlows('TOPOLOGY','protocol',5,0,'edge'));
});
The demo.js script shown above uses sFlow-RT's embedded scripting API, see Writing Applications. The script defines a flow called protocol that tracks top IP protocols and prints out the top flows using different aggregation methods. The script also demonstrates an additional capability made possible when topology is known: the topologyLocateHostMac() and topologyLocateHostIP() functions locate addresses to the edge ports connecting them to the network.
-Dscript.file=$PWD/demo.js
Run the script by adding the above argument to the end of the command line used to run sFlow-RT.
2021-01-22T17:08:35-08:00 INFO: locate_mac=[{"ipaddress":"10.4.3.2","node":"leaf1","agent":"192.168.0.11","ifindex":"38","port":"swp32s1","mac":"000AF725C062"}]
2021-01-22T17:08:35-08:00 INFO: locate_ip=[{"ipaddress":"10.4.3.2","node":"leaf1","agent":"192.168.0.11","ifindex":"38","port":"swp32s1","mac":"000AF725C062"}]
2021-01-22T17:08:35-08:00 INFO: flow_max=[{"flowN":16,"agent":"192.168.0.14","value":1208583376.782055,"dataSource":"54","key":"6"}]
2021-01-22T17:08:35-08:00 INFO: flow_sum=[{"flowN":16,"value":6615963204.827695,"key":"6"}]
2021-01-22T17:08:35-08:00 INFO: flow_edge=[{"flowN":4,"value":2104039983.2917378,"key":"6"}]
The output from the script shows that the addresses were located to leaf1 port swp32s1. The flow_max and flow_sum queries don't use the topology and combine data from all 16 data sources (switch ports) that are reporting traffic. The sum mode returns the largest value since traffic is added for every data source. The max mode finds the data source reporting the largest value for the flow and reports that value (agent: 192.168.0.14, dataSource: 54). The edge mode is equivalent to the REST query used earlier.
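Conceptually, locating a host means finding where its address is seen arriving on a port that does not connect to another switch. The sketch below illustrates that idea using source MAC sightings from packet samples together with the topology's links; the sightings, the access port names, and the function name are hypothetical, and sFlow-RT's own topologyLocateHostMac() may work differently.

```python
def locate_host_mac(mac, sightings, topology):
    """Return the (node, port) access ports where mac was seen as a source address.

    sightings: iterable of (node, port, source_mac) taken from ingress packet samples."""
    isl = set()
    for link in topology["links"].values():
        isl.add((link["node1"], link["port1"]))
        isl.add((link["node2"], link["port2"]))
    mac = mac.upper()
    return sorted({(node, port) for node, port, src in sightings
                   if src.upper() == mac and (node, port) not in isl})


topology = {"links": {
    "L1": {"node1": "leaf1", "port1": "swp1s0", "node2": "spine1", "port2": "swp49"},
    "L8": {"node1": "leaf2", "port1": "swp1s3", "node2": "spine1", "port2": "swp52"},
}}
sightings = [("leaf1", "swp32s1", "000AF725C062"),
             ("spine1", "swp49", "000af725c062"),   # seen again on an inter-switch link
             ("leaf2", "swp1s3", "000AF725C062"),
             ("leaf2", "swp32s0", "AAAAAAAAAAAA")]  # a different host
print(locate_host_mac("000AF725C062", sightings, topology))
```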

Mininet is a network emulator that you can run on your laptop in a virtual machine (e.g. using Multipass) that provides a useful platform for building virtual topologies and exploring topology related analytics. Mininet dashboard, Mininet weathermap, and Mininet, ONOS, and segment routing provide examples.


Ideally the network configuration and topology will be available in a centralized repository that can be queried to generate the information required by sFlow-RT. Alternatively, Link Layer Discovery Protocol (LLDP) data retrieved from network devices can be used to construct the topology. Fabric Visibility, Arista EOS CloudVision, and Fabric visibility with Cumulus Linux provide examples.
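As a rough sketch of the LLDP approach, the function below turns neighbor entries into the links structure shown earlier. It assumes each entry has already been reduced to (local node, local port, remote node, remote port), for example from LLDP data collected from each switch. The same link is normally reported from both ends, so entries are de-duplicated before numbering. Names are hypothetical, only the links object is built, and collection of the LLDP data is not shown.

```python
import json


def lldp_to_topology(neighbors):
    """Build a topology links object from (node, port, peer_node, peer_port) tuples."""
    seen = set()
    links = {}
    for node, port, peer, peer_port in neighbors:
        ends = tuple(sorted([(node, port), (peer, peer_port)]))  # same link from either end
        if ends in seen:
            continue
        seen.add(ends)
        (n1, p1), (n2, p2) = ends
        links[f"L{len(links) + 1}"] = {"node1": n1, "port1": p1, "node2": n2, "port2": p2}
    return {"links": links}


neighbors = [
    ("leaf1", "swp1s0", "spine1", "swp49"),
    ("spine1", "swp49", "leaf1", "swp1s0"),  # same link reported by the spine
    ("leaf1", "swp1s1", "spine1", "swp50"),
]
print(json.dumps(lldp_to_topology(neighbors), indent=1))
```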