Monday, August 14, 2023

Containerlab dashboard

The GitHub sflow-rt/containerlab project contains example network topologies for the Containerlab network emulation tool that demonstrate real-time streaming telemetry in realistic data center topologies and network configurations. The examples use the same FRRouting (FRR) engine that is part of SONiC, NVIDIA Cumulus Linux, and DENT network operating systems. Containerlab can be used to experiment before deploying solutions into production. Examples include: tracing ECMP flows in leaf and spine topologies, EVPN visibility, and automated DDoS mitigation using BGP Flowspec and RTBH controls.
The screen capture at the top of this article shows a real-time dashboard displaying up-to-the-second traffic analytics gathered from the 5-stage Clos fabric shown above. This article walks through the steps needed to run the example.
git clone https://github.com/sflow-rt/containerlab.git
cd containerlab
./run-clab
Run the above commands to download the project and run Containerlab on a system with Docker installed. Docker Desktop is a convenient way to run the labs on a laptop.
containerlab deploy -t clos5.yml
Start the emulation.
./topo.py clab-clos5
Post the topology to the sFlow-RT REST API. Connect to http://localhost:8008/app/containerlab-dashboard/html/ to access the Dashboard shown at the top of this article.
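For readers curious about what posting a topology involves, the following Python sketch builds a small topology document and prepares a PUT request for it. It is not the project's topo.py script. The /topology/json path, the links / node1 / port1 / node2 / port2 document layout, and the node and port names are assumptions made for illustration and should be checked against the sFlow-RT documentation.

```python
import json
import urllib.request


def build_topology(links):
    """Return a topology document from (node1, port1, node2, port2) tuples."""
    return {
        "links": {
            f"L{i}": {"node1": n1, "port1": p1, "node2": n2, "port2": p2}
            for i, (n1, p1, n2, p2) in enumerate(links, start=1)
        }
    }


def topology_request(topology, base_url="http://localhost:8008"):
    """Build (but do not send) a PUT request carrying the topology as JSON."""
    return urllib.request.Request(
        f"{base_url}/topology/json",  # assumed REST path
        data=json.dumps(topology).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical links; real names come from the Containerlab topology file.
topology = build_topology([
    ("leaf1", "eth1", "spine1", "eth1"),
    ("leaf1", "eth2", "spine2", "eth1"),
])
request = topology_request(topology)
# To send the request: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the JSON before it reaches the analytics engine.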
docker exec -it clab-clos5-h1 iperf3 -c 172.16.4.2
Each of the hosts in the network has an iperf3 server, so running the above command will test bandwidth between h1 and h4.
docker exec -it clab-clos5-h1 iperf3 -c 2001:172:16:4::2
Generate a large IPv6 flow between h1 and h4. The traffic flows should immediately appear in the Top Flows chart. You can check the accuracy by comparing the values reported by iperf3 with those shown in the chart.
Click on the Topology tab to see a real-time weathermap of traffic flowing over the topology. See how repeated iperf3 tests take different ECMP (equal-cost multi-path) routes across the network.
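Repeated tests can take different paths because ECMP typically selects a next hop from a hash of packet header fields, and each new iperf3 connection uses a different source port. The Python sketch below illustrates the idea only. The hash function, the spine names, and the address used for h1 are assumptions, and the hashing used by the Linux kernel differs in detail.

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Pick one of the equal-cost next hops from a hash of the flow 5-tuple."""
    key = "|".join(str(field) for field in flow).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


spines = ["spine1", "spine2"]  # hypothetical uplinks from a leaf switch

# (src_ip, dst_ip, protocol, src_port, dst_port)
# 172.16.1.2 is an assumed address for h1; 5201 is the default iperf3 port.
flow = ("172.16.1.2", "172.16.4.2", 6, 40000, 5201)

# The same flow always hashes to the same path, so packets stay in order.
same_flow_paths = {ecmp_next_hop(flow, spines) for _ in range(5)}

# New connections use new source ports, so they may land on another path.
varied_paths = {
    ecmp_next_hop(flow[:3] + (port, 5201), spines)
    for port in range(40000, 40064)
}
```

Hashing on the 5-tuple keeps every packet of a connection on one path while spreading separate connections across the fabric.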
docker exec -it clab-clos5-leaf1 vtysh
Linux with open source routing software (FRRouting) is an accessible alternative to vendor routing stacks (no registration / license required, no restriction on copying means you can share images on Docker Hub, no need for virtual machines). FRRouting is popular in production network operating systems (e.g. Cumulus Linux, SONiC, DENT, etc.) and the VTY shell provides an industry standard CLI for configuration, so labs built around FRR allow realistic network configurations to be explored.
Connect to http://localhost:8008/ to access the main sFlow-RT status page, additional applications, and the REST API. See Getting Started for more information.
containerlab destroy -t clos5.yml
When you are finished, run the above command to stop the containers and free the resources associated with the emulation. Try out other topologies from the project to explore topics such as DDoS mitigation, BGP Flowspec, and EVPN.

Moving the monitoring solution from Containerlab to production is straightforward since sFlow is widely implemented in datacenter equipment from vendors including: A10, Arista, Aruba, Cisco, Edge-Core, Extreme, Huawei, Juniper, NEC, Netgear, Nokia, NVIDIA, Quanta, and ZTE. In addition, the open source Host sFlow agent makes it easy to extend visibility beyond the physical network into the compute infrastructure.

Tuesday, August 8, 2023

Grafana Network Weathermap

The screen capture above shows a simple network weathermap, displaying a network topology with links animated by real-time network analytics.
Hovering over a link in the weathermap pops up a trend chart showing traffic on the link over the last 30 minutes.

Deploy real-time network dashboards using Docker compose describes how to quickly deploy a real-time network analytics stack that includes the sFlow-RT analytics engine, Prometheus time series database, and Grafana to create dashboards. This article describes how to extend the example using the Grafana Network Weathermap Plugin to display network topologies like the ones shown here.

First, add a dashboard panel and select the Network Weathermap visualization. Next define the three metrics shown above. The ifinoctets and ifoutoctets metrics need to be scaled by 8 to convert from bytes per second to bits per second. Creating a custom legend entry makes it easier to select the metric instances to associate with weathermap links.
Add a color scale that will be used to color links by link utilization. Defining the scale first ensures that links will be displayed correctly when they are added later.
Add the nodes to the canvas and drag them to their desired locations. There is a large library of icons that can be used to indicate the node types. The Enable Node Grid Snapping option makes it easier to line up nodes.
Add links to connect the nodes. Each link needs to be associated with in/out metrics and a link speed. Setting the Side Anchor Point values correctly ensures a clean layout.
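The arithmetic behind the link colors is simple: octets per second are multiplied by 8 to get bits per second, and the result is divided by the link speed to get utilization. The Python sketch below works through it. The color scale thresholds are hypothetical and the functions are not the plugin's implementation.

```python
def bits_per_second(octets_per_second):
    """Convert an ifinoctets / ifoutoctets rate from bytes/s to bits/s."""
    return octets_per_second * 8


def utilization_percent(octets_per_second, link_speed_bps):
    """Return link utilization as a percentage of the link speed."""
    if link_speed_bps <= 0:
        raise ValueError("link speed must be positive")
    return 100.0 * bits_per_second(octets_per_second) / link_speed_bps


def color_for(utilization, scale):
    """Return the color of the highest threshold not exceeding utilization.

    scale is a list of (threshold_percent, color) sorted by threshold.
    """
    color = scale[0][1]
    for threshold, name in scale:
        if utilization >= threshold:
            color = name
    return color


# Hypothetical color scale, not taken from the plugin defaults.
SCALE = [(0, "green"), (50, "yellow"), (80, "red")]

# 12.5 MB/s on a 1 Gbit/s link is 100 Mbit/s, or 10% utilization.
example = utilization_percent(12_500_000, 1_000_000_000)
```

Getting the link speed right matters as much as the scaling, since an incorrect speed shifts every link to the wrong color band.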

Network weathermaps are only one method of displaying network telemetry - work through the examples in Deploy real-time network dashboards using Docker compose to learn how to construct dashboards of trend charts and analyze traffic flows.

Thursday, July 13, 2023

Deploy real-time network dashboards using Docker compose


This article demonstrates how to use docker compose to quickly deploy a real-time network analytics stack that includes the sFlow-RT analytics engine, Prometheus time series database, and Grafana to create dashboards.
git clone https://github.com/sflow-rt/prometheus-grafana.git
cd prometheus-grafana
./start.sh
Download the sflow-rt/prometheus-grafana project from GitHub on a system with Docker installed and start the containers. The start.sh script runs docker compose to bring up the containers specified in the compose.yml file, passing in user information so that the containers have the correct permissions to write data files in the prometheus and grafana directories.
All the Docker images in this example are available for both x86 and ARM processors, so this stack can be deployed on Intel/AMD platforms as well as Apple M1/M2 or Raspberry Pi. Raspberry Pi 4 real-time network analytics describes how to configure a Raspberry Pi 4 to run Docker and perform real-time network analytics and is a simple way to run this stack for smaller networks.

Configure sFlow Agents in network devices to stream sFlow telemetry to the host running the analytics stack. See Getting Started for information on how to verify that sFlow telemetry is being received.

Connect to the Grafana web interface on port 3000 using the default user name and password (admin/admin). You will be prompted to change the password.
Select the option to Import a new Dashboard.
Enter the code 11201 to import sFlow-RT Network Interfaces dashboard from Grafana.com and click on the Load button.
Select the sflow_rt_data Prometheus database and click on the Import button.
The dashboard should appear showing top interfaces by Utilization, Discards and Errors.
Repeat the steps to add the sFlow-RT Health dashboard, code 11096.

The sFlow-RT Countries and Networks dashboard is an example of a flow based metric, plotting information about source and destination countries and provider networks based on traffic analytics.

Prometheus has already been configured to gather metrics for the previous two examples, but to run this third example, we need to modify the Prometheus configuration to gather the flow based metrics needed for the dashboard.

  - job_name: 'sflow-rt-countries'
    metrics_path: /app/prometheus/scripts/export.js/flows/ALL/txt
    static_configs:
      - targets: ['sflow-rt:8008']
    params:
      metric: ['sflow_country_bps']
      key:
        - 'null:[country:ipsource:both]:unknown'
        - 'null:[country:ipdestination:both]:unknown'
      label: ['src','dst']
      value: ['bytes']
      scale: ['8']
      aggMode: ['sum']
      minValue: ['1000']
      maxFlows: ['100']

  - job_name: 'sflow-rt-asns'
    metrics_path: /app/prometheus/scripts/export.js/flows/ALL/txt
    static_configs:
      - targets: ['sflow-rt:8008']
    params:
      metric: ['sflow_asn_bps']
      key:
        - 'null:[asn:ipsource:both]:unknown'
        - 'null:[asn:ipdestination:both]:unknown'
      label: ['src','dst']
      value: ['bytes']
      scale: ['8']
      aggMode: ['sum']
      minValue: ['1000']
      maxFlows: ['100']
Edit the prometheus/prometheus.yml file and add the above lines to the end of the file.
docker restart prometheus
Restart the prometheus container to pick up the new configuration and start collecting the data.
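Prometheus sends each entry under params as a URL query parameter, repeating the parameter name when a list has more than one value. The Python sketch below rebuilds the URL that the sflow-rt-countries job would request, which is handy for testing a flow definition by hand before adding it to the configuration. The helper function names are illustrative only.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Values copied from the sflow-rt-countries scrape job above.
TARGET = "sflow-rt:8008"
METRICS_PATH = "/app/prometheus/scripts/export.js/flows/ALL/txt"
COUNTRY_PARAMS = {
    "metric": ["sflow_country_bps"],
    "key": [
        "null:[country:ipsource:both]:unknown",
        "null:[country:ipdestination:both]:unknown",
    ],
    "label": ["src", "dst"],
    "value": ["bytes"],
    "scale": ["8"],
    "aggMode": ["sum"],
    "minValue": ["1000"],
    "maxFlows": ["100"],
}


def scrape_url(target, metrics_path, params, scheme="http"):
    """Return the URL requested for a scrape job with the given params."""
    return f"{scheme}://{target}{metrics_path}?{urlencode(params, doseq=True)}"


def query_params(url):
    """Decode the query string of a URL back into a dict of lists."""
    return parse_qs(urlsplit(url).query)


url = scrape_url(TARGET, METRICS_PATH, COUNTRY_PARAMS)
```

The sflow-rt host name only resolves inside the compose network, so substitute localhost:8008 when requesting the URL from the Docker host.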
Add dashboard 11146 to load the sFlow-RT Countries and Networks dashboard.

Getting Started describes how to use the sFlow-RT Flow Browser and Metrics Browser applications to explore the data that is available (the sFlow-RT web interface is exposed on port 8008). Once you have found a useful metric, add it to the set of metrics for Prometheus (the Prometheus web interface is exposed on port 9090) to collect and use Grafana to build dashboards that incorporate the new metrics. Flow metrics with Prometheus and Grafana describes how Prometheus can use sFlow-RT's REST API to define and retrieve traffic flow based metrics like the ones in the Countries and Networks dashboard. 

Sunday, June 11, 2023

Raspberry Pi 4 real-time network analytics

CanaKit Raspberry Pi 4 EXTREME Kit - Aluminum
This article describes how to build an inexpensive Raspberry Pi 4 based server for real-time flow analytics of industry standard sFlow streaming telemetry. Support for sFlow is widely implemented in datacenter equipment from vendors including: A10, Arista, Aruba, Cisco, Edge-Core, Extreme, Huawei, Juniper, NEC, Netgear, Nokia, NVIDIA, Quanta, and ZTE.

In this example, we will use an 8G Raspberry Pi 4 running Raspberry Pi OS Lite (64-bit). The easiest way to format a memory card and install the operating system is to use the Raspberry Pi Imager (shown above).
Click on the gear icon to set a user and password and enable ssh access. These initial settings allow the Raspberry Pi to be accessed over the network without having to attach a screen, keyboard, and mouse.

Next, follow the instructions for installing Docker Engine (Raspberry Pi OS Lite is based on Debian 11).

The diagram shows how the sFlow-RT real-time analytics engine receives a continuous telemetry stream from industry standard sFlow instrumentation built into network, server and application infrastructure. It delivers analytics through APIs and can easily be integrated with a wide variety of on-site and cloud, orchestration, DevOps and Software Defined Networking (SDN) tools.
docker run -p 6343:6343/udp -p 127.0.0.1:8008:8008 \
--name sflow-rt -d --restart unless-stopped sflow/prometheus
Run the pre-built sflow/prometheus Docker image. In this example, access to the user interface is limited to localhost in order to prevent unauthorized access, since no access controls are provided by sFlow-RT.
ssh -L 8008:127.0.0.1:8008 pp@192.168.4.163
Use ssh to connect to the Raspberry Pi (192.168.4.163) and tunnel port 8008 to your laptop.
Access the web interface at http://127.0.0.1:8008/. See Getting Started for instructions for enabling monitoring and browsing metrics. Python is installed by default on Raspberry Pi OS, making it convenient to experiment with the sFlow-RT REST API, see Writing Applications.
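As a starting point for experimenting from Python, the sketch below wraps REST requests using only the standard library. The /version and /agents/json paths mentioned in the comments are assumptions to be confirmed against the sFlow-RT REST API documentation, and the helper names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8008"  # the tunnelled port from the ssh command


def rest_url(path, base_url=BASE_URL):
    """Join the base URL and a REST path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_text(path, base_url=BASE_URL, timeout=5):
    """Fetch a REST resource and return the response body as text."""
    with urllib.request.urlopen(rest_url(path, base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


def get_json(path, base_url=BASE_URL, timeout=5):
    """Fetch a REST resource and decode the JSON response."""
    return json.loads(get_text(path, base_url, timeout))


# Example calls (assumed paths, run with the ssh tunnel open):
#   print(get_text("/version"))
#   print(len(get_json("/agents/json")), "agents reporting")
```

Keeping the helpers free of third-party dependencies means they run on a freshly installed Raspberry Pi OS without installing extra packages.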
If you don't have immediate access to a network and want to experiment, follow the instructions in Leaf and spine network emulation on Mac OS M1/M2 systems to emulate the 5 stage leaf and spine network shown above using Containerlab.
docker stop sflow-rt
Note: If you are going to try the examples, first run the command above to stop the sflow-rt image to avoid port contention when Containerlab starts an instance of sFlow-RT.
The screen capture shows a real-time view of traffic flowing across the emulated leaf and spine network during a series of iperf3 tests. The emulated results are very close to those you can expect when monitoring production traffic on a physical network.

The Raspberry Pi 4 is surprisingly capable. This pocket-sized server can easily monitor hundreds of high speed (100G+) links, providing up-to-the-second visibility into network flows.

Tuesday, May 23, 2023

Leaf and spine network emulation on Mac OS M1/M2 systems


The GitHub sflow-rt/containerlab project contains example network topologies for the Containerlab network emulation tool that demonstrate real-time streaming telemetry in realistic data center topologies and network configurations. The examples use the same FRRouting (FRR) engine that is part of SONiC, NVIDIA Cumulus Linux, and DENT network operating systems. Containerlab can be used to experiment before deploying solutions into production. Examples include: tracing ECMP flows in leaf and spine topologies, EVPN visibility, and automated DDoS mitigation using BGP Flowspec and RTBH controls.

The Containerlab project currently has limited support for Mac OS, stating "ARM-based Macs (M1/2) are not supported, and no binaries are generated for this platform. This is mainly due to the lack of network images built for arm64 architecture as of now." However, this argument doesn't apply to the Linux based images used in these examples.

First install Docker Desktop on your Apple silicon based Mac (select the Apple Chip option).

mkdir clab
cd clab
docker run --rm -it --privileged \
  --network host --pid="host" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /run/netns:/run/netns \
  -v $(pwd):$(pwd) -w $(pwd) \
  sflow/clab bash

Run Containerlab by typing the above commands in a terminal. This command uses a pre-built multi-architecture sflow/clab image. If you are running on an x86 platform, follow the official Containerlab Installation instructions.

git clone https://github.com/sflow-rt/containerlab.git

Download the Containerlab topologies from the sflow-rt/containerlab project.

containerlab deploy -t containerlab/clos5.yml

Start the 5 stage leaf and spine topology shown at the top of this page. The initial launch may take a couple of minutes as the container images are downloaded for the first time. Once the images are downloaded, the topology deploys in around 10 seconds.

An instance of the sFlow-RT real-time analytics engine receives industry standard sFlow telemetry from all of the switches in the topology. In this case, Containerlab is running the pre-built sflow/prometheus image which packages sFlow-RT with useful applications for exploring the data.

Connect to the web interface, http://localhost:8008. The sFlow-RT dashboard verifies that telemetry is being received from 10 agents (the 10 switches in the Clos fabric). See the sFlow-RT Quickstart guide for more information.

The screen capture shows a real-time view of traffic flowing across the network during a series of iperf3 tests. Click on the sFlow-RT Apps menu and select the browse-flows application, or click here for a direct link to a chart with the settings shown above.
docker exec -it clab-clos5-h1 iperf3 -c 172.16.4.2

Each of the hosts in the network has an iperf3 server, so running the above command will test bandwidth between h1 and h4.

docker exec -it clab-clos5-leaf1 vtysh

Linux with open source routing software (FRRouting) is an accessible alternative to vendor routing stacks (no registration / license required, no restriction on copying means you can share images on Docker Hub, no need for virtual machines). FRRouting is popular in production network operating systems (e.g. Cumulus Linux, SONiC, DENT, etc.) and the VTY shell provides an industry standard CLI for configuration, so labs built around FRR allow realistic network configurations to be explored.

containerlab destroy -t containerlab/clos5.yml

When you are finished, run the above command to stop the containers and free the resources associated with the emulation. Try out other topologies from the project to explore topics such as DDoS mitigation, BGP Flowspec, and EVPN.

Moving the monitoring solution from Containerlab to production is straightforward since sFlow is widely implemented in datacenter equipment from vendors including: A10, Arista, Aruba, Cisco, Edge-Core, Extreme, Huawei, Juniper, NEC, Netgear, Nokia, NVIDIA, Quanta, and ZTE. In addition, the open source Host sFlow agent makes it easy to extend visibility beyond the physical network into the compute infrastructure.

Monday, April 10, 2023

VyOS DDoS mitigation

Real-time flow analytics on VyOS describes how to install real-time analytics based on sFlow and the sFlow-RT analytics engine. This article extends the example to show how to automatically mitigate DDoS attacks using flow analytics combined with BGP Remotely Triggered Black Hole (RTBH) / Flowspec.
vyos@vyos:~$ add container image sflow/ddos-protect
First, download the sflow/ddos-protect image.
vyos@vyos:~$ mkdir -m 777 /config/sflow-rt
Create a directory to store persistent container state.
set container network sflowrt prefix 192.168.1.0/24
Define an internal network to connect to the container. Currently VyOS BGP does not allow direct connections to local addresses (e.g. 127.0.0.1), so we need to put the controller on its own network so that the router can connect and receive DDoS mitigation BGP RTBH / Flowspec controls.
set container name sflow-rt image sflow/ddos-protect
set container name sflow-rt host-name sflow-rt
set container name sflow-rt arguments '-Dddos_protect.router=192.168.1.1 -Dddos_protect.enable.flowspec=yes'
set container name sflow-rt environment RTMEM value 200M
set container name sflow-rt memory 0
set container name sflow-rt volume store source /config/sflow-rt
set container name sflow-rt volume store destination /sflow-rt/store
set container name sflow-rt network sflowrt address 192.168.1.2

Configure a container to run the image. The -Dddos_protect.router argument sets the BGP neighbor address, 192.168.1.1.

vyos@vyos:~$ ifconfig podman-sflowrt
podman-sflowrt: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.1  netmask 255.255.255.0  broadcast 192.168.1.255
        ether be:9e:69:f4:d0:4e  txqueuelen 1000  (Ethernet)
        RX packets 28  bytes 2662 (2.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 27  bytes 8032 (7.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
Connections to containers on the sflowrt container network appear to originate from 192.168.1.1, the address assigned to the VyOS interface podman-sflowrt.
set system sflow interface eth0
set system sflow interface eth1
set system sflow interface eth2
set system sflow polling 30
set system sflow sampling-rate 1000
set system sflow drop-monitor-limit 50
set system sflow server 192.168.1.2
Configure sFlow and send to sflow-rt container address 192.168.1.2.
set protocols bgp system-as 64500
set protocols bgp neighbor 192.168.1.2 port 1179
set protocols bgp neighbor 192.168.1.2 remote-as 65000
set protocols bgp neighbor 192.168.1.2 address-family ipv4-unicast
set protocols bgp neighbor 192.168.1.2 address-family ipv4-flowspec
Configure sflow-rt as BGP neighbor. Documentation ASN 64500 should be replaced by your ASN. The private ASN 65000 is a DDoS Protect default and can be changed with the -Dddos_protect.as argument.
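The two AS numbers in this configuration fall in different reserved ranges: 64500 is in the range reserved for documentation (RFC 5398) and 65000 is in the private use range (RFC 6996). The Python sketch below classifies an ASN against those ranges, a quick check that a documentation value hasn't been left in a production configuration. The function name and the "other" label are illustrative.

```python
def classify_asn(asn):
    """Classify an AS number as 'documentation', 'private' or 'other'."""
    if not 0 <= asn <= 4294967295:
        raise ValueError("ASN must fit in 32 bits")
    # RFC 5398: 64496-64511 and 65536-65551 are reserved for documentation.
    if 64496 <= asn <= 64511 or 65536 <= asn <= 65551:
        return "documentation"
    # RFC 6996: 64512-65534 and 4200000000-4294967294 are for private use.
    if 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294:
        return "private"
    # Everything else, including publicly assigned and special-purpose ASNs.
    return "other"


router_asn = classify_asn(64500)      # system-as in the example above
controller_asn = classify_asn(65000)  # DDoS Protect default
```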
ssh -L 8008:192.168.1.2:8008 vyos@router.example
Use ssh tunnel to connect to the container network and access web interface at http://localhost:8008.
Real-time DDoS mitigation using BGP RTBH and FlowSpec describes how to configure the DDoS protect application. The screen capture above shows the Charts page after a couple of simulated DDoS attacks on an address, 198.51.100.129, protected by the VyOS router. The charts show two ip_flood and a single udp_amplification attack - see DDoS attacks and BGP Flowspec responses for information on simulating different types of DDoS attack to test mitigation responses.
The Controls page shows three active controls. The table shows the targeted address, administrative address group, attack type, protocol, detection time, mitigation action and status of each active DDoS attack.
vyos@vyos:~$ show bgp ipv4
BGP table version is 0, local router ID is 192.168.1.1, vrf id 0
Default local pref 100, local AS 64500
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
    198.51.100.129/32
                    192.0.2.1                              0 65000 i

Displayed  1 routes and 1 total paths
The show command verifies that a Remotely Triggered Black Hole (RTBH) rule has been received for the drop mitigation actions. Advertising a black hole route risks collateral damage since it drops all traffic to the targeted host in order to protect network bandwidth and services provided by other hosts.
vyos@vyos:~$ show bgp ipv4 flowspec detail 
BGP flowspec entry: (flags 0x418)
        Destination Address 198.51.100.129/32
        IP Protocol = 17 
        Source Port = 53 
        FS:rate 0.000000
        received for 00:00:12
        not installed in PBR
The show command verifies that a Flowspec rule has been received for the filter mitigation action. Using Flowspec to filter traffic is more targeted than using black hole routes. In this case only UDP traffic (IP Protocol 17) with Source Port 53 (DNS) is dropped, so all other services provided by the targeted host are still accessible.
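The difference between the two controls can be modelled in a few lines of Python. The sketch below is a simplified illustration of the matching logic for the RTBH route and the Flowspec rule shown above, not a model of how a router implements either. The Packet type and function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    protocol: int                   # IP protocol number (6 = TCP, 17 = UDP)
    src_port: Optional[int] = None  # None for protocols without ports


def rtbh_drops(packet, prefix):
    """A black hole route drops everything sent to the prefix."""
    return ipaddress.ip_address(packet.dst_ip) in ipaddress.ip_network(prefix)


def flowspec_drops(packet, prefix, protocol, src_port):
    """The Flowspec rule drops only packets matching every listed field."""
    return (
        rtbh_drops(packet, prefix)
        and packet.protocol == protocol
        and packet.src_port == src_port
    )


TARGET = "198.51.100.129/32"
dns_reflection = Packet("198.51.100.129", protocol=17, src_port=53)
web_reply = Packet("198.51.100.129", protocol=6, src_port=443)

results = {
    "dns_rtbh": rtbh_drops(dns_reflection, TARGET),
    "dns_flowspec": flowspec_drops(dns_reflection, TARGET, 17, 53),
    "web_rtbh": rtbh_drops(web_reply, TARGET),
    "web_flowspec": flowspec_drops(web_reply, TARGET, 17, 53),
}
```

Only the black hole route drops the web traffic, which is the collateral damage that the Flowspec filter avoids.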
vyos@vyos:~$ show log container sflow-rt 
2023-04-08T00:24:14Z INFO: Starting sFlow-RT 3.0-1681
2023-04-08T00:24:16Z INFO: Version check, running latest
2023-04-08T00:24:17Z INFO: Listening, BGP port 1179
2023-04-08T00:24:18Z INFO: Listening, sFlow port 6343
2023-04-08T00:24:19Z INFO: Listening, HTTP port 8008
2023-04-08T00:24:19Z INFO: DNS server 1.1.1.1
2023-04-08T00:24:19Z INFO: app/ddos-protect/scripts/ddos.js started
2023-04-08T00:24:19Z INFO: app/prometheus/scripts/export.js started
2023-04-08T00:24:19Z INFO: app/browse-drops/scripts/top.js started
2023-04-08T00:24:19Z INFO: app/browse-flows/scripts/top.js started
2023-04-08T00:26:11Z INFO: BGP open 192.168.1.1 51252
2023-04-08T14:37:36Z INFO: DDoS drop ip_flood 198.51.100.129 local 47
2023-04-08T14:38:19Z INFO: DDoS filter udp_amplification 198.51.100.129 local 53
2023-04-08T14:38:19Z INFO: DDoS drop ip_flood 198.51.100.129 local 17
Attacks are recorded in the container log. Monitoring DDoS mitigation describes how to use Prometheus / Elasticsearch / Grafana to monitor DDoS activity and build dashboards.
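The DDoS lines in the log have a regular layout, so they are easy to pick out with a short script. The Python sketch below parses them into structured records. The field names are assumptions based on the columns in the Controls page, and the meaning of the final value isn't explained in the log itself, so it is kept as an opaque detail string.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class DDoSEvent(NamedTuple):
    time: datetime
    action: str  # e.g. drop or filter
    attack: str  # e.g. ip_flood or udp_amplification
    target: str  # targeted address
    group: str   # assumed to be the address group, e.g. local
    detail: str  # final field, meaning not stated in the log


_PATTERN = re.compile(
    r"^(?P<time>\S+) INFO: DDoS (?P<action>\S+) (?P<attack>\S+) "
    r"(?P<target>\S+) (?P<group>\S+) (?P<detail>\S+)$"
)


def parse_ddos_event(line) -> Optional[DDoSEvent]:
    """Return a DDoSEvent for a DDoS log line, or None for any other line."""
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    when = datetime.strptime(match["time"], "%Y-%m-%dT%H:%M:%SZ")
    return DDoSEvent(
        when.replace(tzinfo=timezone.utc),
        match["action"],
        match["attack"],
        match["target"],
        match["group"],
        match["detail"],
    )


event = parse_ddos_event(
    "2023-04-08T14:38:19Z INFO: DDoS filter udp_amplification 198.51.100.129 local 53"
)
```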

This is only a partial configuration. Peering sessions with upstream routers need to be configured to propagate controls so that DDoS attack traffic can be blocked before it saturates the upstream link. The limited scrubbing capacity of the VyOS software router isn't a factor since traffic will be dropped in hardware upstream. The flexibility of the VyOS router is an advantage in providing visibility and analytics to quickly trigger mitigation actions.

Tuesday, April 4, 2023

Real-time flow analytics on VyOS

VyOS with Host sFlow agent describes support for streaming sFlow telemetry added to the open source VyOS router operating system. This article describes how to install analytics software on a VyOS router by configuring a container.
vyos@vyos:~$ add container image sflow/ddos-protect
First, download the sflow/ddos-protect image.
vyos@vyos:~$ mkdir -m 777 /config/sflow-rt
Create a directory to store persistent container state.
set container name sflow-rt image sflow/ddos-protect
set container name sflow-rt allow-host-networks
set container name sflow-rt arguments '-Dhttp.hostname=10.0.0.240'
set container name sflow-rt environment RTMEM value 200M
set container name sflow-rt memory 0
set container name sflow-rt volume store source /config/sflow-rt
set container name sflow-rt volume store destination /sflow-rt/store
Configure a container to run the image. The RTMEM environment variable setting limits the amount of memory that the container will use to 200M bytes. The -Dhttp.hostname argument sets the internal web server to listen on the management address, 10.0.0.240, assigned to eth0 on this router. The container has no built-in authentication, so access needs to be limited using an ACL or through a reverse proxy - see Download and install.
set system sflow interface eth0
set system sflow interface eth1
set system sflow interface eth2
set system sflow polling 30
set system sflow sampling-rate 1000
set system sflow drop-monitor-limit 50
set system sflow server 127.0.0.1
Next, configure sFlow agent to send to localhost (127.0.0.1).
Finally connect to the web interface on the router at port 8008. The status page verifies that the sFlow-RT analytics engine is receiving sFlow from 1 sFlow Agent (the VyOS router). See Getting started for more information.
The included Flow Browser application provides an up-to-the-second view of traffic flows. Defining Flows describes the fields that can be used to break out traffic.
VyOS dropped packet notifications describes how to configure and monitor sFlow dropped packet notifications. The included Discard Browser provides an up to the second view of dropped packets.
The included Metric Browser application lets you explore the metrics that are being streamed. The chart updates in real-time as data arrives and in this case shows CPU utilization on the VyOS router. The standard set of metrics exported by the Host sFlow agent include interface counters as well as host cpu, memory, network and disk performance metrics. Metrics lists the set of available metrics.
Flow metrics with Prometheus and Grafana describes how to integrate flow analytics into operational dashboards. The included Prometheus application exposes flow analytics in the standard Prometheus scrape format so that they can be logged in time series databases.
DDoS protection quickstart guide describes how to use real-time sFlow analytics with BGP Flowspec / RTBH to automatically mitigate DDoS attacks. The included DDoS Protect application detects common volumetric attacks and can apply automated responses. The screen capture shows traffic associated with a series of simulated DDoS attacks against hosts behind the VyOS router, see DDoS attacks and BGP Flowspec responses.
The embedded sFlow-RT analytics engine exposes a REST API that can be used to program flow analytics, set thresholds, monitor events, and gather statistics. In addition, the applications shown in this article were all written using sFlow-RT's embedded scripting API. See Writing Applications for more information.

Monday, April 3, 2023

Dropped packet reason codes in VyOS

The article VyOS with Host sFlow agent describes how to use industry standard sFlow telemetry to monitor network traffic flows and statistics in the latest VyOS rolling releases. VyOS dropped packet notifications describes how sFlow also provides visibility into network packet drops and Dropped packet reason codes in Linux 6+ kernels describes how newer kernels are able to provide specific reasons for dropping packets. 
vyos@vyos:~$ uname -r
6.1.22-amd64-vyos

The latest VyOS rolling release runs on a Linux 6.1 kernel and now provides enhanced visibility into dropped packets using kernel reason codes.

vyos@vyos:~$ show version
Version:          VyOS 1.4-rolling-202303310716
Release train:    current

Built by:         autobuild@vyos.net
Built on:         Fri 31 Mar 2023 07:16 UTC
Build UUID:       1a7448d9-d53c-48a0-8644-ed1970c1abb8
Build commit ID:  75c9311fba375e

Architecture:     x86_64
Boot via:         installed image
System type:       guest

Hardware vendor:  innotek GmbH
Hardware model:   VirtualBox
Hardware S/N:     0
Hardware UUID:    da75808d-ff60-1d4c-babd-84a7fa341053

Copyright:        VyOS maintainers and contributors
Verify that the version of VyOS is VyOS 1.4-rolling-202303310716 or later.

In the previous article, VyOS dropped packet notifications, two tests were performed: the first a failed attempt to connect to the VyOS router using telnet (telnet has been disabled in the router config), and the second a traceroute test between two hosts connected to the router. The sFlow drop reason codes reported for these two tests were unknown_l4 and unknown_l3 respectively. The Linux kernel function names weren't much more specific, tcp_v4_rcv and ip_forward respectively. However, in this case, the Linux 6.1 kernel instrumentation allows more specific sFlow drop reasons to be reported, as shown in the chart at the top of this article.

  • port_unreachable This sFlow drop reason code is defined by reference to RFC 1812 section 5.2.7.1 and is defined as "Port Unreachable - generated if the designated transport protocol (e.g., UDP) is unable to demultiplex the datagram in the transport layer of the final destination but has no protocol mechanism to inform the sender"
  • ip_1_parsing This sFlow drop reason code is defined by reference to Devlink Trap and is defined as "Traps packets dropped due to an error in the first IP header parsing. This packet trap could include packets which do not pass an IP checksum check, a header length check (a minimum of 20 bytes), which might suffer from packet truncation thus the total length field exceeds the received packet length etc."
The detailed reasons make it easier to identify the root causes of packet drops, particularly when combined with information from the dropped packet's header that is also included in the sFlow Dropped Packet Notification messages.

Thursday, March 30, 2023

Dropped packet reason codes in Linux 6+ kernels

Using sFlow to monitor dropped packets describes support for standard sFlow Dropped Packet Notifications in the open source Host sFlow agent. This article describes additional capabilities in Linux 6+ kernels that clarify reasons why packets are dropped in the kernel.

The recent addition of dropreason.h in Linux 6+ kernels provides detailed reasons for packet drops. The netlink drop_monitor API has been extended to include the NET_DM_ATTR_REASON attribute to report the drop reason, see net_dropmon.h.

The following example illustrates the value of the reason code in explaining Linux packet drops.

tcp_v4_rcv+0x7c/0xef0
The value of NET_DM_ATTR_SYMBOL shown above indicates that the packet was dropped in the tcp_v4_rcv function in the Linux kernel, at offset 0x7c from the start of the function (0xef0 is the size of the function). While this information is helpful, there are many reasons why a TCP packet may be dropped.
NO_SOCKET
In this case, the value of NET_DM_ATTR_REASON shown above indicates that the TCP packet was dropped because no application had opened a socket and so there was nowhere to deliver the packet.
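The symbol string follows the usual kernel symbol+offset/size layout: the function name, the offset of the drop location from the start of the function, and the total size of the function. The Python sketch below splits the example value into its parts. The DropSymbol type and the function name are illustrative and are not part of any drop_monitor tooling.

```python
import re
from typing import NamedTuple


class DropSymbol(NamedTuple):
    function: str  # kernel function in which the packet was dropped
    offset: int    # byte offset of the drop location within the function
    size: int      # total size of the function in bytes


_SYMBOL = re.compile(
    r"^(?P<function>[\w.]+)\+(?P<offset>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)$"
)


def parse_drop_symbol(symbol):
    """Split a value such as 'tcp_v4_rcv+0x7c/0xef0' into its components."""
    match = _SYMBOL.match(symbol.strip())
    if match is None:
        raise ValueError(f"unrecognized symbol format: {symbol!r}")
    return DropSymbol(
        match["function"],
        int(match["offset"], 16),
        int(match["size"], 16),
    )


parsed = parse_drop_symbol("tcp_v4_rcv+0x7c/0xef0")
```

Grouping drops by function name alone is a reasonable first cut when reason codes aren't available, since the offset changes between kernel builds.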

In the case of Linux-based hardware switches or smart network adapters, where packet processing is offloaded to hardware, the netlink drop_monitor events include NET_DM_ATTR_HW_TRAP_GROUP_NAME and NET_DM_ATTR_HW_TRAP_NAME attributes and packet header information supplied by the hardware driver, see Devlink Trap.

The latest version of the open source Host sFlow agent adds support for the NET_DM_ATTR_REASON attribute to improve the accuracy of the sFlow drop_reason.

port_unreachable
In our example, the Host sFlow agent is now able to report port_unreachable as the reason for the dropped packet, rather than the generic unknown_l4 reason reported for older kernels.

The screen capture at the top of this article shows dropped packet information displayed in real-time using the Discard Browser application running on the sFlow-RT analytics engine. The chart demonstrates how the combination of information from the header of the dropped packet along with the reason for dropping the packet quickly gets to the root cause of the packet drop. In this case an attempt has been made from 172.16.1.174 to connect to 172.16.1.1 via telnet (tcp port 23) and telnet has not been enabled on the server so the packet was dropped - as it should be since telnet is not a secure method of connecting.

docker run --name sflow-rt -p 8008:8008 -p 6343:6343/udp -d sflow/prometheus

A quick way to experiment with sFlow is to run the pre-built sflow/prometheus image using Docker. The bundled Discard Browser with the settings shown in the screen capture can be launched by clicking here.

Monday, March 27, 2023

VyOS dropped packet notifications

VyOS with Host sFlow agent describes how to configure and analyze industry standard sFlow telemetry recently added to the VyOS open source router platform. This article discusses sFlow dropped packet notifications support added to the latest release.

Dropped packets have a profound impact on network performance and availability. Packet discards due to congestion can significantly impact application performance. Dropped packets due to black hole routes, expired TTLs, MTU mismatches, etc. can result in insidious connection failures that are time consuming and difficult to diagnose. Visibility into dropped packets offers significant benefits for network troubleshooting, providing real-time network-wide visibility into the specific packets that were dropped as well as the reason each packet was dropped. This visibility instantly reveals the root cause of drops and the impacted connections.

vyos@vyos:~$ show version
Version:          VyOS 1.4-rolling-202303260914
Release train:    current

Built by:         autobuild@vyos.net
Built on:         Sun 26 Mar 2023 09:14 UTC
Build UUID:       72b34f74-bfcd-4b51-9b95-544319c2dac5
Build commit ID:  d68bda6a295ba9

Architecture:     x86_64
Boot via:         installed image
System type:       guest

Hardware vendor:  innotek GmbH
Hardware model:   VirtualBox
Hardware S/N:     0
Hardware UUID:    df0a2b79-b8c4-8342-a27f-76aa3e52ad6d

Copyright:        VyOS maintainers and contributors

Verify that the version of VyOS is VyOS 1.4-rolling-202303260914 or later.
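The "or later" check can be automated when managing several routers. The following is a minimal sketch that assumes the suffix of a rolling release name is a build timestamp in YYYYMMDDHHMM form (consistent with the "Built on" line above) and that both versions are rolling builds on the same release train; the helper names are hypothetical.

```python
from datetime import datetime


def rolling_build_time(version):
    """Extract the build timestamp from a name like '1.4-rolling-202303260914'."""
    stamp = version.rsplit("-", 1)[-1]
    return datetime.strptime(stamp, "%Y%m%d%H%M")


def is_at_least(version, minimum):
    """True if 'version' was built at or after 'minimum' (rolling builds only)."""
    return rolling_build_time(version) >= rolling_build_time(minimum)


REQUIRED = "1.4-rolling-202303260914"
print(is_at_least("VyOS 1.4-rolling-202303260914", REQUIRED))
print(is_at_least("VyOS 1.4-rolling-202303170317", REQUIRED))
```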

On VyOS, dropped packet monitoring relies on instrumentation built into recent Linux kernels and exposed through the netlink drop_monitor API. Enabling drop_monitor in the VyOS kernel configuration allows the Host sFlow agent to capture and export information on dropped packets.
set system sflow interface eth0
set system sflow interface eth1
set system sflow interface eth2
set system sflow polling 30
set system sflow sampling-rate 1000
set system sflow drop-monitor-limit 50
set system sflow server 10.0.0.30 port 6343
The drop-monitor-limit configuration entry enables dropped packet monitoring and sets a rate limit of 50 dropped packet notifications per second.
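To make the effect of a per-second limit concrete, here is a minimal sketch of a fixed-window rate limiter. It only illustrates the general technique; it is an assumption for explanation and does not describe how the Host sFlow agent implements drop-monitor-limit. The DropRateLimiter class is a hypothetical name.

```python
class DropRateLimiter:
    """Allow at most limit_per_second events in each one-second window."""

    def __init__(self, limit_per_second):
        self.limit = limit_per_second
        self.window = None  # whole second currently being counted
        self.count = 0      # events allowed so far in that second

    def allow(self, now):
        """Return True if an event at time 'now' (seconds) may be exported."""
        window = int(now)
        if window != self.window:
            # A new second has started, so reset the counter.
            self.window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


limiter = DropRateLimiter(50)
# 200 drops arriving within the same second: only the first 50 are exported.
exported = sum(limiter.allow(10.0 + i / 1000) for i in range(200))
print(exported)
```

A limit like this keeps a burst of drops, for example during an attack, from overwhelming the collector while still exporting a representative set of notifications.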
docker run --name sflow-rt -p 8008:8008 -p 6343:6343/udp -d sflow/prometheus

A quick way to experiment with sFlow is to run the pre-built sflow/prometheus image using Docker on the sFlow server (in this case on 10.0.0.30). The chart at the top of the page uses the Discard Browser application to display an up-to-the-second view of packets dropped by the VyOS router. Click on this link to open the application with the settings shown.

The chart shows the results of two tests: the first a failed attempt to connect to the VyOS router using telnet (telnet has been disabled in the router config), and the second a traceroute test between two hosts connected to the router. The reason field reports the sFlow drop reason code and the function field reports the Linux kernel function that dropped the packet. With the telnet test, the packet was dropped in the tcp_v4_rcv function and is reported with the unknown_l4 sFlow reason. In the case of the traceroute test, 3 packets were dropped in the ip_forward function and are reported with the unknown_l3 reason.

Enabling sFlow dropped packet notifications on all switches, routers, and hosts provides end-to-end visibility into dropped packets, rapidly identifying the location and reason for packet drops as well as identifying the impacted services.

Dropped packet monitoring complements sFlow's existing counter polling and packet sampling mechanisms and shares a common data model so that all three sources of data can be correlated. For example, if packets are being discarded because of buffer exhaustion, the discard records don't necessarily tell the whole story. The discarded packets may represent mice flows that are victims of an elephant flow. Packet samples will reveal the traffic that isn't being dropped and provide a more complete picture. Counter data adds additional information such as CPU load, interface speed, link utilization, packet and discard rates that further completes the picture.

Friday, March 17, 2023

VyOS with Host sFlow agent

The article VyOS described deficiencies with the embedded sFlow implementation in the open source VyOS router operating system and suggested that the open source Host sFlow agent be installed as an alternative. The VyOS developer community embraced the suggestion and has been incredibly responsive, integrating and releasing a version of VyOS with Host sFlow support within a week.
vyos@vyos:~$ show version
Version:          VyOS 1.4-rolling-202303170317
Release train:    current

Built by:         autobuild@vyos.net
Built on:         Fri 17 Mar 2023 03:17 UTC
Build UUID:       45391302-1240-4cc7-95a8-da8ee6390765
Build commit ID:  e887f582cfd7de

Architecture:     x86_64
Boot via:         installed image
System type:       guest

Hardware vendor:  innotek GmbH
Hardware model:   VirtualBox
Hardware S/N:     0
Hardware UUID:    871dd0f0-c4ec-f147-b1a7-ed536511f141

Copyright:        VyOS maintainers and contributors
Verify that the version of VyOS is VyOS 1.4-rolling-202303170317 or later.
set system sflow interface eth0
set system sflow interface eth1
set system sflow interface eth2
set system sflow polling 30
set system sflow sampling-rate 1000
set system sflow server 10.0.0.30 port 6343
The above commands configure sFlow export in the VyOS CLI using the embedded Host sFlow agent.
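With sampling-rate 1000, on average one packet in 1000 is sampled, so a collector scales each sample by the sampling rate to estimate total traffic. The arithmetic can be sketched as follows; the estimate_bps helper and the sample values are made up for illustration.

```python
def estimate_bps(sampled_frame_bytes, sampling_rate, interval_s):
    """Estimate bits per second from the frame sizes of sampled packets.

    Each sample stands in for roughly 'sampling_rate' packets of similar size.
    """
    total_bytes = sum(sampled_frame_bytes) * sampling_rate
    return total_bytes * 8 / interval_s


# Hypothetical input: 125 samples of 1500-byte frames seen over 30 seconds.
samples = [1500] * 125
print(estimate_bps(samples, sampling_rate=1000, interval_s=30))
```

The estimate is statistical, so its accuracy improves as more samples are collected over the measurement interval.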
docker run --name sflow-rt -p 8008:8008 -p 6343:6343/udp -d sflow/prometheus
A quick way to experiment with sFlow is to run the pre-built sflow/prometheus image using Docker on the sFlow server (in this case on 10.0.0.30). The chart at the top of the page uses the Flow Browser application to display an up-to-the-second view of the largest TCP flows through the VyOS router. Click on this link to open the application with the settings shown.
Flow metrics with Prometheus and Grafana describes how to integrate flow analytics into operational dashboards.
DDoS protection quickstart guide describes how to use real-time sFlow analytics with BGP Flowspec / RTBH to automatically mitigate DDoS attacks.

Saturday, March 11, 2023

VyOS

VyOS is an open source router operating system based on Linux. This article discusses how to improve network traffic visibility on VyOS based routers using the open source Host sFlow agent.

VyOS claims sFlow support, so why is it necessary to install an alternative sFlow agent? The following experiment demonstrates that there are significant issues with the VyOS sFlow implementation.

vyos@vyos:~$ show version
Version:          VyOS 1.4-rolling-202301260317
Release train:    current

Built by:         autobuild@vyos.net
Built on:         Thu 26 Jan 2023 03:17 UTC
Build UUID:       a95385b7-12f9-438d-b49c-b91f47ea7ab7
Build commit ID:  d5ea780295ef8e

Architecture:     x86_64
Boot via:         installed image
System type:      KVM guest

Hardware vendor:  innotek GmbH
Hardware model:   VirtualBox
Hardware S/N:     0
Hardware UUID:    6988d219-49a6-0a4a-9413-756b0395a73d

Copyright:        VyOS maintainers and contributors
Install a recent version of VyOS under VirtualBox and configure routing between two Linux virtual machines connected to eth1 and eth2 on the router. Out of band management is configured on eth0.
set system flow-accounting disable-imt
set system flow-accounting sflow agent-address 10.0.0.50
set system flow-accounting sflow sampling-rate 1000
set system flow-accounting sflow server 10.0.0.30 port 6343
set system flow-accounting interface eth0
set system flow-accounting interface eth1
set system flow-accounting interface eth2
The above commands configure sFlow monitoring on VyOS using the native sFlow agent.
The sflow/sflow-test tool is used to test the sFlow implementation while generating traffic consisting of a series of iperf3 tests (each generating approximately 50Mbps). The test fails in a number of significant ways:
  1. The implementation of sFlow is incomplete, omitting required interface counter export
  2. The peak traffic reported (3Mbps) is a fraction of the traffic generated by iperf3
  3. There is an inconsistency in the packet size reported in the sFlow messages
  4. Tests comparing counters and flow data fail because of missing counter export (1)
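The kind of consistency check behind items 2 and 4 can be sketched as a comparison between a rate derived from interface counters and a rate derived from flow samples. This is only an illustration of the idea: the 20% tolerance, the helper names, and the counter values are assumptions, and it does not reproduce what the sflow-test tool actually does.

```python
def counter_rate_bps(octets_start, octets_end, interval_s):
    """Bits per second from two readings of an interface octet counter."""
    return (octets_end - octets_start) * 8 / interval_s


def rates_agree(flow_bps, counter_bps, tolerance=0.2):
    """True if the flow-derived rate is within 'tolerance' of the counter rate."""
    if counter_bps == 0:
        return flow_bps == 0
    return abs(flow_bps - counter_bps) / counter_bps <= tolerance


# Hypothetical counter readings 30 seconds apart, equivalent to 50Mbps.
counter_bps = counter_rate_bps(0, 187_500_000, 30)
print(counter_bps)
print(rates_agree(3_000_000, counter_bps))   # 3Mbps reported vs 50Mbps generated
print(rates_agree(48_000_000, counter_bps))
```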
Fortunately, VyOS is a Linux based operating system, so we can install the Host sFlow agent as an alternative to the native sFlow implementation to provide traffic visibility.
delete system flow-accounting
First, disable the native VyOS sFlow agent.
wget https://github.com/sflow/host-sflow/releases/download/v2.0.38-1/hsflowd-ubuntu20_2.0.38-1_amd64.deb
sudo dpkg -i hsflowd-ubuntu20_2.0.38-1_amd64.deb
Next, download and install the Host sFlow agent by typing the above commands in VyOS shell.
# hsflowd configuration file
# http://sflow.net/host-sflow-linux-config.php

sflow {
  collector { ip=10.0.0.30 }
  pcap { dev = eth0 }
  pcap { dev = eth1 }
  pcap { dev = eth2 }
}
Edit the /etc/hsflowd.conf file.
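When the same settings need to be applied to several routers, the file contents can be generated rather than typed by hand. The following sketch renders only the collector and pcap entries shown above; render_hsflowd_conf is a hypothetical helper, and writing the result to /etc/hsflowd.conf is left to whatever deployment method is in use.

```python
def render_hsflowd_conf(collector_ip, devices):
    """Render a minimal hsflowd.conf with one collector and one pcap entry per device."""
    lines = ["sflow {", f"  collector {{ ip={collector_ip} }}"]
    lines += [f"  pcap {{ dev = {dev} }}" for dev in devices]
    lines.append("}")
    return "\n".join(lines) + "\n"


print(render_hsflowd_conf("10.0.0.30", ["eth0", "eth1", "eth2"]), end="")
```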
systemctl restart hsflowd
Restart the sFlow agent to pick up the new configuration.
Rerunning sflow-test shows that the implementation now passes. The peaks shown in the trend graph are consistent with the traffic generated by iperf3 and with traffic levels reported in interface counters.
The sflow/sflow-test Docker image also includes the Flow Browser application that can be used to monitor traffic flows in real-time. The screen shot above shows traffic from a single iperf3 test.
The sflow/sflow-test Docker image also includes the Metric Browser application that can be used to monitor counters in real-time. The screen shot above shows cpu_utilization.

The sFlow Test, Browse Flows and Browse Metrics applications run on the sFlow-RT analytics engine. Additional examples include Flow metrics with Prometheus and Grafana and DDoS protection quickstart guide.