Tuesday, October 12, 2021

Grafana Cloud


Grafana Cloud is a cloud hosted version of Grafana, Prometheus, and Loki. The free tier makes it easy to try out the service and has enough capability to satisfy simple use cases. In this article we will explore how metrics based on sFlow streaming telemetry can be pushed into Grafana Cloud.

The diagram shows the elements of the solution. Agents in host and network devices are configured to stream sFlow telemetry to an sFlow-RT real-time analytics engine instance. The Grafana Agent queries sFlow-RT's REST API for metrics and pushes them to Grafana Cloud.
docker run -p 8008:8008 -p 6343:6343/udp --name sflow-rt -d sflow/prometheus
Use Docker to run the pre-built sflow/prometheus image which packages sFlow-RT with the sflow-rt/prometheus application. Configure sFlow agents to stream data to this instance.
Create a Grafana Cloud account. Click on the Agent button on the home page to get the configuration settings for the Grafana Agent.
Click on the Prometheus button to get the configuration to forward metrics from the Grafana Agent.
Enter a name and click on the Create API key button to generate configuration settings that include a URL, username, and password that will be used in the Grafana Agent configuration.
server:
  log_level: info
  http_listen_port: 12345
prometheus:
  wal_directory: /tmp/wal
  global:
    scrape_interval: 15s
  configs:
    - name: agent
      host_filter: false
      scrape_configs:
        - job_name: 'sflow-rt-analyzer'
          metrics_path: /prometheus/analyzer/txt
          static_configs:
            - targets: ['host.docker.internal:8008']
        - job_name: 'sflow-rt-metrics'
          metrics_path: /prometheus/metrics/ALL/ALL/txt
          static_configs:
            - targets: ['host.docker.internal:8008']
          metric_relabel_configs:
            - source_labels: ['agent', 'datasource']
              separator: ':'
              target_label: instance
        - job_name: 'sflow-rt-countries'
          metrics_path: /app/prometheus/scripts/export.js/flows/ALL/txt
          static_configs:
            - targets: ['host.docker.internal:8008']
          params:
            metric: ['sflow_country_bps']
            key: ['null:[country:ipsource:both]:unknown','null:[country:ipdestination:both]:unknown']
            label: ['src','dst']
            value: ['bytes']
            scale: ['8']
            aggMode: ['sum']
            minValue: ['1000']
            maxFlows: ['100']
        - job_name: 'sflow-rt-asns'
          metrics_path: /app/prometheus/scripts/export.js/flows/ALL/txt
          static_configs:
            - targets: ['host.docker.internal:8008']
          params:
            metric: ['sflow_asn_bps']
            key: ['null:[asn:ipsource:both]:unknown','null:[asn:ipdestination:both]:unknown']
            label: ['src','dst']
            value: ['bytes']
            scale: ['8']
            aggMode: ['sum']
            minValue: ['1000']
            maxFlows: ['100']
      remote_write:
        - url: API_URL
          basic_auth:
            username: API_USERID
            password: API_KEY
Create an agent.yaml configuration file. Substitute the API_URL, API_USERID, and API_KEY with values from the API Key settings obtained previosly.
docker run -v $PWD/data:/etc/agent/data -v $PWD/agent.yaml:/etc/agent/agent.yaml \
--name grafana-agent -d grafana/agent
Use Docker to run the Grafana Agent.
Data should start appearing in Grafana Cloud. Install the sFlow-RT Health, sFlow-RT Countries and Networks, and sFlow-RT Network Interfaces dashboards to view the data. For example, the Countries and Networks dashboard above shows traffic entering and leaving your network broken out by network and country. Flow metrics with Prometheus and Grafana describes how to build Prometheus scrape_configs that will cause sFlow-RT to export custom traffic flow metrics. 
There are important scaleability and cost advantages to placing the sFlow-RT analytics engine in front of the metrics collection service. For example, in large scale cloud environments the metrics for each member of a dynamic pool isn't necessarily worth trending since virtual machines / containers are frequently added and removed. Instead, sFlow-RT can be instructed to track all the members of the pool, calculates summary statistics for the pool, and log the summary statistics. This pre-processing can significantly reduce storage requirements, lowering costs and increasing query performance. 
Host, Docker, Swarm and Kubernetes monitoring describes how to deploy sFlow agents to monitor compute infrastructure.
The sFlow-RT Prometheus Exporter application exposes a REST API that allows metrics to be summarized, filtered, and synthesized. Exposing these capabilities through a REST API allows Prometheus scrape_configs to control the behavior of the sFlow-RT analytics pipeline and retrieve a small set of hight value metrics tailored to your requirements.

Thursday, October 7, 2021

DDoS protection quickstart guide

DDoS Protect is an open source denial of service mitigation tool that uses industry standard sFlow telemetry from routers to detect attacks and automatically deploy BGP remotely triggered blackhole (RTBH) and BGP Flowspec filters to block attacks within seconds.

This document pulls together links to a number of articles that describe how you can quickly try out DDoS Protect and get it running in your environment:

DDoS Protect is a lightweight solution that uses standard telemetry and control (sFlow and BGP) capabilities of routers to automatically block disruptive volumetric denial of service attacks. You can quickly evaluate the technology on your laptop or in a test lab. The solution leverages standard features of modern routing hardware to scale easily to large high traffic networks.

Monday, September 20, 2021

Containernet

Containernet is a fork of the Mininet network emulator that uses Docker containers as hosts in emulated network topologies.

Multipass describes how build a Mininet testbed that provides real-time traffic visbility using sFlow-RT. This article adapts the testbed for Containernet.

multipass launch --name=containernet bionic
multipass exec containernet -- sudo apt update
multipass exec containernet -- sudo apt -y install ansible git aptitude default-jre
multipass exec containernet -- git clone https://github.com/containernet/containernet.git
multipass exec containernet -- sudo ansible-playbook -i "localhost," -c local containernet/ansible/install.yml
multipass exec containernet -- sudo /bin/sh -c "cd containernet; make develop"
multipass exec containernet -- wget https://inmon.com/products/sFlow-RT/sflow-rt.tar.gz
multipass exec containernet -- tar -xzf sflow-rt.tar.gz
multipass exec containernet -- ./sflow-rt/get-app.sh sflow-rt mininet-dashboard

Run the above commands in a terminal to create the Containernet virtual machine. 

multipass list

List the virtual machines

Name                    State             IPv4             Image
primary                 Stopped           --               Ubuntu 20.04 LTS
containernet            Running           192.168.64.12    Ubuntu 18.04 LTS
                                          172.17.0.1

Find the IP address of the mininet virtual machine we just created (192.168.64.12).

multipass exec containernet -- ./sflow-rt/start.sh

Start sFlow-RT. Use a web browser to connect to the VM and access the Mininet Dashboad application running on sFlow-RT, in this case http://192.168.64.12:8008/app/mininet-dashboard/html/

Open a second shell.

multipass shell containernet

Connet to the Containernet virtual machine.

cp containernet/examples/containernet_example.py .

Copy the Containernet Get started example script.

#!/usr/bin/python
"""
This is the most simple example to showcase Containernet.
"""
from mininet.net import Containernet
from mininet.node import Controller
from mininet.cli import CLI
from mininet.link import TCLink
from mininet.log import info, setLogLevel
setLogLevel('info')

exec(open("./sflow-rt/extras/sflow.py").read())

net = Containernet(controller=Controller)
info('*** Adding controller\n')
net.addController('c0')
info('*** Adding docker containers\n')
d1 = net.addDocker('d1', ip='10.0.0.251', dimage="ubuntu:trusty")
d2 = net.addDocker('d2', ip='10.0.0.252', dimage="ubuntu:trusty")
info('*** Adding switches\n')
s1 = net.addSwitch('s1')
s2 = net.addSwitch('s2')
info('*** Creating links\n')
net.addLink(d1, s1)
net.addLink(s1, s2, cls=TCLink, delay='100ms', bw=1)
net.addLink(s2, d2)
info('*** Starting network\n')
net.start()
info('*** Testing connectivity\n')
net.ping([d1, d2])
info('*** Running CLI\n')
CLI(net)
info('*** Stopping network')
net.stop()

Edit the copy and add the highlighted line to enable sFlow monitoring:

sudo python3 containernet_example.py

Run the Containernet example script.

Finally, the network topology will appear under the Mininet Dashboard topology tab.

Tuesday, August 31, 2021

Netdev 0x15


The recent Netdev 0x15 conference included a number of papers diving into the technology behind Linux as a network operating system. Slides and videos are now available on the conference web site.
Network wide visibility with Linux networking and sFlow describes the Linux switchdev driver used to integrate network hardware with Linux. The talk focuses on network telemetry, showing how standard Linux APIs are used to configure hardware instrumentation and stream telemetry using the industry standard sFlow protocol for data center wide visibility.
Switchdev in the wild describes Yandex's experience of deploying Linux switchdev based switches in production at scale. The diagram from the talk shows the three layer leaf and spine network architecture used in their data centers. Yandex operates multiple data centers, each containing up to 100,000 servers.
Switchdev Offload Workshop provides updates about the latest developments in the switchdev community. 
FRR Workshop discusses the latest development in the FRRouting project, the open source routing software that is now a defacto standard on Linux network operating systems.

Wednesday, August 18, 2021

Nokia Service Router Linux


Nokia Service Router Linux (SR-Linux) is an open source network operating system running on Nokia's merchant silicon based data center switches.

The following commands configure SR-Linux to sample packets at 1-in-10000, poll counters every 20 seconds and stream standard sFlow telemetry to an analyzer (192.168.10.20) using the default sFlow port 6343:

system {
    sflow {
        admin-state enable
        sample-rate 10000
        collector 1 {
            collector-address 192.168.10.20
            network-instance default
            source-address 192.168.1.1
            port 6343
        }
    }
}

For each interface:

interface ethernet-1/1 {
    admin-state enable
    sflow {
        admin-state enable
    }
}

Enable sFlow on all switches and ports in the data center fabric for comprehensive visibility.

An instance of the sFlow-RT real-time analytics software converts the raw sFlow telemetry into actionable measurements to drive operational dashboards and automation (e.g. DDoS mitigation, traffic engineering, etc.).
docker run --name sflow-rt -p 8008:8008 -p 6343:6343/udp -d sflow/prometheus
A simple way to get started is to run the Docker sflow/prometheus image on the sFlow analyzer host (192.168.10.20 in the example config) to run sFlow-RT with useful applications to explore the telemetry. Access the web interface at http://192.168.10.20:8008.

Tuesday, June 15, 2021

DDoS mitigation using a Linux switch

Linux as a network operating system describes the benefits of using standard Linux as a network operating system for hardware switches. A key benefit is that the behavior of the physical network can be efficiently emulated using standard Linux virtual machines and/or containers.

In this article, CONTAINERlab will be used to create a simple testbed that can be used to develop a real-time DDoS mitigation controller. This solution is highly scaleable. Each hardware switch can monitor and filter terabits per second of traffic and a single controller instance can monitor and control hundreds of switches.

Create test network

The following ddos.yml file specifies the testbed topology (shown in the screen shot at the top of this article):

name: ddos
topology:
  nodes:
    router:
      kind: linux
      image: sflow/frr
    attacker:
      kind: linux
      image: sflow/hping3
    victim:
      kind: linux
      image: alpine:latest
  links:
    - endpoints: ["router:swp1","attacker:eth1"]
    - endpoints: ["router:swp2","victim:eth1"]

Run the following command to run the emulation:

sudo containerlab deploy ddos.yml

Configure interfaces on router:

interface swp1
 ip address 192.168.1.1/24
!
interface swp2
 ip address 192.168.2.1/24
!

Configure attacker interface:

ip addr add 192.168.1.2/24 dev eth1
ip route add 192.168.2.0/24 via 192.168.1.1

Configure victim interface:

ip addr add 192.168.2.2/24 dev eth1
ip route add 192.168.1.0/24 via 192.168.2.1

Verify connectivity between the attacker and the victim:

sudo docker exec -it clab-ddos-attacker ping 192.168.2.2
PING 192.168.2.1 (192.168.2.1): 56 data bytes
64 bytes from 192.168.2.1: seq=0 ttl=64 time=0.069 ms

Install visibility and control applications on router

The advantage of using Linux as a network operating system is that you can develop and install software to tailor the network to address specific requirements. In this case, for DDoS mitigation, we need real-time visibility to detect DDoS attacks and real-time control to filter out the attack traffic.

Open a shell on router:

sudo docker exec -it clab-ddos-router sh

Install and configure Host sFlow agent:

apk --update add build-base linux-headers openssl-dev dbus-dev gcc git
git clone https://github.com/sflow/host-sflow.git
cd host-sflow
make FEATURES="DENT"
make install

Edit /etc/hsflowd.conf

sflow {
  agent = eth0
  collector { ip=172.20.20.1 udpport=6343 }
  dent { sw=on switchport=swp.* }
}

Note: On a hardware switch, set sw=off to offload packet sampling to hardware.

Start hsflowd:

hsflowd

Download and run tc_server Python script for adding and removing tc flower filters using a REST API:

wget https://raw.githubusercontent.com/sflow-rt/tc_server/master/tc_server
nohup python3 tc_server > /dev/null &

The following command shows the Linux tc filters used in this example:

# tc filter show dev swp1 ingress
filter protocol all pref 1 matchall chain 0 
filter protocol all pref 1 matchall chain 0 handle 0x1 
  not_in_hw
	action order 1: sample rate 1/10000 group 1 trunc_size 128 continue
	 index 3 ref 1 bind 1

filter protocol ip pref 14 flower chain 0 
filter protocol ip pref 14 flower chain 0 handle 0x1 
  eth_type ipv4
  ip_proto udp
  dst_ip 192.168.2.2
  src_port 53
  not_in_hw
	action order 1: gact action drop
	 random type none pass val 0
	 index 1 ref 1 bind 1

The output shows the standard Linux tc-matchall and tc-flower filters used to monitor and drop traffic on the router. The Host sFlow agent automatically installs a matchall rule on each interface in order to sample packets. The tc_server script adds and removes a flower filters to drop unwanted traffic. On a hardware router, the filters are offloaded by the Linux switchdev driver to the router ASIC for line rate performance.

Test REST API

Add filter:

curl -X PUT -H "Content-Type: application/json" \
--data '{"ip_proto":"udp","dst_ip":"10.0.2.2","src_port":"53"}' \
http://clab-ddos-router:8081/swp1/10

Show filters:

curl http://clab-ddos-router:8081/swp1

Remove filter:

curl -X DELETE http://clab-ddos-router:8081/swp1/10

Build an automated DDoS mitigation controller

The following sFlow-RT ddos.js script automatically detects and drops UDP amplification attacks:

var block_minutes = 1;
var thresh = 10000;

setFlow('udp_target',{keys:'ipdestination,udpsourceport',value:'frames'});

setThreshold('attack',{metric:'udp_target', value:thresh, byFlow:true, timeout:2});

var id = 10;
var controls = {};
setEventHandler(function(evt) {
  var key = evt.flowKey;
  if(controls[key]) return;

  var prt = ifName(evt.agent,evt.dataSource);
  if(!prt) return;

  var [dst_ip,src_port] = key.split(',');
  var filter = {
    // uncomment following line for hardware routers
    // 'skip_sw':'skip_sw',
    'ip_proto':'udp',
    'dst_ip':dst_ip,
    'src_port':src_port
  };
  var url = 'http://'+evt.agent+':8081/'+prt+'/'+id++;
  try {
    http(url,'put','application/json',JSON.stringify(filter));
  } catch(e) {
    logWarning(url + ' put failed');
  }
  controls[key] = {time:evt.timestamp, evt:evt, url:url};
  logInfo('block ' + evt.flowKey);
},['attack']);

setIntervalHandler(function(now) {
  for(var key in controls) {
    var control = controls[key];
    if(now - control.time < 1000 * 60 * block_minutes) continue;
    var evt = control.evt;
    if(thresholdTriggered(evt.thresholdID,evt.agent,evt.dataSource+'.'+evt.metric,evt.flowKey)) {
      // attack is ongoing - keep control
      continue;
    }
    try {
      http(control.url,'delete');
    } catch(e) {
      logWarning(control.url + ' delete failed');
    }
    delete controls[key];
    logInfo('allow '+control.evt.flowKey);
  }
});

See Writing Applications for more information on the script.

Run the controller script on the CONTAINERlab host using the sFlow-RT real-time analytics engine:

sudo docker run --network=host -v $PWD/ddos.js:/sflow-rt/ddos.js \
sflow/prometheus -Dscript.file=ddos.js

Verify that sFlow is being received by the checking the sFlow-RT status page, http://containerlab_ip:8008/

Test controller

Monitor for attack traffic on the victim:

sudo docker exec -it clab-ddos-victim sh
apk --update add tcpdump
tcpdump -n -i eth1 udp port 53

Start attack:

sudo docker exec -it clab-ddos-attacker \
hping3 --flood --udp -k -s 53 --rand-source 192.168.2.2

There should be a brief flurry of packets seen at the victim before the controller detects and blocks the attack. The entire period between launching the attack and the attack traffic being blocked is under a second.

Thursday, May 20, 2021

Linux as a network operating system


NVIDIA Linux Switch enables any standard Linux distribution to be used as the operating system on the NVIDIA Spectrum™ switches. Unlike network operating systems that are Linux based, where you are limited to a specific version of Linux and control of the hardware is restricted to vendor specific software modules, Linux Switch allows you to install an unmodified version of your favorite Linux distribution along with familiar Linux monitoring and orchestration tools. 

The key to giving Linux control of the switch hardware is the switchdev module - a standard part of recent Linux kernels. Linux switchdev is an in-kernel driver model for switch devices which offload the forwarding (data) plane from the kernel. Integrating switch ASIC drivers in the Linux kernel makes switch ports appear as additional Linux network interfaces that can be configured and managed using standard Linux tools.

The mlxsw wiki provides instructions for installing Linux using ONIE or PXE boot on Mellanox switch hardware, for example, on NVIDIA® Spectrum®-3 based SN4000 series switches, providing 1G - 400G port speeds to handle scale-out data center applications.

Major benefits of using standard Linux as the switch operating system include:

  • no licensing fees, feature restrictions, or license management complexity associated proprietary network operating systems
  • large ecosystem of open source and commercial software available for Linux
  • software updates and security patches available through Linux distribution
  • install same Linux distribution on the switches and servers to reduce operational complexity and leverage existing expertise
  • run instances of the Linux distribution as virtual machines or containers to test configurations and develop automation scripts
  • standard Linux APIs, and availability of Linux developers, lowers the barrier to customization, making it possible to tailor network behavior to address application / business requirements

The switchdev driver for NVIDIA Spectrum ASICs exposes advanced dataplane instrumentation through standard Linux APIs. This article will explore how the open source Host sFlow agent uses the standard Linux APIs to stream real-time telemetry from the ASIC using industry standard sFlow.

The diagram shows the elements of the solution. Host sFlow agents installed on servers and switches stream sFlow telemetry to an instance of the sFlow-RT real-time analytics engine. The analytics provide a comprehensive, up to the second, view of performance to drive automation.

Note: If you are unfamiliar with sFlow, or want to hear about the latest developments, Real-time network telemetry for automation provides an overview and includes a demonstration of monitoring and troubleshooting network and system performance of a GPU cluster.

Download the latest Host sFlow agent sources:

git clone https://github.com/sflow/host-sflow.git

INSTALL.Linux provides information on compiling Host sFlow on Linux. The following instructions assume a DEB based distrubution (Debian, Ubuntu):

cd host-sflow/
make deb FEATURES=DENT

It isn't necessary to install development tools on the switch. All major Linux distributions are available as Docker images. Select a Docker image that matches the operating system version on the switch and use it to build the package.

Copy the resulting hsflowd package to the switch and install:

sudo dpkg -i hsflowd_2.0.34-3_amd64.deb

Next, edit the /etc/hsflowd.conf file to configure the agent:

sflow {
  collector { ip=10.0.0.1 }
  systemd { }
  psample { group=1 egress=on }
  dropmon { group=1 start=on sw=off hw=on }
  dent { sw=off switchport=swp.* }
}

In this case, 10.0.0.1 is the address of the sFlow collector and swp.* is a regular expression used to identify front panel switch ports. The systemd{} module monitors services running on the switch - see Monitoring Linux services, the psample{} module receives randomly sampled packets from the switch ASIC - see Linux 4.11 kernel extends packet sampling support, the dropmon{} module receives dropped packet notifications - see Using sFlow to monitor dropped packets, and the dent{} module automaticallly configures packet sampling of traffic on front panel switch ports - see Packet Sampling.

Note: The same configuration file can be used for for every switch in the network, making configuration of the agents easy to automate. 

Enable and start the agent.

sudo systemctl enable hsflowd.service
sudo systemctl start hsflowd.service

Finally, use the pre-built sflow/prometheus Docker image to start a copy the sFlow-RT real-time analytics software on the collector host (10.0.0.1):

docker run -p 8008:8008 -p 6343:6343/udp -d sflow/prometheus

The web interface is accessible on port 8008.

The included Metric Browser application lets you explore the metrics that are being streamed. The chart update in real-time as data arrives and in this case identifies the interface in the network with the greatest utilization. The standard set of metrics exported by the Host sFlow agent include interface counters as well as host cpu, memory, disk and service performance metrics. Metrics lists the set of available metrics.

The included Flow Browser application provides an up to the second view traffic flows. Defining Flows describes the fields that can be used to break out traffic. 

Note: The NVIDIA Spectrum 2/3 ASIC includes packet transit delay, selected queue and queue depth with each sampled packet. This information is delivered via the Linux PSAMPLE netlink channel to the Host sFlow agent and included in the sFlow telemetry. These fields are accessible when defining flows in sFlow-RT. See Transit delay and queueing for details.

The included Discard Browser is used to explore packets that are being dropped in the network.

Note: The NVIDIA Spectrum 2/3 ASIC includes instrumentation to capture dropped packets and the reason they were dropped. The information is delivered via the Linux drop_monitor netlink channel to the Host sFlow agent and included in the sFlow telemetry. See Real-time trending of dropped packets for more information.

The included Prometheus application exports metrics to the Prometheus time series database where they can be used to drive Grafana dashboards (e.g. sFlow-RT Countries and Networks, sFlow-RT Health, and sFlow-RT Network Interfaces).

Linux as a network operating system is an exciting advancement if you are interested in simplifying network and system management. Using the Linux networking APIs as a common abstraction layer on servers and switches makes it possible to manage network and compute infrastructure as a unified system.