Tuesday, May 12, 2020

Real-time network and system metrics as a service

The sFlow-RT real-time analytics engine receives industry standard sFlow telemetry as a continuous stream from network and host devices and converts the raw data into useful measurements that can be queried through a REST API. A single sFlow-RT instance can monitor the entire data center, providing a comprehensive view of performance, not just of the individual components, but of the data center as a whole.

This article is an interactive tutorial intended to familiarize the reader with the REST API. The examples can be run on a laptop using recorded data so that access to a live network is not required.

The data was captured from the leaf and spine test network shown above (described in Fabric View).
curl -O https://raw.githubusercontent.com/sflow-rt/fabric-view/master/demo/ecmp.pcap
The above command downloads the captured sFlow data.

You will need to have a system with Java or Docker to run the sFlow-RT software.
curl -O https://inmon.com/products/sFlow-RT/sflow-rt.tar.gz
tar -xzf sflow-rt.tar.gz
./sflow-rt/get-app.sh sflow-rt browse-metrics
./sflow-rt/get-app.sh sflow-rt browse-flows
./sflow-rt/get-app.sh sflow-rt prometheus
./sflow-rt/start.sh -Dsflow.file=$PWD/ecmp.pcap
The above commands download and run sFlow-RT, along with the browse-metrics, browse-flows, and prometheus applications, on a system with Java 1.8+ installed.
docker run --rm -v $PWD/ecmp.pcap:/sflow-rt/ecmp.pcap \
-p 8008:8008 --name sflow-rt sflow/prometheus -Dsflow.file=ecmp.pcap
Alternatively, the above command runs sFlow-RT and applications using Docker.
The REST API is documented using OpenAPI. Use a web browser to access the REST API explorer at http://localhost:8008/api/index.html.

Each of the examples can be run in a terminal window using curl, or you can simply click on the link to see the results in your web browser.
Measurements are represented by sFlow-RT in the form of a logical table. Each agent is a device on the network and is uniquely identified by an IP address. Each agent may have one or more data sources, each representing a logical source of measurements. For example, a network switch will have a data source for each port.
curl http://localhost:8008/agents/json
List the sFlow agents.
curl http://localhost:8008/metrics/json
List the names of the metrics being exported by the agents. The available metrics depend on the types of agent streaming data to sFlow-RT. For a list of supported sFlow metrics, see Metrics.
curl http://localhost:8008/metric/ALL/max:ifinutilization,max:ifoututilization/json
Find the switch ports with the highest input and output utilization.

The metric query walks the table and returns a value that summarizes each metric in a comma-separated list. The following summary statistics are supported (a Python sketch of a summary query follows the list):
  • max: Maximum value
  • min: Smallest value
  • sum: Total value
  • avg: Average value
  • var: Variance
  • sdev: Standard deviation
  • med: Median value
  • q1: First quartile
  • q2: Second quartile (same as med:)
  • q3: Third quartile
  • iqr: Inter-quartile range (i.e. q3 - q1)
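As a sketch of how these summary queries can be consumed programmatically (assuming the same localhost:8008 instance used throughout this tutorial, and the result fields shown in the srcdst example later in this article), the following Python script retrieves and prints the utilization maxima:
#!/usr/bin/env python
import requests

# Query the maximum input/output utilization across all agents.
# The result is a list with one entry per metric in the query.
r = requests.get(
  'http://localhost:8008/metric/ALL/max:ifinutilization,max:ifoututilization/json'
)
r.raise_for_status()
for metric in r.json():
  # Each entry identifies the agent and data source reporting the value.
  print(metric['metricName'], metric['agent'],
        metric['dataSource'], metric['metricValue'])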
The browse-metrics application makes use of the metric REST API and can be used to query and trend metrics.
Click on the link below to plot a graph of the switch port with the highest input utilization (screen capture shown above):
http://localhost:8008/app/browse-metrics/html/index.html?metric=ifinutilization
The following examples show how to retrieve metric values without summarization.
curl http://localhost:8008/table/ALL/ifinutilization,ifoututilization/json
Get a table of input and output utilization for every switch port. The table query doesn't summarize metrics. Instead, the query returns rows from the logical table that include the metrics specified in the query.
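If the rows are to be processed programmatically, the cells can be walked in Python; a sketch, assuming each row in the response is a list of cells carrying the same agent, dataSource, metricName, and metricValue fields as the metric query results:
#!/usr/bin/env python
import requests

# Walk the table of interface utilization metrics row by row.
r = requests.get(
  'http://localhost:8008/table/ALL/ifinutilization,ifoututilization/json'
)
for row in r.json():
  # Each row holds one cell per requested metric for a single data source.
  for cell in row:
    print(cell['agent'], cell['dataSource'],
          cell['metricName'], cell['metricValue'])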
curl http://localhost:8008/dump/ALL/ALL/json
Dump all metric values for all agents. The dump query is similar to the table query, but instead of walking the table row by row, individual metrics are traversed in their internal order.
curl http://localhost:8008/prometheus/metrics/ALL/ALL/txt
Dump all metric values in Prometheus Exporter format. For example, the Grafana sFlow-RT Network Interfaces dashboard makes use of the query to populate the Prometheus time series database.

There are two types of measurement carried by sFlow: periodically exported counters and randomly sampled packets. So far the examples have been querying metrics derived from the counters.
curl http://localhost:8008/flowkeys/json
Get the list of attributes that are being extracted from sampled packet headers. The available attributes depend on the type of traffic flowing in the network. For a list of supported packet attributes, see Defining Flows.
curl -H "Content-Type:application/json" -X PUT \
--data '{"keys":"ipsource,ipdestination",value:"bytes"}' \
http://localhost:8008/flow/srcdst/json
Define a new "flow" metric called srcdst that calculates the bytes per second between each pair of communicating IP addresses on the network.
curl http://localhost:8008/metric/ALL/max:srcdst/json
Find the maximum value of the newly defined srcdst flow metric, i.e. the switch port on the network observing the highest bandwidth flow of packets.
[{
 "agent": "192.168.0.12",
 "metricName": "max:srcdst",
 "topKeys": [
  {
   "lastUpdate": 1274,
   "value": 3.392739739066506E8,
   "key": "10.4.3.2,10.4.4.2"
  },
  {
   "lastUpdate": 2352,
   "value": 2.155296894816872E8,
   "key": "10.4.1.2,10.4.2.2"
  }
 ],
 "metricN": 10,
 "lastUpdate": 1275,
 "lastUpdateMax": 2031,
 "metricValue": 3.392739739066506E8,
 "dataSource": "4",
 "lastUpdateMin": 1267
}]
In addition to providing a metric value, the result also includes topKeys, showing the top flows seen at the switch port.
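A few lines of Python are enough to pull the topKeys out of the query result and print the top flows reported by each agent:
#!/usr/bin/env python
import requests

# Print the top flows observed at each agent reporting the srcdst metric.
r = requests.get('http://localhost:8008/metric/ALL/max:srcdst/json')
for metric in r.json():
  for top in metric.get('topKeys', []):
    print(metric['agent'], top['key'], top['value'])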
Click on the link below to trend the srcdst metric (screen capture shown above):
http://localhost:8008/app/browse-metrics/html/index.html?metric=srcdst
There are additional queries specific to flow metrics.
curl http://localhost:8008/activeflows/ALL/srcdst/json
Find the largest flows gathered from all the interfaces in the network.
[
 {
  "flowN": 7,
  "agent": "192.168.0.12",
  "value": 5.537867023642346E8,
  "dataSource": "4",
  "key": "10.4.1.2,10.4.2.2"
 },
 {
  "flowN": 6,
  "agent": "192.168.0.11",
  "value": 5.1034569007443213E8,
  "dataSource": "38",
  "key": "10.4.3.2,10.4.4.2"
 },
 {
  "flowN": 6,
  "agent": "192.168.0.11",
  "value": 1469003.6788768284,
  "dataSource": "4",
  "key": "10.4.4.2,10.4.3.2"
 },
 {
  "flowN": 7,
  "agent": "192.168.0.12",
  "value": 1306006.2405022713,
  "dataSource": "37",
  "key": "10.4.2.2,10.4.1.2"
 }
]
Each flow returned identifies the number of locations where it was observed and the port with the maximum value. For example, the largest flow from 10.4.1.2 to 10.4.2.2 was seen by 7 data sources and its maximum value 5.5e8 was observed by data source 4 on agent 192.168.0.12.
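The activeflows results appear sorted by value, as in the output above, so the current largest flow can be tracked with a minimal polling sketch:
#!/usr/bin/env python
import time
import requests

# Poll the activeflows query and print the current largest flow.
url = 'http://localhost:8008/activeflows/ALL/srcdst/json'
while True:
  flows = requests.get(url).json()
  if flows:
    top = flows[0]
    print(top['key'], top['value'], top['agent'])
  time.sleep(5)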
Click on the link below to plot a graph of the top flows using the browse-flows application (screen capture shown above):
http://localhost:8008/app/browse-flows/html/index.html?keys=ipsource,ipdestination&value=bps
Note how quickly the graph changes as it tracks new elephant flows in real time.

See RESTflow for a more detailed discussion of sFlow-RT's flow REST API.

This tutorial just scratches the surface of the capabilities of sFlow-RT's analytics engine. The Writing Applications tutorial provides further examples and a discussion of how to build applications using Python and JavaScript. See Real-time DDoS mitigation using BGP RTBH and FlowSpec, Fabric View, and Flow metrics with Prometheus and Grafana for examples of sFlow-RT applications.

Seeing your own data is more interesting than a canned demonstration. Network Equipment lists devices that support sFlow. Ubuntu 18.04 and CentOS 8 describe how to install the open source Host sFlow agent on popular Linux distributions, extending visibility into compute and cloud infrastructure. The Host sFlow agent is also available as a Docker image for easy deployment with container orchestration systems, see Host, Docker, Swarm and Kubernetes monitoring.

Even if you don't have access to a production environment, the Docker testbed and Kubernetes testbed examples show how to build a virtual testbed using Docker Desktop. Alternatively, Mininet flow analytics and Mininet dashboard provide starting points if you want to experiment with software defined networking (SDN).

Finally, join the sFlow-RT community to ask questions and share solutions and operational experience.

Tuesday, May 5, 2020

NVIDIA, Mellanox, and Cumulus

Recent press releases, Riding a Cloud: NVIDIA Acquires Network-Software Trailblazer Cumulus and NVIDIA Completes Acquisition of Mellanox, Creating Major Force Driving Next-Gen Data Centers, describe NVIDIA's moves to provide high speed data center networks to connect compute clusters that use their GPUs to accelerate big data workloads, including deep learning, climate modeling, animation, data visualization, physics, and molecular dynamics.

Real-time visibility into compute, network, and GPU infrastructure is required to manage and optimize the unified infrastructure. This article explores how the industry standard sFlow technology supported by all three vendors can deliver comprehensive visibility.

Cumulus Linux simplifies operations, providing the same operating system, Linux, that runs on the servers. Cumulus Networks and Mellanox have a long history of working with the Linux community to integrate support for switches. The latest Linux kernels now include native support for network ASICs, seamlessly integrating with standard Linux routing (FRR, Quagga, Bird, etc), configuration (Puppet, Chef, Ansible, etc) and monitoring (collectd, netstat, top, etc) tools.

Linux 4.11 kernel extends packet sampling support describes enhancements to the Linux kernel to support industry standard sFlow instrumentation in network ASICs. Cumulus Linux and Mellanox both support the new Linux APIs. Cumulus Linux uses the open source Host sFlow agent to stream telemetry gathered from the hardware, Linux operating system, and applications to a remote collector.

Ubuntu 18.04 and CentOS 8 describe how to install the Host sFlow agent on popular host Linux distributions. The Host sFlow agent is also available as a Docker image for easy deployment with container orchestration systems, see Host, Docker, Swarm and Kubernetes monitoring. Extending network visibility to the host allows network traffic to be associated with applications running on the host as well as providing details about the resources consumed by the applications and the network quality of service being delivered to the applications.

The Host sFlow agent also supports the sFlow NVML GPU Structures extension to export key metrics from NVIDIA GPUs using the NVIDIA Management Library (NVML), see GPU performance monitoring.

Enabling sFlow across the network, compute, and GPU stack provides a real-time, data center wide, view of performance. The sFlow-RT real-time analytics engine offers a convenient method of integrating sFlow analytics with popular orchestration, DevOps and SDN tools; examples include: Cumulus Networks, sFlow and data center automation; Flow metrics with Prometheus and Grafana; ECMP visibility with Cumulus Linux; Fabric View; and Troubleshooting connectivity problems in leaf and spine fabrics.

Friday, April 24, 2020

Monitoring DDoS mitigation

Real-time DDoS mitigation using BGP RTBH and FlowSpec and Pushing BGP Flowspec rules to multiple routers describe how to deploy the ddos-protect application. This article focuses on how to monitor DDoS activity and control actions.

The diagram shows the elements of the solution. Routers stream standard sFlow telemetry to an instance of the sFlow-RT real-time analytics engine running the ddos-protect application. The instant a DDoS attack is detected, RTBH and/or FlowSpec actions are pushed via BGP to the routers to mitigate the attack. Key metrics are published using the Prometheus exporter format over HTTP and events are sent using the standard syslog protocol.
The sFlow-RT DDoS Protect dashboard, shown above, makes use of the Prometheus time series database and the Grafana metrics visualization tool to track DDoS attack mitigation actions.
The sFlow-RT Countries and Networks dashboard, shown above, breaks down traffic by origin network and country to provide an indication of the source of attacks. Flow metrics with Prometheus and Grafana describes how to build dashboards that provide further insight into network traffic.
In this example, syslog events are directed to an Elasticsearch, Logstash, and Kibana (ELK) stack where they are archived, queried, and analyzed. Grafana can be used to query Elasticsearch to incorporate event data in dashboards. The Grafana dashboard example above trends DDoS events and displays key information in a table below.
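If you don't already have a syslog collector running, a minimal listener is enough to watch the events during testing; the following Python sketch (assuming events are sent to the standard syslog UDP port, 514, on the machine running it) prints each message as it arrives:
#!/usr/bin/env python
import socket

# Minimal syslog receiver: print each UDP syslog message as it arrives.
# Binding to port 514 typically requires root privileges.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('0.0.0.0', 514))
while True:
  msg, addr = sock.recvfrom(8192)
  print(addr[0], msg.decode('utf-8', errors='replace'))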

The tools demonstrated in this article are not the only ones that can be used. If you already have monitoring for your infrastructure then it makes sense to leverage the existing tools rather than stand up a new monitoring system. Syslog is a widely supported standard, used by on-site (e.g. Splunk) and cloud based (e.g. Solarwinds Loggly) SIEM tools. Similarly, the Prometheus metrics export protocol is widely supported (e.g. InfluxDB).

Wednesday, April 15, 2020

Pushing BGP Flowspec rules to multiple routers

Real-time DDoS mitigation using BGP RTBH and Flowspec describes the open source DDoS Protect application. The software runs on the sFlow-RT real-time analytics engine, which receives industry standard sFlow telemetry from routers and pushes controls using BGP. A recent enhancement to the application pushes controls to multiple routers in order to protect networks with redundant edge routers.
ddos_protect.router=10.0.0.96,10.0.0.97
Configuring multiple BGP connections is simple: the ddos_protect.router configuration option has been extended to accept a comma-separated list of IP addresses for the routers that will be connecting to the controller.
Alternatively, a BGP Flowspec/RTBH reflector can be used to propagate the controls. Flowspec is a recent addition to open source BGP software, FRR and Bird, and it should be possible to use this software to reflect Flowspec controls. A reflector can be a useful place to implement policies that direct controls to specific enforcement devices.

Support for multiple BGP connections in the DDoS Protect application reduces the complexity of simple deployments by removing the requirement for a reflector. Controls are pushed to all devices, but differentiated policies can still be implemented by configuring each device's response to controls.

Tuesday, March 24, 2020

Kubernetes testbed

The sFlow-RT real-time analytics platform receives a continuous telemetry stream from sFlow Agents embedded in network devices, hosts and applications and converts the raw measurements into actionable metrics, accessible through open APIs, see Writing Applications.

Application development is greatly simplified if you can emulate the infrastructure you want to monitor on your development machine. Docker testbed describes a simple way to develop sFlow based visibility solutions. This article describes how to build a Kubernetes testbed to develop and test configurations before deploying solutions into production.
Docker Desktop provides a convenient way to set up a single node Kubernetes cluster, just select the Enable Kubernetes setting and click on Apply & Restart.

Create the following sflow-rt.yml file:
apiVersion: v1
kind: Service
metadata:
  name: sflow-rt-sflow
spec:
  type: NodePort
  selector:
    name: sflow-rt
  ports:
    - protocol: UDP
      port: 6343
---
apiVersion: v1
kind: Service
metadata:
  name: sflow-rt-rest
spec:
  type: LoadBalancer
  selector:
    name: sflow-rt
  ports:
    - protocol: TCP
      port: 8008
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sflow-rt
spec:
  replicas: 1
  selector:
    matchLabels:
      name: sflow-rt
  template:
    metadata:
      labels:
        name: sflow-rt
    spec:
      containers:
      - name: sflow-rt
        image: sflow/prometheus:latest
        ports:
          - name: http
            protocol: TCP
            containerPort: 8008
          - name: sflow
            protocol: UDP
            containerPort: 6343
Run the following command to deploy the service:
kubectl apply -f sflow-rt.yml
Now create the following host-sflow.yml file:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: host-sflow
spec:
  selector:
    matchLabels:
      name: host-sflow
  template:
    metadata:
      labels:
        name: host-sflow
    spec:
      restartPolicy: Always
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: host-sflow
        image: sflow/host-sflow:latest
        env:
          - name: COLLECTOR
            value: "sflow-rt-sflow"
          - name: SAMPLING
            value: "10"
          - name: NET
            value: "flannel"
        volumeMounts:
          - mountPath: /var/run/docker.sock
            name: docker-sock
            readOnly: true
      volumes:
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock
Run the following command to deploy the service:
kubectl apply -f host-sflow.yml
In this case, there is only one node, but the command will deploy an instance of Host sFlow on every node in a Kubernetes cluster to provide a comprehensive view of network, server, and application performance.

Note: The single node Kubernetes cluster uses the Flannel plugin for Cluster Networking. Setting the sflow/host-sflow environment variable NET to flannel instruments the cni0 bridge used by Flannel to connect Kubernetes pods. The NET and SAMPLING settings will likely need to be changed when pushing the configuration into a production environment, see sflow/host-sflow for options.

Run the following command to verify that the Host sFlow and sFlow-RT pods are running:
kubectl get pods
You should see output similar to the following:
NAME                        READY   STATUS    RESTARTS   AGE
host-sflow-lp4db            1/1     Running   0          34s
sflow-rt-544bff645d-kj4km   1/1     Running   0          21h
The following command displays the network services:
kubectl get services
Generating the following output:
NAME             TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
kubernetes       ClusterIP      10.96.0.1       <none>        443/TCP          13d
sflow-rt-rest    LoadBalancer   10.110.89.167   localhost     8008:31317/TCP   21h
sflow-rt-sflow   NodePort       10.105.87.169   <none>        6343:31782/UDP   21h
Access to the sFlow-RT REST API is available via localhost port 8008.
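For example, listing the sFlow agents (the same /agents/json query used in the earlier tutorial) confirms that the REST API is reachable through the LoadBalancer service; a sketch:
#!/usr/bin/env python
import requests

# Confirm the REST API is reachable by listing the sFlow agents.
r = requests.get('http://localhost:8008/agents/json')
print(r.status_code, r.json())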
The sFlow-RT web interface confirms that telemetry is being received from 1 sFlow agent (the Host sFlow instance monitoring the Kubernetes node).
ab -c 4 -n 10000 -b 500 -l http://10.0.0.73:8008/dump/ALL/ALL/json
The command above uses ab, the Apache HTTP server benchmarking tool, to generate network traffic by repeatedly querying the sFlow-RT instance using the Kubernetes node IP address (10.0.0.73).
The screen capture above shows the sFlow-RT Flow Browser application reporting traffic in real-time.
#!/usr/bin/env python
import requests

requests.put(
  'http://10.0.0.73:8008/flow/elephant/json',
  json={'keys':'ipsource,ipdestination', 'value':'bytes'}
)
requests.put(
  'http://10.0.0.73:8008/threshold/elephant_threshold/json',
  json={'metric':'elephant', 'value': 10000000/8, 'byFlow':True, 'timeout': 1}
)
eventurl = 'http://10.0.0.73:8008/events/json'
eventurl += '?thresholdID=elephant_threshold&maxEvents=10&timeout=60'
eventID = -1
while True:
  r = requests.get(eventurl + '&eventID=' + str(eventID))
  if r.status_code != 200: break
  events = r.json()
  if len(events) == 0: continue

  eventID = events[0]['eventID']
  events.reverse()
  for e in events:
    print(e['flowKey'])
The above elephant.py script is modified from the version in Docker testbed to reference the Kubernetes node IP address (10.0.0.73).
./elephant.py     
10.1.0.72,192.168.65.3
The output above is generated immediately when traffic is generated using the ab command. The IP addresses correspond to those displayed in the Flow Browser chart.
curl http://10.0.0.73:8008/prometheus/metrics/ALL/ALL/txt
Run the above command to retrieve metrics from the Kubernetes cluster in Prometheus export format.

This article focused on using Docker Desktop to move sFlow real-time analytics solutions into a Kubernetes production environment. Docker testbed describes how to use Docker Desktop to create an environment to develop the applications.

Thursday, March 19, 2020

SFMIX San Francisco shelter in place

A shelter in place order restricted San Francisco residents to their homes beginning at 12:01 a.m. on March 17, 2020. Many residents work for Bay Area technology companies such as Salesforce, Facebook, Twitter, Google, Netflix and Apple. Employees from these companies are able to, and have been instructed to, work remotely from their homes. In addition, other housebound residents are making use of social networking to keep in touch with friends and family as well as streaming media and online gaming for entertainment.

The traffic trend chart above from the San Francisco Metropolitan Internet Exchange (SFMIX) shows the change in network traffic that has resulted from the shelter in place order. Peak traffic has increased by around 10Gbit/s (a 25% increase) and continues throughout the day (whereas peaks previously occurred in the evenings).

The SFMIX network directly connects a number of data centers in the Bay Area and the member organizations that peer from those data centers.  Peering through the exchange network keeps traffic local by directly connecting companies with their employees and customers and avoiding potentially congested service provider networks.
SFMIX recently completed a network upgrade to 100Gbit/s Arista switches and all fiber optic connections, and so is easily able to handle the increase in traffic.

Network visibility is critical to being able to quickly respond to unexpected changes in network usage. The sFlow measurement technology built into high speed switches is a standard method of monitoring Internet Exchanges (IXs) - Internet Exchange (IX) Metrics is a measurement tool developed with SFMIX.

Every organization depends on their networks and visibility is critical to manage the challenges posed by a rapidly changing environment. sFlow is an industry standard that is widely implemented by network vendors.  Enable sFlow telemetry from existing network equipment and deploy an sFlow analytics tool to gain visibility into your network traffic. sFlowTrend is a free tool that can be installed and running in minutes. Flow metrics with Prometheus and Grafana describes how to integrate network traffic visibility into existing operational dashboards.

Thursday, March 12, 2020

Ubuntu 18.04

Ubuntu 18.04 comes with Linux kernel version 4.15. This version of the kernel includes efficient in-kernel packet sampling that can be used to provide network visibility for production servers running network heavy workloads, see Berkeley Packet Filter (BPF).
This article provides instructions for installing and configuring the open source Host sFlow agent to remotely monitor servers using the industry standard sFlow protocol. The sFlow-RT real-time analyzer is used to demonstrate the capabilities of sFlow telemetry.

Find the latest Host sFlow version on the Host sFlow download page.
wget https://github.com/sflow/host-sflow/releases/download/v2.0.25-3/hsflowd-ubuntu18_2.0.25-3_amd64.deb
sudo dpkg -i hsflowd-ubuntu18_2.0.25-3_amd64.deb
sudo systemctl enable hsflowd
The above commands download and install the software.
sflow {
  collector { ip=10.0.0.30 }
  pcap { speed=1G-1T }
  tcp { }
  systemd { }
}
Edit the /etc/hsflowd.conf file. The above example sends sFlow to a collector at 10.0.0.30, enables packet sampling on all network adapters, adds TCP performance information, and exports metrics for Linux services. See Configuring Host sFlow for Linux for the complete set of configuration options.
sudo systemctl restart hsflowd
Restart the Host sFlow daemon to start streaming telemetry to the collector.
sflow {
  dns-sd { domain=.sf.inmon.com }
  pcap { speed=1G-1T }
  tcp { }
  systemd { }
}
Alternatively, if you have control over a DNS domain, you can use DNS SRV records to advertise sFlow collector address(es). In the above example, the sf.inmon.com domain will be queried for collectors.
sflow-rt          A       10.0.0.30
_sflow._udp   60  SRV     0 0 6343  sflow-rt
The above entries from the sf.inmon.com zone file direct sFlow to sflow-rt.sf.inmon.com (10.0.0.30). If you change the collector, all Host sFlow agents will pick up the change within 60 seconds (the DNS time to live specified in the SRV entry).
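The SRV record can be checked with any DNS client; as a sketch, the dnspython package (an assumption, any resolver will do) looks up the advertised collector and port:
#!/usr/bin/env python
import dns.resolver  # pip install dnspython (2.x)

# Look up the sFlow collectors advertised for the example domain.
for srv in dns.resolver.resolve('_sflow._udp.sf.inmon.com', 'SRV'):
  print(srv.target, srv.port)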

Now that the Host sFlow agent has been configured, it's time to install an sFlow collector on server 10.0.0.30, which we will assume is also running Ubuntu 18.04.

First install Java.
sudo apt install openjdk-11-jre-headless
Next, install the latest version of sFlow-RT along with browse-metrics, browse-flows and prometheus applications.
LATEST=`wget -qO - https://inmon.com/products/sFlow-RT/latest.txt`
wget https://inmon.com/products/sFlow-RT/sflow-rt_$LATEST.deb
sudo dpkg -i sflow-rt_$LATEST.deb
sudo /usr/local/sflow-rt/get-app.sh sflow-rt browse-metrics
sudo /usr/local/sflow-rt/get-app.sh sflow-rt browse-flows
sudo /usr/local/sflow-rt/get-app.sh sflow-rt prometheus
sudo systemctl enable sflow-rt
sudo systemctl start sflow-rt
Finally, allow sFlow and HTTP requests through the firewall.
sudo ufw allow 6343/udp
sudo ufw allow 8008/tcp
System Properties describes configuration options that can be set in the /usr/local/sflow-rt/conf.d/sflow-rt.conf file. See Download and install for instructions on securing access to sFlow-RT as well as links to additional applications.
Use a web browser to connect to http://10.0.0.30:8008/ to access the sFlow-RT web interface (shown above). The Status page confirms that sFlow is being received.
curl http://10.0.0.30:8008/prometheus/metrics/ALL/ALL/txt
The above command retrieves metrics for all the hosts in Prometheus export format.

Configure Prometheus or InfluxDB to periodically retrieve and store metrics. The following examples demonstrate the use of Grafana to query a Prometheus database to populate dashboards: sFlow-RT Network Interfaces, sFlow-RT Countries and Networks, and sFlow-RT Health.
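As a sketch (the job name is arbitrary and the target is the collector address used in this article), a Prometheus scrape job for the sFlow-RT exporter might look like the following prometheus.yml fragment:
scrape_configs:
  - job_name: 'sflow-rt-metrics'
    metrics_path: /prometheus/metrics/ALL/ALL/txt
    scheme: http
    static_configs:
      - targets: ['10.0.0.30:8008']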

Tuesday, March 10, 2020

Docker testbed

The sFlow-RT real-time analytics platform receives a continuous telemetry stream from sFlow Agents embedded in network devices, hosts and applications and converts the raw measurements into actionable metrics, accessible through open APIs, see Writing Applications.

Application development is greatly simplified if you can emulate the infrastructure you want to monitor on your development machine. Mininet flow analytics, Mininet dashboard, and Mininet weathermap describe how to use the open source Mininet network emulator to simulate networks and generate a live stream of standard sFlow telemetry data.

This article describes how to use Docker containers as a development platform. Docker Desktop provides a convenient method of running Docker on Mac and Windows desktops. These instructions assume you have already installed Docker.

First, find your host address (e.g. hostname -I, ifconfig en0, etc. depending on operating system), then open a terminal window and set the shell variable MY_IP:
MY_IP=10.0.0.70
Start a Host sFlow agent using the pre-built sflow/host-sflow image:
docker run --rm -d -e "COLLECTOR=$MY_IP" -e "SAMPLING=10" \
--net=host -v /var/run/docker.sock:/var/run/docker.sock:ro \
--name=host-sflow sflow/host-sflow
Note: Host, Docker, Swarm and Kubernetes monitoring describes how to deploy Host sFlow agents to monitor large scale container environments.

Start an iperf3 server using the pre-built sflow/iperf3 image:
docker run --rm -d -p 5201:5201 --name iperf3 sflow/iperf3 -s
In a separate terminal window, run the following command to start sFlow-RT:
docker run --rm -p 8008:8008 -p 6343:6343/udp --name sflow-rt sflow/prometheus
Note: The sflow/prometheus image is based on sflow/sflow-rt, adding applications for browsing and exporting sFlow analytics. The sflow/sflow-rt page provides instructions for packaging your own applications with sFlow-RT.

It is helpful to run sFlow-RT in the foreground during development so that you can see the log messages:
2020-03-09T23:31:23Z INFO: Starting sFlow-RT 3.0-1477
2020-03-09T23:31:23Z INFO: Version check, running latest
2020-03-09T23:31:24Z INFO: Listening, sFlow port 6343
2020-03-09T23:31:24Z INFO: Listening, HTTP port 8008
2020-03-09T23:31:24Z INFO: DNS server 192.168.65.1
2020-03-09T23:31:24Z INFO: app/prometheus/scripts/export.js started
2020-03-09T23:31:24Z INFO: app/browse-flows/scripts/top.js started
The web user interface can be accessed at http://localhost:8008/.
The sFlow Agents count (at the top left) verifies that sFlow is being received from the Host sFlow agent. Access the pre-installed Browse Flows application at http://localhost:8008/app/browse-flows/html/index.html?keys=ipsource%2Cipdestination&value=bps
Run the following command in the original terminal to initiate a test and generate traffic:
docker run --rm sflow/iperf3 -c $MY_IP
You should immediately see a spike in traffic like that shown in the Flow Browser screen capture. See RESTflow for an overview of the sFlow-RT flow analytics architecture and  Defining Flows for a detailed description of the options available when defining flows.

The ability to rapidly detect and act on traffic flows addresses many important challenges, for example: Real-time DDoS mitigation using BGP RTBH and FlowSpec, Triggered remote packet capture using filtered ERSPAN, Exporting events using syslog, Black hole detection, and Troubleshooting connectivity problems in leaf and spine fabrics.

The following elephant.js script uses the embedded JavaScript API to detect and log the start of flows greater than 10Mbits/second:
setFlow('elephant',
  {keys:'ipsource,ipdestination',value:'bytes'});
setThreshold('elephant_threshold',
  {metric:'elephant', value: 10000000/8, byFlow: true, timeout: 1});
setEventHandler(function(evt)
  { logInfo(evt.flowKey); }, ['elephant_threshold']);
Use control+c to stop the sFlow-RT instance and run the following command to include the elephant.js script:
docker run --rm -v $PWD/elephant.js:/sflow-rt/elephant.js \
-p 8008:8008 -p 6343:6343/udp --name sflow-rt \
sflow/prometheus -Dscript.file=elephant.js
Run the iperf3 test again and you should immediately see the flows logged:
2020-03-10T05:30:15Z INFO: Starting sFlow-RT 3.0-1477
2020-03-10T05:30:16Z INFO: Version check, running latest
2020-03-10T05:30:16Z INFO: Listening, sFlow port 6343
2020-03-10T05:30:17Z INFO: Listening, HTTP port 8008
2020-03-10T05:30:17Z INFO: DNS server 192.168.65.1
2020-03-10T05:30:17Z INFO: elephant.js started
2020-03-10T05:30:17Z INFO: app/browse-flows/scripts/top.js started
2020-03-10T05:30:17Z INFO: app/prometheus/scripts/export.js started
2020-03-10T05:30:25Z INFO: 172.17.0.4,192.168.1.242
2020-03-10T05:30:26Z INFO: 172.17.0.1,172.17.0.2
Alternatively, the following elephant.py script uses the REST API to perform the same function:
#!/usr/bin/env python
import requests

requests.put(
  'http://localhost:8008/flow/elephant/json',
  json={'keys':'ipsource,ipdestination', 'value':'bytes'}
)
requests.put(
  'http://localhost:8008/threshold/elephant_threshold/json',
  json={'metric':'elephant', 'value': 10000000/8, 'byFlow':True, 'timeout': 1}
)
eventurl = 'http://localhost:8008/events/json'
eventurl += '?thresholdID=elephant_threshold&maxEvents=10&timeout=60'
eventID = -1
while True:
  r = requests.get(eventurl + '&eventID=' + str(eventID))
  if r.status_code != 200: break
  events = r.json()
  if len(events) == 0: continue

  eventID = events[0]['eventID']
  events.reverse()
  for e in events:
    print(e['flowKey'])
Run the Python script and run another iperf3 test:
./elephant.py 
172.17.0.4,192.168.1.242
172.17.0.1,172.17.0.2
Another option is to replay sFlow telemetry captured in the form of a PCAP file. The Fabric View application contains an example that can be extracted:
curl -O https://raw.githubusercontent.com/sflow-rt/fabric-view/master/demo/ecmp.pcap
Now run sFlow-RT:
docker run --rm -v $PWD/ecmp.pcap:/sflow-rt/sflow.pcap \
-p 8008:8008 --name sflow-rt sflow/prometheus -Dsflow.file=sflow.pcap
Run the elephant.py script:
./elephant.py                                                            
10.4.1.2,10.4.2.2
10.4.1.2,10.4.2.2
Note: Fabric View contains a detailed description of the captured data.

Data from a production network can be captured using tcpdump:
tcpdump -i eth0 -s 0 -c 10000 -w sflow.pcap udp port 6343
For example, the above command captures 10000 sFlow datagrams (UDP port 6343) from Ethernet interface eth0 and stores them in the file sflow.pcap.

The sFlow-RT analytics engine converts raw sFlow telemetry into useful metrics that can be imported into a time series database.
curl http://localhost:8008/prometheus/metrics/ALL/ALL/txt
The above command retrieves metrics for all the hosts in Prometheus export format. Prometheus exporter describes how to run the Prometheus time series database and build Grafana dashboards using metrics retrieved from sFlow-RT. Flow metrics with Prometheus and Grafana extends the example to include packet flow data. InfluxDB 2.0 shows how to import and query metrics using InfluxDB.

It only takes a few minutes to try out these examples. Work through Writing Applications to learn about the capabilities of the sFlow-RT analytics engine and how to package applications. Publish applications on GitHub and use the sflow/sflow-rt Docker image to deploy them in production. Join the sFlow-RT Community to ask questions and post information about new applications.

Friday, March 6, 2020

CentOS 8

CentOS 8 / RHEL 8 come with Linux kernel version 4.18. This version of the kernel includes efficient in-kernel packet sampling that can be used to provide network visibility for production servers running network heavy workloads, see Berkeley Packet Filter (BPF).
This article provides instructions for installing and configuring the open source Host sFlow agent to remotely monitor servers using the industry standard sFlow protocol. The sFlow-RT real-time analyzer is used to demonstrate the capabilities of sFlow telemetry.

Find the latest Host sFlow version on the Host sFlow download page.
wget https://github.com/sflow/host-sflow/releases/download/v2.0.26-3/hsflowd-centos8-2.0.26-3.x86_64.rpm
sudo rpm -i hsflowd-centos8-2.0.26-3.x86_64.rpm
sudo systemctl enable hsflowd
The above commands download and install the software.
sflow {
  collector { ip=10.0.0.30 }
  pcap { speed=1G-1T }
  tcp { }
  systemd { }
}
Edit the /etc/hsflowd.conf file. The above example sends sFlow to a collector at 10.0.0.30, enables packet sampling on all network adapters, adds TCP performance information, and exports metrics for Linux services. See Configuring Host sFlow for Linux for the complete set of configuration options.
sudo systemctl restart hsflowd
Restart the Host sFlow daemon to start streaming telemetry to the collector.
sflow {
  dns-sd { domain=.sf.inmon.com }
  pcap { speed=1G-1T }
  tcp { }
  systemd { }
}
Alternatively, if you have control over a DNS domain, you can use DNS SRV records to advertise sFlow collector address(es). In the above example, the sf.inmon.com domain will be queried for collectors.
sflow-rt          A       10.0.0.30
_sflow._udp   60  SRV     0 0 6343  sflow-rt
The above entries from the sf.inmon.com zone file direct sFlow to sflow-rt.sf.inmon.com (10.0.0.30). If you change the collector, all Host sFlow agents will pick up the change within 60 seconds (the DNS time to live specified in the SRV entry).

Now that the Host sFlow agent has been configured, it's time to install an sFlow collector on server 10.0.0.30, which we will assume is also running CentOS 8.

First install Java.
sudo yum install java-11-openjdk-headless
Next, install the latest version of sFlow-RT along with browse-metrics, browse-flows and prometheus applications.
LATEST=`wget -qO - https://inmon.com/products/sFlow-RT/latest.txt`
wget https://inmon.com/products/sFlow-RT/sflow-rt-$LATEST.noarch.rpm
sudo rpm -i sflow-rt-$LATEST.noarch.rpm
sudo /usr/local/sflow-rt/get-app.sh sflow-rt browse-metrics
sudo /usr/local/sflow-rt/get-app.sh sflow-rt browse-flows
sudo /usr/local/sflow-rt/get-app.sh sflow-rt prometheus
sudo systemctl enable sflow-rt
sudo systemctl start sflow-rt
Finally, allow sFlow and HTTP requests through the firewall.
sudo firewall-cmd --permanent --zone=public --add-port=6343/udp
sudo firewall-cmd --permanent --zone=public --add-port=8008/tcp
sudo firewall-cmd --reload
System Properties describes configuration options that can be set in the /usr/local/sflow-rt/conf.d/sflow-rt.conf file. See Download and install for instructions on securing access to sFlow-RT as well as links to additional applications.
Use a web browser to connect to http://10.0.0.30:8008/ to access the sFlow-RT web interface (shown above). The Status page confirms that sFlow is being received.
curl http://10.0.0.30:8008/prometheus/metrics/ALL/ALL/txt
The above command retrieves metrics for all the hosts in Prometheus export format.

Configure Prometheus or InfluxDB to periodically retrieve and store metrics. The following examples demonstrate the use of Grafana to query a Prometheus database to populate dashboards: sFlow-RT Network Interfaces, sFlow-RT Countries and Networks, and sFlow-RT Health.

Wednesday, February 19, 2020

SONiC

SONiC is part of the Open Compute Project (OCP), creating "an open source network operating system based on Linux that runs on switches from multiple vendors and ASICs." The latest SONiC.201911 release of the open source SONiC network operating system adds sFlow support.
SONiC: sFlow High Level Design
The diagram shows the elements of the implementation.
  1. The open source Host sFlow agent running in the sFlow container monitors the Redis database (in the Database container) for sFlow related configuration changes.
  2. The syncd container monitors the configuration database and pushes hardware settings (packet sampling) to the ASIC using the SAI (Switch Abstraction Interface) driver (see SAI 1.5).
  3. The ASIC driver hands sampled packet headers and associated metadata captured by the ASIC to user space via the Linux PSAMPLE netlink channel (see Linux 4.11 kernel extends packet sampling support).
  4. The Host sFlow agent receives the PSAMPLE messages and forwards them to configured sFlow collector(s) as standard sFlow packet samples.
  5. In addition, the Host sFlow agent streams telemetry (interface counters and host metrics gathered from the Redis database and Linux kernel) to the collector(s) as standard sFlow counter records.
The following CLI commands enable sFlow on all switch ports and send to sFlow collector 10.0.0.70:
sflow collector add 10.0.0.70
sflow polling-interval 30
sflow interface enable all
sflow enable
The open source sflowtool command line utility is a simple way of confirming that sFlow is being received at the collector host (10.0.0.70). The Docker sflow/sflowtool image provides a convenient method of running sflowtool:
docker run -p 6343:6343/udp sflow/sflowtool
If sFlow is arriving at the host, you will see the contents of the sFlow records printed in the console.
Running the Docker sflow/prometheus image exposes network-wide telemetry and flow data in the Prometheus export format so that it can be imported into a time series database:
docker run -p 8008:8008 -p 6343:6343/udp -d sflow/prometheus
See InfluxDB 2.0, Prometheus exporter and Flow metrics with Prometheus and Grafana for examples.

The Prometheus exporter is an application running on the sFlow-RT real-time analytics engine, which has the scalability to handle large SONiC data center deployments. In addition to delivering metrics, sFlow-RT applications are available that address the troubleshooting, traffic engineering, DDoS mitigation, and security challenges of managing the large scale leaf and spine networks where SONiC is typically deployed.

Saturday, February 15, 2020

Real-time DDoS mitigation using BGP RTBH and FlowSpec

DDoS Protect is a recently released open source application running on the sFlow-RT real-time analytics engine. The software uses streaming analytics to rapidly detect and characterize DDoS flood attacks and automatically applies BGP remote triggered black hole (RTBH) and/or FlowSpec controls to mitigate their impact. The total time to detect and mitigate an attack is in the order of a second.

The combination of multi-vendor standard telemetry (sFlow) and control (BGP FlowSpec) provides the real-time visibility and control needed to quickly and automatically adapt the network to address a range of challenging problems, including: DDoS, traffic engineering, and security.

Solutions are deployable today: Arista BGP FlowSpec describes the recent addition of BGP FlowSpec support to Arista EOS (EOS has long supported sFlow), and sFlow available on Juniper MX series routers describes the release of sFlow support on Juniper MX routers (which have long had BGP FlowSpec support). This article demonstrates DDoS mitigation using Arista EOS. Similar configurations should work with any router that supports sFlow and BGP FlowSpec.
The diagram shows a typical deployment scenario in which an instance of sFlow-RT (running the DDoS Protect application) receives sFlow from the site router (ce-router). A DNS amplification attack is launched against the site. Analysis of the sFlow telemetry immediately recognizes the amplification attack, identifying UDP source port (53) and targeted IP address (192.0.2.129). This information is used to create a BGP FlowSpec rule which propagates via the site router to the transit provider router (sp-router) where the traffic is quickly dropped, ensuring that WAN link bandwidth is protected from the denial of service attack.

There are a number of possible variations on this example. FlowSpec controls can be implemented on the site router to filter out smaller attacks and RTBH controls sent to the transit provider to block large attacks. If more than one site router is involved, an instance of sFlow-RT can be associated with each of the routers, or a route reflector can be set up to distribute controls from sFlow-RT to all the site routers. If you are a service provider, you can use the software to provide DDoS mitigation as a service to your customers.

The following partial configuration enables sFlow and BGP on an Arista EOS device (EOS 4.22 or later):
!
service routing protocols model multi-agent
!
sflow sample 16384
sflow polling-interval 30
sflow destination 10.0.0.70
sflow run
!
interface Ethernet1
   flow-spec ipv4 ipv6
!
interface Management1
   ip address 10.0.0.96/24
!
ip routing
!
ipv6 unicast-routing
!
router bgp 64497
   router-id 192.0.2.1
   neighbor 10.0.0.70 remote-as 65070
   neighbor 10.0.0.70 transport remote-port 1179
   neighbor 10.0.0.70 allowas-in 3
   neighbor 10.0.0.70 send-community extended
   neighbor 10.0.0.70 maximum-routes 12000 
   !
   address-family flow-spec ipv4
      neighbor 10.0.0.70 activate
   !
   address-family flow-spec ipv6
      neighbor 10.0.0.70 activate
   !
   address-family ipv4
      neighbor 10.0.0.70 activate
   !
   address-family ipv6
      neighbor 10.0.0.70 activate
!
DDoS Protect is packaged with sFlow-RT in the sflow/ddos-protect Docker image. Running the following command on host 10.0.0.70 launches the controller:
% docker run --net=host sflow/ddos-protect \
-Dddos_protect.router=10.0.0.96 \
-Dddos_protect.as=65070 \
-Dddos_protect.enable.ipv6=yes \
-Dddos_protect.enable.flowspec=yes \
-Dddos_protect.enable.flowspec6=yes
2020-02-14T13:54:58-08:00 INFO: Starting sFlow-RT 3.0-1466
2020-02-14T13:54:59-08:00 INFO: Version check, running latest
2020-02-14T13:54:59-08:00 INFO: License installed, swkey.json
2020-02-14T13:55:00-08:00 INFO: Listening, BGP port 1179
2020-02-14T13:55:00-08:00 INFO: Listening, sFlow port 7343
2020-02-14T13:55:00-08:00 INFO: Listening, HTTP port 8008
2020-02-14T13:55:00-08:00 INFO: DNS server 8.8.8.8
2020-02-14T13:55:00-08:00 INFO: DNS server 8.8.4.4
2020-02-14T13:55:00-08:00 INFO: app/flow-trend/scripts/top.js started
2020-02-14T13:55:00-08:00 INFO: app/ddos-protect/scripts/ddos.js started
2020-02-14T13:55:37-08:00 INFO: BGP open 10.0.0.96 33917
The last log line confirms that the router has successfully opened the BGP connection to the controller. Now it's time to configure the controller.
In this example, the controller is configured to detect UDP amplification attacks, applying a FlowSpec filter when traffic exceeds 10,000 packets per second (this threshold has been set low so we can test the controller using a simulated attack). In addition, a new Address Group, site, has been created containing the CIDR to be protected, 192.0.2.0/24.
sudo hping3 --flood --udp --rand-source -k -s 53 192.0.2.129
The above command simulates a DNS amplification attack using hping3.
The DDoS Protect Charts tab provides an up to the second trend chart for each of the attack types being monitored (see screen capture at top of this article). In this case, the udp_amplification chart shows the simulated attack targeting 192.0.2.129 exceeded the 10,000 Packets per Second threshold, triggering an automated response.
localhost>show bgp flow-spec ipv4
BGP Flow Specification rules for VRF default
Router identifier 10.0.0.98, local AS number 65096
Rule status codes: # - not installed, M - received from multiple peers

   Matching Rule                                                Actions
   192.0.2.129/32;*;IP:17;SP:53;                                Drop
Command line output from the site router shown above verifies that a FlowSpec control blocking the amplification attack has been received. The control will remain in place for 60 minutes (the configured timeout), after which it will be automatically withdrawn. If the attack is still in progress it will be immediately detected and the control reapplied.

DDoS Protect can mitigate a wide range of common attacks, including: NTP, DNS, Memcached, SNMP, and SSDP amplification attacks; IP, UDP, ICMP and TCP flood attacks; and IP fragmentation attacks. Mitigation options include: remote triggered black hole (RTBH), filtering, rate limiting, and DSCP marking. IPv6 is fully supported in detection and mitigation of each of these attack types.

The standard sFlow/BGP support built into routers provides a low cost, simple to deploy method of efficiently removing DDoS traffic. Follow the steps described in this article to try out the solution on your network.

Thursday, January 30, 2020

SAI 1.5

The Open Compute Project (OCP) is "a rapidly growing community of engineers around the world whose mission is to design and enable the delivery of the most efficient server, storage and data center hardware designs available for scalable computing."

The OCP SAI (Switch Abstraction Interface) Project is an important part of the networking effort, defining "a vendor-independent way of controlling forwarding elements, such as a switching ASIC, an NPU or a software switch in a uniform manner." The SAI 1.5 Release Notes describe enhancements to the existing sFlow API, in particular adding support for the Linux psample netlink channel, see Linux 4.11 kernel extends packet sampling support. Supporting the standard Linux interface for packet sampling simplifies the implementation of sFlow agents (e.g. Host sFlow) and ensures consistent behavior across hardware platforms to deliver real-time network-wide visibility using the industry standard sFlow protocol.