Thursday, November 21, 2019

Real-time monitoring at terabit speeds

The Flow Trend chart above shows a real-time, up to the second, view of nearly 3 terabits per second of traffic flowing across the SCinet network, described as the fastest, most powerful volunteer-built network in the world. The network is built each year to support The International Conference for High Performance Computing, Networking, Storage, and Analysis. The SC19 conference is currently underway in Denver, Colorado.
The diagram shows the Joint Big Data Testbed generating the traffic in the chart. The Caltech demonstration is described in NRE-19: SC19 Network Research Exhibition: Caltech Booth 543 Demonstrations Hosting NRE-13, NRE-19, NRE-20, NRE-22, NRE-23, NRE-24, NRE-35:
400GE First Data Networks: Caltech, Starlight/NRL, USC, SCinet/XNET, Ciena, Mellanox, Arista, Dell, 2CRSI, Echostreams, DDN and Pavilion Data, as well as other supporting optical, switch and server vendor partners will demonstrate the first fully functional 3 X400GE local ring network as well as 400GE wide area network ring, linking the Starlight and Caltech booths and Starlight in Chicago. This network will integrate storage using NVMe over Fabric, the latest high throughput methods, in-depth monitoring and realtime flow steering. As part of these demonstrations, we will make use of the latest DWDM, Waveserver Ai, and 400GE as well as 200GE switch and network interfaces from Arista, Dell, Mellanox and Juniper as part of this core set of demonstrations.
Industry standard sFlow telemetry from the Arista, Dell, Mellanox, and Extreme switches in the testbed is being processed by an instance of the sFlow-RT real-time analytics engine running the embedded Flow Trend application (as well as a number of other applications, including the SC19 SCinet: Grafana network traffic dashboard).

This example demonstrates the scalability of sFlow monitoring, leveraging instrumentation built into switch ASICs to deliver comprehensive line rate visibility into the 400 Gigabit per second traffic flows generated by the testbed.
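To experiment with a similar view on your own network, the Flow Trend application can be run using Docker; the command below is a minimal sketch, assuming the packaged sflow/flow-trend image that bundles the application with sFlow-RT:
docker run -p 6343:6343/udp -p 8008:8008 sflow/flow-trend
Configure the switches to send sFlow to UDP port 6343 and browse to port 8008 to see the up to the second traffic trend chart.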

Tuesday, November 19, 2019

SC19 SCinet: Grafana network traffic dashboard

The Grafana sFlow-RT Countries and Networks dashboard above shows traffic on the SCinet network, described as the fastest, most powerful volunteer-built network in the world. The network is built each year to support The International Conference for High Performance Computing, Networking, Storage, and Analysis. The SC19 conference is currently underway in Denver, Colorado and the screen capture is live data from the conference network.
The high speed switches and routers used to construct the SCinet network support industry standard sFlow streaming telemetry. In this case, an instance of the sFlow-RT analytics engine receives the telemetry stream and generates flow analytics that are scraped every 15 seconds by an instance of the Prometheus time series database. The Prometheus database is in turn queried by an instance of Grafana, which generates the dashboard shown at the top of the page.
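For example, a minimal prometheus.yml scrape job for the sFlow-RT metrics export looks like the following (the 10.0.0.70:8008 address is illustrative; a complete configuration is described in Flow metrics with Prometheus and Grafana below):
scrape_configs:
  - job_name: 'sflow-rt-metrics'
    metrics_path: /prometheus/metrics/ALL/ALL/txt
    static_configs:
      - targets: ['10.0.0.70:8008']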
In addition, sFlow-RT is running an embedded application that generates a real-time, up to the second, view of the traffic over the last 5 minutes.
This solution is extremely scalable. A single sFlow-RT instance, allocated only 1G of memory, easily monitors 158 network devices, while supporting 11 different applications (including the real-time dashboard and Prometheus export applications shown above).

Wednesday, October 30, 2019

Observability in Data Center Networks


Observability in Data Center Networks: In this session, you’ll learn how the sFlow protocol provides broad visibility in modern data center environments as they migrate to highly meshed topologies. Our data center workloads are shifting to take advantage of higher speeds and bandwidth, so visibility into east-west traffic within the data center is becoming more important. Join Peter Phaal—one of the inventors of sFlow—and Joe Reves from SolarWinds product management as they discuss how sFlow differs from other flow instrumentation to deliver visibility in the switching fabric.
THWACKcamp is SolarWinds’ free, annual, worldwide virtual IT learning event connecting thousands of skilled IT professionals with industry experts and SolarWinds technical staff. This video was one of the sessions.

Wednesday, October 9, 2019

InfluxDB 2.0

Introducing the Next-Generation InfluxDB 2.0 Platform mentions that InfluxDB 2.0 will be able to scrape Prometheus exporters. Get started with InfluxDB provides instructions for running an alpha version of the new software using Docker:
docker run --name influxdb -p 9999:9999 quay.io/influxdb/influxdb:2.0.0-alpha
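If the container started correctly, the service should respond on port 9999; the /health endpoint used below is assumed to be available in the alpha builds:
curl http://localhost:9999/health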
Prometheus exporter describes an application that runs on the sFlow-RT analytics platform, converting real-time streaming telemetry from industry standard sFlow agents into metrics that can be scraped by Prometheus. Host, Docker, Swarm and Kubernetes monitoring describes how to deploy agents on popular container orchestration platforms.
The screen capture above shows three scrapers configured in InfluxDB 2.0:
  1. sflow-rt-analyzer,
    URL: http://10.0.0.70:8008/prometheus/analyzer/txt
  2. sflow-rt-dump,
    URL: http://10.0.0.70:8008/prometheus/metrics/ALL/ALL/txt
  3. sflow-rt-flow-src-dst,
    URL: http://10.0.0.70:8008/app/prometheus/scripts/export.js/flows/ALL/txt?metric=flow_src_dst_bps&key=ipsource,ipdestination&value=bytes&aggMode=max&maxFlows=100&minValue=1000&scale=8
The first collects metrics about the performance of the sFlow-RT analytics engine; the second collects all the metrics exported by the sFlow agents; and the third collects a flow metric (see Flow metrics with Prometheus and Grafana).
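Each URL can be tested with curl before configuring a scraper, to confirm that metrics are returned in the Prometheus exposition format, for example:
curl http://10.0.0.70:8008/prometheus/analyzer/txt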

Updated 19 October 2019: native support for Prometheus export has been added to sFlow-RT, and URLs 1 and 2 have been modified to reflect the new API.
InfluxDB 2.0 now includes the data exploration and dashboard building capabilities that were previously in the separate Chronograf application. The screen capture above shows a simple chart trending ifinoctets across a number of switch ports.

Note: There are a number of articles on this blog that demonstrate how to push metrics from sFlow-RT into InfluxDB 1.0 using its REST API. The ability to scrape metrics from a Prometheus exporter simplifies the integration.

Tuesday, October 1, 2019

Flow metrics with Prometheus and Grafana

The Grafana dashboard above shows real-time network traffic flow metrics. This article describes how to define and collect flow metrics using the Prometheus time series database and build Grafana dashboards using those metrics.
Prometheus exporter describes an application that runs on the sFlow-RT analytics platform, converting real-time streaming telemetry from industry standard sFlow agents into metrics that can be scraped by Prometheus. Host, Docker, Swarm and Kubernetes monitoring describes how to deploy agents on popular container orchestration platforms.

The latest version of the Prometheus exporter application adds flow export:
global:
  scrape_interval:     15s
  evaluation_interval: 15s

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: 'sflow-rt-metrics'
    metrics_path: /prometheus/metrics/ALL/ALL/txt
    static_configs:
      - targets: ['10.0.0.70:8008']
  - job_name: 'sflow-rt-src-dst-bps'
    metrics_path: /app/prometheus/scripts/export.js/flows/ALL/txt
    static_configs:
      - targets: ['10.0.0.70:8008']
    params:
      metric: ['ip_src_dst_bps']
      key: ['ipsource','ipdestination']
      label: ['src','dst']
      value: ['bytes']
      scale: ['8']
      minValue: ['1000']
      maxFlows: ['100']
  - job_name: 'sflow-rt-countries-bps'
    metrics_path: /app/prometheus/scripts/export.js/flows/ALL/txt
    static_configs:
      - targets: ['10.0.0.70:8008']
    params:
      metric: ['ip_countries_bps']
      key: ['null:[country:ipsource]:unknown','null:[country:ipdestination]:unknown']
      label: ['src','dst']
      value: ['bytes']
      scale: ['8']
      aggMode: ['sum']
      minValue: ['1000']
      maxFlows: ['100']
The above prometheus.yml file extends the previous example to add two additional scrape jobs, sflow-rt-src-dst-bps and sflow-rt-countries-bps, that return flow metrics. Defining flows describes the attributes and settings available to build a flow definition. The metric: setting names the Prometheus metric and the label: setting is used to map corresponding sFlow-RT flow keys into Prometheus labels.
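Fetching one of the flow metric URLs shows the shape of the exported metrics; the addresses and values below are illustrative:
curl "http://10.0.0.70:8008/app/prometheus/scripts/export.js/flows/ALL/txt?metric=ip_src_dst_bps&key=ipsource,ipdestination&label=src,dst&value=bytes&scale=8&minValue=1000&maxFlows=100"
ip_src_dst_bps{src="10.0.0.70",dst="10.0.0.236"} 9622.0
ip_src_dst_bps{src="10.0.0.236",dst="10.0.0.70"} 1217.0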

Updated 19 October 2019: native support for Prometheus export has been added to sFlow-RT, and the sflow-rt-metrics job has been modified to reflect the new API.
The first step in building a Grafana dashboard panel to display flow data is to construct a query:
topk(10, sum(ip_src_dst_bps) by (src))
In this case, the query sums the flows by source address and returns the top 10 values for each interval in the graph.

The query for the Top Source Countries chart is a little more complex:
topk(10,sum(ip_countries_bps{src!="unknown"}) by (src))
In this case, unknown source country values (the value the prometheus.yml file assigns when a country lookup fails for an ipsource address) are excluded from the query.
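A corresponding query, using the same label scheme, ranks destination countries:
topk(10,sum(ip_countries_bps{dst!="unknown"}) by (dst))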
In the visualization settings, set Null value to null as zero, set Tooltip Mode to Single, label the Left Y axis, and disable the Legend.
Finally, give the chart a title.
The Prometheus exporter application on sFlow-RT (accessible on port 8008) has a REST API explorer, above, that can be used to experiment with flow settings before configuring a Prometheus scraper job. When testing the settings, the first query will not return any data since the flow hasn't been programmed. Click the Execute button a second time to see data. Also consider using the sflow/flow-trend application as a way to gain familiarity with sFlow-RT's flow analytics engine.
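The same behavior can be reproduced from the command line; the first request programs the flow and returns an empty result, and repeating the request a few seconds later returns values (parameters as in the prometheus.yml example above):
curl "http://10.0.0.70:8008/app/prometheus/scripts/export.js/flows/ALL/txt?metric=ip_src_dst_bps&key=ipsource,ipdestination&label=src,dst&value=bytes&scale=8"
sleep 5
curl "http://10.0.0.70:8008/app/prometheus/scripts/export.js/flows/ALL/txt?metric=ip_src_dst_bps&key=ipsource,ipdestination&label=src,dst&value=bytes&scale=8"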

Wednesday, September 25, 2019

Host, Docker, Swarm and Kubernetes monitoring

The open source Host sFlow agent incorporates technologies that address the challenges of microservice monitoring, leveraging recent enhancements to Berkeley Packet Filter (BPF) in the Linux kernel to randomly sample packets, and Asynchronous Docker metrics to track rapidly changing workloads. The continuous stream of real-time telemetry from all compute nodes, transported using the industry standard sFlow protocol, provides comprehensive real-time cluster-wide visibility into all services and the traffic flowing between them.

The Host sFlow agent is available as pre-packaged rpm/deb files that can be downloaded and installed on each node in a cluster.
sflow {
  collector { ip=10.0.0.70 }
  docker { }
  pcap { dev=docker0 }
  pcap { dev=docker_gwbridge } 
}
The above /etc/hsflowd.conf file (see Configuring Host sFlow for Linux via /etc/hsflowd.conf) enables the docker {} and pcap {} modules for detailed visibility into container metrics and network traffic flows, and streams telemetry to an sFlow collector (10.0.0.70). The configuration is the same for every node, making it simple to install and configure Host sFlow on all nodes using orchestration software such as Puppet, Chef, Ansible, etc.
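After editing the configuration file, restart the daemon so the changes take effect, for example on a systemd-based Linux distribution:
sudo systemctl restart hsflowd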

The agent is also available as the pre-built sflow/host-sflow image, providing a simple method of instrumenting nodes running container workloads.
docker run \
--detach \
--name=host-sflow \
--env "COLLECTOR=10.0.0.70" \
--net=host \
--volume /var/run/docker.sock:/var/run/docker.sock:ro \
sflow/host-sflow
Execute the above command to install and run the Host sFlow agent on a Docker node.
docker service create \
--mode global \
--name host-sflow \
--env "COLLECTOR=10.0.0.70" \
--network host \
--mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock,readonly \
sflow/host-sflow
Run the above command on a Swarm manager node to install and run an instance of the Host sFlow agent on each node in a Docker Swarm cluster.
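To confirm that an agent is running on every node, list the service tasks:
docker service ps host-sflow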

Deploying Host sFlow under Kubernetes is a little more complicated.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: host-sflow
spec:
  selector:
    matchLabels:
      name: host-sflow
  template:
    metadata:
      labels:
        name: host-sflow
    spec:
      hostNetwork: true
      containers:
      - name: host-sflow
        image: sflow/host-sflow:latest
        env:
          - name: COLLECTOR
            value: "10.0.0.70"
          - name: NET
            value: "host"
        volumeMounts:
          - mountPath: /var/run/docker.sock
            name: docker-sock
            readOnly: true
      volumes:
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock
First, create a deployment description file like the host-sflow.yml file above.
kubectl apply -f host-sflow.yml
The above command installs and runs an instance of the Host sFlow agent on each node in the Kubernetes cluster.
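To verify the DaemonSet, check that a host-sflow pod has been scheduled on each node:
kubectl get pods -l name=host-sflow -o wide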
docker run -p 6343:6343/udp sflow/sflowtool
Run the command above on the collector (10.0.0.70) to verify that sFlow is arriving; see Running sflowtool using Docker.
docker run -p 6343:6343/udp -p 8008:8008 sflow/sflow-rt
Run the sflow/sflow-rt image to access real-time cluster performance metrics and network traffic flows through a REST API. Forwarding using sFlow-RT describes how to copy sFlow telemetry streams for additional tools.
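For example, the following query retrieves the load_one metric from all agents (a sketch of one REST API path; see the sFlow-RT documentation for the complete API):
curl http://10.0.0.70:8008/metric/ALL/load_one/json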
Install sFlow-RT applications to export metrics to Prometheus, block DDoS attacks, visualize flows, etc. Writing Applications describes how to use APIs to build your own applications to integrate analytics with automation and monitoring tools.

Monday, September 9, 2019

Packet analysis using Docker

Why use sFlow for packet analysis? To rephrase the Heineken slogan, sFlow reaches the parts of the network that other technologies cannot reach. Industry standard sFlow is widely supported by switch vendors, embedding wire-speed packet monitoring throughout the network. With sFlow, any link or group of links can be remotely monitored. The alternative approach of physically attaching a probe to a SPAN/Mirror port is becoming much less feasible with increasing network sizes (tens of thousands of switch ports) and link speeds (10, 100, and 400 Gigabits). Using sFlow for packet capture doesn't replace traditional packet analysis; instead, sFlow extends the capabilities of existing packet capture tools into the high speed switched network.

This article describes the sflow/tcpdump and sflow/tshark Docker images, which provide a convenient way to analyze packets captured using sFlow.

Run the following command to analyze packets using tcpdump:
$ docker run -p 6343:6343/udp -p 8008:8008 sflow/tcpdump

19:06:42.000000 ARP, Reply 10.0.0.254 is-at c0:ea:e4:89:b0:98 (oui Unknown), length 64
19:06:42.000000 IP 10.0.0.236.548 > 10.0.0.70.61719: Flags [P.], seq 3380015689:3380015713, ack 515038158, win 41992, options [nop,nop,TS val 1720029042 ecr 904769627], length 24
19:06:42.000000 IP 10.0.0.236.548 > 10.0.0.70.61719: Flags [P.], seq 149816:149832, ack 510628, win 41992, options [nop,nop,TS val 1720029087 ecr 904770068], length 16
19:06:42.000000 IP 10.0.0.236.548 > 10.0.0.70.61719: Flags [P.], seq 149816:149832, ack 510628, win 41992, options [nop,nop,TS val 1720029087 ecr 904770068], length 16
The normal tcpdump options can be used. For example, to select DNS packets:
$ docker run -p 6343:6343/udp -p 8008:8008 sflow/tcpdump -vv port 53
reading from file -, link-type EN10MB (Ethernet)
19:08:49.000000 IP (tos 0x0, ttl 64, id 22316, offset 0, flags [none], proto UDP (17), length 65)
    10.0.0.70.43801 > dns.google.53: [udp sum ok] 35941+ A? clients2.google.com. (37)
19:09:00.000000 IP (tos 0x0, ttl 255, id 16813, offset 0, flags [none], proto UDP (17), length 66)
    10.0.0.64.50675 > 10.0.0.1.53: [udp sum ok] 57874+ AAAA? p49-imap.mail.me.com. (38)
The following command selects TCP SYN packets:
$ docker run -p 6343:6343/udp sflow/tcpdump 'tcp[tcpflags] == tcp-syn'
reading from file -, link-type EN10MB (Ethernet)
19:10:37.000000 IP 10.0.0.30.46786 > 10.0.0.162.1179: Flags [S], seq 2993962362, win 29200, options [mss 1460,sackOK,TS val 20531427 ecr 0,nop,wscale 9], length 0
Capture 10 packets to a file and then exit:
$ docker run -v $PWD:/pcap -p 6343:6343/udp sflow/tcpdump -w /pcap/packets.pcap -c 10
reading from file -, link-type EN10MB (Ethernet)
A tcpdump Tutorial with Examples — 50 Ways to Isolate Traffic provides an overview of the capabilities of tcpdump with useful examples.

Run the following command to analyze packets using tshark, a terminal-based version of Wireshark:
$ docker run -p 6343:6343/udp -p 8008:8008 sflow/tshark
Capturing on '-'
    1   0.000000   10.0.0.236 → 10.0.0.70    AFP 1518 [Reply without query?]
    2   0.000000   10.0.0.236 → 10.0.0.70    AFP 1518 [Reply without query?]
    3   0.000000   10.0.0.114 → 10.0.0.72    SSH 1518 Server: Encrypted packet (len=1448)
Packets can be filtered using Display Filters. For example, the following command selects DNS traffic:
$ docker run -p 6343:6343/udp -p 8008:8008 sflow/tshark -Y 'dns'
Capturing on '-'
  328  22.000000      8.8.8.8 → 10.0.0.70    DNS 136 Standard query response 0xfce4 AAAA img.youtube.com CNAME ytimg.l.google.com AAAA
  472  36.000000    10.0.0.52 → 10.0.0.1     DNS 79 Standard query 0x173e AAAA www.nytimes.com
Print IP source, destination, protocol, and packet length fields:
$ docker run -p 6343:6343/udp -p 8008:8008 sflow/tshark -T fields -e ip.src -e ip.dst -e ip.proto -e ip.len
Capturing on '-'
10.0.0.70 10.0.0.236 6 1500
10.0.0.236 10.0.0.70 6 52
10.0.0.70 10.0.0.236 6 1500
10.0.0.236 10.0.0.70 6 52
10.0.0.70 10.0.0.236 6 1500
Capture 100 packets and print summary of the protocols:
$ docker run -p 6343:6343/udp -p 8008:8008 sflow/tshark -q -z io,phs -c 100
Capturing on '-'
100 packets captured

===================================================================
Protocol Hierarchy Statistics
Filter: 

eth                                      frames:100 bytes:85721
  ip                                     frames:99 bytes:85657
    tcp                                  frames:97 bytes:85119
      dsi                                frames:61 bytes:82122
        _ws.short                        frames:54 bytes:77180
        afp                              frames:6 bytes:4856
          _ws.short                      frames:5 bytes:4766
      _ws.short                          frames:15 bytes:1050
      http                               frames:1 bytes:499
        _ws.short                        frames:1 bytes:499
      iscsi                              frames:1 bytes:118
        iscsi.flags                      frames:1 bytes:118
          scsi                           frames:1 bytes:118
            _ws.short                    frames:1 bytes:118
    ipv6                                 frames:2 bytes:538
      tcp                                frames:2 bytes:538
        tls                              frames:2 bytes:538
          _ws.short                      frames:2 bytes:538
  arp                                    frames:1 bytes:64
    _ws.short                            frames:1 bytes:64
===================================================================
Capture 100 packets and print a summary of the IP traffic by address:
$ docker run -p 6343:6343/udp -p 8008:8008 sflow/tshark -q -z endpoints,ip -c 100
Capturing on '-'
100 packets captured

================================================================================
IPv4 Endpoints
Filter:
                       |  Packets  | |  Bytes  | | Tx Packets | | Tx Bytes | | Rx Packets | | Rx Bytes |
10.0.0.70                     95         81713         44           25507          51           56206   
10.0.0.236                    91         80820         50           55956          41           24864   
10.0.0.30                      6          2369          2            1508           4             861   
10.0.0.16                      1           587          1             587           0               0   
10.0.0.28                      1           587          0               0           1             587   
10.0.0.160                     1          1258          0               0           1            1258   
10.0.0.172                     1           218          1             218           0               0   
================================================================================
The following command prints packet decodes as JSON:
$ docker run -p 6343:6343/udp -p 8008:8008 sflow/tshark -T json
Capturing on '-'
[
  {
    "_index": "packets-2019-09-06",
    "_type": "pcap_file",
    "_score": null,
    "_source": {
      "layers": {
        "frame": {
          "frame.interface_id": "0",
          "frame.interface_id_tree": {
            "frame.interface_name": "-"
          },
          "frame.encap_type": "1",
          "frame.time": "Sep  6, 2019 19:41:12.000000000 UTC",
          "frame.offset_shift": "0.000000000",
          "frame.time_epoch": "1567798872.000000000",
          "frame.time_delta": "0.000000000",
          "frame.time_delta_displayed": "0.000000000",
          "frame.time_relative": "0.000000000",
          "frame.number": "1",
          "frame.len": "64",
          "frame.cap_len": "60",
          "frame.marked": "0",
          "frame.ignored": "0",
          "frame.protocols": "eth:ethertype:arp"
        },
        "eth": {
          "eth.dst": "70:10:6f:d8:13:30",
          "eth.dst_tree": {
            "eth.dst_resolved": "HewlettP_d8:13:30",
            "eth.addr": "70:10:6f:d8:13:30",
            "eth.addr_resolved": "HewlettP_d8:13:30",
            "eth.lg": "0",
            "eth.ig": "0"
          },
          "eth.src": "98:4b:e1:03:4a:61",
          "eth.src_tree": {
            "eth.src_resolved": "HewlettP_03:4a:61",
            "eth.addr": "98:4b:e1:03:4a:61",
            "eth.addr_resolved": "HewlettP_03:4a:61",
            "eth.lg": "0",
            "eth.ig": "0"
          },
          "eth.type": "0x00000806",
          "eth.padding": "00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00"
        },
        "arp": {
          "arp.hw.type": "1",
          "arp.proto.type": "0x00000800",
          "arp.hw.size": "6",
          "arp.proto.size": "4",
          "arp.opcode": "1",
          "arp.src.hw_mac": "98:4b:e1:03:4a:61",
          "arp.src.proto_ipv4": "10.0.0.30",
          "arp.dst.hw_mac": "00:00:00:00:00:00",
          "arp.dst.proto_ipv4": "10.0.0.232"
        },
        "_ws.short": "[Packet size limited during capture: Ethertype truncated]"
      }
    }
  },
The tshark -T ek option formats the JSON output as a single line per packet, making the output easy to parse in scripts. For example, the following emerging.py script downloads the Emerging Threats compromised IP address database, parses the JSON records, checks to see if source and destination addresses can be found in the database, and prints out information on any matches:
#!/usr/bin/env python3

from sys import stdin
from json import loads
from requests import get

# Download the Emerging Threats compromised IP address list
blacklist = set()
r = get('https://rules.emergingthreats.net/blockrules/compromised-ips.txt')
for line in r.iter_lines(decode_unicode=True):
  blacklist.add(line)

# Each line of tshark -T ek output is a JSON record; index records
# and packets without an IP layer are skipped by the except clause
for line in stdin:
  try:
    msg = loads(line)
    time = msg['timestamp']
    layers = msg['layers']
    ip = layers["ip"]
    src = ip["ip_ip_src"]
    dst = ip["ip_ip_dst"]
    if src in blacklist or dst in blacklist:
      print("%s %s %s" % (time, src, dst))
  except (ValueError, KeyError):
    pass
The following command runs the script:
$ docker run -p 6343:6343/udp -p 8008:8008 sflow/tshark -T ek | ./emerging.py
See the TShark man page for more options.

Forwarding using sFlow-RT describes how to set up and tear down sFlow streams using the sFlow-RT analytics engine. This is a simple way to direct a stream of sFlow to a desktop running sflowtool. For example, suppose sflowtool is running on host 10.0.0.30 and sFlow-RT is running on host 10.0.0.1; the following command would start a session:
curl -H "Content-Type:application/json" -X PUT --data '{"address":"10.0.0.30"}' \
http://10.0.0.1:8008/forwarding/tcpdump/json
and the following command would end the session:
curl -X DELETE http://10.0.0.1:8008/forwarding/tcpdump/json
Note: The sflow/sflow-rt Docker image is a convenient way to run sFlow-RT:
docker run -p 8008:8008 -p 6343:6343/udp sflow/sflow-rt
Finally, Triggered remote packet capture using filtered ERSPAN shows how the broad visibility provided by sFlow can be combined with hardware filtering to trigger full packet capture of selected traffic.