Tuesday, May 12, 2020

Real-time network and system metrics as a service

The sFlow-RT real-time analytics engine receives industry standard sFlow telemetry as a continuous stream from network and host devices and converts the raw data into useful measurements that can be queried through a REST API. A single sFlow-RT instance can monitor the entire data center, providing a comprehensive view of performance, not just of the individual components, but of the data center as a whole.

This article is an interactive tutorial intended to familiarize the reader with the REST API. The examples can be run on a laptop using recorded data so that access to a live network is not required.

The data was captured from the leaf and spine test network shown above (described in Fabric View).
curl -O https://raw.githubusercontent.com/sflow-rt/fabric-view/master/demo/ecmp.pcap
First, download the captured sFlow data.

You will need a system with Java or Docker installed to run the sFlow-RT software.
curl -O https://inmon.com/products/sFlow-RT/sflow-rt.tar.gz
tar -xzf sflow-rt.tar.gz
./sflow-rt/get-app.sh sflow-rt browse-metrics
./sflow-rt/get-app.sh sflow-rt browse-flows
./sflow-rt/get-app.sh sflow-rt prometheus
./sflow-rt/start.sh -Dsflow.file=$PWD/ecmp.pcap
The above commands download and run sFlow-RT, with browse-metrics, browse-flows, and prometheus applications on a system with Java 1.8+ installed.
docker run --rm -v $PWD/ecmp.pcap:/sflow-rt/ecmp.pcap \
-p 8008:8008 --name sflow-rt sflow/prometheus -Dsflow.file=ecmp.pcap
Alternatively, the above command runs sFlow-RT and applications using Docker.
The REST API is documented using OpenAPI. Use a web browser to access the REST API explorer at http://localhost:8008/api/index.html.

Each of the examples can be run in a terminal window using curl, or you can simply click on the link to see the results in your web browser.
Measurements are represented by sFlow-RT in the form of a logical table. Each agent is a device on the network and is uniquely identified by an IP address. Each agent may have one or more data sources, each representing a logical source of measurements. For example, a network switch will have a data source for each port.
curl http://localhost:8008/agents/json
List the sFlow agents.
curl http://localhost:8008/metrics/json
List the names of the metrics being exported by the agents. The available metrics depend on the types of agent streaming data to sFlow-RT. For a list of supported sFlow metrics, see Metrics.
curl http://localhost:8008/metric/ALL/max:ifinutilization,max:ifoututilization/json
Find the switch ports with the highest input and output utilization.
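The same query can be issued from a script. The following is a minimal sketch in Python using only the standard library. The helper names metric_url and fetch_json are hypothetical, and the host and port are simply those used by the curl examples in this article.

```python
import json
from urllib.request import urlopen

# Address used by the curl examples in this article.
BASE_URL = "http://localhost:8008"


def metric_url(agents, metrics, base=BASE_URL):
    """Build a /metric query URL.

    agents  -- agent selector, e.g. "ALL"
    metrics -- list of metric names, optionally prefixed with a
               summary statistic, e.g. ["max:ifinutilization"]
    """
    return f"{base}/metric/{agents}/{','.join(metrics)}/json"


def fetch_json(url, timeout=5):
    """Fetch a URL and decode the JSON body.

    Needs a running sFlow-RT instance, so it is not called here.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = metric_url("ALL", ["max:ifinutilization", "max:ifoututilization"])
print(url)
```

With sFlow-RT running, fetch_json(url) should return the same JSON document that curl prints.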

The metric query walks the table and returns a value that summarizes each metric in a comma-separated list. The following summary statistics are supported:
  • max: Maximum value
  • min: Smallest value
  • sum: Total value
  • avg: Average value
  • var: Variance
  • sdev: Standard deviation
  • med: Median value
  • q1: First quartile
  • q2: Second quartile (same as med:)
  • q3: Third quartile
  • iqr: Inter-quartile range (i.e. q3 - q1)
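To make the statistics concrete, here is a rough Python sketch of what each prefix computes over one column of the table. This is not sFlow-RT's implementation. Whether the engine uses population or sample variance, and which quartile interpolation it applies, are not stated in this article, so population statistics and the "inclusive" quantile method are assumptions. The summarize helper and the sample values are hypothetical.

```python
import statistics


def summarize(values):
    """Compute the summary statistics listed above for a list of numbers.

    Assumptions: population variance / standard deviation, and
    linear-interpolated ("inclusive") quartiles.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "max": max(values),
        "min": min(values),
        "sum": sum(values),
        "avg": statistics.fmean(values),
        "var": statistics.pvariance(values),
        "sdev": statistics.pstdev(values),
        "med": statistics.median(values),
        "q1": q1,
        "q2": q2,  # same as the median under this quartile method
        "q3": q3,
        "iqr": q3 - q1,
    }


# Hypothetical utilization values (percent) for five switch ports.
stats = summarize([10.0, 20.0, 30.0, 40.0, 50.0])
for name, value in stats.items():
    print(f"{name}: {value}")
```

The point of the sketch is only that each prefix reduces a whole column to a single number, which is what the metric query returns.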
The browse-metrics application makes use of the metric REST API and can be used to query and trend metrics.
Click on the link below to plot a graph of the switch port with the highest input utilization (screen capture shown above):
The following examples show how to retrieve metric values without summarization.
curl http://localhost:8008/table/ALL/ifinutilization,ifoututilization/json
Get a table of input and output utilization for every switch port. The table query doesn't summarize metrics. Instead, the query returns rows from the logical table that include the metrics specified in the query.
curl http://localhost:8008/dump/ALL/ALL/json
Dump all metric values for all agents. The dump query is similar to the table query, but instead of walking the table row by row, individual metrics are traversed in their internal order.
curl http://localhost:8008/prometheus/metrics/ALL/ALL/txt
Dump all metric values in Prometheus Exporter format. For example, the Grafana sFlow-RT Network Interfaces dashboard makes use of the query to populate the Prometheus time series database.

There are two types of measurement carried by sFlow: periodically exported counters and randomly sampled packets. So far the examples have been querying metrics derived from the counters.
curl http://localhost:8008/flowkeys/json
Get the list of attributes that are being extracted from sampled packet headers. The available attributes depend on the type of traffic flowing in the network. For a list of supported packet attributes, see Defining Flows.
curl -H "Content-Type:application/json" -X PUT \
--data '{"keys":"ipsource,ipdestination","value":"bytes"}' \
Define a new "flow" metric called srcdst that calculates the bytes per second between each pair of communicating IP addresses on the network.
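A flow definition can also be sent from a script. The sketch below builds the equivalent PUT request with Python's standard library. The helper names build_flow_request and define_flow are hypothetical, and the flow definition URL is passed in as an argument rather than hard-coded, so supply the URL for your own sFlow-RT instance.

```python
import json
from urllib.request import Request, urlopen


def build_flow_request(url, keys, value):
    """Build a PUT request carrying a flow definition as JSON.

    keys  -- list of flow keys, e.g. ["ipsource", "ipdestination"]
    value -- the quantity to measure, e.g. "bytes"
    """
    body = json.dumps({"keys": ",".join(keys), "value": value})
    return Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def define_flow(url, keys, value, timeout=5):
    """Send the definition and return the HTTP status code.

    Needs a running sFlow-RT instance, so it is not called here.
    """
    request = build_flow_request(url, keys, value)
    with urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder URL for illustration only; nothing is sent.
request = build_flow_request(
    "http://example.invalid/", ["ipsource", "ipdestination"], "bytes"
)
print(request.get_method(), request.data.decode("utf-8"))
```

Building the body with json.dumps avoids hand-quoting mistakes in the JSON payload.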
curl http://localhost:8008/metric/ALL/max:srcdst/json
Find the maximum value of the newly defined srcdst flow metric, i.e. the switch port on the network observing the highest bandwidth flow of packets.
[{
 "agent": "",
 "metricName": "max:srcdst",
 "topKeys": [
  {
   "lastUpdate": 1274,
   "value": 3.392739739066506E8,
   "key": ","
  },
  {
   "lastUpdate": 2352,
   "value": 2.155296894816872E8,
   "key": ","
  }
 ],
 "metricN": 10,
 "lastUpdate": 1275,
 "lastUpdateMax": 2031,
 "metricValue": 3.392739739066506E8,
 "dataSource": "4",
 "lastUpdateMin": 1267
}]
In addition to providing a metric value, the result also includes topKeys, showing the top flows seen at the switch port.
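A script consuming this result might pull out the top flows as follows. This is a sketch that assumes the response is a JSON array holding one object per requested metric, with the field names shown in the listing above. The sample document reuses the values from that listing, including the blank agent and key fields, and the top_flows helper is hypothetical.

```python
import json

# Sample response using the values from the listing above.
SAMPLE = """
[{
 "agent": "",
 "metricName": "max:srcdst",
 "topKeys": [
  {"lastUpdate": 1274, "value": 3.392739739066506E8, "key": ","},
  {"lastUpdate": 2352, "value": 2.155296894816872E8, "key": ","}
 ],
 "metricN": 10,
 "lastUpdate": 1275,
 "lastUpdateMax": 2031,
 "metricValue": 3.392739739066506E8,
 "dataSource": "4",
 "lastUpdateMin": 1267
}]
"""


def top_flows(result):
    """Return (metricName, key, value) tuples, largest value first."""
    flows = []
    for entry in result:
        # topKeys is only expected on flow metrics, so default to empty.
        for top in entry.get("topKeys", []):
            flows.append((entry["metricName"], top["key"], top["value"]))
    return sorted(flows, key=lambda flow: flow[2], reverse=True)


for name, key, value in top_flows(json.loads(SAMPLE)):
    print(f"{name} key={key!r} value={value:.0f} bytes/s")
```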
Click on the link below to trend the srcdst metric (screen capture shown above):
There are additional queries specific to flow metrics.
curl http://localhost:8008/activeflows/ALL/srcdst/json
Find the largest flows gathered from all the interfaces in the network.
[
 {
  "flowN": 7,
  "agent": "",
  "value": 5.537867023642346E8,
  "dataSource": "4",
  "key": ","
 },
 {
  "flowN": 6,
  "agent": "",
  "value": 5.1034569007443213E8,
  "dataSource": "38",
  "key": ","
 },
 {
  "flowN": 6,
  "agent": "",
  "value": 1469003.6788768284,
  "dataSource": "4",
  "key": ","
 },
 {
  "flowN": 7,
  "agent": "",
  "value": 1306006.2405022713,
  "dataSource": "37",
  "key": ","
 }
]
Each flow returned identifies the number of locations at which it was observed and the port with the maximum value. For example, the largest flow was seen by 7 data sources, and its maximum value of 5.5e8 was observed by data source 4.
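Since srcdst is defined in bytes per second, the raw values are easier to read once converted to bits per second. The sketch below does that for two of the flows listed above. The helpers format_rate and describe are hypothetical, and decimal (powers of 1000) units are an assumption about how you want rates displayed.

```python
# Two of the flows from the listing above (values in bytes per second).
FLOWS = [
    {"flowN": 7, "agent": "", "value": 5.537867023642346E8,
     "dataSource": "4", "key": ","},
    {"flowN": 6, "agent": "", "value": 1469003.6788768284,
     "dataSource": "4", "key": ","},
]


def format_rate(bits_per_second):
    """Format a rate using decimal units, capped at Gbit/s."""
    units = ["bit/s", "kbit/s", "Mbit/s", "Gbit/s"]
    rate = float(bits_per_second)
    for unit in units[:-1]:
        if rate < 1000:
            return f"{rate:.2f} {unit}"
        rate /= 1000
    return f"{rate:.2f} {units[-1]}"


def describe(flow):
    """Summarize one activeflows entry in a single line."""
    rate = format_rate(flow["value"] * 8)  # bytes/s -> bits/s
    return (f"seen by {flow['flowN']} data sources, "
            f"peak {rate} at data source {flow['dataSource']}")


for flow in FLOWS:
    print(describe(flow))
```

For the first flow this prints "seen by 7 data sources, peak 4.43 Gbit/s at data source 4".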
Click on the link below to plot a graph of the top flows using the browse-flows application (screen capture shown above):
Note how quickly the graph changes as it tracks new elephant flows in real time.

See RESTflow for a more detailed discussion of sFlow-RT's flow REST API.

This tutorial has only scratched the surface of the capabilities of sFlow-RT's analytics engine. The Writing Applications tutorial provides further examples and a discussion of how to build applications using Python and JavaScript. See Real-time DDoS mitigation using BGP RTBH and FlowSpec, Fabric View, and Flow metrics with Prometheus and Grafana for examples of sFlow-RT applications.

Seeing your own data is more interesting than a canned demonstration. Network Equipment lists devices that support sFlow. Ubuntu 18.04 and CentOS 8 describe how to install the open source Host sFlow agent on popular Linux distributions, extending visibility into compute and cloud infrastructure. The Host sFlow agent is also available as a Docker image for easy deployment with container orchestration systems; see Host, Docker, Swarm and Kubernetes monitoring.

Even if you don't have access to a production environment, the Docker testbed and Kubernetes testbed examples show how to build a virtual testbed using Docker Desktop. Alternatively, Mininet flow analytics and Mininet dashboard provide starting points if you want to experiment with software defined networking (SDN).

Finally, join the sFlow-RT community to ask questions and share solutions and operational experience.
