Thursday, March 29, 2018

Mininet weathermap

Mininet dashboard is a real-time dashboard displaying traffic information from Mininet virtual networks. The screen capture demonstrates the real-time network weather map capability that was recently added to the dashboard. The torus topology is displayed and link widths are updated every second to reflect traffic. In this example a large flow between switches s1x1 and s3x3 is routed via s1x3.

The network was created using the following Mininet command:
sudo mn --custom=sflow-rt/extras/sflow.py --link tc,bw=10 \
--topo torus,3,3 --switch ovsbr,stp=1 --test iperf
In the screen capture above you can clearly see the large flow traversing switches, s4, s3, s2, s1, s9, s13, and s15 in a tree topology. The network was created using the following command:
sudo mn --custom sflow-rt/extras/sflow.py --link tc,bw=10 \
--topo tree,depth=4,fanout=2 --test iperf
The screen capture above shows a large flow traversing switches s1, s2, s3, and s4 in a linear topology. The network was created using the following command:
sudo mn --custom sflow-rt/extras/sflow.py --link tc,bw=10 \
--topo linear,4 --test iperf
It's also easy to create Custom Topologies. The following command creates the example custom topology, topo-2sw-2host.py, that ships with Mininet:
sudo mn --custom ~/mininet/custom/topo-2sw-2host.py,sflow-rt/extras/sflow.py \
--link tc,bw=10 --topo mytopo --test iperf
The final screen capture shows all three tests, identifying top flows, busiest switch ports, and showing the diameter of the network topology. This test run demonstrates the efficiency of using Mininet; creating, testing, and destroying three virtual networks, toroid (9 switches, 9 hosts, 27 links), tree (15 switches, 16 hosts, 30 links), linear (4 switches, 4 hosts, 7 links), in a few minutes, on a laptop.

For anyone curious about the technologies underpinning these examples, Mininet is a Python program that uses Open vSwitch to connect lightweight Linux containers representing hosts and switches. Open vSwitch includes industry standard sFlow instrumentation that efficiently streams telemetry to sFlow-RT real-time analytics software running the mininet-dashboard application. Mininet-dashboard uses the vis.js library to create the dynamic network diagrams.

Mininet's support of sFlow provides the same measurement technology used in physical switches, from low cost edge switches to high end chassis routers, including: A10, Aerohive, ALUe, Allied Telesis, Arista, Aruba, Big Switch, Cisco, Cumulus, Dell, D-Link, Edge-Core, Extreme, F5, Fortinet, Huawei, IP Infusion, Juniper, Netgear, OpenSwitch, Pica8, Proxim, Quanta, SMC, ZTE, and ZyXEL. A common measurement protocol makes it easy to move between emulated and physical networks, for example, Large flow detection describes how to scale sFlow configuration settings from emulated 10Mbit/s Mininet networks to 10G, 40G and 100G physical networks, and SC16 live real-time weathermaps uses the technologies described in this article to monitor a large 100Gbit/s network.

Thursday, March 22, 2018

OCP Summit 2018

Network telemetry was a popular topic at the recent OCP U.S. Summit 2018 in San Jose, California, with an entire afternoon track of the two day conference devoted to the subject. Videos of the talks should soon be posted on the conference web site.

The following articles on this blog cover related topics:
In addition, there were a couple of live sFlow telemetry demonstration in the conference exhibit hall.
The first was a demonstration of leaf and spine fabric visibility using white box switches running the open source Linux Foundation OpenSwitch network operating system. OpenSwitch describes how the open source Host sFlow agent enables standard sFlow instrumentation in merchant silicon based white box switches using OpenSwitch Control Plane Services (CPS), which in turn programs the silicon using the OCP Switch Abstraction Interface (SAI).

The rack in the booth contains a two spine, five leaf network. Each of the switches in the network is streaming real-time sFlow telemetry to an instance of Fabric View which is displaying real-time (up to the second) flow analytics on the monitor to the right of the picture.
The second demonstration shows a Marvell top of rack (ToR) switch with an ARM based management CPU running Linux, DPDK and SONiC. In this example, the switch is streaming sFlow telemetry to a free version of sFlowTrend which is displaying top flows in the trend chart.

Monday, March 19, 2018

Prometheus and Grafana

Prometheus is an open source time series database optimized to collect large numbers of metrics from cloud infrastructure. This article will explore how industry standard sFlow telemetry streaming supported by network devices and Host sFlow agents (Linux, Windows, FreeBSD, AIX, Solaris, Docker, Systemd, Hyper-V, KVM, Nutanix AHV, Xen) can be integrated with Prometheus.

The diagram above shows the elements of the solution: sFlow telemetry streams from hosts and switches to an instance of sFlow-RT. The sFlow-RT analytics software converts the raw measurements into metrics that are accessible through a REST API.

The following prometheus.php script mediates between the Prometheus metrics export protocol and the sFlow-RT REST API.  HTTP queries from Prometheus are translated into calls to the sFlow-RT REST API and JSON responses are converted into Prometheus metrics.
<?php
header('Content-Type: text/plain');
if(isset($_GET['labels'])) {
  $keys = htmlspecialchars($_GET["labels"]);
}
$vals = htmlspecialchars($_GET["values"]);
if(isset($keys)) {
  $cols = $keys.','.$vals;
} else {
  $cols = $vals;
}
$key_arr = explode(",",$keys);
$result = file_get_contents('http://localhost:8008/table/ALL/'.$cols.'/json');
$obj = json_decode($result,true);
foreach ($obj as $row) {
  unset($labels);
  foreach ($row as $cell) {
    if(!isset($labels)) {
      $labels = 'agent="'.$cell['agent'].'",datasource="'.$cell['dataSource'].'"';
    }
    $name = $cell['metricName'];
    $val = $cell['metricValue'];
    if(in_array($name,$key_arr)) {
      $labels .= ','.$name.'="'.$val.'"';
    } else {
      print $name."{".$labels."} ".$val."\n";
    }
  }
}
?>
Install prometheus.php under the web server home directory (e.g. /var/www/html) on the system running sFlow-RT and verify functionality using cURL:
$ curl "http://localhost/prometheus.php?labels=host_name&values=load_one"
load_one{agent="10.0.0.86",datasource="2.1",host_name="server3"} 0.06
load_one{agent="10.0.0.85",datasource="2.1",host_name="server2"} 0
load_one{agent="10.0.0.82",datasource="2.1",host_name="spine1"} 0
load_one{agent="10.0.0.83",datasource="2.1",host_name="spine2"} 0
load_one{agent="10.0.0.80",datasource="2.1",host_name="leaf1"} 0
load_one{agent="10.0.0.81",datasource="2.1",host_name="leaf2"} 0
Define metrics "scraping" jobs in the Prometheus configuration file, prometheus.yml. In this example sFlow-RT is running on host 10.0.0.162 and two sets of metrics have been defined using prometheus.php interface. The sflow-rt-hosts job retrieves host metrics labeled by host_name and the sflow-rt-ifstats job retrieves network interface metrics labeled by host_name and ifname. The metric values in this example are a small selection from the extensive set of standard sFlow metrics available from sFlow-RT, see Metrics.
global:
  scrape_interval:     15s
  evaluation_interval: 15s

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: 'sflow-rt-hosts'
    scrape_interval: 30s
    metrics_path: /prometheus.php
    params:
      labels: ['host_name']
      values: ['load_one,cpu_utilization,proc_run']
    static_configs:
      - targets: ['10.0.0.162']
  - job_name: 'sflow-rt-ifstats'
    scrape_interval: 30s
    metrics_path: /prometheus.php
    params:
      labels: ['host_name,ifname']
      values: ['ifinutilization,ifoututilization,ifindiscards,ifoutdiscards']
    static_configs:
      - targets: ['10.0.0.162']
Start Prometheus. For example, the following command shows how to run Prometheus under Docker:
docker run --name prometheus --rm -v $PWD/data:/prometheus \
-v $PWD/prometheus.yml:/etc/prometheus/prometheus.yml \
-p 9090:9090 prom/prometheus
The screen capture above shows the Prometheus web interface, accessible on port 9090.
Grafana is open source time series analysis software. The ability to pull data from many data sources and the extensive range of charting options makes Grafana an attractive tool for building operations dashboards.

The following command shows how to run Grafana under Docker:
docker run --name grafana --rm -v $PWD/data:/var/lib/grafana \
-p 3000:3000 grafana/grafana
Access the Grafana web interface on port 3000, configure a data source for the Prometheus database, and start building dashboards. The screen capture above shows the same chart built earlier using the native Prometheus interface.

Standard sFlow telemetry provides a unified method of monitoring large scale network and compute infrastructure. This example focussed on sFlow counter metrics, but sFlow also provides real-time flow information that can be used to generate flow metrics, for example, reporting on interactions between microservices.

Additional examples on this blog include: