Wednesday, December 16, 2015

Environmental metrics with Cumulus Linux

Custom metrics with Cumulus Linux describes how to extend the set of metrics exported by the sFlow agent, using the export of BGP metrics as an example. This article demonstrates how environmental metrics (power supplies, temperatures, fan speeds, etc.) can be exported.

The smonctl command can be used to dump sensor data as JSON formatted text:
cumulus@cumulus$ smonctl -j
[
    {
        "pwm_path": "/sys/devices/soc.0/ffe03100.i2c/i2c-1/1-004d", 
        "all_ok": "1", 
        "driver_hwmon": [
            "fan1"
        ], 
        "min": 2500, 
        "cpld_path": "/sys/devices/ffe05000.localbus/ffb00000.CPLD", 
        "state": "OK", 
        "prev_state": "OK", 
        "msg": null, 
        "input": 8998, 
        "type": "fan", 
        "pwm1": 121, 
        "description": "Fan1", 
        "max": 29000, 
        "start_time": 1450228330, 
        "var": 15, 
        "pwm1_enable": 0, 
        "prev_msg": null, 
        "log_time": 1450228330, 
        "present": "1", 
        "target": 0, 
        "name": "Fan1", 
        "fault": "0", 
        "pwm_hwmon": [
            "pwm1"
        ], 
        "driver_path": "/sys/devices/soc.0/ffe03100.i2c/i2c-1/1-004d", 
        "div": "4", 
        "cpld_hwmon": [
            "fan1"
        ]
    },
    ... 
The following Python script, smon_sflow.py, invokes the command, parses the output, and posts a set of custom sFlow metrics:
#!/usr/bin/env python
import json
import socket
from subprocess import check_output

res = check_output(["/usr/sbin/smonctl","-j"])
smon = json.loads(res)
fan_maxpc = 0
fan_down = 0
fan_up = 0
psu_down = 0
psu_up = 0
temp_maxpc = 0
temp_up = 0
temp_down = 0
for s in smon:
  type = s["type"]
  if(type == "fan"):
    if "OK" == s["state"]:
      fan_maxpc = max(fan_maxpc, 100 * s["input"]/s["max"])
      fan_up = fan_up + 1
    else:
      fan_down = fan_down + 1
  elif(type == "power"):
    if "OK" == s["state"]:
      psu_up = psu_up + 1
    else:
      psu_down = psu_down + 1
  elif(type == "temp"):
    if "OK" == s["state"]:
      temp_maxpc = max(temp_maxpc, 100 * s["input"]/s["max"])
      temp_up = temp_up + 1
    else:
      temp_down = temp_down + 1

metrics = {
  "datasource":"smon",
  "fans-max-pc" : {"type":"gauge32", "value":int(fan_maxpc)},
  "fans-up-pc"  : {"type":"gauge32", "value":int(100 * fan_up / (fan_down + fan_up))},
  "psu-up-pc"   : {"type":"gauge32", "value":int(100 * psu_up / (psu_down + psu_up))},
  "temp-max-pc" : {"type":"gauge32", "value":int(temp_maxpc)},
  "temp-up-pc"  : {"type":"gauge32", "value":int(100.0 * temp_up / (temp_down + temp_up))}
}
msg = {"rtmetric":metrics}
sock = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
sock.sendto(json.dumps(msg),("127.0.0.1",36343))
Note: Make sure the following line is uncommented in the /etc/hsflowd.conf file in order to receive custom metrics. If the file is modified, restart hsflowd for the changes to take effect.
  jsonPort = 36343
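For reference, a minimal sketch of the relevant section of /etc/hsflowd.conf might look something like the following; the collector address is a placeholder for this example and the remaining settings follow the Host sFlow defaults:
sflow {
  # poll counters every 30 seconds
  polling = 30
  # send sFlow to the analytics collector (placeholder address)
  collector {
    ip = 10.0.0.86
    udpport = 6343
  }
  # accept locally generated custom metrics and events as JSON
  jsonPort = 36343
}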
Adding the following cron entry runs the script every minute:
* * * * * /home/cumulus/smon_sflow.py > /dev/null 2>&1
This example requires Host sFlow version 1.28.3 or later. This is newer than the version of Host sFlow that currently ships with Cumulus Linux 2.5.5. However, Cumulus Linux is an open platform, so the software can be compiled from sources in just the same way you would on a server:
sudo sh -c 'echo "deb http://ftp.us.debian.org/debian wheezy main contrib" > /etc/apt/sources.list.d/deb.list'
sudo apt-get update
sudo apt-get install gcc make libc-dev
wget https://github.com/sflow/host-sflow/archive/v1.28.3.tar.gz
tar -xvzf v1.28.3.tar.gz
cd host-sflow-1.28.3
make CUMULUS=yes
make deb CUMULUS=yes
sudo dpkg -i hsflowd_1.28.3-1_ppc.deb
The sFlow-RT chart at the top of this article trends the environmental metrics. Each metric has been constructed as a percentage so that they can all be combined on a single chart.

While custom metrics are useful, they don't capture the semantics of the data and will vary in form and content. In the case of environmental metrics, a standard set of metrics would add significant value: many different types of device include environmental sensors, and a common set of measurements from all networked devices would provide a comprehensive view of power, temperature, humidity, and cooling. Anyone interested in developing a standard sFlow export for environmental metrics can contribute ideas on the sFlow.org mailing list.

Saturday, December 12, 2015

Custom events

Measuring Page Load Speed with Navigation Timing describes the standard instrumentation built into web browsers. This article will use navigation timing as an example to demonstrate how custom sFlow events augment standard sFlow instrumentation embedded in network devices, load balancers, hosts and web servers.

The following jQuery script can be embedded in a web page to provide timing information:
$(window).load(function(){
 var samplingRate = 10;
 if(samplingRate !== 1 && Math.random() > (1/samplingRate)) return;

 setTimeout(function(){
   if(window.performance) {
     var t = window.performance.timing;
     var msg = {
       sampling_rate : samplingRate,
       t_url         : {type:"string",value:window.location.href},
       t_useragent   : {type:"string",value:navigator.userAgent},
       t_loadtime    : {type:"int32",value:t.loadEventEnd-t.navigationStart},
       t_connecttime : {type:"int32",value:t.responseEnd-t.requestStart} 
     };
     $.ajax({
       url:"/navtiming.php",
       method:"PUT",
       contentType:"application/json",
       data:JSON.stringify(msg) 
     });
    }
  }, 0);
});
The script supports random sampling. In this case a samplingRate of 10 means that, on average, 1-in-10 page hits will generate a measurement record. Measurement records are sent back to the server where the navtiming.php script acts as a gateway, augmenting the measurements and sending them as custom sFlow events.
<?php
$rawInput = file_get_contents("php://input");
$rec = json_decode($rawInput);
$rec->datasource = "navtime";
$rec->t_ip = array("type" => "ip", "value" => $_SERVER['REMOTE_ADDR']);

$msg=array("rtflow"=>$rec);
$sock = fsockopen("udp://localhost",36343,$errno,$errstr);
if(! $sock) { return; }
fwrite($sock, json_encode($msg));
fclose($sock);
?>
In this case the remote IP address associated with the client browser is added to the measurement before it is formatted as a JSON rtflow message and sent to the Host sFlow agent (hsflowd) running on the web server host. The Host sFlow agent encodes the data as an sFlow structure and sends it to the sFlow collector as part of the telemetry stream.
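The PHP gateway isn't essential; the same function could be performed by any script that can post JSON to the Host sFlow agent. The following minimal Python sketch sends an equivalent rtflow message to hsflowd's local JSON port, using placeholder values copied from the sflowtool output below rather than live browser measurements:
#!/usr/bin/env python
# Minimal sketch of an rtflow message sent to hsflowd's JSON port.
# The field values are placeholders; a real gateway would copy them
# from the browser's navigation timing record.
import json
import socket

msg = {
  "rtflow": {
    "datasource": "navtime",
    "sampling_rate": 10,
    "t_url": {"type": "string", "value": "http://10.0.0.84/index.html"},
    "t_loadtime": {"type": "int32", "value": 115},
    "t_ip": {"type": "ip", "value": "10.1.1.63"}
  }
}
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(json.dumps(msg).encode(), ("127.0.0.1", 36343))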

The following sflowtool output verifies that the measurements are being received at the sFlow analyzer:
startSample ----------------------
sampleType_tag 4300:1003
sampleType RTFLOW
rtflow_datasource_name navtime
rtflow_sampling_rate 1
rtflow_sample_pool 0
rtflow t_url = (string) "http://10.0.0.84/index.html"
rtflow t_useragent = (string) "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"
rtflow t_loadtime = (int32) 115
rtflow t_connecttime = (int32) 27
rtflow t_ip = (ip) 10.1.1.63
endSample   ----------------------
A more interesting way to consume the data is to use sFlow-RT. For example, the following REST API call programs the sFlow-RT analytics pipeline to create a metric that tracks average t_loadtime by URL:
curl -H "Content-Type:application/json" -X PUT --data '{keys:"t_url",value:"avg:t_loadtime",t:15}' http://localhost:8008/flow/urlloadtime/json
The following query retrieves the resulting metric value:
curl http://localhost:8008/metric/ALL/urlloadtime/json
[{
 "agent": "10.0.0.84",
 "dataSource": "navtime",
 "lastUpdate": 1807,
 "lastUpdateMax": 1807,
 "lastUpdateMin": 1807,
 "metricN": 1,
 "metricName": "urlloadtime",
 "metricValue": 11.8125,
 "topKeys": [{
  "key": "http://10.0.0.84/index.html",
  "lastUpdate": 1807,
  "value": 11.8125
 }]
}]
RESTflow describes the sFlow-RT REST API used to create flow definitions and access flow based metrics and Defining Flows provides reference material.
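The same flow definition and metric query can also be scripted. The following Python sketch, using only the standard library, defines the urlloadtime flow and then polls the resulting metric; the sFlow-RT host and port are assumed to match the curl examples above:
#!/usr/bin/env python
# Sketch: define a flow and poll the resulting metric via the sFlow-RT REST API.
# The host, port, and flow settings mirror the curl examples in this article.
import json
import time
try:
    from urllib.request import Request, urlopen   # Python 3
except ImportError:
    from urllib2 import Request, urlopen           # Python 2

RT = "http://localhost:8008"

# define the flow (equivalent to the curl PUT above)
flow = {"keys": "t_url", "value": "avg:t_loadtime", "t": 15}
req = Request(RT + "/flow/urlloadtime/json",
              data=json.dumps(flow).encode(),
              headers={"Content-Type": "application/json"})
req.get_method = lambda: "PUT"
urlopen(req)

# periodically poll the metric and print the value for each URL
while True:
    res = json.loads(urlopen(RT + "/metric/ALL/urlloadtime/json").read().decode())
    for m in res:
        for k in m.get("topKeys", []):
            print("%s %s" % (k["key"], k["value"]))
    time.sleep(15)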

Installing mod-sflow provides a rich set of transaction and counter metrics from the Apache web server, including information on worker threads, see Thread pools.

Telemetry from Apache, Host sFlow and the custom events are all combined at the sFlow analyzer. For example, the following query pulls together the load average on the server, with Apache thread pool utilization and URL load times:
curl http://localhost:8008/metric/10.0.0.84/load_one,workers_utilization,urlloadtime/json
[
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1",
  "lastUpdate": 23871,
  "lastUpdateMax": 23871,
  "lastUpdateMin": 23871,
  "metricN": 1,
  "metricName": "load_one",
  "metricValue": 0
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "3.80",
  "lastUpdate": 4123,
  "lastUpdateMax": 4123,
  "lastUpdateMin": 4123,
  "metricN": 1,
  "metricName": "workers_utilization",
  "metricValue": 0.390625
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "navtime",
  "lastUpdate": 8821,
  "lastUpdateMax": 8821,
  "lastUpdateMin": 8821,
  "metricN": 1,
  "metricName": "urlloadtime",
  "metricValue": 91.81072992715491,
  "topKeys": [{
   "key": "http://10.0.0.84/index.html",
   "lastUpdate": 8821,
   "value": 91.81072992715491
  }]
 }
]
The article, Cluster performance metrics, describes the metric API in more detail. Additional sFlow-RT APIs can be used to send data to a variety of DevOps tools, including: Ganglia, Graphite, InfluxDB and Grafana, Logstash, Splunk, and cloud analytics services.

Finally, standard sFlow instrumentation is also widely implemented by physical and virtual network devices. Combining data from all these sources provides a comprehensive real-time view of applications and the compute, storage and networking resources that the applications depend on.

Thursday, December 10, 2015

Custom metrics with Cumulus Linux

Cumulus Networks, sFlow and data center automation describes how Cumulus Linux is monitored using the open source Host sFlow agent that supports Linux, Windows, FreeBSD, Solaris, and AIX operating systems and KVM, Xen, XCP, XenServer, and Hyper-V hypervisors, delivering a standard set of performance metrics from switches, servers, hypervisors, virtual switches, and virtual machines.

Host sFlow version 1.28.3 adds support for Custom Metrics. This article demonstrates how the extensive set of standard sFlow measurements can be augmented using custom metrics.

Recent releases of Cumulus Linux simplify the task by making machine-readable JSON a supported output format in command line tools. For example, the cl-bgp tool can be used to dump BGP summary statistics:
cumulus@leaf1$ sudo cl-bgp summary show json
{ "router-id": "192.168.0.80", "as": 65080, "table-version": 5, "rib-count": 9, "rib-memory": 1080, "peer-count": 2, "peer-memory": 34240, "peer-group-count": 1, "peer-group-memory": 56, "peers": { "swp1": { "remote-as": 65082, "version": 4, "msgrcvd": 52082, "msgsent": 52084, "table-version": 0, "outq": 0, "inq": 0, "uptime": "05w1d04h", "prefix-received-count": 2, "prefix-advertised-count": 5, "state": "Established", "id-type": "interface" }, "swp2": { "remote-as": 65083, "version": 4, "msgrcvd": 52082, "msgsent": 52083, "table-version": 0, "outq": 0, "inq": 0, "uptime": "05w1d04h", "prefix-received-count": 2, "prefix-advertised-count": 5, "state": "Established", "id-type": "interface" } }, "total-peers": 2, "dynamic-peers": 0 }
The following Python script, bgp_sflow.py, invokes the command, parses the output, and posts a set of custom sFlow metrics:
#!/usr/bin/env python
import json
import socket
from subprocess import check_output

res = check_output(["/usr/bin/cl-bgp","summary","show","json"])
bgp = json.loads(res)
metrics = {
  "datasource":"bgp",
  "bgp-router-id"    : {"type":"string", "value":bgp["router-id"]},
  "bgp-as"           : {"type":"string", "value":str(bgp["as"])},
  "bgp-total-peers"  : {"type":"gauge32", "value":bgp["total-peers"]},
  "bgp-peer-count"   : {"type":"gauge32", "value":bgp["peer-count"]},
  "bgp-dynamic-peers": {"type":"gauge32", "value":bgp["dynamic-peers"]},
  "bgp-rib-memory"   : {"type":"gauge32", "value":bgp["rib-memory"]},
  "bgp-rib-count"    : {"type":"gauge32", "value":bgp["rib-count"]},
  "bgp-peer-memory"  : {"type":"gauge32", "value":bgp["peer-memory"]},
  "bgp-msgsent"      : {"type":"counter32", "value":sum(bgp["peers"][c]["msgsent"] for c in bgp["peers"])},
  "bgp-msgrcvd"      : {"type":"counter32", "value":sum(bgp["peers"][c]["msgrcvd"] for c in bgp["peers"])}
}
msg = {"rtmetric":metrics}
sock = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
sock.sendto(json.dumps(msg),("127.0.0.1",36343))
Adding the following cron entry runs the script every minute:
* * * * * /home/cumulus/bgp_sflow.py > /dev/null 2>&1
The new metrics will now arrive at the sFlow collector. The following sflowtool output verifies that the metrics are being received:
startSample ----------------------
sampleType_tag 4300:1002
sampleType RTMETRIC
rtmetric_datasource_name bgp
rtmetric bgp-as = (string) "65080"
rtmetric bgp-rib-count = (gauge32) 9
rtmetric bgp-dynamic-peers = (gauge32) 0
rtmetric bgp-rib-memory = (gauge32) 1080
rtmetric bgp-peer-count = (gauge32) 2
rtmetric bgp-router-id = (string) "192.168.0.80"
rtmetric bgp-total-peers = (gauge32) 2
rtmetric bgp-msgrcvd = (counter32) 104648
rtmetric bgp-msgsent = (counter32) 104651
rtmetric bgp-peer-memory = (gauge32) 34240
endSample   ----------------------
A more interesting way to consume this data is using sFlow-RT. The diagram above shows a leaf and spine network built using Cumulus VX virtual machines that was used for a Network virtualization visibility demo. Installing the bgp_sflow.py script on each switch adds centralized visibility into fabric wide BGP statistics.

For example, the following sFlow-RT REST API command returns the total BGP messages sent and received, summed across all switches:
$ curl http://10.0.0.86:8008/metric/ALL/sum:bgp-msgrcvd,sum:bgp-msgsent/json
[
 {
  "lastUpdateMax": 20498,
  "lastUpdateMin": 20359,
  "metricN": 4,
  "metricName": "sum:bgp-msgrcvd",
  "metricValue": 0.10000302901465385
 },
 {
  "lastUpdateMax": 20498,
  "lastUpdateMin": 20359,
  "metricN": 4,
  "metricName": "sum:bgp-msgsent",
  "metricValue": 0.10000302901465385
 }
]
The custom metrics are fully integrated with all the other sFlow metrics, for example, the following query returns the host_name, bgp-as and load_one metrics associated with bgp-router-id 192.168.0.80:
$ curl http://10.0.0.86:8008/metric/ALL/host_name,bgp-as,load_one/json?bgp-router-id=192.168.0.80
[
 {
  "agent": "10.0.0.80",
  "lastUpdate": 12194,
  "lastUpdateMax": 12194,
  "lastUpdateMin": 12194,
  "metricN": 1,
  "metricName": "host_name",
  "metricValue": "leaf1"
 },
 {
  "agent": "10.0.0.80",
  "dataSource": "bgp",
  "lastUpdate": 22232,
  "lastUpdateMax": 22232,
  "lastUpdateMin": 22232,
  "metricN": 1,
  "metricName": "bgp-as",
  "metricValue": "65080"
 },
 {
  "agent": "10.0.0.80",
  "lastUpdate": 12194,
  "lastUpdateMax": 12194,
  "lastUpdateMin": 12194,
  "metricN": 1,
  "metricName": "load_one",
  "metricValue": 0
 }
]
The article, Cluster performance metrics, describes the metric API in more detail. Additional sFlow-RT APIs can be used to send data to a variety of DevOps tools, including: Ganglia, Graphite, InfluxDB and Grafana, Logstash, Splunk, and cloud analytics services.
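As a sketch of the kind of export mentioned above, the following Python fragment polls the summed BGP message counters from sFlow-RT and forwards them to a Graphite server using Graphite's plaintext protocol; the sFlow-RT address matches the examples above, while the Graphite host, port, and metric prefix are placeholders for this example:
#!/usr/bin/env python
# Sketch: poll BGP metrics from sFlow-RT and push them to Graphite.
# The Graphite host and the metric prefix are placeholders.
import json
import socket
import time
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

rt_url = "http://10.0.0.86:8008/metric/ALL/sum:bgp-msgrcvd,sum:bgp-msgsent/json"
graphite = ("graphite.example.com", 2003)   # plaintext protocol listener

while True:
    metrics = json.loads(urlopen(rt_url).read().decode())
    now = int(time.time())
    lines = ""
    for m in metrics:
        # Graphite metric names can't contain ':' so replace it
        name = m["metricName"].replace(":", "_")
        lines += "fabric.bgp.%s %s %d\n" % (name, m["metricValue"], now)
    sock = socket.create_connection(graphite)
    sock.sendall(lines.encode())
    sock.close()
    time.sleep(60)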

Software, documentation, applications, and community support are available on sFlow-RT.com. For example, the sFlow-RT Fabric View application shown in the screen capture calculates and displays fabric wide traffic analytics.

Tuesday, December 8, 2015

Using a proxy to feed metrics into Ganglia

The GitHub gmond-proxy project demonstrates how a simple proxy can be used to map metrics retrieved through a REST API into Ganglia's gmond TCP protocol.
The diagram shows the elements of the Ganglia monitoring system. The Ganglia server runs the gmetad daemon, which polls for data from gmond instances and stores time series data. Trend charts are presented through the web interface. The transparent gmond-proxy replaces a native gmond daemon and delivers metrics in response to gmetad's polling requests.

The following commands install the proxy on the sFlow collector - an Ubuntu 14.04 system that is already running sFlow-RT:
wget https://raw.githubusercontent.com/sflow-rt/gmond-proxy/master/gmond_proxy.py
sudo mv gmond_proxy.py /etc/init.d/
sudo chown root:root /etc/init.d/gmond_proxy.py
sudo chmod 755 /etc/init.d/gmond_proxy.py
sudo service gmond_proxy.py start
sudo update-rc.d gmond_proxy.py defaults
The following commands install Ganglia's gmetad collector and web user interface on the Ganglia server - an Ubuntu 14.04 system:
sudo apt-get install gmetad
sudo apt-get install ganglia-webfrontend
sudo cp /etc/ganglia-webfrontend/apache.conf /etc/apache2/sites-enabled
Next edit the /etc/ganglia/gmetad.conf file and configure the proxy as a data source:
data_source "my cluster" sflow-rt
Restart the Apache and gmetad daemons:
sudo service gmetad restart
sudo service apache2 restart
The Ganglia web user interface, shown in the screen capture, is now available at http://server/ganglia/
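You can also check that the proxy is answering gmetad's polls by connecting to it directly and reading the XML dump it returns, just as gmetad does. The following Python sketch assumes the proxy is listening on the default gmond TCP port, 8649, on the local host:
#!/usr/bin/env python
# Sketch: read the XML metric dump that gmetad collects when it polls
# a gmond or the gmond-proxy (default TCP port 8649).
import socket
import xml.etree.ElementTree as ET

sock = socket.create_connection(("localhost", 8649))
chunks = []
while True:
    data = sock.recv(4096)
    if not data:
        break
    chunks.append(data)
sock.close()

root = ET.fromstring(b"".join(chunks))
for host in root.iter("HOST"):
    for metric in host.iter("METRIC"):
        print("%s %s=%s" % (host.get("NAME"), metric.get("NAME"), metric.get("VAL")))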

Ganglia natively supports sFlow, so what are some of the benefits of using the proxy? Firstly, the proxy allows metrics to be filtered, reducing the amount of data logged and increasing the scaleability of the Ganglia collector. Secondly, sFlow-RT generates traffic flow metrics, making them available to Ganglia. Finally, Ganglia is typically used in conjunction with additional monitoring tools that can all be driven using the analytics stream generated by sFlow-RT.

The diagram above shows how the sFlow-RT analytics engine is used to deliver metrics and events to cloud based and on-site DevOps tools, see: Cloud analytics,  InfluxDB and Grafana, Metric export to Graphite, and Exporting events using syslog. There are important scaleability and cost advantages to placing the sFlow-RT analytics engine in front of metrics collection applications as shown in the diagram. For example, in large scale cloud environments the metrics for each member of a dynamic pool are not necessarily worth trending since virtual machines are frequently added and removed. Instead, sFlow-RT can be configured to track all the members of the pool, calculate summary statistics for the pool, and log summary statistics. This pre-processing can significantly reduce storage requirements, reduce costs and increase query performance.

Monday, December 7, 2015

Broadcom BroadView Instrumentation

The diagram above, from the BroadView™ 2.0 Instrumentation Ecosystem presentation, illustrates how instrumentation built into the network Data Plane (the Broadcom Trident/Tomahawk ASICs used in most data center switches) provides visibility to Software Defined Networking (SDN) controllers so that they can optimize network performance.
The sFlow measurement standard provides open, scaleable, multi-vendor, streaming telemetry that supports SDN applications. Broadcom has been augmenting the rich set of counter and flow measurements in the base sFlow standard with additional metrics. For example, Broadcom ASIC table utilization metrics, DevOps, and SDN describes metrics that were added to track ASIC table resource consumption.

The highlighted Buffer congestion state / statistics capability in the slide refers to the BroadView Buffer Statistics Tracking (BST) instrumentation. The Memory Management Unit (MMU) is on-chip logic that manages how the on-chip packet buffers are organized.  BST is a feature that enables tracking the usage of these buffers. It includes snapshot views of peak utilization of the on-chip buffer memory across queues, ports, priority group, service pools and the entire chip.
The above chart from the Broadcom technical brief, Building an Open Source Data Center Monitoring Tool Using Broadcom BroadView™ Instrumentation Software, shows buffer utilization trended over an hour.

While the trend chart is useful, the value of BST instrumentation is fully realized when the data is integrated into the sFlow telemetry stream, allowing buffer utilizations to be correlated with traffic flows consuming the buffers. Broadcom's recently published sFlow extension, sFlow Broadcom Peak Buffer Utilization Structures, standardizes the export of the buffer metrics, ensures multi-vendor interoperability, and provides the comprehensive, actionable telemetry from the network required by SDN applications.

Ask switch vendors about their plans to support the extension in their sFlow implementations. The enhanced visibility into buffer utilization addresses a number of important use cases:
  • Fabric-wide visibility into peak buffer utilization
  • Derive worst case end to end latency
  • Pro-actively track microbursts and identify hot spots before packets are lost
  • Correlate with traffic flows and link utilizations
  • Improve performance through QoS marking, load spreading, and workload placement

Wednesday, December 2, 2015

DDoS Blackhole

DDoS Blackhole has been released on GitHub, https://github.com/sflow-rt/ddos-blackhole. The application detects Distributed Denial of Service (DDoS) flood attacks in real-time and can automatically install a null / blackhole route to drop the attack traffic and maintain Internet connectivity. See DDoS for additional background.

The screen capture above shows a simulated DNS amplification attack. The Top Targets chart is a real-time view of external traffic to on-site IP addresses. The red line indicates the threshold that has been set at 10,000 packets per second and it is clear that traffic to address 192.168.151.4 exceeds the threshold. The Top Protocols chart below shows that the increase in traffic is predominantly DNS. The Controls chart shows that a control was added the instant the traffic crossed the threshold.
The Controls tab shows a table of the currently active controls. In this case, the controller is running in Manual mode and is listed with a pending status as it awaits manual confirmation (which is why the attack traffic persists in the Charts page). Clicking on the entry brings up a form that can be used to apply the control.
The chart above from the DDoS article shows an actual attack where the controller automatically dropped the attack traffic.
The basic settings are straightforward, allowing the threshold, duration, mode of operation and protected address ranges to be set.

Controls are added and removed by calling an external TCL/Expect script which logs into the site router and applies the following CLI command to drop traffic to the targeted address:
ip route target_ip/32 null0 name "DOS ATTACK"
The script can easily be modified or replaced to apply different controls or to work with different vendor CLIs.
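As a rough illustration only, a replacement script written in Python using the pexpect module might look like the sketch below; the router address, credentials, prompts, and CLI syntax are all placeholders that would need to be adapted to the router being managed:
#!/usr/bin/env python
# Sketch: install a blackhole route for an attack target via the router CLI.
# Usage: blackhole.py <target_ip>
# The router address, credentials, prompts, and command syntax are placeholders.
import sys
import pexpect

target = sys.argv[1]
router = "192.0.2.1"      # placeholder management address
username = "admin"
password = "secret"

session = pexpect.spawn("ssh %s@%s" % (username, router), timeout=30)
session.expect("assword:")
session.sendline(password)
session.expect("#")
session.sendline("configure terminal")
session.expect("#")
session.sendline('ip route %s/32 null0 name "DOS ATTACK"' % target)
session.expect("#")
session.sendline("exit")
session.close()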

Additional instructions are available under the Help tab. Instructions for downloading and installing the DDoS Blackhole application are available on sFlow-RT.com.

The software will work on any site with sFlow capable switches, even if the router itself doesn't support sFlow. Running the application in Manual mode is a completely safe way to become familiar with the software features and get an understanding of normal traffic levels. Download the software and give it a try.

Saturday, November 21, 2015

OVN service injection demonstration

Enabling extensibility in OVN, by Gal Sagie, Huawei and Liran Schour, IBM, Open vSwitch 2015 Fall Conference, describes a method for composing actions from an external application with actions installed by the Open Virtual Network (OVN) controller.


An API allows services to be attached to logical topology elements in the OVN logical topology, resulting in a table in the OVN logical flow table that is under the control of the external service. Changes to the logical table are then automatically instantiated as concrete flows in the Open vSwitch instances responsible for handling the packets in the flow.

The demo presented involves detecting large "Elephant" flows using sFlow instrumentation embedded in Open vSwitch. Once a large flow is detected, logical flows are instantiated in the OVN controller to mark the packets. The concrete marking rules are inserted in the Open vSwitch packet processing pipelines handling the logical flow's packets. In the demo, the marked packets are then diverted by the physical network to a dedicated optical circuit.

There are a number of interesting traffic control use cases described on this blog that could leverage the capabilities of Open vSwitch using this approach.
The logical flow abstraction implemented by OVN greatly simplifies the problem of composing policies needed to integrate service injection and chaining within the packet pipeline and is a very promising platform for tackling this class of problem.

Many service injection use cases, including the hybrid packet/optical case demonstrated in this talk, need to be informed by measurements made throughout the network, for example, you wouldn't want to divert traffic to the optical link if it was already at capacity. New OVS instrumentation features aimed at real-time monitoring of virtual networks, another talk from the Open vSwitch conference, describes how sFlow measurements from Open vSwitch and physical network switches can be combined to provide the comprehensive visibility needed for this type of traffic control.
There are significant advantages to applying service injection actions in the virtual switches, primarily avoiding the resource constraints imposed by physical hardware that limit the number and types of rules that can be applied. However, some services are location dependent and so the external application may need direct access to the injected table on the individual virtual switches. This could be achieved by exposing the tables through the OVN daemons running on the hypervisors, or directly via OpenFlow connections to the Open vSwitch instances to allow the external application to control only the tables created for it.

The OpenFlow approach could be implemented through the OVN Northbound database, attaching the external service specific tables to the virtual network topology and assigning an OpenFlow controller to manage those tables. Once the changes propagate down to the vSwitches, they will create an additional OpenFlow connection to the designated controller, exposing the assigned table. This notion of "slicing" resources is one of the earliest use cases for OpenFlow, see FlowVisor. In addition to supporting location dependent use cases, this approach offloads runtime control of delegated tables from OVN, decoupling the performance of the injected services from OVN applied controls.

Friday, November 20, 2015

Open vSwitch 2015 Fall Conference

Open vSwitch is an open source software virtual switch that is popular in cloud environments such as OpenStack. It is a standard Linux component that forms the basis of a number of commercial and open source solutions for network virtualization, tenant isolation, and network function virtualization (NFV) - implementing distributed virtual firewalls and routers.

The recent Open vSwitch 2015 Fall Conference agenda included a wide variety of speakers addressing a range of topics, including: Open Virtual Network (OVN), containers, service chaining, and network function virtualization (NFV).

The video above is a recording of the following sFlow related talk from the conference:
New OVS instrumentation features aimed at real-time monitoring of virtual networks (Peter Phaal, InMon)
The talk will describe the recently added packet-sampling mechanism that returns the full list of OVS actions from the kernel. A demonstration will show how the OVS sFlow agent uses this mechanism to provide real-time tunnel visibility. The motivation for this visibility will be discussed, using examples such as end-to-end troubleshooting across physical and virtual networks, and tuning network packet paths by influencing workload placement in a VM/Container environment.
This talk is a follow up to an Open vSwitch 2014 Fall Conference talk on the role of monitoring in building feedback control systems.

Slides and videos for all the conference talks are available on the Open vSwitch web site.

Wednesday, November 18, 2015

Network virtualization visibility demo

New OVS instrumentation features aimed at real-time monitoring of virtual networks, Open vSwitch 2015 Fall Conference, included a demonstration of real-time visibility into the logical network overlays created by network virtualization, virtual switches, and the leaf and spine underlay carrying the tunneled traffic between hosts.

The diagram above shows the demonstration testbed. It consists of a leaf and spine network connecting two hosts, each of which is running a pair of Docker containers connected to Open vSwitch (OVS). The vSwitches are controlled by Open Virtual Network (OVN), which has been configured to create two logical switches, the first connecting the left most containers on each host and the second connecting the right most containers. The testbed, described in more detail in Open Virtual Network (OVN), is built from free components and can easily be replicated.


The dashboard in the video illustrates the end to end visibility that is possible by combining standard sFlow instrumentation in the physical switches with sFlow instrumentation in Open vSwitch and Host sFlow agents on the servers.

The diagram on the left of the dashboard shows a logical map of the elements in the testbed. The top panel shows the two logical switches created in OVN, sw1 connecting the left containers and sw0 connecting the right containers. The dotted lines represent the logical switch ports and their width shows the current traffic rate flowing over the logical link.

The solid lines below show the path that the virtual network traffic actually takes: from a container on Server1 to the virtual switch (OVS), where it is encapsulated in a Geneve tunnel and sent via leaf1, spine1, and leaf2 to the OVS instance on Server2, which decapsulates the traffic and delivers it to the other container on the logical switch.

Both logical networks share the same physical network and this is shown by link color. Traffic associated with sw1 is shown as red, traffic associated with sw0 is shown as blue, and a mixture of sw1 and sw0 traffic is shown as magenta.

The physical network is a shared resource for the virtual networks that make use of it. Understanding how this resource is being utilized is essential to ensure that virtual networks do not interfere with each other, for example, one tenant backing up data over their virtual network may cause unacceptable response time problems for another tenant.

The strip charts to the right of the diagram show representative examples of the data that is available and demonstrate comprehensive visibility into, and across, layers in the virtualization stack. Going from top to bottom:
  • Container CPU Utilization: The trend chart shows the per container CPU load for the containers. This data comes from the Host sFlow agents.
  • Container Traffic: This trend chart merges sFlow data from the Host sFlow agents and Open vSwitch to show traffic flowing between containers.
  • OVN Virtual Switch Traffic: This trend chart merges data from the OVN Northbound interface (specifically, logical port MAC addresses and logical switch names) and Open vSwitch to show traffic flowing through each logical switch.
  • Open vSwitch Performance: This trend chart shows key performance indicators based on metrics exported by the Open vSwitches, see Open vSwitch performance monitoring.
  • Leaf/Spine Traffic: This chart combines data from all the switches in the leaf / spine network to show traffic flows. The chart demonstrates that sFlow from the physical network devices provides visibility into the outer (tunnel) addresses and inner (tenant/virtual network) addresses, see Tunnels.
The visibility shown in the diagram and charts is only possible because all the elements of the infrastructure are instrumented using sFlow. No single element or layer has a complete picture, but when you combine information from all the elements a full picture emerges, see Visibility and the software defined data center.

The demonstration is available on GitHub. The following steps will download the demo and provide data that can be used to explore the sFlow-RT APIs described in Open Virtual Network (OVN) that were used to construct this dashboard.
wget http://www.inmon.com/products/sFlow-RT/sflow-rt.tar.gz
tar -xvzf sflow-rt.tar.gz
cd sflow-rt
./get-app.sh pphaal ovs-2015
Edit the start.sh file to playback the included captured sFlow data:
#!/bin/sh

HOME=`dirname $0`
cd $HOME

JAR="./lib/sflowrt.jar"
JVM_OPTS="-Xincgc -Xmx200m"
RT_OPTS="-Dsflow.port=6343 -Dhttp.port=8008 -Dsflow.file=app/ovs-2015/demo.pcap"
SCRIPTS="-Dscript.file=init.js"

exec java ${JVM_OPTS} ${RT_OPTS} ${SCRIPTS} -jar ${JAR}
Start sFlow-RT:
[user@server sflow-rt]$ ./start.sh 
2015-11-18T13:22:14-0800 INFO: Reading PCAP file, app/ovs-2015/demo.pcap
2015-11-18T13:22:15-0800 INFO: Starting the Jetty [HTTP/1.1] server on port 8008
2015-11-18T13:22:15-0800 INFO: Starting com.sflow.rt.rest.SFlowApplication application
2015-11-18T13:22:15-0800 INFO: Listening, http://localhost:8008
2015-11-18T13:22:15-0800 INFO: init.js started
2015-11-18T13:22:15-0800 INFO: app/ovs-2015/scripts/status.js started
2015-11-18T13:22:16-0800 INFO: init.js stopped
Finally, access the web interface http://server:8008/app/ovs-2015/html/ and you should see the screen shown in the video.

The sFlow monitoring technology scales to large production networks. The instrumentation is built into physical switch hardware and is available in 1G/10G/25G/40G/50G/100G data center switches from most vendors (see sFlow.org). The sFlow instrumentation in Open vSwitch is built into the Linux kernel and is an extremely efficient method of monitoring large numbers of virtual machines and/or containers.

The demonstration dashboard illustrates the type of operational visibility that can be delivered using sFlow. However, for large scale deployments, sFlow data can be incorporated into existing DevOps tool sets to augment data that is already being collected.
The diagram above shows how the sFlow-RT analytics engine is used to deliver metrics and events to cloud based and on-site DevOps tools, see: Cloud analytics, InfluxDB and Grafana, Metric export to Graphite, and Exporting events using syslog. There are important scaleability and cost advantages to placing the sFlow-RT analytics engine in front of metrics collection applications as shown in the diagram. For example, in large scale cloud environments the metrics for each member of a dynamic pool are not necessarily worth trending since virtual machines are frequently added and removed. Instead, sFlow-RT can be configured to track all the members of the pool, calculate summary statistics for the pool, and log summary statistics. This pre-processing can significantly reduce storage requirements, reduce costs and increase query performance.

Friday, November 13, 2015

SC15 live real-time weathermap

Connect to http://inmon.sc15.org/sflow-rt/app/sc15-weather/html/ between now and November 19th to see a real-time heat map of The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15) network.

From the SCinet web page, "SCinet brings to life a very high-capacity network that supports the revolutionary applications and experiments that are a hallmark of the SC conference. SCinet will link the convention center to research and commercial networks around the world. In doing so, SCinet serves as the platform for exhibitors to demonstrate the advanced computing resources of their home institutions and elsewhere by supporting a wide variety of bandwidth-driven applications including supercomputing and cloud computing."

The real-time weathermap leverages industry standard sFlow instrumentation built into network switch and router hardware to provide scaleable monitoring of the over 6 Terabit/s aggregate link capacity comprising the SCinet network. Link colors are updated every second to reflect operational status and utilization of each link.

Clicking on a link in the map pops up a 1 second resolution strip chart showing the protocol mix carried by the link.

The SCinet real-time weathermap was constructed using open source components running on the sFlow-RT real-time analytics engine. Download sFlow-RT and see what you can build.

Update December 1, 2015 The source code is now available on GitHub

Wednesday, November 11, 2015

sFlow Test

sFlow Test has been released on GitHub, https://github.com/sflow-rt/sflow-test. The suite of checks is intended to validate the implementation of sFlow on a data center switch. In particular, the tests are designed to verify that the sFlow agent implementation provides measurements under load with the accuracy needed to drive SDN control applications.
Many of the tests can be run while the switches are in production and are a useful way of verifying that a switch is configured and operating correctly.

The stress tests can be scaled to run without specialized equipment. For example, the recommended sampling rate for 10G links in production is 1-in-10,000. Driving a switch with 48x10G ports to 30% of total capacity would require a load generator capable of generating 288Gbit/s. However, dropping the sampling rate to 1-in-100 and generating a load of 2.88Gbit/s is an equivalent test of the sFlow agent's performance and can be achieved by two moderately powerful servers with 10G network adapters.
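The equivalence follows because the expected number of packet samples per second depends only on the ratio of traffic rate to sampling rate. A quick back-of-the-envelope check, assuming an average packet size of 1,000 bytes for the sake of the calculation:
# Expected sFlow samples per second = packets per second / sampling rate.
# The 1,000 byte average packet size is an assumption for this estimate.
avg_packet_bytes = 1000

def samples_per_sec(load_bps, sampling_rate):
    pps = load_bps / (avg_packet_bytes * 8.0)
    return pps / sampling_rate

print(samples_per_sec(288e9, 10000))  # production: 288 Gbit/s at 1-in-10,000
print(samples_per_sec(2.88e9, 100))   # test rig:   2.88 Gbit/s at 1-in-100
# both configurations generate 3600 samples per second on average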

For example, using the test setup above, run an iperf server on Server2:
iperf -su
Then run the following sequence of tests on Server1:
#!/bin/bash
RT="10.0.0.162"
TGT="server2"

ping -f -c 100000 $TGT
sleep 40
curl http://$RT:8008/app/sflow-test/scripts/test.js/load/json?bps=100000000
iperf -c $TGT -t 40 -u -b 100M
curl http://$RT:8008/app/sflow-test/scripts/test.js/load/json?bps=0
sleep 40
curl http://$RT:8008/app/sflow-test/scripts/test.js/load/json?bps=200000000
iperf -c $TGT -t 40 -u -b 200M
curl http://$RT:8008/app/sflow-test/scripts/test.js/load/json?bps=0
sleep 40
curl http://$RT:8008/app/sflow-test/scripts/test.js/load/json?bps=300000000
iperf -c $TGT -t 40 -u -b 300M
curl http://$RT:8008/app/sflow-test/scripts/test.js/load/json?bps=0
The results of the test are shown in the screen capture at the top of this article. The test pattern is based on the article Large flow detection and the sequence of steps in Load (shown in gold) is accurately tracked by the Flows metric (shown in red). The Counters metric (shown in blue) doesn't accurately track the shape of the load and is delayed, but the peaks are consistent with the Load and Flows values.

The table below the two strip charts shows that all the tests have passed:

  • test duration: The test must run for at least 5 minutes (300 seconds)
  • check sequence numbers: Checks that sequence numbers for datagrams and measurements are correctly reported and that measurements are delivered in order
  • check data sources: Checks that sampling rates and polling intervals are correctly configured and that data is being received
  • sampled packet size: Checks to make sure that sampled packet sizes are correctly reported
  • random number generator: Tests the hardware random number generator used for packet sampling
  • compare byte flows and counters: Checks that Bits per Second values (shown in the upper strip chart) are consistently reported by interface counters and by packet sampling
  • compare packet flows and counters: Checks that Packets per Second values (shown in the lower strip chart) are consistently reported by interface counters and by packet sampling
  • check ingress port information: Verifies that ingress port information is included with packet samples

The project provides an opportunity for users and developers to collaborate on creating a set of tests that capture operational requirements and that can be used in product selection and development. An effective test application will help increase the quality of sFlow implementations and ensure that they address the need for measurement in SDN control applications.

Wednesday, October 28, 2015

Active Route Manager

SDN Active Route Manager has been released on GitHub, https://github.com/sflow-rt/active-routes. The software is based on the article White box Internet router PoC. Active Route Manager peers with a BGP route reflector to track prefixes and combines routing data with sFlow measurements to identify the most active prefixes. Active prefixes can be advertised via BGP to a commodity switch, which acts as a hardware route cache, accelerating the performance of a software router.
There is an interesting parallel with the Open vSwitch architecture, see Open vSwitch performance monitoring, which maintains a cache of active flows in the Linux kernel to accelerate forwarding. In the SDN routing case, active prefixes are pushed to the switch ASIC in order to bypass the slower software router.
In this example, the software is being used in passive mode, estimating the cache hit / miss rates without offloading routes. The software has been configured to manage a cache of 10,000 prefixes. The first screen shot shows the cache warming up.

The first panel shows routes being learned from the route reflector: the upper chart shows the approximately 600,000 routes being learned from the BGP route reflector, and the lower chart shows the rate at which routes are being added (peaking at just over 300,000 prefixes per second).

The second panel shows traffic analytics: the top chart shows how many of the prefixes are seeing traffic (shown in blue) as well as the number of inactive prefixes that are covered by the active prefixes (shown in red). The lower chart shows the percentage of traffic destined to the active prefixes.

The third panel shows the behavior of the cache: the upper chart shows the total number of prefixes in the cache, the middle chart, the rate of prefix additions / removals from the cache, and the lower chart shows the cache miss rate, dropping to less than 10% within a couple of seconds of the BGP route reflector session being established.
The second screen shot shows the cache a few minutes later once it has warmed up and is in steady state. There are approximately 10,000 prefixes in the cache with prefixes being added and removed as traffic patterns change and routes are added and dropped. The estimated cache miss rate is less than 0.5% and misses are mostly due to newly active prefixes rather than recently deleted cache entries.
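The cache behavior described above can be illustrated with a toy model that keeps the most recently active prefixes and counts a miss whenever traffic arrives for a prefix that isn't currently offloaded. The sketch below is purely illustrative and is not how the Active Route Manager application is implemented; in particular, a real route cache would use longest-prefix matching rather than exact lookups:
# Toy model of a bounded active-prefix cache with hit / miss accounting.
# Illustrative only; real forwarding uses longest-prefix match, not exact keys.
from collections import OrderedDict

class PrefixCache(object):
    def __init__(self, size):
        self.size = size
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, prefix):
        if prefix in self.cache:
            self.hits += 1
            del self.cache[prefix]          # re-insert to mark as recently active
            self.cache[prefix] = True
        else:
            self.misses += 1
            self.cache[prefix] = True       # "offload" the newly active prefix
            if len(self.cache) > self.size:
                self.cache.popitem(last=False)  # evict the least recently active

    def miss_rate(self):
        total = self.hits + self.misses
        return 100.0 * self.misses / total if total else 0.0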

This example is a further demonstration that it is possible to use SDN analytics and control to combine the standard sFlow and BGP capabilities of commodity switch hardware and deliver Terabit routing capacity.

Friday, October 9, 2015

Fabric View

The Fabric View application has been released on GitHub, https://github.com/sflow-rt/fabric-view. Fabric View provides real-time visibility into the performance of leaf and spine ECMP fabrics.
A leaf and spine fabric is challenging to monitor. The fabric spreads traffic across all the switches and links in order to maximize bandwidth. Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.

Fabric View solves the visibility challenge by using the industry standard sFlow instrumentation built into most data center switches. Fabric View represents the fabric as if it were a single large chassis switch, treating each leaf switch as a line card and the spine switches as the backplane. The result is an intuitive tool that is easily understood by anyone familiar with traditional networks.

Fabric View provides real-time, second-by-second visibility to traffic, identifying top talkers, protocols, tenants, tunneled traffic, etc. In addition, Fabric View reveals key fabric performance indicators such as number of congested spine links, links with colliding Elephant flows, discards and errors.

If you have a leaf / spine network, download the software and try it out. Community support is available on sFlow-RT.com.

Demo


If you don't have access to a live network, Fabric View includes data captured from the Cumulus Networks workbench network shown above, a 2 leaf / 2 spine 10Gbit/s network.

First, download sFlow-RT:
wget http://www.inmon.com/products/sFlow-RT/sflow-rt.tar.gz
tar -xvzf sflow-rt.tar.gz
cd sflow-rt
Next, install Fabric View:
./get-app.sh sflow-rt fabric-view
Now edit the start.sh script to playback the captured packet trace:
#!/bin/sh

HOME=`dirname $0`
cd $HOME

JAR="./lib/sflowrt.jar"
JVM_OPTS="-Xincgc -Xmx200m -Dsflow.file=app/fabric-view/demo/ecmp.pcap"
RT_OPTS="-Dsflow.port=6343 -Dhttp.port=8008"
SCRIPTS="-Dscript.file=init.js"

exec java ${JVM_OPTS} ${RT_OPTS} ${SCRIPTS} -jar ${JAR}
Start sFlow-RT:
[user@server sflow-rt]$ ./start.sh 
2015-10-09T11:08:25-0700 INFO: Reading PCAP file, app/fabric-view/demo/ecmp.pcap
2015-10-09T11:08:26-0700 INFO: Starting the Jetty [HTTP/1.1] server on port 8008
2015-10-09T11:08:26-0700 INFO: Starting com.sflow.rt.rest.SFlowApplication application
2015-10-09T11:08:26-0700 INFO: Listening, http://localhost:8008
2015-10-09T11:08:26-0700 INFO: init.js started
2015-10-09T11:08:26-0700 INFO: app/fabric-view/scripts/fabric-view-stats.js started
2015-10-09T11:08:26-0700 INFO: app/fabric-view/scripts/fabric-view.js started
2015-10-09T11:08:26-0700 INFO: app/fabric-view/scripts/fabric-view-elephants.js started
2015-10-09T11:08:26-0700 INFO: app/fabric-view/scripts/fabric-view-usr.js started
2015-10-09T11:08:27-0700 INFO: init.js stopped
Now install the network topology and address groupings:
[user@server sflow-rt]$ curl -H 'Content-Type:application/json' -X PUT --data @app/fabric-view/demo/topology.json http://127.0.0.1:8008/app/fabric-view/scripts/fabric-view.js/topology/json
[user@server sflow-rt]$ curl -H 'Content-Type:application/json' -X PUT --data @app/fabric-view/demo/groups.json http://127.0.0.1:8008/app/fabric-view/scripts/fabric-view.js/groups/json
Finally, access the web interface http://server:8008/app/fabric-view/html/ and you should see the screen shown at the top of this article.

Real-time control

Visibility is just the starting point. Real-time detection of congested links and Elephant flows can be used to drive software defined networking (SDN) control actions to improve performance, for example, by marking and/or steering Elephant flows.

Leaf and spine traffic engineering using segment routing and SDN describes a demonstration shown at the 2015 Open Network Summit that integrated Fabric View with the ONOS controller to load balance Elephant flows.

The fabric-view-elephants.js script has three dummy functions that are placeholders for control actions:
function elephantStart(flowKey, rec) {
  // place holder to mark elephant flows
}

function linkBusy(linkDs, linkRec, now) {
  // place holder to steer elephant flows
}

function elephantEnd(flowKey, rec) {
  // place holder to remove marking
}
RESTful control of Cumulus Linux ACLs demonstrated how flows can be marked. Adding the following code to the fabric-view-elephants.js script shows how the large flow marking function can be integrated in Fabric View:
var aclProtocol = "http";
var aclPort = 8080;
var aclRoot = '/acl/';
var mark_dscp = 10;
var mark_cos  = 5;
var id = 0;

// credentials for the ACL REST service (set these if the service requires authentication)
var aclUser = null;
var aclPasswd = null;

var marking_enabled = true;

function newRequest(host, path) {
    var url = aclProtocol + "://" + host;
    if(aclPort) url += ":" + aclPort;
    url += aclRoot + path;
    var req = {"url":url};
    if(aclUser) req.user = aclUser;
    if(aclPasswd) req.password = aclPasswd;
    return req;
}

function submitRequest(req) {
    if(!req.error) req.error =  function(err) { logWarning("request error " + req.url + " error: " + err); };
    try { httpAsync(req); }
    catch(e) { logWarning('bad request, ' + req.url + " " + e); }
}

function elephantStart(flowKey, rec) {
   if(!marking_enabled) return;

   var [ipsrc,ipdst,proto,srcprt,dstprt] = flowKey.split(',');

   // only interested in TCP flows (the protocol field parsed from flowKey is a string)
   if(proto !== '6') return;

   var acl = [
     '[iptables]',
     '# marking Elephant flow',
     '-t mangle -A FORWARD --in-interface swp+ '
     + ' -s ' + ipsrc + ' -d ' + ipdst 
     + ' -p tcp --sport ' + srcprt + ' --dport ' + dstprt
     + ' -j SETQOS --set-dscp ' + mark_dscp + ' --set-cos ' + mark_cos
   ];

   var rulename = 'mark' + id++;
   var req = newRequest(rec.agent, rulename);
   req.operation = 'PUT';
   req.headers = {
     "Content-Type":"application/json; charset=utf-8",
     "Accept":"application/json"
   };
   req.body = JSON.stringify(acl);
   req.error= function(res) {
      logWarning('mark failed=' + res + ' agent=' + rec.agent);
   };
   req.success = function(res) {
      logInfo('mark rule=' + rulename);
   };
   rec.rulename=rulename; 
   submitRequest(req);
}

function elephantEnd(flowKey, rec) {
   if(!rec.rulename) return;

   var req = newRequest(rec.agent, rec.rulename);
   req.operation = 'DELETE';
   req.error = function(res) {
      logInfo('delete failed='+res);
   };
   req.success = function(res) {
      logInfo('delete rule='+rec.rulename);
   };
   submitRequest(req);
}
This example doesn't only apply to Cumulus Linux. It can easily be modified to interact with other vendors' switch APIs, for example NX-API for Cisco Nexus 9k/3k switches, eAPI for Arista switches, or with SDN controller REST APIs, such as Floodlight, OpenDaylight, ONOS, etc.

Update March 22, 2018: Fabric View is now available on Docker Hub. To run the demo using Docker, type the following command and point your browser at http://localhost:8008:
docker run --entrypoint /sflow-rt/run_demo.sh -p 8008:8008 sflow/fabric-view

Monday, September 28, 2015

Real-time analytics and control applications

sFlow-RT 2.0 released - adds application support describes a new application framework for sharing solutions built on top of the real-time analytics platform. Application examples are provided on the sFlow-RT Download page.

The flow-graph application, shown above, generates a real-time graph of communication between hosts.  The application uses a simple sFlow-RT script to track associations between hosts based on their communication patterns and plots the results using the vis.js dynamic, browser based visualization library. This example can be modified to track different types of relationship and extended to incorporate other popular data visualization libraries such as D3.js.
The dashboard-example includes representative real-time metric and top flows trend charts. The example uses the jQuery-UI library to build a simple tabbed interface. This example can be extended to build groups of custom charts.
The top-flows application supports the definition of custom flows and tracks the largest flows in a continuously updating table.

Each of the examples has a server-side component that uses sFlow-RT's script API to collect, analyze, and export measurements. An HTML5 client side user interface connects to the server and presents the data.

The sFlow-RT analytics engine is a highly scaleable platform for processing sFlow measurements from physical and virtual network switches, servers, virtual machines, Linux containers, load balancers, web and application servers, etc. The analytics capability can be applied to a wide range of SDN and DevOps use cases - many of which have been described on this blog. Application support provides a simple way for vendors, researchers, and developers to distribute solutions.

Monday, September 21, 2015

Open Virtual Network (OVN)


Open Virtual Network (OVN) is an open source network virtualization solution built as part of the Open vSwitch (OVS) project. OVN provides layer 2/3 virtual networking and firewall services for connecting virtual machines and Linux containers.

OVN is built on the same architectural principles as VMware's commercial NSX and offers the same core network virtualization capability, providing a free alternative that is likely to see rapid adoption in open source orchestration systems, see Mirantis: Why the Open Virtual Network (OVN) matters to OpenStack.

This article uses OVN as an example, describing a testbed which demonstrates how the standard sFlow instrumentation built into the physical and virtual switches provides the end-to-end visibility required to manage large scale network virtualization and deliver reliable services.

Open Virtual Network


The Northbound DB provides a way to describe the logical networks that are required. The database abstracts away implementation details which are handled by the ovn-northd and ovn-controllers and presents an easily consumable network virtualization service to orchestration tools like OpenStack.


The purple tables on the left describe a simple logical switch LS1 that has two logical ports LP1 and LP2 with MAC addresses AA and BB respectively. The green tables on the right show the Southbound DB that is constructed by combining information from the ovn-controllers on hypervisors HV1 and HV2 to build forwarding tables in the vSwitches that realize the virtual network.

Docker, OVN, OVS, ECMP Testbed

The diagram shows the virtual testbed that was created using virtual machines running under VirtualBox:
  • Physical Network: The recent release of Cumulus VX by Cumulus Networks makes it possible to build realistic networks out of virtual machines. In this case we built a two-spine, two-leaf network using VirtualBox that provides L3 ECMP connectivity using BGP as the routing protocol, a configuration that is very similar to that used by large cloud providers. The green virtual machines leaf1, leaf2, spine1 and spine2 comprise the ECMP network.
  • Servers: Server 1, Server 2 and the Orchestration Server virtual machines are ubuntu-14.04.3-server installations. Server 1 and Server 2 are connected to the physical network with addresses 192.168.1.1 and 192.168.2.1 respectively that will be used to form the underlay network. Docker has been installed on Server 1 and Server 2 and each server has two containers. The containers on Server 1 have been assigned addresses 172.16.1.1/00:00:00:CC:01:01 and 172.16.1.2/00:00:00:CC:01:02 and the containers on Server 2 have been assigned addresses 172.16.2.1/00:00:00:CC:02:01, 172.16.2.2/00:00:00:CC:02:02.
  • Virtual Network: Open vSwitch (OVS) was installed from sources on Server 1 and Server 2 along with ovn-controller daemons. The ovn-northd daemon was built and installed on the Orchestration Server. A single logical switch sw0 has been configured that connects server1-container2 (MAC 00:00:00:CC:01:02) to server2-container2 (MAC 00:00:00:CC:02:02).
  • Management Network: The out of band management network shown in orange is a VirtualBox bridged network connecting management ports on the physical switches and servers to the Orchestration Server.
Pinging between Server 1, Container 2 (172.16.1.2) and Server 2, Container 2 (172.16.2.2) verifies that the logical network is operational.

Visibility

Enabling sFlow instrumentation in the testbed provides visibility into the physical and virtual network and server resources associated with the logical network.

Most physical switches support sFlow. With Cumulus Linux, installing the Host sFlow agent enables the hardware support for sFlow in the bare metal switch to provide line rate monitoring on every 1, 10, 25, 40, 50 and 100 Gbit/s port. Since Cumulus VX isn't a hardware switch, the Host sFlow agent makes use of the Linux iptables/nflog capability to monitor traffic.

Host sFlow agents are installed on Server 1 and Server 2. These agents stream server, virtual machine, and container metrics. In addition, the Host sFlow agent automatically enables sFlow in Open vSwitch, which in turn exports traffic flow, interface counter, resource and tunnel encap/decap information.

The sFlow data from leaf1, leaf2, spine1, spine2, server1 and server2 is transmitted over the management network to the sFlow-RT real-time analytics software running on the Orchestration Server.

A difficult challenge in managing large scale cloud infrastructure is rapidly identifying overloaded resources (hot spots), for example:
  • Congested network link between physical switches
  • Poorly performing virtual switch
  • Overloaded server
  • Overloaded container / virtual machine
  • Oversubscribed service pool
  • Distributed Denial of Service (DDoS) attack
  • DevOps
Identifying an overloaded resource is only half the solution - the source of the load must also be found so that corrective action can be taken. This process of identifying and curing overloaded resources is critical to delivering on service level agreements. The scale and complexity of the infrastructure demands that this process be automated so that performance problems are quickly identified and immediately addressed.

The sFlow-RT analytics platform is designed with automation in mind, providing REST and embedded script APIs that facilitate metrics-driven control actions. The following examples use the sFlow-RT REST API to demonstrate the type of data available through sFlow.
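The same queries can just as easily be issued from a script. Here is a minimal Python sketch, assuming sFlow-RT is listening on 10.0.0.86:8008 as in the curl examples that follow:
#!/usr/bin/env python
# Minimal sketch (Python 2): issue an sFlow-RT REST query from a
# script instead of curl. Assumes sFlow-RT is listening on
# 10.0.0.86:8008 as in the examples below.
import json
import urllib2

url = "http://10.0.0.86:8008/metric/ALL/max:load_one/json"
for m in json.loads(urllib2.urlopen(url).read()):
  print m["agent"], m["metricName"], m["metricValue"]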

Congested network link between physical switches

The following query finds the busiest link in the fabric based on sFlow interface counters:
curl "http://10.0.0.86:8008/metric/10.0.0.80;10.0.0.81;10.0.0.82;100.0.0.83/max:ifinoctets,max:ifoutoctets/json"
[
 {
  "agent": "10.0.0.80",
  "dataSource": "4",
  "lastUpdate": 3374,
  "lastUpdateMax": 17190,
  "lastUpdateMin": 3374,
  "metricN": 21,
  "metricName": "max:ifinoctets",
  "metricValue": 101670.72864951608
 },
 {
  "agent": "10.0.0.80",
  "dataSource": "4",
  "lastUpdate": 3375,
  "lastUpdateMax": 17191,
  "lastUpdateMin": 3375,
  "metricN": 21,
  "metricName": "max:ifoutoctets",
  "metricValue": 101671.07968507096
 }
]
Mapping the sFlow agent and dataSource associated with the busy link to a switch name and interface name is accomplished with a second query:
curl "http://10.0.0.86:8008/metric/10.0.0.80/host_name,4.ifname/json"
[
 {
  "agent": "10.0.0.80",
  "dataSource": "2.1",
  "lastUpdate": 13011,
  "lastUpdateMax": 13011,
  "lastUpdateMin": 13011,
  "metricN": 1,
  "metricName": "host_name",
  "metricValue": "leaf1"
 },
 {
  "agent": "10.0.0.80",
  "dataSource": "4",
  "lastUpdate": 13011,
  "lastUpdateMax": 13011,
  "lastUpdateMin": 13011,
  "metricN": 1,
  "metricName": "4.ifname",
  "metricValue": "swp2"
 }
]
Now that we know interface swp2 on switch leaf1 is the busy link, the next step is to identify the traffic flowing on the link by creating a flow definition (see RESTflow):
curl -H "Content-Type:application/json" -X PUT -d '{"keys":"macsource,macdestination,ipsource,ipdestination,stack","value":"bytes"}' http://10.0.0.86:8008/flow/test1/json
Now that a flow has been defined, we can query the new metric to see traffic on the port:
curl "http://10.0.0.86:8008/metric/10.0.0.80/4.test1/json"
[{
 "agent": "10.0.0.80",
 "dataSource": "4",
 "lastUpdate": 714,
 "lastUpdateMax": 714,
 "lastUpdateMin": 714,
 "metricN": 1,
 "metricName": "4.test1",
 "metricValue": 211902.75708445764,
 "topKeys": [{
  "key": "080027AABAA5,08002745B9B4,192.168.2.1,192.168.1.1,eth.ip.udp.geneve.eth.ip.icmp",
  "lastUpdate": 712,
  "value": 211902.75708445764
 }]
}]
We can see that the traffic is a Geneve tunnel between Server 2 (192.168.2.1) and Server 1 (192.168.1.1) and that it is carrying encapsulated ICMP traffic. At this point, an additional flow can be created to find the sources of traffic in the virtual overlay network (see Down the rabbit hole).

The following flow definition takes the data from the physical switches and examines the tunnel contents:
curl -H "Content-Type:application/json" -X PUT -d '{"keys":"ipsource,ipdestination,genevevni,macsource.1,host:macsource.1:vir_host_name,macdestination.1,host:macdestination.1:vir_host_name,ipsource.1,ipdestination.1,stack","value":"bytes"}' http://10.0.0.86:8008/flow/test2/json
Querying the new metric to find out about the flow:
curl "http://10.0.0.86:8008/metric/10.0.0.80/4.test2/json"
[{
 "agent": "10.0.0.80",
 "dataSource": "4",
 "lastUpdate": 9442,
 "lastUpdateMax": 9442,
 "lastUpdateMin": 9442,
 "metricN": 1,
 "metricName": "4.test2",
 "metricValue": 3423.596229984865,
 "topKeys": [{
  "key": "192.168.2.1,192.168.1.1,1,000000CC0202,/lonely_albattani,000000CC0102,/angry_hopper,172.16.2.2,172.16.1.2,eth.ip.udp.geneve.eth.ip.icmp",
  "lastUpdate": 9442,
  "value": 3423.596229984865
 }]
}]
Now it is clear that the encapsulated flow starts at Server 2, Container 2 and ends at Server 1, Container 2.
Querying the OVN Northbound database for the MAC addresses 000000CC0202 and 000000CC0102 links this traffic to the two ports on logical switch sw0.
The flow also merges in information about the identity of the containers, obtained from the sFlow exported by the Host sFlow agents on the servers. For example, the host:macsource.1:vir_host_name function in the flow definition looks up the vir_host_name associated with the inner source MAC address, in this case identifying the Docker container named /lonely_albattani as the source of the traffic.

At this point we have enough information to start putting controls in place. For example, knowing the container name and hosting server would allow the container to be shut down, or the container workload could be moved - a relatively simple task, since OVN automatically updates the settings on the destination server to associate the container with its logical network.

While this example showed manual steps to demonstrate sFlow-RT APIs, in practice the entire process is automated. For example, Leaf and spine traffic engineering using segment routing and SDN demonstrates how congestion on the physical links can be mitigated in ECMP fabrics.
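As a rough illustration of what such automation might look like, the following sketch polls the busiest-link query used above and flags congested links; the 10Gbit/s line rate and 80% threshold are assumptions chosen for this example, and the linked article describes a more sophisticated, event-driven approach:
#!/usr/bin/env python
# Minimal sketch (Python 2): poll the busiest-link query used above
# and flag congested links. The 10Gbit/s line rate and 80% threshold
# are assumptions chosen for illustration.
import json
import time
import urllib2

RT = "http://10.0.0.86:8008"
SWITCHES = "10.0.0.80;10.0.0.81;10.0.0.82;10.0.0.83"
LINE_RATE = 10e9 / 8           # link capacity in bytes per second
THRESHOLD = 0.8 * LINE_RATE    # flag links above 80% utilization

while True:
  url = "%s/metric/%s/max:ifinoctets,max:ifoutoctets/json" % (RT, SWITCHES)
  for m in json.loads(urllib2.urlopen(url).read()):
    if m["metricValue"] > THRESHOLD:
      print "congested link: agent %s, dataSource %s" % (m["agent"], m["dataSource"])
  time.sleep(10)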

Poor virtual switch performance

Open vSwitch performance monitoring describes key datapath performance metrics that Open vSwitch includes in its sFlow export. For example, the following query identifies the virtual switch with the lowest cache hit rate, the switch handling the largest number of cache misses, and the switch handling the largest number of active flows:
curl "http://10.0.0.86:8008/metric/ALL/min:ovs_dp_hitrate,max:ovs_dp_misses,max:ovs_dp_flows/json"
[
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1000",
  "lastUpdate": 19782,
  "lastUpdateMax": 19782,
  "lastUpdateMin": 19782,
  "metricN": 2,
  "metricName": "min:ovs_dp_hitrate",
  "metricValue": 99.91260923845194
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1000",
  "lastUpdate": 19782,
  "lastUpdateMax": 19782,
  "lastUpdateMin": 19782,
  "metricN": 2,
  "metricName": "max:ovs_dp_misses",
  "metricValue": 0.3516881028938907
 },
 {
  "agent": "10.0.0.85",
  "dataSource": "2.1000",
  "lastUpdate": 8090,
  "lastUpdateMax": 19782,
  "lastUpdateMin": 8090,
  "metricN": 2,
  "metricName": "max:ovs_dp_flows",
  "metricValue": 11
 }
]
In this case the vSwitch on Server 1 (10.0.0.84) is handling the largest number of packets in its slow path and has the lowest cache hit rate. The vSwitch on Server 2 (10.0.0.85) has the largest number of active flows in its datapath.

The Open vSwitch datapath integrates sFlow support. The test1 flow definition created in the previous example provides general L2/L3 information, so we can make a query to see the active flows in the datapath on 10.0.0.84:
curl "http://10.0.0.86:8008/activeflows/10.0.0.84/test1/json"
[
 {
  "agent": "10.0.0.84",
  "dataSource": "16",
  "flowN": 1,
  "key": "000000CC0102,000000CC0202,172.16.1.2,172.16.2.2,eth.ip.icmp",
  "value": 97002.07726081279
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "0",
  "flowN": 1,
  "key": "000000CC0202,000000CC0102,172.16.2.2,172.16.1.2,eth.ip.icmp",
  "value": 60884.34095101907
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "3",
  "flowN": 1,
  "key": "080027946A4E,0800271AF7F0,192.168.2.1,192.168.1.1,eth.ip.udp.geneve.eth.ip.icmp",
  "value": 47117.093823014926
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "17",
  "flowN": 1,
  "key": "0800271AF7F0,080027946A4E,192.168.1.1,192.168.2.1,eth.ip.udp.geneve.eth.ip.icmp",
  "value": 37191.709371373545
 }
]
The previous example showed how flow information can be associated with Docker containers, logical networks, and physical networks so that control actions can be planned and executed to reduce traffic on an overloaded virtual switch.

Overloaded server

The following query finds the server with the highest load average and the server with the highest CPU utilization:
curl "http://10.0.0.86:8008/metric/ALL/max:load_one,max:cpu_utilization/json"
[
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1",
  "lastUpdate": 10661,
  "lastUpdateMax": 13769,
  "lastUpdateMin": 10661,
  "metricN": 7,
  "metricName": "max:load_one",
  "metricValue": 0.82
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1",
  "lastUpdate": 10661,
  "lastUpdateMax": 13769,
  "lastUpdateMin": 10661,
  "metricN": 7,
  "metricName": "max:cpu_utilization",
  "metricValue": 69.68566862013851
 }
]
In this case Server 1 (10.0.0.84) has the highest CPU load.
Interestingly, the switches in this case are running Cumulus Linux, which to all intents and purposes makes them servers, since Cumulus Linux is based on Debian and can run unmodified Debian packages, including Host sFlow (see Cumulus Networks, sFlow and data center automation). If the busiest server happened to be one of the switches, it would show up in this query.
Since many workloads in a cloud environment tend to be network services, following up by examining network traffic, as was demonstrated in the previous two examples, is often the next step to identifying the source of the load.

In this case the server is also running Linux containers and the next example shows how to identify busy containers / virtual machines.

Overloaded container / virtual machine

The following query finds the container / virtual machine with the largest CPU utilization:
curl "http://10.0.0.86:8008/metric/ALL/max:vir_cpu_utilization/json"
[{
 "agent": "10.0.0.84",
 "dataSource": "3.100002",
 "lastUpdate": 13949,
 "lastUpdateMax": 13949,
 "lastUpdateMin": 13949,
 "metricN": 2,
 "metricName": "max:vir_cpu_utilization",
 "metricValue": 62.7706705162029
}]
The following query extracts additional information for the agent and dataSource:
curl "http://10.0.0.86:8008/metric/10.0.0.84/host_name,node_domains,cpu_utilization,3.100002.vir_host_name/json"
[
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1",
  "lastUpdate": 3377,
  "lastUpdateMax": 3377,
  "lastUpdateMin": 3377,
  "metricN": 1,
  "metricName": "host_name",
  "metricValue": "server1"
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1",
  "lastUpdate": 3377,
  "lastUpdateMax": 3377,
  "lastUpdateMin": 3377,
  "metricN": 1,
  "metricName": "node_domains",
  "metricValue": 2
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1",
  "lastUpdate": 3377,
  "lastUpdateMax": 3377,
  "lastUpdateMin": 3377,
  "metricN": 1,
  "metricName": "cpu_utilization",
  "metricValue": 69.4535519125683
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "3.100002",
  "lastUpdate": 19429,
  "lastUpdateMax": 19429,
  "lastUpdateMin": 19429,
  "metricN": 1,
  "metricName": "3.100002.vir_host_name",
  "metricValue": "/angry_hopper"
 }
]
The results identify the container as /angry_hopper, running on server1, which is hosting two containers (node_domains = 2) and has an overall CPU utilization of 69%.

Oversubscribed service pool

Cluster performance metrics describes how sFlow metrics can be used to characterize the performance of a pool of servers.
The presentation Dynamically Scaling Netflix in the Cloud shows how Netflix adjusts the number of virtual machines in autoscaling groups based on measured load. Netflix runs on Amazon infrastructure; however, the combined network, server, virtual machine and container metrics available through sFlow can be used to drive autoscaling in cloud orchestration systems like OpenStack, Apache Mesos, etc. Joint VM Placement and Routing for Data Center Traffic Engineering shows that jointly optimizing network and server resources can yield significant benefits. Finally, the Linux containers used in this testbed can be started and stopped in under a second, making it possible to rapidly expand and contract capacity in response to changing demand - provided there is a fast, lightweight measurement system like sFlow to supply the needed metrics.
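As a rough sketch of a metrics-driven scaling decision (not Netflix's mechanism), the following script derives the average CPU utilization across a pool from sFlow counters and prints a scale up / scale down decision; the pool membership, target utilization and scaling actions are placeholders:
#!/usr/bin/env python
# Minimal sketch (Python 2): a metrics-driven scaling decision based
# on pool-wide CPU utilization. The pool membership, target
# utilization and the scale actions are placeholders for whatever
# orchestration system is in use.
import json
import urllib2

RT = "http://10.0.0.86:8008"
POOL = "10.0.0.84;10.0.0.85"   # sFlow agents hosting the service pool
TARGET = 70.0                  # target average CPU utilization (%)

url = "%s/metric/%s/sum:cpu_utilization/json" % (RT, POOL)
res = json.loads(urllib2.urlopen(url).read())[0]
avg = res["metricValue"] / res["metricN"]   # metricN = number of hosts reporting

if avg > TARGET * 1.2:
  print "scale up"     # e.g. start another container
elif avg < TARGET * 0.8:
  print "scale down"   # e.g. stop an idle container
else:
  print "pool within target range"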

Distributed Denial of Service (DDoS) attack

Multi-tenant performance isolation describes a large scale outage at a cloud service provider caused by a DDoS attack. The real-time traffic information available through sFlow provides the information needed to identify attacks and target mitigation actions in order to maintain service levels. DDoS mitigation with Cumulus Linux describes how hardware filtering capabilities of physical switches can be deployed to automatically filter out large scale attacks that would otherwise overload the servers.
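As a rough sketch of how such detection might be automated with the REST API used earlier, the following script defines a flow that counts packets per destination address, attaches a threshold, and polls the events log; the flow name, packet-rate threshold and polling interval are arbitrary values chosen for illustration:
#!/usr/bin/env python
# Minimal sketch (Python 2): use sFlow-RT's flow, threshold and events
# REST APIs to flag possible DDoS targets. The flow name, the
# 100,000 packets/s threshold and the polling interval are arbitrary
# values chosen for illustration.
import json
import time
import urllib2

RT = "http://10.0.0.86:8008"

def put(path, spec):
  req = urllib2.Request(RT + path, data=json.dumps(spec),
                        headers={"Content-Type": "application/json"})
  req.get_method = lambda: "PUT"
  urllib2.urlopen(req)

# count packets per destination address
put("/flow/ddos/json", {"keys": "ipdestination", "value": "frames"})
# raise an event when any destination exceeds 100,000 packets/s
put("/threshold/ddos/json", {"metric": "ddos", "value": 100000})

while True:
  for e in json.loads(urllib2.urlopen(RT + "/events/json").read()):
    print "possible DDoS target reported by agent", e["agent"]
  time.sleep(10)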

DevOps


The previous examples focused on automation applications for sFlow. The diagram above shows how the sFlow-RT analytics engine delivers metrics and events to cloud-based and on-site DevOps tools, see: Cloud analytics, InfluxDB and Grafana, Cloud Analytics, Metric export to Graphite, and Exporting events using syslog. There are important scalability and cost advantages to placing the sFlow-RT analytics engine in front of metrics collection applications as shown in the diagram. For example, in large scale cloud environments the metrics for each member of a dynamic pool are not necessarily worth trending, since virtual machines are frequently added and removed. Instead, sFlow-RT can be configured to track all the members of the pool, calculate summary statistics for the pool, and log only the summaries. This pre-processing can significantly reduce storage requirements, lower costs and increase query performance.
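As a simple sketch of this summarization idea (not the method described in the linked articles), the following script periodically queries a pool-wide summary statistic from sFlow-RT and logs a single series using Graphite's plaintext protocol; the Graphite host, metric path and pool membership are placeholders:
#!/usr/bin/env python
# Minimal sketch (Python 2): log one pool-wide summary series instead
# of per-member series. The Graphite host, metric path and pool
# membership below are placeholders for illustration.
import json
import socket
import time
import urllib2

RT = "http://10.0.0.86:8008"
POOL = "10.0.0.84;10.0.0.85"               # sFlow agents in the pool
GRAPHITE = ("graphite.example.com", 2003)  # Graphite plaintext listener

while True:
  url = "%s/metric/%s/max:load_one/json" % (RT, POOL)
  value = json.loads(urllib2.urlopen(url).read())[0]["metricValue"]
  line = "pool.web.max_load_one %f %d\n" % (value, int(time.time()))
  sock = socket.create_connection(GRAPHITE)
  sock.sendall(line)
  sock.close()
  time.sleep(60)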

Final Comments

The OVN project shows great promise in making network virtualization an easily consumable component of open source cloud infrastructure. Virtualizing networks provides flexibility and security, but virtual networks can be challenging to monitor, optimize and troubleshoot. However, this article demonstrates that built-in support for sFlow telemetry within commodity cloud infrastructure provides the visibility needed to manage virtual and physical network and server resources.