Monday, September 28, 2015

Real-time analytics and control applications

sFlow-RT 2.0 released - adds application support describes a new application framework for sharing solutions built on top of the real-time analytics platform. Application examples are provided on the sFlow-RT Download page.

The flow-graph application, shown above, generates a real-time graph of communication between hosts. The application uses a simple sFlow-RT script to track associations between hosts based on their communication patterns and plots the results using the vis.js dynamic, browser-based visualization library. This example can be modified to track different types of relationships and extended to incorporate other popular data visualization libraries such as D3.js.
The dashboard-example includes representative real-time metric and top flows trend charts. The example uses the jQuery-UI library to build a simple tabbed interface. This example can be extended to build groups of custom charts.
The top-flows application supports the definition of custom flows and tracks the largest flows in a continuously updating table.

Each of the examples has a server-side component that uses sFlow-RT's script API to collect, analyze, and export measurements. An HTML5 client side user interface connects to the server and presents the data.

The sFlow-RT analytics engine is a highly scalable platform for processing sFlow measurements from physical and virtual network switches, servers, virtual machines, Linux containers, load balancers, web and application servers, etc. The analytics capability can be applied to a wide range of SDN and DevOps use cases - many of which have been described on this blog. Application support provides a simple way for vendors, researchers, and developers to distribute solutions.

Monday, September 21, 2015

Open Virtual Network (OVN)


Open Virtual Network (OVN) is an open source network virtualization solution built as part of the Open vSwitch (OVS) project. OVN provides layer 2/3 virtual networking and firewall services for connecting virtual machines and Linux containers.

OVN is built on the same architectural principles as VMware's commercial NSX and offers the same core network virtualization capability - providing a free alternative that is likely to see rapid adoption in open source orchestration systems, see Mirantis: Why the Open Virtual Network (OVN) matters to OpenStack.

This article uses OVN as an example, describing a testbed that demonstrates how the standard sFlow instrumentation built into the physical and virtual switches provides the end-to-end visibility required to manage large scale network virtualization and deliver reliable services.

Open Virtual Network


The Northbound DB provides a way to describe the logical networks that are required. The database abstracts away implementation details, which are handled by ovn-northd and the ovn-controllers, and presents an easily consumable network virtualization service to orchestration tools like OpenStack.


The purple tables on the left describe a simple logical switch LS1 that has two logical ports LP1 and LP2 with MAC addresses AA and BB respectively. The green tables on the right show the Southbound DB that is constructed by combining information from the ovn-controllers on hypervisors HV1 and HV2 to build forwarding tables in the vSwitches that realize the virtual network.
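As a rough sketch of how such a logical network could be populated, the ovn-nbctl utility writes directly to the Northbound DB. The switch and port names below mirror the diagram, the full MAC addresses are illustrative stand-ins for the abbreviated AA and BB, and the command names are those used by recent OVN releases (early releases used lswitch-add / lport-add instead):
ovn-nbctl ls-add LS1
ovn-nbctl lsp-add LS1 LP1
ovn-nbctl lsp-set-addresses LP1 "00:00:00:00:00:aa"
ovn-nbctl lsp-add LS1 LP2
ovn-nbctl lsp-set-addresses LP2 "00:00:00:00:00:bb"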

Docker, OVN, OVS, ECMP Testbed

The diagram shows the virtual testbed that was created using virtual machines running under VirtualBox:
  • Physical Network The recent release of Cumulus VX by Cumulus Networks makes it possible to build realistic networks out of virtual machines. In this case we built a two-spine, two-leaf network using VirtualBox that provides L3 ECMP connectivity using BGP as the routing protocol, a configuration that is very similar to that used by large cloud providers. The green virtual machines leaf1, leaf2, spine1 and spine2 comprise the ECMP network.
  • Servers Server 1, Server 2 and the Orchestration Server virtual machines are ubuntu-14.04.3-server installations. Server 1 and Server 2 are connected to the physical network with addresses 192.168.1.1 and 192.168.2.1 respectively that will be used to form the underlay network. Docker has been installed on Server 1 and Server 2 and each server has two containers. The containers on Server 1 have been assigned addresses 172.16.1.1/00:00:00:CC:01:01 and 172.16.1.2/00:00:00:CC:01:02 and the containers on Server 2 have been assigned addresses 172.16.2.1/00:00:00:CC:02:01, 172.16.2.2/00:00:00:CC:02:02.
  • Virtual Network Open vSwitch (OVS) was installed from sources on Server 1 and Server 2 along with ovn-controller daemons. The ovn-northd daemon was built and installed on the Orchestration Server. A single logical switch sw0 has been configured that connects server1-container2 (MAC 00:00:00:CC:01:02) to server2-container2 (MAC 00:00:00:CC:02:02).
  • Management Network The out of band management network shown in orange is a VirtualBox bridged network connecting management ports on the physical switches and servers to the Orchestration Server.
Pinging between Server 1, Container 2 (172.16.1.2) and Server 2, Container 2 (172.16.2.2) verifies that the logical network is operational.
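For example, the check from Server 1 might look like the following, using docker exec to run ping from inside the container (the container name angry_hopper is the one that appears later in the sFlow data; substitute whatever name Docker assigned in your environment):
docker exec angry_hopper ping -c 3 172.16.2.2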

Visibility

Enabling sFlow instrumentation in the testbed provides visibility into the physical and virtual network and server resources associated with the logical network.

Most physical switches support sFlow. With Cumulus Linux, installing the Host sFlow agent enables the hardware support for sFlow in the bare metal switch, providing line-rate monitoring on every 1, 10, 25, 40, 50 and 100 Gbit/s port. Since Cumulus VX isn't a hardware switch, the Host sFlow agent makes use of the Linux iptables/nflog capability to monitor traffic.
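As a minimal sketch, pointing the Host sFlow agent at the analytics server takes a few lines in /etc/hsflowd.conf (option names vary slightly between Host sFlow releases; the collector address below is the Orchestration Server used in this testbed):
sflow {
  DNSSD = off
  polling = 20
  sampling = 512
  collector {
    ip = 10.0.0.86
    udpport = 6343
  }
}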

Host sFlow agents are installed on Server 1 and Server 2. These agents stream server, virtual machine, and container metrics. In addition, the Host sFlow agent automatically enables sFlow in Open vSwitch, which in turn exports traffic flow, interface counter, resource, and tunnel encapsulation/decapsulation information.
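The Host sFlow agent takes care of the Open vSwitch settings automatically, but the equivalent manual step, using the standard ovs-vsctl sFlow configuration, would look something like this (br-int is the integration bridge that OVN creates; the sampling and polling values are illustrative):
ovs-vsctl -- --id=@sflow create sflow agent=eth0 target=\"10.0.0.86:6343\" sampling=512 polling=20 -- set bridge br-int sflow=@sflow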

The sFlow data from leaf1, leaf2, spine1, spine2, server1, and server2 is transmitted over the management network to the sFlow-RT real-time analytics software running on the Orchestration Server.

A difficult challenge in managing large scale cloud infrastructure is rapidly identifying overloaded resources (hot spots), for example:
  • Congested network link between physical switches
  • Poorly performing virtual switch
  • Overloaded server
  • Overloaded container / virtual machine
  • Oversubscribed service pool
  • Distributed Denial of Service (DDoS) attack
  • DevOps
Identifying an overloaded resource is only half the solution - the source of the load must also be found so that corrective action can be taken. This process of identifying and curing overloaded resources is critical to delivering on service level agreements. The scale and complexity of the infrastructure demands that this process be automated so that performance problems are quickly identified and immediately addressed.

The sFlow-RT analytics platform is designed with automation in mind, providing REST and embedded script APIs that facilitate metrics driven control actions. The following examples use the sFlow-RT REST API to demonstrate the type of data available using sFlow.

Congested network link between physical switches

The following query finds the busiest link in the fabric based on sFlow interface counters:
curl "http://10.0.0.86:8008/metric/10.0.0.80;10.0.0.81;10.0.0.82;100.0.0.83/max:ifinoctets,max:ifoutoctets/json"
[
 {
  "agent": "10.0.0.80",
  "dataSource": "4",
  "lastUpdate": 3374,
  "lastUpdateMax": 17190,
  "lastUpdateMin": 3374,
  "metricN": 21,
  "metricName": "max:ifinoctets",
  "metricValue": 101670.72864951608
 },
 {
  "agent": "10.0.0.80",
  "dataSource": "4",
  "lastUpdate": 3375,
  "lastUpdateMax": 17191,
  "lastUpdateMin": 3375,
  "metricN": 21,
  "metricName": "max:ifoutoctets",
  "metricValue": 101671.07968507096
 }
]
Mapping the sFlow agent and dataSource associated with the busy link to a switch name and interface name is accomplished with a second query:
curl "http://10.0.0.86:8008/metric/10.0.0.80/host_name,4.ifname/json"
[
 {
  "agent": "10.0.0.80",
  "dataSource": "2.1",
  "lastUpdate": 13011,
  "lastUpdateMax": 13011,
  "lastUpdateMin": 13011,
  "metricN": 1,
  "metricName": "host_name",
  "metricValue": "leaf1"
 },
 {
  "agent": "10.0.0.80",
  "dataSource": "4",
  "lastUpdate": 13011,
  "lastUpdateMax": 13011,
  "lastUpdateMin": 13011,
  "metricN": 1,
  "metricName": "4.ifname",
  "metricValue": "swp2"
 }
]
Now that we know interface swp2 on switch leaf1 is the busy link, the next step is to identify the traffic flowing on the link by creating a flow definition (see RESTflow):
curl -H "Content-Type:application/json" -X PUT -d '{"keys":"macsource,macdestination,ipsource,ipdestination,stack","value":"bytes"}' http://10.0.0.86:8008/flow/test1/json
Now that a flow has been defined, we can query the new metric to see traffic on the port:
curl "http://10.0.0.86:8008/metric/10.0.0.80/4.test1/json"
[{
 "agent": "10.0.0.80",
 "dataSource": "4",
 "lastUpdate": 714,
 "lastUpdateMax": 714,
 "lastUpdateMin": 714,
 "metricN": 1,
 "metricName": "4.test1",
 "metricValue": 211902.75708445764,
 "topKeys": [{
  "key": "080027AABAA5,08002745B9B4,192.168.2.1,192.168.1.1,eth.ip.udp.geneve.eth.ip.icmp",
  "lastUpdate": 712,
  "value": 211902.75708445764
 }]
}]
We can see that the traffic is a Geneve tunnel between Server 2 (192.168.2.1) and Server 1 (192.168.1.1) and that it is carrying encapsulated ICMP traffic. At this point, an additional flow can be created to find the sources of traffic in the virtual overlay network (see Down the rabbit hole).

The following flow definition takes the data from the physical switches and examines the tunnel contents:
curl -H "Content-Type:application/json" -X PUT -d '{"keys":"ipsource,ipdestination,genevevni,macsource.1,host:macsource.1:vir_host_name,macdestination.1,host:macdestination.1:vir_host_name,ipsource.1,ipdestination.1,stack","value":"bytes"}' http://10.0.0.86:8008/flow/test2/json
Querying the new metric to find out about the flow:
curl "http://10.0.0.86:8008/metric/10.0.0.80/4.test2/json"
[{
 "agent": "10.0.0.80",
 "dataSource": "4",
 "lastUpdate": 9442,
 "lastUpdateMax": 9442,
 "lastUpdateMin": 9442,
 "metricN": 1,
 "metricName": "4.test2",
 "metricValue": 3423.596229984865,
 "topKeys": [{
  "key": "192.168.2.1,192.168.1.1,1,000000CC0202,/lonely_albattani,000000CC0102,/angry_hopper,172.16.2.2,172.16.1.2,eth.ip.udp.geneve.eth.ip.icmp",
  "lastUpdate": 9442,
  "value": 3423.596229984865
 }]
}]
Now it is clear that the encapsulated flow starts at Server 2, Container 2 and ends at Server 1, Container 2.
Querying the OVN Northbound database for the MAC addresses 000000CC0202 and 000000CC0102 links this traffic to the two ports on logical switch sw0.
The flow also merges in information about the identity of the containers, obtained from the sFlow exported by the Host sFlow agents on the servers. For example, the host:macsource.1:vir_host_name function in the flow definition looks up the vir_host_name associated with the inner source MAC address, in this case identifying the Docker container named /lonely_albattani as the source of the traffic.
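A rough sketch of that lookup, run from the Orchestration Server with ovn-nbctl show, which lists each logical switch along with its ports and their addresses (the port names in the output depend on how the ports were created):
ovn-nbctl show
# switch <uuid> (sw0)
#     port <server1-container2 port>
#         addresses: ["00:00:00:CC:01:02"]
#     port <server2-container2 port>
#         addresses: ["00:00:00:CC:02:02"]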

At this point we have enough information to start putting controls in place. For example, knowing the container name and hosting server would allow the container to be shut down, or the container workload could be moved - a relatively simple task since OVN will automatically update the settings on the destination server to associate the container with its logical network.
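As a hedged illustration of such a control, the offending container could be stopped from the Orchestration Server over the management network (the server2 hostname and SSH access are assumptions of this sketch; lonely_albattani is the container name reported in the sFlow data above):
ssh server2 docker stop lonely_albattani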

While this example showed manual steps to demonstrate sFlow-RT APIs, in practice the entire process is automated. For example, Leaf and spine traffic engineering using segment routing and SDN demonstrates how congestion on the physical links can be mitigated in ECMP fabrics.

Poor virtual switch performance

Open vSwitch performance monitoring describes key datapath performance metrics that Open vSwitch includes in its sFlow export. For example, the following query identifies the virtual switch with the lowest cache hit rate, the switch handling the largest number of cache misses, and the switch handling the largest number of active flows:
curl "http://10.0.0.86:8008/metric/ALL/min:ovs_dp_hitrate,max:ovs_dp_misses,max:ovs_dp_flows/json"
[
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1000",
  "lastUpdate": 19782,
  "lastUpdateMax": 19782,
  "lastUpdateMin": 19782,
  "metricN": 2,
  "metricName": "min:ovs_dp_hitrate",
  "metricValue": 99.91260923845194
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1000",
  "lastUpdate": 19782,
  "lastUpdateMax": 19782,
  "lastUpdateMin": 19782,
  "metricN": 2,
  "metricName": "max:ovs_dp_misses",
  "metricValue": 0.3516881028938907
 },
 {
  "agent": "10.0.0.85",
  "dataSource": "2.1000",
  "lastUpdate": 8090,
  "lastUpdateMax": 19782,
  "lastUpdateMin": 8090,
  "metricN": 2,
  "metricName": "max:ovs_dp_flows",
  "metricValue": 11
 }
]
In this case the vSwitch on Server 1 (10.0.0.84) is handling the largest number of packets in its slow path and has the lowest cache hit rate. The vSwitch on Server 2 (10.0.0.85) has the largest number of active flows in its datapath.

The Open vSwitch datapath integrates sFlow support. The test1 flow definition created in the previous example provides general L2/L3 information, so we can make a query to see the active flows in the datapath on 10.0.0.84:
curl "http://10.0.0.86:8008/activeflows/10.0.0.84/test1/json"
[
 {
  "agent": "10.0.0.84",
  "dataSource": "16",
  "flowN": 1,
  "key": "000000CC0102,000000CC0202,172.16.1.2,172.16.2.2,eth.ip.icmp",
  "value": 97002.07726081279
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "0",
  "flowN": 1,
  "key": "000000CC0202,000000CC0102,172.16.2.2,172.16.1.2,eth.ip.icmp",
  "value": 60884.34095101907
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "3",
  "flowN": 1,
  "key": "080027946A4E,0800271AF7F0,192.168.2.1,192.168.1.1,eth.ip.udp.geneve.eth.ip.icmp",
  "value": 47117.093823014926
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "17",
  "flowN": 1,
  "key": "0800271AF7F0,080027946A4E,192.168.1.1,192.168.2.1,eth.ip.udp.geneve.eth.ip.icmp",
  "value": 37191.709371373545
 }
]
The previous example showed how the flow information can be associated with Docker containers, logical networks, and physical networks so that control actions can be planned and executed to reduce traffic on an overloaded virtual switch.

Overloaded server

The following query finds the server with the highest load average and the server with the highest CPU utilization:
curl "http://10.0.0.86:8008/metric/ALL/max:load_one,max:cpu_utilization/json"
[
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1",
  "lastUpdate": 10661,
  "lastUpdateMax": 13769,
  "lastUpdateMin": 10661,
  "metricN": 7,
  "metricName": "max:load_one",
  "metricValue": 0.82
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1",
  "lastUpdate": 10661,
  "lastUpdateMax": 13769,
  "lastUpdateMin": 10661,
  "metricN": 7,
  "metricName": "max:cpu_utilization",
  "metricValue": 69.68566862013851
 }
]
In this case Server 1 (10.0.0.84) has the highest CPU load.
Interestingly, the switches in this case are running Cumulus Linux, which for all intents and purposes makes them servers, since Cumulus Linux is based on Debian and can run unmodified Debian packages, including Host sFlow (see Cumulus Networks, sFlow and data center automation). If the busiest server happens to be one of the switches, it will show up as a result in this query.
Since many workloads in a cloud environment tend to be network services, following up by examining network traffic, as was demonstrated in the previous two examples, is often the next step to identifying the source of the load.

In this case the server is also running Linux containers and the next example shows how to identify busy containers / virtual machines.

Overloaded container / virtual machine

The following query finds the container / virtual machine with the largest CPU utilization:
curl "http://10.0.0.86:8008/metric/ALL/max:vir_cpu_utilization/json"
[{
 "agent": "10.0.0.84",
 "dataSource": "3.100002",
 "lastUpdate": 13949,
 "lastUpdateMax": 13949,
 "lastUpdateMin": 13949,
 "metricN": 2,
 "metricName": "max:vir_cpu_utilization",
 "metricValue": 62.7706705162029
}]
The following query extracts additional information for the agent and dataSource:
curl "http://10.0.0.86:8008/metric/10.0.0.84/host_name,node_domains,cpu_utilization,3.100002.vir_host_name/json"
[
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1",
  "lastUpdate": 3377,
  "lastUpdateMax": 3377,
  "lastUpdateMin": 3377,
  "metricN": 1,
  "metricName": "host_name",
  "metricValue": "server1"
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1",
  "lastUpdate": 3377,
  "lastUpdateMax": 3377,
  "lastUpdateMin": 3377,
  "metricN": 1,
  "metricName": "node_domains",
  "metricValue": 2
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "2.1",
  "lastUpdate": 3377,
  "lastUpdateMax": 3377,
  "lastUpdateMin": 3377,
  "metricN": 1,
  "metricName": "cpu_utilization",
  "metricValue": 69.4535519125683
 },
 {
  "agent": "10.0.0.84",
  "dataSource": "3.100002",
  "lastUpdate": 19429,
  "lastUpdateMax": 19429,
  "lastUpdateMin": 19429,
  "metricN": 1,
  "metricName": "3.100002.vir_host_name",
  "metricValue": "/angry_hopper"
 }
]
The results identify the container /angry_hopper running on server1, which hosts two containers and itself has a CPU utilization of 69%.

Oversubscribed service pool

Cluster performance metrics describes how sFlow metrics can be used to characterize the performance of a pool of servers.
Dynamically Scaling Netflix in the Cloud
The presentation Dynamically Scaling Netflix in the Cloud shows how Netflix adjusts the number of virtual machines in autoscaling groups based on measured load. Netflix runs on Amazon infrastructure. However, the combined network, server, virtual machine, and container metrics available through sFlow can be used to drive autoscaling in cloud orchestration systems like OpenStack, Apache Mesos, etc. Joint VM Placement and Routing for Data Center Traffic Engineering shows that jointly optimizing network and server resources can yield significant benefits. Finally, the Linux containers used in this testbed can be started and stopped in under a second, making it possible to rapidly expand and contract capacity in response to changing demand - provided that you have a fast, lightweight measurement system like sFlow that can provide the needed metrics.
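For example, a single sFlow-RT query can summarize load across the pool of servers to drive a scaling decision - a sketch that assumes the avg: aggregation behaves like the max:/min: prefixes used earlier in this article:
curl "http://10.0.0.86:8008/metric/10.0.0.84;10.0.0.85/avg:load_one,max:load_one/json"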

Distributed Denial of Service (DDoS) attack

Multi-tenant performance isolation describes a large scale outage at a cloud service provider caused by a DDoS attack. The real-time traffic information available through sFlow provides the information needed to identify attacks and target mitigation actions in order to maintain service levels. DDoS mitigation with Cumulus Linux describes how hardware filtering capabilities of physical switches can be deployed to automatically filter out large scale attacks that would otherwise overload the servers.
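As a simple sketch of how detection could be automated with the sFlow-RT APIs used earlier in this article, define a flow that tracks packets per destination address and attach a threshold so that an event is generated when any destination exceeds the configured rate (the field names follow the sFlow-RT flow and threshold REST APIs, but the values are illustrative and would need tuning for a real deployment):
curl -H "Content-Type:application/json" -X PUT -d '{"keys":"ipdestination","value":"frames"}' http://10.0.0.86:8008/flow/ddos/json
curl -H "Content-Type:application/json" -X PUT -d '{"metric":"ddos","value":100000,"byFlow":true}' http://10.0.0.86:8008/threshold/ddos/json
curl "http://10.0.0.86:8008/events/json?maxEvents=10&timeout=60"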

DevOps


The previous examples focused on automation applications for sFlow. The diagram above shows how the sFlow-RT analytics engine is used to deliver metrics and events to cloud-based and on-site DevOps tools, see: Cloud analytics, InfluxDB and Grafana, Cloud Analytics, Metric export to Graphite, and Exporting events using syslog. There are important scalability and cost advantages to placing the sFlow-RT analytics engine in front of metrics collection applications as shown in the diagram. For example, in large scale cloud environments the metrics for each member of a dynamic pool are not necessarily worth trending since virtual machines are frequently added and removed. Instead, sFlow-RT can be configured to track all the members of the pool, calculate summary statistics for the pool, and log only the summaries. This pre-processing can significantly reduce storage requirements, reduce costs, and increase query performance.

Final Comments

The OVN project shows great promise in making network virtualization an easily consumable component in open source cloud infrastructures. Virtualizing networks provides flexibility and security, but can be challenging to monitor, optimize and troubleshoot. However, this article demonstrates that built-in support for sFlow telemetry within commodity cloud infrastructure provides visibility to manage virtual and physical network and server resources.

Tuesday, September 8, 2015

Cisco adds sFlow support to Nexus 9K series

Cisco adds support for the sFlow standard in the Cisco Nexus 9000 Series 7.0(3)I2(1) NX-OS Release. Combined with the Nexus 3000/3100 series, which have included sFlow support since NX-OS 5.0(3)U4(1), Cisco now offers cost-effective, built-in visibility across the full spectrum of data center switches.
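A minimal configuration sketch for streaming sFlow from a Nexus 9000 to a collector is shown below (the collector address, agent address, and interface are hypothetical; consult the NX-OS configuration guide for the full set of sFlow commands):
switch(config)# feature sflow
switch(config)# sflow collector-ip 10.0.0.86 vrf management
switch(config)# sflow agent-ip 10.0.0.80
switch(config)# sflow sampling-rate 4096
switch(config)# sflow counter-poll-interval 20
switch(config)# sflow data-source interface ethernet 1/1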
Cisco network engineers might not be familiar with the multi-vendor sFlow technology since it is a relatively new addition to Cisco products. The article, Cisco adds sFlow support, describes some of the key features of sFlow and contrasts them to Cisco NetFlow.
Nexus 9000 switches can be operated in NX-OS mode or ACI mode:
  • NX-OS mode includes a number of open features such as sFlow, Python, NX-API, and Bash that integrate with an open ecosystem of orchestration tools such as Puppet, Chef, CFEngine, and Ansible. "By embracing the open culture of development and operations (DevOps) and creating a more Linux-like environment in the Cisco Nexus 9000 Series, Cisco enables IT departments with strong Linux skill sets to meet business needs efficiently," Cisco Nexus 9000 Series Switches: Integrate Programmability into Your Data Center. Open APIs are becoming increasingly popular, preventing vendor lock-in, and allowing organizations to benefit from the rapidly increasing range of open hardware and software solutions to reduce costs and increase agility.
  • ACI mode is a closed solution that relies on proprietary hardware and places the switches under the control of Cisco's APIC (Application Policy Infrastructure Controller) - eliminating many of the features, including sFlow, available in NX-OS mode. The ACI solution is more expensive and the closed platform locks customers into Cisco hardware and solutions.
SDN fabric controllers compares tightly coupled (ACI) and loosely federated (NX-OS) approaches to virtualizing data center networking and there are a number of articles on this blog exploring use cases for real-time sFlow analytics in the data center.