Wednesday, November 18, 2015

Network virtualization visibility demo

New OVS instrumentation features aimed at real-time monitoring of virtual networks, Open vSwitch 2015 Fall Conference, included a demonstration of real-time visibility into the logical network overlays created by network virtualization, virtual switches, and the leaf and spine underlay carrying the tunneled traffic between hosts.

The diagram above shows the demonstration testbed. It consists of a leaf and spine network connecting two hosts, each of which is running a pair of Docker containers connected to Open vSwitch (OVS). The vSwitches are controlled by Open Virtual Network (OVN), which has been configured to create two logical switches, the first connecting the left most containers on each host and the second connecting the right most containers. The testbed is described in more detail in Open Virtual Network (OVN) and is built from free components and can easily be replicated.

The dashboard in the video illustrates the end to end visibility that is possible by combining standard sFlow instrumentation in the physical switches with sFlow instrumentation in Open vSwitch and Host sFlow agents on the servers.

The diagram on the left of the dashboard shows a logical map of the elements in the testbed. The top panel shows the two logical switches created in OVN, sw1 connecting the left containers and sw0 connecting the right containers. The dotted lines represent the logical switches ports and their width shows the current traffic rate flowing over the logical link.

The solid lines below show the path that the virtual network traffic actually takes. From a container on Server1 to the virtual switch (OVS), where it is encapsulated in a Geneve tunnel and sent via leaf1, spine1, and leaf2 to the OVS instance on Server2, which decapsulates the traffic and delivers it to the other container on the logical switch.

Both logical networks share the same physical network and this is shown be link color. Traffic associated with sw1 is shown as red, traffic associated with sw0 is shown as blue, and a mixture of sw1 and sw0 traffic is shown as magenta.

The physical network is a shared resource for the virtual networks that make use of it. Understanding how this resource is being utilized is essential to ensure that virtual networks do not interfere with each other, for example, one tenant backing up data over their virtual network may cause unacceptable response time problems for another tenant.

The strip charts to the right of the diagram show representative examples of the data that is available and demonstrate comprehensive visibility into, and across, layers in the virtualization stack. Going from top to bottom:
  • Container CPU Utilization The trend chart shows the per container CPU load for the containers. This data comes from the Host sFlow agents.
  • Container Traffic This trend chart merges sFlow data from the Host sFlow and Open vSwitch to show traffic flowing between containers.
  • OVN Virtual Switch Traffic This trend chart merges data from the OVN Northbound interface (specifically, logical port MAC addresses and logical switch names) and Open vSwitch to show traffic flowing through each logical switch.
  • Open vSwitch Performance This trend chart shows key performance indicators based on metrics exported by the Open vSwitches, see Open vSwitch performance monitoring.
  • Leaf/Spine Traffic This chart combines data from all the switches in the leaf / spine network to show traffic flows. The chart demonstrates that sFlow from the physical network devices provides visibility into the outer (tunnel) addresses and inner (tenant/virtual network) addresses, see Tunnels.
The visibility shown in the diagram and charts is only possible because all the elements of the infrastructure are instrumented using sFlow. No single element or layer has a complete picture, but when you combine information from all the elements a full picture emerges, see Visibility and the software defined data center

The demonstration is available on GitHub. The following steps will download the demo and provide data that can be used to explore the sFlow-RT APIs described in Open Virtual Network (OVN) that were used to construct this dashboard.
tar -xvzf sflow-rt.tar.gz
cd sflow-rt
./ pphaal ovs-2015
Edit the file to playback the included captured sFlow data:

HOME=`dirname $0`
cd $HOME

JVM_OPTS="-Xincgc -Xmx200m"
RT_OPTS="-Dsflow.port=6343 -Dhttp.port=8008 -Dsflow.file=app/ovs-2015/demo.pcap"

exec java ${JVM_OPTS} ${RT_OPTS} ${SCRIPTS} -jar ${JAR}
Start sFlow-RT:
[user@server sflow-rt]$ ./ 
2015-11-18T13:22:14-0800 INFO: Reading PCAP file, app/ovs-2015/demo.pcap
2015-11-18T13:22:15-0800 INFO: Starting the Jetty [HTTP/1.1] server on port 8008
2015-11-18T13:22:15-0800 INFO: Starting application
2015-11-18T13:22:15-0800 INFO: Listening, http://localhost:8008
2015-11-18T13:22:15-0800 INFO: init.js started
2015-11-18T13:22:15-0800 INFO: app/ovs-2015/scripts/status.js started
2015-11-18T13:22:16-0800 INFO: init.js stopped
Finally, access the web interface http://server:8008/app/ovs-2015/html/ and you should see the screen shown in the video.

The sFlow monitoring technology scales to large production networks. The instrumentation is built into physical switch hardware and is available in 1G/10G/25G/40G/50G/100G data center switches from most vendors (see The sFlow instrumentation in Open vSwitch is built into the Linux kernel and is an extremely efficient method of monitoring large numbers of virtual machines and/or containers.

The demonstration dashboard illustrates the type of operational visibility that can be delivered using sFlow. However, for large scale deployments, sFlow data can be incorporated into existing DevOps tool sets to augment data that is already being collected.
The diagram above shows how the sFlow-RT analytics engine is used to deliver metrics and events to cloud based and on-site DevOps tools, see: Cloud analytics,  InfluxDB and Grafana, Metric export to Graphite, and Exporting events using syslog. There are important scaleability and cost advantages to placing the sFlow-RT analytics engine in front of metrics collection applications as shown in the diagram. For example, in large scale cloud environments the metrics for each member of a dynamic pool are not necessarily worth trending since virtual machines are frequently added and removed. Instead, sFlow-RT can be configured to track all the members of the pool, calculates summary statistics for the pool, and log summary statistics. This pre-processing can significantly reduce storage requirements, reduce costs and increase query performance.

No comments:

Post a Comment