Thursday, April 28, 2016

Cisco SF250, SG250, SF350, SG350, SG350XG, and SG550XG series switches

Software version 2.1.0 adds sFlow support to Cisco 250 Series Smart Switches, 350 Series Smart Switches and 550X Series Stackable Managed Switches.
Cisco network engineers might not be familiar with the multi-vendor sFlow technology since it is a relatively new addition to Cisco products. The article, Cisco adds sFlow support, describes some of the key features of sFlow and contrasts them to Cisco NetFlow.
Configuring sFlow on the switches is straightforward. For example, the following commands configure a switch to sample packets at 1-in-1024, poll counters every 30 seconds, and send sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343):
sflow receiver 1 10.0.0.50
For each interface:
sflow flow-sampling 1024 1
sflow counter-sampling 30 1
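For example, applying the per-interface settings to a port might look like the following (the interface name is illustrative and will vary by switch model and port numbering):
interface GigabitEthernet1
sflow flow-sampling 1024 1
sflow counter-sampling 30 1
exit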
A previous posting discussed the selection of sampling rates. Additional information can be found on the Cisco web site.

Trying out sFlow offers suggestions for getting started with sFlow monitoring and reporting. The article recommends the sFlowTrend analyzer as a way to get started since it is a free, purpose-built sFlow analyzer that delivers the full capabilities of the sFlow instrumentation in the Cisco switches.

Tuesday, April 26, 2016

Multi-tenant sFlow

This article discusses how real-time sFlow telemetry can be shared with network tenants to provide each tenant with a real-time view of their slice of the shared resources. The diagram shows a simple network with two tenants, Tenant A and Tenant B, each assigned their own subnet, 10.0.0.0/24 and 10.0.1.0/24 respectively.

One option would be to simply replicate the sFlow datagrams and send copies to both tenants. Forwarding using sflowtool describes how sflowtool can be used to replicate and forward sFlow. Alternatively, sFlow-RT can be configured to forward sFlow using its REST API:
curl -H "Content-Type:application/json" \
-X PUT --data '{"address":"10.0.0.1","port":6343}' \
http://127.0.0.1:8008/forwarding/TenantA/json
However, there are serious problems with this approach:
  1. Private information about Tenant B's traffic is leaked to Tenant A.
  2. Information from internal links within the network (i.e. links between s1, s2, s3 and s4) is leaked to Tenant A.
  3. Duplicate data from each network hop is likely to cause Tenant A to over-estimate their traffic.
The sFlow-RT multi-tenant forwarding function addresses these challenges. The first task is to provide sFlow-RT with an accurate network topology specifying the internal links connecting the switches, e.g.
curl -H "Content-Type:application/json" -X PUT --data '{\
 "L1":{"node1":"s1", "port1":"s1-eth1", "node2":"s3", "port2":"s3-eth1"},\
 "L2":{"node1":"s1", "port1":"s1-eth2", "node2":"s4", "port2":"s4-eth1"},\
 "L3":{"node1":"s2", "port1":"s2-eth1", "node2":"s3", "port2":"s3-eth2"},\
 "L4":{"node1":"s2", "port1":"s2-eth2", "node2":"s4", "port2":"s4-eth2"}\
}' http://127.0.0.1:8008/topology/json
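To confirm that the topology was accepted, the same endpoint can be queried (a simple sanity check, assuming the default REST port):
curl http://127.0.0.1:8008/topology/json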
The topology allows sFlow-RT to model the network as if it were one switch and provide this abstracted view of the sFlow data to tenants.

The following REST API call configures multi-tenant forwarding for Tenant A:
curl -H "Content-Type:application/json" -X PUT --data \
'{"collectorAddress":"10.0.0.1","collectorPort":6343, \
"filters":{"cidr":["10.0.0.0/24"]}}' \
http://127.0.0.1:8008/tenant/TenantA/json
In this example, sFlow-RT filters the sFlow sent to Tenant A to only include traffic to or from hosts within Tenant A's allocated address space, 10.0.0.0/24. In addition, only edge ports are considered - sFlow from inter-switch links is suppressed. When performing multi-tenant forwarding, sFlow-RT acts as a proxy, reconstructing a valid sFlow telemetry stream based on the filtered records and re-calculating sequence numbers, sampling information, etc.
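A second call, using Tenant B's subnet and collector address (10.0.1.1 is assumed here for illustration), would give Tenant B the equivalent filtered view:
curl -H "Content-Type:application/json" -X PUT --data \
'{"collectorAddress":"10.0.1.1","collectorPort":6343,
"filters":{"cidr":["10.0.1.0/24"]}}' \
http://127.0.0.1:8008/tenant/TenantB/json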

In addition to splitting sFlow telemetry by IP address, sFlow-RT can split telemetry based on switch port and MAC addresses. Splitting on MAC addresses is a simple way to share sFlow telemetry between the members of an Internet Exchange, see Internet Exchange (IX) Metrics.

Tenants can use whatever software they want to process the standard sFlow feed. However, standing up sFlow-RT instances for each tenant is straightforward and provides real-time network analytics through an easily consumable RESTflow API, see Network and system analytics as a Docker microservice.

Finally, network analytics is a valuable service to offer tenants; for commercial service providers it can be an additional source of revenue or a way to differentiate the service from competitors.

Monday, April 25, 2016

Network visibility with Docker

Microservices describes the critical role that network visibility provides as a common point of reference for monitoring, managing and securing the interactions between the numerous and diverse distributed service instances in a microservices deployment.

Industry standard sFlow is well placed to give network visibility into the Docker infrastructure used to support microservices. The sFlow standard is widely supported by data center switch vendors (Cisco, Arista, Juniper, Dell, HPE, Brocade, Cumulus, etc.), providing a cost effective and scaleable method of monitoring the physical network infrastructure. In addition, Linux bridge, macvlan, ipvlan, adapters describes how sFlow is also an efficient means of leveraging instrumentation built into the Linux kernel to extend visibility into Docker host networking.

The following commands build the Host sFlow binary package from sources on an Ubuntu 14.04 system:
sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install libpcap-dev
sudo apt-get install wget
wget https://github.com/sflow/host-sflow/archive/v1.29.1.tar.gz
tar -xvzf v1.29.1.tar.gz
cd host-sflow-1.29.1
make DOCKER=yes PCAP=yes deb
The resulting hsflowd_1.29.1-1_amd64.deb package can be copied and installed on all the hosts in the Docker cluster using configuration management tools such as Puppet, Chef, Ansible, etc.

This article will explore the alternative of deploying sFlow agents as Docker containers.

Create a directory for the project, copy in the package, create the agent configuration file, and edit the Dockerfile:
mkdir hsflowd
cp hsflowd_1.29.1-1_amd64.deb hsflowd
cd hsflowd
printf "sflow {\n dnssd=on\n pcap { dev = docker0 }\n}" > hsflowd.conf
vi Dockerfile
Add the following contents to Dockerfile:
FROM   ubuntu:trusty
RUN    apt-get update && apt-get install -y libpcap0.8 docker.io
ADD    hsflowd_1.29.1-1_amd64.deb /tmp
RUN    dpkg -i /tmp/hsflowd_1.29.1-1_amd64.deb
ADD    hsflowd.conf /etc/hsflowd.conf
CMD    /etc/init.d/hsflowd start && tail -f /dev/null
Build the project:
docker build -t hsflowd .
Run the service:
docker run --pid=host --uts=host --net=host \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /sys/fs/cgroup/:/sys/fs/cgroup/:ro -d hsflowd
In this example, DNS Service Discovery (DNS-SD) is being used as the configuration method for the sFlow agents. Adding the following entries to the DNS zone file allows the agents to automatically discover the designated sFlow analyzers, analytics1 and analytics2, and their configuration parameters:
_sflow._udp   30  SRV     0 0 6343  analytics1
_sflow._udp   30  SRV     0 0 6343  analytics2
_sflow._udp   30  TXT     (
"txtvers=1"
"sampling=400"
"polling=20"
)
As soon as the container starts, the sFlow agent will make a DNS request to find the sFlow analyzers, which can themselves be packaged as Docker containers. Network and system analytics as a Docker microservice describes how sFlow analytics can be packaged as a RESTful service and integrated with a wide variety of on-site and cloud, orchestration, DevOps and Software Defined Networking (SDN) tools.
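A quick way to confirm that the agents are sending telemetry is to run sflowtool on one of the designated analyzer hosts (it binds to the standard sFlow port, UDP 6343, so stop any other collector first) and watch for decoded records:
sflowtool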

Any change to the entries in the zone file will be automatically picked up by the sFlow agents.
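The DNS-SD records can be checked from any of the Docker hosts using dig (example.com below stands in for the actual zone name):
dig +short _sflow._udp.example.com SRV
dig +short _sflow._udp.example.com TXT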

The agent has been configured for Docker bridged networking, monitoring traffic through bridge docker0. For macvlan or ipvlan networking, change the pcap setting from docker0 to eth0.
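For example, the hsflowd.conf created earlier would become the following, with eth0 standing in for whichever host interface carries the container traffic:
sflow {
 dnssd=on
 pcap { dev = eth0 }
}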

One of the major advantages of packaging the sFlow agents and analytics components as Docker containers is that large scale deployments can be automated using Docker Compose with Swarm. Deploying sFlow agents on every node in the Swarm cluster delivers real-time, cluster-wide visibility into the resource consumption and communication patterns of all the microservices running on the cluster.
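As a rough sketch of what such a deployment might look like (the Compose file below is illustrative, not taken from the article, and assumes the hsflowd image built above is available on every node):
version: '2'
services:
  hsflowd:
    image: hsflowd
    # mirror the docker run flags used above; the --uts=host flag may need to be handled separately
    network_mode: host
    pid: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /sys/fs/cgroup/:/sys/fs/cgroup/:ro
    restart: always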

Tuesday, April 12, 2016

Lasers!

Cool! You mean that I actually have frickin' switches with frickin' laser beams attached to their frickin' ports?

Dr. Evil is right, lasers are cool! The draft sFlow Optical Interface Structures specification exports metrics gathered from instrumentation built into Small Form-factor Pluggable (SFP) and Quad Small Form-factor Pluggable (QSFP) optics modules. This article provides some background on optical modules and discusses the value of including optical metrics in the sFlow telemetry stream exported from switches and hosts.
Pluggable optical modules are intelligent devices that do more than simply convert between optical and electrical signals. The functional diagram below shows the elements within a pluggable optical module.
The transmit and receive functions are shown in the upper half of the diagram. Incoming optical signals are received on a fiber and amplified as they are converted to electrical signals that can be handled by the switch. Transmit data drives the modulation of a laser diode which transmits the optical signal down a fiber.

The bottom half of the diagram shows the management components. Power, voltage and temperature sensors are monitored and the results are written into registers in an EEPROM that are accessible via a management interface.
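On a Linux host, the same class of diagnostic data can often be read directly from the module EEPROM, for example (the interface name is illustrative, and not all drivers and modules expose the optical diagnostics):
ethtool -m eth0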

The proposed sFlow extension standardizes the module sensor data so that it can be exported along with switch port interface counters. A standard structure ensures multi-vendor interoperability, and including the optical metrics as part of the sFlow export provides a scaleable method of monitoring all the optical modules in the network.
While the measurements from a single module are useful, the value increases when measurements from all modules can be combined at the sFlow collector. For example, matching pairs of modules allows problems with the transmitter, receiver and the cable connecting them to be isolated.
The sFlow counter push mechanism is an extremely efficient method of monitoring at scale and can collect metrics from all the optical modules in the data center. Combining data from all the modules makes it easier to find outliers.
Finally, incorporating optics monitoring as part of the comprehensive sFlow telemetry stream allows optical metrics to be correlated with switch port, traffic flow and application performance metrics. For example, an increase in application response time can be traced along the paths the traffic takes across the network, correlating packet discard rates on the ports with signal strength from the optical sensors to find a marginal fiber link.

Monday, April 11, 2016

Minimizing cost of visibility

Visibility allows orchestration systems (OpenDaylight, ONOS, OpenStack Heat, Kubernetes, Docker Swarm, Apache Mesos, etc.) to adapt to changing demand by targeting resources where they are needed to increase efficiency, improve performance, and reduce costs. However, the overhead of monitoring must be low in order to realize the benefits.
An analogous observation that readers may be familiar with is the importance of minimizing costs when investing in order to maximize returns - see Vanguard Principle 3: Minimize cost
Suppose that a 100-server pool is being monitored and that visibility will allow the orchestration system to realize a 10% improvement through better workload scheduling and placement - increasing the pool's capacity by 10% without the need to add an additional 10 servers, and saving the associated CAPEX/OPEX costs.

The chart shows the impact that measurement overhead has in realizing the potential gains in this example. If the measurement overhead is 0%, then the 10% performance gain is fully realized. However, even a relatively modest 2% measurement overhead reduces the potential improvement to just under 8% (over a 20% drop in the potential gains). A 9% measurement overhead wipes out the potential efficiency gain and measurement overheads greater than 9% result in a net loss of capacity.
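These figures follow from a simple relationship (stated here for concreteness rather than taken from the chart): if the potential gain is g and the measurement overhead is o, the net gain is (1 + g)(1 - o) - 1. With g = 10% and o = 2% this gives 1.10 x 0.98 - 1 = 7.8%, just under 8%; with o = 9% it gives 1.10 x 0.91 - 1 = 0.1%, effectively erasing the gain.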

More specifically, Optimizing software defined data center and Microservices discuss the critical role of network visibility in improving cloud computing performance. Consider the task of monitoring network activity in a high traffic Docker cluster running on the 100-server pool. High performance network monitoring solutions often require at least one dedicated CPU core (for example, when Intel DPDK, or an equivalent technology, is used to accelerate network instrumentation). If each server has 24 cores, dedicating one core to monitoring is a 4.2% measurement overhead and reduces the potential efficiency gain from 10% to about 5% (a drop of nearly 50%). On the other hand, industry standard sFlow uses instrumentation built into hardware and software data paths. Docker network visibility demonstration shows how Linux kernel instrumentation can be used to monitor traffic using less than 1% of a single CPU core, an insignificant 0.04% measurement overhead that allows the orchestration system to achieve the full 10% efficiency gain.

To conclude, visibility is essential to the operation of cloud infrastructure and can drive greater efficiency. However, the net gains in efficiency are significantly reduced by any overhead imposed by monitoring. Industry standard sFlow measurement technology is widely supported, minimizes overhead, and ensures that efficiency gains are fully realizable.

Saturday, April 9, 2016

Docker network visibility demonstration

The 2 minute live demonstration shows how the open source Host sFlow agent can be used to efficiently monitor Docker networking in production environments. The demonstration shows real-time tracking of 30Gbit/s traffic flows using less than 1% of a single processor core.

Saturday, April 2, 2016

Internet Exchange (IX) Metrics

IX Metrics has been released on GitHub, https://github.com/sflow-rt/ix-metrics. The application provides real-time monitoring of traffic between members in an Internet Exchange (IX).
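sFlow-RT applications are installed by placing them in the analyzer's app directory and (re)starting the software; for example (assuming sFlow-RT has been downloaded and unpacked in the current directory and git is available):
cd sflow-rt/app
git clone https://github.com/sflow-rt/ix-metrics
cd ..
./start.sh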

Close monitoring of exchange traffic is critical to operations:
  1. Ensure that there is sufficient capacity to accommodate new and existing members.
  2. Ensure that all traffic sources are accounted for and that there are no unauthorized connections.
  3. Ensure that only allowed traffic types are present.
  4. Ensure that non-unicast traffic is strictly controlled.
  5. Ensure that packet size policies are controlled to avoid loss due to MTU mismatches.
IX Metrics imports information about exchange members using the IX Member List JSON Schema. The member information is used to create traffic analytics, and traffic is checked against this information to identify errors - for example, a member using a MAC address that isn't listed.

The measurements from the exchange infrastructure are useful to members since they allow each member to easily see how much traffic it is exchanging with other members through its peering relationships. This information is easy to collect using the exchange infrastructure, but much harder for members to determine independently.

The sFlow standard has long been a popular method of monitoring exchanges for a number of reasons:
  1. sFlow instrumentation is built into high capacity data center switches used in exchanges.
  2. Exchanges handle large amounts of traffic - many terabits per second in large exchanges - and sFlow instrumentation has the scaleability to monitor the entire infrastructure.
  3. Exchanges typically operate at layer 2 (i.e. traffic is switched, not routed, between members) and sFlow provides detailed visibility into layer 2 traffic, including: packet sizes, Ethernet protocols, MAC addresses, VLANs, etc.
Leveraging the real-time analytics capabilities of sFlow-RT allows the IX Metrics application to provide up-to-the-second visibility into exchange traffic. The IX Metrics application can also export metrics to InfluxDB to support operations and member facing dashboards (built using Grafana, for example). In addition, export of notifications using syslog supports integration with Security Information and Event Management (SIEM) tools like Logstash.

IX Metrics is open source software that can easily be modified to integrate with other tools and additional applications can be installed on the sFlow-RT platform alongside the IX Metrics application, for example: