This article describes how to enable industry standard sFlow telemetry using the open source Host sFlow agent. The Host sFlow agent uses Control Plane Services (CPS) to configure sFlow instrumentation in the hardware and gather metrics. CPS in turn uses the Open Compute Project (OCP) Switch Abstraction Interface (SAI) as a vendor independent method of configuring the hardware. Hardware support for sFlow is a standard feature supported by Network Processing Unit (NPU) vendors (Barefoot, Broadcom, Cavium, Innovium, Intel, Marvell, Mellanox, etc.) and vendor neutral sFlow configuration is part of the SAI.
Installing and configuring Host sFlow agent
Installing the software is simple. Log in to the switch and type the following commands:

    wget --no-check-certificate https://github.com/sflow/host-sflow/releases/download/v2.0.17-1/hsflowd-opx_2.0.17-1_amd64.deb
    sudo dpkg -i hsflowd-opx_2.0.17-1_amd64.deb

The sFlow agent requires very little configuration, automatically monitoring all switch ports using the following default settings:
Link Speed | Sampling Rate | Polling Interval |
---|---|---|
1 Gbit/s | 1-in-1,000 | 30 seconds |
10 Gbit/s | 1-in-10,000 | 30 seconds |
25 Gbit/s | 1-in-25,000 | 30 seconds |
40 Gbit/s | 1-in-40,000 | 30 seconds |
50 Gbit/s | 1-in-50,000 | 30 seconds |
100 Gbit/s | 1-in-100,000 | 30 seconds |
Note: The default settings ensure that large flows (defined as consuming 10% of link bandwidth) are detected within approximately 1 second; see Large flow detection.
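The relationship between the default sampling rates and the detection time in the note can be checked with some rough arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, the 1-in-N rule is read off the table above (N equals the link speed in Mbit/s), and the 1500-byte packet size is an assumption rather than anything the agent uses.

```python
# Rough arithmetic only. Function names are hypothetical and the
# 1500-byte packet size is an assumption made for illustration.

def default_sampling_rate(link_speed_bps: int) -> int:
    """Return N for the 1-in-N defaults in the table (N = link speed in Mbit/s)."""
    return link_speed_bps // 1_000_000


def expected_samples_per_second(link_speed_bps: int,
                                utilization: float = 0.10,
                                packet_bytes: int = 1500) -> float:
    """Estimate how many packet samples per second a single flow generates."""
    flow_bps = link_speed_bps * utilization
    packets_per_second = flow_bps / (packet_bytes * 8)
    return packets_per_second / default_sampling_rate(link_speed_bps)


if __name__ == "__main__":
    for gbps in (1, 10, 25, 40, 50, 100):
        speed = gbps * 1_000_000_000
        rate = default_sampling_rate(speed)
        samples = expected_samples_per_second(speed)
        print(f"{gbps} Gbit/s: 1-in-{rate:,}, "
              f"about {samples:.1f} samples/s from a 10% flow")
```

Because the sampling rate scales with link speed, the estimate comes out the same for every row of the table: under these assumptions, a flow consuming 10% of the link produces roughly eight samples per second, which is consistent with detection in about one second.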
Edit the /etc/hsflowd.conf file to specify the address of an sFlow analyzer (10.0.0.1):

    sflow { collector { ip = 10.0.0.1 } }

Monitoring Linux services describes how to configure Host sFlow to include detailed telemetry for all services running on OpenSwitch:
- bind9.service
- cron.service
- dbus.service
- getty@tty1.service
- getty@tty2.service
- getty@tty3.service
- getty@tty4.service
- getty@tty5.service
- getty@tty6.service
- hsflowd.service
- lldpd.service
- networking.service
- opx-alms.service
- opx-cps.service
- opx-front-panel-ports.service
- opx-ip.service
- opx-monitor-phy-media.service
- opx-nas-shell.service
- opx-nas.service
- opx-nbmgr.service
- opx-phy-media-config.service
- opx-tmpctl.service
- polkitd.service
- redis-server.service
- rsyslog.service
- snmpd.service
- ssh.service
- systemd-journald.service
- systemd-logind.service
- systemd-udevd.service
Finally, start the Host sFlow agent:

    sudo systemctl enable hsflowd
    sudo systemctl start hsflowd

Using the Host sFlow agent to monitor Linux servers and switches provides a consistent set of measurements end-to-end, particularly for cloud infrastructure such as OpenStack and Docker where the network extends into the servers in the form of virtual switches and routers.
Collecting and analyzing sFlow
Visibility and the software defined data center describes the general architecture of sFlow monitoring. Standard sFlow agents embedded within the elements of the infrastructure stream essential performance metrics to management tools, ensuring that every resource in a dynamic cloud infrastructure is immediately detected and continuously monitored.
The Host sFlow agent on OpenSwitch streams standard Linux performance statistics in addition to the interface counters and packet samples that you would typically get from a networking device.
Note: Enhanced visibility into host performance is particularly important on open switch platforms since they may be running a number of user installed services that can stress the limited CPU, memory and IO resources.
For example, the following sflowtool output shows the raw data contained in an sFlow datagram:

    startDatagram =================================
    datagramSourceIP 10.0.0.59
    datagramSize 1332
    unixSecondsUTC 1516946395
    datagramVersion 5
    agentSubId 100000
    agent 10.0.0.59
    packetSequenceNo 340132
    sysUpTime 17479000
    samplesInPacket 7
    startSample ----------------------
    sampleType_tag 0:2
    sampleType COUNTERSSAMPLE
    sampleSequenceNo 876
    sourceId 2:1
    counterBlock_tag 0:2001
    counterBlock_tag 0:2005
    disk_total 8102721536
    disk_free 5178248192
    disk_partition_max_used 37.77
    disk_reads 25339
    disk_bytes_read 562041856
    disk_read_time 25380
    disk_writes 3192551
    disk_bytes_written 28776890368
    disk_write_time 1043712
    counterBlock_tag 0:2004
    mem_total 2107891712
    mem_free 142082048
    mem_shared 0
    mem_buffers 155873280
    mem_cached 1611935744
    swap_total 0
    swap_free 0
    page_in 184268
    page_out 9367478
    swap_in 0
    swap_out 0
    counterBlock_tag 0:2003
    cpu_load_one 0.010
    cpu_load_five 0.030
    cpu_load_fifteen 0.000
    cpu_proc_run 2
    cpu_proc_total 167
    cpu_num 2
    cpu_speed 2699
    cpu_uptime 3541814
    cpu_user 3336490
    cpu_nice 0
    cpu_system 5479320
    cpu_idle 2754958964
    cpu_wio 168960
    cpu_intr 160
    cpu_sintr 2717250
    cpu_interrupts 656232310
    cpu_contexts 1704273704
    cpu_steal 0
    cpu_guest 0
    cpu_guest_nice 0
    counterBlock_tag 0:2006
    nio_bytes_in 267777
    nio_pkts_in 4210
    nio_errs_in 0
    nio_drops_in 0
    nio_bytes_out 2104528
    nio_pkts_out 2227
    nio_errs_out 0
    nio_drops_out 0
    counterBlock_tag 0:2000
    hostname opx2_vm
    UUID 40-d4-8b-d5-6b-29-4e-4a-be-48-d6-55-8d-f6-81-73
    machine_type 3
    os_name 2
    os_release 3.16.0-4-amd64
    endSample ----------------------
    startSample ----------------------
    sampleType_tag 0:2
    sampleType COUNTERSSAMPLE
    sampleSequenceNo 876
    sourceId 0:44
    counterBlock_tag 0:1005
    ifName e101-001-0
    counterBlock_tag 0:1
    ifIndex 3
    networkType 6
    ifSpeed 0
    ifDirection 2
    ifStatus 0
    ifInOctets 0
    ifInUcastPkts 0
    ifInMulticastPkts 0
    ifInBroadcastPkts 0
    ifInDiscards 0
    ifInErrors 0
    ifInUnknownProtos 4294967295
    ifOutOctets 0
    ifOutUcastPkts 0
    ifOutMulticastPkts 0
    ifOutBroadcastPkts 0
    ifOutDiscards 0
    ifOutErrors 0
    ifPromiscuousMode 0
    endSample ----------------------
    startSample ----------------------
    sampleType_tag 0:1
    sampleType FLOWSAMPLE
    sampleSequenceNo 1022129
    sourceId 0:7
    meanSkipCount 128
    samplePool 130832512
    dropEvents 0
    inputPort 7
    outputPort 10
    flowBlock_tag 0:1
    flowSampleType HEADER
    headerProtocol 1
    sampledPacketSize 1518
    strippedBytes 4
    headerLen 128
    headerBytes 6C-64-1A-00-04-5E-E8-E7-32-77-E2-B5-08-00-45-00-05-DC-63-06-40-00-40-06-9E-21-0A-64-0A-97-0A-64-14-96-9A-6D-13-89-4A-0C-4A-42-EA-3C-14-B5-80-10-00-2E-AB-45-00-00-01-01-08-0A-5D-B2-EB-A5-15-ED-48-B7-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35
    dstMAC 6c641a00045e
    srcMAC e8e73277e2b5
    IPSize 1500
    ip.tot_len 1500
    srcIP 10.100.10.151
    dstIP 10.100.20.150
    IPProtocol 6
    IPTOS 0
    IPTTL 64
    TCPSrcPort 39533
    TCPDstPort 5001
    TCPFlags 16
    endSample ----------------------

Note: The output contains Linux host metrics (the first counter sample), network interface counters (the second counter sample), and packet sample information (the flow sample).
sflowtool has a number of additional uses:
- Verifying that sFlow is being received correctly at the destination
- Converting binary sFlow data into ASCII for scripted analysis (Python, Perl, etc.)
- Converting sFlow into IPFIX/NetFlow
- Converting sFlow into PCAP format for use with tcpdump, Wireshark, etc.
- Replicating sFlow streams for multiple collectors
- Providing source code for an sFlow decoder that can be used to build a custom sFlow analyzer
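As an illustration of the scripted-analysis use case, the following Python sketch groups sflowtool's line-oriented text output into one dictionary per sample. It is a minimal sketch, not part of sflowtool: the function names `iter_samples` and `estimated_bytes` are hypothetical, the parser assumes the "key value" layout shown in the example output above, and scaling a sampled packet size by `meanSkipCount` is a simple estimate of the traffic that one sample represents.

```python
import sys
from typing import Dict, Iterable, Iterator


def iter_samples(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per startSample/endSample block of sflowtool text output.

    Datagram-level fields outside a sample are ignored. Keys that repeat
    within a sample (such as counterBlock_tag) keep only their last value.
    """
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        if key == "startSample":
            current = {}
        elif key == "endSample":
            if current is not None:
                yield current
            current = None
        elif current is not None:
            current[key] = value


def estimated_bytes(sample: Dict[str, str]) -> int:
    """Scale one sampled packet by the sampling rate (a rough estimate)."""
    return int(sample["sampledPacketSize"]) * int(sample["meanSkipCount"])


if __name__ == "__main__":
    # Reads sflowtool text output from standard input.
    for sample in iter_samples(sys.stdin):
        if sample.get("sampleType") == "FLOWSAMPLE" and "srcIP" in sample:
            print(sample["srcIP"], sample["dstIP"], estimated_bytes(sample))
```

Assuming the script is saved as `flows.py` (a hypothetical file name), it could be fed from sflowtool with a pipe, for example `sflowtool | python3 flows.py`. Applied to the flow sample shown earlier, it would report 10.100.10.151, 10.100.20.150, and 1518 × 128 = 194304 estimated bytes.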
A key feature of sFlow telemetry is the low latency network-wide visibility that is possible because of the stateless nature of the measurements. Comprehensive real-time visibility is an essential building block that provides feedback for operations, automation, and control. Articles on this blog use the sFlow-RT analyzer to demonstrate use cases for real-time telemetry.
The programmability of an open Linux network operating system combined with real-time visibility is transformative, providing the foundation services necessary for solutions that automatically adapt the network to changing demands.