As virtualization shifts the network edge from top of rack switches to software virtual switches running on the hypervisors; visibility in the virtual switching layer is essential in order to provide network, server and storage management teams with the information needed to coordinate resources and ensure optimal performance.
The recent release of Citrix XenServer 6.0 provides an opportunity for a side-by-side comparison of sFlow and NetFlow monitoring technologies since both protocols are supported by the Open vSwitch that is now the default XenServer network stack.
The diagram above shows the experimental setup. Traffic between the virtual machines VM1 and VM2 passes through the Virtual Switch where sFlow and NetFlow measurements are simultaneously generated. The sFlow is sent to an sFlow Analyzer (InMon sFlowTrend) and the NetFlow to a NetFlow Analyzer (SolarWinds Real-Time NetFlow Analyzer). Both tools are running in tandem making it is easy to perform side by side comparisons to see differences in the visibility that NetFlow and sFlow provide into the same underlying traffic.
Note: XenServer 6.0, sFlowTrend and Real-Time NetFlow Analyzer are all available at no charge, making it easy for anyone to reproduce these tests.
Configuration
The Host sFlow supplemental pack was installed to automate sFlow configuration of the Open vSwitch and to export standard sFlow Host metrics. The following /etc/hsflowd.conf file sets the packet sampling rate to 1-in-400, counter polling interval to 20 seconds and sends sFlow to sFlowTrend running on host 10.0.0.42 and listening on UDP port 6343.sflow{ DNSSD = off polling = 20 sampling = 400 collector{ ip = 10.0.0.42 udpport = 6343 } }
The following command was used to manually configure NetFlow monitoring, sending NetFlow to the Real-Time NetFlow Analyzer running on host 10.0.0.42 and listening on UDP port 2055:
ovs−vsctl −− set Bridge xenbr0 netflow=@nf \ −− −−id=@nf create NetFlow targets=\"10.0.0.42:2055\" active−timeout=60
Results
The following charts show the top protocols measured using sFlow and NetFlow:Top Protocols in sFlowTrend |
Top Protocols in Real-Time NetFlow Analyzer |
Note: The overhead associated with Ethernet headers and tunneling protocols can represent a significant fraction of overall bandwidth. By exporting packet headers, sFlow provides detailed information on the encapsulations and their overhead. NetFlow does not provide a direct measure of total bandwidth.
The periodic, 60 second, spikes in traffic shown on the NetFlow Analyzer chart are an artifact of the way NetFlow reports on long running connections. With NetFlow, packet and byte counters are maintained for each connection in a flow cache within the switch. When the connection terminates, a flow record is generated containing the connection information and counters. The active-timeout setting in the NetFlow configuration is used to ensure visibility into long running connections, causing the switch to periodically export NetFlow records for active connections. In contrast, sFlow does not use a flow cache, instead sampled packet headers are continually exported, resulting in real-time charts that more accurately reflect the traffic trend.
In addition, exporting packet headers allows an sFlow analyzer to monitor all types of traffic flowing across the switch; note the ARP and IPv6 traffic displayed in sFlowTrend in addition to the TCP/UDP flows. Visibility into layer 2 traffic is particularly important in switched environments where protocols such as DHCP/BOOTP, STP, LLDP and ARP need to be closely managed. sFlow also provides visibility into networked storage, including Ethernet SAN technologies (e.g. FCoE or AoE), that typically dominates bandwidth usage in the data center. Looking forward, there are a number of tunneling protocols being developed to connect virtual switches, including: GRE, mpls, VPLS, VXLAN and NVGRE. As new protocols are deployed on the network they are easily monitored without any change to exiting sFlow agents ensuring end-to-end visibility across the physical and virtual network.
In contrast, NetFlow relies on the switch to decode the traffic. In this case the switch is exporting NetFlow version 5 which only exports records for IPv4 traffic. The NetFlow analyzer is thus only able to report on IPv4 protocols, all other traffic is invisible. This limitation is not unique to Open vSwitch; NetFlow version 5 is the most widely supported version of NetFlow in network devices and is also the version exported by VMware vSphere 5.0.
The next two charts show top connections flowing through the virtual switch:
Top Connections in sFlowTrend |
Top Connections in Real-Time NetFlow Analyzer |
The next two charts show interface utilization and packet counts from sFlowTrend:
Link Utilization in sFlowTrend |
Link Counters in sFlowTrend |
In contrast, NetFlow only reports on traffic flows so neither of these charts is available in the NetFlow Analyzer. The remaining charts are based on sFlow counter data so there are no corresponding NetFlow Analyzer charts.
The next sFlowTrend chart shows the CPU load on the hypervisor:
Hypervisor CPU in sFlowTrend |
The next sFlowTrend chart shows a trend in disk IO on the virtual machine:
Virtual Machine Disk IO in sFlowTrend |
Comments
NetFlow provides limited visibility, focusing on layer 3 network connections. The NetFlow architecture relies on complex functionality within the switches and the complexity of configuring and maintaining NetFlow adds to operational costs and limits scalability. For example, gaining visibility into IPv6 traffic requires firmware (and often hardware) upgrades to the network infrastructure that can be challenging in large scale, always-on, cloud environments.In contrast, adding support for additional protocols in sFlow requires no change to the network infrastructure, but is simply a matter of upgrading the sFlow analyzer. The sFlow architecture eliminates complexity from the agents, increasing scalability and reducing the operational costs associated with configuration and maintenance. sFlow provides comprehensive visibility into network and system resources needed to manage performance in virtualized and cloud environments.