Friday, August 31, 2012

Cisco adds sFlow support

Cisco Nexus 3000 series switches
Cisco added support for the sFlow standard in the latest NX-OS 5.0(3)U4(1) release for Nexus 3000 series switches. The Nexus 3000 series are the first Cisco switches based on merchant silicon, which includes hardware support for sFlow, offering scalable, wire-speed monitoring of all traffic flowing through entire networks of Nexus 3000 series switches.
Example: sFlowTrend Top connections chart
The article, 10 Gigabit Ethernet, describes the trend toward 10 Gigabit networking and the critical role that top of rack switches play in next generation data center architectures. Most organisations are predicted to upgrade to 10 Gigabit top of rack switches within the next two years in order to support the demands of virtualization and cloud computing. With the addition of Cisco, all leading switch vendors now have 10 Gigabit top of rack switches that support the sFlow standard, making sFlow the obvious choice when selecting a vendor neutral performance monitoring solution for large scale cloud environments.

Since the Nexus 3000 series switches are the first Cisco products with sFlow, the rest of this article is addressed to Cisco network administrators who are likely to be unfamiliar with sFlow technology. As a Cisco network administrator, you are likely to have experience using Cisco's Switched Port Analyzer (SPAN) technology to selectively monitor traffic in Cisco edge switches and Cisco's NetFlow technology to monitor TCP/IP traffic in Cisco routers.

By adding sFlow support to the Nexus 3000 series, Cisco eliminates the need for probes, providing wire-speed 10 Gigabit monitoring of all switch ports - the functional equivalent of forty-eight 10 Gigabit probes and four 40 Gigabit probes in a Nexus 3064 - embedded in the switch hardware at no extra cost. If you are familiar with RMON probes, sFlow is functionally equivalent to deploying an RMON probe for each switch port.

Based on the name, you might think that sFlow is just another version of Cisco NetFlow. However, this is not the case - sFlow differs significantly from NetFlow, and understanding these differences is important if you want to get the most out of sFlow:
  1. sFlow exports interface counters, eliminating the need for SNMP polling - extremely useful when you have tens of thousands of edge switch ports to monitor.
  2. sFlow exports packet headers, not flow records. By exporting packet headers, sFlow is able to provide full layer 2 - 7 visibility into all types of traffic flowing at the network edge, including MAC addresses, VLANs, TRILL, tunnels (GRE, VXLAN etc.), Ethernet SAN traffic (FCoE and AoE) and IPv6, in addition to the TCP/IP information typically reported by NetFlow. You can even use sFlow with Wireshark for remote packet capture.
  3. sFlow is highly scalable. Unlike NetFlow, which is typically enabled on selected links at the core, sFlow is enabled on every port, on every switch, for full end-to-end network visibility. The sFlow measurements are implemented in silicon and won't impact switch CPU. The scalability of sFlow allows tens of thousands of 10G switch ports in the top of rack switches, as well as their 40 Gigabit uplink ports, to be centrally monitored. In addition, sFlow is available in 100 Gigabit switches, ensuring visibility as higher speed interconnects are deployed to support the growing 10 Gigabit edge.
  4. sFlow is easy to configure and manage. Eliminating complexity is essential for large scale web 2.0, big data, virtualization and cloud deployments.
  5. sFlow is a multi-vendor standard supported by almost every network equipment vendor. You can mix and match Cisco Nexus 3000 series switches with best in class solutions from other vendors and still maintain comprehensive, interoperable, data center wide visibility.
  6. sFlow is not just for switches. The sFlow standard also provides visibility into server, storage, virtual machine and application performance, helping to break down management silos by providing a consistent view of performance to operations and development teams (see DevOps).
  7. sFlow functionality is determined by the choice of sFlow analyzer. With Flexible NetFlow, much of the analysis is performed on the network device, limiting the functionality of NetFlow collectors to simply recording the data and generating reports. As a result, NetFlow collectors end up being fairly generic in functionality. In contrast, sFlow shifts analysis from the switches to a central sFlow analyzer which determines how to process the data and present the results, see Choosing an sFlow analyzer. The result is a greater diversity of solutions and there is likely to be an sFlow analyzer that is particularly well adapted to your requirements. While many NetFlow collectors claim sFlow support, their support tends to be limited, ignoring sFlow specific features and treating sFlow as if it were basic NetFlow version 5.
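To make the "analysis happens in the analyzer" point a little more concrete, here is a minimal Python sketch of the first thing an analyzer does with each datagram it receives (by default on UDP port 6343): decoding the fixed header that precedes the packet header and counter samples. The field order follows the sFlow version 5 datagram layout; the function name, the dictionary keys and the example addresses are hypothetical, and the samples themselves are not decoded here.

```python
import ipaddress
import struct


def parse_sflow_v5_header(datagram: bytes) -> dict:
    """Decode the fixed fields at the start of an sFlow version 5 datagram."""
    version, address_type = struct.unpack_from("!II", datagram, 0)
    if version != 5:
        raise ValueError(f"not an sFlow v5 datagram (version={version})")

    # Agent address type: 1 = IPv4 (4 bytes), 2 = IPv6 (16 bytes).
    address_length = {1: 4, 2: 16}.get(address_type)
    if address_length is None:
        raise ValueError(f"unknown agent address type {address_type}")
    agent = ipaddress.ip_address(datagram[8:8 + address_length])

    # Four unsigned 32-bit integers follow the agent address.
    offset = 8 + address_length
    sub_agent_id, sequence, uptime_ms, sample_count = struct.unpack_from(
        "!IIII", datagram, offset
    )
    return {
        "agent": str(agent),
        "sub_agent_id": sub_agent_id,
        "sequence": sequence,
        "uptime_ms": uptime_ms,
        "sample_count": sample_count,
    }


# Hypothetical header as a switch at 10.0.0.250 might send it (values made up).
example = struct.pack(
    "!II4sIIII", 5, 1, ipaddress.ip_address("10.0.0.250").packed, 0, 42, 123456, 2
)
header = parse_sflow_v5_header(example)
print(header)
```

Everything after this header is a list of flow samples (sampled packet headers) and counter samples, and it is up to the analyzer to decide how to decode and combine them.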
Trying out sFlow is easy: just upgrade to the latest NX-OS release, configure sFlow export, and install the free sFlowTrend analyzer to gain real-time visibility - providing immediate answers to the Who, What, Where, When, Why and How questions that are the key to effective management.

2 comments:

  1. Actually, Cisco supporting sFlow on a Broadcom based switch is nothing to brag about, since the Broadcom ASICs do not support IPFIX (the IETF standard based on NetFlow version 9) or NetFlow and have no roadmap for supporting either of these features in the future.

    The author of this article should do a little investigation and reading before stating that sFlow is a standard. RFC 3176 is an Informational RFC, meaning it is not a standard. Here is the disclaimer from the IETF: "This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited." The RFC is also based on sFlow version 4, not the current sFlow version 5.

    Next, sFlow is about as accurate as nothing but gives some people the feeling of reality. sFlow is a sample of a sample, meaning that sFlow samples the first 128 bytes (default value for sFlow version 5) of an Ethernet frame. And just to make sure it isn't accurate, sFlow will only take samples based on a down counter of, say, 1 in every 2048 packets (the reference sample rate, used by most implementations).

    You might ask, "Why 1 in every 2048 packets and not 1 in every 512?" The simple reason is that while the frame is sampled in hardware, the actual creation and export of EVERY sampled frame is handled by the CPU on the switch. Just think on that: every sampled frame is forwarded to your CPU, then crafted into a single frame with other information and forwarded by the CPU back to hardware for transmission to a destination sFlow collector.

    Just to put this in perspective, Juniper is so concerned about CPU resources that they have limited sFlow to a maximum of 300 samples per second per switch, including the EX8200, regardless of traffic load. I guess that should send up a red flag about the impact sFlow has on the CPU. Here is a factoid to think on: even if the frame size is 512 bytes and the load on every port of a 48 port 10GE switch (Nexus 3064) is just 1%, each port is transmitting (or receiving) 23,496 pps, and based on 1:2048 you gathered a whopping 11 samples (partial frames don't count) or 0.05% of the traffic load. That means you are missing 99.95% of the traffic on every interface at that rate.

    So how are your traffic analysis skills? Remember, you are basing your entire network analysis on data that is missing 99.95% of the traffic. Good luck with that number!

    1. Robert, it would make no sense to implement NetFlow/IPFIX on the Broadcom ASICs. High port density, buffers, CAM space and low latency are higher priorities when deciding how to allocate resources on a switch ASIC. The advantage of sFlow is that it provides a way to instrument high performance switches without consuming excessive resources or impacting performance. If your only choice is NetFlow, then you end up having to add expensive management cards, or with no monitoring at all on many platforms - Cisco Catalyst 3k, 4k, Nexus 5k etc. The paper, Traffic Monitoring in a Switched Environment, is a bit dated, but describes some of the considerations when including monitoring within a high speed switch.

      RFC 3176 is deprecated and, as you say, very few vendors implement sFlow version 4. The current version of the sFlow standard (version 5) is maintained by the sFlow.org industry consortium, not the IETF or the IEEE. For some background, see Standards.

      As you point out, sFlow specifies a particular sampling algorithm that all devices must implement. When performed correctly, hardware-based, random 1-in-N sampling produces surprisingly accurate results - many service providers use sFlow data for traffic accounting. The paper, Packet Sampling Basics describes the mathematics behind sFlow's sampling algorithm. The surprising result is that accuracy depends on the number of samples generated, not the total number of packets on the network.
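The claim that accuracy depends on the number of samples, not the total number of packets, can be checked with a small simulation. The Python sketch below is only an illustration under simple assumptions (independent random 1-in-N sampling of each packet, a fixed 60% share of one traffic class, arbitrary seeds); the function name and parameters are hypothetical and this is not how a switch ASIC implements sampling.

```python
import random


def estimate_share(total_packets, true_share, sampling_rate, seed):
    """Sample roughly 1-in-N packets at random and estimate a traffic class's share."""
    rng = random.Random(seed)
    sampled = 0
    matched = 0
    for _ in range(total_packets):
        in_class = rng.random() < true_share      # does this packet belong to the class?
        if rng.random() < 1.0 / sampling_rate:    # random 1-in-N sampling decision
            sampled += 1
            matched += in_class
    return sampled, matched / sampled


# Two links with very different packet counts, tuned to yield about 1,000 samples each.
small_link = estimate_share(100_000, 0.60, 100, seed=1)
large_link = estimate_share(1_000_000, 0.60, 1000, seed=1)
print("samples, estimated share:", small_link)
print("samples, estimated share:", large_link)
```

Both runs collect roughly the same number of samples, so both estimates should land similarly close to the true 60%, even though the second link carried ten times as many packets and was sampled ten times less often.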

      In your example, you looked at a Nexus 3064 generating 11 samples per second per port (0.05% of the traffic). You ask how you can get useful results when you are missing 99.95% of the traffic. In this case, the only number that matters is the number of samples being generated - 11 per second. If your sFlow analyzer is reporting per port traffic information every minute, then the results will be based on a sample size of 660. If you are rolling up traffic for the whole switch, then you have a sample size of 31,680, and if you are monitoring all the switches in your data center you would have a considerably larger sample size. Now consider the types of question you might want to ask, for example how much storage traffic (NFS, iSCSI, FCoE) is being carried on the network. Typically storage traffic will constitute a significant portion of data center traffic, so let's assume that the actual number is 60%. On a per-port basis the sFlow analyzer would produce a result in the range 54-66%. On a per switch basis, the range would be 59.63-60.37%, and finally if you want a data center wide number you would be extremely close to the correct answer. The per-minute answers are usually more than accurate enough for troubleshooting, congestion management, DDoS analysis etc. For accounting and capacity planning purposes you typically consider longer time periods. For example, if you wanted to generate a monthly report based on a single link's data, you would have a sample size of over 28 million, giving an extremely accurate measurement of the traffic.
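The per-port numbers above can be reproduced with a few lines of Python. This sketch assumes the rule of thumb from Packet Sampling Basics, where the relative error at 95% confidence is at most 196 * sqrt(1/c) percent and c is the number of samples in the traffic class of interest. The function names are hypothetical, and a 30 day month is assumed for the monthly sample count.

```python
import math


def percent_error(class_samples):
    """Upper bound on relative error (%) at 95% confidence: 196 * sqrt(1/c)."""
    return 196.0 * math.sqrt(1.0 / class_samples)


def share_range(total_samples, share):
    """Range an analyzer might report for a class that is really `share` of the traffic."""
    class_samples = total_samples * share
    relative_error = percent_error(class_samples) / 100.0
    return share * (1.0 - relative_error), share * (1.0 + relative_error)


samples_per_second = 11                                 # per port, from the example
per_port_minute = samples_per_second * 60               # 660 samples
per_port_month = samples_per_second * 60 * 60 * 24 * 30 # assumes a 30 day month

low, high = share_range(per_port_minute, 0.60)
print(f"per-port, one minute: {low:.1%} - {high:.1%}")  # 54.1% - 65.9%
print(f"per-port, one month: {per_port_month:,} samples")
```

The one minute result rounds to the 54-66% range quoted above, and the monthly count comes to about 28.5 million samples.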

      The goal with sFlow is to sample as little of the traffic as possible while still achieving acceptable accuracy, since this increases the scalability of the overall monitoring system.

      While it's useful to understand the underlying sampling technology, the calculations don't need to be performed manually. A good sFlow analyzer will automatically perform the calculations and present results; sFlowTrend is free, properly scales sFlow, and is a good way to familiarize yourself with sFlow.

      Not all sampling schemes produce accurate results. The sFlow standard specifies a particular scheme that is proven to work. NetFlow and IPFIX allow many types of sampling, many of which produce questionable results and give sampling a bad name.
