Friday, August 31, 2012

Cisco adds sFlow support

Cisco Nexus 3000 series switches
Cisco added support for the sFlow standard in the latest NX-OS 5.0(3)U4(1) release for Nexus 3000 series switches. The Nexus 3000 series are the first Cisco switches based on merchant silicon, which includes hardware support for sFlow and offers scalable, wire-speed monitoring of all traffic flowing through entire networks of Nexus 3000 series switches.
Example: sFlowTrend Top connections chart
The article, 10 Gigabit Ethernet, describes the trend toward 10 Gigabit networking and the critical role that top of rack switches play in next generation data center architectures. Most organisations are predicted to upgrade to 10 Gigabit top of rack switches within the next two years in order to support the demands of virtualization and cloud computing. With the addition of Cisco, all leading switch vendors now have 10 Gigabit top of rack switches that support the sFlow standard, making sFlow the obvious choice when selecting a vendor neutral performance monitoring solution for large scale cloud environments.

Since the Nexus 3000 series switches are the first Cisco products with sFlow, the rest of this article is addressed to Cisco network administrators who are likely to be unfamiliar with sFlow technology. As a Cisco network administrator, you are likely to have experience using Cisco's Switched Port Analyzer (SPAN) technology to selectively monitor traffic in Cisco edge switches and Cisco's NetFlow technology to monitor TCP/IP traffic in Cisco routers.

By adding sFlow support to the Nexus 3000 series, Cisco eliminates the need for probes, providing wire-speed 10 Gigabit monitoring of all switch ports - the functional equivalent of forty-eight 10 Gigabit probes and four 40 Gigabit probes in a Nexus 3064 - embedded in the switch hardware at no extra cost. If you are familiar with RMON probes, sFlow is functionally equivalent to deploying an RMON probe for each switch port.

Based on the name, you might think that sFlow is just another version of Cisco NetFlow. However, this is not the case - sFlow differs significantly from NetFlow, and understanding these differences is important if you want to get the most out of sFlow:
  1. sFlow exports interface counters, eliminating the need for SNMP polling - extremely useful when you have tens of thousands of edge switch ports to monitor.
  2. sFlow exports packet headers, not flow records. By exporting packet headers, sFlow is able to provide full layer 2 - 7 visibility into all types of traffic flowing at the network edge, including MAC addresses, VLANs, TRILL, tunnels (GRE, VXLAN etc.), Ethernet SAN traffic (FCoE and AoE) and IPv6, in addition to the TCP/IP information typically reported by NetFlow. You can even use sFlow with Wireshark for remote packet capture.
  3. sFlow is highly scalable. Unlike NetFlow, which is typically enabled on selected links at the core, sFlow is enabled on every port, on every switch, for full end-to-end network visibility. The sFlow measurements are implemented in silicon and won't impact switch CPU. The scalability of sFlow allows tens of thousands of 10G switch ports in the top of rack switches, as well as their 40 Gigabit uplink ports, to be centrally monitored. In addition, sFlow is available in 100 Gigabit switches, ensuring visibility as higher speed interconnects are deployed to support the growing 10 Gigabit edge.
  4. sFlow is easy to configure and manage. Eliminating complexity is essential for large scale web 2.0, big data, virtualization and cloud deployments.
  5. sFlow is a multi-vendor standard supported by almost every network equipment vendor. You can mix and match Cisco Nexus 3000 series switches with best in class solutions from other vendors and still maintain comprehensive, interoperable, data center wide visibility.
  6. sFlow is not just for switches. The sFlow standard also provides visibility into server, storage, virtual machine and application performance, helping to break down management silos by providing a consistent view of performance to operations and development teams (see DevOps).
  7. sFlow functionality is determined by the choice of sFlow analyzer. With Flexible NetFlow, much of the analysis is performed on the network device, limiting the functionality of NetFlow collectors to simply recording the data and generating reports. As a result, NetFlow collectors end up being fairly generic in functionality. In contrast, sFlow shifts analysis from the switches to a central sFlow analyzer, which determines how to process the data and present the results (see Choosing an sFlow analyzer). The result is a greater diversity of solutions, and there is likely to be an sFlow analyzer that is particularly well adapted to your requirements. While many NetFlow collectors claim sFlow support, their support tends to be limited, ignoring sFlow specific features and treating sFlow as if it were basic NetFlow version 5.
Trying out sFlow is easy: just upgrade to the latest NX-OS release, configure sFlow export, and install the free sFlowTrend analyzer to gain real-time visibility, providing immediate answers to the Who, What, Where, When, Why and How questions that are the key to effective management.

Configuring Cisco switches

The following commands configure a Cisco switch (10.0.0.250), sampling packets at 1-in-5000, polling counters every 20 seconds and sending sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343):

switch# configure terminal
switch(config)# feature sflow
switch(config)# sflow agent-ip 10.0.0.250
switch(config)# sflow sampling-rate 5000
switch(config)# sflow counter-poll-interval 20
switch(config)# sflow collector-ip 10.0.0.50 vrf default
switch(config)# sflow data-source interface ethernet 1/1
...
switch(config)# sflow data-source interface ethernet 1/24
switch(config)# copy running-config startup-config
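
The elided lines in the listing above repeat the same data-source command for each port. As a minimal sketch, assuming the ports are numbered contiguously within one slot as in the example (the function name sflow_data_source_commands is hypothetical, not part of NX-OS), the per-port commands could be generated with a few lines of Python and pasted into the configuration session:

```python
def sflow_data_source_commands(slot: int, first_port: int, last_port: int) -> list[str]:
    """Return one 'sflow data-source' command per Ethernet port in the range.

    Assumption: ports are named ethernet <slot>/<port> and are contiguous,
    matching the ethernet 1/1 ... 1/24 example in the article.
    """
    return [
        f"sflow data-source interface ethernet {slot}/{port}"
        for port in range(first_port, last_port + 1)
    ]


# Print the 24 commands covered by the example listing.
for command in sflow_data_source_commands(1, 1, 24):
    print(command)
```

The output is plain text, so it can be reviewed before being applied to a switch.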

A previous posting discussed the selection of sampling rates. Additional information can be found on the Cisco web site.
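
One quick way to confirm that a switch is exporting is to listen on the sFlow port and decode the fixed header at the start of each datagram. The following Python sketch assumes sFlow version 5 datagrams and only reads the header fields (agent address, sequence number, uptime and sample count); it does not decode the samples themselves. The function names parse_sflow_header and listen are hypothetical, and a full analyzer such as sFlowTrend does far more than this.

```python
import ipaddress
import socket
import struct


def parse_sflow_header(datagram: bytes) -> dict:
    """Decode the fixed fields at the start of an sFlow version 5 datagram."""
    version, address_type = struct.unpack_from("!II", datagram, 0)
    if version != 5:
        raise ValueError(f"unsupported sFlow version: {version}")
    if address_type == 1:      # agent address is IPv4 (4 bytes)
        address_length = 4
    elif address_type == 2:    # agent address is IPv6 (16 bytes)
        address_length = 16
    else:
        raise ValueError(f"unknown agent address type: {address_type}")
    offset = 8
    agent = ipaddress.ip_address(datagram[offset:offset + address_length])
    offset += address_length
    sub_agent_id, sequence, uptime_ms, sample_count = struct.unpack_from(
        "!IIII", datagram, offset
    )
    return {
        "agent": str(agent),
        "sub_agent_id": sub_agent_id,
        "sequence": sequence,
        "uptime_ms": uptime_ms,
        "sample_count": sample_count,
    }


def listen(bind_address: str = "0.0.0.0", port: int = 6343) -> None:
    """Print one summary line per received datagram (runs until interrupted)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_address, port))
        while True:
            datagram, (sender, _) = sock.recvfrom(65535)
            header = parse_sflow_header(datagram)
            print(f"{sender}: agent={header['agent']} "
                  f"seq={header['sequence']} samples={header['sample_count']}")
```

Calling listen() on the analyzer host (10.0.0.50 in the example) should print a line for each datagram, with the agent field matching the configured agent address (10.0.0.250).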

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.

Thursday, August 30, 2012

Configuring a ZTE switch

The following commands configure a ZTE switch (10.0.0.254) to sample packets at 1-in-1024 and send sFlow to an analyzer (10.0.0.50) using the default sFlow port 6343 (see the note below regarding counter polling):
sflow enable
sflow agent-config ipv4-address 10.0.0.254
sflow collector-config ipv4-address 10.0.0.50 6343
For each interface:
interface gei_1/1
sflow-sample-rate ingress 1024
sflow-sample-rate egress 1024
exit
A previous posting discussed the selection of sampling rates. Additional information can be found on the ZTE web site.
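
To illustrate what a 1-in-1024 sampling rate means for the analyzer, the sketch below scales a count of sampled packets up to an estimated total and applies a commonly cited approximation for packet sampling accuracy, a percentage error of at most 196 * sqrt(1/c) at 95% confidence, where c is the number of samples in the traffic class of interest. The function names are hypothetical and the formula is an approximation, not a guarantee for any particular analyzer.

```python
import math


def estimate_total(sample_count: int, sampling_rate: int) -> int:
    """Scale a count of sampled packets up to an estimated packet total."""
    return sample_count * sampling_rate


def percent_error_bound(sample_count: int) -> float:
    """Approximate 95% confidence bound on the relative error, in percent.

    Uses the approximation 196 * sqrt(1 / c), where c is the number of
    samples seen for the traffic class being measured.
    """
    if sample_count <= 0:
        raise ValueError("at least one sample is required")
    return 196.0 * math.sqrt(1.0 / sample_count)


# Example: 100 samples of a flow at 1-in-1024 sampling.
samples = 100
print(estimate_total(samples, 1024), "packets, within about",
      round(percent_error_bound(samples), 1), "percent")
```

The error bound depends only on the number of samples collected, which is why a coarser sampling rate remains accurate on busy links.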

Note: For bi-directional sampling, the ingress and egress sampling rates must be set to the same value since sFlow doesn't support asymmetric sampling on an interface. Ingress only or egress only sampling is supported, so you can disable sampling in either direction. The ZTE documentation doesn't describe how to configure counter polling. Counter export is a required component of the sFlow protocol, so this is either an omission in documentation, or a defect that should be fixed in a future release.

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.

Wednesday, August 29, 2012

Configuring Edge-Core switches

The following commands configure an Edge-Core switch to sample packets at 1-in-1024, poll counters every 30 seconds and send sFlow to an analyzer (10.0.0.50) using the default sFlow port 6343:
sflow receiver 1 10.0.0.50 port 6343
For each interface:
interface te1
sflow flow-sampling 1024 1
sflow counter-sampling 30 1
A previous posting discussed the selection of sampling rates. Additional information can be found on the Edge-Core web site.

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.

Monday, August 27, 2012

Push vs Pull

Push-me-pull-you from Doctor Dolittle
There are two major performance monitoring architectures:
  • Push: metrics are periodically sent by each monitored system to a central collector. Examples of push architectures include sFlow, Ganglia, Graphite, collectd and StatsD.
  • Pull: a central collector periodically requests metrics from each monitored system. Examples of pull architectures include SNMP, JMX, WMI and libvirt.
The remainder of this article will explore some of the strengths and weaknesses of push and pull architectures:


Discovery
  • Push: Agent automatically sends metrics as soon as it starts up, ensuring that it is immediately detected and continuously monitored. Speed of discovery is independent of number of agents.
  • Pull: Discovery requires collector to periodically sweep address space to find new agents. Speed of discovery depends on discovery sweep interval and size of address space.

Scalability
  • Push: Polling task fully distributed among agents, resulting in linear scalability. Lightweight central collector listens for updates and stores measurements. Minimal work for agents to periodically send fixed set of measurements. Agents are stateless, exporting data as soon as it is generated.
  • Pull: Workload on central poller increases with the number of devices polled. Additional work on poller to generate requests and maintaining session state in order to match requests and responses. Additional work for agents to parse and process requests. Agents often required to maintain state so that metrics can be retrieved at a later time by the poller.

Security
  • Push: Push agents are inherently secure against remote attacks since they do not listen for network connections.
  • Pull: Polling protocol can potentially open up system to remote access and denial of service attacks.

Operational Complexity
  • Push: Minimal configuration required for agents: polling interval and address of collector. Firewalls need to be configured for unidirectional communication of measurements from agents to collector.
  • Pull: Poller needs to be configured with list of devices to poll, security credentials to access the devices and the set of measurements to retrieve. Firewalls need to be configured to allow bi-directional communication between poller and agents.

Latency
  • Push: The low overhead and distributed nature of the push model permits measurement to be sent more frequently, allowing the management system to quickly react to changes. In addition, many push protocols, like sFlow, are implemented on top of UDP, providing non-blocking, low-latency transport of measurements.
  • Pull: The lack of scalability in polling typically means that measurements are retrieved less often, resulting in a delayed view of performance that makes the management system less responsive to changes. The two way communication involved in polling increases latency as connections are established and authenticated before measurements can be retrieved.

Flexibility
  • Push: Relatively inflexible: pre-determined, fixed set of measurements are periodically exported.
  • Pull: Flexible: poller can ask for any metric at any time.

The push model is particularly attractive for large scale cloud environments where services and hosts are constantly being added, removed, started and stopped. Maintaining lists of devices to poll for statistics in these environments is challenging, and the discovery, scalability, security, low latency and simplicity of the push model make it a clear winner.
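
The discovery property of the push model can be sketched in a few lines of Python: the collector keeps no list of devices and simply registers an agent the first time a measurement arrives from it. This is an illustration only; the PushCollector class and push_metrics function are hypothetical, and the JSON-over-UDP message format is an assumption made for brevity, not the encoding used by sFlow.

```python
import json
import socket


class PushCollector:
    """Toy collector that discovers agents from the measurements they push."""

    def __init__(self, host: str = "127.0.0.1", port: int = 0):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind((host, port))        # port 0 lets the OS pick a free port
        self.sock.settimeout(1.0)
        self.address = self.sock.getsockname()
        self.agents = {}                    # agent name -> latest metrics

    def receive_once(self):
        """Receive one datagram; return (agent name, whether it is newly seen)."""
        payload, _sender = self.sock.recvfrom(65535)
        message = json.loads(payload)
        is_new = message["agent"] not in self.agents
        self.agents[message["agent"]] = message["metrics"]
        return message["agent"], is_new

    def close(self) -> None:
        self.sock.close()


def push_metrics(collector_address, agent_name: str, metrics: dict) -> None:
    """Agent side: send one measurement and keep no state (fire and forget)."""
    message = json.dumps({"agent": agent_name, "metrics": metrics}).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, collector_address)


# Usage: a new agent is discovered as soon as its first measurement arrives.
collector = PushCollector()
push_metrics(collector.address, "web-1", {"load": 0.5})
print(collector.receive_once())
collector.close()
```

Note that the agent needs only the collector address, while the collector needs no configuration describing the agents.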

The sFlow standard is particularly well suited to large scale monitoring of cloud infrastructures, delivering the comprehensive visibility into the performance of network, compute and application resources needed for effective management and control.

In practice, a hybrid approach provides the best overall solution. The core set of standard metrics needed to manage performance and detect problems is pushed using sFlow and a pull protocol is used to retrieve diagnostic information from specific devices when a problem is detected.

Friday, August 24, 2012

Configuring Huawei switches

The following commands configure a Huawei switch (10.0.0.254) to sample packets at 1-in-1024, poll counters every 30 seconds and send sFlow to an analyzer (10.0.0.50) using the default sFlow port 6343:
system-view
sflow collector 1 ip 10.0.0.50 port 6343
sflow agent ip 10.0.0.254
For each interface:
system-view
interface gigabitethernet 1/0/2
sflow flow-sampling collector 1
sflow flow-sampling rate 1024
sflow counter-sampling collector 1
sflow counter-sampling interval 30
A previous posting discussed the selection of sampling rates. Additional information can be found on the Huawei web site.
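
With counters exported every 30 seconds, an analyzer can derive link utilization from the difference between two successive octet counter values. The Python sketch below shows the arithmetic, including a modulo step so that a counter that wraps between samples still gives the right difference. The function name and parameters are hypothetical, and the counter width is an assumption to be checked against the counters actually being exported.

```python
def utilization_percent(octets_start: int, octets_end: int,
                        interval_seconds: float, link_speed_bps: float,
                        counter_bits: int = 64) -> float:
    """Return average link utilization (percent) between two counter samples.

    The modulo handles a single wrap of a counter that is counter_bits wide.
    """
    delta_octets = (octets_end - octets_start) % (1 << counter_bits)
    bits_per_second = delta_octets * 8 / interval_seconds
    return 100.0 * bits_per_second / link_speed_bps


# Example: 375,000,000 octets received in 30 seconds on a 1 Gigabit port.
print(utilization_percent(0, 375_000_000, 30, 1_000_000_000))
```

In the example, 375,000,000 octets over 30 seconds is 100 Mbit/s, or 10 percent of a Gigabit link.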

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.