Saturday, September 22, 2012

Switch configurations

A large number of articles describing the steps to configure sFlow traffic monitoring have been published on this blog over the last few years. This article compiles a set of links in order to make the configurations easier to find.

Note: Trying out sFlow is easy: just follow the instructions to configure sFlow export and install the free sFlowTrend analyzer to gain real-time visibility - providing immediate answers to the Who, What, Where, When, Why and How questions that are the key to effective management.
Please contribute to this list, either by commenting on specific articles if they are incorrect or to provide configuration information for additional vendors or devices.

Wednesday, September 19, 2012

Packets and Flows

Figure 1: Sending a picture over a packet switched network
Figure 1 illustrates how data is transferred over a packet switched network. Host A is in the process of transferring a picture to Host B. The picture has been broken up into parts and each part is sent as a separate packet. Three packets containing parts 8, 9 and 10 are in transit and are in the process of being forwarded by switches Z, Y and X respectively.

In the example, Host A is responsible for breaking up the picture into parts and transmitting the packets. Host B is responsible for re-constructing the picture, detecting parts that are missing, corrupted, or delivered out of order and sending acknowledgement packets back to Host A, which is then responsible for resending packets if necessary.

The packet switches are responsible for transmitting, or forwarding, the packets. Each packet switch examines the destination address (e.g. To: B) and sends the packet on a link that will take the packet closer to its destination. The switches are unaware of the contents of the packets they forward, in this case the picture fragments and part numbers. Figure 1 only shows packets relating to the image transfer between Host A and Host B, but in reality the switches will be simultaneously forwarding packets from many other hosts.
Figure 2: Sorting mail
The mail sorting room shown in Figure 2 is a good analogy for the function performed by a packet switch. Letters arrive in the sorting room and are quickly placed into pigeon holes based on destination. The mail sorters don't know or care what's in the letters, they are focused on quickly reading the destination address on each envelope and placing the letter in a pigeon hole along with other letters to the same region so that the letters can be sent to another sorting facility closer to the destination.

In a packet switched network, each host and switch has a different perspective on data transfers and maintains different state in order to perform its task. Managing the performance of the communication system requires a correct understanding of the nature of the task that each element is responsible for and a way to monitor how effectively that task is being performed.

As an example of a poorly fitting model, consider the concept of "flow records" that are often presented as an intuitive way to monitor and understand traffic on packet switched networks. Continuing our example, the data transfer would be represented by two flow records, one accounting for packets from Host A to Host B and another accounting for packets from Host B to Host A.
Figure 3: Telephone operator
There is an inherent appeal in flow records since they are similar to the familiar "call records" that you see on a telephone bill, recording the number dialed, the time the call started and the call duration. However, as the switchboard and patch cords in Figure 3 demonstrate, telephone networks are circuit switched, i.e. a dedicated circuit is set up between the two telephones involving all the switches in the path. It is easy to see how a circuit switch can generate call records by considering the manual operator. The operator just needs to note when they connected the call using the patch cord, who they connected to, and when they terminated the call by pulling the plug.

Viewing packet switches through the lens of a circuit oriented measurement is misleading. Start by considering the steps that the mail sorter in Figure 2 would have to go through in order to create flow records. The mail sorter would be required to keep track of the From: and To: address information on each letter, count the number of letters that Host A sent to Host B, and open the letters to peek inside and decide whether each letter was part of an existing conversation or the start of a new one. This task is extremely cumbersome and error prone, and the resulting flow records don't monitor the task that the mail sorter is actually performing; for example, flow records won't tell you how many postcards, letters and parcels were sorted.
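The bookkeeping burden described above can be made concrete with a sketch of the per-flow state a switch would have to maintain; the field names are illustrative, not taken from any particular implementation:

```python
# Sketch of the per-flow state a switch must keep to build flow records.
# Contrast with pure forwarding, which needs no state between packets.

flow_cache = {}  # keyed by (src, dst, protocol, src port, dst port)

def account(packet):
    """Update (or create) the flow cache entry for this packet."""
    key = (packet["src"], packet["dst"], packet["proto"],
           packet["sport"], packet["dport"])
    entry = flow_cache.setdefault(key, {"packets": 0, "bytes": 0})
    entry["packets"] += 1
    entry["bytes"] += packet["length"]

# Three packets of the picture transfer from Host A to Host B
for length in (1500, 1500, 640):
    account({"src": "A", "dst": "B", "proto": "tcp",
             "sport": 49152, "dport": 80, "length": length})

print(flow_cache)
# One entry totalling 3 packets and 3640 bytes - but no record of how
# many were full-sized frames and how many were small frames.
```

The cache entry must be created, updated on every packet, aged out when the conversation ends, and exported - all state that exists only to serve the measurement, not the forwarding task.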

Packet and circuit switched networks have very different characteristics and an effective monitoring system will collect measurements that are relevant to the performance of the network:
  • Circuit switches have a limited number of connections that they can handle and if more calls are attempted, calls are blocked (i.e. receive a busy signal). Blocking probabilities and sizing of circuit switches are analyzed using Erlang calculations.
  • Packet switches don't block. Instead packets are interleaved as they are forwarded. If the number of packets arriving exceeds the forwarding capacity of the switch, then packets may be delayed as they wait to be serviced, or be discarded if there are too many packets already waiting in the queue. Queuing delays and packet discard probabilities are analyzed using queuing theory.
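The two analyses above can be illustrated numerically. This is a minimal sketch using the standard Erlang B recursion and the M/M/1 mean-wait formula; the traffic values are made-up examples:

```python
import math

def erlang_b(servers, offered_load):
    """Blocking probability for a circuit switch (Erlang B formula).

    Uses the standard recursion B(0) = 1,
    B(i) = A*B(i-1) / (i + A*B(i-1)), where A is offered load in Erlangs."""
    b = 1.0
    for i in range(1, servers + 1):
        b = (offered_load * b) / (i + offered_load * b)
    return b

def mm1_wait(arrival_rate, service_rate):
    """Mean time a packet waits in queue in an M/M/1 model of a switch port."""
    rho = arrival_rate / service_rate  # utilization
    assert rho < 1, "offered load must be below capacity"
    return rho / (service_rate - arrival_rate)

# Hypothetical circuit switch: 10 circuits offered 7 Erlangs of calls
print(erlang_b(10, 7.0))        # probability a call is blocked (busy signal)

# Hypothetical packet switch port: 800 packets/s arriving, 1000 packets/s capacity
print(mm1_wait(800.0, 1000.0))  # mean queueing delay in seconds
```

The contrast is visible in the outputs: the circuit switch turns excess demand into blocked calls, while the packet switch turns it into queueing delay that grows sharply as utilization approaches 1.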
To make the example in Figure 1 concrete, make Host A an Apache web server, Host B a laptop running a web browser, and the picture transfer the response associated with an HTTP request.

The following table compares the switch, host and application measurements provided by sFlow and NetFlow:

Switch
  • sFlow: Each switch exports packet oriented measurements: interface counters and randomly sampled packet headers with their associated forwarding decisions.
  • NetFlow: Switches export connection oriented flow records that include source address, destination address, protocol, bytes, packets and duration.

Note: Many switches aren't capable of making these measurements and so go unmonitored.
Host
  • sFlow: The server exports standard host metrics, including: CPU, memory and disk performance.
  • NetFlow: None. NetFlow is generally only implemented in network devices.
Application
  • sFlow: The web server exports standard HTTP metrics that include request counts and randomly sampled web requests, providing detailed information such as the URL, referrer, server address, client address, user, browser, request bytes, response bytes, status and duration. The web server also reports maximum, idle and active workers.
  • NetFlow: None. NetFlow is typically only implemented in network devices.

NetFlow takes a network centric view of measurement and tries to infer application behavior by examining packets in the network. NetFlow imposes a stateful, connection oriented, model on core devices that should be stateless. Unfortunately, the resulting flow measurements aren't a natural fit for packet switches, providing a distorted view of the operation of these devices. For example, the switch makes forwarding decisions on a packet by packet basis and these decisions can change over the lifetime of a flow. The packet oriented measurements made by sFlow accurately capture forwarding decisions, but flow oriented measurement can be misleading. Another example builds on the mail sorting analogy in Figure 2: packet oriented measurements support analysis of small, large and jumbo frames (postcards, letters and parcels), but this detail is lost in flow records.

Flows are an abstraction that is useful for understanding end-to-end traffic traversing the packet switches. However, the flow abstraction describes connections created by the communication end points, and to properly measure connection performance one needs to instrument those end points. Hosts are responsible for initiating and terminating flows and are a natural place to report flows, but the traditional flow model ignores important detail that the host can provide. For example, the host is in a position to include important details about services such as user names, URLs, response times, and status codes as well as information about the computational resources needed to deliver the services; information that is essential for managing service capacity, utilization and response times.

While NetFlow is network centric and tries to infer information about applications from network packets (which is becoming increasingly difficult as more traffic is encrypted), the sFlow standard takes a systems approach, exposing information from network, servers and applications in order to provide a comprehensive view of performance.

Measurement isn't simply about producing pretty charts. The ultimate goal is to be able to act on the measurements and control performance. Control requires a model of behavior that allows performance to be predicted, and measurements that characterize demand and show how closely system performance matches the predictions. The sFlow standard is well suited to automation, providing comprehensive measurements based on models of network, server and application performance. The Data center convergence, visibility and control presentation describes the critical role that measurement plays in managing costs and optimizing performance.

Tuesday, September 18, 2012

Configuring Hitachi switches

The following commands configure a Hitachi switch, sampling at 1-in-1024, polling counters every 30 seconds and sending sFlow to an analyzer over UDP using the default sFlow port (6343):
(config)# sflow yes
(config)# destination udp 6343
(config)# extended-information-type switch
(config)# polling-interval 30
(config)# port 1/1-24
[sflow port 1/1-24]
(config)# sample 1024
[sflow port 1/1-24]
(config)# exit
(config)# exit
A previous posting discussed the selection of sampling rates. Additional information can be found on the Hitachi web site.

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.

Sunday, September 16, 2012

Host sFlow progress

The open source Host sFlow project has made significant progress over the last two years. The rapidly increasing number of downloads reflects the project's maturity and the unique functionality delivered by the sFlow standard in unifying network, server and application performance monitoring of large scale cloud environments.

Starting with Linux, the Host sFlow project has added support for FreeBSD, Solaris and Windows. Installed on hypervisors, Host sFlow reports on the performance of the hypervisor and all the virtual machines. Currently supported hypervisors include: Microsoft Hyper-V, Citrix XenServer, Xen Cloud Platform (XCP), KVM and libvirt. In addition, the Host sFlow agent is extensible and a growing number of projects implement sFlow instrumentation in virtual switches (Open vSwitch, Hyper-V extensible virtual switch), popular web servers (Apache, NGINX), Memcached, application servers (Java, Tomcat) and even in-house applications written in scripting languages (PHP, Python, Ruby, Perl).

Popular open source projects such as Ganglia and Graphite offer scalable collection and reporting of sFlow metrics from large scale compute clusters, virtual machine pools, web farms, Java application servers and Memcached clusters.

Deploying Host sFlow agents on servers extends the sFlow monitoring built into networking devices from leading vendors, including: IBM, HP, Dell, Cisco, Juniper, Brocade, F5, Alcatel-Lucent, Arista Networks, Allied Telesis, Extreme Networks, Fortinet, Hitachi, Huawei, NEC, ZTE and others. The combination of sFlow in servers and switches delivers integrated, end-to-end visibility into cloud computing, software defined networking and converged storage.

Saturday, September 15, 2012

Configuring AlaxalA switches

The following commands configure an AlaxalA switch, sampling at 1-in-1024, polling counters every 30 seconds and sending sFlow to an analyzer over UDP using the default sFlow port (6343):
(config)# sflow destination 6343
(config)# sflow extended-information-type switch
(config)# sflow sample 1024
(config)# sflow polling-interval 30
For each port:
(config)# interface gigabitethernet 0/4
(config-if)# sflow forward ingress
A previous posting discussed the selection of sampling rates. Additional information can be found on the AlaxalA web site.

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.

Wednesday, September 12, 2012

Snowflakes, IPFIX, NetFlow and sFlow

Snow flakes by Wilson Bentley
Each snowflake is unique and beautiful. However, while such immense diversity is attractive in nature, variation in data center management standards results in operational complexity, making it difficult to implement the automation and control needed to effectively manage at scale.

The following table examines the approaches taken by the IPFIX and sFlow standards by contrasting how they handle four basic aspects of measurement.

Note: The IPFIX standard is based on Cisco's NetFlow™ version 9 protocol and most of the points of comparison apply equally to NetFlow.

Packet
IPFIX currently defines over 50 fields relating to packet headers (see IP Flow Information Export (IPFIX) Entities):
  • protocolIdentifier
  • ipClassOfService
  • tcpControlBits
  • sourceTransportPort
  • sourceIPv4Address
  • destinationTransportPort
  • destinationIPv4Address
  • sourceIPv6Address
  • destinationIPv6Address
  • flowLabelIPv6
  • icmpTypeCodeIPv4
  • igmpType
  • sourceMacAddress
  • vlanId
  • ipVersion
  • ipv6ExtensionHeaders
  • destinationMacAddress
  • icmpTypeCodeIPv6
  • icmpTypeIPv4
  • icmpCodeIPv4
  • icmpTypeIPv6
  • icmpCodeIPv6
  • udpSourcePort
  • udpDestinationPort
  • tcpSourcePort
  • tcpDestinationPort
  • tcpSequenceNumber
  • tcpAcknowledgementNumber
  • tcpWindowSize
  • tcpUrgentPointer
  • tcpHeaderLength
  • ipHeaderLength
  • totalLengthIPv4
  • payloadLengthIPv6
  • ipTTL
  • nextHeaderIPv6
  • ipDiffServCodePoint
  • ipPrecedence
  • fragmentFlags
  • ipPayloadLength
  • udpMessageLength
  • isMulticast
  • ipv4IHL
  • ipv4Options
  • tcpOptions
  • ipTotalLength
  • ethernetHeaderLength
  • ethernetPayloadLength
  • ethernetTotalLength
  • dot1qVlanId
  • dot1qPriority
  • dot1qCustomerVlanId
  • dot1qCustomerPriority
  • ethernetType
The IPFIX standard does not require vendors to support all the fields; each vendor is free to export any combination of fields it chooses, and none of the fields are mandatory. The result is that each vendor and each product produces unique and incompatible data.
The sFlow standard specifies a single way to report packet attributes, the packet header, ensuring that every vendor and product produces compatible results.

Every sFlow compatible device deployed since the sFlow standard was published in 2001 provides visibility into every protocol that has ever, or will ever, run over Ethernet. The packet header includes all the protocol fields exported by IPFIX as well as fields associated with emerging protocols such as FCoE, AoE, TRILL, NVGRE and VxLAN that have yet to be defined in IPFIX.
Time
IPFIX has over 30 elements that can be used to represent time (see IP Flow Information Export (IPFIX) Entities):
  • flowEndSysUpTime
  • flowStartSysUpTime
  • flowStartSeconds
  • flowEndSeconds
  • flowStartMilliseconds
  • flowEndMilliseconds
  • flowStartMicroseconds
  • flowEndMicroseconds
  • flowStartNanoseconds
  • flowEndNanoseconds
  • flowStartDeltaMicroseconds
  • flowEndDeltaMicroseconds
  • flowDurationMilliseconds
  • flowDurationMicroseconds
  • observationTimeSeconds
  • observationTimeMilliseconds
  • observationTimeMicroseconds
  • observationTimeNanoseconds
  • monitoringIntervalStartMilliSeconds
  • monitoringIntervalEndMilliSeconds
  • collectionTimeMilliseconds
  • maxExportSeconds
  • maxFlowEndSeconds
  • minExportSeconds
  • minFlowStartSeconds
  • maxFlowEndMicroseconds
  • maxFlowEndMilliseconds
  • maxFlowEndNanoseconds
  • minFlowStartMicroseconds
  • minFlowStartMilliseconds
  • minFlowStartNanoseconds
The IPFIX standard allows vendors to report time using these elements in any combination, or to omit timestamps altogether. In order to report time consistently, every agent must have a real-time clock and be time synchronized. Finally, it is left up to the vendors to decide how often to export data, so an IPFIX collector must understand each vendor's implementation in order to be certain that it has received all the data and to detect data loss.
The sFlow standard requires that data be sent immediately. The stateless nature of the protocol means that data can be combined and timestamps added by the central sFlow collector without any need for timestamps or time synchronization among the agents.

Note: The sFlow datagrams do contain a timestamp: the agent uptime in milliseconds at the time the datagram was sent.
Sampling
IPFIX currently defines eight different algorithms for packet sampling (see IANA Packet Sampling Parameters):
  • Systematic count-based Sampling
  • Systematic time-based Sampling
  • Random n-out-of-N Sampling
  • Uniform probabilistic Sampling
  • Property match Filtering
  • Hash based Filtering using BOB
  • Hash based Filtering using IPSX
  • Hash based Filtering using CRC
Vendors are not required to implement any of these algorithms and are free to invent their own sampling schemes (see NetFlow-lite). In addition, many of the standard algorithms can be shown to be inaccurate.
The sFlow standard mandates a single, statistically valid, sampling algorithm. All sFlow compliant vendors and products implement the same algorithm and produce accurate, interoperable results.
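Because the sampling algorithm is statistically valid, scaling sampled counts up to totals is straightforward and the accuracy of the estimate can be bounded. The sketch below uses the error bound published in the sFlow sampling literature (percentage error at 95% confidence is at most 196·√(1/n) for n samples); the traffic numbers are hypothetical:

```python
import math

def estimate(samples_matched, sampling_rate):
    """Scale a sampled packet count up to an estimate of the total."""
    return samples_matched * sampling_rate

def pct_error(samples_matched):
    """Upper bound on the estimate's percentage error at 95% confidence.

    From the sFlow sampling theory: error <= 196 * sqrt(1/n)."""
    return 196.0 * math.sqrt(1.0 / samples_matched)

# Hypothetical: 400 samples matched HTTP traffic at a 1-in-1024 sampling rate
print(estimate(400, 1024))  # estimated total HTTP packets
print(pct_error(400))       # percent error bound at 95% confidence
```

Note that the accuracy depends only on the number of samples collected, not on the sampling rate or total traffic volume, which is why a single sampling rate can be chosen per link speed and left alone.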
URL
There is no standard IPFIX element for exporting a URL. However, IPFIX does allow vendor extensions, resulting in multiple schemes for exporting URL data. Examples include:
  • nProbe URLs are additional fields that can be included as flow keys when configuring the probe.
  • Dell SonicWall URLs are included in an HTTP specific table and link to flow records.
  • Citrix AppFlow URLs are included in an HTTP request table with links to additional HTTP response and ingress/egress connection tables. 
In each case, in addition to the URL element itself being vendor specific, the information model associated with the exported URLs is also unique, reflecting the internal architecture of the exporting device.
The sFlow standard mandates a set of HTTP counters and transaction attributes that ensures consistent reporting from HTTP aware entities such as web servers (Apache, Tomcat, NGINX etc.) and load balancers (F5 etc.), irrespective of vendor or internal architecture.

Each URL is exported as part of the standard transaction record that includes: client IP, server IP, referrer, authuser, user-agent, mime-type, status, request-bytes, response-bytes, response time. In addition, the sFlow standard defines a unified data model that links measurements from network devices, servers and application instances to provide a comprehensive, data center wide, view of performance.

From the examples in the table, it is apparent that the IPFIX and sFlow standards take two very different approaches. The IPFIX standard is descriptive, defining a standard set of attributes that vendors can use to describe the information that they choose to export. The result is that vendors use IPFIX to differentiate each product, reporting a unique and inconsistent set of measurements based on its internal architecture and product features. In contrast, the sFlow standard is prescriptive, defining a set of measurements that every vendor must implement. While IPFIX provides a way to describe each "snowflake", the sFlow standard results from vendors working together to identify common measurements and implement them in an interoperable way.

Henry Ford transformed the auto industry by moving from hand-made, custom parts to standardized components and processes that allowed for mass production. The data center is undergoing a similar transformation, from small, static, custom environments to large scale, commoditized, flexible, cloud architectures. The sFlow standard delivers the universal performance measurements needed for automation, enjoys broad vendor support, and along with other disruptive technologies like 10G Ethernet, merchant silicon, Software Defined Networking (SDN), OpenFlow, networked storage and virtualization is enabling this transformation.

Tuesday, September 11, 2012

Vendor support

Cisco's recent support for the sFlow standard should come as no surprise. The graph trends the rapid growth in vendor support for sFlow over the last decade. Today, in addition to Cisco, virtually every other major vendor ships products with sFlow, including: HP, IBM, Dell, Juniper, Brocade, Arista, Huawei, Hitachi, AlaxalA, NEC, Alcatel-Lucent, Fortinet, D-Link, NETGEAR, Extreme Networks, Allied Telesis, ZTE, ZyXEL and LG-Ericsson.

Growth would have been even faster, but industry consolidation has combined a number of sFlow vendors: 3Com and H3C are now combined with ProCurve in Hewlett-Packard, Blade Network Technologies is now part of IBM and Force10 joins PowerConnect as part of Dell. However, this consolidation of US vendors is more than offset by adoption of the sFlow standard among emerging Asian vendors, including: Huawei, ZTE and Edge-Core Networks. Additionally, the graph doesn't count merchant silicon vendors, including Broadcom, Marvell and Intel, that implement sFlow support in the ASICs used by many of the switch vendors.

The rise in vendor support for sFlow was initially driven by adoption of 1G Ethernet, and more recent growth has been driven by the accelerating deployment of 10G Ethernet. Looking forward, growth in the number of vendors will slow down - there are very few vendors left that do not support sFlow. However, expect vendors to expand the range of products that support sFlow as new 10G, 40G and 100G Ethernet switches are developed to address increasing demand for bandwidth. Also expect to see increased support for sFlow in wireless networks.

Finally, the sFlow standard provides the end-to-end, multi-vendor visibility needed for effective control of resources in the data center and new technologies like OpenFlow and Software Defined Networking (SDN) are unlocking this potential by allowing networks to automatically adapt to the changing real-time traffic patterns reported by sFlow.

Monday, September 10, 2012

Configuring IBM RackSwitch switches

The following commands configure an IBM RackSwitch, sampling packets at 1-in-1024, polling counters every 30 seconds and sending sFlow to an analyzer over UDP using the default sFlow port (6343):
RS(config)# sflow server
RS(config)# sflow port 6343
RS(config)# sflow enable
For each port:
RS(config)# interface port 1
RS(config-if)# sflow polling 30
RS(config-if)# sflow sampling 1024
A previous posting discussed the selection of sampling rates. Additional information can be found on the IBM web site.

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.

Wednesday, September 5, 2012

GPU performance monitoring

NVIDIA's Compute Unified Device Architecture (CUDA™) dramatically increases computing performance by harnessing the power of the graphics processing unit (GPU). Recently, NVIDIA published the sFlow NVML GPU Structures specification, defining a standard set of metrics for reporting GPU health and performance, and extended the Host sFlow agent to export the GPU metrics.

The following displays the sFlow metrics using sflowtool, the GPU metrics are highlighted:
[pp@test] /usr/local/bin/sflowtool
startDatagram =================================
datagramSize 512
unixSecondsUTC 1346360234
datagramVersion 5
agentSubId 100000
packetSequenceNo 1
sysUpTime 3000
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:2
sampleSequenceNo 1
sourceId 2:1
counterBlock_tag 0:2001
adaptor_0_ifIndex 1
adaptor_0_MACs 1
adaptor_0_MAC_0 000000000000
adaptor_1_ifIndex 2
adaptor_1_MACs 1
adaptor_1_MAC_0 e0cb4e98f891
adaptor_2_ifIndex 3
adaptor_2_MACs 1
adaptor_2_MAC_0 e0cb4e98f890
counterBlock_tag 0:2005
disk_total 145102770176
disk_free 46691696640
disk_partition_max_used 76.06
disk_reads 477615
disk_bytes_read 13102692352
disk_read_time 2227298
disk_writes 2370522
disk_bytes_written 193176428544
disk_write_time 445531146
counterBlock_tag 0:2004
mem_total 12618829824
mem_free 2484174848
mem_shared 0
mem_buffers 971259904
mem_cached 8214761472
swap_total 12580810752
swap_free 12580810752
page_in 6400433
page_out 94324428
swap_in 0
swap_out 0
counterBlock_tag 5703:1
nvml_device_count 1
nvml_processes 0
nvml_gpu_mS 0
nvml_mem_mS 0
nvml_mem_bytes_total 6441598976
nvml_mem_bytes_free 6429614080
nvml_ecc_errors 0
nvml_energy_mJ 74569
nvml_temperature_C 54
nvml_fan_speed_pc 30
counterBlock_tag 0:2003
cpu_load_one 0.040
cpu_load_five 0.240
cpu_load_fifteen 0.350
cpu_proc_run 0
cpu_proc_total 229
cpu_num 8
cpu_speed 1600
cpu_uptime 896187
cpu_user 21731800
cpu_nice 120230
cpu_system 5686620
cpu_idle 2844149774
cpu_wio 2992230
cpu_intr 570
cpu_sintr 222180
cpu_interrupts 166594944
cpu_contexts 266986130
counterBlock_tag 0:2006
nio_bytes_in 0
nio_pkts_in 0
nio_errs_in 0
nio_drops_in 0
nio_bytes_out 0
nio_pkts_out 0
nio_errs_out 0
nio_drops_out 0
counterBlock_tag 0:2000
hostname test0
UUID 00000000000000000000000000000000
machine_type 3
os_name 2
endSample   ----------------------
endDatagram   =================================
Note: Currently only the Linux version of Host sFlow includes GPU support, and the agent needs to be compiled from sources on a system that includes the NVML library.
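The textual output shown above is easy to post-process. The following is a minimal sketch that collects the name/value pairs and filters out the NVML counters; it assumes only the simple "name value" line format that sflowtool prints:

```python
def parse_sflowtool(lines):
    """Collect 'name value' pairs from sflowtool's text output."""
    metrics = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # skip startSample/endSample separator lines
            name, value = parts
            metrics[name] = value
    return metrics

# A short excerpt of the output shown above
sample_output = """\
counterBlock_tag 5703:1
nvml_device_count 1
nvml_temperature_C 54
nvml_fan_speed_pc 30
""".splitlines()

metrics = parse_sflowtool(sample_output)
gpu = {k: v for k, v in metrics.items() if k.startswith("nvml_")}
print(gpu)
# -> {'nvml_device_count': '1', 'nvml_temperature_C': '54', 'nvml_fan_speed_pc': '30'}
```

In practice sflowtool would be run as a subprocess and its stdout streamed into a parser like this, feeding the metrics to a time-series database or alerting system.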

The inclusion of GPU metrics in Host sFlow offers an extremely scalable, lightweight solution for monitoring compute cluster performance. In addition to exporting a comprehensive set of standard performance metrics, the Host sFlow agent also offers a convenient API for exporting custom application metrics.

The sFlow standard isn't limited to monitoring compute resources; most network switch vendors include sFlow support, providing detailed visibility into cluster communication patterns and network utilization. Combining sFlow from switches, servers and applications delivers a comprehensive view of cluster performance.

Tuesday, September 4, 2012

Configuring ZyXEL switches

The following commands configure a ZyXEL switch, sampling packets at 1-in-1024, polling counters every 30 seconds and sending sFlow to an analyzer over UDP using the default sFlow port (6343):
sysname(config)# sflow
sysname(config)# sflow collector udp-port 6343
sysname(config)# interface port-channel 1-24
sysname(config-interface)# sflow collector poll-interval 30 sampling-rate 1024
sysname(config-interface)# exit
sysname(config)# exit
A previous posting discussed the selection of sampling rates. Additional information can be found on the ZyXEL web site.

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.