Thursday, November 1, 2012
Finding elephants
The Blind Men and the Elephant
John Godfrey Saxe (1816-1887)
It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.
The First approached the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
"God bless me! but the Elephant
Is very like a wall!"
The Second, feeling of the tusk,
Cried, "Ho, what have we here,
So very round and smooth and sharp?
To me 'tis mighty clear
This wonder of an Elephant
Is very like a spear!"
The Third approached the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up and spake:
"I see," quoth he, "the Elephant
Is very like a snake!"
The Fourth reached out an eager hand,
And felt about the knee
"What most this wondrous beast is like
Is mighty plain," quoth he:
"'Tis clear enough the Elephant
Is very like a tree!"
The Fifth, who chanced to touch the ear,
Said: "E'en the blindest man
Can tell what this resembles most;
Deny the fact who can,
This marvel of an Elephant
Is very like a fan!"
The Sixth no sooner had begun
About the beast to grope,
Than seizing on the swinging tail
That fell within his scope,
"I see," quoth he, "the Elephant
Is very like a rope!"
And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right,
And all were in the wrong!
MORAL.
So oft in theologic wars,
The disputants, I ween,
Rail on in utter ignorance
Of what each other mean,
And prate about an Elephant
Not one of them has seen!
This poem is just one version of the popular Blind Men and the Elephant story. There are many conclusions that you can draw from the story, but this version captures the idea that you can't properly understand something unless you have seen it in its entirety.
Similar arguments occur among data center operations teams advocating for different performance monitoring technologies such as NetFlow, IPFIX, SNMP, WMI, JMX, libvirt etc. Many of these arguments arise because teams are siloed, with each group arguing from their narrow perspective and dismissing the concerns of other teams. While narrowly focused tools have their place, they miss the big picture.
Cloud-based architectures tightly couple network, storage and compute resources into a large scale, flexible, platform for delivering application services. The article, System boundary, describes the need for a holistic approach to organizing and managing data center resources. A comprehensive, cloud-oriented, approach to monitoring is essential for troubleshooting performance problems, automating operations and fully exploiting the potential of cloud architectures to increase efficiency by adapting to changing demand.
The sFlow standard addresses the challenge of cloud monitoring by embedding instrumentation throughout the cloud infrastructure in order to provide a comprehensive, real-time, view of the performance of all the individual network, server and application resources as well as the cloud as a whole. The sFlow architecture is designed to see elephants, not just individual body parts. It's only by gaining this comprehensive view of performance that large scale cloud data center environments can be properly understood and managed.
Friday, October 19, 2012
Using Ganglia to monitor GPU performance
The Ganglia charts show GPU health and performance metrics collected using sFlow, see GPU performance monitoring. The combination of Ganglia and sFlow provides a highly scalable solution for monitoring the performance of large GPU-based compute clusters, eliminating the need to poll for GPU metrics. Instead, all the host and GPU metrics are efficiently pushed directly to the central Ganglia collector.
The screen capture shows the new GPU metrics, including:
- Processes
- GPU Utilization
- Memory R/W Utilization
- ECC Errors
- Power
- Temperature
Note: Support for the GPU metrics is currently only available in Ganglia if you compile gmond from the latest development sources.
Tuesday, October 9, 2012
sFlowTrend adds web server monitoring
This chart was generated using the free sFlowTrend application to monitor an Apache web server using the sFlow standard. The chart shows a real-time, minute by minute view of Top URIs by Operations/s for a busy web server. What is interesting about the chart is the sudden drop off in total operations per second over the last few minutes.
The drop in throughput can be verified by examining the standard HTTP performance counters that are exported using sFlow's efficient push mechanism. The Counters chart above shows the same drop in throughput.
There are a couple of possible explanations that come to mind; the first is that the size of pages has increased, possibly because large images were added.
The Top URI extensions by Bytes/s chart shown above makes it clear that the proportion of image data hasn't changed and that the overall data rate has fallen, so the drop in throughput doesn't appear to be a bandwidth problem.
Another possibility is that there has been an increase in server latency. The Top URIs by Duration chart above shows a recent increase in the latency of the http://10.0.0.150/login.php page.
At this point the problem can probably be resolved by talking with the application team to see if they have made any recent changes to the login page. However, there is additional information available that might help further diagnose the problem.
Host sFlow agents installed on the servers provide a scalable way of monitoring performance. The CPU utilization chart above shows a drop in CPU load on the web server that coincides with the reduced web throughput. It appears that the performance problem isn't related to web server CPU, but is likely the result of requests to a slow backend system.
Note: If it had been a CPU related issue, we might have expected that the latency would have increased for all URIs, not just the login.php page.
Network visibility is a critical component of application performance monitoring. In this case, network traffic data can help by identifying the backend systems that the web server depends on. Fortunately, most switch vendors support the sFlow standard and the traffic data is readily accessible in sFlowTrend.
The Top servers chart above shows the top services and servers by Frames/s. The drop in traffic to the web server, 10.0.0.150, is readily apparent, as is a drop in traffic to the Memcached server, 10.0.0.151 (TCP:11211). The Memcached server is used to cache the results of database queries in order to improve site performance and scalability. The performance problem doesn't seem to be directly related to Memcached, since the amount of Memcached traffic has dropped proportionally with the HTTP traffic (an increase in Memcached traffic might have indicated that the Memcached server was overloaded).
A final piece of information available through sFlow is the link utilization trend, which confirms that the drop in performance isn't due to a lack of network capacity.
At this point we have a pretty thorough understanding of the impact of the problem on application, server and network resources. Talking to the developers reveals a recent update to the login.php script that introduced a software bug that failed to properly cache information. The resulting increase in load to the database was causing the login page to load slowly and resulted in the drop in site throughput. Fixing the bug returned site performance to normal levels.
Note: This example is a recreation of a typical performance problem using real servers and switches generating sFlow data. However, the load is artificially generated using Apache JMeter since actual production data can't be shown.
Trying out sFlow monitoring on your own site is easy. The sFlowTrend application is a free download. There are open source sFlow modules available for popular web servers, including: Apache, NGINX, Tomcat and node.js. The open source Host sFlow agent runs on most operating systems and enabling sFlow on switches is straightforward (see sFlow.org for a list of switches supporting the sFlow standard). The article, Choosing an sFlow analyzer, provides additional information for large scale deployments.
Saturday, October 6, 2012
Thread pools
Figure 1: Thread pool
The recently finalized sFlow Application Structures specification defines a standard set of metrics for reporting on thread pools:
- Active Threads: The number of threads in the thread pool that are actively processing a request.
- Idle Threads: The number of threads in the thread pool that are waiting for a request.
- Maximum Threads: The maximum number of threads that can exist in the thread pool.
- Delayed Tasks: The number of tasks that could not be served immediately, but spent time in the task queue.
- Dropped Tasks: The number of tasks that were dropped because the task queue was full.
The grid of characters on Apache's server-status page (referred to as the "scoreboard") is used to visualize the state of the pool; each cell in the grid represents a slot for a thread and the size of the grid shows the maximum number of threads that are permitted in the pool. The summary line above the grid states that 6 requests are currently being processed and that there are 69 idle workers (i.e. there are six "W" characters and sixty-nine "_" characters in the grid).
While the server-status page isn't designed to be machine readable, the information is critical and there are numerous performance monitoring tools that make HTTP requests and extract the worker pool statistics from the text. A much more efficient way to retrieve the information is to use the Apache sFlow module, which in addition to reporting the thread pool statistics will export HTTP counters, URLs, response times, status codes, etc.
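For scripting purposes, the scoreboard text can also be summarized directly. The following is a minimal sketch, not any particular tool's implementation: it counts the "W" and "_" cells described above to produce the thread pool metrics, and the example scoreboard string is hypothetical.

```python
# Minimal sketch: summarize an Apache-style scoreboard string into thread
# pool metrics. Assumes 'W' marks a worker actively processing a request,
# '_' an idle worker, and '.' an open (unused) slot; the grid length is the
# maximum number of workers. The example string is hypothetical.
def scoreboard_to_pool_metrics(scoreboard: str) -> dict:
    return {
        'active_threads': scoreboard.count('W'),
        'idle_threads': scoreboard.count('_'),
        'max_threads': len(scoreboard),   # one cell per permitted worker slot
    }

example = 'W' * 6 + '_' * 69 + '.' * 25   # 6 busy, 69 idle, 25 open slots
print(scoreboard_to_pool_metrics(example))
# {'active_threads': 6, 'idle_threads': 69, 'max_threads': 100}
```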
The article, Using Ganglia to monitor web farms, describes how to use the open source Ganglia performance monitoring software to collect and report on web server clusters using sFlow. Ganglia now includes support for the sFlow thread pool metrics.
Figure 2: Ganglia chart showing active threads from an Apache web server
Monitoring thread pools using sFlow is very useful, but only scratches the surface of what is possible. The sFlow standard is widely supported by network equipment vendors and can be combined with sFlow metrics from hosts, services and applications to provide a comprehensive view of data center performance.
Monday, October 1, 2012
Link aggregation
Figure 1: Link Aggregation Groups
Note: There is much confusion caused by the many different names used to describe link aggregation, including Port Grouping, Port Trunking, Link Bundling, NIC/Link Bonding, NIC/Link Teaming etc. These are all examples of link aggregation and the discussion in this article applies to all of them.
Figure 1 shows a number of common uses for link aggregation. Switches A, B, C and D are interconnected by LAGs, each of which is made up of four individual links. In this case the LAGs are used to provide greater bandwidth between switches at the network core.
A LAG generally doesn't provide the same performance characteristics as a single link with equivalent capacity. In this example, suppose that the LAGs are 4 x 10 Gigabit Ethernet. The LAG needs to ensure in-order delivery of packets since many network protocols perform badly when packets arrive out of order (e.g. TCP). Packet header fields are examined and used to assign all packets that are part of a connection to the same link within the aggregation group. The result is that the maximum bandwidth available to any single connection is 10 Gigabits per second, not 40 Gigabits per second. The LAG can carry 40 Gigabits per second, but the traffic must be a mixture of connections.
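The link selection step can be illustrated with a short sketch: hash the fields that identify a connection and use the result to pick a member link, so all packets in the connection take the same 10 Gigabit path. This is a simplified illustration, not any particular vendor's hash algorithm.

```python
# Simplified illustration of LAG member selection: hash the connection
# identifiers and map the result onto one of the member links. Real switches
# use vendor-specific hardware hash functions; the point here is only that a
# single connection always lands on the same member link.
import zlib

def select_member_link(src_ip, dst_ip, src_port, dst_port, protocol, num_links=4):
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    return zlib.crc32(key) % num_links

# Every packet of this TCP connection is forwarded on the same 10G member link:
print(select_member_link("10.0.0.1", "10.0.0.2", 33000, 80, "tcp"))
```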
The alternative of a single 40G Ethernet link allows a single connection to use the full bandwidth of the link and transfer data at 40 Gigabits per second. However, the LAG is potentially more resilient, since a link failure will simply reduce the LAG capacity by 25% and the two switches will still have connectivity. On the other hand the LAG involves four times as many links and so there is an increased likelihood of link failures.
Servers are often connected to two separate switches to ensure that if one switch fails, the server has backup connectivity through the second switch. In this example, servers A and B are connected to switches C and D. A limitation of this approach is that the backup link is idle and the bandwidth isn't available to the server.
A Multi-chassis Link Aggregation Group (MLAG) allows the server to actively use both links, treating them as a single, high capacity LAG. The "multi-chassis" part of the name refers to what happens at the other end of the link. The two switches C and D communicate with each other in order to handle the two links as if they were arriving at a single switch as part of a conventional LAG, ensuring in-order delivery of packets etc.
There is no standard for logically combining the switches to support MLAGs - each vendor has their own approach (e.g. Hewlett-Packard Intelligent Redundant Framework (IRF), Cisco Virtual Switching System (VSS), Cisco Virtual PortChannel (vPC), Arista MLAG domains, Dell/Force10 VirtualScale (VS) etc.). However, as far as the servers are concerned the network adapters are combined (or bonded) to form a simple LAG that provides the benefit of increased bandwidth and redundancy. However, a potential drawback of actively using both adapters is an increased vulnerability to failures, since bandwidth will drop by 50% during a failure, potentially triggering congestion related service problems.
MLAGs aren't restricted to the server access layer. Looking at Figure 1, if switches A and B share control information and switches C and D share control information, it is possible to aggregate links into two groups of 8, or even a single group of 16. One of the benefits of aggregating core links is that the topology can become logically "loop free", ensuring fast convergence in the event of a link failure and relegating spanning tree to provide protection against configuration errors.
Based on the discussion, it should be clear that managing the performance of LAGs requires visibility into network traffic patterns and paths through the LAGs and member links, visibility into link utilizations and the balance between group members, and visibility into the health of each link.
The LAG extension to the sFlow standard builds on the detailed visibility that sFlow already provides into switched network traffic, adding detail about LAG topology and health. The IEEE 802.3 LAG MIB defines the set of objects describing elements of the LAG and counters that can be used to monitor LAG health. The sFlow LAG extension simply maps values defined in the MIB into an sFlow counter structure that is exported using sFlow's scalable "push" mechanism, allowing large scale monitoring of LAG based network architectures.
The new measurements are best understood by examining a single aggregation group.
Figure 2: Detail of a Link Aggregation Group
LACP associates a System ID with each switch. The system ID is simply a vendor assigned MAC address that is unique to each switch. In this example, Switch A has the System ID 000000000010 and Switch B has the ID 000000000012.
Each switch assigns an Aggregation ID, or logical port number, to the group of physical ports. Switch A identifies the LAG as port 501 and Switch C identifies the LAG as port 512.
The following sflowtool output shows what an interface counter sample exported by Switch A, reporting on physical port 2, would look like:
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 110521
sourceId 0:2
counterBlock_tag 0:1
ifIndex 2
networkType 6
ifSpeed 100000000
ifDirection 1
ifStatus 3
ifInOctets 35293750622
ifInUcastPkts 241166136
ifInMulticastPkts 831459
ifInBroadcastPkts 11589475
ifInDiscards 0
ifInErrors 0
ifInUnknownProtos 0
ifOutOctets 184200359626
ifOutUcastPkts 375811771
ifOutMulticastPkts 1991731
ifOutBroadcastPkts 5001804
ifOutDiscards 63606
ifOutErrors 0
ifPromiscuousMode 1
counterBlock_tag 0:2
dot3StatsAlignmentErrors 1
dot3StatsFCSErrors 0
dot3StatsSingleCollisionFrames 0
dot3StatsMultipleCollisionFrames 0
dot3StatsSQETestErrors 0
dot3StatsDeferredTransmissions 0
dot3StatsLateCollisions 0
dot3StatsExcessiveCollisions 0
dot3StatsInternalMacTransmitErrors 0
dot3StatsCarrierSenseErrors 0
dot3StatsFrameTooLongs 0
dot3StatsInternalMacReceiveErrors 0
dot3StatsSymbolErrors 0
counterBlock_tag 0:7
actorSystemID 000000000010
partnerSystemID 000000000012
attachedAggID 501
actorAdminPortState 5
actorOperPortState 61
partnerAdminPortState 5
partnerOperPortState 61
LACPDUsRx 11
markerPDUsRx 0
markerResponsePDUsRx 0
unknownRx 0
illegalRx 0
LACPDUsTx 19
markerPDUsTx 0
markerResponsePDUsTx 0
endSample ----------------------
The LAG MIB should be consulted for detailed descriptions of the fields. For example, refer to the following LacpState definition from the MIB to understand the operational port state values:

LacpState ::= TEXTUAL-CONVENTION
    STATUS current
    DESCRIPTION "The Actor and Partner State values from the LACPDU."
    SYNTAX BITS {
        lacpActivity(0),
        lacpTimeout(1),
        aggregation(2),
        synchronization(3),
        collecting(4),
        distributing(5),
        defaulted(6),
        expired(7)
    }

In the sflowtool output the actor (local) and partner (remote) operational state associated with the LAG member is 61, which is 111101 in binary. This value indicates that the lacpActivity(0), aggregation(2), synchronization(3), collecting(4) and distributing(5) bits are set - i.e. the link is healthy.
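The bit test described above is easy to automate. The following sketch decodes an operational port state value using the bit positions from the LacpState definition, following the interpretation used above (illustrative only).

```python
# Decode an LACP port state value using the bit positions from the LacpState
# TEXTUAL-CONVENTION quoted above (bit 0 = lacpActivity ... bit 7 = expired).
LACP_STATE_BITS = [
    'lacpActivity', 'lacpTimeout', 'aggregation', 'synchronization',
    'collecting', 'distributing', 'defaulted', 'expired',
]

def decode_lacp_state(state: int) -> list:
    return [name for bit, name in enumerate(LACP_STATE_BITS) if state & (1 << bit)]

# The actorOperPortState / partnerOperPortState value from the sflowtool output:
print(decode_lacp_state(61))
# ['lacpActivity', 'aggregation', 'synchronization', 'collecting', 'distributing']
```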
While this article discussed the low level details of LAG monitoring, performance management tools should automate this analysis and allow the health and performance of all the LAGs to be tracked. In addition, sFlow integrates LAG monitoring with measurements of traffic flows, server activity and application response times to provide comprehensive visibility into data center performance. The Data center convergence, visibility and control presentation describes the critical role that measurement plays in managing costs and optimizing performance.
Today, almost every switch vendor offers products that implement the sFlow standard. If you make use of link aggregation, ask your switch vendor to add support for the LAG extension. Implementing the sFlow LAG extension is straightforward if they already support the IEEE LAG MIB.
Saturday, September 22, 2012
Switch configurations
A large number of articles describing the steps to configure sFlow traffic monitoring have been published on this blog over the last few years. This article compiles a set of links in order to make the configurations easier to find.
Note: Trying out sFlow is easy, just follow the instructions to configure sFlow export and install the free sFlowTrend analyzer to gain real-time visibility - providing immediate answers to the Who, What, Where, When, Why and How questions that are the key to effective management.
- AlaxalA
- Alcatel-Lucent
- Allied Telesis
- Arista
- Brocade
- Cisco 8000 Series Routers, ASR 9000 Series Routers, NCS 5500 Series Routers, Nexus, SF250, SG250, SF350, SG350, SG350XG, and SG550XG
- Cumulus Networks
- Dell Force10, PowerConnect
- D-Link
- Edge-Core
- Enterasys
- Extreme
- F5
- Fortinet
- Hewlett-Packard H3C, ProCurve
- Hitachi
- Huawei
- IBM RackSwitch
- Juniper
- LG-ERICSSON
- Mellanox
- Mininet
- NEC
- NETGEAR
- Nokia Service Router Linux
- OpenSwitch
- Pica8 Pronto
- Quanta
- SONiC
- Vyatta
- VyOS
- ZTE
- ZyXEL
Wednesday, September 19, 2012
Packets and Flows
Figure 1: Sending a picture over a packet switched network
In the example, Host A is responsible for breaking up the picture into parts and transmitting the packets. Host B is responsible for re-constructing the picture, detecting parts that are missing, corrupted, or delivered out of order and sending acknowledgement packets back to Host A, which is then responsible for resending packets if necessary.
The packet switches are responsible for transmitting, or forwarding, the packets. Each packet switch examines the destination address (e.g. To: B) and sends the packet on a link that will take the packet closer to its destination. The switches are unaware of the contents of the packets they forward, in this case the picture fragments and part numbers. Figure 1 only shows packets relating to the image transfer between Host A and Host B, but in reality the switches will be simultaneously forwarding packets from many other hosts.
Figure 2: Sorting mail
In a packet switched network, each host and switch has a different perspective on data transfers and maintains different state in order to perform its task. Managing the performance of the communication system requires a correct understanding of the nature of the task that each element is responsible for and a way to monitor how effectively that task is being performed.
As an example of a poorly fitting model, consider the concept of "flow records" that are often presented as an intuitive way to monitor and understand traffic on packet switched networks. Continuing our example, the data transfer would be represented by two flow records, one accounting for packets from Host A to Host B and another accounting for packets from Host B to Host A.
Figure 3: Telephone operator
Viewing packet switches through the lens of a circuit oriented measurement is misleading. Start by considering the steps that the mail sorter in Figure 2 would have to go through in order to create flow records. The mail sorter would be required to keep track of the From: and To: address information on each letter, count the number of letters that Host A sent to Host B, and open the letters to peek inside and decide whether each letter was part of an existing conversation or the start of a new one. This task is extremely cumbersome and error prone, and the resulting flow records don't monitor the task that the mail sorter is actually performing; for example, flow records won't tell you how many postcards, letters and parcels were sorted.
Packet and circuit switched networks have very different characteristics and an effective monitoring system will collect measurements that are relevant to the performance of the network:
- Circuit switches have a limited number of connections that they can handle and if more calls are attempted, calls are blocked (i.e. receive a busy signal). Blocking probabilities and sizing of circuit switches are analyzed using Erlang calculations.
- Packet switches don't block. Instead packets are interleaved as they are forwarded. If the number of packets arriving exceeds the forwarding capacity of the switch, then packets may be delayed as they wait to be serviced, or be discarded if there are too many packets already waiting in the queue. Queuing delays and packet discard probabilities are analyzed using queuing theory. Both styles of calculation are sketched below.
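To make the two analysis styles concrete, the sketch below computes an Erlang B blocking probability for a circuit switch and the mean delay of an M/M/1 queue for a packet switch. The formulas are the standard textbook ones; the traffic figures are invented for illustration.

```python
# Erlang B blocking probability (circuit switch) and M/M/1 mean delay
# (packet switch). Standard textbook formulas; example numbers are invented.

def erlang_b(offered_erlangs: float, circuits: int) -> float:
    """Probability that a new call is blocked, via the standard recursion."""
    b = 1.0
    for n in range(1, circuits + 1):
        b = (offered_erlangs * b) / (n + offered_erlangs * b)
    return b

def mm1_mean_delay(arrival_rate: float, service_rate: float) -> float:
    """Mean time a packet spends in an M/M/1 queue (waiting plus service)."""
    assert arrival_rate < service_rate, "queue is unstable at or above 100% load"
    return 1.0 / (service_rate - arrival_rate)

print(erlang_b(offered_erlangs=40.0, circuits=50))              # call blocking probability
print(mm1_mean_delay(arrival_rate=80000, service_rate=100000))  # seconds per packet
```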
The following table compares the switch, host and application measurements provided by sFlow and NetFlow:
| | sFlow | NetFlow |
|---|---|---|
| Switch | Each switch exports packet oriented measurements: interface counters and randomly sampled packet headers with their associated forwarding decisions. | Switches export connection oriented flow records that include source address, destination address, protocol, bytes, packets and duration. Note: Many switches aren't capable of making these measurements and so go unmonitored. |
| Host | The server exports standard host metrics, including: CPU, memory and disk performance. | None. NetFlow is generally only implemented in network devices. |
| Application | The web server exports standard HTTP metrics that include request counts and randomly sampled web requests, providing detailed information such as the URL, referrer, server address, client address, user, browser, request bytes, response bytes, status and duration. The web server also reports maximum, idle and active workers. | None. NetFlow is typically only implemented in network devices. |
NetFlow takes a network centric view of measurement and tries to infer application behavior by examining packets in the network. NetFlow imposes a stateful, connection oriented, model on core devices that should be stateless. Unfortunately, the resulting flow measurements aren't a natural fit for packet switches, providing a distorted view of the operation of these devices. For example, the switch makes forwarding decisions on a packet by packet basis and these decisions can change over the lifetime of a flow. The packet oriented measurements made by sFlow accurately capture forwarding decisions, but flow oriented measurement can be misleading. Another example builds on the mail sorting analogy in Figure 2: packet oriented measurements support analysis of small, large and jumbo frames (postcards, letters and parcels), but this detail is lost in flow records.
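As a rough sketch of how packet oriented measurements are used, sampled packet headers can be scaled up by the sampling rate to estimate total frames in each size class (the "postcards, letters and parcels" of the analogy). The sample records below are invented for illustration and ignore refinements such as computing an effective sampling rate from the reported sample pool.

```python
# Estimate frame counts by size class from randomly sampled packet headers.
# Multiplying sampled counts by the sampling rate gives an estimate of the
# totals; the sample records below are invented for illustration.
SAMPLING_RATE = 1024   # 1-in-1024 packet sampling

sampled_frame_lengths = [64, 1518, 9000, 64]

def size_class(length: int) -> str:
    if length <= 128:
        return 'small'     # "postcards"
    if length <= 1518:
        return 'large'     # "letters"
    return 'jumbo'         # "parcels"

estimated_frames = {}
for length in sampled_frame_lengths:
    cls = size_class(length)
    estimated_frames[cls] = estimated_frames.get(cls, 0) + SAMPLING_RATE

print(estimated_frames)   # {'small': 2048, 'large': 1024, 'jumbo': 1024}
```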
Flows are an abstraction that is useful for understanding end-to-end traffic traversing the packet switches. However, the flow abstraction describes connections created by the communication end points and to properly measure connection performance, one needs to instrument those end points. Hosts are responsible for initiating and terminating flows and are a natural place to report flows, but the traditional flow model ignores important detail that the host can provide. For example, the host is in a position to include important details about services such as user names, URLs, response times, and status codes as well as information about the computational resources needed to deliver the services; information that is essential for managing service capacity, utilization and response times.
While NetFlow is network centric and tries to infer information about applications from network packets (which is becoming increasingly difficult as more traffic is encrypted), the sFlow standard takes a systems approach, exposing information from network, servers and applications in order to provide a comprehensive view of performance.
Measurement isn't simply about producing pretty charts. The ultimate goal is to be able to act on the measurements and control performance. Control requires a model of behavior that allows performance to be predicted, and measurements that characterize demand and show how closely the system performance matches the predictions. The sFlow standard is well suited to automation, providing comprehensive measurements based on models of network, server and application performance. The Data center convergence, visibility and control presentation describes the critical role that measurement plays in managing costs and optimizing performance.
Tuesday, September 18, 2012
Configuring Hitachi switches
The following commands configure a Hitachi switch, sampling packets at 1-in-1024, polling counters every 30 seconds and sending sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343):
(config)# sflow yes
[sflow] (config)# destination 10.0.0.50 udp 6343
[sflow] (config)# extended-information-type switch
[sflow] (config)# polling-interval 30
[sflow] (config)# port 1/1-24
[sflow port 1/1-24] (config)# sample 1024
[sflow port 1/1-24] (config)# exit
[sflow] (config)# exit

A previous posting discussed the selection of sampling rates. Additional information can be found on the Hitachi web site.
See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.
Sunday, September 16, 2012
Host sFlow progress
The open source Host sFlow project has made significant progress over the last two years. The rapidly increasing number of downloads reflects the project's maturity and the unique functionality delivered by the sFlow standard in unifying network, server and application performance monitoring of large scale cloud environments.
Starting with Linux, the Host sFlow project has added support for FreeBSD, Solaris and Windows. Installed on hypervisors, Host sFlow reports on the performance of the hypervisor and all the virtual machines. Currently supported hypervisors include: Microsoft Hyper-V, Citrix XenServer, Xen Cloud Platform (XCP), KVM and libvirt. In addition, the Host sFlow agent is extensible and a growing number of projects implement sFlow instrumentation in virtual switches (Open vSwitch, Hyper-V extensible virtual switch), popular web servers (Apache, NGINX), Memcached, application servers (Java, Tomcat) and even in-house applications written in scripting languages (PHP, Python, Ruby, Perl).
Popular open source projects such as Ganglia and Graphite offer scalable collection and reporting of sFlow metrics from large scale compute clusters, virtual machine pools, web farms, Java application servers and Memcached clusters.
Deploying Host sFlow agents on servers extends the sFlow monitoring built into networking devices from leading vendors, including: IBM, HP, Dell, Cisco, Juniper, Brocade, F5, Alcatel-Lucent, Arista Networks, Allied Telesis, Extreme Networks, Fortinet, Hitachi, Huawei, NEC, ZTE and others. The combination of sFlow in servers and switches delivers integrated, end-to-end visibility into cloud computing, software defined networking and converged storage.
Saturday, September 15, 2012
Configuring AlaxalA switches
The following commands configure an AlaxalA switch, sampling at 1-in-1024, polling counters every 30 seconds and sending sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343):
(config)# sflow destination 10.0.0.50 6343
(config)# sflow extended-information-type switch
(config)# sflow sample 1024
(config)# sflow polling-interval 30

For each port:
(config)# interface gigabitethernet 0/4
(config-if)# sflow forward ingress

A previous posting discussed the selection of sampling rates. Additional information can be found on the AlaxalA web site.
See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.
Wednesday, September 12, 2012
Snowflakes, IPFIX, NetFlow and sFlow
Snow flakes by Wilson Bentley
The following table examines the approaches taken by the IPFIX and sFlow standards by contrasting how they handle four basic aspects of measurement.
Note: The IPFIX standard is based on Cisco's NetFlow™ version 9 protocol and most of the points of comparison apply equally to NetFlow.
| | IPFIX | sFlow |
|---|---|---|
| Packet | IPFIX currently defines over 50 fields relating to packet headers (see IP Flow Information Export (IPFIX) Entities). | The sFlow standard specifies a single way to report packet attributes, the packet header, ensuring that every vendor and product produces compatible results. Every sFlow compatible device deployed since the sFlow standard was published in 2001 provides visibility into every protocol that has ever, or will ever, run over Ethernet. The packet header includes all the protocol fields exported by IPFIX as well as fields associated with emerging protocols such as FCoE, AoE, TRILL, NVGRE and VxLAN that have yet to be defined in IPFIX. |
| Time | IPFIX has over 30 elements that can be used to represent time (see IP Flow Information Export (IPFIX) Entities). | The sFlow standard requires that data be sent immediately. The stateless nature of the protocol means that data can be combined and timestamps added by the central sFlow collector without any need for timestamps or time synchronization among the agents. Note: The sFlow datagrams do contain a time stamp, the agent uptime in milliseconds at the time the datagram was sent. |
| Sampling | IPFIX currently defines eight different algorithms for packet sampling (see IANA Packet Sampling Parameters). | The sFlow standard mandates a single, statistically valid, sampling algorithm. All sFlow compliant vendors and products implement the same algorithm and produce accurate, interoperable results. |
| URL | There is no standard IPFIX element for exporting a URL. However, IPFIX does allow vendor extensions, resulting in multiple schemes for exporting URL data. | The sFlow standard mandates a set of HTTP counters and transaction attributes that ensures consistent reporting from HTTP aware entities such as web servers (Apache, Tomcat, NGINX etc.) and load balancers (F5 etc.), irrespective of vendor or internal architecture. Each URL is exported as part of the standard transaction record that includes: client IP, server IP, referrer, authuser, user-agent, mime-type, status, request-bytes, response-bytes, response time. In addition, the sFlow standard defines a unified data model that links measurements from network devices, servers and application instances to provide a comprehensive, data center wide, view of performance. |
From the examples in the table, it is apparent that the IPFIX and sFlow standards take two very different approaches. The IPFIX standard is descriptive, defining a standard set of attributes that vendors can use to describe the information that they choose to export. The result is that vendors use IPFIX to differentiate each product, reporting a unique and inconsistent set of measurements based on its internal architecture and product features. In contrast, the sFlow standard is prescriptive, defining a set of measurements that every vendor must implement. While IPFIX provides a way to describe each "snowflake", the sFlow standard results from vendors working together to identify common measurements and implement them in an interoperable way.
Henry Ford transformed the auto industry by moving from hand-made, custom parts to standardized components and processes that allowed for mass production. The data center is undergoing a similar transformation, from small, static, custom environments to large scale, commoditized, flexible, cloud architectures. The sFlow standard delivers the universal performance measurements needed for automation, enjoys broad vendor support, and along with other disruptive technologies like 10G Ethernet, merchant silicon, Software Defined Networking (SDN), OpenFlow, networked storage and virtualization is enabling this transformation.
Tuesday, September 11, 2012
Vendor support
Cisco's recent support for the sFlow standard should come as no surprise. The graph trends the rapid growth in vendor support for sFlow over the last decade. Today, in addition to Cisco, virtually every other major vendor ships products with sFlow, including: HP, IBM, Dell, Juniper, Brocade, Arista, Huawei, Hitachi, AlaxalA, NEC, Alcatel-Lucent, Fortinet, D-Link, NETGEAR, Extreme Networks, Allied Telesis, ZTE, ZyXEL and LG-ERICSSON.
Growth would have been even faster but industry consolidation has combined a number of sFlow vendors; 3Com and H3C are now combined with ProCurve in Hewlett-Packard, Blade Network Technologies is now part of IBM and Force10 joins PowerConnect as part of Dell. However, this consolidation of US vendors is more than offset by adoption of the sFlow standard among emerging Asian vendors, including: Huawei, ZTE and Edge-Core Networks. Additionally, the graph doesn't count merchant silicon vendors, including Broadcom, Marvell and Intel, that implement sFlow support in the ASICs used by many of the switch vendors.
The rise in vendor support for sFlow was initially driven by the adoption of 1G Ethernet, and more recent growth has been driven by the accelerating deployment of 10G Ethernet. Looking forward, the growth in the number of vendors will slow down - there are very few vendors left that do not support sFlow. However, expect vendors to expand the range of products that support sFlow as new 10G, 40G and 100G Ethernet switches are developed to address increasing demand for bandwidth. Also expect to see increased support for sFlow in wireless networks.
Finally, the sFlow standard provides the end-to-end, multi-vendor visibility needed for effective control of resources in the data center and new technologies like OpenFlow and Software Defined Networking (SDN) are unlocking this potential by allowing networks to automatically adapt to the changing real-time traffic patterns reported by sFlow.
Monday, September 10, 2012
Configuring IBM RackSwitch switches
The following commands configure an IBM RackSwitch, sampling packets at 1-in-1024, polling counters every 30 seconds and sending sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343):
RS(config)# sflow server 10.0.0.50
RS(config)# sflow port 6343
RS(config)# sflow enable

For each port:
RS(config)# interface port 1
RS(config-if)# sflow polling 30
RS(config-if)# sflow sampling 1024

A previous posting discussed the selection of sampling rates. Additional information can be found on the IBM web site.
See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.
Wednesday, September 5, 2012
GPU performance monitoring
NVIDIA's Compute Unified Device Architecture (CUDA™) dramatically increases computing performance by harnessing the power of the graphics processing unit (GPU). Recently, NVIDIA published the sFlow NVML GPU Structures specification, defining a standard set of metrics for reporting GPU health and performance, and extended the Host sFlow agent to export the GPU metrics.
The following output displays the sFlow metrics using sflowtool; the GPU metrics are the nvml_ prefixed counters:
[pp@test] /usr/local/bin/sflowtool
startDatagram =================================
datagramSourceIP 10.0.0.150
datagramSize 512
unixSecondsUTC 1346360234
datagramVersion 5
agentSubId 100000
agent 10.0.0.150
packetSequenceNo 1
sysUpTime 3000
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 1
sourceId 2:1
counterBlock_tag 0:2001
adaptor_0_ifIndex 1
adaptor_0_MACs 1
adaptor_0_MAC_0 000000000000
adaptor_1_ifIndex 2
adaptor_1_MACs 1
adaptor_1_MAC_0 e0cb4e98f891
adaptor_2_ifIndex 3
adaptor_2_MACs 1
adaptor_2_MAC_0 e0cb4e98f890
counterBlock_tag 0:2005
disk_total 145102770176
disk_free 46691696640
disk_partition_max_used 76.06
disk_reads 477615
disk_bytes_read 13102692352
disk_read_time 2227298
disk_writes 2370522
disk_bytes_written 193176428544
disk_write_time 445531146
counterBlock_tag 0:2004
mem_total 12618829824
mem_free 2484174848
mem_shared 0
mem_buffers 971259904
mem_cached 8214761472
swap_total 12580810752
swap_free 12580810752
page_in 6400433
page_out 94324428
swap_in 0
swap_out 0
counterBlock_tag 5703:1
nvml_device_count 1
nvml_processes 0
nvml_gpu_mS 0
nvml_mem_mS 0
nvml_mem_bytes_total 6441598976
nvml_mem_bytes_free 6429614080
nvml_ecc_errors 0
nvml_energy_mJ 74569
nvml_temperature_C 54
nvml_fan_speed_pc 30
counterBlock_tag 0:2003
cpu_load_one 0.040
cpu_load_five 0.240
cpu_load_fifteen 0.350
cpu_proc_run 0
cpu_proc_total 229
cpu_num 8
cpu_speed 1600
cpu_uptime 896187
cpu_user 21731800
cpu_nice 120230
cpu_system 5686620
cpu_idle 2844149774
cpu_wio 2992230
cpuintr 570
cpu_sintr 222180
cpuinterrupts 166594944
cpu_contexts 266986130
counterBlock_tag 0:2006
nio_bytes_in 0
nio_pkts_in 0
nio_errs_in 0
nio_drops_in 0
nio_bytes_out 0
nio_pkts_out 0
nio_errs_out 0
nio_drops_out 0
counterBlock_tag 0:2000
hostname test0
UUID 00000000000000000000000000000000
machine_type 3
os_name 2
os_release 2.6.35.14-106.fc14.x86_64
endSample ----------------------
endDatagram =================================
Note: Currently only the Linux version of Host sFlow includes the GPU support and the agent needs to be compiled from sources on a system that includes the NVML library.
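If you want to use the GPU counters in your own scripts, a minimal sketch along these lines will pull them out of sflowtool's text output (the nvml_ prefixed fields shown above). The script name is hypothetical; pipe sflowtool into it.

```python
# gpu_metrics.py (hypothetical name): pipe sflowtool output into this script
# to collect the NVML GPU counters, i.e. the "nvml_" prefixed lines shown in
# the output above. One dictionary is printed per counter sample that
# contains GPU metrics.
import sys

gpu = {}
for line in sys.stdin:
    parts = line.split()
    if len(parts) == 2 and parts[0].startswith('nvml_'):
        gpu[parts[0]] = parts[1]
    elif line.startswith('endSample') and gpu:
        print(gpu)
        gpu = {}
```

For example: /usr/local/bin/sflowtool | python gpu_metrics.py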
The inclusion of GPU metrics in Host sFlow offers an extremely scalable, lightweight solution for monitoring compute cluster performance. In addition to exporting a comprehensive set of standard performance metrics, the Host sFlow agent also offers a convenient API for exporting custom application metrics.
The sFlow standard isn't limited to monitoring compute resources; most network switch vendors include sFlow support, providing detailed visibility into cluster communication patterns and network utilization. Combining sFlow from switches, servers and applications delivers a comprehensive view of cluster performance.
Tuesday, September 4, 2012
Configuring ZyXEL switches
The following commands configure a ZyXEL switch, sampling packets at 1-in-1024, polling counters every 30 seconds and sending sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343):
sysname(config)# sflow
sysname(config)# sflow collector 10.0.0.50 udp-port 6343
sysname(config)# interface port-channel 1-24
sysname(config-interface)# sflow collector 10.0.0.50 poll-interval 30 sampling-rate 1024
sysname(config-interface)# exit
sysname(config)# exit

A previous posting discussed the selection of sampling rates. Additional information can be found on the ZyXEL web site.
See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.
Friday, August 31, 2012
Cisco adds sFlow support
Cisco Nexus 3000 series switches
Example: sFlowTrend Top connections chart
Since the Nexus 3000 series switches are the first Cisco products with sFlow, the rest of this article is addressed to Cisco network administrators who are likely to be unfamiliar with sFlow technology. As a Cisco network administrator, you are likely to have experience with using Cisco's Switched Port Analyzer (SPAN) technology to selectively monitor traffic in Cisco edge switches and with Cisco's NetFlow technology for monitoring TCP/IP traffic in Cisco routers.
By adding sFlow support to the Nexus 3000 series, Cisco eliminates the need for probes, providing wire-speed 10 Gigabit monitoring of all switch ports - the functional equivalent of forty-eight 10 Gigabit probes and four 40 Gigabit probes in a Nexus 3064 - embedded in the switch hardware at no extra cost. If you are familiar with RMON probes, sFlow is functionally equivalent to deploying an RMON probe for each switch port.
Based on the name, you might think that sFlow is just another version of Cisco NetFlow. However, this is not the case - sFlow differs significantly from NetFlow and understanding these differences is important if you want to get the most out of sFlow:
- sFlow exports interface counters, eliminating the need for SNMP polling - extremely useful when you have tens of thousands of edge switch ports to monitor.
- sFlow exports packet headers not flow records. By exporting packet headers, sFlow is able to provide full layer 2 - 7 visibility into all types of traffic flowing at the network edge, including: MAC addresses, VLANs, TRILL, tunnels (GRE, VXLAN etc.), Ethernet SAN traffic (FCoE and AoE), IPv6 in addition to the TCP/IP information typically reported by NetFlow. You can even use sFlow with Wireshark for remote packet capture.
- sFlow is highly scalable. Unlike NetFlow, which is typically enabled on selected links at the core, sFlow is enabled on every port, on every switch, for full end-to-end network visibility. The sFlow measurements are implemented in silicon and won't impact switch CPU. The scalability of sFlow allows tens of thousands of 10G switch ports in the top of rack switches, as well as their 40 Gigabit uplink ports, to be centrally monitored. In addition, sFlow is available in 100 Gigabit switches, ensuring visibility as higher speed interconnects are deployed to support the growing 10 Gigabit edge.
- sFlow is easy to configure and manage. Eliminating complexity is essential for large scale web 2.0, big data, virtualization and cloud deployments.
- sFlow is a multi-vendor standard supported by almost every network equipment vendor. You can mix and match Cisco Nexus 3000 series switches with best in class solutions from other vendors and still maintain comprehensive, interoperable, data center wide visibility.
- sFlow is not just for switches. The sFlow standard also provides visibility into server, storage, virtual machine and application performance, helping to break down management silos by providing a consistent view of performance to operations and development teams (see DevOps).
- sFlow functionality is determined by the choice of sFlow analyzer. With Flexible NetFlow, much of the analysis is performed on the network device, limiting the functionality of NetFlow collectors to simply recording the data and generating reports. As a result, NetFlow collectors end up being fairly generic in functionality. In contrast, sFlow shifts analysis from the switches to a central sFlow analyzer which determines how to process the data and present the results, see Choosing an sFlow analyzer. The result is a greater diversity of solutions and there is likely to be an sFlow analyzer that is particularly well adapted to your requirements. While many NetFlow collectors claim sFlow support, their support tends to be limited, ignoring sFlow specific features and treating sFlow as if it were basic NetFlow version 5.
Trying out sFlow is easy, just upgrade to the latest NX-OS release, configure sFlow export, and install the free sFlowTrend analyzer to gain real-time visibility - providing immediate answers to the Who, What, Where, When, Why and How questions that are the key to effective management.
Configuring Cisco switches
The following commands configure a Cisco switch (10.0.0.250), sampling packets at 1-in-5000, polling counters every 20 seconds and sending sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343):
switch# configure terminal
switch(config)# feature sflow
switch(config)# sflow agent-ip 10.0.0.250
switch(config)# sflow sampling-rate 5000
switch(config)# sflow counter-poll-interval 20
switch(config)# sflow collector-ip 10.0.0.50 vrf default
switch(config)# sflow data-source interface ethernet 1/1
...
switch(config)# sflow data-source interface ethernet 1/24
switch(config)# copy running-config startup-config
A previous posting discussed the selection of sampling rates. Additional information can be found on the Cisco web site.
See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.
Thursday, August 30, 2012
Configuring a ZTE switch
The following commands configure a ZTE switch (10.0.0.254) to sample packets at 1-in-1024, poll counters every 30 seconds and send sFlow to an analyzer (10.0.0.50) using the default sFlow port 6343:
sflow enable
sflow agent-config ipv4-address 10.0.0.254
sflow collector-config ipv4-address 10.0.0.50 6343

For each interface:
interface gei_1/1
sflow-sample-rate ingress 1024
sflow-sample-rate egress 1024
exit

A previous posting discussed the selection of sampling rates. Additional information can be found on the ZTE web site.
Note: For bi-directional sampling, the ingress and egress sampling rates must be set to the same value since sFlow doesn't support asymmetric sampling on an interface. Ingress only or egress only sampling is supported, so you can disable sampling in either direction. The ZTE documentation doesn't describe how to configure counter polling. Counter export is a required component of the sFlow protocol, so this is either an omission in documentation, or a defect that should be fixed in a future release.
See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.
Wednesday, August 29, 2012
Configuring Edge-Core switches
The following commands configure an Edge-Core switch to sample packets at 1-in-1024, poll counters every 20 seconds and send sFlow to an analyzer (10.0.0.50) using the default sFlow port 6343:
sflow receiver 1 10.0.0.50 port 6343

For each interface:
interface te1
sflow flow-sampling 1024 1
sflow counter-sampling 20 1

A previous posting discussed the selection of sampling rates. Additional information can be found on the Edge-Core web site.
See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.
Monday, August 27, 2012
Push vs Pull
Push-me-pull-you from Doctor Doolittle
- Push, metrics are periodically sent by each monitored system to a central collector. Examples of push architectures include: sFlow, Ganglia, Graphite, collectd and StatsD.
- Pull, a central collector periodically requests metrics from each monitored system. Examples of pull architectures include: SNMP, JMX, WMI and libvirt.
| | Push | Pull |
|---|---|---|
| Discovery | Agent automatically sends metrics as soon as it starts up, ensuring that it is immediately detected and continuously monitored. Speed of discovery is independent of the number of agents. | Discovery requires the collector to periodically sweep the address space to find new agents. Speed of discovery depends on the discovery sweep interval and the size of the address space. |
| Scalability | Polling task is fully distributed among agents, resulting in linear scalability. A lightweight central collector listens for updates and stores measurements. Minimal work for agents to periodically send a fixed set of measurements. Agents are stateless, exporting data as soon as it is generated. | Workload on the central poller increases with the number of devices polled. Additional work on the poller to generate requests and maintain session state in order to match requests and responses. Additional work for agents to parse and process requests. Agents are often required to maintain state so that metrics can be retrieved at a later time by the poller. |
| Security | Push agents are inherently secure against remote attacks since they do not listen for network connections. | The polling protocol can potentially open up a system to remote access and denial of service attacks. |
| Operational Complexity | Minimal configuration required for agents: polling interval and address of collector. Firewalls need to be configured for unidirectional communication of measurements from agents to collector. | The poller needs to be configured with the list of devices to poll, security credentials to access the devices and the set of measurements to retrieve. Firewalls need to be configured to allow bi-directional communication between poller and agents. |
| Latency | The low overhead and distributed nature of the push model permits measurements to be sent more frequently, allowing the management system to quickly react to changes. In addition, many push protocols, like sFlow, are implemented on top of UDP, providing non-blocking, low-latency transport of measurements. | The lack of scalability in polling typically means that measurements are retrieved less often, resulting in a delayed view of performance that makes the management system less responsive to changes. The two-way communication involved in polling increases latency as connections are established and authenticated before measurements can be retrieved. |
| Flexibility | Relatively inflexible: a pre-determined, fixed set of measurements is periodically exported. | Flexible: the poller can ask for any metric at any time. |
The push model is particularly attractive for large scale cloud environments where services and hosts are constantly being added, removed, started and stopped. Maintaining lists of devices to poll for statistics in these environments is challenging and the discovery, scalability, security, low-latency and the simplicity of the push model make it a clear winner.
The sFlow standard is particularly well suited to large scale monitoring of cloud infrastructures, delivering the comprehensive visibility into the performance of network, compute and application resources needed for effective management and control.
In practice, a hybrid approach provides the best overall solution. The core set of standard metrics needed to manage performance and detect problems is pushed using sFlow and a pull protocol is used to retrieve diagnostic information from specific devices when a problem is detected.
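As a rough illustration of why push is so lightweight, the sketch below periodically sends a few host metrics to a collector over UDP: the agent keeps no session state and the collector simply listens. This is a generic illustration of the push pattern, not the sFlow wire format (sFlow datagrams are XDR-encoded binary structures), and the collector address and port are made up.

```python
# Generic illustration of the push pattern: periodically send metrics to a
# central collector over UDP, fire-and-forget. This is NOT the sFlow wire
# format (sFlow uses XDR-encoded binary datagrams); it only shows the shape
# of a push agent. The collector address and port below are made up.
import json
import os
import socket
import time

COLLECTOR = ('10.0.0.50', 9999)   # hypothetical collector address and port
INTERVAL = 30                     # seconds between exports

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
while True:
    load1, load5, load15 = os.getloadavg()
    metrics = {'host': socket.gethostname(),
               'load_one': load1, 'load_five': load5, 'load_fifteen': load15}
    sock.sendto(json.dumps(metrics).encode(), COLLECTOR)
    time.sleep(INTERVAL)
```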
Friday, August 24, 2012
Configuring Huawei switches
The following commands configure a Huawei switch (10.0.0.254) to sample packets at 1-in-1024, poll counters every 30 seconds and send sFlow to an analyzer (10.0.0.50) using the default sFlow port 6343:
system-view
sflow collector 1 ip 10.0.0.50 port 6343
sflow agent ip 10.0.0.254

For each interface:
system-view
interface gigabitethernet 1/0/2
sflow flow-sampling collector 1
sflow flow-sampling rate 1024
sflow counter-sampling collector 1
sflow counter-sampling interval 30

A previous posting discussed the selection of sampling rates. Additional information can be found on the Huawei web site.
See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.