sFlow: 2010

Saturday, December 18, 2010

Visibility in the cloud

One of the challenges in moving a virtual machine from a private data center to a public cloud like Amazon Elastic Compute Cloud (EC2) or Rackspace Cloud is maintaining visibility into performance.

The article, Cloud-scale performance monitoring, describes how the sFlow standard delivers the visibility needed to manage the cloud infrastructure. In the case of a private cloud, where the physical infrastructure and virtual machines are dedicated to a single organization, the visibility provided by the infrastructure can be shared with internal customers and used to manage the services deployed in the cloud.

However, in a public cloud the infrastructure is owned and operated by the cloud service provider and customers are typically given very little visibility into the shared infrastructure hosting their virtual machines.

For example, the diagram at the top of this article shows three virtual machines, VM1, VM2 and VM3, hosted on two physical servers, Server 1 and Server 2. If these virtual machines were hosted in a private cloud, all the elements of the physical and virtual infrastructure shown in the diagram could be instrumented with sFlow, providing visibility to the management team.

However, move the three virtual machines to a public cloud and only the virtual machines are visible. A Management Boundary separates service provider resources from the customer resources and it is no longer possible to know which virtual machines are hosted on which physical servers or to see network and system performance using sFlow from switches and servers.

The diagram above shows the elements from the example that are visible in a public cloud deployment. The example is representative of a typical small scale deployment: the Vyatta virtual appliance (VM3) provides routing and firewall capabilities, VM1 is configured as a web server and VM2 as a database server. One of the benefits of moving to the public cloud is the ability to scale up the number of servers to meet demand. The article, How Zynga Survived FarmVille, describes using a public cloud provider to handle rapidly changing workloads. The architecture mentioned in the article is a widely adopted, scale-out implementation of the elements shown in the diagram (see Memcached for additional details). Large scale deployments of this architecture may involve thousands of servers.

In order to provide visibility in a public cloud deployment, each virtual machine must be responsible for monitoring its own performance. The Vyatta virtual appliance already includes support for sFlow. Installing Host sFlow agents on the virtual machines extends visibility to include network and system performance throughout the virtual machine cluster - see Cluster performance.

A key benefit of deploying services in the public cloud is the ability to dynamically add and remove capacity. In this environment, sFlow monitoring helps control costs by providing the data needed to closely match capacity to demand. In addition, many organizations operate hybrid clouds with some workloads running in a private cloud and others running in the public cloud. sFlow simplifies management by delivering integrated visibility across all the physical and virtual elements in the private and public cloud, providing the measurements needed to manage costs by striking the optimal balance between public and private cloud capacity.

Friday, December 17, 2010

ULOG

(Netfilter diagram from Wikimedia)

The Host sFlow agent recently added support for netfilter based traffic monitoring. The netfilter/iptables packet filtering framework is an integral part of recent Linux kernels, providing the mechanisms needed to implement firewalls and perform address translation.

Included within the netfilter framework is a packet sampling facility. In addition to sampling packets, the netfilter framework captures the forwarding path associated with each sampled packet, providing the essential elements needed to implement sFlow standard traffic monitoring on a Linux system.

Instructions for installing Host sFlow are provided in the article, Installing Host sFlow on a Linux server. In many cases configuring traffic monitoring on servers is unnecessary since sFlow capable physical and virtual switches already provide end-to-end network visibility (see Hybrid server monitoring). However, if traffic data isn't available from the switches, either because they don't support sFlow, or because they are managed by a different organization, then traffic monitoring on the servers is required.

This article describes the additional steps needed to configure sFlow traffic monitoring using netfilter. The following steps configure 1-in-1000 sampling of packets on a Fedora 14 server. The sampling rate of 1-in-1000 was selected based on the 1Gbit speed of the network adapter. See the article, Sampling rates, for suggested sampling rates.

First, list the existing iptables rules:

[root@fedora14 ~]# iptables --list --line-numbers --verbose
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 ACCEPT     all  --  lo     any     anywhere             anywhere            
2       93  8415 ACCEPT     all  --  any    any     anywhere             anywhere            state RELATED,ESTABLISHED 
3        1    84 ACCEPT     icmp --  any    any     anywhere             anywhere            
4        1    64 ACCEPT     tcp  --  any    any     anywhere             anywhere            state NEW tcp dpt:ssh 
5        9  1138 REJECT     all  --  any    any     anywhere             anywhere            reject-with icmp-host-prohibited 

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 REJECT     all  --  any    any     anywhere             anywhere            reject-with icmp-host-prohibited 

Chain OUTPUT (policy ACCEPT 68 packets, 9509 bytes)
num   pkts bytes target     prot opt in     out     source               destination

Rules are evaluated in order, so it is important to find the correct place to apply sampling. The first rule in the INPUT chain accepts all traffic associated with the internal loopback interface (lo). This rule is needed because many applications use the loopback interface for inter-process communications. Since we are only interested in external traffic, the ULOG rule should be inserted as rule 2 in this rule chain:

iptables -I INPUT 2 -m statistic --mode random --probability 0.001 -j ULOG --ulog-nlgroup 5

There are currently no rules in the OUTPUT chain, so we can simply add the ULOG rule:

iptables -A OUTPUT -m statistic --mode random --probability 0.001 -j ULOG --ulog-nlgroup 5

Note: Sampling rates are expressed as probabilities, so the sampling rate of 1-in-1000 translates to a probability of 0.001. Only add one sFlow sampling rule to each chain. Duplicate sampling rules will result in biased measurements since the probability of sampling a packet will vary depending on where it matches in the chain. Use the same sampling probability in both INPUT and OUTPUT chains for the same reason.

Note: There are 32 netlink groups (1-32) that can be used to transmit ULOG messages. Check to see if there are any other ULOG statements in iptables and make sure to select a distinct group for sFlow sampling. In this case group 5 has been selected.
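The two notes above can be captured in a few lines of script. The following Python sketch converts a 1-in-N sampling rate into the probability that iptables expects and assembles the rule text using only the options shown in this article. The helper names (sampling_probability, ulog_rule) are hypothetical and are not part of iptables or Host sFlow; the script only builds the command string and does not run it.

```python
def sampling_probability(one_in_n):
    """Convert a 1-in-N sampling rate into the probability iptables expects."""
    if one_in_n < 1:
        raise ValueError("sampling rate must be at least 1")
    return 1.0 / one_in_n


def ulog_rule(chain, one_in_n, nlgroup, position=None):
    """Build the iptables command text used in this article (hypothetical helper)."""
    if not 1 <= nlgroup <= 32:
        raise ValueError("ULOG netlink group must be in the range 1-32")
    # Insert at a given rule position, or append when no position is supplied.
    action = f"-I {chain} {position}" if position is not None else f"-A {chain}"
    prob = sampling_probability(one_in_n)
    return (f"iptables {action} -m statistic --mode random "
            f"--probability {prob:g} -j ULOG --ulog-nlgroup {nlgroup}")


if __name__ == "__main__":
    # Same rate and group in both chains, as recommended above.
    print(ulog_rule("INPUT", 1000, 5, position=2))
    print(ulog_rule("OUTPUT", 1000, 5))
```

Generating both rules from a single rate and group value is one way to avoid accidentally configuring different probabilities in the INPUT and OUTPUT chains.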

Listing the table again confirms that the changes are correct:

[root@fedora14 ~]# iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            
ULOG       all  --  anywhere             anywhere            statistic mode random probability 0.001000 ULOG copy_range 0 nlgroup 5 queue_threshold 1 
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED 
ACCEPT     icmp --  anywhere             anywhere            
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh 
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited 

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited 

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
ULOG       all  --  anywhere             anywhere            statistic mode random probability 0.001000 ULOG copy_range 0 nlgroup 5 queue_threshold 1

In many deployments, servers are running in a secure network behind a firewall and so the overhead of running a stateful firewall on each server is unnecessary. In this case a very simple, monitoring only, configuration of iptables provides traffic visibility with minimal impact on server performance:

[root@fedora14 ~]# iptables --list 
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ULOG       all  --  anywhere             anywhere            statistic mode random probability 0.001000 ULOG copy_range 0 nlgroup 5 queue_threshold 1 

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
ULOG       all  --  anywhere             anywhere            statistic mode random probability 0.001000 ULOG copy_range 0 nlgroup 5 queue_threshold 1

Once the rules are correct, they should be saved so that they will automatically be reinstalled if the server is rebooted.

[root@fedora14 ~]# service iptables save

The Host sFlow agent needs to be configured to export the samples (by editing the /etc/hsflowd.conf file). The following configuration instructs the Host sFlow agent to use DNS-SD to automatically configure sFlow receivers and polling intervals. The additional ULOG settings tell the agent which ULOG nlgroup to listen to for packet samples as well as the sampling probability that was configured in iptables:

sflow {
  DNSSD = on

  # ULOG settings
  ulogProbability = 0.001
  ulogGroup = 5
}

Note: Make sure that the sampling probability specified in the Host sFlow configuration matches the probability used in the iptables rules. Any discrepancies will result in incorrectly scaled traffic measurements.
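To see why the two settings must agree, consider how sampled data is scaled: each sample stands in for roughly 1/probability packets. The Python sketch below works through that arithmetic. The function names are illustrative only, and the simple 1/probability scaling model is an assumption made for the example rather than code taken from the Host sFlow agent.

```python
def estimated_packets(samples, configured_probability):
    """Scale a count of sampled packets up to an estimated packet total."""
    return samples / configured_probability


def scaling_error(iptables_probability, hsflowd_probability):
    """Ratio of reported traffic to actual traffic when the settings differ."""
    return iptables_probability / hsflowd_probability


if __name__ == "__main__":
    # Matching settings: 50 samples at probability 0.001 is about 50,000 packets.
    print(f"{estimated_packets(50, 0.001):g}")
    # iptables samples at 0.001 but hsflowd.conf claims 0.01:
    # traffic is under-reported by a factor of about ten.
    print(f"{scaling_error(0.001, 0.01):g}")
```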

Next, restart the Host sFlow agent so that it picks up the new configuration:

[root@fedora14 ~]# service hsflowd restart

Note: The Host sFlow agent can resample ULOG captured packets in order to achieve the sampling rate specified using DNS-SD, or through the sampling setting in the /etc/hsflowd.conf file. Choose a relatively aggressive ULOG sampling probability that reduces the overhead of monitoring, but allows a wide range of sampling rates to be set. For example, configuring the ULOG probability to 0.01 will allow Host sFlow agent sampling rates to be set to 100, 200, 300, 400 etc. The Host sFlow agent will choose the nearest sampling rate it can achieve, so if you configure a sampling rate of 290, it would actually sample with a rate of 300 (i.e. sample every third ULOG packet).
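The resampling arithmetic in the note above can be sketched in Python as follows. This is an illustration of the "nearest achievable rate" idea under the assumption that the agent keeps every k-th ULOG packet; the function name is hypothetical and the code is not taken from the Host sFlow agent.

```python
def achievable_sampling_rate(requested_rate, ulog_probability):
    """Return (k, effective_rate) when keeping every k-th ULOG packet."""
    # A ULOG probability of 0.01 corresponds to a base rate of 1-in-100.
    ulog_rate = round(1.0 / ulog_probability)
    # Pick the whole number of ULOG packets per sample that lands closest
    # to the requested rate; at least every packet must be kept.
    k = max(1, int(requested_rate / ulog_rate + 0.5))
    return k, k * ulog_rate


if __name__ == "__main__":
    for requested in (100, 290, 400):
        k, rate = achievable_sampling_rate(requested, 0.01)
        print(f"requested 1-in-{requested}: keep every {k} ULOG packet(s), "
              f"effective 1-in-{rate}")
```

With a ULOG probability of 0.01, a requested rate of 290 maps to every third ULOG packet, giving the effective rate of 300 described in the note.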

At this point traffic data from the server should start appearing in the sFlow analyzer. The following chart shows top connections monitored using ULOG/Host sFlow:

Finally, sFlow monitoring of servers is part of an overall solution that simplifies management by unifying network, storage, server and application performance monitoring within a single scalable system (see sFlow Host Structures). Implementing an sFlow monitoring solution helps break down management silos, ensuring the coordination of resources needed to manage a converged infrastructure.

Saturday, December 4, 2010

Baseline

(chart from SCOM: How self-tuning threshold baseline is computed)

Calculating a baseline is a common technique in network and system management. The article, SCOM: How self-tuning threshold baseline is computed, describes how a value is monitored over time allowing a statistical envelope of likely values to be calculated. If the actual value falls outside the envelope then an alert is generated.

With any statistical baseline there is always a possibility that a normal value will fall outside the baseline envelope and trigger a false alarm. There is a tradeoff between making the baseline sensitive enough to quickly report an anomaly while avoiding excessive numbers of false alarms. For example, suppose that the value is monitored every minute. If the envelope covers 99.9% of values then between 1 and 2 false alarms per day would be expected. Reducing the sensitivity by choosing an envelope that covers 99.99% reduces the false positive rate to approximately 1 per week.
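The false alarm figures quoted above follow directly from the measurement interval and the fraction of values the envelope covers. A short Python sketch of the arithmetic (the function name is illustrative):

```python
SAMPLES_PER_DAY = 24 * 60  # one measurement per minute


def false_alarms_per_day(coverage, samples_per_day=SAMPLES_PER_DAY):
    """Expected number of normal values falling outside the envelope each day."""
    return (1.0 - coverage) * samples_per_day


if __name__ == "__main__":
    # 99.9% envelope: between 1 and 2 false alarms per day.
    print(f"99.9%:  {false_alarms_per_day(0.999):.2f} per day")
    # 99.99% envelope: roughly one false alarm per week.
    print(f"99.99%: {false_alarms_per_day(0.9999) * 7:.2f} per week")
```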

However, calculating a more accurate baseline is complicated by the need to monitor for a longer period. In the above example it would take at least a week to calculate the 99.99% baseline. Further complicating the calculation of longer term baselines is that the approach assumes a predictable and relatively static demand on the system. If demand is changing rapidly then the false alarm rate will go up since by the time the baseline is calculated it will no longer reflect the current behavior of the system.

The problem of false alarms creates a scalability problem when the time based, or temporal, baseline approach described above is used to monitor large numbers of items since the number of false alarms will increase as the number of items being monitored increases. For example, if there is only 1 false alarm per week per item being monitored, then the frequency of false alarms will go up with the number of items being monitored: going from 1 item to 1,000 items increases the false alarm rate to 1 every 10 minutes, increasing the number of items to 10,000 generates a false alarm every minute and finally, increasing the number of items to 100,000 generates a false alarm every 6 seconds.
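The scaling described above is a simple division of the number of seconds in a week by the number of items being monitored. A Python sketch, assuming one false alarm per item per week as in the example (the function name is illustrative):

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def seconds_between_false_alarms(items, alarms_per_item_per_week=1.0):
    """Average gap between false alarms across all monitored items."""
    return SECONDS_PER_WEEK / (items * alarms_per_item_per_week)


if __name__ == "__main__":
    for items in (1, 1_000, 10_000, 100_000):
        gap = seconds_between_false_alarms(items)
        print(f"{items:>7} items: one false alarm every {gap:,.1f} seconds")
```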

The following chart shows how the accuracy of a temporal baseline declines with system size as the number of false alarms drowns out useful alerts.

An alternative approach to calculating baselines is shown on the graph. Instead of treating each item separately and comparing its current and past values, a spatial baseline compares items with each other and identifies items that differ from their peers. As a result, the accuracy of a spatial baseline increases as the number of items increases.

In addition, a spatial baseline requires no training period, allowing anomalies to be immediately identified. For example, when monitoring a converged data center environment a spatial baseline can be immediately applied as new resources are added to a service pool, whereas a temporal baseline approach would require time to calculate a baseline for the new member of the pool. In fact the addition of resources to the pool could cause a flurry of temporal baseline false alarms as the load profile of existing members of the resource pool changes, putting them outside their historic norms.
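As a rough illustration of the idea, the Python sketch below flags items that differ from their peers using a single snapshot of values, with no history required. The choice of statistic (distance from the median, measured in units of the median absolute deviation), the threshold of 5 and the server names and load values are all assumptions made for this example; the article does not prescribe a particular method.

```python
from statistics import median


def spatial_outliers(metrics, threshold=5.0):
    """Return the names of items whose value differs markedly from their peers."""
    values = list(metrics.values())
    center = median(values)
    # Median absolute deviation: a robust measure of the typical peer spread.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # All peers agree exactly, so anything different stands out.
        return [name for name, v in metrics.items() if v != center]
    return [name for name, v in metrics.items()
            if abs(v - center) / mad > threshold]


if __name__ == "__main__":
    # Hypothetical CPU utilization for five members of a server pool.
    load = {"web1": 0.42, "web2": 0.45, "web3": 0.40, "web4": 0.44, "web5": 0.97}
    print(spatial_outliers(load))
```

Because the comparison is made between peers at a single point in time, a server added to the pool a moment ago can be checked in exactly the same way as the others.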

The table above compares performance metrics between servers within a cluster (see Top servers). It is immediately apparent from the table that the server in the top row has metrics that differ significantly from the other members of the 1,000 server cluster, indicating that the server is experiencing a performance anomaly.

To summarize, the following table compares temporal and spatial baseline techniques as they apply to small and large scale system monitoring:

The challenge in implementing a spatial baseline approach to anomaly detection is efficiently collecting metrics from all the systems in order to be able to compare them and create a baseline.

The sFlow standard is widely implemented by data center equipment vendors, providing an efficient solution that is ideally suited to managing performance in large scale converged, virtualized and cloud data center environments. The sFlow architecture provides a highly scalable mechanism for centrally collecting metrics from all the network, server and storage resources in the data center that is ideally suited to spatial baselining.

Wednesday, December 1, 2010

XCP 1.0 beta

This article describes the steps needed to configure sFlow monitoring on the latest Xen Cloud Platform (XCP) 1.0 beta release.

First, download and install the XCP 1.0 beta on a server. The simplest way to build an XCP server is to use the binaries from the XCP 1.0 beta download page.

Next, enable the Open vSwitch by opening a console and typing the following commands:

xe-switch-network-backend openvswitch
reboot

The article, Configuring Open vSwitch, describes the steps to manually configure sFlow monitoring. However, manual configuration is not recommended since additional configuration is required every time a bridge is added to the vSwitch, or the system is rebooted.

The Host sFlow agent automatically configures the Open vSwitch and provides additional performance information from the physical and virtual machines (see Cloud-scale performance monitoring).

Download the Host sFlow agent and install it on your server using the following commands:

rpm -Uvh hsflowd_XCP_xxx.rpm
service hsflowd start
service sflowovsd start

The following steps configure all the sFlow agents to sample packets at 1-in-512, poll counters every 20 seconds and send sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343).

Note: A previous posting discussed the selection of sampling rates.

The default configuration method for sFlow is DNS-SD; enter the following DNS settings in the site DNS server:

analyzer A 10.0.0.50

_sflow._udp SRV 0 0 6343 analyzer
_sflow._udp TXT (
"txtvers=1"
"polling=20"
"sampling=512"
)

Note: These changes must be made to the DNS zone file corresponding to the search domain in the XCP server /etc/resolv.conf file. If you need to add a search domain to the DNS settings, do not edit the resolv.conf file directly, since changes will be lost on a reboot. Instead, either follow the directions in How to Add a Permanent Search Domain Entry in the Resolv.conf File of a XenServer Host, or simply edit the DNSSD_domain setting in hsflowd.conf to specify the domain to use to retrieve DNS-SD settings.

Once the sFlow settings are added to the DNS server, they will be automatically picked up by the Host sFlow agents. If you need to change the sFlow settings, simply change them on the DNS server and the change will automatically be applied to all the XCP systems in the data center.

Manual configuration is an option if you do not want to use DNS-SD. Edit the Host sFlow agent configuration file, /etc/hsflowd.conf, on each XCP server:

sflow{
  DNSSD = off
  polling = 20
  sampling = 512
  collector{
    ip = 10.0.0.50
    udpport = 6343
  }
}

After editing the configuration file you will need to restart the Host sFlow agent:

service hsflowd restart
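The DNS-SD and manual configurations above carry the same settings in two different forms. The Python sketch below makes the correspondence explicit by reading the key=value strings from the TXT record and rendering the equivalent manual configuration. Both function names are hypothetical, and the parsing shown is a simplification for illustration, not the logic used by the Host sFlow agent.

```python
def parse_sflow_txt(strings):
    """Turn TXT record strings such as "polling=20" into a dictionary."""
    settings = {}
    for item in strings:
        key, sep, value = item.partition("=")
        if sep:  # ignore strings that are not key=value pairs
            settings[key] = value
    return settings


def render_manual_config(settings, collector_ip, udp_port):
    """Render the equivalent manual hsflowd.conf text shown in this article."""
    return (
        "sflow{\n"
        "  DNSSD = off\n"
        f"  polling = {settings['polling']}\n"
        f"  sampling = {settings['sampling']}\n"
        "  collector{\n"
        f"    ip = {collector_ip}\n"
        f"    udpport = {udp_port}\n"
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    txt = ["txtvers=1", "polling=20", "sampling=512"]
    print(render_manual_config(parse_sflow_txt(txt), "10.0.0.50", 6343), end="")
```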

An sFlow Analyzer is needed to receive the sFlow data and report on performance (see Choosing an sFlow analyzer). The free sFlowTrend analyzer is a great way to get started; see sFlowTrend adds server performance monitoring for examples.

March 3, 2011 Update: XCP 1.0 has now been released, download the production version from the XCP 1.0 Download page. The installation procedure hasn't changed - follow these instructions to enable sFlow on XCP 1.0.

Tuesday, November 16, 2010

Complexity kills

(slide from NetFlow/IPFIX Various Thoughts)

The July 2010 presentation NetFlow/IPFIX Various Thoughts from the IETF 3rd NMRG Workshop on NetFlow/IPFIX Usage in Network Management describes some of the challenges that still need to be addressed in NetFlow/IPFIX. In particular the slide above describes how increased flexibility has resulted in greater complexity when trying to configure and deploy NetFlow/IPFIX monitoring systems.

In contrast, sFlow implementations have very few configuration options. While there are superficial similarities between sFlow and NetFlow/IPFIX, the two approaches to network performance management reflect profound differences between the design goals of the two standards (see Standards).

NetFlow/IPFIX was developed to export WAN traffic measurements and is typically deployed in IP routers. Configuring routers is a complex task, requiring configuration of subnets, routing protocols, WAN interfaces etc. Many of the functions in a router are implemented in software, providing a flexible platform that permits complex measurements to be made. Over time, options have been added to NetFlow/IPFIX in order to export increasingly complex measurements used to characterize WAN traffic.

sFlow evolved to provide end-to-end monitoring of high-speed layer 2/3 Ethernet switches. Ethernet switches offer plug-and-play connectivity and require very little configuration. Unlike routers, switches perform most of their functions in hardware, relying on software only to perform simple management tasks. The need to embed measurement in hardware resulted in a standard that is very simple with minimal configuration options. However, the basic sFlow measurements, while simple to configure and implement in switches, provide a rich source of information about the performance of switched networks. Instead of relying on the switches to analyze the traffic, raw data is sent to a central sFlow analyzer (see Choosing an sFlow analyzer). The sFlow architecture results in a highly scalable system that can monitor the large numbers of high-speed switch ports found in layer 2 networks (see Superlinear).

The goal of convergence is to simplify data centers, creating flexible pools of storage and computing running over a flat, high bandwidth, low latency, Ethernet fabric (see Convergence). Eliminating complexity is essential if the scalability and flexibility of a converged infrastructure is to be realized.

Microsoft's Chief Software Architect, Ray Ozzie, eloquently describes the dangers of complexity in Dawn of a New Day, "Complexity kills. Complexity sucks the life out of users, developers and IT. Complexity makes products difficult to plan, build, test and use. Complexity introduces security challenges. Complexity causes administrator frustration."

Maintaining visibility in large scale, dynamic data center environments requires a measurement technology that is designed for the task. sFlow is a mature, multi-vendor standard supported by most switch vendors that delivers the scalable plug and play visibility needed to manage performance in converged data center environments.

Finally, the end to end visibility that sFlow provides is a critical element in building scalable systems. Measurement eliminates uncertainty and reduces the complexity of managing large systems (see Scalability). An effective monitoring system is the foundation for automation: reducing costs, improving efficiency and optimizing performance in the data center.

Sunday, November 14, 2010

Shrink ray

(image from Despicable Me)

In the movie, Despicable Me, a shrink ray features prominently, making it possible to steal the Moon by shrinking it small enough to fit in the villain's pocket.

The ability to handle large, high-speed networks is one of the key benefits of the sFlow standard. The scalability results because sFlow's packet sampling technology acts like a shrink ray, shrinking down network traffic so that it is easier to analyze, reducing even the largest network to a manageable size.

Shrinking an image is another way of illustrating the scaling function that an sFlow monitoring system performs. When shrinking an image, sampling and compression operations reduce the amount of data needed to store the image while preserving the essential features of the original.

Choosing the right sampling rate is the key to a successful sFlow deployment. The sampling rate acts as the network shrink factor, reducing the resources needed to manage the network while preserving the essential features needed for a clear picture of network activity. For example, a sampling rate of 1-in-8192 shrinks even the busiest network down to a manageable size (see AMS-IX).
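The shrink factor is easy to quantify. The Python sketch below shows how a sampling rate reduces the volume of data to be analyzed, and how a sample count is scaled back up to estimate the original traffic. The packet rate of 10 million packets per second is a hypothetical figure chosen for the example, and the function names are illustrative.

```python
def samples_per_second(packets_per_second, sampling_rate):
    """Average number of samples generated for a given packet rate."""
    return packets_per_second / sampling_rate


def estimate_total(samples, sampling_rate):
    """Scale a sample count back up to an estimate of the original traffic."""
    return samples * sampling_rate


if __name__ == "__main__":
    rate = 8192  # 1-in-8192 sampling
    # A hypothetical network carrying 10 million packets per second.
    print(f"{samples_per_second(10_000_000, rate):.1f} samples per second")
    # Scaling 1,000 samples back up to an estimated packet count.
    print(f"{estimate_total(1000, rate):,} packets")
```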

Monday, November 1, 2010

NUMA

SMP architecture

(image from Introduction to Parallel Computing)

As the number of processor cores increases, system architectures have moved from Symmetric Multi-Processing (SMP) to Non-Uniform Memory Access (NUMA). SMP systems are limited in scalability by contention for access to the shared memory. In a NUMA system, memory is divided among groups of CPUs, increasing the bandwidth and reducing latency of access to memory within a module at the cost of increased latency for non-local memory access. Intel Xeon (Nehalem) and AMD Opteron (Magny-Cours) based servers provide commodity examples of the NUMA architecture.

NUMA architecture

(image from Introduction to Parallel Computing)

System software running on a NUMA architecture needs to be aware of the processor topology in order to properly allocate memory and processes to maximize performance (see Process Scheduling Challenges in the Era of Multi-Core Processors). Since NUMA based servers are widely deployed, most server operating systems are NUMA aware and take location into account when scheduling tasks and allocating memory.

Virtualization platforms also need to be location aware when allocating resources to virtual machines on NUMA systems. The article, How to optimize VM memory and processor performance, describes some of the issues involved in allocating virtual machine vCPUs to NUMA nodes.

(image from How to optimize VM memory and processor performance)

Ethernet networks share similar NUMA-like properties; sending data over a short transmission path offers lower latency and higher bandwidth than sending the data over a longer transmission path. While bandwidth within an Ethernet switch is high (multi-Terabit capacity backplanes are not uncommon), the bandwidth of Ethernet links connecting switches is only 1Gbit/s or 10Gbit/s (with 40Gbit/s and 100Gbit/s on the horizon). Shortest path bridging (see 802.1aq and TRILL) further increases the amount of bandwidth, and reduces the latency of communication, between systems that are "close".

Virtualization and the need to support virtual machine mobility (e.g. vMotion/XenMotion/Xen Live Migration) is driving the adoption of large, flat, high-speed, layer-2, switched Ethernet fabrics in the data center. A layer-2 fabric allows a virtual machine to keep its IP address and maintain network connections when it moves (performing a "live" migration). However, while a layer-2 fabric provides transparent connectivity that allows virtual machines to move, the performance of the virtual machine is highly dependent on its communication patterns and location.

As servers are pooled into large clusters, virtual machines can easily be moved, not just between NUMA nodes within a server, but between servers within the cluster. For optimal performance the cluster orchestration software needs to be aware of the network topology and workloads in order to place each VM in the optimal location. The paper, Tashi: Location-aware Cluster Management, describes a network aware cluster management system, currently supporting Xen and KVM.

The inclusion of the sFlow standard in network switches and virtualization platforms (see XCP, XenServer and KVM) provides the visibility into each virtual machine's current workload and dependencies, including tracking the virtual machine as it migrates across the data center.

In the article, Network visibility in the data center, an example was presented showing how virtual machine migration could cause a cascade of performance problems. The illustration above demonstrates how virtual machine migration can be used to optimize performance. In this example sFlow monitoring identifies that two virtual machines, VM1 and VM2, are exchanging significant amounts of traffic across the core of the network. In addition, sFlow data from the servers shows that while the server currently hosting VM1 is close to capacity, there is spare capacity on the server hosting VM2. Migrating VM1 to VM2's server reduces network traffic through the core as well as reducing the latency of communication between VM1 and VM2.
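The placement decision in this example can be sketched in a few lines of Python. Everything in the sketch is hypothetical: the function name, the data structures, the traffic and load figures and the greedy rule (co-locate the pair of virtual machines exchanging the most traffic across servers, provided the target server has room). It is intended only to show how traffic and server load measurements combine to drive a migration decision, not to describe any particular orchestration system.

```python
def suggest_migration(traffic, host_of, spare_capacity, vm_load):
    """Suggest moving one VM of the busiest cross-server pair next to its peer."""
    # Consider only VM pairs hosted on different servers, heaviest traffic first.
    pairs = sorted(
        (p for p in traffic if host_of[p[0]] != host_of[p[1]]),
        key=lambda p: traffic[p],
        reverse=True,
    )
    for a, b in pairs:
        # Try moving either VM onto its peer's server if there is room for it.
        for vm, peer in ((a, b), (b, a)):
            target = host_of[peer]
            if spare_capacity[target] >= vm_load[vm]:
                return vm, target
    return None  # no migration would help, or nothing fits


if __name__ == "__main__":
    # Hypothetical measurements: traffic between VM pairs and server headroom.
    traffic = {("VM1", "VM2"): 800, ("VM2", "VM3"): 50}
    host_of = {"VM1": "server1", "VM2": "server2", "VM3": "server2"}
    spare_capacity = {"server1": 0.05, "server2": 0.50}
    vm_load = {"VM1": 0.30, "VM2": 0.40, "VM3": 0.10}
    print(suggest_migration(traffic, host_of, spare_capacity, vm_load))
```

With these figures the sketch proposes moving VM1 to VM2's server, matching the migration described above.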

Note: For many protocols low latency is extremely important, examples include: Memcached, FCoE, NFS, iSCSI, and RDMA over Converged Ethernet (RoCE). It's the Latency, Stupid is an excellent, if somewhat dated article describing the importance of low latency in networks. The article, Latency Is Everywhere And It Costs You Sales - How To Crush It, presents a number of examples demonstrating the value of low latency and discusses strategies for reducing latency.

The virtual machine migration examples illustrate the value of the integrated view of network, storage, system and application performance that sFlow provides (see sFlow Host Structures). More broadly, visibility is the key to controlling costs, improving efficiency, reducing power and optimizing performance in the data center.

Finally, there are two interesting trends taking data centers in opposite directions. From the computing side, there is a move from SMP to NUMA systems in order to increase scalability and performance. On the networking side there is a trend toward creating non-blocking architectures, analogous to a move from the current NUMA structure of networking to an SMP model. While there is an appeal to hiding the network from applications in order to create a "uniform" cloud, the physics of data transmission is inescapable: the shorter the communication path, the greater the bandwidth and the lower the latency. Instead of trying to hide the network, a better long term strategy is to make the network structure and performance visible to system software so that it appears as additional tiers in the NUMA hierarchy, allowing operating systems, hypervisors and cluster orchestration software to optimally position workloads and manage the network resources needed to deliver cloud services. Bringing network resources under the control of a unified "cloud operating system" will dramatically simplify management and ensure the tight coordination of resources needed for optimal performance.

Thursday, October 28, 2010

Memcached missed keys

This article follows on from the sFlow for Memcached and Memcached hot keys articles and examines how sFlow can be used to improve the cache hit rate in a Memcached cluster.

In a typical Memcached deployment, a cache miss results in an expensive query to the database (see sFlow for Memcached). Since the database is usually the performance bottleneck, anything that can be done to reduce the number of misses can significantly boost the overall performance of the service. Memcached performance counters make it easy to calculate cache hit/miss rates and ratios, but don't provide insight into which keys are associated with misses. Keeping track of missed keys within Memcached would be prohibitively expensive since you would need to use memory to store information about the potentially infinite set of keys not in the cache, consuming memory that would be more usefully assigned to increase the cache size (and improve the cache hit rate).

When instrumented with sFlow, Memcached operations are sampled and the records are sent to a central location for analysis, so no memory is taken away from the cache in order to identify top missed keys (see Superlinear for a more general discussion about the scalability of sFlow's architecture).

The article, Memcached hot keys, contains a script that identifies the most frequently used keys in Memcached operations. The topkeys.pl script does not distinguish between operations, so its output includes keys involved in both hits and misses. Since sFlow reports the status code of each operation (see sFlow for memcached) it is straightforward to modify the topkeys.pl script to report only on misses (i.e. report on operations where memcache_op_status is NOT_FOUND=8).

The following Perl script, topmissedkeys.pl, runs sflowtool and processes the output to display the top 20 missed keys every minute:

#!/usr/bin/perl -w
# topmissedkeys.pl: report the top 20 missed Memcached keys every minute
use strict;
use POSIX;

# Return the smaller of two numbers (the comparison result indexes @_)
sub min ($$) { $_[$_[0] > $_[1]] }
my $key_value = "";   # key from the most recent memcache_op_key line
my %key_count = ();   # number of sampled misses per key
my $start = time();
open(PS, "/usr/local/bin/sflowtool|") || die "Failed: $!\n";
while( <PS> ) {
  my ($attr,$value) = split;
  if('memcache_op_key' eq $attr) {
    # decode %XX escapes in the key
    $value =~ s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;
    $key_value = $value;
  }
  if('memcache_op_status' eq $attr) {
    # status 8 is NOT_FOUND, i.e. a cache miss
    if('8' eq $value) {
      $key_count{$key_value}++;
    }
  }
  if('endDatagram' eq $attr) {
    # at the end of a datagram, print the table if a minute has passed
    my $now = time();
    if($now - $start >= 60) {
      printf "=== %s ===\n", strftime('%m/%d/%Y %H:%M:%S', localtime);
      my @sorted = sort { $key_count{$b} <=> $key_count{$a} } keys %key_count;
      for(my $i = 0; $i < min(20,@sorted); $i++) {
        my $key = $sorted[$i];
        printf "%2d %3d %s\n", $i + 1, $key_count{$key}, $key;
      }
      # reset the counts for the next interval
      %key_count = ();
      $start = $now;
    }
  }
}

close(PS);

The resulting output displays a sorted table of the top missed keys:

./topmissedkeys.pl
=== 10/27/2010 23:27:40 ===
 1   4 /tmp/hsperfdata_inmsf
 2   3 /tmp/hsperfdata_pp
 3   1 /etc/at.deny
 4   1 /etc/profile.d
 5   1 /etc/java
 6   1 /etc/gssapi_mech.conf
 7   1 /etc/cron.daily
 8   1 /etc/capi.conf
 9   1 /etc/passwd
10   1 /etc/gpm-root.conf
11   1 /etc/X11
12   1 /etc/hsflowd.conf
13   1 /etc/makedev.d
14   1 /etc/fonts
15   1 /etc/dovecot.conf
16   1 /etc/alchemist
17   1 /etc/yum.conf
18   1 /etc/printcap
19   1 /etc/smrsh
20   1 /etc/ld.so.cache

The table shows that the top missed key is /tmp/hsperfdata_inmsf, having occurred in 4 samples during the minute.

This example can be extended to display top clients, top operations, etc. associated with cache misses. To generate quantitatively accurate results, see Packet Sampling Basics for how to properly scale sFlow data.
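As a minimal sketch of the scaling step, the Python fragment below multiplies a sample count by the sampling rate and applies the error bound given in Packet Sampling Basics (percentage error at 95% confidence is at most 196 * sqrt(1/samples)). The sampling rate of 1-in-100 is an assumption chosen for illustration; this article does not state the rate used to produce the table above.

```python
import math


def scale_samples(samples, sampling_rate):
    """Estimate the total operation count and the 95% error bound in percent."""
    estimate = samples * sampling_rate
    if samples > 0:
        pct_error = 196.0 * math.sqrt(1.0 / samples)
    else:
        pct_error = float("inf")  # no samples, so no meaningful estimate
    return estimate, pct_error


# 4 sampled misses for the top key, with an assumed 1-in-100 sampling rate
estimate, pct_error = scale_samples(4, 100)
print(f"estimated misses: {estimate} (+/- {pct_error:.0f}%)")
```

With only a handful of samples the error bound is very wide, so a table like the one above is best read as a ranking rather than as precise counts; longer intervals or busier keys yield more samples and tighter estimates.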

Monday, October 25, 2010

Ganglia

The open source Ganglia Monitoring System is widely used to monitor high-performance computing systems such as clusters and Grids. The recent addition of sFlow support makes Ganglia an attractive option for monitoring servers in cloud computing environments (see Cloud-scale performance monitoring).

The diagram shows the elements of the solution. Each server sends sFlow to the Ganglia gmond process which builds an in-memory database containing the server statistics. The Ganglia gmetad process periodically queries the gmond database and updates trend charts that are made available through a web interface. The sFlow server performance data seamlessly integrates with Ganglia since the standard sFlow server metrics are based on Ganglia's core set of metrics (see sFlow Host Structures).

The Host sFlow agent is a free, open source sFlow implementation. The Host sFlow agent reports on the performance of physical and virtual servers and currently supports Linux and Windows servers as well as the XenServer, Xen/XCP, KVM and libvirt virtualization platforms.

Note: To try out Ganglia's sFlow reporting, you will need to download and compile Ganglia from sources since the feature is currently in the development branch (see http://sourceforge.net/projects/ganglia/develop).

The following entry in the gmond configuration file (/etc/gmond.conf) opens a port to receive sFlow data:

/* sFlow channel */
udp_recv_channel {
  port = 6343
}

The integration of network, system and application monitoring (see sFlow Host Structures) makes sFlow ideally suited for converged infrastructure monitoring. Using a single multi-vendor standard for both network and system performance monitoring reduces complexity and provides the integrated view of performance needed for effective management (see Management silos).

Jul. 7, 2011 Update: The latest Ganglia release now includes sFlow support, see Ganglia 3.2 released.

Thursday, October 21, 2010

Installing Host sFlow on a Linux server

The Host sFlow agent supports Linux performance monitoring, providing a lightweight, scalable solution for monitoring large numbers of Linux servers.

The following steps demonstrate how to install and configure the Host sFlow agent on a Linux server, sending sFlow to an analyzer with IP address 10.0.0.50.

Note: If there are any firewalls between the Linux servers and the sFlow analyzer, you will need to ensure that packets to the sFlow analyzer (UDP port 6343) are permitted.

First go to the Host sFlow web site and download the RPM file for your Linux distribution. If an RPM doesn't exist, you will need to download the source code.

If you are installing from RPM, the following commands will install and start the Host sFlow agent:

rpm -Uvh hsflowd_XXX.rpm
service hsflowd start

If you are building from sources, then use the following commands:

tar -xzf hsflowd-X.XX.tar.gz
cd hsflowd-X.XX
make
make install
make schedule
service hsflowd start

The default configuration method used for sFlow is DNS-SD; enter the following DNS settings in the site DNS server:

analyzer A 10.0.0.50

_sflow._udp SRV 0 0 6343 analyzer
_sflow._udp TXT (
"txtvers=1"
"polling=20"
"sampling=512"
)

Note: These changes must be made to the DNS zone file corresponding to the search domain in the Linux server's /etc/resolv.conf file. Alternatively, you can explicitly configure the domain using the DNSSD_domain setting in /etc/hsflowd.conf.

Once the sFlow settings are added to the DNS server, they will be automatically picked up by the Host sFlow agents. If you need to change the sFlow settings, simply change them on the DNS server and the change will automatically be applied to all the Linux systems in the data center.
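To make the mapping between the DNS records and the agent settings concrete, here is a small Python sketch that combines an SRV record and the TXT strings shown above into a settings dictionary. It works on literal values rather than performing a DNS lookup, and the function names are hypothetical; it is not how the Host sFlow agent itself is implemented.

```python
def parse_txt(strings):
    """Turn DNS TXT strings such as "polling=20" into a dict of settings."""
    settings = {}
    for s in strings:
        key, sep, value = s.partition("=")
        if sep:  # ignore strings that are not key=value pairs
            settings[key] = value
    return settings


def agent_config(srv, txt_strings):
    """Combine an SRV tuple (priority, weight, port, target) with TXT settings."""
    _priority, _weight, port, target = srv
    txt = parse_txt(txt_strings)
    return {
        "collector": target,
        "udpport": port,
        "polling": int(txt["polling"]),
        "sampling": int(txt["sampling"]),
    }


# Values copied from the zone file entries above
config = agent_config((0, 0, 6343, "analyzer"),
                      ["txtvers=1", "polling=20", "sampling=512"])
print(config)
```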

Manual configuration is an option if you do not want to use DNS-SD. Edit the Host sFlow agent configuration file, /etc/hsflowd.conf, on each Linux server:

sflow{
  DNSSD = off
  polling = 20
  sampling = 512
  collector{
    ip = 10.0.0.50
    udpport = 6343
  }
}

After editing the configuration file you will need to restart the Host sFlow agent:

service hsflowd restart
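When the same manual settings have to be applied to many servers, the file can be generated rather than edited by hand. The Python sketch below renders the configuration shown above from a few parameters; the function name is hypothetical and only the settings that appear in this article are used.

```python
def render_hsflowd_conf(collector_ip, udpport=6343, polling=20, sampling=512):
    """Return the text of a manual (DNS-SD off) hsflowd.conf."""
    return (
        "sflow{\n"
        "  DNSSD = off\n"
        f"  polling = {polling}\n"
        f"  sampling = {sampling}\n"
        "  collector{\n"
        f"    ip = {collector_ip}\n"
        f"    udpport = {udpport}\n"
        "  }\n"
        "}\n"
    )


conf_text = render_hsflowd_conf("10.0.0.50")
print(conf_text, end="")
```

The resulting text would still need to be copied to /etc/hsflowd.conf on each server, followed by a restart of the agent.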

For a complete sFlow monitoring solution you should also collect sFlow from the switches connecting the servers to the network (see Hybrid server monitoring). The sFlow standard is designed to seamlessly integrate monitoring of networks and servers (see sFlow Host Structures).

An sFlow analyzer is needed to receive the sFlow data and report on performance (see Choosing an sFlow analyzer). The free sFlowTrend analyzer is a great way to get started; see sFlowTrend adds server performance monitoring for examples.
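Before setting up a full analyzer, it can be useful to confirm that sFlow datagrams are actually arriving on UDP port 6343. The Python sketch below decodes only the fixed header of an sFlow version 5 datagram with an IPv4 agent address; it is a connectivity check under those assumptions, not a replacement for an analyzer, and the function names are illustrative.

```python
import socket
import struct


def parse_header(datagram):
    """Decode the fixed header of an sFlow v5 datagram with an IPv4 agent address."""
    version, addr_type = struct.unpack_from("!II", datagram, 0)
    if version != 5 or addr_type != 1:
        raise ValueError("this sketch handles only sFlow v5 with an IPv4 agent")
    agent = socket.inet_ntoa(datagram[8:12])
    sub_agent, sequence, uptime_ms, num_samples = struct.unpack_from(
        "!IIII", datagram, 12)
    return {"agent": agent, "sub_agent": sub_agent, "sequence": sequence,
            "uptime_ms": uptime_ms, "samples": num_samples}


def listen(port=6343):
    """Print one line per datagram received on the sFlow port (until interrupted)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        data, (sender, _) = sock.recvfrom(65535)
        try:
            print(sender, parse_header(data))
        except (ValueError, struct.error) as err:
            print(sender, "skipped:", err)


# Self-check with a hand-built header instead of live traffic
example = struct.pack("!II4sIIII", 5, 1, socket.inet_aton("10.0.0.1"),
                      0, 42, 60000, 3)
header = parse_header(example)
print(header)
```

Calling listen() on the analyzer host should print a line for each datagram sent by the agents; if nothing appears, check the firewall rules mentioned earlier.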

Update: The inclusion of iptables/ULOG support in the Host sFlow agent provides an efficient way to monitor detailed traffic flows if you can't monitor your top of rack switches or if you have virtual machines in a public cloud (see Amazon Elastic Compute Cloud (EC2) and Rackspace cloud servers).

Update: See Configuring Host sFlow for Linux via /etc/hsflowd.conf for the latest configuration information. The Host sFlow agent now supports Linux bridge, macvlan and ipvlan adapters, Docker, and TCP round trip time.

Installing Host sFlow on a Windows server

The Host sFlow agent supports Windows performance monitoring, providing a lightweight, scalable solution for monitoring large numbers of Windows servers.

The following steps demonstrate how to install and configure the Host sFlow agent on a Windows server, sending sFlow to an analyzer with IP address 10.0.0.50.

Note: If there are any firewalls between the Windows servers and the sFlow analyzer, you will need to ensure that packets to the sFlow analyzer (UDP port 6343) are permitted.

First go to the Host sFlow web site, http://host-sflow.sourceforge.net

Click on the DOWNLOAD NOW button.

If you are using a browser on the Windows server, you should automatically be offered the file hsflowd_windows_setup.msi. Click on the Download Now! button to download this file.

Note: If the Windows install file isn't displayed, click on the View all files button and select the file from the list.

Click on the Run button to install the software on your current server. Otherwise, click Save to store the file and copy it to the target system.

Click on the Run button to confirm that you want to install the software.

Click the Next> button.

Confirm the installation location and click the Next> button.

Enter the IP address of your sFlow analyzer, in this case 10.0.0.50, then click the Next> button.

Click on the Next> button to confirm that you want to install the software.

The Host sFlow agent is now installed. Click the Close button to finish.

For a complete sFlow monitoring solution you should also collect sFlow from the switches connecting the servers to the network (see Hybrid server monitoring). The sFlow standard is designed to seamlessly integrate monitoring of networks and servers (see sFlow Host Structures).

An sFlow analyzer is needed to receive the sFlow data and report on performance (see Choosing an sFlow analyzer). The free sFlowTrend analyzer is a great way to get started; see sFlowTrend adds server performance monitoring for examples.

Friday, October 15, 2010

KVM

Performance of the KVM (Kernel-based Virtual Machine) virtualization system can be monitored using sFlow through a combination of Host sFlow and Open vSwitch software (see Cloud-scale performance monitoring).

The Host sFlow agent provides visibility into the performance of the physical and virtual servers (see sFlow Host Structures). Download the Host sFlow agent and install it on your KVM server using the following commands:

rpm -Uvh hsflowd_KVM-XXX.rpm
service hsflowd start

Note: The virtual machine statistics exported by the Host sFlow agent are equivalent to statistics that can be obtained using the libvirt based tools (virsh, virt-manager) typically used in KVM environments (see libvirt). However, the Host sFlow agent also exports detailed statistics from the physical server as well as coordinating traffic measurements made by the vSwitch.

The Open vSwitch natively supports sFlow, providing visibility into network traffic. To include traffic monitoring, download the Open vSwitch and follow the instructions in How to Install Open vSwitch on Linux and How to Use Open vSwitch with KVM.

Note: There are many good reasons to use the Open vSwitch as your default switch. In addition to sFlow support, the Open vSwitch implements the OpenFlow protocol, providing distributed switching and network virtualization functionality.

The article, Configuring Open vSwitch, describes the steps to manually configure sFlow monitoring. However, manual configuration is not recommended since additional configuration is required every time a bridge is added to the vSwitch, or the system is rebooted. Instead, the Host sFlow agent can automatically configure the sFlow settings in the Open vSwitch. Run the following command to enable this feature:

service sflowovsd start
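For comparison, the per-bridge work that the manual approach requires can be sketched in Python as follows. The command form follows the ovs-vsctl syntax given in the Open vSwitch sFlow documentation; the bridge names (br0, br1), the agent device (eth0) and the 128 byte header size are assumptions made for illustration, and the sketch only prints the commands rather than running them.

```python
import shlex


def ovs_sflow_command(bridge, agent_dev, collector,
                      sampling=512, polling=20, header=128):
    """Build the ovs-vsctl argument list that enables sFlow on one bridge."""
    return [
        "ovs-vsctl", "--", "--id=@sflow", "create", "sflow",
        f"agent={agent_dev}",
        f'target="{collector}"',   # the quotes are part of the value
        f"header={header}",        # assumed header size, not from this article
        f"sampling={sampling}",
        f"polling={polling}",
        "--", "set", "bridge", bridge, "sflow=@sflow",
    ]


# Hypothetical bridges; every new bridge would need its own command
commands = [ovs_sflow_command(b, "eth0", "10.0.0.50:6343")
            for b in ["br0", "br1"]]
for cmd in commands:
    print(shlex.join(cmd))
```

Having to repeat this for every bridge, and again after a reboot, is the maintenance burden that sflowovsd removes.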

Note: While the Open vSwitch currently provides the best option for monitoring traffic between virtual machines, the combination of the Host sFlow agent and an sFlow capable switch offers a good alternative (see Hybrid server monitoring). The emerging IEEE 802.1Qbg Edge Virtual Bridging (EVB) standard will make it possible for the switch to monitor traffic between virtual machines (see VEPA).

The following steps configure all the sFlow agents to sample packets at 1-in-512, poll counters every 20 seconds and send sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343).

Note: A previous posting discussed the selection of sampling rates.
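To get a feel for what these settings mean in practice, the Python sketch below estimates the measurement load they generate: packet samples scale with the traffic rate divided by the sampling rate, and counter records scale with the number of interfaces divided by the polling interval. The traffic rate and interface count are hypothetical values chosen to keep the arithmetic simple.

```python
def expected_load(packets_per_sec, interfaces, sampling=512, polling=20):
    """Return (packet samples per second, counter records per second)."""
    samples_per_sec = packets_per_sec / sampling
    counters_per_sec = interfaces / polling
    return samples_per_sec, counters_per_sec


# Hypothetical server: 102,400 packets per second across 40 virtual interfaces
samples, counters = expected_load(packets_per_sec=102400, interfaces=40)
print(f"{samples:.0f} packet samples/s, {counters:.0f} counter records/s")
```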

The default configuration method used for sFlow is DNS-SD; enter the following DNS settings in the site DNS server:

analyzer A 10.0.0.50

_sflow._udp SRV 0 0 6343 analyzer
_sflow._udp TXT (
"txtvers=1"
"polling=20"
"sampling=512"
)

Note: These changes must be made to the DNS zone file corresponding to the search domain in the KVM server's /etc/resolv.conf file.

Once the sFlow settings are added to the DNS server, they will be automatically picked up by the Host sFlow agents. If you need to change the sFlow settings, simply change them on the DNS server and the change will automatically be applied to all the KVM systems in the data center.

Manual configuration is an option if you do not want to use DNS-SD. Edit the Host sFlow agent configuration file, /etc/hsflowd.conf, on each KVM server:

sflow{
  DNSSD = off
  polling = 20
  sampling = 512
  collector{
    ip = 10.0.0.50
    udpport = 6343
  }
}

After editing the configuration file you will need to restart the Host sFlow agent:

service hsflowd restart

libvirt

The open source libvirt project has created a common set of tools for managing virtualization resources on different virtualization platforms (currently Xen, QEMU, KVM, LXC, OpenVZ, User Mode Linux, VirtualBox and VMware ESX and GSX). An important element of libvirt is the definition of a standard set of metrics for monitoring performance of virtualization domains (virtual machines).

The sFlow standard includes the libvirt performance metrics (see sFlow Host Structures), providing consistency between different virtualization platforms and between sFlow and libvirt based performance monitoring systems.

The benefit of using sFlow for virtual machine performance monitoring is the improved scalability (see Cloud-scale performance monitoring) and the integration of virtualization metrics with network and system performance monitoring (see Management silos).

Friday, October 8, 2010

XCP 0.5

A previous article talked about Xen Cloud Platform (XCP). The latest XCP 0.5 release includes the Open vSwitch, providing native support for sFlow monitoring of network traffic. This article will describe the steps needed to enable sFlow on the XCP 0.5 release.

First, download and install XCP 0.5 on a server. The simplest way to build an XCP server is to use the Base ISO from the Xen Cloud Platform Source page.

The article, Configuring Open vSwitch, describes the steps to manually configure sFlow monitoring. However, manual configuration is not recommended since additional configuration is required every time a bridge is added to the vSwitch, or the system is rebooted.

The Host sFlow agent automatically configures the Open vSwitch and provides additional performance information from the physical and virtual machines (see Cloud-scale performance monitoring).

Download the Host sFlow agent and install it on your XCP server using the following commands:

rpm -Uvh hsflowd_XCP_05-XXX.rpm
service hsflowd start
service sflowovsd start

The following steps configure all the sFlow agents to sample packets at 1-in-512, poll counters every 20 seconds and send sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343).

Note: A previous posting discussed the selection of sampling rates.

The default configuration method used for sFlow is DNS-SD; enter the following DNS settings in the site DNS server:

analyzer A 10.0.0.50

_sflow._udp SRV 0 0 6343 analyzer
_sflow._udp TXT (
"txtvers=1"
"polling=20"
"sampling=512"
)

Note: These changes must be made to the DNS zone file corresponding to the search domain in the XCP server's /etc/resolv.conf file.

Once the sFlow settings are added to the DNS server, they will be automatically picked up by the Host sFlow agents. If you need to change the sFlow settings, simply change them on the DNS server and the change will automatically be applied to all the XCP systems in the data center.

Manual configuration is an option if you do not want to use DNS-SD. Edit the Host sFlow agent configuration file, /etc/hsflowd.conf, on each XCP server:

sflow{
  DNSSD = off
  polling = 20
  sampling = 512
  collector{
    ip = 10.0.0.50
    udpport = 6343
  }
}

After editing the configuration file you will need to restart the Host sFlow agent:

service hsflowd restart

Citrix XenServer

The Citrix XenServer server virtualization solution now includes support for the sFlow standard, delivering the lightweight, scalable performance monitoring solution needed to optimize workloads, isolate performance problems and charge for services. The sFlow standard is widely supported by data center switch vendors: using sFlow to monitor the virtual servers provides an integrated view of network and system performance and reduces costs by eliminating layers of monitoring (see Management silos).

This article describes the steps to enable sFlow in a XenServer environment. The latest XenServer release, XenServer 5.6 Feature Pack 1 (a.k.a. "Cowley"), includes the Open vSwitch and these instructions apply specifically to this release. If you are interested in enabling sFlow on an older version of XenServer, you will need to manually install the Open vSwitch (see How to Install Open vSwitch on Citrix XenServer).

First, enable the Open vSwitch by opening a console and typing the following commands:

xe-switch-network-backend openvswitch
reboot

Note: There are many good reasons to enable the Open vSwitch as your default switch. In addition to sFlow support, the Open vSwitch implements the OpenFlow protocol, providing distributed switching and network virtualization functionality (see The Citrix Blog: The Open vSwitch - Key Ingredient of Enterprise Ready Clouds).

The article, Configuring Open vSwitch, describes the steps to manually configure sFlow monitoring. However, manual configuration is not recommended since additional configuration is required every time a bridge is added to the vSwitch, or the system is rebooted.

The Host sFlow agent automatically configures the Open vSwitch and provides additional performance information from the physical and virtual machines (see Cloud-scale performance monitoring).

Download the Host sFlow agent and install it on your server using the following commands:

rpm -Uvh hsflowd_XenServer_56FP1-XXX.rpm
service hsflowd start
service sflowovsd start

The following steps configure all the sFlow agents to sample packets at 1-in-512, poll counters every 20 seconds and send sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343).

Note: A previous posting discussed the selection of sampling rates.

The default configuration method used for sFlow is DNS-SD; enter the following DNS settings in the site DNS server:

analyzer A 10.0.0.50

_sflow._udp SRV 0 0 6343 analyzer
_sflow._udp TXT (
"txtvers=1"
"polling=20"
"sampling=512"
)

Note: These changes must be made to the DNS zone file corresponding to the search domain in the XenServer /etc/resolv.conf file. If you need to add a search domain to the DNS settings, do not edit the resolv.conf file directly since the changes will be lost on a system reboot. Instead, either follow the directions in How to Add a Permanent Search Domain Entry in the Resolv.conf File of a XenServer Host, or simply edit the DNSSD_domain setting in hsflowd.conf to specify the domain to use to retrieve DNS-SD settings.

Once the sFlow settings are added to the DNS server, they will be automatically picked up by the Host sFlow agents. If you need to change the sFlow settings, simply change them on the DNS server and the change will automatically be applied to all the XenServer systems in the data center.

Manual configuration is an option if you do not want to use DNS-SD. Edit the Host sFlow agent configuration file, /etc/hsflowd.conf, on each XenServer:

sflow{
  DNSSD = off
  polling = 20
  sampling = 512
  collector{
    ip = 10.0.0.50
    udpport = 6343
  }
}

After editing the configuration file you will need to restart the Host sFlow agent:

service hsflowd restart

An sFlow analyzer is needed to receive the sFlow data and report on performance (see Choosing an sFlow analyzer). The free sFlowTrend analyzer is a great way to get started; see sFlowTrend adds server performance monitoring for examples.

Feb. 23, 2011 Update: The Host sFlow agent is now available as a XenServer supplemental pack. For installation instructions see XenServer supplemental pack.

sFlowTrend adds server performance monitoring

The latest release of the free sFlowTrend application adds support for sFlow Host Structures, integrating network and server monitoring in a single tool.

Note: The open source Host sFlow agent provides an easy way to add sFlow host monitoring to your servers (see Host sFlow 1.0 released).

The above screen capture shows the new sFlowTrend Host statistics tab which displays physical and virtual server summary statistics (see Top Servers). The relationship between physical and virtual hosts is clearly displayed: the virtual machines are shown nested within the physical host.

Clicking on a physical host brings up a detailed trend of performance on the physical host (see below):

Similarly, clicking on a virtual host displays virtual server statistics (see below):

However, visibility isn't limited to traditional server performance statistics. Since the sFlow standard is widely supported by switch vendors, we can also look at the network traffic associated with the hosts (see Hybrid server monitoring).

In this case we can even use sFlow to look inside a server and see the traffic between virtual machines since the hypervisor contains the Open vSwitch. The top connections through the virtual switch are shown below:

Have you ever wondered what happens to the network traffic when you migrate a virtual machine? The following chart shows the spike in network traffic that occurs during a virtual machine migration (clearly something to be aware of when provisioning network capacity):

An integrated view of network and server performance is critical for managing servers, storage and networking in converged environments (see Convergence).

Download a copy of sFlowTrend to try out sFlow monitoring of your switches and servers.