Wednesday, February 23, 2011

XenServer supplemental pack


The Host sFlow agent is now available as a XenServer 5.6fp1 supplemental pack. This article describes the steps needed to install the supplemental pack and configure sFlow monitoring in a XenServer environment.

First, enable the Open vSwitch by opening a console and typing the following commands:

xe-switch-network-backend openvswitch
reboot

Next, download the Host sFlow XenServer supplemental pack (xenserver-hsflowd-X_XX.iso).

Then, copy the file to the host and run the following commands:

mkdir /tmp/iso 
mount -o loop xenserver-hsflowd-X_XX.iso /tmp/iso 
cd /tmp/iso 
./install.sh 
cd 
umount /tmp/iso

Alternatively, burn the ISO file onto a CD and run the following commands to install:

mount /dev/cdrom /mnt
cd /mnt
./install.sh
cd
umount /mnt

Next, use the following commands to start the monitoring daemons:

service hsflowd start
service sflowovsd start

The following steps configure all the sFlow agents to sample packets at 1-in-512, poll counters every 20 seconds and send sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343).

Note: A previous posting discussed the selection of sampling rates.
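For a rough sense of what 1-in-512 sampling means for accuracy, sFlow's sampling theory puts the relative error of a scaled-up estimate at about 196 * sqrt(1/n) percent, where n is the number of samples seen for a traffic class. A quick Python sketch of the arithmetic, for illustration only:

```python
import math

def scaled_estimate(samples, sampling_rate):
    """Scale a sampled packet count up to an estimated total."""
    return samples * sampling_rate

def pct_error(samples):
    """Approximate relative error (in percent) for a traffic class that
    produced the given number of samples; the 196 constant comes from
    the sampling accuracy formula published on sFlow.org."""
    return 196.0 * math.sqrt(1.0 / samples)

# 1,000 samples at 1-in-512 represent about 512,000 packets,
# accurate to within roughly +/- 6.2%
print(scaled_estimate(1000, 512), round(pct_error(1000), 1))  # → 512000 6.2
```

The practical takeaway: higher sampling rates still give accurate totals for busy traffic classes, because accuracy depends on the number of samples collected, not on the sampling rate itself.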

The default configuration method used for sFlow is DNS-SD; enter the following DNS settings in the site DNS server:

analyzer A 10.0.0.50

_sflow._udp SRV 0 0 6343 analyzer
_sflow._udp TXT (
"txtvers=1"
"polling=20"
"sampling=512"
)

Note: These changes must be made to the DNS zone file corresponding to the search domain in the XenServer /etc/resolv.conf file. If you need to add a search domain to the DNS settings, do not edit the resolv.conf file directly, since the changes will be lost on a system reboot. Instead, either follow the directions in How to Add a Permanent Search Domain Entry in the Resolv.conf File of a XenServer Host, or simply edit the DNSSD_domain setting in hsflowd.conf to specify the domain to use to retrieve DNS-SD settings.

Once the sFlow settings are added to the DNS server, they will be automatically picked up by the Host sFlow agents. If you need to change the sFlow settings, simply change them on the DNS server and the change will automatically be applied to all the XenServer systems in the data center.
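Under the covers, the agents look up the _sflow._udp SRV record to find the analyzer address and port, and read the TXT record strings as key=value options. Conceptually, the TXT parsing is as simple as the following Python sketch (not the actual Host sFlow code):

```python
def parse_sflow_txt(strings):
    """Split DNS-SD TXT record strings ("key=value") into a settings
    dict. A conceptual sketch only; the actual Host sFlow parsing
    logic may differ."""
    settings = {}
    for s in strings:
        key, _, value = s.partition("=")
        settings[key] = value
    return settings

# The TXT strings from the zone file above
print(parse_sflow_txt(["txtvers=1", "polling=20", "sampling=512"]))
# → {'txtvers': '1', 'polling': '20', 'sampling': '512'}
```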

Alternatively, manual configuration is an option if you do not want to use DNS-SD. Simply edit the Host sFlow agent configuration file, /etc/hsflowd.conf, on each XenServer:

sflow {
  DNSSD = off
  polling = 20
  sampling = 512
  collector {
    ip = 10.0.0.50
    udpport = 6343
  }
}
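For anyone scripting a large deployment, the hsflowd.conf layout is simple enough to read mechanically. A minimal, hypothetical Python helper (not part of Host sFlow; real hsflowd syntax allows constructs this sketch ignores, such as comments and repeated collector{} sections):

```python
def parse_hsflowd(text):
    """Parse the brace-scoped "key = value" layout of hsflowd.conf
    into nested dicts. A simplified sketch for illustration only."""
    stack = [{}]
    for line in text.splitlines():
        line = line.strip()
        if line.endswith("{"):
            # Open a new named section, e.g. "sflow {" or "collector {"
            section = {}
            stack[-1][line[:-1].strip()] = section
            stack.append(section)
        elif line == "}":
            stack.pop()
        elif "=" in line:
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = value.strip()
    return stack[0]

CONF = """sflow {
  DNSSD = off
  polling = 20
  sampling = 512
  collector {
    ip = 10.0.0.50
    udpport = 6343
  }
}"""

print(parse_hsflowd(CONF)["sflow"]["collector"]["ip"])  # → 10.0.0.50
```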

After editing the configuration file you will need to restart the Host sFlow agent:

service hsflowd restart

An sFlow analyzer is needed to receive the sFlow data and report on performance (see Choosing an sFlow analyzer). The free sFlowTrend analyzer is a great way to get started, see sFlowTrend adds server performance monitoring to see examples.

Tuesday, February 22, 2011

Windows load average


The latest version of Host sFlow calculates and exports load average metrics from Windows® systems. Since load averages aren't natively supported by Windows, this article provides an overview of load measurement and describes how the Windows Host sFlow agent calculates load averages.

Why the picture of a grocery store checkout line? When choosing which checkout line to join in a grocery store, the length of each line gives a pretty good indication of how long the wait is for the register. Similarly, with computer systems, the number of tasks waiting for the CPU is a useful measure of system load.

The Windows Task Manager is a familiar tool for monitoring the performance of Windows systems. The Performance tab shows current CPU utilization and a trend of utilization over time.


In this example, the CPU utilization has spiked to 100%, but it's hard to gauge whether the server is overloaded. Going back to the grocery store analogy, we can see that the cashier is busy serving a customer, but we don't know how long the line is.

The following chart, generated using sFlowTrend, displays load average data collected from a Host sFlow agent installed on a Windows server:


The chart shows a trend of the 1-minute, 5-minute and 15-minute load averages. Each load average is a moving average of the number of threads waiting to be serviced. A load average of 1.0 on a single processor roughly corresponds to a CPU utilization of 100%. A load average of three, as in this case, indicates that there is more work than the single processor can handle. Upgrading to a server with more CPUs would reduce the load and increase the throughput. Using the grocery store analogy, if the manager sees that lines are building up, it is time to start opening additional registers.

For additional information, the article Understanding Linux CPU Load - when should you be worried? provides a good introduction to load averages and offers useful rules of thumb for sizing servers based on load average.

The white papers, UNIX Load Average Part 1: How It Works and UNIX Load Average Part 2: Not Your Average Average, give a detailed description of load averages and how to calculate them. The key to calculating load averages on Windows is the ability to monitor the depth of the processor queue. The Microsoft article, Observing Processor Queue Length, describes the Windows System\Processor Queue Length metric, the critical measurement that allows the Host sFlow agent to calculate load averages.
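Following those papers, each load average is an exponentially damped moving average of the run-queue length. A Python sketch of one update step, assuming a 5-second sampling interval (the interval the Windows Host sFlow agent uses internally may differ):

```python
import math

def update_load(load, queue_length, interval=5.0, period=60.0):
    """One step of the exponentially damped load average used by UNIX
    kernels: decay the old value and blend in the current run-queue
    length. period=60 gives the 1-minute average; 300 and 900 give
    the 5- and 15-minute averages."""
    decay = math.exp(-interval / period)
    return load * decay + queue_length * (1.0 - decay)

# With a constant queue of 3 threads, the 1-minute average converges to 3
load = 0.0
for _ in range(120):          # 10 minutes of 5-second samples
    load = update_load(load, 3)
print(round(load, 2))         # → 3.0
```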

Load averages are an important part of the standard set of metrics that the sFlow standard defines for monitoring the performance of servers. A standard set of metrics simplifies management of multi-vendor, multi-OS environments, while the scalability of sFlow provides real-time, centralized visibility into all the servers in the data center, making it easy to rapidly identify performance problems.

Finally, server monitoring is only one component of the sFlow standard. sFlow measurements from network devices, servers and applications combine to deliver the integrated, end-to-end visibility into performance that is essential in converged, virtualized and cloud environments.

Saturday, February 12, 2011

EC2


The article, Visibility in the cloud, provides a general discussion of how to monitor cloud infrastructure. This article uses the Amazon Elastic Compute Cloud (EC2) service to provide a concrete example of implementing sFlow monitoring in a public cloud.

There are a number of APIs and tools available for managing large cloud server deployments in the Amazon cloud. However, the web interface provides the quickest solution for setting up the small number of cloud servers used in this example:


In this case two 64-bit Amazon Linux instances have been created. In order to provide sFlow monitoring, open source Host sFlow agents were installed on each server.

Note: Amazon does include basic performance monitoring through its CloudWatch service. However, there is a charge for minute granularity reporting and alerts. Implementing a monitoring solution based on the sFlow standard is free and provides minute granularity reporting. In addition, implementing a standards based approach to performance monitoring provides a solution that is portable between public cloud providers (see Rackspace cloudservers for examples of sFlow monitoring in the Rackspace cloud) and private clouds.

The firewall configurations were modified to implement packet sampling; the two -A INPUT and -A OUTPUT rules shown below were added:

[root@ip-10-117-46-49 ~]# more /etc/sysconfig/iptables
# Generated by iptables-save v1.4.7 on Sat Feb 12 18:41:17 2011
*filter
:INPUT ACCEPT [52:3952]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [28:2896]
-A INPUT -m statistic --mode random --probability 0.010000 -j ULOG --ulog-nlgroup 5 
-A OUTPUT -m statistic --mode random --probability 0.010000 -j ULOG --ulog-nlgroup 5 
COMMIT
# Completed on Sat Feb 12 18:41:17 2011

Note: On Linux systems, Host sFlow uses the iptables ULOG facility to monitor network traffic, see ULOG for a more detailed discussion.
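The --probability 0.010000 clauses make iptables pass roughly 1-in-100 packets, chosen at random, to ULOG. A small Python simulation illustrates why random sampling still yields accurate totals once the analyzer scales the sampled counts back up:

```python
import random

def simulate_sampling(total_packets, probability, seed=1):
    """Randomly sample packets the way the iptables statistic match
    does, then scale the sample count back up to estimate the total."""
    rng = random.Random(seed)
    samples = sum(1 for _ in range(total_packets) if rng.random() < probability)
    return samples, samples / probability

samples, estimate = simulate_sampling(100000, 0.01)
print(samples, estimate)  # the estimate lands close to the true 100,000
```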

The Host sFlow agents were configured to poll counters every 30 seconds and pick up the packet samples via ULOG, sending the resulting sFlow to the collector at 10.117.46.49:

[ec2-user@ip-10-244-162-76 ~]$ more /etc/hsflowd.conf
sflow {
  DNSSD = off
  polling = 30
  sampling = 400

  collector {
    ip = 10.117.46.49
  }

  ulogGroup = 5
  ulogProbability = 0.01
}

Deploying an sFlow analyzer into the cloud provides real-time reports of performance across all the server instances in the cloud. For example, the following chart shows a cluster-wide view of performance:


The following chart displays the top network connections to the cluster:


In addition to monitoring server and network performance, sFlow can also be used to monitor performance of the scale-out applications that are typically deployed in the cloud, including web farms, memcached and membase clusters.

The sFlow standard is extremely well suited for cloud performance monitoring. The scalability of sFlow allows tens of thousands of cloud servers to be centrally monitored. With sFlow, data is continuously sent from the cloud servers to the sFlow analyzer, providing a real-time view of performance across the cloud.

The sFlow push model is much more efficient than typical monitoring architectures that require the management system to periodically poll servers for statistics. Polling breaks down in highly dynamic cloud environments where servers can appear and disappear. With sFlow, cloud servers are automatically discovered and continuously monitored as soon as they are created. The sFlow messages act as a server heartbeat, providing rapid notification when a server is deleted and stops sending sFlow.

Finally, sFlow provides the detailed, real-time, visibility into network, server and application performance needed to manage performance and control costs. For anyone interested in more information on sFlow, the sFlow presentation provides a strategic view of the role that sFlow monitoring plays in converged, virtualized and cloud environments.

Thursday, February 10, 2011

Configuring Arista switches

The following commands configure an Arista Networks switch (10.0.0.250), sampling packets at 1-in-20,000, polling counters every 20 seconds and sending sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343):

sflow source 10.0.0.250
sflow destination 10.0.0.50
sflow polling-interval 20
sflow sample 20000
sflow run

A previous posting discussed the selection of sampling rates. Additional information can be found on the Arista Networks web site.

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.