sFlow: January 2011

Sunday, January 30, 2011

Rackspace cloudservers

The article, Visibility in the cloud, provides a general discussion of how to monitor cloud infrastructure. This article uses the Rackspace cloudservers™ hosting service to provide a concrete example of implementing sFlow monitoring in a public cloud.

There are a number of APIs and tools available for managing large cloud server deployments in the Rackspace cloud. However, the web interface provides the quickest solution for setting up the small number of cloud servers used in this example:

In this example, three cloud servers have been created: two Fedora Linux servers and a Windows 2003 server. The following diagram shows the network topology that connects the cloud servers:

Each cloud server is provided with a public IP address and a private IP address. The private network is intended for inter-server communication and there are no usage charges. Bandwidth on the public network is metered and usage-based charges apply.

In this example, the sFlow analyzer has been installed on the server, Web. In order to provide sFlow monitoring, open source Host sFlow agents were installed on the Linux and Windows cloud servers. The sFlow agents were configured to send sFlow to the private address on Web (10.180.164.230).

By default, Rackspace creates Linux cloud servers with a restrictive firewall configuration. The firewall configurations were modified to implement packet sampling (the two statistic/ULOG rules below) and to allow sFlow datagrams to be received from the private network interface, eth1 (the UDP port 6343 rule).

[root@web ~]# more /etc/sysconfig/iptables
# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m statistic --mode random --probability 0.01 -j ULOG --ulog-nlgroup 1
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p udp --dport 6343 -i eth1 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
-A OUTPUT -m statistic --mode random --probability 0.01 -j ULOG --ulog-nlgroup 1
COMMIT

Note: On Linux systems, Host sFlow uses the iptables ULOG facility to monitor network traffic, see ULOG for a more detailed discussion.

The Host sFlow agents were configured to poll counters every 30 seconds and pick up the packet samples via ULOG, sending the resulting sFlow to the collector, 10.180.164.230:

[root@web ~]# more /etc/hsflowd.conf 
sflow {
  DNSSD = off
  polling = 30
  sampling = 400

  collector {
    ip = 10.180.164.230
  }

  ulogGroup = 1
  ulogProbability = 0.01
}

Two sFlow analyzers were installed on cloud servers in order to demonstrate different aspects of sFlow analysis: the open source Ganglia cluster monitoring application and the commercial Traffic Sentinel application from InMon Corp. Both applications are easily installed on a Linux cloud server. Both tools also provide web-based interfaces, making them well suited to cloud deployment.

An advantage of using the sFlow standard for server monitoring is that it provides a multi-vendor solution. Windows and Linux servers export standard metrics that link network and system performance and allow a wide variety of analysis applications to be used.

The following web browser screen shot shows Ganglia displaying the performance of the cloud servers:

The charts present a cluster-wide view of performance, with statistics combined from all the servers.

Drilling down to an individual server provides a detailed view of the server's performance:

Traffic Sentinel provides similar functionality when presenting server performance. The following screen shows a cluster-wide view of performance:

In addition, the top servers page, shown below, provides a real-time view comparing the performance of the busiest servers in the cluster.

The sFlow standard originated as a way to monitor network performance and is supported by most switch vendors. The following chart demonstrates some of the visibility into network traffic available using sFlow:

The chart shows a protocol breakdown of the network traffic to the cloud servers. For a more detailed view, the following application map shows how network monitoring can be used to track the complex relationships between the cloud servers:

In addition to monitoring server and network performance, sFlow can also be used to monitor performance of the scale-out applications that are typically deployed in the cloud, including: web farms, memcached and membase clusters.

The sFlow standard is extremely well suited for cloud performance monitoring. The scalability of sFlow allows tens of thousands of cloud servers to be centrally monitored. With sFlow, data is continuously sent from the cloud servers to the sFlow analyzer, providing a real-time view of performance across the cloud.

The sFlow push model is much more efficient than typical monitoring architectures that require the management system to periodically poll servers for statistics. Polling breaks down in highly dynamic cloud environments where servers can appear and disappear. With sFlow, cloud servers are automatically discovered and continuously monitored as soon as they are created. The sFlow messages act as a server heartbeat, providing rapid notification when a server is deleted and stops sending sFlow.
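The heartbeat idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how any particular sFlow analyzer works: the class name HeartbeatTracker, its methods and the 90 second timeout (three missed 30 second polling intervals) are all assumptions made for the example.

```python
import time


class HeartbeatTracker:
    """Track when each sFlow agent was last heard from (illustrative sketch)."""

    def __init__(self, timeout_seconds=90.0):
        # Assumed timeout: three missed 30-second counter polling intervals.
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}

    def record(self, agent_ip, now=None):
        """Note a datagram from agent_ip; return True if the agent is newly discovered."""
        now = time.time() if now is None else now
        is_new = agent_ip not in self.last_seen
        self.last_seen[agent_ip] = now
        return is_new

    def expired(self, now=None):
        """Return (and forget) agents that have been silent longer than the timeout."""
        now = time.time() if now is None else now
        gone = sorted(ip for ip, seen in self.last_seen.items()
                      if now - seen > self.timeout_seconds)
        for ip in gone:
            del self.last_seen[ip]
        return gone
```

A receiver would call record() for every datagram it receives and call expired() periodically. No list of servers has to be configured in advance, which is the property that makes the push model a good fit for dynamic cloud environments.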

Finally, sFlow provides the detailed, real-time visibility into network, server and application performance needed to manage performance and control costs. For anyone interested in more information on sFlow, the sFlow presentation provides a strategic view of the role that sFlow monitoring plays in converged, virtualized and cloud environments.

Wednesday, January 26, 2011

Proxmox

Proxmox VE is an open source, bare metal, virtualization platform supporting KVM and OpenVZ.

This article describes how to install and configure sFlow monitoring on a Proxmox VE server using the open source Host sFlow agent.

First, install the development tools needed to compile the Host sFlow agent:

apt-get update
apt-get install gcc make

Next, download the Host sFlow agent sources and compile and install the agent using the following commands:

tar -xvzf hsflowd-X.XX.tar.gz
cd hsflowd-X.XX
make
make install

The next steps involve configuring sFlow monitoring on the server. In this example, we will configure the sFlow agent to sample packets at 1-in-400, poll counters every 20 seconds and send sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343).

The following commands configure ULOG monitoring of traffic:

iptables -A INPUT -m statistic --mode random --probability 0.0025 -j ULOG --ulog-nlgroup 5
iptables -A FORWARD -m statistic --mode random --probability 0.0025 -j ULOG --ulog-nlgroup 5
iptables -A OUTPUT -m statistic --mode random --probability 0.0025 -j ULOG --ulog-nlgroup 5

Note: These commands assume that there are no other firewall rules. For a more detailed description of configuring ULOG monitoring, see ULOG. Instrumenting the FORWARD rule set is important since Proxmox VE uses routing to connect venet interfaces on OpenVZ appliances and the FORWARD rule set is used to monitor the routed traffic.
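The relationship between a 1-in-N sampling rate and the iptables probability is simply probability = 1/N. As a rough sketch, the Python below generates rules of the form shown above from a sampling rate. The function names format_probability and ulog_rules are hypothetical helpers invented for this example; they are not part of Host sFlow or iptables.

```python
def format_probability(sampling_rate):
    """Convert a 1-in-N sampling rate into a plain decimal probability string."""
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be at least 1")
    # Fixed-point formatting avoids exponent notation such as 1e-05.
    return f"{1.0 / sampling_rate:.8f}".rstrip("0").rstrip(".")


def ulog_rules(sampling_rate, nlgroup, chains=("INPUT", "FORWARD", "OUTPUT")):
    """Build one iptables ULOG sampling command per chain (hypothetical helper)."""
    probability = format_probability(sampling_rate)
    return [
        f"iptables -A {chain} -m statistic --mode random "
        f"--probability {probability} -j ULOG --ulog-nlgroup {nlgroup}"
        for chain in chains
    ]


if __name__ == "__main__":
    # 1-in-400 sampling on netlink group 5, as in this article.
    for command in ulog_rules(400, 5):
        print(command)
```

Generating the rules from a single sampling rate value makes it harder for the probability to drift between the three chains.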

Now edit the Host sFlow configuration file, /etc/hsflowd.conf:

sflow{
  DNSSD = off
  polling = 20
  sampling = 400
  collector{
    ip = 10.0.0.50
    udpport = 6343
  }
  ulogGroup = 5
  ulogProbability = 0.0025
}

Note: It is essential that the Host sFlow ulogGroup setting and the iptables --ulog-nlgroup settings match (in this case they are all set to 5). It is also essential that the Host sFlow ulogProbability and the iptables --probability settings match (in this case they are all 0.0025, i.e. 1-in-400).
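Since a mismatch between the two sets of settings is easy to introduce by hand, a small consistency check may be useful. The following Python is a minimal sketch that assumes the configuration file and rules look like the examples in this article. The function names hsflowd_ulog_settings and mismatched_rules are made up for the illustration, and the regular expressions are not a full parser for either format.

```python
import re


def hsflowd_ulog_settings(conf_text):
    """Extract (ulogGroup, ulogProbability) from hsflowd.conf text."""
    group = re.search(r"^\s*ulogGroup\s*=\s*(\d+)", conf_text, re.MULTILINE)
    prob = re.search(r"^\s*ulogProbability\s*=\s*([0-9.]+)", conf_text, re.MULTILINE)
    if not group or not prob:
        raise ValueError("ulogGroup or ulogProbability missing from configuration")
    return int(group.group(1)), float(prob.group(1))


def mismatched_rules(conf_text, rules):
    """Return the ULOG rules whose group or probability differs from hsflowd.conf."""
    want_group, want_prob = hsflowd_ulog_settings(conf_text)
    bad = []
    for rule in rules:
        if "-j ULOG" not in rule:
            continue  # only ULOG sampling rules are relevant here
        group = re.search(r"--ulog-nlgroup\s+(\d+)", rule)
        prob = re.search(r"--probability\s+([0-9.]+)", rule)
        if (not group or not prob
                or int(group.group(1)) != want_group
                or abs(float(prob.group(1)) - want_prob) > 1e-9):
            bad.append(rule)
    return bad
```

The rules could be taken from the commands above or from a saved firewall configuration. An empty result means every ULOG rule agrees with the agent settings.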

In this case, we are using manual configuration of the Host sFlow agent. In a case where large numbers of servers need to be configured, automatic DNS-SD configuration should be considered.

Finally, start the sFlow agent:

/etc/init.d/hsflowd start

An sFlow analyzer is needed to receive the sFlow data and report on performance (see Choosing an sFlow analyzer). The free sFlowTrend analyzer is a great way to get started; see sFlowTrend adds server performance monitoring for examples.

Note: There are currently some limitations with sFlow on Proxmox VE. Proxmox VE does not currently support libvirt, so the Host sFlow agent is unable to report on virtual machine statistics. In addition, the default virtual switch does not include sFlow support, so detailed visibility into per virtual interface counters and traffic flows is not available. However, Open vSwitch can replace the default Linux bridge and provides sFlow support along with other advanced features. Open vSwitch is now included on Xen Cloud Platform and XenServer and could be an option for Proxmox VE in the future.

Tuesday, January 25, 2011

Membase

Membase is a scale-out NoSQL persistent data store built on Memcached. Membase simplifies typical Memcached deployments by combining the memory cache and database servers into a single cluster of Membase servers.

Monitoring the performance of a Membase cluster provides an interesting demonstration of the value of the integrated monitoring of network, system and application performance that is possible using sFlow instrumentation embedded in each of the layers.

The following diagram demonstrates how sFlow can be used to fully instrument Membase servers:

The memcache protocol is used to access Membase and there is a Memcached server running at the core of each Membase server. The article, Memcached, describes how sFlow can be included in a Memcached server and the articles, Memcached hot keys and Memcached missed keys, describe some applications. Integrating sFlow in the Memcached server included in Membase provides detailed visibility into memcache performance counters and operations.

Installing the Host sFlow agent on the servers within the Membase cluster provides visibility into server performance, see Cluster performance and Top servers. Finally, sFlow provides visibility into cluster-wide network performance, see Hybrid server monitoring and ULOG.

The measurements from applications, servers and network combine to provide an integrated view of performance (see sFlow Host Structures).

The following diagram shows a simple experimental setup constructed to explore performance:

A cluster of three Membase servers, 10.0.0.152, 10.0.0.153 and 10.0.0.156 provides scale-out storage to an Apache web server, 10.0.0.150. Client requests to the web server invoke application code in the web server that makes memcache requests to Membase server 10.0.0.152 which acts as a proxy, retrieving data from members of the cluster. In this setup Membase has been configured with a replication factor of 2 (each item stored on a server is replicated to the other two servers).

Note: The Apache web server has also been instrumented with sFlow, providing visibility into web requests (see HTTP), server performance and network activity.

The Membase web console shows the three members of the cluster:

Looking at network traffic using the free sFlowTrend analyzer clearly shows the relationships between the elements. The following Circles chart shows three clusters of machines: three Membase servers, one web server and four clients.

In the circles view, each cluster of servers is represented as a circle of dots, each dot representing a server. The line thicknesses correspond to data rates and the colors represent the different protocols. The thin yellow line (TCP port 80) is web traffic from the clients to the web server. The thick red line (TCP port 11211) is memcache traffic between the web server and the Membase cluster. Orange lines (TCP port 11210) within the Membase cluster are memcache replication and proxy traffic between the members of the cluster.

Note: The chart is an example of using sFlow for Application mapping.

One of the features of Membase is the ability to easily add and remove servers from the operational cluster. The server 10.0.0.158 was removed from the cluster and then added back a few minutes later. The following chart shows a spike in traffic as the server was removed and a much larger spike when the server was added back.

The web, memcache and proxy traffic associated with each server is clearly visible in the chart. The essential functions of the Membase cluster are critically dependent on network bandwidth and latency and managing network capacity is essential to ensure performance of the Membase cluster.

Note: Membase is an example of a broad class of scale-out applications which all share a dependency on the network for inter-cluster communication and scalability. Other examples discussed on this blog include: scale-out compute and storage clusters and virtualization pools with virtual machine migration.

The following chart from the Membase web console shows the throughput of the cluster during the interval of the test:

The graph shows a surprising 40% increase in performance when the server was removed and even more surprisingly the graph shows the increased performance persisting after the server was added back to the cluster.

Looking at performance data from the servers should give a clue as to the source of the non-linear performance characteristics. The following chart shows data from the Host sFlow agent installed on one of the Membase servers.

The CPU load jumped from 45% to 100% at the time the server was removed from the cluster. While there were spikes in User and System loads when the server was removed and added to the cluster, there is a persistent increase in I/O Wait load.

The following chart shows that the CPU load on the web server also experienced an increase in I/O Wait load.

In this experiment, the servers are all virtual machines hosted on a single server. The following chart shows performance data from the Host sFlow agent installed on the hypervisor:

Again, there is a jump in I/O Wait load corresponding to the increase in cluster throughput.

In a virtualized environment, network and CPU loads are closely related since networking is handled by a software virtual switch running on the server. Virtual machines and network loads compete for the same physical CPUs, creating a non-linear feedback loop. This step change in performance appears to be what is referred to as a "phase change".

The article, On Traffic Phase Effects in Packet-Switched Gateways, describes phase changes that occur in physical networks. What appears to be happening here is that the spike in network load caused by the cluster reconfiguration created larger batches of network traffic for the virtual switch to handle. As the larger batches of packets pass through the switch, they provide more work for each virtual machine during its next CPU time slot, which in turn creates a larger batch of packets for the virtual switch, sustaining the cycle. The result is more efficient use of resources and an overall increase in throughput.

While it is unlikely that this specific behavior will occur in production Membase installations, it does demonstrate the complex relationship between applications, systems and networking. There are many examples of large scale performance problems caused by non-linear behavior in multi-layer, scale-out application architectures (e.g. Gmail outage, Facebook outage, Cache miss storm/stampede etc.).

The sFlow standard provides scalable monitoring of all the application, storage, server and network elements in the data center, both physical and virtual. Implementing an sFlow monitoring solution helps break down management silos, ensuring the coordination of resources needed to manage converged infrastructures, optimize performance and avoid service failures.

Tuesday, January 18, 2011

Presentation

The ways in which convergence, virtualization and cloud computing are transforming the data center, and the vital role that the sFlow standard plays in managing performance and reducing costs, have been recurring themes on this blog over the last year.

This 12 minute video draws the threads together in a single presentation and provides a high level overview of the topics. For more information, browse the data center articles on this blog or visit sFlow.org.

Friday, January 7, 2011

HTTP

The mod-sflow project is an open source implementation of sFlow monitoring for the Apache web server. The module exports the counter and transaction structures discussed in sFlow for HTTP.

The advantage of using sFlow is the scalability it offers for monitoring the performance of large web server clusters or load balancers where request rates are high and conventional logging solutions generate too much data or impose excessive overhead. Real-time monitoring of HTTP provides essential visibility into the performance of large-scale, complex, multi-layer services constructed using Representational State Transfer (REST) architectures. In addition, monitoring HTTP services using sFlow is part of an overall performance monitoring solution that provides real-time visibility into applications, servers and switches (see sFlow Host Structures).

The mod-sflow software is designed to integrate with the Host sFlow agent to provide a complete picture of server performance. Download, install and configure Host sFlow before proceeding to install mod-sflow - see Installing Host sFlow on a Linux Server. There are a number of options for analyzing cluster performance using Host sFlow, including Ganglia and sFlowTrend.

Note: mod-sflow picks up its configuration from the Host sFlow agent. The Host sFlow sampling.http setting can be used to override the default sampling setting to set a specific sampling rate for HTTP requests.

Next, download the mod-sflow sources from https://github.com/sflow/mod-sflow. The following commands compile and install mod-sflow:

tar -xvzf mod-sflow-XXX.tar.gz
cd mod-sflow-XXX
apxs -c -i -a mod_sflow.c sflow_api.c
service httpd restart

Once installed, mod-sflow will stream measurements to a central sFlow Analyzer. Currently the only software that can decode HTTP sFlow is sflowtool. Download, compile and install the latest sflowtool sources on the system you are using to receive sFlow from the servers in the Apache cluster.

Running sflowtool will display output of the form:

[pp@test]$ /usr/local/bin/sflowtool
startDatagram =================================
datagramSourceIP 10.0.0.111
datagramSize 116
unixSecondsUTC 1294273499
datagramVersion 5
agentSubId 6486
agent 10.0.0.150
packetSequenceNo 6
sysUpTime 44000
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 6
sourceId 3:65537
counterBlock_tag 0:2201
http_method_option_count 0
http_method_get_count 247
http_method_head_count 0
http_method_post_count 2
http_method_put_count 0
http_method_delete_count 0
http_method_trace_count 0
http_methd_connect_count 0
http_method_other_count 0
http_status_1XX_count 0
http_status_2XX_count 214
http_status_3XX_count 35
http_status_4XX_count 0
http_status_5XX_count 0
http_status_other_count 0
endSample   ----------------------
startSample ----------------------
sampleType_tag 0:1
sampleType FLOWSAMPLE
sampleSequenceNo 3434
sourceId 3:65537
meanSkipCount 2
samplePool 7082
dropEvents 0
inputPort 0
outputPort 1073741823
flowBlock_tag 0:2100
extendedType socket4
socket4_ip_protocol 6
socket4_local_ip 10.0.0.150
socket4_remote_ip 10.1.1.63
socket4_local_port 80
socket4_remote_port 61401
flowBlock_tag 0:2201
flowSampleType http
http_method 2
http_protocol 1001
http_uri /favicon.ico
http_host 10.0.0.150
http_referrer http://10.0.0.150/membase.php
http_useragent Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleW
http_bytes 284
http_duration_uS 335
http_status 404
endSample   ----------------------
endDatagram   =================================
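Because the output is a simple stream of "key value" lines bracketed by start and end markers, it is straightforward to post-process. The Python below is one possible sketch that groups the lines into a dictionary per sample, relying only on the markers visible in the output above. The function name parse_sflowtool is an assumption for this example and is not part of sflowtool.

```python
def parse_sflowtool(lines):
    """Group sflowtool 'key value' lines into one dict per sample (sketch)."""
    samples = []
    datagram = {}    # fields seen before the first sample, e.g. 'agent'
    current = None   # the sample being assembled, or None between samples
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        value = value.strip()
        if key in ("startDatagram", "endDatagram"):
            datagram = {}
        elif key == "startSample":
            current = dict(datagram)  # each sample inherits the datagram fields
        elif key == "endSample":
            if current is not None:
                samples.append(current)
            current = None
        elif current is not None:
            # Repeated keys (such as flowBlock_tag) keep only the last value.
            current[key] = value
        else:
            datagram[key] = value
    return samples
```

In practice the lines could be read from a pipe attached to sflowtool. All values are kept as strings, so numeric counters need converting before any arithmetic.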

The -H option causes sflowtool to output the HTTP request samples using the combined log format:

[pp@test]$ /usr/local/bin/sflowtool -H
10.1.1.63 - - [05/Jan/2011:22:39:50 -0800] "GET /membase.php HTTP/1.1" 200 3494 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleW"
10.1.1.63 - - [05/Jan/2011:22:39:50 -0800] "GET /favicon.ico HTTP/1.1" 404 284 "http://10.0.0.150/membase.php" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleW"

Converting sFlow to combined logfile format allows existing log analyzers to be used to analyze the sFlow data. For example, the following commands use sflowtool and webalizer to create reports:

/usr/local/bin/sflowtool -H | rotatelogs log/http_log &
webalizer -o report log/*

The resulting webalizer report shows top URLs:

Note: The log analyzer reports are useful for identifying top URLs, clients etc. However, values will need to be scaled up by the sampling rate - see Packet Sampling Basics to see how to properly scale sFlow data.
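As a rough illustration of the scaling step, the following Python counts sampled requests per URL from combined-format log lines and multiplies by the sampling rate. The function name estimated_hits and the regular expression are assumptions made for this sketch, and the sampling rate must be supplied by the caller to match the agent configuration. The results are statistical estimates, not exact totals.

```python
import re
from collections import Counter

# Matches the quoted request and the status code in a combined-format log line.
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<uri>\S+) [^"]*" (?P<status>\d{3}) ')


def estimated_hits(log_lines, sampling_rate):
    """Estimate total requests per URI from sampled log lines (sketch)."""
    sampled = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match:
            sampled[match.group("uri")] += 1
    # Each sampled request stands for roughly sampling_rate actual requests.
    return {uri: count * sampling_rate for uri, count in sampled.items()}
```

With 1-in-400 sampling, for example, 25 sampled requests for a URL would be reported as an estimated 10,000 requests.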

The mod-sflow performance counters can be retrieved using HTTP in addition to being exported by sFlow. To enable web access to the counters, create the following /etc/httpd/conf.d/sflow.conf file and restart httpd:

<IfModule mod_sflow.c>
  <Location /sflow>
    SetHandler sflow
  </Location>
</IfModule>

Note: Use Apache Allow/Deny directives to limit access to the counter page.

The counters are now accessible using the URL, http://<server>/sflow

Web access to the counters makes them accessible to a wide variety of performance monitoring tools. For example, the following Perl script allows Cacti to make web requests to mod-sflow and trend HTTP counters:

#!/usr/bin/perl

use LWP::UserAgent;

if($#ARGV == -1) { exit 1; }

my $host = $ARGV[0];
my $ua = new LWP::UserAgent;
my $req = new HTTP::Request GET => "http://$host/sflow";
my $res = $ua->request($req);
if(!$res->is_success) { exit 1; }

my %count = ();
foreach $line (split /\n/, $res->content) {
  my @toks = split(' ', $line);
  $count{$toks[1]} = $toks[2];
}

# Print all counters on a single line as name:value pairs.
print "option:$count{'method_option_count'} get:$count{'method_get_count'} " .
      "head:$count{'method_head_count'} post:$count{'method_post_count'} " .
      "put:$count{'method_put_count'} delete:$count{'method_delete_count'} " .
      "trace:$count{'method_trace_count'} connect:$count{'method_connect_count'} " .
      "other:$count{'method_other_count'}";

The article Simplest Method of Going from Script to Graph (Walkthrough) provides instructions for installing the script and configuring Cacti charting. The following screen capture shows Cacti trend charts of the HTTP performance counters:

Finally, the real potential of HTTP sFlow is as part of a broader performance management system providing real-time visibility into applications, servers, storage and networking across the entire data center.

For example, the diagram above shows typical elements in a Web 2.0 data center (e.g. Facebook, Twitter, Wikipedia, YouTube, etc.). A cluster of web servers handles requests from users. Typically, the application logic for the web site will run on the web servers in the form of server-side scripts (PHP, Ruby, ASP, etc.). The web applications access the database to retrieve and update user data. However, the database can quickly become a bottleneck, so a cache is used to store the results of database queries. The combination of sFlow from all the web servers, Memcached servers and network switches provides end-to-end visibility into performance that scales to handle even the largest data center.