Friday, December 30, 2011

Using Ganglia to monitor Memcache clusters


The Ganglia charts show Memcache performance metrics collected using sFlow. Enabling sFlow monitoring in Memcache servers provides a highly scalable solution for monitoring the performance of large Memcache clusters. Embedded sFlow monitoring simplifies deployments by eliminating the need to poll for metrics. Instead, metrics are pushed directly from each Memcache server to the central Ganglia collector. Currently, there is an implementation of sFlow for Memcached; see http://host-sflow.sourceforge.net/relatedlinks.php.

The article, Ganglia 3.2 released, describes the basic steps needed to configure Ganglia as an sFlow collector. Once configured, Ganglia will automatically discover and track new Memcache servers as they are added to the network.

Note: To try out Ganglia's sFlow/Memcache reporting, you will need to download Ganglia 3.3.

By default, Ganglia will automatically start displaying the Memcache metrics. However, there are two optional configuration settings available in the gmond.conf file that can be used to modify how Ganglia handles the sFlow Memcache metrics.

sflow{
  accept_memcache_metrics = yes
  multiple_memcache_instances = no
}

Setting the accept_memcache_metrics flag to no will cause Ganglia to ignore sFlow Memcache metrics.

The multiple_memcache_instances setting must be set to yes in cases where there are multiple Memcache instances running on each server in the cluster. Each Memcache instance will be identified by the server port included in the title of the charts. For example, the following chart is reporting on the Memcache server listening on port 11211 on host ganglia:


Ganglia and sFlow offer a comprehensive view of the performance of a cluster of Memcache servers, providing not just Memcache-related metrics, but also the server CPU, memory, disk and network IO performance metrics needed to fully characterize cluster performance.

Note: A Memcache sFlow agent does more than simply export performance counters; it also exports detailed data on Memcache operations that can be used to monitor hot keys, missed keys, top clients, etc. The operation data complements the counter data displayed in Ganglia, helping to identify the root cause of problems. For example, Ganglia was showing that the Memcache miss rate was high and an examination of the transactions identified a mistyped key in the application code as the root cause. In addition, Memcache performance is critically dependent on network latency and packet loss - here again, sFlow provides the necessary visibility since most switch vendors already include support for the sFlow standard.
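
As a rough illustration of what can be done with the operation data, the following Perl sketch (modeled on the skeleton in the sflowtool article) ranks keys by how often they appear in the sampled operations. The memcache_op_key attribute name is an assumption based on sflowtool's output for Memcache operation samples - verify it against the output of your sflowtool version.

#!/usr/bin/perl -w
# Sketch: rank Memcache keys by how often they appear in sampled
# operations ("hot keys"). The memcache_op_key attribute name is an
# assumption - check the output of your sflowtool version.
use strict;

my %keys;

# print the most frequently sampled keys when interrupted (Ctrl-C)
$SIG{INT} = sub {
  foreach my $k (sort { $keys{$b} <=> $keys{$a} } keys %keys) {
    print "$keys{$k} $k\n";
  }
  exit 0;
};

open(PS, "/usr/local/bin/sflowtool|") || die "Failed: $!\n";
while( <PS> ) {
  my ($attr,$value) = split;
  $keys{$value}++ if defined $attr && $attr eq 'memcache_op_key';
}
close(PS);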

Thursday, December 29, 2011

Using Ganglia to monitor Java virtual machines


The Ganglia charts show the standard sFlow Java virtual machine metrics. The combination of Ganglia and sFlow provides a highly scalable solution for monitoring the performance of clustered Java application servers. The sFlow Java agent, for stand-alone Java services, or Tomcat sFlow, for web-based servlets, simplifies deployments by eliminating the need to poll for metrics using a Java JMX client. Instead, metrics are pushed directly from each Java virtual machine to the central Ganglia collector.

Note: The Tomcat sFlow agent also allows Ganglia to report HTTP performance metrics.

The article, Ganglia 3.2 released, describes the basic steps needed to configure Ganglia as an sFlow collector. Once configured, Ganglia will automatically discover and track new servers as they are added to the network. The articles, Java virtual machine and Tomcat, describe the steps needed to instrument existing Java applications and Apache Tomcat servlet engines respectively. In both cases the sFlow agent is included when starting the Java virtual machine and requires minimal configuration and no change to the application code.
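
For a stand-alone service, deployment typically amounts to adding the agent to the java command line when the virtual machine is launched. The sketch below is illustrative only - the jar path and application class are hypothetical, and the exact options depend on the agent release, so follow the Java virtual machine article for the precise invocation:

java -javaagent:/usr/local/lib/sflowagent.jar MyApplication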

Note: To try out Ganglia's sFlow/Java reporting, you will need to download Ganglia 3.3.

By default, Ganglia will automatically start displaying the Java virtual machine metrics. However, there are two optional configuration settings available in the gmond.conf file that can be used to modify how Ganglia handles the sFlow Java metrics.

sflow{
  accept_jvm_metrics = yes
  multiple_jvm_instances = no
}

Setting the accept_jvm_metrics flag to no will cause Ganglia to ignore sFlow Java virtual machine metrics.

The multiple_jvm_instances setting must be set to yes in cases where there are multiple Java virtual machine instances running on each server in the cluster. Charts associated with each Java virtual machine instance will be identified by a unique "hostname" included in the title of its charts. For example, the following chart is identified as being associated with the apache-tomcat Java virtual machine on host xenvm4.sf.inmon.com:


Ganglia and sFlow offer a comprehensive view of the performance of a cluster of Java servers, providing not just Java-related metrics, but also the server CPU, memory, disk and network IO performance metrics needed to fully characterize cluster performance.

Wednesday, December 28, 2011

Using Ganglia to monitor web farms


The Ganglia charts show HTTP performance metrics collected using sFlow. Enabling sFlow monitoring in web servers provides a highly scalable solution for monitoring the performance of large web farms. Embedded sFlow monitoring simplifies deployments by eliminating the need to poll for metrics or tail log files. Instead, metrics are pushed directly from each web server to the central Ganglia collector. Currently, there are implementations of sFlow for Apache, NGINX, Tomcat and node.js web servers; see http://www.sflow.net/relatedlinks.php.

The article, Ganglia 3.2 released, describes the basic steps needed to configure Ganglia as an sFlow collector. Once configured, Ganglia will automatically discover and track new web servers as they are added to the network.

Note: To try out Ganglia's sFlow/HTTP reporting, you will need to download Ganglia 3.3.

By default, Ganglia will automatically start displaying the HTTP metrics. However, there are two optional configuration settings available in the gmond.conf file that can be used to modify how Ganglia handles the sFlow HTTP metrics.

sflow{
  accept_http_metrics = yes
  multiple_http_instances = no
}

Setting the accept_http_metrics flag to no will cause Ganglia to ignore sFlow HTTP metrics.

The multiple_http_instances setting must be set to yes in cases where there are multiple HTTP instances running on each server in the cluster. Charts associated with each HTTP instance are identified by the server port included in the title of its charts. For example, the following chart is reporting on the web server listening on port 8080 on host xenvm4.sf.inmon.com:


Ganglia and sFlow offer a comprehensive view of the performance of a cluster of web servers, providing not just HTTP-related metrics, but also the server CPU, memory, disk and network IO performance metrics needed to fully characterize cluster performance.

Note: An HTTP sFlow agent does more than simply export performance counters; it also exports detailed transaction data that can be used to monitor top URLs, top Referers, top clients, response times, etc. The transaction data complements the counter data displayed in Ganglia, helping to identify the root cause of problems. For example, Ganglia was showing a sudden increase in HTTP requests and an examination of the transactions demonstrated that the increase was a denial of service attack, identifying the targeted URL and the list of attacker IP addresses.
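
Since sflowtool can render the sampled HTTP operations in standard combined log format (see the sflowtool article), ad hoc questions about the transaction data can also be answered with ordinary command line tools. For example, the following pipeline - a simple sketch - reports the busiest client addresses among the first 1,000 sampled requests:

sflowtool -H | head -1000 | awk '{print $1}' | sort | uniq -c | sort -rn | head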

Thursday, December 22, 2011

Merchant silicon


The following chart, from Commoditization of Ethernet Switches: How Value is Flowing into Silicon, shows the rapidly increasing market share of network switches based on Broadcom, Marvell and Intel (Fulcrum) chipsets (often referred to as "merchant silicon") as switch vendors move from proprietary ASICs to off-the-shelf designs.
Off-the-shelf vs. Internal Silicon Design
As an example, many vendors now base their 10 Gigabit top of rack switches on Broadcom chipsets. Often vendors don't disclose when they are using merchant silicon, however, based on news reports, similarities in specifications and rumors, the following switches appear to use similar Broadcom chipsets: IBM BNT RackSwitch G8264, Juniper QFX3500, Cisco Nexus 3064, Arista 7050S-64, HP 5900-AF, Alcatel-Lucent Omniswitch 6900 and Dell Force10 S4810.

In addition to reducing costs, the move to merchant silicon helps increase multi-vendor interoperability and support for standards. For example, the sFlow standard is widely implemented in merchant chipsets and the adoption of merchant silicon for 10 Gigabit top of rack switches has greatly increased the presence of sFlow in data centers. The Network World article, OpenFlow, Merchant Silicon, and the Future of Networking, suggests that the rising popularity of merchant silicon is also helping to drive adoption of the OpenFlow standard.

Together, the sFlow and OpenFlow standards transform data center networking by providing the integrated visibility and control needed to adapt to changing workloads in converged, virtualized and cloud environments.

Thursday, December 8, 2011

Routing Open vSwitch into the mainline

The December 1st issue of LWN.net kernel development news includes the article Routing Open vSwitch into the mainline, describing Open vSwitch and reporting that "Open vSwitch was pulled into the networking tree on December 3; expect it in the 3.3 kernel."

This is exciting news! The inclusion of Open vSwitch support in the mainline Linux kernel integrates the advanced network visibility and control capabilities (through support of sFlow and OpenFlow) needed for virtualizing networking in cloud environments. Open vSwitch is already the default switch in the Xen Cloud Platform (XCP) and Citrix XenServer 6, and inclusion within the Linux kernel will help to further unify networking across open source virtualization systems, including Xen, KVM, Proxmox VE and VirtualBox. In addition, integrated sFlow and OpenFlow support has been demonstrated for the upcoming Windows 8 version of Microsoft's Hyper-V virtualization platform, and sFlow and OpenFlow are also widely supported by network equipment vendors.

Broad support for open standards like sFlow and OpenFlow is critical, integrating within physical and virtual network elements the visibility and control capabilities that allow orchestration systems such as OpenStack, openQRM, and OpenNebula to automate and optimize the management of network and server resources in cloud data centers.

Monday, December 5, 2011

sflowtool


The sflowtool command line utility converts standard sFlow records into a variety of different formats. While there are a large number of native sFlow analysis applications, familiarity with sflowtool is worthwhile since it allows a wide variety of additional tools to analyze sFlow data and opens the data up to custom scripting.

First download, compile and install sflowtool using the following commands:

[root@xenvm4 ~]# wget http://www.inmon.com/bin/sflowtool-3.22.tar.gz
[root@xenvm4 ~]# tar -xvzf sflowtool-3.22.tar.gz
[root@xenvm4 ~]# cd sflowtool-3.22
[root@xenvm4 sflowtool-3.22]# ./configure
[root@xenvm4 sflowtool-3.22]# make
[root@xenvm4 sflowtool-3.22]# make install

Update 14 August 2015: Download the latest version of sflowtool from GitHub, https://github.com/sflow/sflowtool/archive/master.zip

The default behavior of sflowtool is to convert sFlow into ASCII text:

[root@xenvm4 ~]# sflowtool
startDatagram =================================
datagramSourceIP 10.0.0.111
datagramSize 144
unixSecondsUTC 1321922602
datagramVersion 5
agentSubId 0
agent 10.0.0.20
packetSequenceNo 3535127
sysUpTime 270660704
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 228282
sourceId 0:14
counterBlock_tag 0:1
ifIndex 14
networkType 6
ifSpeed 100000000
ifDirection 0
ifStatus 3
ifInOctets 4839078
ifInUcastPkts 15205
ifInMulticastPkts 0
ifInBroadcastPkts 4294967295
ifInDiscards 0
ifInErrors 0
ifInUnknownProtos 4294967295
ifOutOctets 149581962744
ifOutUcastPkts 158884229
ifOutMulticastPkts 4294967295
ifOutBroadcastPkts 4294967295
ifOutDiscards 101
ifOutErrors 0
ifPromiscuousMode 0
endSample   ----------------------
endDatagram   =================================

The text output of sflowtool is easily processed using scripts. The following example provides a basic skeleton for processing the output of sflowtool in Perl:

#!/usr/bin/perl -w
use strict;

# run sflowtool and read its text output line by line
open(PS, "/usr/local/bin/sflowtool|") || die "Failed: $!\n";
while( <PS> ) {
  # each line consists of an attribute name followed by its value
  my ($attr,$value) = split;

  # process attribute
}
close(PS);

Examples of scripts using sflowtool on this blog include Memcached hot keys and Memcached missed keys. Other examples include converting sFlow for Graphite and RRDtool.
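
As another minimal sketch, the following script flattens the interface counter samples into one comma-separated record per interface (agent, ifIndex, ifInOctets, ifOutOctets), the kind of output that is easy to feed into other tools. It relies on the attribute ordering shown in the example output above, where ifOutOctets follows ifInOctets within a counter sample:

#!/usr/bin/perl -w
# Emit one CSV record per interface counter sample:
# agent,ifIndex,ifInOctets,ifOutOctets
use strict;

my ($agent, $ifIndex, $ifInOctets);
open(PS, "/usr/local/bin/sflowtool|") || die "Failed: $!\n";
while( <PS> ) {
  my ($attr,$value) = split;
  next unless defined $attr;
  if($attr eq 'agent')          { $agent = $value; }
  elsif($attr eq 'ifIndex')     { $ifIndex = $value; }
  elsif($attr eq 'ifInOctets')  { $ifInOctets = $value; }
  elsif($attr eq 'ifOutOctets') {
    # ifOutOctets is the last of the fields we need, so a complete
    # record is available at this point
    print "$agent,$ifIndex,$ifInOctets,$value\n";
  }
}
close(PS);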

The sFlow standard extends to application layer monitoring, including visibility into HTTP performance. Implementations of sFlow for popular web servers, including Apache, NGINX, Tomcat and node.js offer real-time visibility into large web farms.

The -H option causes sflowtool to output the HTTP request samples using the combined log format, making the data accessible to most log analyzers.

[root@xenvm4 ~]# sflowtool -H
10.0.0.70 - - [22/Nov/2011:12:36:32 -0800] "GET http://sflow.org/images/h-photo.jpg HTTP/1.1" 304 0 "http://sflow.org/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2"
10.0.0.70 - - [22/Nov/2011:12:36:32 -0800] "GET http://sflow.org/inc/nav.js HTTP/1.1" 304 0 "http://sflow.org/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2"
10.0.0.70 - - [22/Nov/2011:12:36:32 -0800] "GET http://sflow.org/images/participant-foundry.gif HTTP/1.1" 304 0 "http://sflow.org/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2"

For example, the following commands use sflowtool and webalizer to create reports:

/usr/local/bin/sflowtool -H | rotatelogs log/http_log 3600 &
webalizer -o report log/*

The resulting webalizer report shows top URLs:


The sFlow standard operates by randomly sampling packet headers. The sflowtool -t option allows sFlow to be used for remote packet capture, converting packet header information from sFlow to standard pcap format that can be used with packet analysis applications.

The following example uses sflowtool and tcpdump to display a packet trace:

[root@xenvm4 ~]# sflowtool -t | tcpdump -r - -vv
reading from file -, link-type EN10MB (Ethernet)
10:30:01.000000 arp who-has 10.0.0.66 tell 10.0.0.220
10:30:07.000000 IP (tos 0x0, ttl  64, id 49952, offset 0, flags [DF], proto: TCP (6), length: 1500) xenserver1.sf.inmon.com.39120 > openfiler.sf.inmon.com.iscsi-target: . 2757963136:2757964584(1448) ack 4136690254 win 3050 
10:30:07.000000 IP (tos 0x0, ttl  64, id 49953, offset 0, flags [DF], proto: TCP (6), length: 1500) xenserver1.sf.inmon.com.39120 > openfiler.sf.inmon.com.iscsi-target: . 1448:2896(1448) ack 1 win 3050 
10:30:07.000000 IP (tos 0x0, ttl  64, id 49954, offset 0, flags [DF], proto: TCP (6), length: 1500) xenserver1.sf.inmon.com.39120 > openfiler.sf.inmon.com.iscsi-target: . 2896:4344(1448) ack 1 win 3050
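
Alternatively, the pcap stream can be captured to a file and examined later with any pcap-aware analyzer:

[root@xenvm4 ~]# sflowtool -t > trace.pcap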

The Wireshark article describes how to use sflowtool and Wireshark to graphically display packet information.


sflowtool can also be used to convert sFlow to NetFlow version 5. The following command converts sFlow records into NetFlow records and sends them to UDP port 9991 on netflow.inmon.com:

[root@xenvm4 ~]# sflowtool -c netflow.inmon.com -d 9991

Converting sFlow to NetFlow provides compatibility with NetFlow analyzers. However, converting sFlow to NetFlow results in a significant loss of information and it is better to use a native sFlow analyzer to get the full value of sFlow. In many cases traffic analysis software supports both sFlow and NetFlow, so conversion is unnecessary.

Finally, sFlow provides information on network, server, virtual machine and application performance and the sflowtool source code offers developers a useful starting point for adding sFlow support to network, server and application performance monitoring software - see Developer resources for additional information.