Monday, July 11, 2011

Ganglia and cloud performance

The Ganglia 3.2 release includes support for collecting and displaying server performance metrics sent using the sFlow standard. Ganglia has traditionally focused on monitoring clusters and grids; however, its scalability and automatic discovery capabilities also make it well suited to monitoring pools of virtual machines.

The article Visibility in the cloud discusses the different challenges of managing virtual machines hosted within a public cloud and of managing the cloud infrastructure itself. The article Rackspace cloudservers shows how Ganglia and sFlow can be used to monitor the performance of virtual machines hosted in a public cloud. This article examines how service providers and private cloud operators can use Ganglia and sFlow to monitor the performance of the cloud infrastructure.

Currently, sFlow agents are available for the XCP (Xen Cloud Platform), Citrix XenServer and KVM/libvirt virtualization platforms. When monitoring a hypervisor using sFlow, Ganglia displays the following hypervisor-specific metrics in addition to the familiar CPU, memory, disk and network statistics:


The Domain Count chart trends the number of virtual machines running on the server. The Hypervisor Free Memory chart shows how much free memory is available to run additional virtual machines.

sFlow also reports basic CPU, memory, disk I/O and network I/O statistics for every virtual machine running on the hypervisor, without the need to install agents on the virtual machines. However, these additional statistics are currently discarded by default for two reasons. First, the Ganglia user interface expects every server to report a common set of metrics, and the data available from virtual machines is limited, which results in missing charts. Second, sFlow uniquely identifies virtual machines by their UUID (Universally Unique Identifier), but Ganglia currently expects hosts to be identified by IP addresses and hostnames (which may not be known for virtual machines).

Ganglia 3.2 provides an experimental override that allows the additional per-virtual-machine performance metrics to be collected. The following entries in the Ganglia gmond configuration file (/etc/gmond.conf) configure sFlow monitoring and enable the additional per-virtual-machine performance metrics:

globals {
  /* Listen, but don't send metrics */
  mute = yes
  deaf = no
  ...
}

/* sFlow channel */
udp_recv_channel {
  port = 6343
}

/* Enable virtual machine statistics */
sflow {
  accept_vm_metrics = yes
}
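To check what is arriving on the sFlow channel, it can help to decode the fixed header at the start of each datagram. The following is a minimal Python sketch, assuming sFlow version 5 datagrams; the function name parse_sflow_header and the dictionary keys are hypothetical and are not part of Ganglia or Host sFlow.

```python
import ipaddress
import struct


def parse_sflow_header(datagram: bytes) -> dict:
    """Decode the fixed header of an sFlow version 5 datagram.

    Hypothetical helper for illustration; field order follows the
    sFlow version 5 datagram layout (all integers are big-endian).
    """
    version, addr_type = struct.unpack_from("!II", datagram, 0)
    if version != 5:
        raise ValueError(f"unsupported sFlow version: {version}")

    # Address type 1 is IPv4 (4 bytes), type 2 is IPv6 (16 bytes).
    if addr_type == 1:
        addr_len = 4
    elif addr_type == 2:
        addr_len = 16
    else:
        raise ValueError(f"unknown agent address type: {addr_type}")

    offset = 8
    agent = ipaddress.ip_address(datagram[offset:offset + addr_len])
    offset += addr_len

    # Sub-agent ID, datagram sequence number, agent uptime (ms), sample count.
    sub_agent, sequence, uptime_ms, samples = struct.unpack_from(
        "!IIII", datagram, offset
    )
    return {
        "agent": str(agent),
        "sub_agent": sub_agent,
        "sequence": sequence,
        "uptime_ms": uptime_ms,
        "samples": samples,
    }
```

The datagram bytes could come from a UDP socket bound to port 6343 on a test machine. Note that gmond itself binds that port when the channel above is configured, so a test listener would need to run on a different host or while gmond is stopped.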

Once enabled, each virtual machine will appear as a member of the cluster. Selecting a virtual machine displays its metrics:


Note: The virtual machine metrics reported by sFlow are consistent with libvirt.
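One way to confirm that virtual machines have joined the cluster is to read the XML state that gmond publishes and list the hosts it contains. The sketch below is a rough illustration, assuming gmond has a tcp_accept_channel on port 8649 (not shown in the configuration above); the function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host: str = "localhost", port: int = 8649,
                   timeout: float = 5.0) -> bytes:
    """Read the full XML dump that gmond writes to a TCP client.

    Assumes a tcp_accept_channel is configured on the given port.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_and_metrics(xml_data) -> dict:
    """Map each HOST name to a {metric name: value} dictionary."""
    root = ET.fromstring(xml_data)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL")
            for metric in host.iter("METRIC")
        }
    return result
```

Calling hosts_and_metrics(read_gmond_xml()) on the gmond server returns one entry per cluster member, so virtual machines accepted through the sFlow override should show up alongside the hypervisors.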

Ganglia has great potential for monitoring virtual machine pools. The experimental support for virtual machine monitoring in Ganglia 3.2 provides a starting point, laying the foundation for further development.

7 comments:

  1. I'd like to know if that is also possible for LXC.

    1. The Host sFlow agent supports libvirt as a method for retrieving metrics. The libvirt project claims support for LXC, so you might want to try installing libvirt on your server and then compiling Host sFlow from sources. It should link to the libvirt library and export per container metrics. If you run into difficulties, please send questions to the Host sFlow mailing list.

      There is no change needed to Ganglia - if the Host sFlow agents send the LXC metrics, they will automatically show up in charts.

    2. I tried this out, but I didn't get any metrics at all. Are you sure that is all I need, or is additional configuration required?

    3. You can verify that the data is being generated using sflowtool to display the sFlow contents. You should also check that there are no firewalls between the LXC server and the system running gmond that would block UDP port 6343 (sFlow).

    4. I mean no data about containers, the host metrics are there.

    5. Did you install libvirt? Can you see the metrics in virsh?

    6. Sorry, I was working on something else this month. Thanks for your reply. I think the libvirt LXC driver doesn't export these metrics yet, so Host sFlow won't report them to Ganglia.
