Thursday, December 29, 2011

Using Ganglia to monitor Java virtual machines

The Ganglia charts show the standard sFlow Java virtual machine metrics. The combination of Ganglia and sFlow provides a highly scalable solution for monitoring the performance of clustered Java application servers. The sFlow Java agent for stand-alone Java services, and Tomcat sFlow for web-based servlets, simplify deployment by eliminating the need to poll for metrics using a Java JMX client. Instead, metrics are pushed directly from each Java virtual machine to the central Ganglia collector.

Note: The Tomcat sFlow agent also allows Ganglia to report HTTP performance metrics.

The article, Ganglia 3.2 released, describes the basic steps needed to configure Ganglia as an sFlow collector. Once configured, Ganglia will automatically discover and track new servers as they are added to the network. The articles, Java virtual machine and Tomcat, describe the steps needed to instrument existing Java applications and Apache Tomcat servlet engines respectively. In both cases the sFlow agent is loaded when the Java virtual machine starts; it requires minimal configuration and no change to the application code.

Note: To try out Ganglia's sFlow/Java reporting, you will need to download Ganglia 3.3.

By default, Ganglia will automatically start displaying the Java virtual machine metrics. However, there are two optional configuration settings available in the gmond.conf file that can be used to modify how Ganglia handles the sFlow Java metrics.

  accept_jvm_metrics = yes
  multiple_jvm_instances = no

Setting the accept_jvm_metrics flag to no will cause Ganglia to ignore Java virtual machine metrics.
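
As a rough sketch, a collector that should discard Java metrics might carry the fragment below. It assumes that both settings live inside the sflow section of gmond.conf, next to the udp_port setting, and that agents send to the standard sFlow port 6343. Check the gmond.conf shipped with your Ganglia release before relying on this layout.

  sflow {
    # assumption: 6343 is the port the sFlow agents send to
    udp_port = 6343
    # ignore Java virtual machine metrics received via sFlow
    accept_jvm_metrics = no
  }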

The multiple_jvm_instances setting must be set to yes when more than one Java virtual machine instance runs on each server in the cluster. The charts for each Java virtual machine instance are then identified by a unique "hostname" included in the chart titles. For example, the following chart is identified as being associated with the apache-tomcat Java virtual machine on host

Ganglia and sFlow offer a comprehensive view of the performance of a cluster of Java servers, providing not just Java-related metrics, but also the server CPU, memory, disk and network IO performance metrics needed to fully characterize cluster performance.

1 comment:

  1. Hello,
    Great tool, I just had a quick question. I use jmx-sflow-agents to monitor JVM data in a Hadoop cluster. Basically, the jmx-sflow-agents send data to the host sFlow agent on the node, which sends data to a gmond server. I then monitor this with g-web, which works great. Since there are multiple JVMs per node, I have multiple_jvm_instances set to "yes". This is working, and when I go to view graphs, I can see JVM data in the form of xxxxx@yyyyy.zzzz, such as 11233@hadoop.test.

    I was wondering, is there any way to see aggregate JVM data like you can CPU or physical memory data? For example, if you go to view idle cpu graphs, there is an entry on the graph for each sFlow host. Can we do the same thing for JVM metrics for each JVM monitored? If not, how is jmx-sflow-agent useful for monitoring hadoop clusters other than just the namenode, resourcemanager, etc...? Thanks for the feedback! Great tool!