Wednesday, December 28, 2011

Using Ganglia to monitor web farms

The Ganglia charts show HTTP performance metrics collected using sFlow. Enabling sFlow monitoring in web servers provides a highly scalable solution for monitoring the performance of large web farms. Embedded sFlow monitoring simplifies deployments by eliminating the need to poll for metrics or tail log files. Instead, metrics are pushed directly from each web server to the central Ganglia collector. Currently, there are implementations of sFlow for Apache, NGINX, Tomcat and node.js web servers.

The article, Ganglia 3.2 released, describes the basic steps needed to configure Ganglia as an sFlow collector. Once configured, Ganglia will automatically discover and track new web servers as they are added to the network.

Note: To try out Ganglia's sFlow/HTTP reporting, you will need to download Ganglia 3.3.

By default, Ganglia will automatically start displaying the HTTP metrics. However, there are two optional configuration settings available in the gmond.conf file that can be used to modify how Ganglia handles the sFlow HTTP metrics.

  accept_http_metrics = yes
  multiple_http_instances = no

Setting the accept_http_metrics flag to no will cause Ganglia to ignore sFlow HTTP metrics.

The multiple_http_instances setting must be set to yes in cases where there are multiple HTTP instances running on each server in the cluster. Charts associated with each HTTP instance are identified by the server port included in the title of its charts. For example, the following chart is reporting on the web server listening on port 8080 on host

Ganglia and sFlow provide a comprehensive view of the performance of a cluster of web servers, reporting not just HTTP-related metrics, but also the server CPU, memory, disk and network IO performance metrics needed to fully characterize cluster performance.

Note: An HTTP sFlow agent does more than simply export performance counters; it also exports detailed transaction data that can be used to monitor top URLs, top Referers, top clients, response times, etc. The transaction data complements the counter data displayed in Ganglia, helping to identify the root cause of problems. For example, in one case Ganglia showed a sudden increase in HTTP requests, and an examination of the transactions demonstrated that the increase was a denial of service attack, identifying the targeted URL and the list of attacker IP addresses.
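
The kind of drill-down described in the note can be sketched in a few lines of Python. This is only an illustration, not part of Ganglia or of any sFlow tool: the record layout (dictionaries with "client" and "url" keys), the sample addresses, and the helper names top_n and clients_for_url are all assumptions made for the example, and a real analysis would work on transaction records decoded from the sFlow stream.

```python
from collections import Counter

# Hypothetical, already-decoded HTTP transaction records. The field names
# are illustrative only and are not the sFlow HTTP structure layout.
transactions = (
    [{"client": "192.0.2.5", "url": "/login"}] * 3
    + [{"client": "192.0.2.6", "url": "/login"}] * 2
    + [{"client": "192.0.2.7", "url": "/index.html"}]
    + [{"client": "192.0.2.8", "url": "/about"}]
)


def top_n(records, field, n=3):
    """Return the n most common values of `field` with their counts."""
    return Counter(r[field] for r in records).most_common(n)


def clients_for_url(records, url):
    """Return the sorted, de-duplicated client addresses that requested `url`."""
    return sorted({r["client"] for r in records if r["url"] == url})


if __name__ == "__main__":
    # Find the most requested URL, then list the clients behind it.
    (busiest_url, hits), = top_n(transactions, "url", n=1)
    print("Top URL:", busiest_url, "requests:", hits)
    print("Clients:", clients_for_url(transactions, busiest_url))
```

Ranking by URL first and then listing the clients for the busiest URL mirrors the two facts named in the example above: the targeted URL and the addresses sending the requests.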
