The Ganglia charts show Memcache performance metrics collected using sFlow. Enabling sFlow monitoring in Memcache servers provides a highly scalable solution for monitoring the performance of large Memcache clusters. Embedded sFlow monitoring simplifies deployments by eliminating the need to poll for metrics. Instead, metrics are pushed directly from each Memcache server to the central Ganglia collector. An sFlow implementation for Memcached is currently available; see http://host-sflow.sourceforge.net/relatedlinks.php.
The article, Ganglia 3.2 released, describes the basic steps needed to configure Ganglia as an sFlow collector. Once configured, Ganglia will automatically discover and track new Memcache servers as they are added to the network.
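For reference, the collector side needs very little configuration. The following gmond.conf sketch shows an sFlow receive channel on UDP port 6343, the standard sFlow port (a minimal example; adjust the channel settings to match your deployment):

udp_recv_channel {
  port = 6343
}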
Note: To try out Ganglia's sFlow/Memcache reporting, you will need to download Ganglia 3.3.
By default, Ganglia will automatically start displaying the Memcache metrics. However, there are two optional configuration settings available in the gmond.conf file that can be used to modify how Ganglia handles the sFlow Memcache metrics.
sflow {
  accept_memcache_metrics = no
  multiple_memcache_instances = no
}
Setting the accept_memcache_metrics flag to no will cause Ganglia to ignore sFlow Memcache metrics.
The multiple_memcache_instances setting must be set to yes in cases where there are multiple Memcache instances running on each server in the cluster. Each Memcache instance will be identified by the server port included in the title of the charts. For example, the following chart is reporting on the Memcache server listening on port 11211 on host ganglia:
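As an illustration, a two-instance setup like this might be started as follows. The flags are standard memcached options (-d daemonize, -u user, -m cache size in MB, -p TCP port), and the user and cache sizes shown are only examples, not requirements:

memcached -d -u memcache -m 64 -p 11211
memcached -d -u memcache -m 64 -p 11311

With the sFlow-instrumented memcached build, each instance then appears in Ganglia identified by its listening port, as described above.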
Together, Ganglia and sFlow offer a comprehensive view of the performance of a cluster of Memcache servers, providing not just Memcache-related metrics, but also the server CPU, memory, disk, and network I/O performance metrics needed to fully characterize cluster performance.
Note: A Memcache sFlow agent does more than simply export performance counters; it also exports detailed data on Memcache operations that can be used to monitor hot keys, missed keys, top clients, etc. The operation data complements the counter data displayed in Ganglia, helping to identify the root cause of problems. For example, Ganglia was showing that the Memcache miss rate was high, and an examination of the transactions identified a mistyped key in the application code as the root cause. In addition, Memcache performance is critically dependent on network latency and packet loss; here again, sFlow provides the necessary visibility, since most switch vendors already include support for the sFlow standard.
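To drill into those operation records, one option is sflowtool, which decodes incoming sFlow datagrams into readable text. A minimal sketch, assuming the collector's default sFlow port of 6343:

sflowtool -p 6343

Adding the -l option prints one line per sample, which makes it easier to script quick reports on hot keys or top clients.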
I've been scouring the net for references and help with a problem we're having with our Ganglia monitoring. So far this is the only reference that is close to what I am looking for.
I have a server with two Memcache instances running on separate ports, 11211 and 11311. I would like to monitor these two instances but have only managed to get one working, not both.
I've already changed multiple_memcache_instances from no to yes in my gmond.conf, but I'm still not able to see the two instances running.
I also tried copying memcached.pyconf to another configuration file, memcache_db.pyconf, and changing the port value there to 11311. But still no luck.
Thanks for the help.
Are you able to connect to each of your Memcached instances using telnet? For example,
telnet localhost 11211
stats
Yes, I can connect to both Memcache instance ports.
Yes. See the transcript below:
[XXX ~]$ telnet 11211
Trying 0.0.43.203...
telnet: connect to address 0.0.43.203: Invalid argument
telnet: Unable to connect to remote host: Invalid argument
[XXX ~]$ telnet localhost 11211
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
stats
STAT pid 9706
STAT uptime 897
STAT time 1348736305
STAT version 1.2.8
STAT pointer_size 64
STAT rusage_user 0.734888
STAT rusage_system 0.863868
STAT curr_items 3779
STAT total_items 4027
STAT bytes 14980199
STAT curr_connections 131
STAT total_connections 208
STAT connection_structures 145
STAT cmd_flush 0
STAT cmd_get 24510
STAT cmd_set 4027
STAT get_hits 20478
STAT get_misses 4032
STAT evictions 0
STAT bytes_read 16301688
STAT bytes_written 710713042
STAT limit_maxbytes 4294967296
STAT threads 2
STAT accepting_conns 1
STAT listen_disabled_num 0
END
quit
Connection closed by foreign host.
[XXX ~]$
[XXX ~]$ telnet localhost 11311
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
stats
STAT pid 9718
STAT uptime 968
STAT time 1348736383
STAT version 1.2.8
STAT pointer_size 64
STAT rusage_user 9.265591
STAT rusage_system 6.967940
STAT curr_items 136856
STAT total_items 152337
STAT bytes 28827795
STAT curr_connections 77
STAT total_connections 191
STAT connection_structures 100
STAT cmd_flush 0
STAT cmd_get 261235
STAT cmd_set 152337
STAT get_hits 124215
STAT get_misses 137020
STAT evictions 0
STAT bytes_read 32631056
STAT bytes_written 38919484
STAT limit_maxbytes 1073741824
STAT threads 2
STAT accepting_conns 1
STAT listen_disabled_num 0
END
quit
Connection closed by foreign host.
[XXX ~]$
Are you running the latest sFlow build (1.4.13) of Memcached, https://github.com/sflow/memcached?
What version of Ganglia are you using? What version of Host sFlow?
I'm trying to use an sFlow (jmx-agent 0.6.1) and Ganglia (3.5.0, source build) pair for JVM monitoring.
gmond.conf:
udp_recv_channel {
port = 6343
}
sflow {
accept_vm_metrics = yes
}
First of all, is there any way to see the logs, other than running in "-d" mode?
gmond -m displays no "vm_*" metrics. I thought those metrics would be injected once the sFlow UDP datagrams arrived, but... no logs, no errors, no metrics.
You should be looking at the instructions in Using Ganglia to monitor Java virtual machines. If you have any further questions, please post them there.