In a typical Memcached deployment, a cache miss results in an expensive query to the database (see sFlow for Memcached). Since the database is usually the performance bottleneck, anything that can be done to reduce the number of misses can significantly boost the overall performance of the service. Memcached performance counters make it easy to calculate cache hit/miss rates and ratios, but don't provide insight into which keys are associated with misses. Keeping track of missed keys within Memcached would be prohibitively expensive since you would need to use memory to store information about the potentially infinite set of keys not in the cache, consuming memory that would be more usefully assigned to increase the cache size (and improve the cache hit rate).
When Memcached is instrumented with sFlow, operations are sampled and the records are sent to a central location for analysis, so no memory is taken away from the cache in order to identify the top missed keys (see Superlinear for a more general discussion of the scalability of sFlow's architecture).
The article Memcached hot keys contains a script, topkeys.pl, that identifies the most frequently used keys in Memcached operations. The topkeys.pl script does not distinguish between operation outcomes, so its results mix keys involved in hits and misses. Since sFlow reports the status code of each operation (see sFlow for Memcached), it is straightforward to modify the topkeys.pl script to report only on misses (i.e. report on operations where memcache_op_status is NOT_FOUND=8).
The following Perl script, topmissedkeys.pl, runs sflowtool and processes the output to display the top 20 missed keys every minute:
#!/usr/bin/perl -w
use strict;
use POSIX;

sub min ($$) { $_[$_[0] > $_[1]] }

my $key_value = "";
my %key_count = ();
my $start = time();

open(PS, "/usr/local/bin/sflowtool|") || die "Failed: $!\n";
while( <PS> ) {
  my ($attr,$value) = split;
  if('memcache_op_key' eq $attr) {
    $value =~ s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;
    $key_value = $value;
  }
  if('memcache_op_status' eq $attr) {
    if('8' eq $value) {
      $key_count{$key_value}++;
    }
  }
  if('endDatagram' eq $attr) {
    my $now = time();
    if($now - $start >= 60) {
      printf "=== %s ===\n", strftime('%m/%d/%Y %H:%M:%S', localtime);
      my @sorted = sort { $key_count{$b} <=> $key_count{$a} } keys %key_count;
      for(my $i = 0; $i < min(20,@sorted); $i++) {
        my $key = $sorted[$i];
        printf "%2d %3d %s\n", $i + 1, $key_count{$key}, $key;
      }
      %key_count = ();
      $start = $now;
    }
  }
}
close(PS);
The resulting output displays a sorted table of the top missed keys:
./topmissedkeys.pl
=== 10/27/2010 23:27:40 ===
 1   4 /tmp/hsperfdata_inmsf
 2   3 /tmp/hsperfdata_pp
 3   1 /etc/at.deny
 4   1 /etc/profile.d
 5   1 /etc/java
 6   1 /etc/gssapi_mech.conf
 7   1 /etc/cron.daily
 8   1 /etc/capi.conf
 9   1 /etc/passwd
10   1 /etc/gpm-root.conf
11   1 /etc/X11
12   1 /etc/hsflowd.conf
13   1 /etc/makedev.d
14   1 /etc/fonts
15   1 /etc/dovecot.conf
16   1 /etc/alchemist
17   1 /etc/yum.conf
18   1 /etc/printcap
19   1 /etc/smrsh
20   1 /etc/ld.so.cache
The table shows that the top missed key is /tmp/hsperfdata_inmsf, having occurred in 4 samples during the minute.
This example can be extended to display the top clients, top operations, etc. associated with cache misses. The counts shown are numbers of samples, not totals; to generate quantitatively accurate results, see Packet Sampling Basics for how to properly scale sFlow data.
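As a rough illustration of the scaling step, the sketch below multiplies each sampled miss count by the sampling rate to estimate the total number of misses, and attaches the error bound commonly quoted for random sampling at roughly 95% confidence, 196 * sqrt(1/c) percent for c samples. The helper names estimate_total and percent_error are hypothetical, and the sampling rate of 400 is an assumed value chosen for illustration; in practice the rate would be taken from the sFlow records, and the relevant field name should be checked against the actual sflowtool output.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: estimate the total number of misses for a key from
# the number of sampled misses, assuming a fixed 1-in-$rate sampling rate.
sub estimate_total {
  my ($samples, $rate) = @_;
  return $samples * $rate;
}

# Hypothetical helper: upper bound on the percentage error of the estimate
# at roughly 95% confidence, using the approximation 196 * sqrt(1/samples).
sub percent_error {
  my ($samples) = @_;
  return 196 * sqrt(1 / $samples);
}

# Sampled miss counts copied from the first rows of the example output.
my %key_count = (
  '/tmp/hsperfdata_inmsf' => 4,
  '/tmp/hsperfdata_pp'    => 3,
  '/etc/at.deny'          => 1,
);

# Assumed sampling rate (1 in 400 operations), for illustration only.
my $rate = 400;

# Print estimated totals, largest first, with the error bound for each key.
for my $key (sort { $key_count{$b} <=> $key_count{$a} } keys %key_count) {
  my $c = $key_count{$key};
  printf "%6d +/- %3.0f%% %s\n",
    estimate_total($c, $rate), percent_error($c), $key;
}
```

With only 4 samples the bound works out to 98%, so a one-minute table like the one shown is best read as a ranking rather than as a measurement; accumulating samples over a longer interval tightens the estimate.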