Thursday, October 28, 2010

Memcached missed keys


This article follows on from the sFlow for Memcached and Memcached hot keys articles and examines how sFlow can be used to improve the cache hit rate in a Memcached cluster.

In a typical Memcached deployment, a cache miss results in an expensive query to the database (see sFlow for Memcached). Since the database is usually the performance bottleneck, anything that can be done to reduce the number of misses can significantly boost the overall performance of the service. Memcached performance counters make it easy to calculate cache hit/miss rates and ratios, but don't provide insight into which keys are associated with misses. Keeping track of missed keys within Memcached would be prohibitively expensive since you would need to use memory to store information about the potentially infinite set of keys not in the cache, consuming memory that would be more usefully assigned to increase the cache size (and improve the cache hit rate).
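As a minimal sketch of the counter-based calculation mentioned above, the following Perl function computes a hit ratio from two counter values. It assumes the values come from the get_hits and get_misses counters reported by the Memcached stats command; the function name hit_ratio and the numbers passed to it are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of get operations served from the cache.
# $hits and $misses are assumed to be the get_hits and get_misses counters.
sub hit_ratio {
  my ($hits, $misses) = @_;
  my $total = $hits + $misses;
  return $total > 0 ? $hits / $total : 0;   # avoid division by zero
}

# Illustrative values, not measurements
printf "hit ratio: %.1f%%\n", 100 * hit_ratio(900, 100);
```

A ratio like this shows how often misses occur, but says nothing about which keys are responsible.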

When Memcached is instrumented with sFlow, operations are sampled and the records are sent to a central location for analysis, so no memory is taken away from the cache in order to identify the top missed keys (see Superlinear for a more general discussion about the scalability of sFlow's architecture).

The Memcached hot keys article contains a script, topkeys.pl, that identifies the most frequently used keys in Memcached operations. That script does not distinguish between operations, so its results include keys involved in both hits and misses. Since sFlow reports the status code of each operation (see sFlow for Memcached), it is straightforward to modify topkeys.pl to report only on misses (i.e. report on operations where memcache_op_status is NOT_FOUND=8).

The following Perl script, topmissedkeys.pl, runs sflowtool and processes the output to display the top 20 missed keys every minute:

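#!/usr/bin/perl -w
use strict;
use POSIX;

# Return the smaller of two numbers; the ($$) prototype makes an
# array argument evaluate to its element count.
sub min ($$) { $_[0] < $_[1] ? $_[0] : $_[1] }

my $key_value = "";   # key from the most recent memcache_op_key line
my %key_count = ();   # missed key => samples seen in the current interval
my $start = time();

# sflowtool prints each decoded field as an "attribute value" line
open(PS, "/usr/local/bin/sflowtool|") || die "Failed: $!\n";
while( <PS> ) {
  my ($attr,$value) = split;
  next unless defined $attr;   # skip blank lines
  if('memcache_op_key' eq $attr) {
    # decode %XX escapes in the key
    $value =~ s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;
    $key_value = $value;
  }
  if('memcache_op_status' eq $attr) {
    # status 8 is NOT_FOUND, i.e. a miss on the last key seen
    if('8' eq $value) {
      $key_count{$key_value}++;
    }
  }
  if('endDatagram' eq $attr) {
    # check the clock once per datagram; report when a minute has elapsed
    my $now = time();
    if($now - $start >= 60) {
      printf "=== %s ===\n", strftime('%m/%d/%Y %H:%M:%S', localtime);
      # sort keys by descending miss count
      my @sorted = sort { $key_count{$b} <=> $key_count{$a} } keys %key_count;
      for(my $i = 0; $i < min(20,@sorted); $i++) {
        my $key = $sorted[$i];
        printf "%2d %3d %s\n", $i + 1, $key_count{$key}, $key;
      }
      %key_count = ();   # start a new interval
      $start = $now;
    }
  }
}

close(PS);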

The resulting output displays a sorted table of the top missed keys:

./topmissedkeys.pl
=== 10/27/2010 23:27:40 ===
 1   4 /tmp/hsperfdata_inmsf
 2   3 /tmp/hsperfdata_pp
 3   1 /etc/at.deny
 4   1 /etc/profile.d
 5   1 /etc/java
 6   1 /etc/gssapi_mech.conf
 7   1 /etc/cron.daily
 8   1 /etc/capi.conf
 9   1 /etc/passwd
10   1 /etc/gpm-root.conf
11   1 /etc/X11
12   1 /etc/hsflowd.conf
13   1 /etc/makedev.d
14   1 /etc/fonts
15   1 /etc/dovecot.conf
16   1 /etc/alchemist
17   1 /etc/yum.conf
18   1 /etc/printcap
19   1 /etc/smrsh
20   1 /etc/ld.so.cache

The table shows that the top missed key is /tmp/hsperfdata_inmsf, having occurred in 4 samples during the minute.
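These figures are sample counts, not totals. As a rough sketch of the scaling described in Packet Sampling Basics, the function below multiplies a sample count by the sampling rate and applies the 95% confidence error bound of 196 * sqrt(1/c) percent, where c is the number of samples. The function name scale_samples and the 1-in-400 sampling rate are assumptions chosen for illustration; the real rate depends on how the agent is configured.

```perl
use strict;
use warnings;

# Hypothetical helper: scale a sample count to an estimated total.
# $samples       - number of samples counted for a key
# $sampling_rate - N, where 1-in-N operations are sampled (assumed known)
# Returns (estimated total, percentage error bound at 95% confidence).
sub scale_samples {
  my ($samples, $sampling_rate) = @_;
  return (0, undef) if $samples <= 0;   # no samples, so no error bound
  my $estimate  = $samples * $sampling_rate;
  my $error_pct = 196 * sqrt(1 / $samples);
  return ($estimate, $error_pct);
}

# 4 samples at an assumed sampling rate of 1-in-400
my ($estimate, $error_pct) = scale_samples(4, 400);
printf "about %d misses, +/- %.0f%%\n", $estimate, $error_pct;
```

With only 4 samples the error bound is close to 100%, so counts this small are better read as a ranking than as a volume; a longer reporting interval collects more samples and tightens the estimate.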

This example can be extended to display top clients, top operations, etc. associated with cache misses. To generate quantitatively accurate results, see Packet Sampling Basics for how to properly scale sFlow data.
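One way to sketch that extension is to make the grouping attribute a parameter. The function below counts misses grouped by any attribute, following the same approach as topmissedkeys.pl: it remembers the latest value of the attribute and tallies it when a NOT_FOUND status arrives. The function name count_misses_by is hypothetical, the field name memcache_op_cmd is an assumption about the sflowtool output that should be checked against what your version prints, and the input lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: count misses grouped by any attribute.
# $lines      - array ref of "attribute value" lines as printed by sflowtool
# $group_attr - attribute to group by; assumed to appear before
#               memcache_op_status within each sample
sub count_misses_by {
  my ($lines, $group_attr) = @_;
  my %count;
  my $current;                      # latest value seen for $group_attr
  for my $line (@$lines) {
    my ($attr, $value) = split ' ', $line;
    next unless defined $value;
    $current = $value if $attr eq $group_attr;
    if ($attr eq 'memcache_op_status') {
      # status 8 is NOT_FOUND; any other status is ignored
      $count{$current}++ if $value eq '8' && defined $current;
      undef $current;               # do not carry a value into the next sample
    }
  }
  return \%count;
}

# Made-up input lines; memcache_op_cmd is an assumed field name
my @lines = (
  'memcache_op_cmd 7', 'memcache_op_status 8',
  'memcache_op_cmd 7', 'memcache_op_status 1',
  'memcache_op_cmd 1', 'memcache_op_status 8',
  'memcache_op_cmd 7', 'memcache_op_status 8',
);
my $misses = count_misses_by(\@lines, 'memcache_op_cmd');
for my $cmd (sort { $misses->{$b} <=> $misses->{$a} } keys %$misses) {
  printf "%3d %s\n", $misses->{$cmd}, $cmd;
}
```

Passing memcache_op_key as the grouping attribute gives the same tally as the script above, without the decoding of escaped characters.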
