This article follows on from sFlow for Memcached, describing how to analyze sFlow data from a Memcached cluster in order to determine the "hot keys".
Currently the only software that can decode sFlow from Memcached servers is sflowtool. Download, compile and install the latest sflowtool sources on the system you are going to use to receive sFlow from the servers in the Memcached cluster.
Running sflowtool will display output of the form:
[pp@test]$ /usr/local/bin/sflowtool
startDatagram =================================
datagramSourceIP 10.0.0.112
datagramSize 144
unixSecondsUTC 1285928093
datagramVersion 5
agentSubId 0
agent 10.0.0.112
packetSequenceNo 62
sysUpTime 995000
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:1
sampleType FLOWSAMPLE
sampleSequenceNo 28
sourceId 3:65537
meanSkipCount 1
samplePool 28
dropEvents 0
inputPort 0
outputPort 1073741823
flowBlock_tag 0:2100
extendedType socket4
socket4_ip_protocol 6
socket4_local_ip 10.0.0.112
socket4_remote_ip 10.0.0.111
socket4_local_port 11211
socket4_remote_port 40091
flowBlock_tag 0:2200
flowSampleType memcache
memcache_op_protocol 1
memcache_op_cmd 7
memcache_op_key ms%5Fsetting%2Ec
memcache_op_nkeys 1
memcache_op_value_bytes 26953
memcache_op_duration_uS 0
memcache_op_status 1
endSample ----------------------
endDatagram =================================
The text displayed above represents a single sampled Memcached operation. The value of memcache_op_cmd is 7, indicating that this is a GET operation (see sFlow for Memcached). The value of memcache_op_key shows that the Memcached key is ms%5Fsetting%2Ec.
Note: sflowtool prints keys using URL encoding, so the actual key was ms_setting.c
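The decoding step described in the note can be checked in isolation. The following is a minimal sketch in Python (not part of the original tooling), assuming the keys use standard percent-encoding as in the sample above; decode_key is a hypothetical helper name chosen for illustration:

```python
from urllib.parse import unquote


def decode_key(encoded: str) -> str:
    """Reverse the URL (percent) encoding that sflowtool applies to keys."""
    # unquote() replaces each %XX escape with the character it encodes
    # and leaves all other characters, including '+', untouched.
    return unquote(encoded)


print(decode_key("ms%5Fsetting%2Ec"))  # prints: ms_setting.c
```

Here %5F decodes to an underscore and %2E to a period, which recovers the key ms_setting.c.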
Turning this raw data into something more useful requires a script. The following Perl script, topkeys.pl, runs sflowtool and processes the output to display the top 20 keys every minute:
#!/usr/bin/perl -w
use strict;
use POSIX;

# Return the smaller of two numbers: the comparison yields 0 or 1,
# which is used as an index into the argument list.
sub min ($$) { $_[$_[0] > $_[1]] }

my %key_count = ();
my $start = time();

# Run sflowtool and read its "attribute value" output line by line.
open(PS, "/usr/local/bin/sflowtool|") || die "Failed: $!\n";
while( <PS> ) {
  my ($attr,$value) = split;
  if('memcache_op_key' eq $attr) {
    # Undo the URL encoding applied by sflowtool, then count the key.
    $value =~ s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;
    $key_count{$value}++;
  }
  my $now = time();
  if($now - $start >= 60) {
    # A minute has elapsed: print the top 20 keys and reset the counters.
    printf "=== %s ===\n", strftime('%m/%d/%Y %H:%M:%S', localtime);
    my @sorted = sort { $key_count{$b} <=> $key_count{$a} } keys %key_count;
    for(my $i = 0; $i < min(20,@sorted); $i++) {
      my $key = $sorted[$i];
      printf "%2d %3d %s\n", $i + 1, $key_count{$key}, $key;
    }
    %key_count = ();
    $start = $now;
  }
}
close(PS);
The resulting output displays a sorted table of the hot keys:
./topkeys.pl
=== 10/01/2010 13:59:28 ===
 1  13 stopping.html.de
 2  11 sitemap.html.de
 3  10 install.html.de
 4   9 index.html.de
 5   9 invoking.html.de
 6   9 new_features_2_0.html.de
 7   7 upgrading.html.de
 8   5 bind.html
 9   4 invoking.html
10   4 dso.html
11   3 sitemap.html
12   3 server-wide.html
13   3 filter.html
14   3 logs.html
15   2 env.html
16   2 sections.html
17   2 index.html
18   2 suexec.html
19   2 configuring.html
20   2 new_features_2_0.html
The table shows that the top key is stopping.html.de, having occurred in 13 samples during the minute.
This example can be extended to display top clients, top operations, etc. The counts shown are numbers of samples, not totals; to generate quantitatively accurate results, see Packet Sampling Basics for how to properly scale sFlow data.
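As one possible extension, the sketch below (in Python, not the original Perl) groups sflowtool's attribute/value lines into one record per sample and ranks clients by estimated operation count. It assumes the field names shown in the sample output above (startSample, endSample, meanSkipCount, socket4_remote_ip, memcache_op_cmd). The function names parse_samples and top_clients are hypothetical. Weighting each sample by its meanSkipCount is a simple scaling assumption for illustration, not a substitute for the treatment in Packet Sampling Basics.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_samples(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group sflowtool "attribute value" lines into one dict per sample."""
    sample = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        attr = fields[0]
        if attr == "startSample":
            sample = {}
        elif attr == "endSample":
            if sample is not None:
                yield sample
            sample = None
        elif sample is not None and len(fields) >= 2:
            # Repeated attributes (e.g. flowBlock_tag) keep the last value.
            sample[attr] = fields[1]


def top_clients(samples: Iterable[Dict[str, str]],
                n: int = 20) -> List[Tuple[str, int]]:
    """Rank clients by estimated operations (assumed: samples x meanSkipCount)."""
    estimate: Counter = Counter()
    for s in samples:
        client = s.get("socket4_remote_ip")
        # Skip samples that are not Memcached operations or lack a client.
        if client is None or "memcache_op_cmd" not in s:
            continue
        estimate[client] += int(s.get("meanSkipCount", "1"))
    return estimate.most_common(n)


# Small demonstration using fields from the sample shown earlier.
demo = """\
startSample ----------------------
meanSkipCount 1
socket4_remote_ip 10.0.0.111
memcache_op_cmd 7
memcache_op_key ms%5Fsetting%2Ec
endSample ----------------------
""".splitlines()

print(top_clients(parse_samples(demo)))  # prints: [('10.0.0.111', 1)]
```

To run this against live data, the lines could be read from the standard output of sflowtool instead of the demo text. Ranking by memcache_op_cmd rather than socket4_remote_ip would give top operations in the same way.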