Monday, January 21, 2013

Memcache hot keys and cluster load balancing

Figure 1: Link saturation due to hot Memcache key (from Etsy's Code as Craft blog)
The article, mctop – a tool for analyzing memcache get traffic, on Etsy's Code as Craft blog describes the problems that can occur in Memcache clusters when traffic associated with hot/popular keys results in network congestion and poor application performance.

Figure 1 from the Etsy article shows traffic on the link to a server hosting a hot key. The clipping on the chart as traffic reaches 960 Mbits/s indicates that the link is saturated. The chart also shows the drop in traffic when client code was modified to reduce accesses to the hot key.

When looking at access patterns in a Memcache cluster, it is important to understand how clients decide which member of the cluster to access for a particular key. Generally, a hash function is computed on the key, and based on the value of the hash, the client selects a server in the cluster, e.g.
index = hash(key) % cluster.size
selected_server = cluster[index]
value = selected_server.get(key)
Using a hash function randomly distributes keys across the cluster and allows clients to independently determine which server to access for a given key, resulting in a cache architecture that can be scaled out to very large sizes.
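The pseudocode above can be sketched as a runnable example. This is a minimal illustration of hash-based server selection, not a production client; the server names are hypothetical and MD5 stands in for whatever hash function a real client library uses:

```python
import hashlib

# Hypothetical cluster of Memcache servers (names are illustrative)
cluster = ['mc1.example.com', 'mc2.example.com', 'mc3.example.com']

def select_server(key, cluster):
    # Compute a stable hash of the key and map it onto the server list,
    # mirroring: index = hash(key) % cluster.size
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return cluster[h % len(cluster)]

# Every client computes the same mapping independently,
# so all clients agree on which server holds a given key
print(select_server('session.time', cluster))
```

Because the mapping is a pure function of the key and the server list, no coordination between clients is needed; this is what makes the architecture scale.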
Figure 2: Memcache traffic between clients and servers
Figure 2 illustrates how hot keys cause traffic patterns to concentrate on a single server in the cluster. Each line color represents traffic associated with a particular key. While there may be millions of keys in the cache, most keys are infrequently accessed. A much smaller number of frequently accessed keys - the hot keys - dominates the traffic. For example, traffic associated with a hot key, shown in red in the diagram, is driven by frequent access to the key by many clients.
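A small simulation makes the concentration effect concrete. Under the assumption of a simple modulo-hash client (as sketched earlier, with illustrative server names), all requests for a single hot key land on one server, while the same number of requests spread over many cold keys is distributed across the cluster:

```python
import hashlib
from collections import Counter

cluster = ['mc1', 'mc2', 'mc3', 'mc4']

def select_server(key):
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return cluster[h % len(cluster)]

# 1000 requests for one hot key versus 1000 requests over 1000 cold keys
hot_load = Counter(select_server('user.profile.hot') for _ in range(1000))
cold_load = Counter(select_server('cold.key.%d' % i) for i in range(1000))

print(hot_load)   # every request maps to the same server
print(cold_load)  # requests spread across the cluster
```

The hash spreads distinct keys evenly, but it can do nothing about skew in the popularity of individual keys; that skew is what saturates the link to one server.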

There are interesting similarities between the traffic patterns generated by hot keys and the challenge of load balancing LAG/ECMP groups described in a previous article. That article describes how performance-aware software defined networking (SDN) can be used to detect and redirect large traffic flows. The remainder of this article will examine whether SDN techniques can be applied to manage Memcache performance.

The first step is to include instrumentation by deploying Memcache servers with integrated support for the sFlow standard (just as switches supporting the sFlow standard are used to provide real-time measurements in the LAG/ECMP article). The sFlow-RT analytics engine is used to generate actionable metrics, for example to alert when traffic to a key exceeds a threshold.

The following Python script generates notifications of hot keys and missed keys that can be used to detect performance problems and identify the particular keys and servers affected:
import requests
import json

# Address of the sFlow-RT REST API (assumed to be running locally on its default port)
rt = 'http://localhost:8008'

# Flow definitions: track bytes by Memcache key, and misses (NOT_FOUND) by key
hotkey = {'keys':'memcachekey', 'value':'bytes'}
missedkey = {'keys':'memcachekey', 'value':'requests', 'filter':'memcachestatus=NOT_FOUND'}

# Thresholds that trigger events when a flow exceeds the given value
hotkeythreshold = {'metric':'hotkey', 'value':1000000}
missedkeythreshold = {'metric':'missedkey', 'value':20}

r = requests.put(rt + '/flow/hotkey/json', data=json.dumps(hotkey))
r = requests.put(rt + '/flow/missedkey/json', data=json.dumps(missedkey))
r = requests.put(rt + '/threshold/hotkey/json', data=json.dumps(hotkeythreshold))
r = requests.put(rt + '/threshold/missedkey/json', data=json.dumps(missedkeythreshold))

eventurl = rt + '/events/json?maxEvents=10&timeout=60'
eventID = -1
while True:
  # Long poll for new events
  r = requests.get(eventurl + '&eventID=' + str(eventID))
  if r.status_code != 200: break
  events = r.json()
  if len(events) == 0: continue

  eventID = events[0]['eventID']
  for e in events:
    # Retrieve the metric that triggered the event to find the top key
    r = requests.get(rt + '/metric/' + e['agent'] + '/' + e['dataSource'] + '.' + e['metric'] + '/json')
    metrics = r.json()
    if len(metrics) > 0:
      evtMetric = metrics[0]
      evtKeys = evtMetric.get('topKeys', None)
      if evtKeys and len(evtKeys) > 0:
        topKey = evtKeys[0]
        key = topKey.get('key', None)
        value = topKey.get('value', None)
        print(e['agent'] + ',' + key + ',' + str(value))
The following output shows the results produced as the script generates notifications:
$ python 
The script has identified a hot key, session.time, identified the member of the cluster hosting the key, and the amount of traffic to the key, 1481 bytes per second.

Note: The script also identified the key session.uesr_id as having a high miss rate. This is clearly a typo: a client is using the wrong key name, which should be session.user_id. Correcting the error increases the cache hit rate, reduces load on the databases, and improves application response time, see Memcached missed keys.

The problems identified in the hot key example on the Etsy blog and the missed key example above were corrected manually: by changing the application logic around accessing the key in the first case, and by correcting a typo in the second. However, it's interesting to consider whether it might be possible to automatically load balance traffic on the Memcache cluster by programmatically changing how keys are mapped to servers in the cluster, similar to the way in which OpenFlow is used in the LAG article to control switch forwarding.

Memcache clients already need to know the ordered list of servers in the cluster in order to implement the hash-based load balancing mechanism. If, in addition, a set of match rules (analogous to OpenFlow match rules) could be applied to keys to override the server selection decision, then it would be possible to automatically load balance the cluster by evenly distributing high usage keys, e.g.
selected_server = lookuprule(key)
if(!selected_server) {
  index = hash(key) % cluster.size
  selected_server = cluster[index]
}
value = selected_server.get(key)
Note: A workable solution isn't quite this simple. A viable solution would also need to move the data to the new server before applying a new rule in order to avoid a cache miss storm.
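The rule-override pseudocode above can be sketched in Python. The override table and server names here are hypothetical; in a real deployment the rules would be pushed to clients by a controller, and (as the note above points out) data would need to be migrated before a rule takes effect:

```python
import hashlib

cluster = ['mc1', 'mc2', 'mc3']

# Hypothetical override table, analogous to OpenFlow match rules:
# maps specific hot keys to explicitly chosen servers
rules = {'session.time': 'mc3'}

def lookup_rule(key):
    # Return the overriding server for this key, or None if no rule matches
    return rules.get(key)

def select_server(key):
    # A matching rule overrides the normal hash-based selection
    server = lookup_rule(key)
    if server is None:
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        server = cluster[h % len(cluster)]
    return server

print(select_server('session.time'))  # served by the server named in the rule
print(select_server('some.other.key'))  # falls back to hash-based selection
```

Keeping the rule table small (hot keys only) preserves the scalability of hash-based selection while giving a controller a knob to rebalance the heaviest traffic.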

Memcache hot keys is an interesting example that demonstrates the close relationship between network, server and application performance. A siloed organization would find it difficult to address this issue: the networking team looking at the problem in isolation would see it as a capacity problem and might propose an expensive and disruptive upgrade, the Memcache administrator (without network visibility) might just see a chronic and inexplicable performance problem, and the application team (relying on the Memcache cluster to improve performance and scalability of the web site) would see chronic performance problems affecting site users.
Figure 3: Typical Web 2.0 application architecture
While this article focused primarily on Memcache performance, the cache is only one element in a more complex application architecture. Figure 3 shows the elements in a Web 2.0 data center (e.g. Facebook, Twitter, Wikipedia, YouTube, etc.). A cluster of web servers handles requests from users. Typically, the application logic for the web site runs on the web servers in the form of server-side scripts (PHP, Ruby, ASP, etc.). The web applications access the database to retrieve and update user data. Since the database can quickly become a bottleneck, the cache is used to store the results of database queries.

Combining sFlow solutions for monitoring network devices, hosts, web servers, Memcache servers and the applications built on this infrastructure delivers the unified visibility needed to manage data center wide performance and lays the foundation for a software defined data center (SDDC).
