Sunday, September 26, 2010

Memcached


The diagram shows typical elements in a Web 2.0 data center (e.g. Facebook, Twitter, Wikipedia, YouTube, etc.). A cluster of web servers handles requests from users. Typically, the application logic for the website runs on the web servers in the form of server-side scripts (PHP, Ruby, ASP, etc.). The web applications access the database to retrieve and update user data. However, the database can quickly become a bottleneck, so a cache is used to store the results of database queries.

The cache is a critical element for scalability and the cache itself must be highly scalable. The cache consists of open source Memcached software deployed on a cluster of servers. Memcached is an example of a scale-out system: adding additional servers to a Memcached cluster linearly increases the capacity of the cache.
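Scale-out works because clients spread keys across the cluster by hashing: each key deterministically maps to one server, so adding servers adds capacity. A minimal sketch of the consistent-hashing scheme most memcached clients use (the class, server addresses, and vnode count here are illustrative, not part of any particular client library):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: each key maps to one server, and
    adding a server remaps only a small fraction of keys (an
    illustrative sketch, not a real memcached client)."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (hash, server) points
        for s in servers:
            self.add(s, vnodes)

    def _hash(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add(self, server, vnodes=100):
        # Multiple virtual nodes per server smooth out the key distribution.
        for i in range(vnodes):
            self.ring.append((self._hash("%s#%d" % (server, i)), server))
        self.ring.sort()

    def server_for(self, key):
        # Find the first ring point clockwise from the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
server = ring.server_for("foo:1234")  # one of the three servers
```

Because only the keys between a new server's ring points move, growing the cluster invalidates a fraction of the cache rather than all of it.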

The following example shows how the cache is typically integrated into the web application logic:

def get_foo(foo_id):
    key = "foo:" + str(foo_id)
    foo = mc.get(key)                      # mc: a connected memcached client
    if foo is not None:                    # cache hit
        return foo
    foo = fetch_foo_from_database(foo_id)  # cache miss: query the database
    mc.set(key, foo)                       # populate the cache for next time
    return foo

To give some perspective on the challenge of managing performance in this environment, consider the scale of the problem. In December 2008, Facebook's Memcached cluster consisted of over 800 servers providing over 28 terabytes of cache memory and able to handle over 160 million requests per second (see Scaling memcached at Facebook).

One particularly challenging task is identifying the most popular keys in an operational Memcached cluster. Determining which cache keys are popular is critical to managing and optimizing Memcached performance. Consider the following example: if all the web servers are making frequent reference to a shared cache item and it is removed from the cache, the result can be a "stampede" to the database to retrieve the value. If this situation isn't managed carefully the result can be a cascading failure as the database becomes overloaded (see Cache Miss Storm).

Memcached users can now add sFlow monitoring to their Memcached servers in order to improve visibility (see sFlow for memcached). The following code extract shows the minimal impact of sFlow instrumentation on the performance of the Memcached request processing loop:

#define SFLOW_SAMPLE_TEST(c) (unlikely((--c->thread->sflow_skip)==0))
...
if(SFLOW_SAMPLE_TEST(c)) {
  sflow_sample(&sFlow, c, SFMC_PROT_ASCII,
               sflow_map_nread(c->cmd),
               ITEM_key(it),
               it->nkey,
               SFLOW_TOKENS_UNKNOWN,
               (ret == STORED) ? it->nbytes : 0,
               SFLOW_DURATION_UNKNOWN,
               sflow_map_status(ret));
}
...

If you examine the code, you can see that the overhead of adding sFlow transaction sampling is essentially the cost of maintaining one additional performance counter. Memcached already maintains over 30 performance counters and these counters are also exported by the sFlow agent using sFlow's efficient "counter push" mechanism (see Link utilization). Overall, the sFlow agent imposes negligible overhead on the Memcached server since no data analysis is performed locally. With sFlow monitoring the raw data is streamed to a central sFlow Analyzer (see Choosing an sFlow analyzer).
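The skip-count technique in the C extract above can be sketched in a few lines of Python: every transaction costs one counter decrement, and the expensive sampling path runs only when the counter reaches zero. The class, the sampling rate, and the randomized-skip choice here are illustrative assumptions, not the actual sFlow agent code:

```python
import random

class Sampler:
    """Mirrors the SFLOW_SAMPLE_TEST skip-count test: each transaction
    costs one decrement; a sample is taken only when the skip counter
    hits zero, giving 1-in-rate sampling on average."""

    def __init__(self, rate):
        self.rate = rate
        self.skip = self._next_skip()
        self.samples = []

    def _next_skip(self):
        # A randomized skip (mean = rate) keeps the sampling unbiased
        # even when the traffic has a periodic pattern.
        return random.randint(1, 2 * self.rate - 1)

    def record(self, key):
        self.skip -= 1
        if self.skip == 0:  # the only per-transaction overhead
            self.samples.append(key)
            self.skip = self._next_skip()
```

Feeding a million transactions through a 1-in-400 sampler touches the slow path only about 2,500 times, which is why the overhead is essentially one counter decrement per request.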

The Memcached transaction samples allow the sFlow Analyzer to easily identify popular keys, who is using the keys, what operations are being performed and the status of those operations (see Scalability and accuracy of packet sampling). In addition, the Memcached performance counters allow the analyzer to report on overall load, cache hit/miss rates etc. This data is available across all the machines in the Memcached cluster, providing the integrated view of cluster performance needed to improve cache efficiency and avoid stampedes.
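On the analyzer side, identifying popular keys from transaction samples is straightforward: each 1-in-N sample represents roughly N transactions, so scaling the per-key sample counts by the sampling rate estimates per-key request volumes. A minimal sketch, assuming the samples arrive as a list of keys and the sampling rate is known:

```python
from collections import Counter

def top_keys(samples, sampling_rate, n=5):
    """Estimate the busiest keys from 1-in-N transaction samples.

    Each sample stands for roughly sampling_rate transactions, so
    multiplying the sample count by the rate estimates the true count.
    """
    counts = Counter(samples)
    return [(key, c * sampling_rate) for key, c in counts.most_common(n)]

# e.g. 10 samples of "user:99" at a 1-in-400 rate suggests ~4,000 requests
estimates = top_keys(["user:99"] * 10 + ["feed:7"] * 3, sampling_rate=400)
```

The relative error of such estimates shrinks as the number of samples per key grows, which is the point made in the packet-sampling accuracy article cited above.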

However, managing Memcached performance involves more than simply monitoring the application statistics. The sFlow standard also provides server metrics (see Cluster performance and Top servers), identifying performance problems with the servers in the Memcached cluster before they affect the performance of the Memcached application. In addition, sFlow is implemented by most switch vendors, providing the detailed visibility into network performance needed to avoid congestion and packet loss problems that can severely impact the performance of Memcached.

Memcached demonstrates how the performance of all the elements in the data center is interrelated. The sFlow standard addresses this challenge by unifying network, server and application performance monitoring to deliver the integrated view needed for effective management (see sFlow Host Structures).

Feb. 15, 2011 Update: For more recent articles on Memcached, including identifying hot/missed keys, click on the Memcache label below.
