Saturday, October 6, 2012

Thread pools

Figure 1: Thread pool
The thread pool pattern, illustrated in figure 1, is common to many parallel processing applications. A number of worker threads, organized in a thread pool, take tasks from a task queue. Once a thread has completed a task, it waits for a new task to appear on the task queue. Keeping track of the number of active threads in the pool is essential. If tasks wait in the queue because there aren't enough workers, then requests will be delayed, or possibly dropped if the queue fills up.
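
To make the pattern concrete, here is a minimal Python sketch of a pool that counts active workers, delayed tasks and dropped tasks. It is an illustration only, not how Apache or any sFlow agent implements its pool; the class name CountingThreadPool and its attributes are hypothetical, and the queue limit is enforced with a counter rather than a bounded queue to keep the bookkeeping simple.

```python
import queue
import threading


class CountingThreadPool:
    """Toy thread pool that maintains the counters discussed above."""

    def __init__(self, max_threads, queue_size):
        self.max_threads = max_threads
        self.queue_size = queue_size   # how many tasks may wait for a worker
        self.tasks = queue.Queue()
        self.lock = threading.Lock()
        self.in_flight = 0   # tasks accepted but not yet finished
        self.active = 0      # workers currently running a task
        self.delayed = 0     # tasks that had to wait in the queue
        self.dropped = 0     # tasks rejected because the queue was full
        for _ in range(max_threads):
            threading.Thread(target=self._worker, daemon=True).start()

    @property
    def idle(self):
        with self.lock:
            return self.max_threads - self.active

    def submit(self, func, *args):
        """Queue a task; return False if it was dropped."""
        with self.lock:
            if self.in_flight >= self.max_threads + self.queue_size:
                self.dropped += 1      # every worker busy and the queue is full
                return False
            if self.in_flight >= self.max_threads:
                self.delayed += 1      # every worker busy: the task must wait
            self.in_flight += 1
        self.tasks.put((func, args))
        return True

    def _worker(self):
        while True:
            func, args = self.tasks.get()
            with self.lock:
                self.active += 1
            try:
                func(*args)
            except Exception:
                pass  # a real pool would log or report the failure
            finally:
                with self.lock:
                    self.active -= 1
                    self.in_flight -= 1
                self.tasks.task_done()

    def wait(self):
        """Block until every accepted task has finished."""
        self.tasks.join()
```

With max_threads=2 and queue_size=1, submitting four long-running tasks in quick succession would leave two running, one delayed and one dropped.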

The recently finalized sFlow Application Structures specification defines a standard set of metrics for reporting on thread pools:
  • Active Threads: The number of threads in the thread pool that are actively processing a request.
  • Idle Threads: The number of threads in the thread pool that are waiting for a request.
  • Maximum Threads: The maximum number of threads that can exist in the thread pool.
  • Delayed Tasks: The number of tasks that could not be served immediately, but spent time in the task queue.
  • Dropped Tasks: The number of tasks that were dropped because the task queue was full.
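
As a rough illustration, the five metrics could be held in a small record like the Python sketch below. The class and field names are hypothetical and are not taken from the sFlow specification, and the utilization and consistency helpers are conveniences added here, not part of the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadPoolSample:
    """One reading of the five thread pool metrics (illustrative names)."""
    active: int    # threads processing a request
    idle: int      # threads waiting for a request
    maximum: int   # most threads the pool may hold
    delayed: int   # tasks that spent time in the queue
    dropped: int   # tasks rejected because the queue was full

    @property
    def utilization(self):
        """Fraction of the permitted threads that are busy (0.0 to 1.0)."""
        return self.active / self.maximum if self.maximum else 0.0

    def is_consistent(self):
        """Active plus idle threads should never exceed the maximum."""
        return (self.active >= 0 and self.idle >= 0
                and self.active + self.idle <= self.maximum)
```
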
The Apache web server uses a thread pool and is a useful demonstration of the value of the sFlow thread pool metrics. The Apache thread pool can be accessed using mod_status, which makes the thread pool visible as a web page. The following screen capture shows the server-status page generated by mod_status:
The grid of characters is used to visualize the state of the pool (referred to as the "scoreboard"). Each cell in the grid represents a slot for a thread, and the size of the grid shows the maximum number of threads that are permitted in the pool. The summary line above the grid states that 6 requests are currently being processed and that there are 69 idle workers (i.e. there are six "W" characters and sixty-nine "_" characters in the grid).
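
The counting described above can be sketched in a few lines of Python. This is a simplified illustration: the function name summarize_scoreboard is hypothetical, it assumes the scoreboard text has already been extracted from the page, it relies on "." marking an open slot with no thread in it (from the mod_status key), and it treats every other non-idle character as busy, which is coarser than Apache's own summary line.

```python
from collections import Counter


def summarize_scoreboard(scoreboard):
    """Count slot states in a mod_status scoreboard string."""
    # The grid is usually wrapped over several lines; drop all whitespace.
    slots = "".join(scoreboard.split())
    counts = Counter(slots)
    idle = counts["_"]        # "_" : worker waiting for a connection
    open_slots = counts["."]  # "." : slot with no thread in it
    return {
        "max": len(slots),    # grid size = maximum threads permitted
        "idle": idle,
        "open": open_slots,
        # Simplification: anything that is not idle or open counts as busy.
        "busy": len(slots) - idle - open_slots,
    }
```

For a grid containing six "W" and sixty-nine "_" characters, with the remaining cells shown as ".", this reports 6 busy and 69 idle slots, matching the summary line.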

While the server-status page isn't designed to be machine readable, the information is critical and there are numerous performance monitoring tools that make HTTP requests and extract the worker pool statistics from the text. A much more efficient way to retrieve the information is to use the Apache sFlow module, which in addition to reporting the thread pool statistics will export HTTP counters, URLs, response times, status codes, etc.

The article, Using Ganglia to monitor web farms, describes how to use the open source Ganglia performance monitoring software to collect and report on web server clusters using sFlow. Ganglia now includes support for the sFlow thread pool metrics.
Figure 2: Ganglia chart showing active threads from an Apache web server
Figure 2 trends the number of active workers in the pool. If the number of active workers approaches the maximum allowed, then additional servers may need to be added to the cluster. An increase in active threads could also indicate a performance problem with backend systems (e.g. a slow database holding up worker threads) or may be the result of a Denial of Service (DoS) attack (e.g. Slowloris).
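
One simple way to act on such a trend is to flag a server when its active thread count stays near the maximum for several consecutive samples. The Python sketch below shows the idea; the function name, the 80% threshold and the three-sample window are arbitrary assumptions chosen for illustration, not values recommended by sFlow or Ganglia.

```python
def approaching_capacity(active_samples, max_threads, threshold=0.8, window=3):
    """Return True if the last `window` samples all reach threshold * max.

    active_samples is a list of active thread counts, oldest first.
    """
    if max_threads <= 0 or len(active_samples) < window:
        return False  # not enough information to judge
    limit = threshold * max_threads
    return all(active >= limit for active in active_samples[-window:])
```

Requiring several consecutive high samples avoids raising an alarm on a single short burst, at the cost of reacting a little later to a sustained rise.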

Monitoring thread pools using sFlow is very useful, but only scratches the surface of what is possible. The sFlow standard is widely supported by network equipment vendors and can be combined with sFlow metrics from hosts, services and applications to provide a comprehensive view of data center performance.
