Tuesday, February 22, 2011

Windows load average

The latest version of Host sFlow calculates and exports load average metrics from Windows® systems. Since load averages aren't natively supported by Windows, this article provides an overview of load measurement and describes how the Windows Host sFlow agent calculates load averages.

Why the picture of a grocery store checkout line? When choosing which checkout line to join in a grocery store, the length of each line gives a pretty good indication of how long the wait is for the register. Similarly, with computer systems, the number of tasks waiting for the CPU is a useful measure of system load.

The Windows Task Manager is a familiar tool for monitoring the performance of Windows systems. The Performance tab shows current CPU utilization and a trend of utilization over time.

In this example, the CPU utilization has spiked to 100%, but it's hard to gauge whether the server is overloaded. Going back to the grocery store analogy, we can see that the cashier is busy serving a customer, but we don't know how long the line is.

The following chart, generated using sFlowTrend, displays load average data collected from a Host sFlow agent installed on a Windows server:

The chart shows a trend of 1-minute, 5-minute and 15-minute load averages. Each load average is a moving average of the number of threads waiting to be serviced. A load average of 1.0 on a single processor roughly corresponds to a CPU utilization of 100%. A load average of three, as in this case, indicates that there is more work than the single processor can handle. Upgrading to a server with more CPUs would reduce the load and increase the throughput. Using the grocery store analogy, if the manager sees that lines are building up, it is time to start opening up additional registers.

For additional information, the article, Understanding Linux CPU Load - when should you be worried?, provides a good introduction to load averages and provides useful rules of thumb for sizing servers based on load average.

The white papers, UNIX Load Average Part 1: How It Works and UNIX Load Average Part 2: Not Your Average Average, give a detailed description of load averages and how to calculate them. The key to calculating load averages on Windows is the ability to monitor the depth of the processor queue. The Microsoft article, Observing Processor Queue Length, describes the Windows System\Processor Queue Length metric, the critical measurement that allows the Host sFlow agent to calculate load averages.
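
To make the calculation concrete, here is a minimal Python sketch of the exponentially damped moving average that UNIX-style load averages are based on. It is not the Host sFlow agent's actual code: the class name, the 5-second sampling interval and the idea of passing in one queue-length reading per interval are all assumptions made for illustration. On Windows, each reading would come from the System\Processor Queue Length counter.

```python
import math

# Assumed for illustration: one queue-length sample every 5 seconds.
SAMPLE_INTERVAL = 5.0
# Averaging windows in seconds: 1, 5 and 15 minutes.
WINDOWS = (60.0, 300.0, 900.0)


class LoadAverage:
    """Hypothetical helper: exponentially damped moving averages of queue length."""

    def __init__(self, interval=SAMPLE_INTERVAL, windows=WINDOWS):
        # Each window gets a decay factor exp(-interval / window).
        self.decay = [math.exp(-interval / w) for w in windows]
        self.loads = [0.0] * len(windows)

    def update(self, queue_length):
        """Fold one queue-length sample into all of the averages."""
        self.loads = [
            load * d + queue_length * (1.0 - d)
            for load, d in zip(self.loads, self.decay)
        ]
        return tuple(self.loads)


if __name__ == "__main__":
    avg = LoadAverage()
    # Simulate one minute with a constant queue of three waiting threads.
    for _ in range(12):
        one, five, fifteen = avg.update(3)
    print(f"{one:.2f} {five:.2f} {fifteen:.2f}")
```

With a constant queue, each average climbs toward the queue length at a rate set by its window, which is why the 1-minute line in a chart reacts quickly to a burst of work while the 15-minute line lags behind.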

Load averages are an important part of the standard set of metrics that the sFlow standard defines for monitoring the performance of servers. A standard set of metrics simplifies management of multi-vendor, multi-OS environments, while the scalability of sFlow provides real-time, centralized visibility into all the servers in the data center, making it easy to rapidly identify performance problems.

Finally, server monitoring is only one component of the sFlow standard. sFlow measurements from network devices, servers and applications combine to deliver the integrated, end-to-end visibility into performance that is essential in converged, virtualized and cloud environments.
