Saturday, April 10, 2010
The image above shows the output of the Linux "top" command. Each row in the table corresponds to a process, and the values in the row indicate how much of the system's resources (memory and CPU) the process consumes. Sorting the table quickly identifies the top consumers of system resources.
Identifying "top" processes is a staple of system management, and most operating systems have a tool that displays a sorted table of processes (e.g. Unix top, Windows Task Manager, OS X Activity Monitor).
When managing a data center full of servers, a top servers tool provides similar benefits, rapidly identifying servers with performance problems.
In the top servers table shown above, each row corresponds to a server in the data center. Sorting the table by server load quickly finds the most heavily loaded servers.
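Once each server's metrics are in one place, building the table is just a top-N selection on the chosen column. The sketch below is a minimal illustration, not the analyzer's actual code. The `ServerMetrics` fields (`load_one`, `cpu_pct`, `mem_used_pct`), the `top_servers` helper and the host names are all hypothetical.

```python
import heapq
from dataclasses import dataclass
from operator import attrgetter

@dataclass
class ServerMetrics:
    """One row of the top servers table (field names are illustrative)."""
    host: str
    load_one: float      # 1-minute load average
    cpu_pct: float       # CPU utilization, percent
    mem_used_pct: float  # memory in use, percent

def top_servers(servers, metric="load_one", n=10):
    """Return the n servers with the largest value of `metric`, highest first."""
    return heapq.nlargest(n, servers, key=attrgetter(metric))

servers = [
    ServerMetrics("web-01", 0.4, 12.0, 40.0),
    ServerMetrics("db-01", 7.9, 95.0, 88.0),
    ServerMetrics("app-03", 2.1, 55.0, 61.0),
]

for s in top_servers(servers, n=2):
    print(f"{s.host:<8} load={s.load_one:4.1f} cpu={s.cpu_pct:5.1f}% mem={s.mem_used_pct:5.1f}%")
```

`heapq.nlargest` avoids fully sorting the list when only the first screenful of rows is needed, which matters little for 1,000 rows but scales well. Changing `metric` re-sorts the same table by a different column.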
The challenge in constructing a data-center-wide top servers table is finding a scalable way to collect performance metrics from all the servers in the data center so that the metrics can be combined and sorted in a single table.
The screen capture shows actual data collected from over 1,000 servers. A Host sFlow agent was installed on each server. The agent is an open source implementation of the Host sFlow standard, which is currently being developed at sFlow.org. The agent requires minimal server resources: only 50K of memory and negligible CPU. The combined network traffic from all 1,000+ Host sFlow agents is a little over 100K bits per second.
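As a rough check on the monitoring overhead, dividing the combined traffic by the number of agents gives the average cost per server. Both inputs are approximate ("a little over" 100K bits per second, "over" 1,000 servers), so the result is only an order-of-magnitude estimate:

```latex
\frac{\approx 100{,}000\ \text{bit/s}}{\approx 1{,}000\ \text{agents}}
  \approx 100\ \text{bit/s per agent}
  = \frac{100}{8}\ \text{bytes/s}
  = 12.5\ \text{bytes/s per agent}
```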
The Host sFlow agents provide the sFlow analyzer with a real-time view of the load on all the servers in the data center, making it possible to construct the data-center-wide top servers table.
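Because each agent pushes its metrics to the analyzer, the analyzer only has to remember the most recent report from each host to keep the table current. The sketch below shows one way that bookkeeping could work. It assumes reports have already been decoded into a dictionary of metrics (decoding sFlow datagrams is not shown). The `HostTable` class, its `max_age` staleness cutoff and the metric names are hypothetical.

```python
import time

class HostTable:
    """Latest metrics per host, refreshed as agent reports arrive (hypothetical)."""

    def __init__(self, max_age=60.0):
        self.max_age = max_age  # seconds before a silent host is dropped (assumed)
        self._latest = {}       # host -> (arrival time, metrics dict)

    def update(self, host, metrics, now=None):
        """Record a report; a newer report replaces the host's previous one."""
        now = time.time() if now is None else now
        self._latest[host] = (now, metrics)

    def top(self, metric, n=10, now=None):
        """Return up to n (host, metrics) pairs, highest `metric` first."""
        now = time.time() if now is None else now
        fresh = [(host, m) for host, (t, m) in self._latest.items()
                 if now - t <= self.max_age]
        fresh.sort(key=lambda hm: hm[1][metric], reverse=True)
        return fresh[:n]

table = HostTable(max_age=60)
table.update("web-01", {"load_one": 0.4}, now=100)
table.update("db-01", {"load_one": 7.9}, now=100)
table.update("old-01", {"load_one": 9.9}, now=10)   # stopped reporting long ago
table.update("web-01", {"load_one": 3.2}, now=110)  # replaces the earlier report
print(table.top("load_one", now=120))
```

Memory use grows with the number of hosts rather than the number of reports received, since each report overwrites the last one for its host. The staleness check keeps servers that have stopped reporting from lingering at the top of the table.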
Host sFlow combined with sFlow monitoring built into the network switches (see Hybrid server monitoring) provides a complete picture of the performance of each server in the data center. The traffic visibility from the switches provides context for a server's performance metrics, identifying the clients making use of its services and the back end resources that it depends on.
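To give a flavor of how switch traffic data adds context, the sketch below totals the bytes a given server exchanges with each peer, which is a starting point for finding its clients and back end dependencies. It is a simplified illustration under stated assumptions: flows are hypothetical `(src, dst, bytes)` tuples, and the `traffic_by_peer` helper is invented for this example. Real sFlow flow data consists of packet samples, so byte totals would need to be scaled by the sampling rate, and separating clients from back end services would need more information (for example, which side initiated the connection); both steps are omitted here.

```python
from collections import Counter

def traffic_by_peer(flows, server):
    """Sum bytes exchanged between `server` and each peer, largest first.

    `flows` is an iterable of (src, dst, bytes) tuples, a hypothetical
    simplification of decoded flow samples from the switches.
    """
    peers = Counter()
    for src, dst, nbytes in flows:
        if src == server:
            peers[dst] += nbytes
        elif dst == server:
            peers[src] += nbytes
    return peers.most_common()

flows = [
    ("10.0.0.5", "10.0.0.9", 1500),
    ("10.0.0.9", "10.0.0.5", 400),
    ("10.0.0.7", "10.0.0.9", 9000),
    ("10.0.0.9", "10.0.1.2", 3000),
    ("10.0.0.1", "10.0.0.2", 100),  # unrelated to 10.0.0.9, ignored
]
print(traffic_by_peer(flows, "10.0.0.9"))
```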