Thursday, December 15, 2016

Monitoring Linux services

Mainstream Linux distributions have moved to systemd to manage daemons (e.g. httpd, sshd). The diagram illustrates how systemd runs each daemon within its own container so that it can maintain tight control of the daemon's resources.

This article describes how to use the open source Host sFlow agent to gather telemetry from daemons running under systemd.

Host sFlow systemd monitoring exports a standard set of metrics for each systemd service. The sFlow Host Structures extension defines metrics for Virtual Nodes (virtual machines, containers, etc.), and these metrics are already used to export Xen, KVM, Docker, and Java resource usage. Exporting the same standard metrics for systemd services provides interoperability with sFlow analyzers, allowing them to report on Linux services using their existing virtual node monitoring capabilities.

While running daemons within containers helps systemd maintain control of the resources, it also provides a very useful abstraction for monitoring. For example, a single service (like the Apache web server) may consist of dozens of processes. Reporting on container-level metrics abstracts away the per-process details and gives a view of the total resources consumed by the service. In addition, service metadata (like the service name) provides a useful way of identifying and grouping services, making it easy, for example, to report on the total CPU consumed by the web service across a pool of servers.
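Grouping by service name can be sketched in a few lines of Python. This is a minimal illustration only: the record layout (`agent`, `service`, `cpu_ms`) and the sample values are hypothetical, standing in for whatever an sFlow analyzer has already extracted from the per-service counters.

```python
from collections import defaultdict


def total_cpu_by_service(samples):
    """Sum CPU milliseconds per service name across all reporting hosts."""
    totals = defaultdict(int)
    for sample in samples:
        totals[sample["service"]] += sample["cpu_ms"]
    return dict(totals)


# Hypothetical per-interval CPU readings reported by a small pool of servers.
samples = [
    {"agent": "10.0.0.11", "service": "apache2.service", "cpu_ms": 180},
    {"agent": "10.0.0.12", "service": "apache2.service", "cpu_ms": 220},
    {"agent": "10.0.0.12", "service": "ssh.service", "cpu_ms": 5},
]

print(total_cpu_by_service(samples))
# {'apache2.service': 400, 'ssh.service': 5}
```

Because every host reports the same service name, no per-process bookkeeping is needed to get a pool-wide total.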

Systemd monitoring is easy to set up.

First, download and install the latest Host sFlow software release.

Next, enable the systemd module by adding the systemd{} line to the /etc/hsflowd.conf file:
sflow{
  collector{ ip=10.0.0.1 }
  systemd{}
}
This is a minimal configuration that sends sFlow telemetry to a collector running on host 10.0.0.1. The Host sFlow agent is capable of gathering an extensive set of network, system, and application-level metrics. See Configuring Host sFlow for Linux for a full set of options.

Finally, start the agent:
sudo systemctl enable hsflowd.service
sudo systemctl start hsflowd.service
For the best accuracy, enable systemd cgroup accounting by adding the following entries to the /etc/systemd/system.conf file and rebooting the server:
DefaultCPUAccounting=yes
DefaultBlockIOAccounting=yes
DefaultMemoryAccounting=yes
The Host sFlow agent automatically detects when cgroup accounting has been enabled. If cgroup accounting hasn't been enabled, the agent is still able to compute and export statistics, although it might miss contributions from short-lived processes.
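Before rebooting, it can be handy to confirm that the three accounting entries are actually present in the file. The following is a rough sketch, not part of Host sFlow: the helper name `accounting_settings` is hypothetical, and the check only inspects the text it is given, so it says nothing about drop-in files or the settings systemd is currently running with.

```python
ACCOUNTING_KEYS = (
    "DefaultCPUAccounting",
    "DefaultBlockIOAccounting",
    "DefaultMemoryAccounting",
)


def accounting_settings(conf_text):
    """Return {setting: bool} for the accounting entries found in conf_text."""
    found = {name: False for name in ACCOUNTING_KEYS}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and section headers such as [Manager].
        if not line or line[0] in "#;[" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in found:
            # systemd accepts several spellings for a true boolean value.
            found[key] = value.strip().lower() in ("yes", "true", "1", "on")
    return found


# Example input mirroring the entries listed above, with one still commented out.
example = """
[Manager]
DefaultCPUAccounting=yes
DefaultBlockIOAccounting=yes
#DefaultMemoryAccounting=no
"""

for name, enabled in accounting_settings(example).items():
    print(name, "enabled" if enabled else "not enabled")
```

To check a real host, pass in the contents of /etc/systemd/system.conf, for example `accounting_settings(open("/etc/systemd/system.conf").read())`.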

Once the agents have been configured, verify that sFlow telemetry is being received at the collector using sflowtool. The simplest way to run sflowtool is using Docker:
docker run -p 6343:6343/udp sflow/sflowtool
The following output shows the statistics exported for the apache2 service:
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 50
sourceId 3:112270
counterBlock_tag 0:2103
vdsk_capacity 0
vdsk_allocation 0
vdsk_available 0
vdsk_rd_req 0
vdsk_rd_bytes 0
vdsk_wr_req 0
vdsk_wr_bytes 0
vdsk_errs 0
counterBlock_tag 0:2102
vmem_memory 16674816
vmem_maxMemory 0
counterBlock_tag 0:2101
vcpu_state 1
vcpu_cpu_mS 180
vcpu_cpuCount 0
counterBlock_tag 0:2002
parent_dsClass 2
parent_dsIndex 1
counterBlock_tag 0:2000
hostname apache2.service
UUID 92-53-c6-17-60-65-52-a2-ac-f7-76-cb-7b-63-d9-23
machine_type 3
os_name 2
os_release 4.4.0-45-generic
endSample   ----------------------
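The sflowtool text output shown above is a sequence of "key value" lines bracketed by startSample and endSample, which makes it straightforward to post-process. The sketch below is one possible way to turn it into Python dictionaries; the function name `parse_samples` and the choice to collect the repeated counterBlock_tag values into a list are illustrative assumptions, not part of sflowtool.

```python
def parse_samples(lines):
    """Group sflowtool 'key value' lines into one dict per sample."""
    samples = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("startSample"):
            current = {}
        elif line.startswith("endSample"):
            if current is not None:
                samples.append(current)
            current = None
        elif current is not None and line:
            key, _, value = line.partition(" ")
            value = value.strip()
            if key == "counterBlock_tag":
                # This key repeats once per counter block, so keep every value.
                current.setdefault(key, []).append(value)
            else:
                current[key] = value
    return samples


# Abbreviated excerpt of the apache2 sample shown above.
excerpt = """\
startSample ----------------------
sampleType COUNTERSSAMPLE
counterBlock_tag 0:2102
vmem_memory 16674816
counterBlock_tag 0:2101
vcpu_cpu_mS 180
counterBlock_tag 0:2000
hostname apache2.service
endSample   ----------------------
"""

for sample in parse_samples(excerpt.splitlines()):
    print(sample["hostname"], sample["vcpu_cpu_mS"], sample["vmem_memory"])
# apache2.service 180 16674816
```

To process live data, the same function could be fed lines read from the standard output of the sflowtool command, for instance by piping the output into a script that iterates over `sys.stdin`.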
Install Host sFlow agents on all the hosts in the data center for comprehensive visibility.
