Friday, December 16, 2016

Using Ganglia to monitor Linux services

The screen capture from the Ganglia monitoring tool shows metrics for services running on a Linux host. Monitoring Linux services describes how the open source Host sFlow agent has been extended to export standard Virtual Node metrics from services running under systemd. Ganglia already supports these standard metrics and the article Using Ganglia to monitor virtual machine pools describes the configuration steps needed to enable this feature.

Thursday, December 15, 2016

Monitoring Linux services

Mainstream Linux distributions have moved to systemd to manage daemons (e.g. httpd and sshd). The diagram illustrates how systemd runs each daemon within its own container so that it can maintain tight control of the daemon's resources.

This article describes how to use the open source Host sFlow agent to gather telemetry from daemons running under systemd.

Host sFlow systemd monitoring exports a standard set of metrics for each systemd service. The sFlow Host Structures extension defines metrics for Virtual Nodes (virtual machines, containers, etc.) that are already used to export Xen, KVM, Docker, and Java resource usage. Exporting the same standard metrics for systemd services provides interoperability with sFlow analyzers, allowing them to report on Linux services using existing virtual node monitoring capabilities.

While running daemons within containers helps systemd maintain control of the resources, it also provides a very useful abstraction for monitoring. For example, a single service (like the Apache web server) may consist of dozens of processes. Reporting on container level metrics abstracts away the per-process details and gives a view of the total resources consumed by the service. In addition, service metadata (like the service name) provides a useful way of identifying and grouping services, for example, making it easy to report on total CPU consumed by the web service across a pool of servers.
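As a rough illustration of grouping by service name, the following Python sketch sums CPU time per service across a pool of servers. The field names hostname and vcpu_cpu_mS match what sflowtool prints for each service; the agent names, the sample values, and the total_cpu_by_service helper are made up for this example. Note that vcpu_cpu_mS is a cumulative counter, so a real report would work with differences between successive samples.

```python
from collections import defaultdict

def total_cpu_by_service(samples):
    """Sum the vcpu_cpu_mS counter per service name across all agents."""
    totals = defaultdict(int)
    for sample in samples:
        totals[sample["hostname"]] += int(sample["vcpu_cpu_mS"])
    return dict(totals)

# Hypothetical samples, one per (agent, service) pair
samples = [
    {"agent": "web1", "hostname": "apache2.service", "vcpu_cpu_mS": 180},
    {"agent": "web2", "hostname": "apache2.service", "vcpu_cpu_mS": 220},
    {"agent": "web1", "hostname": "ssh.service", "vcpu_cpu_mS": 15},
]

print(total_cpu_by_service(samples))
```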

Systemd monitoring is easy to set up.

First, download and install the latest software release.

Next, enable the systemd module by adding the highlighted line in the /etc/hsflowd.conf file:
  collector{ ip= }
This is a minimal configuration that sends sFlow telemetry to a collector running on the host specified by the ip setting. The Host sFlow agent is capable of gathering an extensive set of network, system, and application-level metrics. See Configuring Host sFlow for Linux for a full set of options.

Finally, start the agent:
sudo systemctl enable hsflowd.service
sudo systemctl start hsflowd.service
For the best accuracy, enable systemd cgroup accounting by adding the following entries to the /etc/systemd/system.conf file and rebooting the server:
The Host sFlow agent will automatically detect when cgroup accounting has been enabled. If cgroup accounting hasn't been enabled, the agent is still able to compute and export statistics, although it might miss contributions from short-lived processes.
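One way to confirm the accounting settings before rebooting is to scan the configuration file for them. The Python sketch below is only an illustration: it assumes the relevant entries are systemd's DefaultCPUAccounting, DefaultBlockIOAccounting, and DefaultMemoryAccounting options, and the accounting_status helper and the sample file contents are hypothetical.

```python
# Assumed option names from systemd's system.conf; adjust to match your setup
ACCOUNTING_KEYS = (
    "DefaultCPUAccounting",
    "DefaultBlockIOAccounting",
    "DefaultMemoryAccounting",
)

def accounting_status(conf_text):
    """Return {option: True/False} for each accounting option in conf_text."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and section headers such as [Manager]
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().lower()
    truthy = {"yes", "true", "on", "1"}
    return {key: settings.get(key) in truthy for key in ACCOUNTING_KEYS}

# Hypothetical file contents; in practice read /etc/systemd/system.conf
example = """
[Manager]
#DefaultCPUAccounting=no
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
"""

for option, enabled in accounting_status(example).items():
    print(option, "enabled" if enabled else "not enabled")
```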

Once the agents have been configured, verify that sFlow telemetry is being received at the collector using sflowtool. The simplest way to run sflowtool is using Docker:
docker run -p 6343:6343/udp sflow/sflowtool
The following output shows the statistics exported for the apache2 service:
startSample ----------------------
sampleType_tag 0:2
sampleSequenceNo 50
sourceId 3:112270
counterBlock_tag 0:2103
vdsk_capacity 0
vdsk_allocation 0
vdsk_available 0
vdsk_rd_req 0
vdsk_rd_bytes 0
vdsk_wr_req 0
vdsk_wr_bytes 0
vdsk_errs 0
counterBlock_tag 0:2102
vmem_memory 16674816
vmem_maxMemory 0
counterBlock_tag 0:2101
vcpu_state 1
vcpu_cpu_mS 180
vcpu_cpuCount 0
counterBlock_tag 0:2002
parent_dsClass 2
parent_dsIndex 1
counterBlock_tag 0:2000
hostname apache2.service
UUID 92-53-c6-17-60-65-52-a2-ac-f7-76-cb-7b-63-d9-23
machine_type 3
os_name 2
os_release 4.4.0-45-generic
endSample   ----------------------
Install Host sFlow agents on all the hosts in the data center for comprehensive visibility.
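The sflowtool text output is easy to post-process. The following Python sketch, which is not part of sflowtool or Host sFlow, groups the lines between startSample and endSample into a dictionary per sample. The parse_samples helper is hypothetical and the embedded text is a shortened copy of the output above.

```python
def parse_samples(lines):
    """Group sflowtool key/value lines into one dict per sample."""
    samples, current = [], None
    for raw in lines:
        line = raw.strip()
        if line.startswith("startSample"):
            current = {}
        elif line.startswith("endSample"):
            if current is not None:
                samples.append(current)
            current = None
        elif current is not None and line:
            key, _, value = line.partition(" ")
            if key == "counterBlock_tag":
                # The tag repeats within a sample, so keep every value
                current.setdefault(key, []).append(value)
            else:
                current[key] = value.strip()
    return samples

# Shortened copy of the apache2 sample shown above
text = """startSample ----------------------
sampleType_tag 0:2
counterBlock_tag 0:2102
vmem_memory 16674816
counterBlock_tag 0:2101
vcpu_cpu_mS 180
counterBlock_tag 0:2000
hostname apache2.service
endSample   ----------------------"""

for sample in parse_samples(text.splitlines()):
    print(sample["hostname"], sample["vcpu_cpu_mS"], sample["vmem_memory"])
```

In practice the lines would come from the sflowtool process rather than an embedded string.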

Thursday, December 1, 2016

IPv6 Internet router using merchant silicon

Internet router using merchant silicon describes how a commodity white box switch can be used as a replacement for an expensive Internet router. The solution combines standard sFlow instrumentation implemented in merchant silicon with BGP routing information to selectively install only active routes into the hardware.

The article describes a simple, self-contained solution that uses standard APIs and should be able to run on a variety of Linux-based network operating systems, including Cumulus Linux, Dell OS10, Arista EOS, and Cisco NX-OS.

The diagram shows the elements of the solution. Standard sFlow instrumentation embedded in the merchant silicon ASIC data plane in the white box switch provides real-time information on traffic flowing through the switch. The sFlow agent is configured to send the sFlow to an instance of sFlow-RT running on the switch. The Bird routing daemon is used to handle the BGP peering sessions and to install routes in the Linux kernel using the standard netlink interface. The network operating system in turn programs the switch ASIC with the kernel routes so that packets are forwarded by the switch hardware and not by the kernel software.

The key to this solution is Bird's multi-table capabilities. The full Internet routing table learned from BGP peers is installed in a user space table that is not reflected into the kernel. A BGP session between sFlow-RT analytics software and Bird allows sFlow-RT to see the full routing table and combine it with the sFlow telemetry to perform real-time BGP route analytics and identify the currently active routes. A second BGP session allows sFlow-RT to push routes to Bird which in turn pushes the active routes to the kernel, programming the ASIC.
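To make the idea of combining the routing table with traffic telemetry more concrete, here is a small Python sketch that maps each flow's destination address to its longest matching prefix and totals the bytes per prefix. This is an assumption about the general technique, not sFlow-RT's implementation; the prefixes, flows, and function names are invented for the example.

```python
import ipaddress
from collections import Counter

def longest_match(prefixes, address):
    """Return the most specific network in prefixes containing address."""
    addr = ipaddress.ip_address(address)
    best = None
    for net in prefixes:
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best.prefixlen:
                best = net
    return best

def traffic_by_prefix(routes, flows):
    """Total bytes per routing table prefix for (destination, bytes) flows."""
    prefixes = [ipaddress.ip_network(route) for route in routes]
    totals = Counter()
    for dst, nbytes in flows:
        match = longest_match(prefixes, dst)
        if match is not None:
            totals[str(match)] += nbytes
    return totals

# Hypothetical routing table and sampled flows (documentation prefixes)
routes = ["2001:db8::/32", "2001:db8:1::/48"]
flows = [
    ("2001:db8:1::10", 1000),
    ("2001:db8:2::5", 500),
    ("2001:db8:1::20", 250),
]

for prefix, nbytes in traffic_by_prefix(routes, flows).most_common():
    print(prefix, nbytes)
```

A linear scan is fine for a handful of prefixes. A full Internet routing table would call for a trie or similar structure.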

This article extends the previous example to add IPv6 routing. In this example, the following Bird configuration, /etc/bird/bird6.conf, was installed on the switch:
# Please refer to the documentation in the bird-doc package or the BIRD User's
# Guide for more information on configuring BIRD and
# adding routing protocols.

# Change this into your BIRD router ID. It's a world-wide unique identification
# of your router, usually one of router's IPv6 addresses.
router id;

# The Kernel protocol is not a real routing protocol. Instead of communicating
# with other routers in the network, it performs synchronization of BIRD's
# routing tables with the OS kernel.
protocol kernel {
 scan time 60;
        scan time 2;
 import all;
 export all;
}

# The Device protocol is not a real routing protocol. It doesn't generate any
# routes and it only serves as a module for getting information about network
# interfaces from the kernel. 
protocol device {
 scan time 60;
}

protocol direct {
        interface "*";
}

# Create a new table (disconnected from kernel/master) for peering routes
table peers;

protocol bgp peer_65134 {
  table peers;
  igp table master;
  local as 65136;
  neighbor fc00:136::2 as 65134;
  source address fc00:136::1;
  import all;
  export all;
}

protocol bgp peer_65135 {
  table peers;
  igp table master;
  local as 65136;
  neighbor fc00:136::3 as 65135;
  source address fc00:136::1;
  import all;
  export all;
}

# Copy default route from peers table to master table
protocol pipe {
  table peers;
  peer table master;
  import none;
  export filter {
     if net ~ [ ::/0 ] then accept;
     reject;
  };
}

# Reflect peers table to sFlow-RT
protocol bgp to_sflow_rt {
  table peers;
  igp table master;
  local as 65136;
  neighbor ::1 port 1179 as 65136;
  import none;
  export all;
}

# Receive active prefixes from sFlow-RT
protocol bgp from_sflow_rt {
  local as 65136;
  neighbor fc00:136::1 port 1179 as 65136;
  import all;
  export none;
}
The open source Active Route Manager (ARM) application has been installed in sFlow-RT and the following sFlow-RT configuration, /usr/local/sflow-rt/conf.d/sflow-rt.conf, adds the IPv6 BGP route reflector and control sessions with Bird:
arm.sflow.ip= =
Once configured, operation is entirely automatic. As soon as traffic starts flowing to a new route, the route is identified and installed in the ASIC. If the route later becomes inactive, it is automatically removed from the ASIC to be replaced with a different active route. In this case, the maximum number of routes allowed in the ASIC has been specified as 5,000. This number can be changed to reflect the capacity of the hardware.
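The install and remove behavior under a route limit can be pictured with a short Python sketch. It ranks prefixes by measured traffic, keeps the top entries up to the limit, and compares the result with what is currently installed. This is an illustrative assumption, not the Active Route Manager's actual algorithm; the plan_updates function and the sample data are hypothetical.

```python
def plan_updates(traffic, installed, max_routes):
    """Return (to_add, to_remove) so the busiest prefixes fit within max_routes."""
    # Rank by bytes (descending); break ties by prefix for a stable result
    ranked = sorted(traffic, key=lambda prefix: (-traffic[prefix], prefix))
    wanted = set(ranked[:max_routes])
    return sorted(wanted - installed), sorted(installed - wanted)

# Hypothetical traffic totals (bytes) and currently installed routes
traffic = {
    "2001:db8:1::/48": 1250,
    "2001:db8:2::/48": 500,
    "2001:db8:3::/48": 20,
}
installed = {"2001:db8:1::/48", "2001:db8:3::/48"}

to_add, to_remove = plan_updates(traffic, installed, max_routes=2)
print("add:", to_add)
print("remove:", to_remove)
```

With a limit of two routes, the quiet prefix is withdrawn to make room for the busier one.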
The Active Route Manager application has a web interface that provides up-to-the-second visibility into the number of routes, routes installed in hardware, amount of traffic, hardware and software resource utilization, etc. In addition, the sFlow-RT REST API can be used to make additional queries.