Friday, June 3, 2016

Internet of Things (IoT) telemetry

The internet of things (IoT) is the network of physical objects—devices, vehicles, buildings and other items—embedded with electronics, software, sensors, and network connectivity that enables these objects to collect and exchange data. - ITU

The recently released Raspberry Pi Zero (costing $5) is an example of the type of small, low-power embedded computer that enables IoT. These devices are typically wired to one or more sensors (measuring temperature, humidity, location, acceleration, etc.) and embedded in or attached to physical devices.

Collecting real-time telemetry from large numbers of small devices that may be located within many widely dispersed administrative domains poses a number of challenges, for example:
  • Discovery - How are newly connected devices discovered?
  • Configuration - How can the numerous individual devices be efficiently configured?
  • Transport - How efficiently are measurements transported and delivered?
  • Latency - How long does it take before measurements are remotely accessible? 
This article will use the Raspberry Pi as an example to explore how the architecture of the industry standard sFlow protocol and its implementation in the open source Host sFlow agent provide a method of addressing the challenges of embedded device monitoring.

The following steps describe how to install the Host sFlow agent on Raspbian Jessie (the Debian Linux based Raspberry Pi operating system).
sudo apt-get update
sudo apt-get install libpcap-dev
git clone https://github.com/sflow/host-sflow
cd host-sflow
sudo make install
The resulting Host sFlow binary is extremely small (only 163,300 bytes in this case):
pi@raspberrypi:~ $ ls -l /usr/sbin/hsflowd 
-rwx------ 1 root root 163300 Jun  1 17:18 /usr/sbin/hsflowd
Next, create the /etc/hsflowd.conf file for the device:
sflow {
  agent = eth0
  agent.cidr=::/0
  DNSSD = on
  DNSSD_domain = .sf.inmon.com
  jsonPort = 36343
  pcap { dev = eth0 }
}
There are a number of important points to note about this configuration:
  • The configuration is not device specific - this same configuration can be pre-loaded in every device.
  • The agent.cidr setting prefers IPv6 addresses as a way of identifying the agent, since they are more likely to be globally unique.
  • DNS Service Discovery (DNS-SD) is used to retrieve dynamic configuration on startup and to periodically refresh the configuration. Hosting a single copy of the configuration (in the form of SRV and TXT records on the DNS server responsible for the sf.inmon.com domain) minimizes the complexity of managing large numbers of devices.
  • Network visibility provides a way to monitor the interactions between the devices on the network. The pcap entry enables a Berkeley Packet Filter to efficiently sample network traffic using instrumentation built into the Linux kernel.
  • Custom Metrics can be sent along with the extensive set of standard sFlow metrics by including the jsonPort entry.
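As a rough illustration of the address preference described above, the following Python 3 sketch picks an agent address from a list of candidates by testing membership in a CIDR. The function name pick_agent_address and the rule of skipping link-local and loopback addresses are assumptions made for this example; it is not the selection logic that hsflowd itself uses.

```python
import ipaddress

def pick_agent_address(candidates, cidr="::/0"):
    """Return the first usable candidate inside cidr, else the first candidate."""
    network = ipaddress.ip_network(cidr)
    for text in candidates:
        addr = ipaddress.ip_address(text)
        # Assumption for this sketch: link-local and loopback addresses
        # are not useful as a globally unique agent identity.
        if addr.is_link_local or addr.is_loopback:
            continue
        # Membership is False when the address family differs from the
        # network's, so an IPv4 address never matches ::/0.
        if addr in network:
            return text
    # Nothing matched the preferred CIDR: fall back to the first candidate.
    return candidates[0] if candidates else None

print(pick_agent_address(["192.168.1.20", "fe80::1", "2001:470:67:27d:d811:aa7e:9e54:30e9"]))
```

With ::/0 as the CIDR, any routable IPv6 address wins over an IPv4 address, which is the behavior the configuration above is aiming for.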
Now start the daemon:
sudo /etc/init.d/hsflowd start
Now add an entry to the sf.inmon.com.zone file on the DNS server:
_sflow._udp   30  SRV     0 0 6343  collector.sf.inmon.com.
In this case, the SRV record specifies that sFlow records should be sent via UDP to collector.sf.inmon.com on port 6343. The TTL is set to 30 seconds so that agents will pick up any changes within 30 seconds. A larger TTL should be used to improve scalability if there are large numbers of devices.
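To make the fields of the SRV record explicit, here is a minimal Python sketch that splits the zone file line shown above into its parts. The names SrvRecord and parse_srv_line are hypothetical, and the parser only handles the exact "name ttl SRV priority weight port target" layout used here; real zone files allow other layouts (for example a class field or an omitted TTL).

```python
from collections import namedtuple

SrvRecord = namedtuple("SrvRecord", "name ttl priority weight port target")

def parse_srv_line(line):
    """Parse a zone-file line of the form: name ttl SRV priority weight port target."""
    fields = line.split()
    if len(fields) != 7 or fields[2].upper() != "SRV":
        raise ValueError("unsupported SRV line layout: %r" % line)
    name, ttl, _, priority, weight, port, target = fields
    # Drop the trailing dot that marks a fully qualified name in a zone file.
    return SrvRecord(name, int(ttl), int(priority), int(weight), int(port),
                     target.rstrip("."))

record = parse_srv_line("_sflow._udp   30  SRV     0 0 6343  collector.sf.inmon.com.")
print("send sFlow to %s port %d, refresh after %d seconds"
      % (record.target, record.port, record.ttl))
```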

The following example shows how Custom Metrics can be used to export sensor data. The temp.py script exports the CPU temperature:
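#!/usr/bin/env python

import json
import socket

# The kernel reports the CPU temperature in millidegrees Celsius
with open('/sys/class/thermal/thermal_zone0/temp') as f:
  tempC = int(f.read()) / 1e3
msg = {
  "rtmetric": {
    "datasource": "sensors",
    "tempC": { "type": "gaugeFloat", "value": tempC }
  }
}
# Send the metric as JSON to the jsonPort configured in /etc/hsflowd.conf
sock = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
# Encode to bytes so the script runs under both Python 2 and Python 3
sock.sendto(json.dumps(msg).encode('utf-8'),("127.0.0.1",36343))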
The following crontab entry runs the script every minute (the script must be executable, e.g. chmod +x /home/pi/temp.py):
* * * * * /home/pi/temp.py
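The same message format can carry several sensor readings at once. The sketch below generalizes temp.py under a few assumptions: the helper names build_rtmetric and send_rtmetric are made up for this example, the humidityPct metric and its value are hypothetical, and every value is sent as a gaugeFloat as in the script above.

```python
import json
import socket

def build_rtmetric(datasource, gauges):
    """Return UTF-8 JSON bytes for an rtmetric message carrying gaugeFloat values."""
    body = {"datasource": datasource}
    for name, value in gauges.items():
        body[name] = {"type": "gaugeFloat", "value": float(value)}
    return json.dumps({"rtmetric": body}).encode("utf-8")

def send_rtmetric(payload, host="127.0.0.1", port=36343):
    """Send one datagram to the agent's jsonPort (36343 in the config above)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()

if __name__ == "__main__":
    # humidityPct is a hypothetical second sensor, included only for illustration.
    send_rtmetric(build_rtmetric("sensors", {"tempC": 49.388, "humidityPct": 41.0}))
```

Keeping message construction separate from the socket call makes it possible to check the JSON layout without a running agent.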
The Host sFlow agent will automatically pick up the configuration via a DNS request and start making measurements, which are immediately sent in standard sFlow UDP datagrams to the designated sFlow collector, collector.sf.inmon.com. sFlow's immediate transmission of measurements minimizes the memory requirements on the agent (since data doesn't have to be stored for later retrieval) and minimizes the latency before measurements are accessible on the collector (and can be acted on).

It should also be noted that all communication is initiated by the device (DNS requests and transmission of telemetry via sFlow). This means that the radio on the device can be powered down between transmissions to save power (and extend battery life if the device is battery powered).

The article Raspberry Pi real-time network analytics describes how to build a low-cost sFlow analyzer using a Raspberry Pi 3 Model B and sFlow-RT real-time analytics software. The following command queries the sFlow-RT REST API to show the set of standard metrics being exported by the agent (2001:470:67:27d:d811:aa7e:9e54:30e9):
pi@raspberrypi:~ $ curl http://localhost:8008/metric/2001:470:67:27d:d811:aa7e:9e54:30e9/json
{
 "2.1.bytes_in": 3211.5520193372945,
 "2.1.bytes_out": 462.2822036458858,
 "2.1.bytes_read": 0,
 "2.1.bytes_written": 4537.818511431161,
 "2.1.contexts": 7006.546480008057,
 "2.1.cpu_guest": 0,
 "2.1.cpu_guest_nice": 0,
 "2.1.cpu_idle": 99.17638114546376,
 "2.1.cpu_intr": 0,
 "2.1.cpu_nice": 0,
 "2.1.cpu_num": 4,
 "2.1.cpu_sintr": 0.025342118601115054,
 "2.1.cpu_speed": 0,
 "2.1.cpu_steal": 0,
 "2.1.cpu_system": 0.456158134820071,
 "2.1.cpu_user": 0.3294475418144957,
 "2.1.cpu_utilization": 0.8236188545362393,
 "2.1.cpu_wio": 0.012671059300557527,
 "2.1.disk_free": 24435570688,
 "2.1.disk_total": 29627484160,
 "2.1.disk_utilization": 17.523977160453796,
 "2.1.drops_in": 0,
 "2.1.drops_out": 0,
 "2.1.errs_in": 0,
 "2.1.errs_out": 0,
 "2.1.host_name": "raspberrypi",
 "2.1.icmp_inaddrmaskreps": 0,
 "2.1.icmp_inaddrmasks": 0,
 "2.1.icmp_indestunreachs": 0,
 "2.1.icmp_inechoreps": 0,
 "2.1.icmp_inechos": 0,
 "2.1.icmp_inerrors": 0,
 "2.1.icmp_inmsgs": 0,
 "2.1.icmp_inparamprobs": 0,
 "2.1.icmp_inredirects": 0,
 "2.1.icmp_insrcquenchs": 0,
 "2.1.icmp_intimeexcds": 0,
 "2.1.icmp_intimestamps": 0,
 "2.1.icmp_outaddrmaskreps": 0,
 "2.1.icmp_outaddrmasks": 0,
 "2.1.icmp_outdestunreachs": 0,
 "2.1.icmp_outechoreps": 0,
 "2.1.icmp_outechos": 0,
 "2.1.icmp_outerrors": 0,
 "2.1.icmp_outmsgs": 0,
 "2.1.icmp_outparamprobs": 0,
 "2.1.icmp_outredirects": 0,
 "2.1.icmp_outsrcquenchs": 0,
 "2.1.icmp_outtimeexcds": 0,
 "2.1.icmp_outtimestampreps": 0,
 "2.1.icmp_outtimestamps": 0,
 "2.1.interrupts": 4438.56380300131,
 "2.1.ip_defaultttl": 64,
 "2.1.ip_forwarding": 2,
 "2.1.ip_forwdatagrams": 0,
 "2.1.ip_fragcreates": 0,
 "2.1.ip_fragfails": 0,
 "2.1.ip_fragoks": 0,
 "2.1.ip_inaddrerrors": 0,
 "2.1.ip_indelivers": 9.165072011280088,
 "2.1.ip_indiscards": 0,
 "2.1.ip_inhdrerrors": 0,
 "2.1.ip_inreceives": 9.215429549803606,
 "2.1.ip_inunknownprotos": 0,
 "2.1.ip_outdiscards": 0,
 "2.1.ip_outnoroutes": 0,
 "2.1.ip_outrequests": 2.1653741565112297,
 "2.1.ip_reasmfails": 0,
 "2.1.ip_reasmoks": 0,
 "2.1.ip_reasmreqds": 0,
 "2.1.ip_reasmtimeout": 0,
 "2.1.load_fifteen": 0.05,
 "2.1.load_fifteen_per_cpu": 0.0125,
 "2.1.load_five": 0.02,
 "2.1.load_five_per_cpu": 0.005,
 "2.1.load_one": 0,
 "2.1.load_one_per_cpu": 0,
 "2.1.machine_type": "arm",
 "2.1.mem_buffers": 52133888,
 "2.1.mem_cached": 383287296,
 "2.1.mem_free": 238026752,
 "2.1.mem_shared": 0,
 "2.1.mem_total": 970506240,
 "2.1.mem_used": 297058304,
 "2.1.mem_utilization": 30.608591437339783,
 "2.1.os_name": "linux",
 "2.1.os_release": "4.4.9-v7+",
 "2.1.page_in": 0,
 "2.1.page_out": 2.2157316950347465,
 "2.1.part_max_used": 31.74,
 "2.1.pkts_in": 11.582233860408904,
 "2.1.pkts_out": 2.266089233558264,
 "2.1.proc_run": 1,
 "2.1.proc_total": 237,
 "2.1.read_time": 0,
 "2.1.reads": 0,
 "2.1.swap_free": 104853504,
 "2.1.swap_in": 0,
 "2.1.swap_out": 0,
 "2.1.swap_total": 104853504,
 "2.1.tcp_activeopens": 0,
 "2.1.tcp_attemptfails": 0,
 "2.1.tcp_currestab": 6,
 "2.1.tcp_estabresets": 0,
 "2.1.tcp_incsumerrs": 0,
 "2.1.tcp_inerrs": 0,
 "2.1.tcp_insegs": 2.568234464699365,
 "2.1.tcp_maxconn": 4294967295,
 "2.1.tcp_outrsts": 0,
 "2.1.tcp_outsegs": 1.913586463893645,
 "2.1.tcp_passiveopens": 0,
 "2.1.tcp_retranssegs": 0,
 "2.1.tcp_rtoalgorithm": 1,
 "2.1.tcp_rtomax": 120000,
 "2.1.tcp_rtomin": 200,
 "2.1.udp_incsumerrors": 0,
 "2.1.udp_indatagrams": 6.596837546580723,
 "2.1.udp_inerrors": 0,
 "2.1.udp_noports": 0,
 "2.1.udp_outdatagrams": 0.2517876926175849,
 "2.1.udp_rcvbuferrors": 0,
 "2.1.udp_sndbuferrors": 0,
 "2.1.uptime": 46603,
 "2.1.uuid": "22c4ce8c-067e-4517-8c00-8d822efc4897",
 "2.1.write_time": 8.333333333333334,
 "2.1.writes": 0.6042904622822036,
 "2.ifadminstatus": "up",
 "2.ifdirection": "full-duplex",
 "2.ifindex": "2",
 "2.ifindiscards": 0,
 "2.ifinerrors": 0,
 "2.ifinoctets": 3241.1101037574294,
 "2.ifinpkts": 11.483831973405863,
 "2.ifinucastpkts": 11.483831973405863,
 "2.ifinutilization": 0.025928880830059432,
 "2.ifname": "eth0",
 "2.ifoperstatus": "up",
 "2.ifoutdiscards": 0,
 "2.ifouterrors": 0,
 "2.ifoutoctets": 406.6686813740304,
 "2.ifoutpkts": 1.8636043114737586,
 "2.ifoutucastpkts": 1.8636043114737586,
 "2.ifoututilization": 0.0032533494509922435,
 "2.ifspeed": 100000000,
 "2.iftype": "ethernetCsmacd",
 "sensors.tempC": 49.388
}
Note the custom temperature metric at the end of the list.
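The same query can be scripted. The following Python 3 sketch fetches the metrics from the URL used in the curl command and picks out the custom metrics by their data source prefix. The function names fetch_metrics and custom_metrics are hypothetical, and the sketch assumes custom metrics are always reported as "datasource.name", as sensors.tempC is in the output above.

```python
import json
from urllib.request import urlopen

def fetch_metrics(agent, base="http://localhost:8008"):
    """Fetch all metrics for one agent from the sFlow-RT REST API."""
    with urlopen("%s/metric/%s/json" % (base, agent)) as response:
        return json.loads(response.read().decode("utf-8"))

def custom_metrics(metrics, datasource="sensors"):
    """Return metrics whose key starts with 'datasource.', with the prefix removed."""
    prefix = datasource + "."
    return {key[len(prefix):]: value
            for key, value in metrics.items() if key.startswith(prefix)}

# A few entries copied from the output above, so the example runs without a collector.
sample = {
    "2.1.cpu_utilization": 0.8236188545362393,
    "2.ifname": "eth0",
    "sensors.tempC": 49.388,
}
print(custom_metrics(sample))
```

Against a live collector, custom_metrics(fetch_metrics("2001:470:67:27d:d811:aa7e:9e54:30e9")) would apply the same filter to the full result.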

In addition, enabling traffic monitoring in the Host sFlow agent provides detailed flow information along with the metrics to provide visibility into interactions between the devices on the network. In this case the wired Ethernet interface (eth0) is being monitored, but monitoring the wireless interface (wlan0) would be a way to gain visibility into messages exchanged over an ad-hoc wireless mesh network connecting devices. RESTflow describes how to perform flow analytics using sFlow-RT.

In conclusion, sFlow provides a standard way to export metrics and traffic information. Most network equipment vendors already provide sFlow support and the technology has a number of architectural features that are well suited to addressing the challenges of extending visibility to and gathering telemetry from large scale IoT deployments.
