The Host sFlow distributed agent article describes how sFlow agents, embedded in applications such as Apache, NGINX, Tomcat, Memcached and Java, coordinate with the Host sFlow daemon (hsflowd) in order to monitor server and application performance.
Implementing sFlow monitoring natively in widely used applications is worth the effort, since the result is a highly scalable, easily deployed solution with minimal impact on performance. However, this approach is overkill for monitoring application logic implemented in scripting languages such as PHP, Python, Ruby or Perl.
Recently, a simple JSON API was added to hsflowd that makes it easy to integrate performance monitoring in scripting languages. The API is an implementation of the sFlow Application Structures draft, which defines a generalized set of application layer performance metrics.
Note: Even if you instrument your scripted application logic, you should still enable sFlow in the web server, since the HTTP information exported from the web server complements application metrics to provide a more complete picture of performance; see HTTP.
Currently, the API is only implemented in the Linux trunk. To try it out, you will need to check out the trunk and build hsflowd from source:
svn co https://host-sflow.svn.sourceforge.net/svnroot/host-sflow/trunk host-sflow
cd host-sflow
make
make install
make schedule
Note: See Installing Host sFlow on a Linux server for additional configuration information. The JSON API will be included in the upcoming Host sFlow 1.21 release.
Uncommenting the following entry in the /etc/hsflowd.conf file opens a UDP port to receive JSON encoded metrics:
jsonPort = 36343
Note: Keep the default port, 36343, if you can, since changing the port number requires a corresponding change in every script that sends JSON messages. The jsonPort value is written into the /etc/hsflowd.auto file so that it is available to client scripts, along with other sFlow settings; see Host sFlow distributed agent. The hsflowd daemon only accepts messages generated on the same host, which shouldn't be a problem since hsflowd should be installed on every host to report host performance metrics.
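A client script can look up the port rather than hard-coding it. The following Python sketch is illustrative only: it assumes that /etc/hsflowd.auto holds one key=value setting per line and that the port is stored under the key jsonPort. The helper names parse_json_port and read_json_port are hypothetical, and the code falls back to 36343 when the entry cannot be found.

```python
DEFAULT_JSON_PORT = 36343  # default port named in the text


def parse_json_port(text, default=DEFAULT_JSON_PORT):
    """Return the jsonPort value found in hsflowd.auto-style text.

    Assumes one key=value setting per line (an assumption about the file
    layout); returns `default` if the key is absent or not an integer.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "jsonPort":
            try:
                return int(value.strip())
            except ValueError:
                return default
    return default


def read_json_port(path="/etc/hsflowd.auto", default=DEFAULT_JSON_PORT):
    """Read the port from the file, or return `default` if it is unreadable."""
    try:
        with open(path) as f:
            return parse_json_port(f.read(), default)
    except OSError:
        return default
```

Falling back to the default keeps the script working on hosts where the file is missing or has not yet been written.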
The following is an example of the type of JSON message that a script can send to describe the outcome of a transaction:
{"flow_sample":{
  "app_name":"myapp",
  "app_operation":{
    "operation":"user.friend",
    "attributes":"id=123&handle=sith",
    "status_descr":"OK",
    "status":0,
    "req_bytes":43,
    "resp_bytes":234,
    "uS":2000},
  "app_initiator":{"actor":"123"},
  "app_target":{"actor":"231"},
  "extended_socket_ipv4":{
    "protocol":6,
    "local_ip":"10.0.0.1",
    "remote_ip":"10.0.0.23",
    "local_port":123,
    "remote_port":43032}
}}
Note: The names of the structures and attributes in the JSON message mirror the structures defined in sFlow Application Structures.
Many of the structures and attributes are optional; a message can be as simple as:
{"flow_sample":{
  "app_name":"myapp",
  "app_operation":{
    "operation":"user.friend"
  }
}}
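Because most fields are optional, a script can build the message from whatever it knows and leave the rest out. The following Python sketch shows one way to do that with the standard json module, which also takes care of quoting and escaping. The function name build_flow_sample is hypothetical; the field names are the ones used in the example messages.

```python
import json


def build_flow_sample(app_name, operation, **optional):
    """Build a flow_sample message, including only the fields supplied.

    Keyword arguments (for example attributes, status, req_bytes,
    resp_bytes, uS) are copied into app_operation unless they are None.
    """
    app_operation = {"operation": operation}
    for key, value in optional.items():
        if value is not None:
            app_operation[key] = value
    message = {"flow_sample": {"app_name": app_name,
                               "app_operation": app_operation}}
    return json.dumps(message)


# Minimal message: only the application and operation names.
minimal = build_flow_sample("myapp", "user.friend")

# Richer message: status and duration are added, attributes is omitted.
detailed = build_flow_sample("myapp", "user.friend",
                             status=0, uS=2000, attributes=None)
```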
Constructing and sending JSON messages as UDP datagrams to hsflowd is straightforward. For example, the following PHP app_operation function formats and sends the previous JSON message.
function app_operation($app_name,
                       $op_name,
                       $attributes="",
                       $status=0,
                       $status_descr="",
                       $req_bytes=0,
                       $resp_bytes=0,
                       $uS=0,
                       $sampling_rate=1) {
  if($sampling_rate > 1) {
    if(mt_rand(1,$sampling_rate) != 1) { return; }
  }
  try {
    $sock = fsockopen("udp://localhost",36343,$errno,$errstr);
    if(! $sock) { return; }
    fwrite($sock, '{"flow_sample":{
 "app_name":"'.$app_name.'",
 "sampling_rate":'.$sampling_rate.',
 "app_operation":{
   "operation":"'.$op_name.'",
   "attributes":"'.$attributes.'",
   "status_descr":"'.$status_descr.'",
   "status":'.$status.',
   "req_bytes":'.$req_bytes.',
   "resp_bytes":'.$resp_bytes.',
   "uS":'.$uS.'}}}');
    fclose($sock);
  } catch(Exception $e) {}
}
Note: This function supports transaction sampling: with a sampling_rate of 10, each operation has a 1-in-10 chance of generating a measurement. Sampling reduces the measurement overhead in high transaction rate environments while still generating useful results. You probably don't need to sample at all unless you are handling hundreds of operations per second. If you do sample, choose a fixed sampling_rate, and make it the lowest value that keeps the impact of monitoring on your application at an acceptable level. As you will see later, a low sampling_rate setting allows hsflowd to maintain more accurate counters and gives it a greater range of sampling rates to work with when it exports the data.
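The same approach carries over to the other scripting languages mentioned earlier. The following is a rough Python counterpart of the PHP function, offered as a sketch rather than a reference implementation. It assumes hsflowd is listening on the default port on the local host; the host and port parameters, the helper encode_operation and the boolean return value are additions for illustration.

```python
import json
import random
import socket


def encode_operation(app_name, op_name, attributes="", status=0,
                     status_descr="", req_bytes=0, resp_bytes=0, uS=0,
                     sampling_rate=1):
    """Return the flow_sample message as UTF-8 encoded JSON bytes."""
    return json.dumps({"flow_sample": {
        "app_name": app_name,
        "sampling_rate": sampling_rate,
        "app_operation": {
            "operation": op_name,
            "attributes": attributes,
            "status_descr": status_descr,
            "status": status,
            "req_bytes": req_bytes,
            "resp_bytes": resp_bytes,
            "uS": uS,
        },
    }}).encode("utf-8")


def app_operation(app_name, op_name, sampling_rate=1,
                  host="127.0.0.1", port=36343, **fields):
    """Sample 1-in-sampling_rate operations and send them to hsflowd.

    Returns True if a datagram was sent, False if the operation was
    skipped by sampling or the send failed.
    """
    # 1-in-N sampling: only proceed when the random draw equals 1.
    if sampling_rate > 1 and random.randint(1, sampling_rate) != 1:
        return False
    payload = encode_operation(app_name, op_name,
                               sampling_rate=sampling_rate, **fields)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (host, port))
    except OSError:
        # Monitoring should never break the application, so errors are ignored.
        return False
    return True
```

A call such as app_operation("myapp", "user.friend", sampling_rate=10, uS=2000) would then report roughly one in ten operations, each tagged with the sampling rate so that the receiver can scale its counters.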
Including the app_operation function in your PHP application library makes instrumenting PHP application logic as simple as including a single line of code in a PHP rendered web page:
<?php app_operation("myapp","user.friend"); ?>
When hsflowd receives a JSON message, it increments per-application performance counters (scaled by the sampling rate if needed). The counters are periodically exported along with the other sFlow metrics (CPU, memory, disk and network I/O) that hsflowd exports.
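To make the scaling concrete: a message that was sampled 1-in-10 stands for about ten operations, so it should add ten to the counter. The following Python sketch illustrates the idea and is not hsflowd's actual code. The class name AppCounters is hypothetical, and grouping every non-zero status under a single errors counter is a simplification (the exported counters break errors out by type).

```python
from collections import defaultdict


class AppCounters:
    """Per-application counters, scaled by each message's sampling rate."""

    def __init__(self):
        # counts[app_name][counter_name] -> estimated number of operations
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, message):
        """Add one decoded JSON message to the counters."""
        sample = message["flow_sample"]
        app = sample["app_name"]
        # A missing sampling_rate means the operation was not sampled.
        rate = max(1, int(sample.get("sampling_rate", 1)))
        status = sample.get("app_operation", {}).get("status", 0)
        # Status 0 is success; everything else is lumped together here.
        key = "status_OK" if status == 0 else "errors"
        self.counts[app][key] += rate


counters = AppCounters()
counters.update({"flow_sample": {"app_name": "myapp", "sampling_rate": 10,
                                 "app_operation": {"operation": "user.friend",
                                                   "status": 0}}})
counters.update({"flow_sample": {"app_name": "myapp",
                                 "app_operation": {"operation": "user.friend",
                                                   "status": 2}}})
```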
The following output from sflowtool shows the contents of an sFlow datagram containing application counters:
startDatagram =================================
datagramSourceIP 127.0.0.1
datagramSize 112
unixSecondsUTC 1336846918
datagramVersion 5
agentSubId 100000
agent 10.0.0.150
packetSequenceNo 2670
sysUpTime 77514000
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 4
sourceId 3:150002
counterBlock_tag 0:2202
application myapp
status_OK 23
errors_OTHER 0
errors_TIMEOUT 0
errors_INTERNAL_ERROR 0
errors_BAD_REQUEST 0
errors_FORBIDDEN 0
errors_TOO_LARGE 0
errors_NOT_IMPLEMENTED 0
errors_NOT_FOUND 0
errors_UNAVAILABLE 0
errors_UNAUTHORIZED 0
endSample ----------------------
endDatagram =================================
The application counters can be sent to tools like Ganglia and Graphite in order to trend the performance of individual application instances, or of whole clusters of applications.
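As a rough illustration of the Graphite case, the following Python sketch converts application counters from sflowtool text output into lines in Graphite's plaintext format (metric path, value, timestamp). It assumes sflowtool prints one key/value pair per line; the function name sflowtool_to_graphite and the sflow.app metric prefix are hypothetical choices.

```python
def sflowtool_to_graphite(text, prefix="sflow.app"):
    """Convert application counters in sflowtool output to Graphite lines.

    Each returned line has the form "<path> <value> <timestamp>", using
    the datagram's unixSecondsUTC field as the timestamp.
    """
    timestamp = None
    app = None
    lines = []
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) != 2:
            continue
        key, value = parts
        if key == "unixSecondsUTC":
            timestamp = value
        elif key == "application":
            app = value
        elif key == "endSample":
            app = None  # counters that follow belong to another sample
        elif app and timestamp and key.startswith(("status_", "errors_")):
            lines.append(f"{prefix}.{app}.{key} {value} {timestamp}")
    return lines


sample_output = """unixSecondsUTC 1336846918
sampleType COUNTERSSAMPLE
application myapp
status_OK 23
errors_TIMEOUT 0
endSample ----------------------
"""
graphite_lines = sflowtool_to_graphite(sample_output)
```

The resulting lines can be written, one per line, to a Graphite server's plaintext listener.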
hsflowd also samples the transactions, based on the sampling setting in hsflowd.conf or DNS-SD. Specific sampling rates can be set based on application name. For example, to override the default sampling rate of 400 and apply a sampling rate of 1-in-100 to myapp, use the following setting:
sampling.app.myapp=100
Note: If the transactions were sampled before being sent to hsflowd, then they will be sub-sampled to achieve the target sampling rate. For example, if the script used a sampling rate of 1-in-10, then hsflowd would apply a 1-in-10 sampling operation in order to achieve the desired 1-in-100 sampling rate.
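The arithmetic behind sub-sampling is a simple division. The following Python sketch works through it under two assumptions that the text does not confirm: the target rate is an exact multiple of the script's rate, and a script that already samples more aggressively than the target is passed through unchanged. The function names are hypothetical.

```python
def sub_sampling_rate(target_rate, script_rate):
    """Rate to apply to already-sampled messages to reach target_rate."""
    if script_rate >= target_rate:
        # Assumption: nothing further to drop, keep every message.
        return 1
    return target_rate // script_rate


def effective_rate(target_rate, script_rate):
    """Overall 1-in-N rate after the script's sampling and the sub-sampling."""
    return script_rate * sub_sampling_rate(target_rate, script_rate)


# The example from the text: 1-in-10 in the script, 1-in-100 target.
rate = sub_sampling_rate(100, 10)   # 10
overall = effective_rate(100, 10)   # 100
```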
The following output from sflowtool shows the contents of an sFlow datagram containing an application transaction sample:
startDatagram =================================
datagramSourceIP 127.0.0.1
datagramSize 136
unixSecondsUTC 1336846925
datagramVersion 5
agentSubId 100000
agent 10.0.0.150
packetSequenceNo 2671
sysUpTime 77521000
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:1
sampleType FLOWSAMPLE
sampleSequenceNo 1
sourceId 3:150002
meanSkipCount 10
samplePool 10
dropEvents 0
inputPort 0
outputPort 1073741823
flowBlock_tag 0:2202
flowSampleType applicationOperation
application myapp
operation user.friend
request_bytes 0
response_bytes 0
status SUCCESS
duration_uS 0
endSample ----------------------
endDatagram =================================
Note: Anyone familiar with Etsy's StatsD tool will see a similarity in the way sFlow monitoring is embedded in scripts; see Measure Anything, Measure Everything. The main difference is that sFlow application measurements contain additional structure that allows them to be part of a large scale monitoring system linking network switches, hosts and applications together. In addition, sFlow's inclusion of sampled transaction records allows metrics to be broken out into fine detail, making it possible to see how application instances interact and get to the root cause of performance problems.
Finally, the application metrics extension to the sFlow standard and the implementation in hsflowd are still in the early stages. Please try them out and provide feedback. Any comments or suggestions regarding the sFlow metrics should be directed to the sFlow.org mailing list and comments or questions relating to the scripting API should be directed to the host-sflow mailing list.