Wednesday, May 8, 2013


The sflow/haproxy project is an implementation of the sFlow HTTP standard for HAProxy, the open source high performance TCP/HTTP load balancer. Load balancers are used to virtualize scale-out service pools: clients connect to a virtual IP address and service port associated with the load balancer, which selects a member of the server pool to handle each request. This architecture provides operational flexibility, allowing servers to be added to and removed from the pool as demand changes.
The load balancer is uniquely positioned to provide information on the overall performance of the entire service pool and link the performance seen by clients with the behavior of individual servers in the pool. The advantage of using sFlow to monitor performance is the scalability it offers when request rates are high and conventional logging solutions generate too much data or impose excessive overhead. In addition, monitoring HTTP services using sFlow is part of an integrated monitoring system that spans the data center, providing real-time visibility into application, server and network performance.

The sflow/haproxy software is designed to integrate with the Host sFlow agent to provide a complete picture of proxy performance. Download, install and configure Host sFlow before proceeding to install sflow/haproxy - see Installing Host sFlow on a Linux Server.

Note: the sflow/haproxy agent picks up its configuration from the Host sFlow agent. The sampling.http setting overrides the default sampling setting, allowing a specific sampling rate to be set for HTTP requests.
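
As an illustration, a Host sFlow configuration that sets a separate HTTP sampling rate might look like the sketch below. It assumes the brace-delimited hsflowd.conf syntax used by Host sFlow releases of this period (check the documentation for your version); the collector address 10.0.0.50 and the rates shown are placeholders, not recommendations. The script writes to the path in HSFLOWD_CONF, defaulting to a scratch file, so it can be reviewed before being copied over /etc/hsflowd.conf.

```shell
#!/bin/sh
# Sketch: write an example Host sFlow configuration with an HTTP-specific rate.
# Assumptions: brace-delimited hsflowd.conf syntax; placeholder collector IP.
# Defaults to a scratch file so the real /etc/hsflowd.conf is not overwritten.
HSFLOWD_CONF="${HSFLOWD_CONF:-./hsflowd.conf.example}"

cat > "$HSFLOWD_CONF" <<'EOF'
sflow {
  DNSSD = off
  # counter polling interval in seconds
  polling = 20
  # default 1-in-N sampling rate for sFlow-instrumented applications
  sampling = 400
  # override: sample 1-in-100 HTTP requests (used by sflow/haproxy)
  sampling.http = 100
  collector {
    ip = 10.0.0.50
    udpport = 6343
  }
}
EOF

echo "wrote $HSFLOWD_CONF"
```

Keeping the HTTP rate separate from the default lets the proxy be sampled more or less aggressively than other applications on the same host.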

The following commands download and install the sFlow instrumented version of HAProxy on a Linux server:
git clone
cd haproxy
make TARGET=linux26 USE_SFLOW=yes
make install
Once installed and configured, HAProxy will stream measurements to a central sFlow analyzer. Download, compile and install sflowtool on the system you are using to receive sFlow in order to see the raw data and verify that the measurements are being received.
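
A quick way to confirm that HTTP samples are arriving is to reduce sflowtool's text output to one line per sampled request. The sketch below relies only on the http_uri and http_status fields that sflowtool prints for HTTP flow samples; http_requests is a hypothetical helper name, and stdbuf (GNU coreutils) is assumed to be available to stop sflowtool's output from being block-buffered when piped.

```shell
#!/bin/sh
# Sketch: print "<status> <request line>" for each HTTP flow sample
# found in sflowtool text output read from stdin.
http_requests() {
  awk '
    /^startSample /  { uri = "" }                          # new sample: forget old URI
    /^http_uri /     { sub(/^http_uri /, ""); uri = $0 }   # e.g. GET /games/animals.php HTTP/1.1
    /^http_status /  { print $2, uri; fflush() }           # status code, then request line
  '
}

# Example (live): watch sampled requests as they arrive.
#   stdbuf -oL sflowtool | http_requests
```

Counter fields such as http_status_2XX_count are not matched, because the pattern requires a space directly after http_status.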

Running sflowtool will display output of the form:
$ sflowtool
startDatagram =================================
datagramSize 564
unixSecondsUTC 1368058148
datagramVersion 5
agentSubId 80
packetSequenceNo 23
sysUpTime 417000
samplesInPacket 2
startSample ----------------------
sampleType_tag 0:2
sampleSequenceNo 1
sourceId 3:80
counterBlock_tag 0:2201
http_method_option_count 0
http_method_get_count 71
http_method_head_count 0
http_method_post_count 0
http_method_put_count 0
http_method_delete_count 0
http_method_trace_count 0
http_methd_connect_count 0
http_method_other_count 2
http_status_1XX_count 0
http_status_2XX_count 26
http_status_3XX_count 24
http_status_4XX_count 23
http_status_5XX_count 0
http_status_other_count 0
endSample   ----------------------
startSample ----------------------
sampleType_tag 0:1
sampleSequenceNo 71
sourceId 3:80
meanSkipCount 1
samplePool 71
dropEvents 0
inputPort 0
outputPort 1073741823
flowBlock_tag 0:2102
extendedType proxy_socket4
proxy_socket4_ip_protocol 6
proxy_socket4_local_port 0
proxy_socket4_remote_port 80
flowBlock_tag 0:2100
extendedType socket4
socket4_ip_protocol 6
socket4_local_port 0
socket4_remote_port 57642
flowBlock_tag 0:2206
flowSampleType http
http_method 2
http_protocol 1001
http_uri GET /games/animals.php HTTP/1.1
http_useragent Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.31 (KHTML
http_mimetype text/html; charset=UTF-8
http_request_bytes 346
http_bytes 487
http_duration_uS 13000
http_status 200
endSample   ----------------------
endDatagram   =================================
There are two types of sFlow records shown: COUNTERSAMPLE and FLOWSAMPLE data. The counters are useful for trending overall performance using tools like Ganglia and Graphite. Using sflowtool to output the flow samples in combined logfile format makes the data available to most logfile analyzers.
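
As a sketch of how the counters could be trended, the awk filter below converts the HTTP counter block (tag 0:2201) in sflowtool output into Graphite's plaintext protocol, one "path value timestamp" line per counter. The metric prefix haproxy.<sourceId>, the host graphite.example.com and the use of port 2003 (Graphite's usual plaintext listener) are assumptions. The counters are cumulative, so a Graphite function such as nonNegativeDerivative() would be needed to graph them as rates.

```shell
#!/bin/sh
# Sketch: turn sflowtool HTTP counter samples into Graphite plaintext lines.
# Each output line is "<metric.path> <value> <unix-timestamp>".
# Assumption: metric names are built as haproxy.<sourceId>.<counter name>.
counters_to_graphite() {
  awk '
    /^unixSecondsUTC /           { ts = $2 }                       # datagram timestamp
    /^startSample /              { src = ""; counters = 0 }        # reset per sample
    /^sourceId /                 { src = $2; gsub(/:/, "_", src) } # "3:80" -> "3_80"
    /^counterBlock_tag 0:2201/   { counters = 1 }                  # HTTP counter block
    /^endSample /                { counters = 0 }
    counters && /^http_(method|methd|status)_[A-Za-z0-9]+_count / {
      print "haproxy." src "." $1, $2, ts
      fflush()
    }
  '
}

# Example (live), assuming a Graphite server accepting plaintext on port 2003:
#   stdbuf -oL sflowtool | counters_to_graphite | nc graphite.example.com 2003
```

The pattern also accepts the methd spelling, since sflowtool prints that counter as http_methd_connect_count.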
Note: The highlighted IP addresses in the FLOWSAMPLE correspond to addresses in the diagram and illustrate how request records from the proxy link clients to the back end servers. 
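
Each flow sample stands for roughly meanSkipCount requests (the 1-in-N sampling rate), so totals can be estimated by weighting every sample by that value. The awk sketch below applies this to the http_status and http_duration_uS fields to estimate request counts per status class and the mean response time. summarize_http_samples is a hypothetical name, and the accuracy of the estimates depends on how many samples have been collected.

```shell
#!/bin/sh
# Sketch: estimate HTTP request totals from sampled flow records.
# Each HTTP flow sample is weighted by its meanSkipCount (the sampling rate),
# so one sample taken at 1-in-100 counts as 100 requests.
summarize_http_samples() {
  awk '
    /^startSample /        { weight = 0; http = 0; dur = 0; status = "" }
    /^meanSkipCount /      { weight = $2 }
    /^flowSampleType http/ { http = 1 }
    /^http_duration_uS /   { dur = $2 }
    /^http_status /        { status = $2 }
    /^endSample / && http {
      class = substr(status, 1, 1) "XX"   # e.g. 200 -> 2XX
      est[class] += weight
      total += weight
      dur_sum += weight * dur
    }
    END {
      for (c in est) printf "%s %d\n", c, est[c]
      if (total > 0) printf "mean_duration_uS %.0f\n", dur_sum / total
    }
  '
}

# Example (live): collect for 60 seconds, then summarize.
# stdbuf -oL keeps sflowtool from holding samples in its output buffer.
#   timeout 60 stdbuf -oL sflowtool | summarize_http_samples
```

Counter samples carry no flowSampleType http line, so they are ignored by the summary.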
A native sFlow analyzer like sFlowTrend can combine the counters, flows and host performance metrics to provide an integrated view of performance.
Installing sFlow agents on the backend web servers further extends visibility: implementations are available for Apache, NGINX, Tomcat and Node.js. Application logic running on the servers can also be instrumented with sFlow; see Scripting languages. Backend Memcache, Java and virtualization pools can also be instrumented with sFlow, and sFlow agents embedded in physical and virtual switches provide visibility into the network.

Comprehensive end-to-end visibility in multi-tiered environments allows the powerful control capabilities of load balancers to be used to greatest effect: regulating traffic between tiers, protecting overloaded backend systems, defending against denial of service attacks, and moving resources from over-provisioned pools to under-provisioned pools.

The sFlow-RT real-time analytics engine makes the full set of sFlow metrics accessible through a RESTful API so that they can be used to drive automation. A future article will explore how sFlow metrics can be used to control HAProxy behavior (by issuing UnixSocketCommands).
