The nginx-sflow-module project is an open source implementation of sFlow monitoring for the Nginx (pronounced engine x) web server. The module exports the counter and transaction structures discussed in sFlow for HTTP.
The advantage of using sFlow is the scalability it offers for monitoring the performance of large web server clusters or load balancers where request rates are high and conventional logging solutions generate too much data or impose excessive overhead. Real-time monitoring of HTTP provides essential visibility into the performance of large-scale, complex, multi-layer services constructed using Representational State Transfer (REST) architectures. In addition, monitoring HTTP services using sFlow is part of an integrated performance monitoring solution that provides real-time visibility into applications, servers and switches (see sFlow Host Structures).
The nginx-sflow-module software is designed to integrate with the Host sFlow agent to provide a complete picture of server performance. Download, install and configure Host sFlow before installing nginx-sflow-module (see Installing Host sFlow on a Linux Server). There are a number of options for analyzing cluster performance using Host sFlow, including Ganglia and sFlowTrend.
Note: the nginx-sflow-module picks up its configuration from the Host sFlow agent. The Host sFlow sampling.http setting overrides the default sampling setting, allowing a specific sampling rate to be set for HTTP requests.
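To see why a sampling rate such as sampling.http still yields accurate totals, the following Python sketch simulates 1-in-N sampling of requests and scales the sampled counts back up by N. It is a simplification under stated assumptions: each request is kept independently with probability 1/N (the sFlow agent itself uses a randomized skip count, which has the same long-run effect), the traffic mix is made up, and `sample_requests` and `estimate_totals` are hypothetical helpers, not part of nginx-sflow-module.

```python
import random
from collections import Counter


def sample_requests(requests, rate, seed=None):
    """Keep each request independently with probability 1/rate."""
    rng = random.Random(seed)
    return [r for r in requests if rng.random() < 1.0 / rate]


def estimate_totals(samples, rate):
    """Scale per-URI sample counts by the sampling rate to estimate true counts."""
    return {uri: count * rate for uri, count in Counter(samples).items()}


if __name__ == "__main__":
    # Hypothetical traffic: 90,000 page views and 10,000 favicon requests.
    traffic = ["/membase.php"] * 90_000 + ["/favicon.ico"] * 10_000
    rate = 100  # e.g. a 1-in-100 HTTP sampling rate
    samples = sample_requests(traffic, rate, seed=1)
    print(len(samples), "of", len(traffic), "requests sampled")
    for uri, estimate in sorted(estimate_totals(samples, rate).items()):
        print(f"{uri}: ~{estimate}")
```

The relative error of each estimate shrinks roughly as 1/sqrt(number of samples): about 100 samples of a URI gives an estimate typically within about 10% (one standard deviation), and about 900 samples brings that to about 3%.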
Next, download the nginx sources from http://wiki.nginx.org/Install and the nginx-sflow-module sources from http://nginx-sflow-module.googlecode.com/. The following commands compile and install nginx with nginx-sflow-module:
tar -xvzf nginx-sflow-module-0.9.3.tar.gz
tar -xvzf nginx-1.0.0.tar.gz
cd nginx-1.0.0
./configure --add-module=/root/nginx-sflow-module-0.9.3
make
make install
Once installed, the nginx-sflow-module will stream measurements to a central sFlow analyzer. Currently, the only software that can decode HTTP sFlow is sflowtool. Download, compile and install the latest sflowtool sources on the system you are using to receive sFlow from the servers in the nginx cluster.
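As a rough illustration of the first step sflowtool performs on each datagram, the Python sketch below decodes only the fixed sFlow version 5 datagram header (the fields sflowtool prints as datagramVersion, agent, agentSubId, packetSequenceNo, sysUpTime and samplesInPacket). It assumes an IPv4 agent address and the standard sFlow collector port 6343; it does not decode the counter or HTTP flow samples, which is what sflowtool is for. The function names are hypothetical.

```python
import socket
import struct

SFLOW_PORT = 6343  # standard sFlow collector port (assumed to match the Host sFlow config)


def parse_sflow_header(datagram):
    """Decode the sFlow v5 datagram header; only IPv4 agent addresses are handled."""
    if len(datagram) < 28:
        raise ValueError("datagram too short for an sFlow v5 IPv4 header")
    # All fields are 32-bit big-endian (XDR) integers.
    version, addr_type = struct.unpack_from("!II", datagram, 0)
    if version != 5 or addr_type != 1:  # address type 1 = IPv4
        raise ValueError("expected sFlow v5 with an IPv4 agent address")
    agent = socket.inet_ntoa(datagram[8:12])
    sub_id, seq, uptime, n_samples = struct.unpack_from("!IIII", datagram, 12)
    return {
        "datagramVersion": version,
        "agent": agent,
        "agentSubId": sub_id,
        "packetSequenceNo": seq,
        "sysUpTime": uptime,
        "samplesInPacket": n_samples,
    }


def listen(port=SFLOW_PORT):
    """Print the header of every sFlow datagram received on the given UDP port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        data, (src_ip, _) = sock.recvfrom(65535)
        print(src_ip, parse_sflow_header(data))
```

Calling listen() runs the loop until interrupted; stop sflowtool first, since only one process can bind the port.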
Running sflowtool will display output of the form:
[pp@test]$ /usr/local/bin/sflowtool
startDatagram =================================
datagramSourceIP 10.0.0.111
datagramSize 116
unixSecondsUTC 1294273499
datagramVersion 5
agentSubId 6486
agent 10.0.0.150
packetSequenceNo 6
sysUpTime 44000
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 6
sourceId 3:65537
counterBlock_tag 0:2201
http_method_option_count 0
http_method_get_count 247
http_method_head_count 0
http_method_post_count 2
http_method_put_count 0
http_method_delete_count 0
http_method_trace_count 0
http_methd_connect_count 0
http_method_other_count 0
http_status_1XX_count 0
http_status_2XX_count 214
http_status_3XX_count 35
http_status_4XX_count 0
http_status_5XX_count 0
http_status_other_count 0
endSample ----------------------
startSample ----------------------
sampleType_tag 0:1
sampleType FLOWSAMPLE
sampleSequenceNo 3434
sourceId 3:65537
meanSkipCount 2
samplePool 7082
dropEvents 0
inputPort 0
outputPort 1073741823
flowBlock_tag 0:2100
extendedType socket4
socket4_ip_protocol 6
socket4_local_ip 10.0.0.150
socket4_remote_ip 10.1.1.63
socket4_local_port 80
socket4_remote_port 61401
flowBlock_tag 0:2201
flowSampleType http
http_method 2
http_protocol 1001
http_uri /favicon.ico
http_host 10.0.0.150
http_referrer http://10.0.0.150/membase.php
http_useragent Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleW
http_bytes 284
http_duration_uS 335
http_status 404
endSample ----------------------
endDatagram =================================
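The default sflowtool output is line oriented: each line is a field name followed by its value, with samples bracketed by startSample and endSample. The Python sketch below groups that text into one dictionary per sample and extracts the HTTP status counters from a counter sample. It assumes the layout shown above; `parse_sflowtool` and `status_counts` are hypothetical helpers, and a field name repeated within one sample (such as flowBlock_tag) keeps only its first value.

```python
def parse_sflowtool(text):
    """Group sflowtool's default 'name value' output into one dict per sample.

    Each dict holds the fields between a startSample/endSample pair, plus the
    'agent' address taken from the enclosing datagram header.
    """
    samples = []
    agent = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first space only: values such as http_useragent contain spaces.
        name, _, value = line.partition(" ")
        if name == "agent":
            agent = value
        elif name == "startSample":
            current = {"agent": agent}
        elif name == "endSample":
            if current is not None:
                samples.append(current)
            current = None
        elif current is not None:
            # Repeated names (e.g. flowBlock_tag) keep their first value.
            current.setdefault(name, value)
    return samples


def status_counts(sample):
    """Return the http_status_*_count counters of a counter sample as ints."""
    prefix, suffix = "http_status_", "_count"
    return {
        name[len(prefix):-len(suffix)]: int(value)
        for name, value in sample.items()
        if name.startswith(prefix) and name.endswith(suffix)
    }
```

To try it on real data, capture a few seconds of sflowtool output to a file and pass the file's contents to parse_sflowtool; for a sample whose sampleType is COUNTERSSAMPLE, status_counts returns a dict such as {"1XX": 0, "2XX": 214, ...}.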
The -H option causes sflowtool to output the HTTP request samples using the combined log format:
[pp@test]$ /usr/local/bin/sflowtool -H
10.1.1.63 - - [05/Jan/2011:22:39:50 -0800] "GET /membase.php HTTP/1.1" 200 3494 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleW"
10.1.1.63 - - [05/Jan/2011:22:39:50 -0800] "GET /favicon.ico HTTP/1.1" 404 284 "http://10.0.0.150/membase.php" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleW"
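Because the -H output follows the standard combined log format, it can be parsed with an ordinary regular expression. The Python sketch below is one such parser, assuming well-formed lines like the two shown above; `parse_combined` and `COMBINED_LOG` are hypothetical names, and lines that do not match return None.

```python
import re
from datetime import datetime

# host ident authuser [timestamp] "METHOD URI PROTOCOL" status bytes "referrer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<useragent>[^"]*)"'
)


def parse_combined(line):
    """Parse one combined-log-format line into a dict, or return None."""
    match = COMBINED_LOG.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # e.g. "05/Jan/2011:22:39:50 -0800" -> timezone-aware datetime
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields
```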
Converting sFlow to the combined log format allows existing log analyzers to be used to analyze the sFlow data. For example, the following commands use sflowtool and webalizer to create reports:
/usr/local/bin/sflowtool -H | rotatelogs log/http_log &
webalizer -o report log/*
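Any tool that reads combined log files can consume this data. As a minimal stand-in for the top URLs table of a webalizer report, the Python sketch below counts requests per URI across one or more log files, such as those rotatelogs writes under log/ in the command above. It assumes the first double-quoted field of each line is the request line (METHOD URI PROTOCOL); `top_urls` and `top_urls_in_files` are hypothetical helpers. Because only sampled requests appear in the log, multiplying a count by the HTTP sampling rate gives an estimate of the actual number of requests.

```python
from collections import Counter


def top_urls(lines, n=10):
    """Return the n most requested URIs from combined-log-format lines."""
    counts = Counter()
    for line in lines:
        # The request line is the first double-quoted field: "GET /uri HTTP/1.1".
        parts = line.split('"')
        if len(parts) < 2:
            continue  # not a combined-log line
        request = parts[1].split()
        if len(request) >= 2:
            counts[request[1]] += 1
    return counts.most_common(n)


def top_urls_in_files(paths, n=10):
    """Aggregate top_urls over several log files (e.g. rotatelogs output)."""
    def lines():
        for path in paths:
            with open(path) as f:
                yield from f
    return top_urls(lines(), n)
```

For example, top_urls_in_files(glob.glob("log/http_log*")) would rank URIs across all rotated files (this usage assumes the glob module is imported).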
The resulting webalizer report shows top URLs:
Finally, the real potential of HTTP sFlow is as part of a broader performance management system providing real-time visibility into applications, servers, storage and networking across the entire data center.
For example, the diagram above shows typical elements in a Web 2.0 data center (e.g. Facebook, Twitter, Wikipedia, YouTube, etc.). A cluster of web servers handles requests from users. Typically, the application logic for the web site runs on the web servers in the form of server-side scripts (PHP, Ruby, ASP, etc.). The web applications access the database to retrieve and update user data. However, the database can quickly become a bottleneck, so a cache (Memcached in this example) is used to store the results of database queries. The combination of sFlow from all the web servers, Memcached servers and network switches provides end-to-end visibility into performance that scales to handle even the largest data center.