sFlow: RESTflow

Wednesday, August 28, 2013

RESTflow

Figure 1: Embedded, on-switch flow cache with flow record export

This article describes RESTflow™, a new method for exporting flow records that has significant advantages over current approaches to flow export.

A flow record summarizes a set of packets that share common attributes - for example, a typical flow record includes ingress interface, source IP address, destination IP address, IP protocol, source TCP/UDP port, destination TCP/UDP port, IP ToS, start time, end time, packet count and byte count.

Figure 1 shows the steps performed by the switch in order to construct flow records. First the stream of packets is likely to be sampled (particularly in high-speed switches). Next, the sampled packet header is decoded to extract key fields. A hash function is computed over the keys in order to look up the flow record in the flow cache. If an existing record is found, its values are updated, otherwise a record is created for the new flow. Records are flushed from the cache based on protocol information (e.g. if a FIN flag is seen in a TCP packet), a timeout, inactivity, or when the cache is full. The flushed records are finally sent to the traffic analysis application using one of the many formats that switches use to export flow records (e.g. NetFlow, IPFIX, J-Flow, NetStream, etc.).
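
The lookup, update and flush steps can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular switch implements its cache: the class and field names, the timeout, and the cache size are all hypothetical, sampling and header decoding are assumed to have already happened, and a Python dict stands in for the hash table.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    key: tuple
    start: float
    end: float
    packet_count: int = 0
    byte_count: int = 0


class FlowCache:
    """Toy flow cache; the dict lookup plays the role of the hash function."""

    def __init__(self, inactive_timeout=15.0, max_entries=1024):
        self.inactive_timeout = inactive_timeout  # hypothetical value, seconds
        self.max_entries = max_entries            # hypothetical value
        self.records = {}
        self.exported = []  # stands in for NetFlow/IPFIX export

    def update(self, pkt, now):
        """pkt is an already-decoded packet header, represented as a dict."""
        key = (pkt["in_if"], pkt["src"], pkt["dst"], pkt["proto"],
               pkt["sport"], pkt["dport"])
        rec = self.records.get(key)
        if rec is None:
            # Cache full: make room by flushing the least recently updated flow
            if len(self.records) >= self.max_entries:
                oldest = min(self.records, key=lambda k: self.records[k].end)
                self._flush(oldest)
            rec = FlowRecord(key=key, start=now, end=now)
            self.records[key] = rec
        rec.end = now
        rec.packet_count += 1
        rec.byte_count += pkt["length"]
        # Protocol-based flush, e.g. a TCP FIN was seen
        if pkt.get("fin"):
            self._flush(key)

    def expire(self, now):
        """Flush flows that have been idle for at least inactive_timeout."""
        idle = [k for k, r in self.records.items()
                if now - r.end >= self.inactive_timeout]
        for k in idle:
            self._flush(k)

    def _flush(self, key):
        self.exported.append(self.records.pop(key))
```

Note that every one of these steps consumes memory and CPU on the device, which is the cost that the external approach described next avoids.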

Figure 2: External software flow cache with flow record export

Figure 2 shows the relationship between the widely supported sFlow standard for packet export and flow export. With sFlow monitoring, the decode, hash, flow cache and flush functionality are no longer implemented on the switch. Instead, sampled packet headers are immediately sent to the traffic analysis application which decodes the packets and analyzes the data. In typical deployments, large numbers of switches stream sFlow data to a central sFlow analyzer. In addition, sFlow provides a polling function; switches periodically send standard interface counters to the traffic analysis applications, eliminating the need for SNMP polling, see Link utilization.

There are significant advantages to moving the flow cache to external software: the article Superlinear discusses some of the scalability implications of on-device flow caches, and Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX describes how on-device flow caches delay measurements and make them less useful for software defined networking (SDN) applications.

The following example uses the sFlow-RT analyzer to demonstrate flow record export based on sFlow packet data received from a network of switches.

Figure 3: Performance aware software defined networking

Figure 3 from Performance aware software defined networking shows how sFlow-RT exposes the active flow cache to applications that address important use cases, such as DDoS mitigation, large flow load balancing, multi-tenant performance isolation, traffic engineering, and packet capture.

The recent extension of the REST API to support flow record export provides a useful log of network activity that can be incorporated in security information and event management (SIEM) tools.

Three types of query combine to deliver the RESTflow flexible flow definition and export interface:

1. Define flow cache

The following command instructs the central sFlow-RT analytics engine running on host 10.0.0.162 to build a flow cache for TCP flows and log the completed flows:

curl -H "Content-Type:application/json" -X PUT --data '{"keys":"ipsource,ipdestination,tcpsourceport,tcpdestinationport", "value":"bytes", "log":true}' http://10.0.0.162:8008/flow/tcp/json

What might not be apparent is that the single configuration command to sFlow-RT enabled network wide monitoring of TCP connections, even in a network containing hundreds of physical switches, thousands of virtual switches, different switch models, multiple vendors etc. In contrast, if devices maintain their own flow caches then each switch needs to be re-configured whenever monitoring requirements change - typically a time consuming and complex manual process, see Software defined analytics.

To illustrate the point, the following command defines an additional network wide flow cache for records describing DNS (UDP port 53) requests and logs the completed flows:

curl -H "Content-Type:application/json" -X PUT --data '{"keys":"ipsource", "value":"frames", "filter":"udpdestinationport=53", "log":true}' http://10.0.0.162:8008/flow/dns/json
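
The two curl commands above differ only in the flow name and the JSON body, so the request is easy to build programmatically. The following is a minimal sketch; the helper name flow_definition_request and its parameters are hypothetical, and it only constructs the URL, headers and body without sending anything:

```python
import json


def flow_definition_request(rt, name, keys, value, flow_filter=None, log=True):
    """Build the URL, headers and JSON body for a flow definition PUT.

    Hypothetical helper: mirrors the curl commands in the text,
    but does not contact the analyzer.
    """
    url = rt + "/flow/" + name + "/json"
    definition = {"keys": ",".join(keys), "value": value, "log": log}
    if flow_filter is not None:
        definition["filter"] = flow_filter
    headers = {"Content-Type": "application/json"}
    return url, headers, json.dumps(definition)


# Same definition as the DNS example above
url, headers, body = flow_definition_request(
    "http://10.0.0.162:8008", "dns", ["ipsource"], "frames",
    flow_filter="udpdestinationport=53")
print(url)
print(body)
```

The result can be handed to any HTTP client, for example requests.put(url, data=body, headers=headers).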

2. Query flow cache definition

The following command retrieves the flow definitions:

$ curl http://10.0.0.162:8008/flow/json
{
 "dns": {
  "filter": "udpdestinationport=53",
  "fs": ",",
  "keys": "ipsource",
  "log": true,
  "n": 5,
  "t": 2,
  "value": "frames"
 },
 "tcp": {
  "fs": ",",
  "keys": "ipsource,ipdestination,tcpsourceport,tcpdestinationport",
  "log": true,
  "n": 5,
  "t": 2,
  "value": "bytes"
 }
}

The definition for a specific flow can also be retrieved:

$ curl http://10.0.0.162:8008/flow/tcp/json
{
 "fs": ",",
 "keys": "ipsource,ipdestination,tcpsourceport,tcpdestinationport",
 "log": true,
 "n": 5,
 "t": 2,
 "value": "bytes"
}

3. Retrieve flow records

The following command retrieves flow records logged by all the flow caches:

$ curl "http://10.0.0.162:8008/flows/json?maxFlows=2"
[
 {
  "agent": "10.0.0.20",
  "dataSource": "2",
  "end": 1377658682679,
  "flowID": 250,
  "flowKeys": "10.0.0.162",
  "name": "dns",
  "start": 1377658553679,
  "value": 400
 },
 {
  "agent": "10.0.0.20",
  "dataSource": "5",
  "end": 1377658681678,
  "flowID": 249,
  "flowKeys": "10.0.0.20,10.0.0.236,47571,3260",
  "name": "tcp",
  "start": 1377658613678,
  "value": 1217600
 }
]

And the following command retrieves flow records from a specific cache:

$ curl "http://10.0.0.162:8008/flows/json?name=dns&maxFlows=2"
[
 {
  "agent": "10.0.0.28",
  "dataSource": "53",
  "end": 1377658938378,
  "flowID": 400,
  "flowKeys": "10.0.0.162",
  "name": "dns",
  "start": 1377658398378,
  "value": 400
 },
 {
  "agent": "10.0.0.20",
  "dataSource": "2",
  "end": 1377658682679,
  "flowID": 251,
  "flowKeys": "10.0.0.71",
  "name": "dns",
  "start": 1377658612679,
  "value": 400
 }
]
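
The records are straightforward to post-process. The sketch below computes the duration of each flow and converts the end time to a readable timestamp. It assumes that start and end are milliseconds since the Unix epoch, which is consistent with the values shown but is an assumption; the function name summarize is hypothetical, and the sample data is copied from the first query result above.

```python
from datetime import datetime, timezone


def summarize(record):
    """Return (name, flowKeys, duration in seconds, end time in UTC).

    Assumes 'start' and 'end' are milliseconds since the Unix epoch.
    """
    duration_s = (record["end"] - record["start"]) / 1000.0
    end_utc = datetime.fromtimestamp(record["end"] / 1000.0, tz=timezone.utc)
    return record["name"], record["flowKeys"], duration_s, end_utc


# Records copied from the /flows/json?maxFlows=2 output above
records = [
    {"agent": "10.0.0.20", "dataSource": "2", "end": 1377658682679,
     "flowID": 250, "flowKeys": "10.0.0.162", "name": "dns",
     "start": 1377658553679, "value": 400},
    {"agent": "10.0.0.20", "dataSource": "5", "end": 1377658681678,
     "flowID": 249, "flowKeys": "10.0.0.20,10.0.0.236,47571,3260",
     "name": "tcp", "start": 1377658613678, "value": 1217600},
]

for rec in records:
    name, keys, duration_s, end_utc = summarize(rec)
    print(name, keys, duration_s, end_utc.isoformat())
```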

The JSON encoded text based output is easy to read and widely supported by programming tools.

Transporting large amounts of flow data using a text based protocol might seem inefficient when compared to binary flow record export protocols such as IPFIX, NetFlow etc. However, one of the advantages of a REST API is that it builds on the mature and extensive capabilities built into the HTTP protocol stack. For example, most HTTP clients are capable of handling compression and will set the HTTP Accept-Encoding header to indicate that they are willing to accept compressed data. The sFlow-RT web server responds by compressing the data before sending it, resulting in a 20 times reduction in data volume. Similarly, using a REST API allows users to leverage the existing infrastructure to load balance, encrypt, authenticate, cache and proxy requests.
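
The effect of compression on JSON flow records is easy to try locally. The sketch below gzips a batch of synthetic records shaped like the examples above and reports the size ratio. The records are made up for illustration, and the ratio obtained depends entirely on the data, so it should not be read as a reproduction of the figure quoted above.

```python
import gzip
import json


def compression_ratio(records):
    """Return uncompressed size divided by gzip-compressed size."""
    raw = json.dumps(records).encode("utf-8")
    packed = gzip.compress(raw)
    # Sanity check: compression must be lossless
    assert gzip.decompress(packed) == raw
    return len(raw) / len(packed)


# Synthetic records with the same fields as the examples; values are invented
records = [{"agent": "10.0.0.20",
            "dataSource": str(i % 48),
            "end": 1377658682679 + i,
            "flowID": i,
            "flowKeys": "10.0.0.%d" % (i % 250),
            "name": "dns",
            "start": 1377658553679 + i,
            "value": 400} for i in range(1000)]

ratio = compression_ratio(records)
print("JSON is %.1f times smaller after gzip" % ratio)
```

The repeated field names that make JSON verbose are exactly what a general purpose compressor removes, which is why the text encoding costs much less on the wire than its raw size suggests.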

The real power of the RESTflow API becomes apparent when it is accessed programmatically. For example, the following Python script defines the TCP flow described earlier and continuously retrieves new flow records:

#!/usr/bin/env python3
import requests
import json
import signal
import sys

rt = 'http://10.0.0.162:8008'
name = 'tcp'

# On Ctrl-C, delete the flow definition before exiting
def sig_handler(signum, frame):
  requests.delete(rt + '/flow/' + name + '/json')
  sys.exit(0)
signal.signal(signal.SIGINT, sig_handler)

# Define the flow cache and enable logging of completed flows
flow = {'keys':'ipsource,ipdestination,tcpsourceport,tcpdestinationport',
        'value':'frames',
        'log':True}
r = requests.put(rt + '/flow/' + name + '/json',data=json.dumps(flow))

# Repeatedly ask for records, passing the flowID of the newest record seen
flowurl = rt + '/flows/json?name=' + name + '&maxFlows=100&timeout=60'
flowID = -1
while True:
  r = requests.get(flowurl + "&flowID=" + str(flowID))
  if r.status_code != 200: break
  flows = r.json()
  if len(flows) == 0: continue

  # The first record has the highest flowID; print oldest first
  flowID = flows[0]["flowID"]
  flows.reverse()
  for f in flows:
    print(str(f['flowKeys']) + ',' + str(int(f['value'])) + ',' + str(f['end'] - f['start']) + ',' + f['agent'] + ',' + str(f['dataSource']))

The following command runs the script, which results in the newly arriving flow records being printed as comma separated text:

$ ./tcp_flows.py 
10.0.0.16,10.0.0.236,38834,3260,4000,98100,10.0.0.16,5
10.0.0.151,10.0.0.152,39046,22,837800,60000,10.0.0.28,2
10.0.0.151,10.0.0.152,39046,22,851433,60399,10.0.0.20,25
10.0.0.20,10.0.0.16,443,48859,12597,64000,10.0.0.253,1
10.0.0.152,10.0.0.151,22,39046,67049,61800,10.0.0.28,19

Instead of simply printing the flow records, the script could easily add them to scale out databases like MongoDB so that they can be combined with other types of information and easily searched.
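
As a sketch of the preparation step for such a database, the following function turns a flow record into a document with one named field per flow key. The function name to_document is hypothetical, and it assumes that the "fs" value in the flow definition is the separator used in flowKeys. The database insert itself is left out; with a driver such as pymongo it would be a call along the lines of collection.insert_one(doc).

```python
def to_document(record, keys, fs=","):
    """Split flowKeys into named fields using the flow definition's keys.

    Assumes 'fs' is the separator used in flowKeys (hypothetical helper).
    """
    names = keys.split(",")
    values = record["flowKeys"].split(fs)
    if len(names) != len(values):
        raise ValueError("flowKeys does not match the key definition")
    doc = dict(zip(names, values))
    doc.update(agent=record["agent"],
               dataSource=record["dataSource"],
               value=record["value"],
               start=record["start"],
               end=record["end"])
    return doc


# TCP record and key definition copied from the earlier examples
record = {"agent": "10.0.0.20", "dataSource": "5", "end": 1377658681678,
          "flowID": 249, "flowKeys": "10.0.0.20,10.0.0.236,47571,3260",
          "name": "tcp", "start": 1377658613678, "value": 1217600}
doc = to_document(record,
                  "ipsource,ipdestination,tcpsourceport,tcpdestinationport")
print(doc)
```

Storing each key as its own field means that the database can index and search on, say, the destination port without parsing strings at query time.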

The sFlow-RT REST API doesn't just provide access to completed flows; real-time information on in-progress flows is also available by querying the central flow cache. For example, the following command searches the flow cache and reports the most active flow in the network (based on current data transfer rate, i.e. bits/second).

$ curl http://10.0.0.162:8008/metric/ALL/tcp/json
[{
 "agent": "10.0.0.28",
 "dataSource": "2",
 "metricN": 9,
 "metricName": "tcp",
 "metricValue": 29958.06882871171,
 "topKeys": [
  {
   "key": "10.0.0.20,10.0.0.28,443,40870",
   "updateTime": 1377664899679,
   "value": 29958.06882871171
  },
  {
   "key": "10.0.0.236,10.0.0.28,3260,56044",
   "updateTime": 1377664888679,
   "value": 23.751630816369214
  }
 ],
 "updateTime": 1377664899679
}]

As well as identifying the most active flow, the query result also identifies the switch and port carrying the traffic (out of potentially tens of thousands of ports being monitored).
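
Extracting that information from the response takes only a few lines. The sketch below works on the result shown above; the function name top_flow is hypothetical, and it assumes the structure seen in this one example (a list of entries, each with a topKeys list).

```python
def top_flow(result):
    """Return the busiest flow and where it was seen, or None if no data.

    Assumes the response structure shown in the example output.
    """
    if not result or not result[0].get("topKeys"):
        return None
    entry = result[0]
    top = max(entry["topKeys"], key=lambda k: k["value"])
    return {"agent": entry["agent"],
            "dataSource": entry["dataSource"],
            "key": top["key"],
            "value": top["value"]}


# Response copied from the /metric/ALL/tcp/json query above
result = [{
    "agent": "10.0.0.28",
    "dataSource": "2",
    "metricN": 9,
    "metricName": "tcp",
    "metricValue": 29958.06882871171,
    "topKeys": [
        {"key": "10.0.0.20,10.0.0.28,443,40870",
         "updateTime": 1377664899679,
         "value": 29958.06882871171},
        {"key": "10.0.0.236,10.0.0.28,3260,56044",
         "updateTime": 1377664888679,
         "value": 23.751630816369214},
    ],
    "updateTime": 1377664899679,
}]

print(top_flow(result))
```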

While flow records are a useful log of completed flows, the ability to track flows in real time transforms traffic monitoring from a reporting tool to a powerful driver for active control, unlocking the capabilities of software defined networking to dynamically adapt the network to changing demand. Embedded flow caches in networking devices are not easily accessible and even if there were a programmatic way to access the on device cache, polling thousands of devices would take so long that the information would be stale by the time it was retrieved.

Figure 4: Visibility and the software defined data center

Looking at the big picture, flow export is only one of many functions that can be performed by an sFlow analyzer, some of which have been described on this blog. Providing simple, programmatic access allows these functions to be integrated into the broader orchestration system. REST APIs are the obvious choice since they are already widely used in data center orchestration and monitoring tools.

Embedded flow monitoring solutions typically require CLI access to the network devices to define flow caches and direct flow record export. Access to switch configurations is tightly controlled by the network management team and configuration changes are often limited to maintenance windows. This conservatism arises in part because hardware resource limitations on the devices need to be carefully managed - for example, a misconfigured flow cache can destabilize the performance of the switch. In contrast, the central sFlow analyzer is software running on a server with relatively abundant resources that can safely support large numbers of requests without any risk of destabilizing the network.

The REST APIs in sFlow-RT are part of a broader movement to break out of the networking silo and integrate management of network resources with the orchestration tools used to automatically manage compute, storage and application resources. Automation transforms the network from a fragile static resource into a robust and flexible resource that can be adapted to support the changing demands of the applications it supports.

9 comments:

Peter, September 19, 2013 at 7:36 AM
Have you enabled packet sampling on your sFlow agent?

ULOG on Linux

Mininet/OVS
dani, May 8, 2015 at 12:00 PM
I see the flows dumped by the application, is there a way to get Vlan associated with these flows ?

Thanks
Unknown, March 27, 2018 at 10:01 AM
Hi,

I have a Machine Learning algorithm and I would like to take the flow statistics as an input to my Algorithm. Do you have an idea on how I can do that?
Peter, July 21, 2022 at 5:04 AM
Flows are only logged after the activeTimeout (60 seconds in your example), so you shouldn't expect the /flows/json query to generate results until a minute after the flows have started. The activeFlows query gives you a real-time view of the flows as they happen and should give immediate results.

The /agents/json query can be used to verify that you are receiving sFlow packet samples.