Wednesday, January 15, 2014

Large flow marking using hybrid OpenFlow

Top of rack switches are in a unique position at the edge of the network to implement traffic engineering controls. Marking large flows describes a use case for dynamically detecting and marking large flows as they enter the network:
Figure 1: Marking large flows
Physical switch hybrid OpenFlow example described how real-time sFlow analytics can be used to trigger OpenFlow controls to block denial of service attacks. This article will describe how the sFlow-RT, Floodlight OpenFlow controller, and Alcatel-Lucent OmniSwitch hybrid OpenFlow SDN controller setup can be programmed to dynamically detect and mark large (Elephant) flows as they enter the network.
Figure 2: Large flow marking controller results
In the experimental setup, a flood ping is used to generate a large flow:
ping -f 10.0.0.238 -s 1400
Figure 2 shows the results: the left half of the chart shows traffic when the controller is disabled and the right half shows traffic when the controller is enabled. The blue line trends the largest unmarked flow seen in the network and the gold line shows the largest marked flow. When the controller is disabled, none of the traffic is marked. When the controller is enabled, sFlow-RT detects the large flow within a second and calls Floodlight's Static Flow Pusher API to create a rule that matches the IP source and destination addresses of the large flow, with actions to set the IP Type of Service bits and forward the packet using the normal forwarding path. The Floodlight controller pushes an OpenFlow rule to the switch. The upstream switch is also sending sFlow data to sFlow-RT, so the marked traffic can be detected and reported by sFlow-RT, confirming that the control has in fact been implemented.
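
To make the rule concrete, here is a minimal stand-alone sketch of the kind of JSON body such a Static Flow Pusher call might carry. The helper name buildMarkRule, the datapath ID, and the source address are hypothetical placeholders chosen for illustration; the field names follow those used in the script later in this article.

```javascript
// Hypothetical helper: builds a Static Flow Pusher rule body for one flow.
// Field names mirror those used by the script in this article.
function buildMarkRule(name, dpid, flowkey, tos) {
  var parts = flowkey.split(',');   // flow key is 'ipsource,ipdestination'
  return {
    name: name,
    switch: dpid,
    cookie: 0,
    priority: 500,
    active: true,
    'ether-type': '0x0800',         // match IPv4 packets only
    'src-ip': parts[0],
    'dst-ip': parts[1],
    // mark the packet, then hand it back to the normal forwarding path
    actions: 'set-tos-bits=' + tos + ',output=normal'
  };
}

// Placeholder datapath ID and source address, for illustration only.
var rule = buildMarkRule('ctl0', '00:00:00:00:00:00:00:01',
                         '10.0.0.1,10.0.0.238', '0x4');
console.log(JSON.stringify(rule, null, 2));
```

Because the rule ends with output=normal, the switch only needs one extra table entry per large flow; all other traffic continues to use the regular hardware forwarding path.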

The controller logic is implemented by the following embedded script running within sFlow-RT:
include('extras/aluws.js');

var flowkeys = 'ipsource,ipdestination';
var value = 'bytes';
var filter = 'direction=ingress';

var trigger = 100000;
var release = 100;

var tos = '0x4';

var metricName = 'mark';
var id = 0;
var controls = {};
var enabled = true;

var user = 'admin';
var password = 'password';
var sampling = 128;
var polling = 30;

var collectorIP = "10.0.0.162";
var collectorPort = 8343;

// Floodlight OpenFlow Controller REST API
var floodlight = 'http://10.0.0.53:8080/';
var listswitches = floodlight+'wm/core/controller/switches/json';
var flowpusher = floodlight+'wm/staticflowentrypusher/json';
var clearflows = floodlight+'wm/staticflowentrypusher/clear/all/json'; 

function clearOpenFlow() {
  http(clearflows);
}

function setOpenFlow(spec) {
  http(flowpusher, 'post','application/json',JSON.stringify(spec));
}

function deleteOpenFlow(spec) {
  http(flowpusher, 'delete','application/json',JSON.stringify(spec));
}

var agents = {};
function discoverAgents() {
  var res = http(listswitches);
  var dps = JSON.parse(res);
  for(var i = 0; i < dps.length; i++) {
    var dp = dps[i];
    var agent = dp.inetAddress.match(/\/(.*):/)[1];
    var ports = dp.ports;
    var nameToNumber = {};
    var names = [];
    // get ifName to OpenFlow port number mapping
    // and list of OpenFlow enabled ports
    for (var j = 0; j < dp.ports.length; j++) {
      var port = dp.ports[j];
      var name = port.name.match(/^port (.*)$/)[1];
      names.push(name);
      nameToNumber[name] = port.portNumber;
    }
    agents[agent] = {dpid:dp.dpid,names:names,nameToNumber:nameToNumber}; 
  }
}

function initializeAgent(agent) {
  var rec = agents[agent];
  var server = new ALUServer(agent,user,password);
  rec.server = server;

  var ports = rec.names.join(' ');

  server.login();

  // configure sFlow
  server.runCmds([
    'sflow agent ip ' + agent,
    'sflow receiver 1 name InMon address '+collectorIP+' udp-port '+collectorPort,
    'sflow sampler 1 port '+ports+' receiver 1 rate '+sampling,
    'sflow poller 1 port '+ports+' receiver 1 interval '+polling
  ]);

  // get ifIndex to ifName mapping
  var res = server.rest('get','mib','ifXTable',{mibObject0:'ifName'});
  var rows = res.result.data.rows;
  var ifIndexToName = {};
  for(var ifIndex in rows) ifIndexToName[ifIndex] = rows[ifIndex].ifName;

  server.logout();

  agents[agent].ifIndexToName = ifIndexToName;
}

function mark(agent,dataSource,flowkey) {
  if(controls[flowkey]) return;

  var rec = agents[agent];
  if(!rec) return;

  var name = 'ctl' + id++;
  var parts = flowkey.split(',');
  setOpenFlow({name:name,switch:rec.dpid,cookie:0,
               priority:500,active:true,
               'ether-type':'0x0800','src-ip':parts[0],'dst-ip':parts[1],
               actions:'set-tos-bits='+tos+',output=normal'});

  controls[flowkey] = {
    name: name,
    agent: agent,
    dataSource: dataSource,
    action: 'mark',
    time: (new Date()).getTime()
  };
}

function unmark(flowkey) {
  if(!controls[flowkey]) return;

  deleteOpenFlow({name:controls[flowkey].name});
  delete controls[flowkey];
}

setEventHandler(function(evt) {
  if(!enabled) return;

  mark(evt.agent,evt.dataSource,evt.flowKey);
}, [metricName]);


setIntervalHandler(function() {
  // remove controls when flow below release threshold
  var stale = [];
  for(var flowkey in controls) {
    var ctl = controls[flowkey];
    var val = flowvalue(ctl.agent,ctl.dataSource+'.'+metricName,flowkey);
    if(!val || val <= release) stale.push(flowkey);
  }
  for(var i = 0; i < stale.length; i++) unmark(stale[i]);
},5);


setHttpHandler(function(request) {
  var result = {};
  try {
    var action = '' + request.query.action;
    switch(action) {
    case 'enable':
      enabled = true;
      break;
    case 'disable':
      enabled = false;
      break;
    case 'clear':
      clearOpenFlow();
      controls = {};
      break;
    }
  }
  catch(e) { result.error = e.message }
  result.controls = controls;
  result.enabled = enabled;
  return JSON.stringify(result);
});

discoverAgents();
for(var agent in agents) {
    initializeAgent(agent);
}

setFlow(metricName,{keys:flowkeys,value:value,filter:filter});
setThreshold(metricName,{metric:metricName,value:trigger,byFlow:true,timeout:10});
The following command line argument loads the script on startup:
-D script.file=omniofmark.js
Some notes on the script:
  1. A call to the Floodlight REST API is used to discover the set of switches, their IP addresses and OpenFlow datapath identifiers, ports, port names and OpenFlow port numbers.
  2. The initializeAgent() function uses the OmniSwitch Web Services API to configure sFlow on the switches and ports that are controllable using OpenFlow/Floodlight.
  3. A threshold is set to trigger an event when a flow exceeds 100,000 bytes/second.
  4. The handler registered with setEventHandler() is triggered when large flows are detected and calls the mark() function to push a control to Floodlight.
  5. The mark() function extracts source and destination IP address information from the flowkey and constructs a Static Flow Pusher message that matches the flow. The key to making this example work is a switch that is able to implement the actions set-tos-bits=0x4,output=normal. These actions instruct the switch to mark the traffic by setting the IP TOS bits and then use the normal hardware forwarding path.
  6. The handler registered with setIntervalHandler() runs every 5 seconds and checks the traffic levels of each of the large flows that are being controlled. If a flow is no longer detectable or is at or below the release threshold of 100 bytes/second, Floodlight is instructed to remove the rule, freeing up hardware resources for new large flows.
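
The interplay between the trigger and release thresholds in the notes above can be sketched as a small stand-alone simulation. This is an illustration only and not part of the sFlow-RT script: the simulate() function, the sample values, and the one-reading-per-step model are assumptions. In the real script the mark is driven by the threshold event and the release check runs in the interval handler.

```javascript
var trigger = 100000;  // bytes/second: mark when a flow exceeds this
var release = 100;     // bytes/second: unmark when at or below this

// Hypothetical simulation: each array entry is one reading of a flow's rate.
function simulate(samples) {
  var marked = false;
  var log = [];
  for (var i = 0; i < samples.length; i++) {
    var val = samples[i];
    if (!marked && val > trigger) {
      marked = true;                 // corresponds to mark()
      log.push('mark@' + i);
    } else if (marked && (!val || val <= release)) {
      marked = false;                // corresponds to unmark()
      log.push('unmark@' + i);
    }
  }
  return log;
}

var events = simulate([500, 150000, 90000, 5000, 50, 0, 120000]);
console.log(events.join(' '));  // prints "mark@1 unmark@4 mark@6"
```

The wide gap between the two thresholds means that a flow hovering around 100,000 bytes/second does not cause the rule to be repeatedly added and removed; the rule stays in place until the flow has almost disappeared.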
Large flow marking is only one use case for large flow control; others described on this blog include DDoS mitigation, ECMP / LAG load balancing, blacklists, and packet capture. Scripts can be added to address these different use cases, as well as to provide information on network health and server performance to operations teams (see Exporting events using syslog and Metric export to Graphite).

2 comments:

  1. I have some customers that spike to the threshold for 1 or 2 seconds before decreasing. How can I set a threshold to only trigger if a user has exceeded the value for 10 seconds in a row? I can't seem to find this anywhere.

    1. Firstly, I would recommend that you use the latest version of sFlow-RT (it has a built-in hybrid OpenFlow controller that greatly simplifies operational deployments):

      Performance optimizing hybrid OpenFlow controller.

      Next, you can adjust the sensitivity of the controller by setting the t: value when you define a flow. The value is expressed in seconds and a setting of 10 would mean that a flow would have to be at the threshold value for 10 seconds, or double the threshold for 5 seconds to trigger an event.

      The t: value in the flow definition and the threshold value: together determine the sensitivity of the controller to large flows.