Tuesday, December 24, 2013

Workload placement

Messy and organized closets are an everyday example of the efficiency that can be gained by packing items together in a systematic way: randomly throwing items into a closet makes poor use of the space, while keeping the closet organized increases the space available.

A recent CBS 60 Minutes Amazon segment describes the ultimate closet - an Amazon order fulfillment warehouse. Each vast warehouse looks like a chaotic jumble - something out of Raiders of the Lost Ark.
Even when you get up close to an individual shelf, there still appears to be no organizing principle. Interviewer Charlie Rose comments, "The products are then placed by stackers in what seems to outsiders as a haphazard way… a book on Buddhism and Zen resting next to Mrs. Potato Head…"
Amazon's Dave Clark explains, "Can those two things, you look at how these items fit in the bin. They’re optimized for utilizing the available space. And we have computers and algorithmic work that tells people the areas of the building that have the most space to put product in that’s coming in at that time. Amazon has become so efficient with its stacking, it can now store twice as many goods in its centers as it did five years ago."

The 60 Minutes piece goes on to discuss Amazon Web Services (AWS). There are interesting parallels between managing a cloud data center and managing a warehouse (both of which Amazon does extremely well). There is a fixed amount of physical compute, storage and bandwidth resources in the data center, but instead of having to find shelf space to store physical goods, the data center manager needs to find a server with enough spare capacity to run each new virtual machine.

Just as a physical object has a size, shape and weight that constrain where it can be placed, virtual machines have characteristics such as number of virtual CPUs, memory, storage and network bandwidth that determine how many virtual machines can be placed on each physical server (see Amazon EC2 Instances). For example, an Amazon m1.small instance provides 1 virtual CPU, 1.7 GiB RAM, and 160 GB storage. A simplistic packing scheme would allow 6 small instances to be hosted on a physical server with 8 CPU cores, 32 GiB RAM, and 1 TB disk. This allocation scheme is limited by the amount of disk space and leaves CPU cores and RAM unused.

While the analogy between a data center and a warehouse is interesting, there are distinct differences between computational workloads and physical goods that are important to consider. One of the motivating factors driving the move to virtualization was the realization that most physical servers were poorly utilized. Moving to virtual machines allowed multiple workloads to be combined and run on a single physical server, increasing utilization and reducing costs. Continuing the EC2 example, if measurement revealed that the m1.small instances were only using 80 GB of storage, additional instances could be placed on the server by over subscribing the storage.
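The following back-of-the-envelope sketch (plain JavaScript, using the illustrative figures above) shows how the binding constraint shifts from storage to CPU once the measured 80 GB storage footprint is used in place of the 160 GB reservation:
var server   = { cpus: 8, ramGiB: 32,  diskGB: 1000 }; // physical host
var reserved = { cpus: 1, ramGiB: 1.7, diskGB: 160 };  // m1.small as specified
var measured = { cpus: 1, ramGiB: 1.7, diskGB: 80 };   // m1.small as measured

function maxInstances(host, vm) {
  // the resource that runs out first limits the number of instances
  return Math.min(
    Math.floor(host.cpus / vm.cpus),
    Math.floor(host.ramGiB / vm.ramGiB),
    Math.floor(host.diskGB / vm.diskGB)
  );
}

console.log(maxInstances(server, reserved)); // 6 - storage is the constraint
console.log(maxInstances(server, measured)); // 8 - CPU becomes the constraint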
The Wired article, Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon, describes Google's internally developed workload packing software and the strategic value it has for Google's business.
Amazon has been able to double the capacity of its physical warehouses by using bar code tracking and computer orchestration algorithms. Assuming analytics-driven workload placement in data centers can drive a similar increase in workload density, what impact would that have for a cloud hosting provider?

Suppose a data center is operating with a gross margin of 20%. Leveraging the sFlow standard for measurement doesn't add to costs since the capability is embedded in most vendors' data center switches, and open source sFlow agents can easily be deployed on hypervisors using orchestration tools. Real-time analytics software is required to turn the raw measurements into actionable data; however, the cost of this software is a negligible part of the overall cost of running the data center. On the other hand, since the infrastructure and operating costs remain essentially fixed, doubling the number of virtual machines that can be hosted in the data center (and assuming that there is sufficient demand to fill this additional capacity) doubles the top line revenue and triples the gross margin to 60%.
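The arithmetic behind this claim is easy to check; the key assumption (flagged in the comments) is that costs are dominated by the fixed infrastructure and stay flat as density doubles:
// Assumption: costs are dominated by fixed infrastructure and do not grow
// when additional virtual machines are packed onto the same hardware.
var revenue = 100;                         // arbitrary units
var cost = 80;                             // 20% gross margin to start
console.log((revenue - cost) / revenue);   // 0.2 -> 20% margin

var revenue2 = 2 * revenue;                // double the hosted workloads
console.log((revenue2 - cost) / revenue2); // 0.6 -> 60% margin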

One can argue about the assumptions in the example, but playing with different assumptions and models makes it clear that workload placement has great potential for increasing the efficiency and profitability of cloud data centers. Where the puck is going: analytics describes the vital role for analytics in SDN orchestration stacks, including: VMware (NSX), Cisco, Open Daylight, etc. The article predicts that there will be increased merger and acquisition activity in 2014 as orchestration vendors compete by integrating analytics into their platforms.

Finally, while analytics offers attractive opportunities, a lack of visibility and poorly placed workloads carry significant risks. In SDN market predictions for New Year: NFV, OpenFlow, Open vSwitch boom, Eric Hanselman of 451 Research poses the question, "Will data center overlays hit a wall in 2014?" He then goes on to state, "There is a point at which the overlay is going to be constrained by the mechanics of the network underneath... Data center operators will want the ability to do dynamic configuration and traffic management on the physical network and tie that management and control into application-layer orchestration."

Saturday, December 14, 2013

Blacklists

Blacklists are an important way in which the Internet community protects itself by identifying bad actors. However, before using a blacklist, it is important to understand how it is compiled and maintained in order to properly use the list and interpret the significance of a match.

Incorporating blacklists in traffic monitoring can be a useful way to find hosts on a network that have been compromised. For example, if a host interacts with addresses known to be part of a botnet, it raises the concern that the host has been compromised and is itself a member of the botnet.

This article provides an example that demonstrates how the standard sFlow instrumentation built into most vendors' switches can be used to match traffic against a large blacklist. Blacklists can be very large; the list used in this example contains approximately 16,000 domain names and nearly 300,000 CIDRs. Most switches don't have the resources to match traffic against such large lists. However, the article RESTflow describes how sFlow shifts analysis from the switches to external software, which can easily handle the task of matching traffic against large lists. This article uses sFlow-RT to perform the blacklist matching.
Figure 1: Components of sFlow-RT
The following sFlow-RT script (phish.js) makes use of the PhishTank blacklist to identify hosts that may have been compromised by phishing attacks:
include('extras/json2.js');

var server = '10.0.0.1';
var port = 514;
var facility = 16; // local0
var severity = 5;  // notice

var domains = {};
function updatePhish() {
  var phish = JSON.parse(http("http://data.phishtank.com/data/online-valid.json"));
  domains = {};
  var dlist = [];
  var groups = {};
  for(var i = 0; i < phish.length; i++) {
    var entry = phish[i];
    var target = entry.target;
    var id = entry.phish_id;
    var url = entry.url;
    var dnsqname = url.match(/:\/\/(.[^/]+)/)[1] + '.';
    if(!domains[dnsqname]) {
      domains[dnsqname] = id;
      dlist.push(dnsqname);
    }
    var details = entry.details;
    var cidrlist = [];
    for(var j = 0; j < details.length; j++) {
      var ip = details[j].ip_address;
      var cidr = details[j].cidr_block;
      if(cidr) cidrlist.push(cidr);
    }
    if(cidrlist.length > 0) groups["phish." + id] = cidrlist;
  }

  // add in local groups
  groups.other = ['0.0.0.0/0','::/0'];
  groups.private = ['10.0.0.0/8','172.16.0.0/12','192.168.0.0/16','FC00::/7'];
  groups.multicast = ['224.0.0.0/4'];
  setGroups(groups);

  setFlow('phishydns',
    {
      keys:'ipsource,ipdestination,dnsqname,dnsqr',
      value:'frames',
      filter:'dnsqname="'+ dlist + '"',
      log:true,
      flowStart:true
    }
  );
}

setFlowHandler(function(rec) {
  var keys = rec.flowKeys.split(',');
  var msg = {type:'phishing'};
  switch(rec.name) {
  case 'phishysrc':
     msg.victim=keys[0];
     msg.match='cidr';
     msg.phish_id = keys[1].split('.')[1];
     break;
  case 'phishydst':
     msg.victim=keys[0];
     msg.match='cidr';
     msg.phish_id = keys[1].split('.')[1];
     break;
  case 'phishydns':
     var id = domains[keys[2]];
     msg.victim = keys[3] == 'false' ? keys[0] : keys[1];
     msg.match = 'dns';
     msg.phish_id = domains[keys[2]];
     break;
  }
  syslog(server,port,facility,severity,msg);
},['phishysrc','phishydst','phishydns']);


updatePhish();

// update threat database every 24 hours
setIntervalHandler(function() {
  try { updatePhish(); } catch(e) {}
},60*60*24);

setFlow('phishysrc',
  {
    keys:'ipsource,destinationgroup',
    value:'frames',
    filter:'destinationgroup~^phish.*',
    log:true,
    flowStart:true
  }
);

setFlow('phishydst',
  {
    keys:'ipdestination,sourcegroup',
    value:'frames',
    filter:'sourcegroup~^phish.*',
    log:true,
    flowStart:true
  }
);
The following command line arguments should be added to sFlow-RT's start.sh in order to load the script on startup and allocate enough memory to allow the blacklists to be loaded:
-Xmx2000m -Dscript.file=phish.js
A few notes about the script:
  1. The script uses sFlow-RT's setGroups() function to efficiently classify and group IP addresses based on CIDR lists.
  2. The large number of DNS names used in the DNS filter is efficiently compiled and does not impact performance.
  3. The script makes an HTTP call to retrieve updated signatures every 24 hours. If more frequent updates are required then a developer key should be obtained, see Developer Information.
  4. Matches are exported using syslog(), see Exporting events using syslog. The script could easily be modified to post events into other systems, or take control actions, by using the http() function to interact with RESTful APIs (a minimal sketch follows this list).
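For example, the following sketch posts each match to a hypothetical event collector using the http(url, method, contentType, payload) form of sFlow-RT's http() function used in other scripts on this blog; the collector URL is a placeholder to be replaced with the address of your own system:
// Hypothetical collector URL - replace with your own event system
var collector = 'http://10.0.0.2:8008/events/json';

function report(msg) {
  // POST the match as a JSON document
  http(collector, 'post', 'application/json', JSON.stringify(msg));
}
In the flow handler, the syslog(server,port,facility,severity,msg) call would simply be replaced by (or supplemented with) report(msg).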
Network virtualization poses interesting monitoring challenges since compromised hosts may be virtual machines and their traffic may be carried over tunnels (VxLAN, GRE, NVGRE etc.) across the physical network. Fortunately, sFlow monitoring intrinsically provides good visibility into tunnels (see Tunnels), and the sFlow-RT script could easily be modified to examine flows within the tunnels (see Down the rabbit hole) and report inner IP addresses and virtual network identifiers (VNI) for compromised hosts. In addition, most virtual switches also support sFlow monitoring, providing direct visibility into traffic between virtual machines.

Blacklist matching is only one use case for sFlow monitoring - many others have been described on this blog. The ability to pervasively monitor high speed networks at scale and deliver continuous real-time visibility is transformative, allowing many otherwise difficult or impossible tasks to be accomplished with relative ease.

Saturday, December 7, 2013

ovs-ofctl

The ovs-ofctl command line tool that ships with Open vSwitch provides a very convenient way to interact with OpenFlow forwarding rules, not just with Open vSwitch, but with any switch that can be configured to accept passive connections from an OpenFlow controller.

This article takes the example in Integrated hybrid OpenFlow and repeats it without an OpenFlow controller, using ovs-ofctl instead.

First start Mininet without a controller and configure the switch to listen for OpenFlow commands:
sudo mn --topo single,3 --controller none --listenport 6633
Next, enable normal forwarding in the switch:
ovs-ofctl add-flow tcp:127.0.0.1 priority=10,action=normal
The following command blocks traffic from host 1 (10.0.0.1):
ovs-ofctl add-flow tcp:127.0.0.1 priority=11,dl_type=0x0800,nw_src=10.0.0.1,action=drop
The following command removes the block:
ovs-ofctl --strict del-flows tcp:127.0.0.1 priority=11,dl_type=0x0800,nw_src=10.0.0.1
Finally, modify the controller script with the following block() and allow() functions:
function addFlow(spec) {
  runCmd(['ovs-ofctl','add-flow','tcp:127.0.0.1',spec.join(',')]);
}

function removeFlow(spec) {
  runCmd(['ovs-ofctl','--strict','del-flows','tcp:127.0.0.1',spec.join(',')]);
}

function block(address) {
  if(!controls[address]) {
     addFlow(['priority=11','dl_type=0x0800','nw_src=' + address,'action=drop']);
     controls[address] = { action:'block', time: (new Date()).getTime() };
  }
}

function allow(address) {
  if(controls[address]) {
     removeFlow(['priority=11','dl_type=0x0800','nw_src=' + address]);
     delete controls[address];
  }
}
Moving from Mininet to a production setting is simply a matter of modifying the script to connect to the remote switch, configuring the switch to listen for OpenFlow commands, and configuring the switch to send sFlow data to sFlow-RT.
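For example, the block() and allow() helpers can target a production switch simply by changing the address passed to ovs-ofctl; the address below is a placeholder for the switch's management IP and OpenFlow listening port:
// Placeholder address of the production switch's OpenFlow listener
var SWITCH = 'tcp:192.0.2.10:6633';

function addFlow(spec) {
  runCmd(['ovs-ofctl','add-flow',SWITCH,spec.join(',')]);
}

function removeFlow(spec) {
  runCmd(['ovs-ofctl','--strict','del-flows',SWITCH,spec.join(',')]);
}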

DDoS mitigation is only one use case for large flow control; others described on this blog include ECMP / LAG load balancing, traffic marking and packet capture. This script can be modified to address these different use cases. The Mininet test bed provides a useful way to test hybrid OpenFlow control schemes before moving them into production using physical switches that support integrated hybrid OpenFlow and sFlow.

Tuesday, December 3, 2013

Integrated hybrid OpenFlow

Figure 1: Hybrid Programmable Forwarding Planes
Figure 1 shows two models for hybrid OpenFlow deployment, allowing OpenFlow to be used in conjunction with existing routing protocols. The Ships-in-the-Night model divides the switch in two, allocating selected ports to external OpenFlow control and leaving the remaining ports to the internal control plane. It is not clear how useful this model is, other than for experimentation.

The Integrated hybrid model is much more interesting since it can be used to combine the best attributes of OpenFlow and existing distributed routing protocols to deliver robust solutions. The OpenFlow 1.3.1 specification includes support for the integrated hybrid model by defining the NORMAL action:
Optional: NORMAL: Represents the traditional non-OpenFlow pipeline of the switch (see 5.1). Can be used only as an output port and processes the packet using the normal pipeline. If the switch cannot forward packets from the OpenFlow pipeline to the normal pipeline, it must indicate that it does not support this action.
Hybrid solutions leverage the full capabilities of vendor and merchant silicon which efficiently support distributed forwarding protocols. In addition, most switch and merchant silicon vendors embed support for the sFlow standard, allowing the fabric controller to rapidly detect large flows and apply OpenFlow forwarding rules to control these flows.

Existing switching silicon is often criticized for the limited size of the hardware forwarding tables, supporting too few general match OpenFlow forwarding rules to be useful in production settings. However, consider that SDN and large flows defines a large flow as a flow that consumes 10% of a link's bandwidth. Using this definition, a 48 port switch would require a maximum of 480 general match rules in order to steer all large flows, well within the capabilities of current hardware (see OpenFlow Switching Performance: Not All TCAM Is Created Equal).
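A quick sanity check of that number, assuming every link is saturated with large flows:
// A large flow consumes at least 10% of a link's bandwidth,
// so no more than 10 large flows can be active on a link at once.
var ports = 48;
var largeFlowsPerLink = 10;
console.log(ports * largeFlowsPerLink); // 480 general match rules worst case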

This article will use the Mininet testbed described in Controlling large flows with OpenFlow to experiment with using integrated hybrid forwarding to selectively control large flows, leaving the remaining flows to the switch's NORMAL forwarding pipeline.
Figure 2: MiniNet as an SDN test platform
The following command uses Mininet to emulate a simple topology with one switch and three hosts:
$ sudo mn --topo single,3 --controller=remote,ip=127.0.0.1
The next command enables sFlow on the switch:
sudo ovs-vsctl -- --id=@sflow create sflow agent=eth0 target=\"127.0.0.1:6343\" sampling=10 polling=20 -- set bridge s1 sflow=@sflow
Floodlight's Static Flow Pusher API will be used to insert OpenFlow rules in the switch. The default Floodlight configuration implements packet forwarding; disabling the forwarding module requires the following configuration changes:
  1. Copy the default properties file target/bin/floodlightdefault.properties to static.properties
  2. Edit the file to remove the line net.floodlightcontroller.forwarding.Forwarding,\
  3. Copy the floodlight.sh script to floodlight_static.sh
  4. Modify the last line of the script to invoke the properties, java ${JVM_OPTS} -Dlogback.configurationFile=${FL_LOGBACK} -jar ${FL_JAR} -cf static.properties
Update 22 December, 2013 Thanks to Jason Parraga, the following modules are the minimum set needed to support the Static Flow Pusher functionality in the Floodlight properties file:
floodlight.modules=\
net.floodlightcontroller.counter.CounterStore,\
net.floodlightcontroller.storage.memory.MemoryStorageSource,\
net.floodlightcontroller.core.internal.FloodlightProvider,\
net.floodlightcontroller.staticflowentry.StaticFlowEntryPusher,\
net.floodlightcontroller.perfmon.PktInProcessingTime,\
net.floodlightcontroller.ui.web.StaticWebRoutable
Start Floodlight with the forwarding module disabled:
cd floodlight
$ ./floodlight_static.sh
The following sFlow-RT script is based on the DDoS script described in Embedded SDN applications:
include('extras/json2.js');

var flowkeys = 'ipsource';
var value = 'frames';
var filter = 'outputifindex!=discard&direction=ingress&sourcegroup=external';
var threshold = 1000;
var groups = {'external':['0.0.0.0/0'],'internal':['10.0.0.2/32']};

var metricName = 'ddos';
var controls = {};
var enabled = true;
var blockSeconds = 20;

var flowpusher = 'http://localhost:8080/wm/staticflowentrypusher/json';

function clearOpenFlow() {
  http('http://localhost:8080/wm/staticflowentrypusher/clear/all/json');
}

function setOpenFlow(spec) {
  http(flowpusher, 'post','application/json',JSON.stringify(spec));
}

function deleteOpenFlow(spec) {
  http(flowpusher, 'delete','application/json',JSON.stringify(spec));
}

function block(address) {
  if(!controls[address]) {
     setOpenFlow({name:'block-' + address, switch:'00:00:00:00:00:01',
                  cookie:'0', priority:'11', active: true,
                  'ether-type':'0x0800', 'src-ip': address, actions:""});
     controls[address] = { action:'block', time: (new Date()).getTime() };
  }
}

function allow(address) {
  if(controls[address]) {
     deleteOpenFlow({name:'block-' + address});
     delete controls[address];
  }
}

setEventHandler(function(evt) {
  if(!enabled) return;

  var addr = evt.flowKey;
  block(addr);  
},[metricName]);

setIntervalHandler(function() {
  // remove stale controls
  var stale = [];
  var now = (new Date()).getTime();
  var threshMs = 1000 * blockSeconds;
  for(var addr in controls) {
    if((now - controls[addr].time) > threshMs) stale.push(addr);
  }
  for(var i = 0; i < stale.length; i++) allow(stale[i]);
},10);

setHttpHandler(function(request) {
  var result = {};
  try {
    var action = '' + request.query.action;
    switch(action) {
    case 'block':
       var address = request.query.address[0];
       if(address) block(address);
        break;
    case 'allow':
       var address = request.query.address[0];
       if(address) allow(address);
       break;
    case 'enable':
      enabled = true;
      break;
    case 'disable':
      enabled = false;
      break;
    }
  }
  catch(e) { result.error = e.message }
  result.controls = controls;
  result.enabled = enabled;
  return JSON.stringify(result);
});

setGroups(groups);
setFlow(metricName,{keys:flowkeys,value:value,filter:filter});
setThreshold(metricName,{metric:metricName,value:threshold,byFlow:true,timeout:5});

clearOpenFlow();
setOpenFlow({name:'normal',switch:"00:00:00:00:00:01",cookie:"0",
             priority:"10",active:true,actions:"output=normal"});
The following command line argument loads the script on startup:
-Dscript.file=normal.js
Some notes on the script:
  1. The setIntervalHandler() function is used to automatically release controls after 20 seconds
  2. The clearOpenFlow() function is used to remove any existing flow entries at startup
  3. The last line in the script defines the NORMAL forwarding action for all packets on the switch using a priority of 10
  4. Blocking rules are added for specific addresses using a higher priority of 11
Open a web browser to view a trend of traffic and then perform the following steps:
  1. disable the controller
  2. perform a simulated DoS attack (using a flood ping)
  3. enable the controller
  4. simulate a second DoS attack

Figure 3: DDoS attack traffic with and without controller
Figure 3 shows the results of the demonstration. When the controller is disabled, the attack traffic exceeds 6,000 packets per second and persists until the attacker stops sending. When the controller is enabled, traffic is stopped the instant it hits the 1,000 packet per second threshold in the application. The control is removed 20 seconds later and re-triggers if the attacker is still sending traffic.

DDoS mitigation is only one use case for large flow control; others described on this blog include ECMP / LAG load balancing, traffic marking and packet capture. This script can be modified to address these different use cases. The Mininet test bed provides a useful way to test hybrid OpenFlow control schemes before moving them into production using physical switches that support integrated hybrid OpenFlow.