Saturday, December 14, 2013

Blacklists

Blacklists are an important way in which the Internet community protects itself by identifying bad actors. However, before using a blacklist, it is important to understand how it is compiled and maintained in order to properly use the list and interpret the significance of a match.

Incorporating blacklists in traffic monitoring can be a useful way to find hosts on a network that have been compromised. If a host interacts with addresses known to be part of a botnet for example, then it raises the concern that the host has been compromised and is itself a member of the botnet.

This article provides an example that demonstrates how the standard sFlow instrumentation build into most vendors switches can be used match traffic against a large blacklist. Black lists can be very large, the list used in this example contains approximately 16,000 domain names and nearly 300,000 CIDRs. Most switches don't have the resources to match traffic against such large lists. However, the article RESTflow describes how sFlow shifts analysis from the switches to external software which can easily handle to task of matching traffic against large lists. This article uses sFlow-RT to perform the black list matching.
Figure 1: Components of sFlow-RT
The following sFlow-RT script (phish.js) makes use of the PhishTank blacklist to identify hosts that may have been compromised by phishing attacks:
include('extras/json2.js');

var server = '10.0.0.1';
var port = 514;
var facility = 16; // local0
var severity = 5;  // notice

var domains = {};
function updatePhish() {
  var phish = JSON.parse(http("http://data.phishtank.com/data/online-valid.json"));
  domains = {};
  var dlist = [];
  var groups = {};
  for(var i = 0; i < phish.length; i++) {
    var entry = phish[i];
    var target = entry.target;
    var id = entry.phish_id;
    var url = entry.url;
    var dnsqname = url.match(/:\/\/(.[^/]+)/)[1] + '.';
    if(!domains[dnsqname]) {
      domains[dnsqname] = id;
      dlist.push(dnsqname);
    }
    var details = entry.details;
    var cidrlist = [];
    for(var j = 0; j < details.length; j++) {
      var ip = details[j].ip_address;
      var cidr = details[j].cidr_block;
      if(cidr) cidrlist.push(cidr);
    }
    if(cidrlist.length > 0) groups["phish." + id] = cidrlist;
  }

  // add in local groups
  groups.other = ['0.0.0.0/0','::/0'];
  groups.private = ['10.0.0.0/8','172.16.0.0/12','192.168.0.0/16','FC00::/7'];
  groups.multicast = ['224.0.0.0/4'];
  setGroups(groups);

  setFlow('phishydns',
    {
      keys:'ipsource,ipdestination,dnsqname,dnsqr',
      value:'frames',
      filter:'dnsqname="'+ dlist + '"',
      log:true,
      flowStart:true
    }
  );
}

setFlowHandler(function(rec) {
  var keys = rec.flowKeys.split(',');
  var msg = {type:'phishing'};
  switch(rec.name) {
  case 'phishysrc':
     msg.victim=keys[0];
     msg.match='cidr';
     msg.phish_id = keys[1].split('.')[1];
     break;
  case 'phishydst':
     msg.victim=keys[0];
     msg.match='cidr';
     msg.phish_id = keys[1].split('.')[1];
     break;
  case 'phishydns':
     var id = domains[keys[2]];
     msg.victim = keys[3] == 'false' ? keys[0] : keys[1];
     msg.match = 'dns';
     msg.phish_id = domains[keys[2]];
     break;
  }
  syslog(server,port,facility,severity,msg);
},['phishysrc','phishydst','phishydns']);


updatePhish();

// update threat database every 24 hours
setIntervalHandler(function() {
  try { updatePhish(); } catch(e) {}
},60*60*24);

setFlow('phishysrc',
  {
    keys:'ipsource,destinationgroup',
    value:'frames',
    filter:'destinationgroup~^phish.*',
    log:true,
    flowStart:true
  }
);

setFlow('phishydest',
  {
    keys:'ipdestination,sourcegroup',
    value:'frames',
    filter:'sourcegroup~^phish.*',
    log:true,
    flowStart:true
  }
);
The following command line arguments should be added to sFlow-RT's start.sh in order to load the script on startup and allocate enough memory to allow the blacklists to be loaded:
-Xmx2000m -Dscript.file=phish.js
A few notes about the script:
  1. The script uses sFlow-RT's setGroups() function to efficiently classify and group IP addresses based on CIDR lists.
  2. The large number of DNS names used in the DNS filter is efficiently compiled and does not impact performance.
  3. The script makes an HTTP call to retrieve updated signatures every 24 hours. If more frequent updates are required then a developer key should be obtained, see Developer Information.
  4. Matches are exported using syslog(), see Exporting events using syslog. The script could easily be modified to post events into other systems, or take control actions, by using the http() function to interact with RESTful APIs.
Network virtualization poses interesting monitoring challenges since compromised hosts may be virtual machines and their traffic may be carried over tunnels (VxLAN, GRE, NVGRE etc.) across the physical network. Fortunately, sFlow monitoring intrinsically provides good visibility into tunnels (see Tunnels) and the sFlow-RT script could easily be modified to examine flows within the tunnels (see Down the rabbit hole) and report inner IP addresses and virtual network identifiers (VNI) for compromised hosts. In addition most virtual switches also support sFlow monitoring, providing direct visibility into inter virtual machine traffic.

Blacklist matching is only one use case for sFlow monitoring - many others have been described on this blog. The ability to pervasively monitor high speed networks at scale and deliver continuous real-time visibility is transformative, allowing many otherwise difficult or impossible tasks to be accomplished with relative ease.

No comments:

Post a Comment