Sunday, August 27, 2017

Cumulus Linux 3.4 REST API

The latest Cumulus Linux 3.4 release includes a REST API. This article demonstrates how the REST API can be used to automatically deploy traffic controls based on real-time sFlow telemetry. DDoS mitigation with Cumulus Linux describes how sFlow-RT can detect Distributed Denial of Service (DDoS) attacks in real time and deploy automated controls.

The following ddos.js script is modified to use the REST API to send Network Command Line Utility (NCLU) commands to add and remove ACLs; see Installing and Managing ACL Rules with NCLU:
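var user = "cumulus";            // REST API user name
var password = "CumulusLinux!";  // REST API password
var thresh = 10000;              // threshold in frames per second
var block_minutes = 1;           // minimum time to keep an ACL in place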

setFlow('udp_target',{keys:'ipdestination,udpsourceport',value:'frames'});

setThreshold('attack',{metric:'udp_target', value:thresh, byFlow:true, timeout:10});

function restCmds(agent,cmds) {
  for(var i = 0; i < cmds.length; i++) {
    let msg = {cmd:cmds[i]};
    http("https://"+agent+":8080/nclu/v1/rpc",
         "post","application/json",JSON.stringify(msg),user,password);
  }
}

var controls = {};
var id = 0;
setEventHandler(function(evt) {
  var key = evt.agent + ',' + evt.flowKey;
  if(controls[key]) return;

  var ifname = metric(evt.agent,evt.dataSource+".ifname")[0].metricValue;
  if(!ifname) return;

  var now = (new Date()).getTime();
  var name = 'ddos'+id++;
  var [ip,port] = evt.flowKey.split(',');
  var cmds = [
    'add acl ipv4 '+name+' drop udp source-ip any source-port '+port+' dest-ip '+ip+' dest-port any',
    'add int '+ifname+' acl ipv4 '+name+' inbound',
    'commit'
  ];
  controls[key] = {time:now, target: ip, port: port, agent:evt.agent, metric:evt.dataSource+'.'+evt.metric, key:evt.flowKey, name:name};
  try { restCmds(evt.agent, cmds); }
  catch(e) { logSevere('failed to add ACL, '+e); }
  logInfo('block target='+ip+' port='+port+' agent=' + evt.agent); 
},['attack']);

setIntervalHandler(function() {
  var now = (new Date()).getTime();
  for(var key in controls) {
    if(now - controls[key].time < 1000 * 60 * block_minutes) continue;
    var ctl = controls[key];
    if(thresholdTriggered('attack',ctl.agent,ctl.metric,ctl.key)) continue;

    delete controls[key];
    var cmds = [
      'del acl ipv4 '+ctl.name,
      'commit'
    ];
    try { restCmds(ctl.agent,cmds); }
    catch(e) { logSevere('failed to remove ACL, ' + e); }
    logInfo('allow target='+ctl.target+' port='+ctl.port+' agent='+ctl.agent);
  }
});
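To make the request format concrete, here is a minimal standalone sketch (plain JavaScript, runnable outside sFlow-RT) of the JSON bodies that restCmds() posts to /nclu/v1/rpc for one blocking action. The helper names buildBlockCmds and toPayloads, the ACL name ddos0, and the interface swp1 are hypothetical and chosen for illustration; the command strings mirror those built in the script above.

```javascript
// Hypothetical helpers (not part of sFlow-RT or NCLU) that mirror the
// command strings built in the event handler of ddos.js.
function buildBlockCmds(name, ifname, flowKey) {
  // flowKey has the form 'ipdestination,udpsourceport', e.g. '192.168.2.1,53'
  var parts = flowKey.split(',');
  var ip = parts[0], port = parts[1];
  return [
    'add acl ipv4 ' + name + ' drop udp source-ip any source-port ' + port +
      ' dest-ip ' + ip + ' dest-port any',
    'add int ' + ifname + ' acl ipv4 ' + name + ' inbound',
    'commit'
  ];
}

// Each command is sent as its own JSON document of the form {"cmd": "..."}.
function toPayloads(cmds) {
  return cmds.map(function(cmd) { return JSON.stringify({cmd: cmd}); });
}

var payloads = toPayloads(buildBlockCmds('ddos0', 'swp1', '192.168.2.1,53'));
payloads.forEach(function(p) { console.log(p); });
```

Note that the commands omit the "net" prefix used on the switch command line, as in the script.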
The quickest way to test the script is to use Docker to run sFlow-RT:
docker run -v $PWD/ddos.js:/sflow-rt/ddos.js \
-e "RTPROP=-Dscript.file=ddos.js -Dhttp.timeout.read=60000" \
-p 6343:6343/udp -p 8008:8008 sflow/sflow-rt
This solution can be tested using freely available software. The setup shown at the top of this article was constructed using a Cumulus VX virtual machine running on VirtualBox. The Attacker and Target hosts are Linux virtual machines used to simulate the DDoS attack.

A DNS amplification attack can be simulated using hping3. Run the following command on the Attacker host:
sudo hping3 --flood --udp -k -s 53 192.168.2.1
Run tcpdump on the Target host to see if the attack is getting through:
sudo tcpdump -i eth1 udp port 53
Each time an attack is launched, a new ACL is added that matches the attack signature and drops the traffic. The ACL is kept in place for at least block_minutes and removed once the attack ends. The following sFlow-RT log messages show the results:
2017-08-26T17:01:24+0000 INFO: Listening, sFlow port 6343
2017-08-26T17:01:24+0000 INFO: Listening, HTTP port 8008
2017-08-26T17:01:24+0000 INFO: ddos.js started
2017-08-26T17:03:07+0000 INFO: block target=192.168.2.1 port=53 agent=10.0.0.61
2017-08-26T17:03:49+0000 INFO: allow target=192.168.2.1 port=53 agent=10.0.0.61
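The release rule described above (hold the ACL for at least block_minutes, then remove it only once the threshold is no longer triggered) can be isolated as a small pure function. This is a sketch for illustration only: shouldRelease is a hypothetical name, and the stillTriggered argument stands in for the result of thresholdTriggered() in the script.

```javascript
// Hypothetical helper: decide whether a control can be removed.
// ctlTime and now are millisecond timestamps; stillTriggered stands in for
// the value returned by thresholdTriggered() in the sFlow-RT script.
function shouldRelease(ctlTime, now, blockMinutes, stillTriggered) {
  var heldLongEnough = (now - ctlTime) >= 1000 * 60 * blockMinutes;
  return heldLongEnough && !stillTriggered;
}

var t0 = Date.UTC(2017, 7, 26, 17, 3, 0);
var tooEarly    = shouldRelease(t0, t0 + 30 * 1000, 1, false); // held for only 30 seconds
var stillActive = shouldRelease(t0, t0 + 90 * 1000, 1, true);  // attack still in progress
var released    = shouldRelease(t0, t0 + 90 * 1000, 1, false); // held long enough, attack over
console.log(tooEarly, stillActive, released);
```

Both conditions must hold before the ACL is removed, so a long-running attack keeps its control in place well beyond block_minutes.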
REST API for Cumulus Linux ACLs describes the acl_server daemon that was used in the original article. The acl_server daemon is optimized for real-time performance, supporting use cases in which multiple traffic controls need to be quickly added and removed, e.g., DDoS mitigation, marking large flows, ECMP load balancing, and packet brokers.

A key benefit of the openness of Cumulus Linux is that you can install software to suit your use case. Other examples include: BGP FlowSpec on white box switch, Internet router using Cumulus Linux, Topology discovery with Cumulus Linux, Black hole detection, and Docker networking with IPVLAN and Cumulus Linux.

25 comments:

  1. I have tried to run this code, but it does not seem to work. I tried using "net add acl ..." instead of "add acl ...", since NCLU commands start with "net", but it did not help. The script runs and I can see that it has created the filters in the sFlow GUI. I've configured the agent correctly and I can see it in sFlow; however, when I launch the attack, no ACL is added. I created ddos.js in the sFlow directory and started it using the "env "RTPROP=-Dscript.file=ddos.js" ./start.sh" command. What am I doing wrong?

    Replies
    1. It sounds like you have everything set correctly. The "net" prefix is implied by the REST API, so you shouldn't need to modify the script; see the "show counters" example in HTTP API.

      Are any messages logged by the script? It can take up to 30 seconds to commit the changes and for the filter to have an effect. How are you testing to see if the script is working? If you log into the switch and run "net show configuration acl" you should be able to verify that the ACL has been installed correctly. The sFlow feed from the switch will still show traffic since packets are sampled on ingress. You would need to check the downstream to verify that the packets have been dropped.

      You might also check out REST API for Cumulus Linux ACLs. This method of controlling ACLs is much faster and more reliable. You would need to modify the acl_server script to use a port other than 8080 so that it doesn't clash with the new Cumulus REST API. Ideally, the functionality should be integrated under the new Cumulus Linux REST API to share authentication etc. The following article contains a DDoS mitigation script that uses the acl_server API, DDoS mitigation with Cumulus Linux.

    2. Thanks for the reply, Peter. I tested the HTTP API: the "show counters" example worked, and I tested another command:

      ...": "show configuration interface swp1"}' https://192.168.122.21:8080/nclu/v1/rpc
      interface swp1
      address 10.10.11.5/24

      However, when I run the script above, nothing happens. The remote Ubuntu host (configured as the sFlow-RT host) shows that ddos.js is running, but nothing else. The GUI shows that the Cumulus VX switch (agent) exists and the filters are set. I can see the incoming traffic, but the filters do not block and the ACL does not get created. Is the problem that I'm using sFlow-RT? I tried using the "DDoS mitigation with Cumulus Linux" example, but when I ran the script I got a message saying that the "-Dflow.sumegress=yes" command doesn't exist. When I run it using env "RTPROP=-Dscr......." ./start, I get a warning that ddos.js#1 java.io.FileNotFoundException: extras/json2.js (no such file or directory), and the script stops running. What seems to be the problem? Any ideas?

    3. The "include('extras/json2.js');" statement from the older script should be removed since the JavaScript engine now natively supports JSON.

      Are you using CumulusVX? Since CumulusVX doesn't have an ASIC that would normally perform the packet sampling function, you need to use the following command on the CumulusVX switch to sample packets:

      sudo iptables -I FORWARD -j NFLOG --nflog-group 1 --nflog-prefix SFLOW

      Once you have packet samples sFlow-RT should be able to detect the DDoS attack.

    4. I am using CumulusVX 3.4. I did apply the command, and I can see the destination IP address (the targeted IP) in the sFlow GUI (the "Flows" tab). I can also see the source port 53 and the number of frames varying. The threshold is set; however, no event is recorded once the threshold has been exceeded. I checked tcpdump on the target, and the packet flood is getting through. The problem is that the ACL is not being implemented. I used curl to test the API (the POST example), and it works fine for many show commands.

      Any idea what the problem might be? Is it possible to get a look at your configuration?

    5. How long are you running your tests? You should wait a minute or two after starting sFlow-RT before generating the DDoS attack. This gives sFlow-RT time to learn the ifNames from the sFlow stream (you need at least one polling interval to learn the names). The event handler function returns early if ifname isn't known, since it is needed as an argument when creating the ACL using NCLU.

  2. this is my current topology for the test:

    attacker<-->Router<-->CumulusVX<-->Router<-->target
    |
    |
    switch<--->sFlow(DDoS script)
    |
    internet

    Should the script run on the remote sFlow machine? I think the topology is OK. Any feedback?

    I've tested the API remotely with the following command:

    curl -X POST -k -u user:pw -H "Content-Type: application/json" -d '{"cmd": "show counters"}' https://1.1.1.1:8080/nclu/v1/rpc

    Of course, this was after changing the username, password, and IP address. I was able to get multiple results, not only for show counters, but also for OSPF configuration on the CumulusVX, such as interfaces.

    The question is, how do I know whether I have the right configuration in "/etc/nginx-restapi-chassis.conf"?

    When I ran the following command, "sudo nginx -c /etc/nginx-restapi-chassis.conf -t", I got a successful test with no warnings.

    Is there anything else that needs to be configured in the chassis.conf besides the normal instructions?

    Regarding the sampling and polling values, I uncommented the default values in hsflowd.conf. Should they be something other than the defaults?

    I also uncommented an option to listen for JSON on a predefined port number. Should I leave it uncommented?

    Regards


    Replies
    1. The topology looks good. The default sampling and polling values should be fine.

      You should be configuring /etc/nginx-restapi.conf. The nginx-restapi-chassis.conf file is only used for chassis switches.

      You can leave the JSON API open on hsflowd if you want, although it probably isn't needed for this use case.

  3. Sorry, the CumulusVX is supposed to be connected to sFlow and the internet via a switch, not the target.

  4. It seems that nginx-restapi.conf is preconfigured for CumulusVX; I checked it and everything seems all right. Should I leave the option listen [::]:8080 as it is (the default setting)?

    I ran the example of adding a Layer 2 bridge br212 using curl PUT and the bridge was created successfully; however, it took a couple of minutes to take effect. This is the output when I checked it on the switch:

        Name   Master  Speed  MTU    Mode          Remote Host  Remote Port      Summary
    --  -----  ------  -----  -----  ------------  -----------  ---------------  ----------------------------------------------------
    UP  lo     None    N/A    65536  Loopback                                    IP: 127.0.0.1/8, ::1/128
    UP  eth0   None    1G     1500   Mgmt                                        IP: 192.168.122.133/24(DHCP)
    UP  swp1   None    1G     1500   Interface/L3  R1           FastEthernet0/0  IP: 10.10.11.2/24
    UP  swp2   None    1G     1500   Interface/L3  R2           FastEthernet0/0  IP: 10.10.22.2/24
    UP  br212  None    N/A    1500   Bridge/L2                                   802.1q Tag: Untagged STP: Disabled Vlan Aware Bridge

    What I don't get is how the attack flow is being detected (target IP, source port, and number of frames), yet the threshold is not being triggered and the event is not being logged. sFlow has complete information about the switch and its interfaces, as I've checked in the "agent" tab. Is the problem that the code is failing to execute? Or is the problem with the CumulusVX platform on GNS3? The curl commands worked fine, as I mentioned above.

    Replies
    1. The threshold is 10,000 packets per second. Are you generating that amount of traffic? You can monitor the flow by clicking on the entry in the sFlow-RT flows page. Edit the script and change the thresh variable to choose a different threshold.

  5. I changed the threshold value to 300, and it seemed to work somehow. The attack is detected on the sFlow-RT flows page. The attack has also been logged on the sFlow-RT Events page, where I can see the attack details. As for the graph, it keeps showing spikes: once the traffic exceeds the 300 threshold value, it immediately drops below 300, then rises again afterwards and drops below 300 again. It is somewhat similar to this graph:
    https://robertscribbler.files.wordpress.com/2016/05/stephan-rahmstorf-temperature-anomaly.jpg

    However, the sFlow graph does not curve upward like the one in the link.

    Kali Linux shows that there is 100% packet loss, yet the target's tcpdump output shows that the packets are being received.

    The sFlow script runs and prints the following messages:

    date/time...-0500 SEVERE: failed to remove ACL, InternalError: Malformed URL java.net.MalformedURLException: For input string: "agent:MAC:address:8080" (ddos.js#13)

    date/time...-0500 INFO: block target=192.168.22.2 port=53 agent=agent:MAC:address

    Any feedback? I don't fully understand whether the attack is being mitigated.

    Thanks a lot for your help and patience.

    Regards

    Replies
    1. If you have a threshold as low as 300, then you should reduce the configured sampling rate. Since CumulusVX is handling much less traffic than a hardware switch, try setting it to something like 40.

      You will continue to see traffic reported in the sFlow feed from the switch after the ACL is inserted since packets are sampled at ingress (before they are dropped). You can verify that the packets are being dropped using tcpdump at the victim.

      I don't think the ACL is being implemented correctly. I don't understand why the sFlow agent is being reported as agent:MAC:address. It should be the IP address of the Cumulus VX switch. What do you see when you click on the sFlow-RT Agents tab?
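As a rough illustration of why a low threshold calls for a lower sampling rate: with 1-in-N packet sampling, a flow of R packets per second yields about R/N samples per second on average. The sketch below only does that arithmetic. The function name is hypothetical, and 400 is an arbitrary comparison value, not a documented default.

```javascript
// Hypothetical helper: average number of packet samples per second produced
// by a flow of packetsPerSecond when 1-in-samplingRate packets are sampled.
function expectedSamplesPerSecond(packetsPerSecond, samplingRate) {
  return packetsPerSecond / samplingRate;
}

var coarse = expectedSamplesPerSecond(300, 400);   // 0.75 samples per second
var fine   = expectedSamplesPerSecond(300, 40);    // 7.5 samples per second
var heavy  = expectedSamplesPerSecond(10000, 400); // 25 samples per second
console.log(coarse, fine, heavy);
```

With less than one sample per second near the threshold, the rate estimate is likely to be noisy, so a smaller N gives the threshold more samples to work with.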

  6. I reduced the sampling.1G sampling rate to 40. Nothing changed, though. You are right that the ACL is not being created, because the UDP traffic is still being captured by tcpdump on the victim. The graph still looks the same; it keeps spiking above and below 300, as described yesterday.

    Sorry, it was my mistake; I didn't check properly. It's not a MAC address, it is the IPv6 address of the management interface (eth0) on the Cumulus switch.

    When I check the sFlow-RT Agents tab, I can see the agent, but it is shown with an IPv6 address instead of IPv4.

    I think that's why the ACL is not being created, and that's why I'm getting the error:

    SEVERE: failed to add ACL, InternalError: Malformed URL java.net.MalformedURLException: For input string: "agentIPv6:8080" (ddos.js#13)

    I'm going to try forcing the CumulusVX to have an IPv4 address and no IPv6, because due to my connection via VMware, it managed to get an IPv6 address from my home router.

    What do you think?

    Replies
    1. You should be able to use the IPv6 address to communicate, just modify the http line to put the IPv6 address in square brackets, i.e.

      http("https://["+agent+"]:8080/nclu/v1/rpc"
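One way to handle both address families is to build the URL in a small helper that adds the square brackets only when needed. This is a sketch rather than part of the original script: ncluUrl is a hypothetical name, and it assumes that any agent address containing a colon is an IPv6 literal.

```javascript
// Hypothetical helper: build the NCLU REST URL for an agent address.
// Assumption: an address containing ':' is an IPv6 literal and must be
// wrapped in square brackets; anything else is used as-is.
function ncluUrl(agent) {
  var host = agent.indexOf(':') >= 0 ? '[' + agent + ']' : agent;
  return 'https://' + host + ':8080/nclu/v1/rpc';
}

var v4Url = ncluUrl('10.0.0.61');
var v6Url = ncluUrl('2001:db8::1'); // documentation-prefix address, for illustration
console.log(v4Url);
console.log(v6Url);
```

The result could then be passed as the first argument to http() in restCmds().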

  7. I eventually used your method of using the IPv6 address for the agent, and it worked. However, if I've understood correctly, the target gets blocked, so even if you launch another attack from another machine, the attackers won't be able to reach the target until the script unblocks it again (allow target). Did I understand that right?

    Thanks a lot for your help.

    Replies
    1. The script doesn't match all traffic to the destination, it also includes the UDP source port in the filter, so traffic from other ports, or TCP traffic would get through. This behavior is designed to block UDP reflection attacks. There are typically too many source addresses to block them individually.
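The scope of the filter can be pictured with a tiny model of the match. This is an illustrative sketch only: matchesAcl and the packet field names are hypothetical and are not part of any Cumulus Linux or sFlow-RT API.

```javascript
// Hypothetical model of the ACL installed by the script: drop UDP packets
// sent to the target address from a given UDP source port, whatever the
// source address.
function matchesAcl(rule, pkt) {
  return pkt.protocol === 'udp' &&
         pkt.destIp === rule.destIp &&
         pkt.sourcePort === rule.sourcePort;
}

var rule = {destIp: '192.168.2.1', sourcePort: 53};

var reflected = {protocol: 'udp', destIp: '192.168.2.1', sourcePort: 53};
var otherUdp  = {protocol: 'udp', destIp: '192.168.2.1', sourcePort: 123};
var tcpPkt    = {protocol: 'tcp', destIp: '192.168.2.1', sourcePort: 53};

var dropReflected = matchesAcl(rule, reflected); // matches: dropped
var dropOtherUdp  = matchesAcl(rule, otherUdp);  // different UDP source port: passes
var dropTcp       = matchesAcl(rule, tcpPkt);    // TCP: passes
console.log(dropReflected, dropOtherUdp, dropTcp);
```

Because the source address is not part of the match, a second attacker reflecting off the same UDP port is covered by the same rule, while other traffic to the target is unaffected.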

  8. OK, noted. During my testing, I've noticed that when I launch an attack, it is successfully mitigated: the GUI and tcpdump confirm it, as does the ACL on the CumulusVX, and the CLI prints block and then allow after the blocking time has passed. However, when I launch a second attack after the target has been allowed again, the second attack doesn't get detected or mitigated. Does the script run only once? Is something wrong with my virtual environment?

    I also had two questions in mind: what if I want to run the script on more than a single agent? And what if I want to run more than one script on the same agent? Does sFlow support multiple scripts?

    Regards

    Replies
    1. The script is designed to run multiple times. I am not sure why it isn't working the second time in your environment. The NCLU commit can take up to 30 seconds, so it might be that it is just very slow.

      The REST API for Cumulus Linux ACLs script is much faster and more scalable.

      The script can handle multiple sFlow agents. You can run multiple concurrent scripts / applications on a single sFlow-RT instance.

  9. RESTful control of Cumulus Linux ACLs describes how to integrate real-time control of ACLs with the Cumulus Linux 3.4 HTTP API.

  10. Thanks Peter, I'll read through and run a test

  11. I think I have tracked down why the second attack is not being mitigated. I noticed that if I don't issue the command

    sudo iptables -I FORWARD -j NFLOG --nflog-group 1 --nflog-prefix SFLOW

    after each attack, the next attack will not be detected or mitigated. Is there an explanation for this issue?

    I was also wondering about the "data source" field on the Events page. What does it mean? For me it was showing "3".

    Regards

    Replies
    1. Data source 3 is the interface with SNMP ifIndex=3.

      Having to reset the NFLOG rule each time CumulusVX adds/removes an ACL isn't a problem with real hardware (since packet sampling is performed by the ASIC and doesn't involve iptables). I don't have a fix for CumulusVX.

    2. Okay. I'd like to acknowledge your help in my master's thesis. Do you have an official website or details that I can use to acknowledge you?

    3. Thanks. This article is on my personal blog. There is a LinkedIn link at the bottom of the right hand side column for personal information.
