Friday, July 1, 2016

Real-time BGP route analytics

The diagram shows how sFlow-RT real-time analytics software can combine BGP route information and sFlow telemetry to generate route analytics. Merging sFlow traffic with BGP route data significantly enhances both data streams:
  1. sFlow real-time traffic data identifies active BGP routes
  2. BGP path attributes are available in flow definitions
The following example demonstrates how to configure sFlow / BGP route analytics. In this example, the switch IP address is 10.0.0.253, the router IP address is 10.0.0.254, and the sFlow-RT address is 10.0.0.162.

Setup

First, download sFlow-RT. Next, create a configuration file, bgp.js, in the sFlow-RT home directory with the following contents:
var reflectorIP  = '10.0.0.254';
var myAS         = '65162';
var myID         = '10.0.0.162';
var sFlowAgentIP = '10.0.0.253';

// allow BGP connection from reflectorIP
bgpAddNeighbor(reflectorIP,myAS,myID);

// direct sFlow from sFlowAgentIP to reflectorIP routing table
// calculate a 60 second moving average byte rate for each route
bgpAddSource(sFlowAgentIP,reflectorIP,60,'bytes');
The following sFlow-RT System Properties load the configuration file and enable BGP:
  • script.file=bgp.js
  • bgp.start=yes
Start sFlow-RT and the following log lines will confirm that BGP has been enabled and configured:
$ ./start.sh 
2016-06-28T13:14:34-0700 INFO: Listening, BGP port 1179
2016-06-28T13:14:35-0700 INFO: Listening, sFlow port 6343
2016-06-28T13:14:35-0700 INFO: Starting the Jetty [HTTP/1.1] server on port 8008
2016-06-28T13:14:35-0700 INFO: Starting com.sflow.rt.rest.SFlowApplication application
2016-06-28T13:14:35-0700 INFO: Listening, http://localhost:8008
2016-06-28T13:14:36-0700 INFO: bgp.js started
2016-06-28T13:14:36-0700 INFO: bgp.js stopped
Configure the switch (10.0.0.253) to send sFlow to the sFlow-RT instance (10.0.0.162); see Switch configurations for vendor-specific settings. Check the sFlow-RT /agents/html page to verify that sFlow telemetry is being received from the agent.

Next, configure the router (10.0.0.254) to reflect BGP routes to the sFlow-RT instance (10.0.0.162):
router bgp 65254
 bgp router-id 10.0.0.254
 neighbor 10.0.0.162 remote-as 65162
 neighbor 10.0.0.162 port 1179
 neighbor 10.0.0.162 timers connect 30
 neighbor 10.0.0.162 route-reflector-client
 neighbor 10.0.0.162 activate
The following sFlow-RT log entry confirms that a BGP session has been established:
2016-06-28T13:20:17-0700 INFO: BGP open 10.0.0.254 53975

Query active routes

The following cURL command uses the REST API to identify the top 5 IPv4 prefixes ranked by traffic (measured in bytes/second):
curl "http://10.0.0.162:8008/bgp/topprefixes/10.0.0.254/json?maxPrefixes=5"
{
 "as": 65254,
 "direction": "destination",
 "id": "10.0.0.254",
 "learnedPrefixesAdded": 691838,
 "learnedPrefixesRemoved": 0,
 "nPrefixes": 691838,
 "pushedPrefixesAdded": 0,
 "pushedPrefixesRemoved": 0,
 "startTime": 1467322582093,
 "state": "established",
 "topPrefixes": [
  {
   "aspath": "NNNN-NNNN-NNNNN-NNNNN",
   "localpref": 100,
   "med": 1,
   "nexthop": "NNN.NNN.NNN.N",
   "origin": "IGP",
   "prefix": "NN.NNN.NN.0/24",
   "value": 9.735462342126082E7
  },
  {
   "aspath": "NNN-NNNN",
   "localpref": 100,
   "med": 1,
   "nexthop": "NNN.NNN.NNN.N",
   "origin": "IGP",
   "prefix": "NN.NNN.NNN.0/24",
   "value": 7.347515546153101E7
  },
  {
   "aspath": "NNNN-NNNNNN-NNNNN",
   "localpref": 100,
   "med": 1,
   "nexthop": "NNN.NNN.NNN.N",
   "origin": "IGP",
   "prefix": "NN.NNN.NN.N/24",
   "value": 4.26137765317916E7
  },
  {
   "aspath": "NNNN-NNNN-NNNN",
   "localpref": 100,
   "med": 1,
   "nexthop": "NNN.NNN.NNN.N",
   "origin": "IGP",
   "prefix": "NNN.NN.NNN.0/24",
   "value": 2.6633190792947102E7
  },
  {
   "aspath": "NNNN-NNN-NNNNN",
   "localpref": 100,
   "med": 10001,
   "nexthop": "NNN.NNN.NNN.NN",
   "origin": "IGP",
   "prefix": "NN.NNN.NNN.0/24",
   "value": 1.5500941476103483E7
  }
 ],
 "valuePercentCoverage": 71.38452058755995,
 "valueTopPrefixes": 2.55577687683634E8,
 "valueTotal": 3.5802956380458355E8
}
In addition to returning the top prefixes, the query returns information about the amount of traffic covered by these prefixes. In this case, the valuePercentCoverage of 71.38 indicates that the top 5 prefixes carry 71.38% of the traffic.
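The coverage figure can be reproduced from the other fields in the response. The following is a minimal sketch, assuming the response has been parsed into a Python dictionary with the field names shown above; the function name coverage_percent is hypothetical and not part of sFlow-RT:

```python
def coverage_percent(result):
    """Return the percentage of total traffic carried by the listed prefixes."""
    top = sum(p['value'] for p in result['topPrefixes'])
    total = result['valueTotal']
    # guard against a router with no measured traffic
    return 100.0 * top / total if total > 0 else 0.0

# values copied from the example response (other fields omitted)
result = {
    'topPrefixes': [
        {'value': 9.735462342126082E7},
        {'value': 7.347515546153101E7},
        {'value': 4.26137765317916E7},
        {'value': 2.6633190792947102E7},
        {'value': 1.5500941476103483E7},
    ],
    'valueTotal': 3.5802956380458355E8,
}
print(round(coverage_percent(result), 2))  # prints 71.38
```

Increasing maxPrefixes and recomputing the figure gives a rough idea of how many routes are needed to cover a target share of the traffic.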
Note: Identifying numeric digits have been substituted with the letter N to protect privacy.
Additional arguments can be used to refine the top prefixes query:
  • maxPrefixes, maximum number of prefixes in the result 
  • minValue, only include entries with a value greater than the threshold
  • direction, specify "ingress" for traffic arriving from remote networks and "egress" for traffic destined for remote networks
  • minPrefix, exclude shorter prefixes, e.g. minPrefix=1 would exclude 0.0.0.0/0.
  • includeCovered, set to "true" to also include prefixes that are covered by the top prefix, but wouldn't otherwise make the list. For example, if 10.1.0.0/16 was included, then 10.1.3.0/24 would also be included if it were in the set of prefixes advertised by the router.
  • pruneCovered, set to "true" to eliminate covered prefixes that share the same next hop.
IPv6 prefixes can be queried using /bgp/topprefixes6/{router}/json, which takes the same arguments as the topprefixes query shown above.
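When several arguments are combined, it can be convenient to assemble the query URL programmatically. The following Python 3 sketch uses only the standard library; the helper name top_prefixes_url and its ipv6 flag are hypothetical, and the argument names are the ones listed above:

```python
from urllib.parse import urlencode

def top_prefixes_url(server, router, ipv6=False, **args):
    """Build a top prefixes query URL for the given sFlow-RT server and router."""
    path = 'topprefixes6' if ipv6 else 'topprefixes'
    url = '%s/bgp/%s/%s/json' % (server, path, router)
    # drop unset arguments and sort the rest so the URL is reproducible
    query = urlencode(sorted((k, v) for k, v in args.items() if v is not None))
    return url + '?' + query if query else url

print(top_prefixes_url('http://10.0.0.162:8008', '10.0.0.254',
                       maxPrefixes=5, direction='egress', pruneCovered='true'))
```

The resulting URL can be passed to curl or to requests.get() in the same way as the query shown earlier.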

Writing Applications describes how to build analytics-driven controller applications using sFlow-RT's REST and embedded JavaScript APIs. For example, SDN router using merchant silicon top of rack switch, White box Internet router PoC, and Active Route Manager demonstrate how real-time identification of active routes can be used to efficiently manage limited hardware resources in commodity white box switches in order to handle a full Internet routing table of over 600,000 routes.

Defining Flows

The following flow attributes learned from the BGP session are merged with sFlow data received from switch 10.0.0.253:
  • ipsourcemaskbits
  • ipdestinationmaskbits
  • bgpnexthop
  • bgpnexthop6
  • bgpas
  • bgpsourceas
  • bgpsourcepeeras
  • bgpdestinationas
  • bgpdestinationpeeras
  • bgpdestinationaspath
  • bgpcommunities
  • bgplocalpref
The sFlow-RT /flowkeys/html page can be queried to verify that the attributes have been merged and to see the full set of attributes that are available from the sFlow feed.

Writing Applications describes how to program sFlow-RT flow caches, using the flow keys to select and identify traffic flows. For example, the following Python script uses the REST API to identify the source networks associated with a UDP amplification DDoS attack:
#!/usr/bin/env python
import requests
import json

# DNS port
reflector_port = '53'
max_pps = 100000

rest = 'http://localhost:8008'

# define flow
flow = {'keys':'mask:ipsource,bgpsourceas',
 'filter':'udpsourceport='+reflector_port,
 'value':'frames'}
requests.put(rest+'/flow/ddos/json',data=json.dumps(flow))

# set threshold
threshold = {'metric':'ddos', 'value': max_pps, 'byFlow':True}
requests.put(rest+'/threshold/ddos/json',data=json.dumps(threshold))

# tail event log
eventurl = rest+'/events/json?thresholdID=ddos&maxEvents=10&timeout=60'
eventID = -1
while True:
  r = requests.get(eventurl + "&eventID=" + str(eventID))
  if r.status_code != 200: break
  events = r.json()
  if len(events) == 0: continue

  eventID = events[0]["eventID"]
  events.reverse()
  for e in events:
    print(e['flowKey'])
Running the script generates a log of the source network and AS number that exceed 100,000 packets per second of DNS response traffic (again, identifying numeric digits have been substituted with the letter N to protect privacy):
$ ./ddos.py 
NNN.NNN.0.0/13,NNNN
NNN.NNN.NNN.NNN/27,NNNN
NNN.NN.NNN.NNN/28,NNNNN
NNN.NNN.NN.0/24,NNNNN
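Each line of output is a flow key: a comma-separated list of values in the same order as the keys in the flow definition. As a minimal sketch (Python 3), a key could be split into a network object and an AS number as shown below. The function name parse_ddos_key is hypothetical, the sample values come from the documentation address and AS ranges rather than from real traffic, and the sketch assumes both fields are present in every key:

```python
import ipaddress

def parse_ddos_key(flow_key):
    """Split a 'mask:ipsource,bgpsourceas' flow key into (network, AS number)."""
    prefix, asn = flow_key.split(',')
    # strict=False tolerates host bits set below the mask length
    return ipaddress.ip_network(prefix, strict=False), int(asn)

network, asn = parse_ddos_key('192.0.2.0/24,64500')
print(network, asn)  # prints 192.0.2.0/24 64500
```

Working with network objects instead of strings makes it straightforward to test whether a given address falls inside a reported source network, for example when building a filter.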
A variation on the script can be used to identify large "Elephant" flows and their destination AS paths (showing the list of networks that packets traverse en route to their destination):
#!/usr/bin/env python
import requests
import json

max_Bps = 1000000000/8

rest = 'http://localhost:8008'

# define flow
flow = {
 'keys':'ipsource,ipdestination,tcpsourceport,tcpdestinationport,bgpdestinationaspath',
 'value':'bytes'}
requests.put(rest+'/flow/elephant/json',data=json.dumps(flow))

# set threshold
threshold = {'metric':'elephant', 'value': max_Bps, 'byFlow':True}
requests.put(rest+'/threshold/elephant/json',data=json.dumps(threshold))

# tail event log
eventurl = rest+'/events/json?thresholdID=elephant&maxEvents=10&timeout=60'
eventID = -1
while True:
  r = requests.get(eventurl + "&eventID=" + str(eventID))
  if r.status_code != 200: break
  events = r.json()
  if len(events) == 0: continue

  eventID = events[0]["eventID"]
  events.reverse()
  for e in events:
    print(e['flowKey'])
Running the script generates real-time notification of the Elephant flows (flows exceeding 1Gbit/s) along with their destination AS paths:
$ ./elephant.py 
NNN.NN.NN.NNN,NNN.NNN.NN.NN,60789,25,NNNNN
NNN.NN.NNN.NN,NNN.NN.NN.NNN,443,38016,NNNNN-NNNNN-NNNNN-NNNNN
NN.NNN.NNN.NNN,NNN.NNN.NN.NN,37030,10059,NNNN-NNN-NNNN
NNN.NN.NN.NNN,NN.NN.NNN.NNN,34611,25,NNNN
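The flow keys can be mapped back onto the key names from the flow definition to make them easier to process. The sketch below is illustrative only: parse_elephant_key is a hypothetical helper, the sample values are drawn from documentation ranges, and it assumes the AS path is always a list of plain AS numbers joined by '-', as in the output above:

```python
def parse_elephant_key(flow_key):
    """Map an elephant flow key onto the key names used in the flow definition."""
    names = ['ipsource', 'ipdestination', 'tcpsourceport',
             'tcpdestinationport', 'bgpdestinationaspath']
    flow = dict(zip(names, flow_key.split(',')))
    # assumption: the AS path is a '-' separated list of AS numbers
    path = flow['bgpdestinationaspath']
    flow['bgpdestinationaspath'] = [int(a) for a in path.split('-')] if path else []
    return flow

flow = parse_elephant_key('192.0.2.10,198.51.100.20,60789,25,64500-64501-64502')
print(flow['ipdestination'], flow['bgpdestinationaspath'])
```

With the path held as a list, flows can be grouped by any AS they traverse, which is one way to see which networks carry the largest flows.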
SDN and large flows describes how a small number of Elephant flows typically consume most of the bandwidth, even though they are greatly outnumbered by small (Mice) flows. Dynamic policy-based routing can be targeted at Elephant flows to significantly improve performance and manage network resources: Leaf and spine traffic engineering using segment routing and SDN and WAN optimization using real-time traffic analytics are two examples.
Finally, the real-time BGP analytics don't exist in isolation. The diagram shows how the sFlow-RT real-time analytics engine receives a continuous telemetry stream from sFlow instrumentation built into network, server and application infrastructure and delivers analytics through APIs. These APIs make it easy to integrate sFlow-RT with a wide variety of on-site and cloud orchestration, DevOps and Software Defined Networking (SDN) tools.
