Wednesday, March 11, 2015

Topology discovery with Cumulus Linux

Demo: Implementing the OpenStack Design Guide in the Cumulus Workbench is a great demonstration of the power of zero touch provisioning and automation. When the switches and servers boot, they automatically pick up their operating systems and configurations for the complex network shown in the diagram.
REST API for Cumulus Linux ACLs describes a REST server for remotely controlling ACLs on Cumulus Linux. This article discusses recently added topology discovery methods that allow an SDN controller to learn the topology and apply targeted controls (e.g. marking large "Elephant" flows, steering large flows, DDoS mitigation, etc.).
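For example, an SDN controller that has learned the topology can push a control to a specific switch. The following sketch assumes the acl_server interface described in that article (an ACL expressed as a JSON list of iptables rules, installed with a PUT to /acl/<name> and removed with a DELETE of the same resource); the switch name and DDoS filter are purely illustrative:
#!/usr/bin/env python
# Sketch: apply a targeted control through the acl_server REST API.
# Assumes the PUT /acl/<name> interface from the ACL article; the rule,
# switch name, and target address are illustrative.
import json, requests

# drop a UDP amplification attack aimed at 10.10.100.10
acl = [
  '[iptables]',
  '-A FORWARD --in-interface swp+ -d 10.10.100.10 -p udp --sport 53 -j DROP'
]
r = requests.put('http://leaf1:8080/acl/ddos1',
                 data=json.dumps(acl),
                 headers={'content-type': 'application/json'})
print(r.status_code)

# the control is removed by deleting the same resource:
# requests.delete('http://leaf1:8080/acl/ddos1')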

Prescriptive Topology Manager

Complex Topology and Wiring Validation in Data Centers describes how Cumulus Networks' prescriptive topology manager (PTM) provides a simple method of verifying and enforcing correct wiring topologies.

The following REST call converts the topology from PTM's dot notation and returns a JSON representation:
cumulus@wbench:~$ curl http://leaf1:8080/ptm
Returns the result:
{
 "links": {
  "L1": {
   "node1": "leaf1", 
   "node2": "spine1", 
   "port1": "swp1s0", 
   "port2": "swp49"
  },
  ...
 }
}
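The conversion itself is straightforward. The following Python sketch shows one way the dot notation could be translated into the JSON structure above; the embedded sample topology and the parsing regular expression are illustrative assumptions, not the exact acl_server implementation:
#!/usr/bin/env python
# Sketch: convert PTM dot notation into the "links" JSON structure above.
# The sample topology and regular expression are for illustration only.
import json, re

# PTM topology files describe each cable as "host":"port" -- "host":"port"
sample = '''
graph G {
  "leaf1":"swp1s0" -- "spine1":"swp49";
  "leaf1":"swp1s1" -- "spine2":"swp49";
}
'''

edge = re.compile(r'"([^"]+)":"([^"]+)"\s*--\s*"([^"]+)":"([^"]+)"')
links = {}
for i, m in enumerate(edge.finditer(sample)):
  node1, port1, node2, port2 = m.groups()
  links['L%d' % (i + 1)] = {
    'node1': node1, 'port1': port1,
    'node2': node2, 'port2': port2
  }

print(json.dumps({'links': links}, sort_keys=True, indent=1))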

LLDP

Prescriptive Topology Manager is preferred since it ensures that the discovered topology is correct. However, PTM builds on the basic Link Layer Discovery Protocol (LLDP), which provides an alternative method of topology discovery.

The following REST call returns the hostname:
cumulus@wbench:~$ curl http://leaf1:8080/hostname
Returns the result:
"leaf1"
The following REST call returns LLDP neighbor information:
cumulus@wbench:~$ curl http://leaf1:8080/lldp/neighbors

Returns the result:
{
   "lldp": [
     {
       "interface": [
         {
           "name": "eth0",
           "via": "LLDP",
           "chassis": [
             {
               "id": [
                 {
                   "type": "mac",
                   "value": "6c:64:1a:00:2e:7f"
                 }
               ],
               "name": [
                 {
                   "value": "colo-tor-3"
                 }
               ]
             }
           ],
           "port": [
             {
               "id": [
                 {
                   "type": "ifname",
                   "value": "swp10"
                 }
               ],
               "descr": [
                 {
                   "value": "swp10"
                 }
               ]
             }
           ]
         },
         ...
     }
   ]
 }
The following REST call returns LLDP configuration information:
cumulus@wbench:~$ curl http://leaf1:8080/lldp/configuration
Returns the result:
{
   "configuration": [
     {
       "config": [
         {
           "tx-delay": [
             {
               "value": "30"
             }
           ],
           ...
         }
       ]
     }
   ]
 }

Topology discovery with LLDP

The script lldp.py extracts LLDP data from all the switches in the network and compiles a topology:
#!/usr/bin/env python

import json, requests

switch_list = ['leaf1','leaf2','spine1','spine2']

l = 0
linkdb = {}
links = {}
for switch_name in switch_list:
  # verify that lldp configuration exports hostname,ifname information
  r = requests.get("http://%s:8080/lldp/configuration" % (switch_name))
  if r.status_code != 200: continue
  config = r.json()
  lldp_hostname = config['configuration'][0]['config'][0]['hostname'][0]['value']
  if lldp_hostname != '(none)': continue
  lldp_porttype = config['configuration'][0]['config'][0]['lldp_portid-type'][0]['value']
  if lldp_porttype != 'ifname': continue
  # local hostname
  r = requests.get("http://%s:8080/hostname" % (switch_name))
  if r.status_code != 200: continue
  host = r.json()
  # get neighbors
  r = requests.get("http://%s:8080/lldp/neighbors" % (switch_name))
  if r.status_code != 200: continue
  neighbors = r.json()
  interfaces = neighbors['lldp'][0]['interface']
  for i in interfaces:
    # local port name
    port = i['name']
    # neighboring hostname
    nhost = i['chassis'][0]['name'][0]['value']
    # neighboring port name
    nport = i['port'][0]['descr'][0]['value']
    if not host or not port or not nhost or not nport: continue
    # order the two ends so a link seen from both switches maps to the same key
    if host < nhost:
      link = {'node1':host,'port1':port,'node2':nhost,'port2':nport}
    else:
      link = {'node1':nhost,'port1':nport,'node2':host,'port2':port}
    keystr = "%s %s -- %s %s" % (link['node1'],link['port1'],link['node2'],link['port2'])
    if keystr in linkdb:
      # check consistency with the previously recorded link
      prev = linkdb[keystr]
      if (link['node1'] != prev['node1']
          or link['port1'] != prev['port1']
          or link['node2'] != prev['node2']
          or link['port2'] != prev['port2']): raise Exception('Mismatched LLDP', keystr)
    else:
      linkdb[keystr] = link
      linkname = 'L%d' % (l)
      links[linkname] = link
      l += 1

top = {'links':links}
print(json.dumps(top,sort_keys=True, indent=1))
Running the script returns the result:
cumulus@wbench:~$ ./lldp.py 
{
 "links": {
  "L0": {
   "node1": "colo-tor-3", 
   "node2": "leaf1", 
   "port1": "swp10", 
   "port2": "eth0"
  }, 
  ...
 }
}
The lldp.py script and the latest version of acl_server can be found on GitHub: https://github.com/pphaal/acl_server/

Demonstration

Fabric visibility with Cumulus Linux demonstrates the visibility into network performance provided by Cumulus Linux support for the sFlow standard (see Cumulus Networks, sFlow and data center automation). The screen shot shows 10Gbit/s Elephant flows traversing the network shown at the top of this article. The flows between server1 and server2 were generated using iperf tests running in a continuous loop.

The acl_server and sFlow agents are installed on the leaf1, leaf2, spine1, and spine2 switches. By default, the sFlow agents automatically pick up their settings using DNS Service Discovery (DNS-SD). Adding the following entry to the wbench DNS server zone file, /etc/bind/zones/lab.local.zone, enables sFlow on the switches and directs measurements to the wbench host:
_sflow._udp     30      SRV     0 0 6343 wbench
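The record can be verified from any host on the workbench network. A minimal sketch, assuming the dnspython package is installed and the lab.local domain from the zone file above (dig _sflow._udp.lab.local SRV gives the same answer):
#!/usr/bin/env python
# Sketch: verify the _sflow._udp SRV record used for sFlow DNS-SD.
# Assumes the dnspython package and the lab.local zone shown above.
import dns.resolver

for srv in dns.resolver.query('_sflow._udp.lab.local', 'SRV'):
  # each answer carries the collector host and port, e.g. wbench 6343
  print('%s %d' % (srv.target, srv.port))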
Note: For more information on running sFlow in the Cumulus workbench, see Demo: Monitoring Traffic on Cumulus Switches with sFlow. This workbench setup also demonstrates the visibility into Link Aggregation (LAG) provided by sFlow (see Link aggregation).

Fabric View is installed on wbench and is configured with the network topology obtained from acl_server. The web interface is accessed through the workbench reverse proxy, but access is also possible using a VPN (see Setting up OpenVPN on the Cumulus Workbench).
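A minimal sketch of handing the topology to Fabric View, assuming it is saved to a local topology.json file on wbench and then loaded into Fabric View following its setup instructions:
#!/usr/bin/env python
# Sketch: retrieve the verified topology from acl_server and save it as
# topology.json for loading into Fabric View (see the Fabric View setup
# instructions for how the topology file is installed).
import json, requests

topology = requests.get('http://leaf1:8080/ptm').json()
with open('topology.json', 'w') as f:
  json.dump(topology, f, sort_keys=True, indent=1)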
This workbench example automatically provisions an OpenStack cluster on the two servers along with the network to connect them. In much the same way OpenStack provides access to virtual resources, Cumulus' Remote Lab leverages the automation capabilities of open hardware to provide multi-tenant access to physical servers and networks.
Finally, Cumulus Linux runs on open switch hardware from Agema, Dell, Edge-Core, Penguin Computing, and Quanta. In addition, Hewlett-Packard recently announced that it will soon be selling a new line of open network switches built by Accton Technologies that support Cumulus Linux. This article demonstrates the flexibility that open networking offers to developers and network administrators. If you are curious, it's very easy to give Cumulus Linux a try.
