Saturday, January 24, 2015

Fabric visibility with Arista EOS

A leaf and spine fabric is challenging to monitor. The fabric spreads traffic across all the switches and links in order to maximize bandwidth. Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.

The 2 minute video provides an overview of some of the performance challenges with leaf and spine fabrics and demonstrates Fabric View - a monitoring solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time visibility into fabric performance.

Fabric View is free to try: register at http://www.myinmon.com/ and request an evaluation. The software requires an accurate network topology in order to characterize performance, and this article describes how to obtain the topology from a fabric of Arista Networks switches.

Arista EOS™ includes the eAPI JSON-RPC service for programmatic monitoring and control. The article Arista eAPI 101 introduces eAPI and describes how to enable the service in EOS. Enable eAPI on all the switches in the fabric.
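For reference, eAPI is typically enabled with the following EOS configuration (see Arista eAPI 101 for the full procedure; adjust protocol and authentication settings to match your environment):
switch(config)# management api http-commands
switch(config-mgmt-api-http-cmds)# no shutdown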

Configure all the switches in the leaf and spine fabric to send sFlow to the Fabric View server. The following script demonstrates how sFlow can be configured programmatically using eAPI:
#!/usr/bin/env python

import socket
from jsonrpclib import Server

switch_list = ['switch1.example.com','switch2.example.com']
username = "admin"
password = "password"

sflow_collector = "192.168.56.1"
sflow_port = "6343"
sflow_polling = "20"
sflow_sampling = "10000"

for switch_name in switch_list:
  # resolve the switch address to use as the sFlow agent source
  switch_ip = socket.gethostbyname(switch_name)
  switch = Server("https://%s:%s@%s/command-api" %
                (username, password, switch_name))
  response = switch.runCmds(1,
   ["enable",
    "configure",
    "sflow source %s" % switch_ip,
    "sflow destination %s %s" % (sflow_collector, sflow_port),
    "sflow polling-interval %s" % sflow_polling,
    "sflow sample output interface",
    "sflow sample dangerous %s" % sflow_sampling,
    "sflow run"])
Next, use the following eAPI script to discover the topology:
#!/usr/bin/python
'''
Copyright (c) 2015, Arista Networks, Inc. All rights reserved.
 
Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the following conditions are met:
 
 * Redistributions of source code must retain the above copyright notice, 
   this list of conditions and the following disclaimer. 

 * Redistributions in binary form must reproduce the above copyright notice, 
   this list of conditions and the following disclaimer in the documentation 
   and/or other materials provided with the distribution. 

 * Neither the name of Arista Networks nor the names of its contributors 
   may be used to endorse or promote products derived from this software 
   without specific prior written permission.
 
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL ARISTA NETWORKS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
'''

# v0.5 - initial version of the script to discover network topology using
# Arista eAPI and generate output in json format recognized by sFlow-RT.

from jsonrpclib import Server 
import json 
from pprint import pprint

# define switch in your topology, eapi transport protocol (http or https),
# eapi username and password
switch_list = ['switch1.example.com','switch2.example.com']
eapi_transport = 'https'
eapi_username = 'admin'
eapi_password = 'password'

debug = False

# internal variables used by the script
allports = {}
allswitches = {}
allneighbors = []
alllinks = {}

# method to populate allswitches and allports - called only from processNeighbor()
def addPort(switchname, switchIP, portname, ifindex):
 id = switchname + '>' + portname
 prt = allports.setdefault(id, { "portname": portname, "linked": False })
 if ifindex is not None:
  prt["ifindex"] = ifindex
 sw = allswitches.setdefault(switchname, { "name": switchname, "agent": switchIP, "ports": {} })
 if switchIP is not None:
  sw["agent"] = switchIP
 sw["ports"][portname] = prt

# method to collect neighbor records - called with each LLDP neighbor 
# entry as they are discovered
def processNeighbor(localname, localip, localport, localifindex, remotename, remoteport):
 addPort(localname, localip, localport, localifindex)
 addPort(remotename, None, remoteport, None)
 allneighbors.append({ "localname": localname, "localport": localport,
         "remotename": remotename, "remoteport": remoteport })

# method to remove agents that we did not discover properly, or
# that we did not intend to include in the topology.  (If we
# assigned an agent field to the switch then we assume it should stay.)
def pruneAgents():
 for nm,sw in allswitches.items():
  #if not "agent" in sw:
  if sw['agent'] == '0.0.0.0' or not sw['agent']:
   del allswitches[nm]

# method to test for a new link - called only from findLinks()
def testLink(nbor,linkno):
 swname1 = nbor["localname"]
 swname2 = nbor["remotename"]
 # one of the switches might have been pruned out
 if swname1 not in allswitches or swname2 not in allswitches:
  return False
 sw1 = allswitches[swname1]
 sw2 = allswitches[swname2]
 pname1 = nbor["localport"]
 pname2 = nbor["remoteport"]
 port1 = sw1["ports"][pname1];
 port2 = sw2["ports"][pname2];
 if not port1["linked"] and not port2["linked"]:
  # add new link
  linkid = "link" + str(linkno)
  port1["linked"] = True;
  port2["linked"] = True;
  alllinks[linkid] = {
   "node1": nbor["localname"],
   "port1": nbor["localport"],
   "node2": nbor["remotename"],
   "port2": nbor["remoteport"]
   }
  return True
 return False

# method to find unique links - call at the end once all the LLDP records have
# been processed from all the switches
def findLinks():
 linkcount = 0
 for nbor in allneighbors:
  if testLink(nbor, linkcount+1):
   linkcount += 1

# method to dump topology in json format recognized by sFlow-RT
def dumpTopology():
 topology = { "nodes": allswitches, "links": alllinks }
 print(json.dumps(topology, indent=4))

# method to get LLDP neighbors of each switch - calls processNeighbor() for each LLDP neighbor found
def getLldpNeighbors(switch_name):
 try:
  switch = Server('%s://%s:%s@%s/command-api' % (eapi_transport, eapi_username, eapi_password, switch_name))

  # Get LLDP neighbors
  commands = ["enable", "show lldp neighbors"]
  response = switch.runCmds(1, commands, 'json')
  neighbors = response[1]['lldpNeighbors']

  # Get local hostname
  commands = ["enable", "show hostname"]
  response = switch.runCmds(1, commands, 'json')
  hostname = response[1]['hostname']

  # Get SNMP ifIndexes
  commands = ["enable", "show snmp mib ifmib ifindex"]
  response = switch.runCmds(1, commands, 'json')
  interfaceIndexes = response[1]['ifIndex']

  # Get sFlow agent source address
  commands = ["enable", "show sflow"]
  response = switch.runCmds(1, commands, 'json')
  sflowAddress = response[1]['ipv4Sources'][0]['ipv4Address']
  
  # Create 2D array lldp_neighbors where each row has the following entries:
  # <remote hostname>, <local port>, <remote port>, <local ifIndex>
  lldp_neighbors = []
  for neighbor in neighbors:
   lldp_neighbors.append([neighbor['neighborDevice'].split('.')[0], 
        neighbor['port'], neighbor['neighborPort'], interfaceIndexes[neighbor['port']]])
  
  if (debug): 
   pprint(lldp_neighbors)


  # collect switches, ports and neighbor-relationships
  for row in lldp_neighbors:
   processNeighbor(hostname, 
    sflowAddress,
    row[1], # localport
    row[3], # localifindex
    row[0], # remotename
    row[2]) # remoteport

  # Print list of LLDP neighbors in human friendly format:
  # #<n> neighbor, <remote hostname>, connected to local <local port> with remote <remote port>
  if debug:
   print "Switch %s has following %d neighbors:" % (hostname, len(neighbors))
   for i, neighbor in enumerate(lldp_neighbors):
    print "#%d neighbor, %s, connected to local %s with remote %s" % (i+1, neighbor[0], neighbor[1], neighbor[2])

 except Exception as e:
  print 'Exception while connecting to %s: %s' % (switch_name, e)
  return []


for switch in switch_list:
 getLldpNeighbors(switch)

pruneAgents()
findLinks()
dumpTopology()
The script outputs a JSON representation of the topology, for example:
{
    "nodes": {
        "leaf332": {
            "name": "leaf332", 
            "agent": "10.10.130.142", 
            "ports": {
                "Management1": {
                    "portname": "Management1", 
                    "ifindex": 999001, 
                    "linked": false
                }, 
                "Ethernet50/1": {
                    "portname": "Ethernet50/1", 
                    "ifindex": 50001, 
                    "linked": true
                }, 
                "Ethernet36": {
                    "portname": "Ethernet36", 
                    "ifindex": 36, 
                    "linked": true
                }, 
                "Ethernet51/1": {
                    "portname": "Ethernet51/1", 
                    "ifindex": 51001, 
                    "linked": true
                }, 
                "Ethernet52/1": {
                    "portname": "Ethernet52/1", 
                    "ifindex": 52001, 
                    "linked": true
                }, 
                "Ethernet49/1": {
                    "portname": "Ethernet49/1", 
                    "ifindex": 49001, 
                    "linked": true
                }, 
                "Ethernet12": {
                    "portname": "Ethernet12", 
                    "ifindex": 12, 
                    "linked": false
                }, 
                "Ethernet35": {
                    "portname": "Ethernet35", 
                    "ifindex": 35, 
                    "linked": true
                }
            }
        }, 
        "leaf259": {
            "name": "leaf259", 
            "agent": "10.10.129.220", 
            "ports": {
                "Management1": {
                    "portname": "Management1", 
                    "ifindex": 999001, 
                    "linked": false
                }, 
                "Ethernet5/1": {
                    "portname": "Ethernet5/1", 
                    "ifindex": 5001, 
                    "linked": true
                }, 
                "Ethernet29": {
                    "portname": "Ethernet29", 
                    "ifindex": 29, 
                    "linked": true
                }, 
                "Ethernet32": {
                    "portname": "Ethernet32", 
                    "ifindex": 32, 
                    "linked": true
                }, 
                "Ethernet6/1": {
                    "portname": "Ethernet6/1", 
                    "ifindex": 6001, 
                    "linked": true
                }, 
                "Ethernet31": {
                    "portname": "Ethernet31", 
                    "ifindex": 31, 
                    "linked": true
                }, 
                "Ethernet30": {
                    "portname": "Ethernet30", 
                    "ifindex": 30, 
                    "linked": true
                }, 
                "Ethernet15/1": {
                    "portname": "Ethernet15/1", 
                    "ifindex": 15001, 
                    "linked": false
                }
            }
        }, 
        "leaf331": {
            "name": "leaf331", 
            "agent": "10.10.130.141", 
            "ports": {
                "Management1": {
                    "portname": "Management1", 
                    "ifindex": 999001, 
                    "linked": false
                }, 
                "Ethernet50/1": {
                    "portname": "Ethernet50/1", 
                    "ifindex": 50001, 
                    "linked": true
                }, 
                "Ethernet36": {
                    "portname": "Ethernet36", 
                    "ifindex": 36, 
                    "linked": true
                }, 
                "Ethernet1": {
                    "portname": "Ethernet1", 
                    "ifindex": 1, 
                    "linked": false
                }, 
                "Ethernet51/1": {
                    "portname": "Ethernet51/1", 
                    "ifindex": 51001, 
                    "linked": true
                }, 
                "Ethernet52/1": {
                    "portname": "Ethernet52/1", 
                    "ifindex": 52001, 
                    "linked": true
                }, 
                "Ethernet49/1": {
                    "portname": "Ethernet49/1", 
                    "ifindex": 49001, 
                    "linked": true
                }, 
                "Ethernet11": {
                    "portname": "Ethernet11", 
                    "ifindex": 11, 
                    "linked": false
                }, 
                "Ethernet35": {
                    "portname": "Ethernet35", 
                    "ifindex": 35, 
                    "linked": true
                }
            }
        }, 
        "leaf260": {
            "name": "leaf260", 
            "agent": "10.10.129.221", 
            "ports": {
                "Management1": {
                    "portname": "Management1", 
                    "ifindex": 999001, 
                    "linked": false
                }, 
                "Ethernet11/1": {
                    "portname": "Ethernet11/1", 
                    "ifindex": 11001, 
                    "linked": false
                }, 
                "Ethernet5/1": {
                    "portname": "Ethernet5/1", 
                    "ifindex": 5001, 
                    "linked": true
                }, 
                "Ethernet29": {
                    "portname": "Ethernet29", 
                    "ifindex": 29, 
                    "linked": true
                }, 
                "Ethernet32": {
                    "portname": "Ethernet32", 
                    "ifindex": 32, 
                    "linked": true
                }, 
                "Ethernet6/1": {
                    "portname": "Ethernet6/1", 
                    "ifindex": 6001, 
                    "linked": true
                }, 
                "Ethernet31": {
                    "portname": "Ethernet31", 
                    "ifindex": 31, 
                    "linked": true
                }, 
                "Ethernet30": {
                    "portname": "Ethernet30", 
                    "ifindex": 30, 
                    "linked": true
                }
            }
        }, 
        "core210": {
            "name": "core210", 
            "agent": "10.10.129.185", 
            "ports": {
                "Ethernet3/3/1": {
                    "portname": "Ethernet3/3/1", 
                    "ifindex": 3037, 
                    "linked": false
                }, 
                "Ethernet3/6/1": {
                    "portname": "Ethernet3/6/1", 
                    "ifindex": 3073, 
                    "linked": true
                }, 
                "Ethernet3/5/1": {
                    "portname": "Ethernet3/5/1", 
                    "ifindex": 3061, 
                    "linked": true
                }, 
                "Ethernet3/2/1": {
                    "portname": "Ethernet3/2/1", 
                    "ifindex": 3025, 
                    "linked": false
                }, 
                "Ethernet3/8/1": {
                    "portname": "Ethernet3/8/1", 
                    "ifindex": 3097, 
                    "linked": true
                }, 
                "Ethernet3/1/1": {
                    "portname": "Ethernet3/1/1", 
                    "ifindex": 3013, 
                    "linked": false
                }, 
                "Management1/1": {
                    "portname": "Management1/1", 
                    "ifindex": 999011, 
                    "linked": false
                }, 
                "Ethernet3/34/1": {
                    "portname": "Ethernet3/34/1", 
                    "ifindex": 3409, 
                    "linked": false
                }, 
                "Ethernet3/31/1": {
                    "portname": "Ethernet3/31/1", 
                    "ifindex": 3373, 
                    "linked": false
                }, 
                "Ethernet3/7/1": {
                    "portname": "Ethernet3/7/1", 
                    "ifindex": 3085, 
                    "linked": true
                }
            }
        }, 
        "core212": {
            "name": "core212", 
            "agent": "10.10.129.64", 
            "ports": {
                "Ethernet3/3/1": {
                    "portname": "Ethernet3/3/1", 
                    "ifindex": 3037, 
                    "linked": false
                }, 
                "Ethernet3/12/1": {
                    "portname": "Ethernet3/12/1", 
                    "ifindex": 3145, 
                    "linked": false
                }, 
                "Ethernet3/2/1": {
                    "portname": "Ethernet3/2/1", 
                    "ifindex": 3025, 
                    "linked": false
                }, 
                "Ethernet3/13/1": {
                    "portname": "Ethernet3/13/1", 
                    "ifindex": 3157, 
                    "linked": false
                }, 
                "Ethernet3/31/1": {
                    "portname": "Ethernet3/31/1", 
                    "ifindex": 3373, 
                    "linked": false
                }, 
                "Ethernet3/32/1": {
                    "portname": "Ethernet3/32/1", 
                    "ifindex": 3385, 
                    "linked": false
                }, 
                "Ethernet3/18/1": {
                    "portname": "Ethernet3/18/1", 
                    "ifindex": 3217, 
                    "linked": true
                }, 
                "Ethernet3/28/1": {
                    "portname": "Ethernet3/28/1", 
                    "ifindex": 3337, 
                    "linked": true
                }, 
                "Ethernet3/33/1": {
                    "portname": "Ethernet3/33/1", 
                    "ifindex": 3397, 
                    "linked": false
                }, 
                "Ethernet3/5/1": {
                    "portname": "Ethernet3/5/1", 
                    "ifindex": 3061, 
                    "linked": true
                }, 
                "Ethernet3/8/1": {
                    "portname": "Ethernet3/8/1", 
                    "ifindex": 3097, 
                    "linked": true
                }, 
                "Ethernet3/34/1": {
                    "portname": "Ethernet3/34/1", 
                    "ifindex": 3409, 
                    "linked": false
                }, 
                "Ethernet3/36/1": {
                    "portname": "Ethernet3/36/1", 
                    "ifindex": 3433, 
                    "linked": false
                }, 
                "Ethernet3/35/1": {
                    "portname": "Ethernet3/35/1", 
                    "ifindex": 3421, 
                    "linked": false
                }, 
                "Ethernet3/15/1": {
                    "portname": "Ethernet3/15/1", 
                    "ifindex": 3181, 
                    "linked": true
                }, 
                "Ethernet3/7/1": {
                    "portname": "Ethernet3/7/1", 
                    "ifindex": 3085, 
                    "linked": true
                }, 
                "Ethernet3/16/1": {
                    "portname": "Ethernet3/16/1", 
                    "ifindex": 3193, 
                    "linked": true
                }, 
                "Ethernet3/17/1": {
                    "portname": "Ethernet3/17/1", 
                    "ifindex": 3205, 
                    "linked": true
                }, 
                "Management1/1": {
                    "portname": "Management1/1", 
                    "ifindex": 999011, 
                    "linked": false
                }, 
                "Ethernet3/26/1": {
                    "portname": "Ethernet3/26/1", 
                    "ifindex": 3313, 
                    "linked": true
                }, 
                "Ethernet3/25/1": {
                    "portname": "Ethernet3/25/1", 
                    "ifindex": 3301, 
                    "linked": true
                }, 
                "Ethernet3/21/1": {
                    "portname": "Ethernet3/21/1", 
                    "ifindex": 3253, 
                    "linked": false
                }, 
                "Ethernet3/11/1": {
                    "portname": "Ethernet3/11/1", 
                    "ifindex": 3133, 
                    "linked": false
                }, 
                "Ethernet3/6/1": {
                    "portname": "Ethernet3/6/1", 
                    "ifindex": 3073, 
                    "linked": true
                }, 
                "Ethernet3/27/1": {
                    "portname": "Ethernet3/27/1", 
                    "ifindex": 3325, 
                    "linked": true
                }, 
                "Ethernet3/1/1": {
                    "portname": "Ethernet3/1/1", 
                    "ifindex": 3013, 
                    "linked": false
                }, 
                "Ethernet3/23/1": {
                    "portname": "Ethernet3/23/1", 
                    "ifindex": 3277, 
                    "linked": false
                }, 
                "Ethernet3/22/1": {
                    "portname": "Ethernet3/22/1", 
                    "ifindex": 3265, 
                    "linked": false
                }
            }
        }
    }, 
    "links": {
        "link5": {
            "node1": "leaf260", 
            "node2": "core212", 
            "port2": "Ethernet3/15/1", 
            "port1": "Ethernet31"
        }, 
        "link4": {
            "node1": "leaf260", 
            "node2": "core212", 
            "port2": "Ethernet3/5/1", 
            "port1": "Ethernet30"
        }, 
        "link7": {
            "node1": "leaf259", 
            "node2": "core210", 
            "port2": "Ethernet3/6/1", 
            "port1": "Ethernet29"
        }, 
        "link6": {
            "node1": "leaf260", 
            "node2": "core212", 
            "port2": "Ethernet3/25/1", 
            "port1": "Ethernet32"
        }, 
        "link1": {
            "node1": "leaf260", 
            "node2": "leaf259", 
            "port2": "Ethernet5/1", 
            "port1": "Ethernet5/1"
        }, 
        "link3": {
            "node1": "leaf260", 
            "node2": "core210", 
            "port2": "Ethernet3/5/1", 
            "port1": "Ethernet29"
        }, 
        "link2": {
            "node1": "leaf260", 
            "node2": "leaf259", 
            "port2": "Ethernet6/1", 
            "port1": "Ethernet6/1"
        }, 
        "link9": {
            "node1": "leaf259", 
            "node2": "core212", 
            "port2": "Ethernet3/16/1", 
            "port1": "Ethernet31"
        }, 
        "link8": {
            "node1": "leaf259", 
            "node2": "core212", 
            "port2": "Ethernet3/6/1", 
            "port1": "Ethernet30"
        }, 
        "link15": {
            "node1": "leaf331", 
            "node2": "core212", 
            "port2": "Ethernet3/17/1", 
            "port1": "Ethernet51/1"
        }, 
        "link14": {
            "node1": "leaf331", 
            "node2": "core212", 
            "port2": "Ethernet3/7/1", 
            "port1": "Ethernet50/1"
        }, 
        "link17": {
            "node1": "leaf332", 
            "node2": "core210", 
            "port2": "Ethernet3/8/1", 
            "port1": "Ethernet49/1"
        }, 
        "link16": {
            "node1": "leaf331", 
            "node2": "core212", 
            "port2": "Ethernet3/27/1", 
            "port1": "Ethernet52/1"
        }, 
        "link11": {
            "node1": "leaf331", 
            "node2": "leaf332", 
            "port2": "Ethernet35", 
            "port1": "Ethernet35"
        }, 
        "link10": {
            "node1": "leaf259", 
            "node2": "core212", 
            "port2": "Ethernet3/26/1", 
            "port1": "Ethernet32"
        }, 
        "link13": {
            "node1": "leaf331", 
            "node2": "core210", 
            "port2": "Ethernet3/7/1", 
            "port1": "Ethernet49/1"
        }, 
        "link12": {
            "node1": "leaf331", 
            "node2": "leaf332", 
            "port2": "Ethernet36", 
            "port1": "Ethernet36"
        }, 
        "link20": {
            "node1": "leaf332", 
            "node2": "core212", 
            "port2": "Ethernet3/28/1", 
            "port1": "Ethernet52/1"
        }, 
        "link19": {
            "node1": "leaf332", 
            "node2": "core212", 
            "port2": "Ethernet3/18/1", 
            "port1": "Ethernet51/1"
        }, 
        "link18": {
            "node1": "leaf332", 
            "node2": "core212", 
            "port2": "Ethernet3/8/1", 
            "port1": "Ethernet50/1"
        }
    }
}
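If the discovery script was saved as, say, topology.py (the name is arbitrary), redirect its output to a file ready for upload:
./topology.py > topology.json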

Access the Fabric View web interface at http://fabricview:8008/ and navigate to the Settings tab. Upload the JSON topology file by clicking on the disk icon in the Topology section. Alternatively, the topology can be installed programmatically using the Fabric View REST API documented at the bottom of the Settings page.

As soon as the topology is installed, traffic data should start appearing in Fabric View. The video provides a quick walkthrough of the software features.

Tuesday, January 6, 2015

Open vSwitch performance monitoring

Credit: Accelerating Open vSwitch to “Ludicrous Speed”
Accelerating Open vSwitch to "Ludicrous Speed" describes the architecture of Open vSwitch. When a packet arrives, the OVS Kernel Module checks its cache to see if there is an entry that matches the packet. If there is a match then the packet is forwarded within the kernel. Otherwise, the packet is sent to the user space ovs-vswitchd process to determine the forwarding decision based on the set of OpenFlow rules that have been installed or, if no rules are found, by passing the packet to an OpenFlow controller. Once a forwarding decision has been made, the packet and the forwarding actions are passed back to the OVS Kernel Module which caches the decision and forwards the packet. Subsequent packets in the flow will then be matched by the cache and forwarded within the kernel.
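On a host running Open vSwitch, the contents of the kernel cache can be inspected with the ovs-dpctl dump-flows command; an illustrative, abridged entry (fields vary by Open vSwitch version):
$ sudo ovs-dpctl dump-flows
in_port(2),eth_type(0x0800),ipv4(src=10.0.0.1,dst=10.0.0.2,proto=6,frag=no), packets:183, bytes:12078, used:0.128s, actions:3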

The recent Open vSwitch 2014 Fall Conference included the talk, Managing Open vSwitch across a large heterogeneous fleet by Chad Norgan, describing Rackspace's experience with running a large scale OpenStack deployment using Open vSwitch for network virtualization. The talk describes the key metrics that Rackspace collects to monitor the performance of the large pools of Open vSwitch instances.

This article discusses the metrics presented in the Rackspace talk and describes how the embedded sFlow agent in Open vSwitch was extended to efficiently export the metrics.
The first chart trends the number of entries in each of the OVS Kernel Module caches across all the virtual switches in the OpenStack deployment.
The next chart trends the cache hit / miss rates for the OVS Kernel Module. Processing packets using cached entries in the kernel is much faster than sending the packet to user space and requires far fewer CPU cycles and so maintaining a high cache hit rate is critical to handling the large volume of traffic in a cloud data center.
The third chart from the Rackspace presentation tracks the CPU consumed by ovs-vswitchd as it handles cache misses. Excessive CPU utilization can result in poor network performance and dropped packets. Reducing the CPU cycles consumed by networking frees up resources that can be used to host additional virtual machines and generates additional revenue.

Currently, monitoring Open vSwitch cache performance involves polling each switch using the ovs-dpctl command and collecting the results. Polling is complex to configure and maintain, and operational complexity is reduced if Open vSwitch is able to push the metrics - see Push vs Pull.
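For example, ovs-dpctl show reports the cache statistics for each data path (illustrative, abridged output):
$ sudo ovs-dpctl show
system@ovs-system:
        lookups: hit:4794484 missed:21983 lost:0
        flows: 27
        masks: hit:5231004 total:4 hit/pkt:1.09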

The following sFlow structure was defined to allow Open vSwitch to export cache statistics along with the other sFlow metrics that are pushed by the sFlow agent:
/* Open vSwitch data path statistics */
/* see datapath/datapath.h */
/* opaque = counter_data; enterprise = 0; format = 2207 */
struct ovs_dp_stats {
  unsigned int hits;
  unsigned int misses;
  unsigned int lost;
  unsigned int mask_hits;
  unsigned int flows;
  unsigned int masks;
}
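The counters are exported as cumulative totals, so a collector derives rates by differencing successive samples. A minimal sketch of the cache hit rate calculation (field names follow the structure above):
def hit_rate(prev, curr):
    # rate between two successive ovs_dp_stats counter samples
    hits = curr["hits"] - prev["hits"]
    misses = curr["misses"] - prev["misses"]
    total = hits + misses
    return float(hits) / total if total else 0.0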
The sFlow agent was also extended to export CPU and memory statistics for the ovs-vswitchd process by populating the app_resources structure - see sFlow Application Structures.

These extensions are the latest in a series of enhancements to the Open vSwitch sFlow implementation. The Open vSwitch project first added sFlow support five years ago, and these recent enhancements build on the detailed visibility into network traffic provided by the core Open vSwitch sFlow implementation and the complementary visibility into hosts, hypervisors, virtual machines and containers provided by the Host sFlow project.
Visibility and the software defined data center
Broad support for the sFlow standard across the cloud data center stack provides simple, efficient, low cost, scalable, and comprehensive visibility. The standard metrics can be consumed by a broad range of open source and commercial tools, including: sflowtool, sFlow-Trend, sFlow-RT, Ganglia, Graphite, InfluxDB and Grafana.

Monday, January 5, 2015

OpenFlow integration

Northbound APIs for traffic engineering describes how sFlow and OpenFlow provide complementary monitoring and control capabilities that can be combined to create software defined networking (SDN) solutions that automatically adapt the network to changing traffic and address high value use cases such as: DDoS mitigation, enforcing black lists, ECMP load balancing, and packet brokers.

The article describes the challenge of mapping between the different methods used by sFlow and OpenFlow to identify switch ports:
  • Agent IP address ⟷ OpenFlow switch ID
  • SNMP ifIndex ⟷ OpenFlow port ID
The recently published sFlow OpenFlow Structures extension addresses the challenge by providing a way for switches to export the mapping as an sFlow structure.
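With the mapping exported, a controller can translate between the two naming schemes using a simple lookup table; a minimal sketch (illustrative names, not a specific controller API):
# map (sFlow agent address, SNMP ifIndex) to (OpenFlow datapath ID, port)
port_map = {}

def learn_mapping(agent_ip, if_index, dpid, of_port):
    port_map[(agent_ip, if_index)] = (dpid, of_port)

def to_openflow(agent_ip, if_index):
    # returns (dpid, of_port), or None if the mapping hasn't been learned
    return port_map.get((agent_ip, if_index))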

Open vSwitch recently implemented the extension, unifying visibility and control of the virtual network edge. In addition, most physical switches that support OpenFlow also support sFlow. Ask vendors about their plans to implement the sFlow OpenFlow Structures extension since it is a key enabler for SDN control applications.

Saturday, January 3, 2015

Hybrid OpenFlow ECMP testbed


SDN fabric controller for commodity data center switches describes how the real-time visibility and hybrid control capabilities of commodity data center switches can be used to automatically adapt the network to changing traffic patterns and optimize performance. The article identifies hybrid OpenFlow as a critical component of the solution, allowing SDN to be combined with proven distributed routing protocols (e.g. BGP, ISIS, OSPF, etc.) to deliver scalable, production ready solutions that fully leverage the capabilities of commodity hardware.

This article will take the example of large flow marking that has been demonstrated using physical switches and show how Mininet can be used to emulate hybrid control of data center networks and deliver realistic results.
The article Elephant Detection in Virtual Switches & Mitigation in Hardware describes a demonstration by VMware and Cumulus Networks that shows how real-time detection and marking of large "Elephant" flows can dramatically improve application response time for small latency sensitive "Mouse" flows without impacting the throughput of the Elephants - see Marking large flows for additional background.
Performance optimizing hybrid OpenFlow controller demonstrated how hybrid OpenFlow can be used to mark Elephant flows on a top of rack switch. However, building test networks with physical switches to test the controller with realistic topologies is expensive and time consuming.

Mininet offers an attractive alternative, providing a lightweight network emulator that can be run in a virtual machine on a laptop and realistically simulate network topologies. In this example, Mininet will be used to emulate the four switch leaf and spine network shown in the diagram at the top of this article.

The sFlow-RT SDN controller includes a leafandspine.py script that configures Mininet to emulate ECMP leaf and spine fabrics with hybrid OpenFlow capable switches. To run the emulation, copy the leafandspine.py script from the sFlow-RT extras directory to your Mininet system and run the following command to create the leaf and spine network:
sudo ./leafandspine.py --collector=10.0.0.162 --controller=10.0.0.162 --topofile=/var/www/html/topology.json
There are a few points to note about the emulation:
  1. While physical networks might have link speeds ranging from 1Gbit/s to 100Gbit/s, the emulation scales link speeds down to 10Mbit/s so that they can be emulated in software.
  2. The sFlow sampling rate is scaled proportionally - see Large flow detection.
  3. A pair of OpenFlow 1.3 tables is used to emulate normal ECMP forwarding and hybrid OpenFlow overrides.
  4. Linux Traffic Control (tc) commands are used to emulate hardware priority queueing based on Differentiated Services Code Point (DSCP) class, mapping DSCP class 8 to a lower priority or "less than best effort" queue - see the tc sketch after this list.
  5. The script posts the topology as a JSON file under the default Apache document root so that it can be retrieved remotely by an SDN controller.
  6. In this example the sFlow-RT controller is running on host 10.0.0.162 - change the address to match your setup.
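For illustration only - leafandspine.py installs its own queueing configuration - the kind of tc setup involved in point 4 might look like the following, which classifies DSCP class 8 packets (ToS byte 0x20) into the lowest priority band of a three-band prio qdisc on an emulated switch port:
# attach a three-band priority qdisc to an emulated switch port
tc qdisc add dev s1-eth1 root handle 1: prio
# send DSCP 8 (ToS 0x20, masking out the ECN bits) to band 3
tc filter add dev s1-eth1 parent 1: protocol ip u32 match ip tos 0x20 0xfc flowid 1:3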
The following script runs the ping command to test response time and plots the results as a simple text-based bar chart:
#!/bin/bash
SCALE=1
SCALE=${2-$SCALE}
ping $1 | awk -v SCALE=$SCALE 'BEGIN {FS="[= ]"; } NF==11 { n = $10 * SCALE; bar = ""; while(n >= 1) { bar = bar "*"; n--} print bar " " $10 " ms" }'
Open an xterm on host h1 and run the command:
./pingtest 10.0.1.1 10
Next type the following command at the Mininet prompt to generate a large flow:
iperf h2 h3
The following screen capture shows the result of the iperf test:
The reported throughput of around 10Mbit/s shows that the traffic is saturating the emulated 10Mbit/s links.

The following screen capture shows the ping results during the iperf test.
The ping test clearly shows the impact that the Elephant flow is having on response time. In addition, the increased response times of around 3ms are consistent with the values in the VMware / Cumulus Networks charts shown earlier.

The following sFlow-RT mark.js script implements an SDN controller that marks Elephant flows:
include('extras/leafandspine-hybrid.js');

// get topology from Mininet
setTopology(JSON.parse(http('http://10.0.0.30/topology.json')));

// Define large flow as greater than 1Mbits/sec for 1 second or longer
var bytes_per_second = 1000000/8, duration_seconds = 1;

// define TCP flow cache
setFlow('tcp',
 {keys:'ipsource,ipdestination,tcpsourceport,tcpdestinationport', filter:'direction=ingress',
  value:'bytes', t:duration_seconds}
);

// set threshold identifying Elephant flows
setThreshold('elephant', {metric:'tcp', value:bytes_per_second, byFlow:true, timeout:4});

// set OpenFlow marking rule when Elephant is detected
var idx = 0;
setEventHandler(function(evt) {
 if(topologyInterfaceToLink(evt.agent,evt.dataSource)) return;
 var port = ofInterfaceToPort(evt.agent,evt.dataSource);
 if(port) {
  var dpid = port.dpid;
  var id = "mark" + idx++;
  var k = evt.flowKey.split(',');
  var rule= {
    match:{in_port: port.port, dl_type:2048, ip_proto:6, nw_src:k[0], nw_dst:k[1], tcp_src:k[2], tcp_dst:k[3]},
    actions:["set_ip_dscp=8","output=normal"], priority:1000, idleTimeout:5
  };
  logInfo(JSON.stringify(rule,null,1));
  setOfRule(dpid,id,rule);
 }
},['elephant']);
About the script:
  1. The included leafandspine-hybrid.js script emulates hybrid OpenFlow by rewriting the NORMAL OpenFlow action to jump to the table that contains the ECMP forwarding rules. 
  2. The script assumes that Mininet emulation is running on host 10.0.0.30. Modify the address in the setTopology() function for your setup.
  3. The setFlow() function instructs the controller to build a flow cache to track TCP connections.
  4. The setThreshold() function defines Elephant flows as TCP connections that exceed 10% of the link's bandwidth (in this case 1Mbit/second) for 1 second or more.
  5. The setEventHandler() function processes each Elephant flow notification and applies an OpenFlow marking rule to the ingress port on the edge switch where the traffic enters the fabric.
  6. The OpenFlow rules have an idleTimeout of 5 seconds, ensuring that they are automatically deleted by the switch when the flow ends.
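When an Elephant flow is detected, the logInfo() call in the event handler prints the rule before it is installed; with illustrative addresses and port numbers, the logged JSON looks like:
{
 "match": {
  "in_port": 1,
  "dl_type": 2048,
  "ip_proto": 6,
  "nw_src": "10.0.0.2",
  "nw_dst": "10.0.1.1",
  "tcp_src": "40382",
  "tcp_dst": "5001"
 },
 "actions": ["set_ip_dscp=8", "output=normal"],
 "priority": 1000,
 "idleTimeout": 5
}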
Modify the sFlow-RT start.sh script to include the following settings:
-Dopenflow.start=yes 
-Dopenflow.flushRules=no
-Dscript.file=mark.js
Start sFlow-RT:
./start.sh
Repeat the iperf test.
The iperf results show that throughput of large flows is unaffected by the controller.
The screen capture shows the controller actions. The controller installs an OpenFlow rule as soon as the large flow is detected, setting the ip_dscp value to 8 and outputting the packets to the normal ECMP forwarding pipeline. The marked packets are treated as lower priority than the ping packets. Since the ping packets aren't stuck behind the deep queues caused by the iperf tests, the reported response times should be unaffected by the large flow.
The ping test confirms that with the controller running, response times are unaffected by Elephant flows - an approximately 10 times improvement that is consistent with the results shown for a physical switch in the VMware / Cumulus charts shown earlier.

More broadly, hybrid OpenFlow provides an effective way to deliver SDN solutions in production, using OpenFlow to enhance the performance of existing networks. In addition to large flow marking, other use cases described on this blog include: DDoS mitigation, enforcing black lists, ECMP load balancing, and packet brokers.

Increasingly, vendors recognize the critical importance of hybrid OpenFlow in delivering practical SDN solutions - HP proposes hybrid OpenFlow discussion at Open Daylight design forum. The article Super NORMAL offers some suggestions for enhancing hybrid OpenFlow to address additional use cases, reduce operational complexity and increase reliability in production settings.

Finally, the sFlow measurement standard is critical to unlocking the full potential of hybrid OpenFlow. Support for sFlow is built into commodity switch hardware, providing cost effective visibility into traffic on production networks. The comprehensive real-time traffic analytics delivered by sFlow allows an SDN controller to effectively target actions, managing the limited hardware resources on the switches, to enhance network performance and security.