Tuesday, September 5, 2017

Troubleshooting connectivity problems in leaf and spine fabrics

Introducing data center fabric, the next-generation Facebook data center network describes the benefits of moving to a leaf and spine network architecture. A leaf and spine architecture creates many paths between each pair of hosts. Multiple paths increase available bandwidth and resilience against the loss of a link or a switch. While most networks don't have the scale requirements of Facebook, smaller scale leaf and spine designs deliver the high bandwidth, low latency networking needed to support cloud workloads (e.g. vSphere, OpenStack, Docker, Hadoop, etc.).

Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.

Fortunately, industry standard sFlow monitoring technology is built into the commodity switch hardware used to build leaf and spine networks. Enabling sFlow telemetry on all the switches in the network provides centralized, real-time, visibility into network traffic.
Fabric View describes an open source application running on the sFlow-RT real-time analytics engine. The Fabric View application provides an overview of the health of the entire leaf and spine fabric, tracking flows and counters on all links and summarizing information in a set of fabric level metrics and dashboards. In addition, Black hole detection describes how to detect routing anomalies in the fabric using the forwarding information included in the sFlow telemetry stream.
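A quick way to verify that telemetry is actually arriving from every switch is to query the analytics engine's REST API. The following is a minimal sketch, assuming sFlow-RT is running locally on its default REST port (8008) and that the standard /agents/json resource is available; it simply lists the switches currently sending sFlow:
#!/usr/bin/env python
# List the sFlow agents (switches) currently streaming telemetry to sFlow-RT.
# Assumes sFlow-RT is running locally on its default REST port (8008).
import requests

rt = 'http://localhost:8008'
agents = requests.get(rt + '/agents/json').json()
print('%d agents streaming sFlow:' % len(agents))
for address in sorted(agents):
  print(address)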

The sFlow sampling mechanism implemented in the switches is a highly scalable method of passively collecting traffic information. However, analyzing failed connections can be a challenge since very few packets are generated and the chance of sampling these packets is small. The traditional tools used to diagnose connectivity issues, ping and traceroute, are of limited value in a leaf and spine network since they only test a single path and are likely to miss the path that is experiencing difficulties.

An alternative method of addressing the multi-path tracing problem is to enable filtered packet capture on each switch, programming the filters to capture the packets of interest. However, this method can be slow and complex since every switch needs to be configured for each test and the switch configurations need to be cleared after the test has been completed.

This article explores how the hping3 tool can be used with sFlow to trace packet paths across the fabric and detect where packets are being lost. The following Python script, trace.py, uses sFlow-RT's REST API to program a flow definition that watches for a specific flow and prints the links it traverses:
#!/usr/bin/env python
import argparse
import requests
import json
import signal
from random import randint

# Remove the flow definition from sFlow-RT when the script is interrupted
def sig_handler(signal,frame):
  requests.delete(rt+'/flow/'+name+'/json')
  exit(0)
signal.signal(signal.SIGINT, sig_handler)

parser = argparse.ArgumentParser()
parser.add_argument('filter', help='sFlow-RT flow filter, e.g. "ipsource=10.0.0.1"')
args = parser.parse_args()

rt = 'http://localhost:8008'
name = 'trace' + str(randint(0,10000))

# Define a flow keyed on link (identified by input ifindex) that matches the
# supplied filter and logs each matching flow as it starts
flow = {'keys':'link:inputifindex','value':'frames',
        'filter':args.filter,'log':True,'flowStart':True}
requests.put(rt+'/flow/'+name+'/json',data=json.dumps(flow))

# Poll the flow log and print the link for each new flow record
flowurl = rt+'/flows/json?maxFlows=100&timeout=60&name='+name
flowID = -1
while True:
  r = requests.get(flowurl+'&flowID='+str(flowID))
  if r.status_code != 200: break
  flows = r.json()
  if len(flows) == 0: continue

  flowID = flows[0]["flowID"]
  flows.reverse()
  for f in flows:
    print(f['flowKeys'])
Note: See RESTflow for a description of the sFlow-RT REST API.
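For reference, each record returned by the /flows/json flow log looks roughly like the sketch below. Only the flowID and flowKeys fields are relied on by trace.py; the other fields shown are illustrative, based on the examples in the RESTflow article, and may differ between sFlow-RT versions:
# Illustrative flow log record - only flowID and flowKeys are used by trace.py
record = {
  'agent': '10.0.0.30',        # switch that exported the packet sample
  'dataSource': '5',           # ifIndex of the interface that sampled it
  'flowID': 42,                # increasing ID used to poll for newer records
  'flowKeys': 'leaf1-spine2',  # value of the 'link:inputifindex' key
  'name': 'trace4711',         # name of the flow definition
  'value': 10000.0             # estimated frames per second
}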

First, run the trace.py script, supplying a filter to select the packets of interest:
./trace.py 'ipsource=172.16.134.1&udpsourceport=1111&ipdestination=172.16.135.1&udpdestinationport=53'
Note: The identifying characteristics of failed connections can often be inferred from application error logs. Otherwise, running a packet capture on the affected host (tcpdump/wireshark) can identify the network attributes of interest.
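If a capture is needed, the sketch below shows one way to pull out the 5-tuple using Python and Scapy rather than tcpdump (Scapy is not part of the workflow described above, and the destination address and port are just the values used in this example); run it as root on the affected host:
#!/usr/bin/env python
# Print the 5-tuple of UDP packets sent to the unreachable service.
# Scapy is used here as a stand-in for tcpdump/wireshark; adjust the BPF
# filter to match the failing connection.
from scapy.all import sniff, IP, UDP

def show(pkt):
  if IP in pkt and UDP in pkt:
    print('%s:%d -> %s:%d' % (pkt[IP].src, pkt[UDP].sport,
                              pkt[IP].dst, pkt[UDP].dport))

sniff(filter='udp and dst host 172.16.135.1 and dst port 53',
      prn=show, count=10)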

Next, log into the host that is having connectivity problems and generate traffic matching the flow:
sudo hping3 -c 100000 -i u100 --udp -k -s 1111 -p 53 172.16.135.1
Note: The above command sends 100,000 packets at a rate of 1 packet every 100 microseconds (i.e. 10,000 packets per second). Select a packet rate that will not disturb production traffic on the network and make sure to send enough packets so that at least one packet will be sampled on each link. For example, for 10G links the packet sampling rate should be around 1-in-10,000, so generating 100,000 packets means that there is a 99.995% chance that a link carrying the flow will generate at least 1 sample (the probability is easily calculated using the Binomial distribution, see Wolfram Alpha).
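The probability quoted above is easy to verify with a couple of lines of Python, using the binomial argument from the note (a 1-in-10,000 sampling rate and 100,000 test packets are assumed):
# Probability that at least one test packet is sampled on a link carrying the flow
sampling_probability = 1.0 / 10000   # 1-in-10,000 packet sampling
packets = 100000                     # packets generated by hping3
p_at_least_one = 1 - (1 - sampling_probability) ** packets
print('%.3f%%' % (100 * p_at_least_one))   # prints 99.995%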

The trace.py script will start printing links traversed by the flow as soon as they are detected (typically less than a second after the test starts):
./trace.py 'ipsource=172.16.134.1&udpsourceport=1111&ipdestination=172.16.135.1&udpdestinationport=53'
leaf1-spine2
leaf2-spine2
The above example traced the single path traversed by a specific connection. To explore all paths, drop the source port: hping3 will then cycle through source ports and the traffic should be visible on all of the equal cost paths, provided that the switches have been configured to hash on layer 4 headers (an illustrative sketch of this hashing follows the commands below).

Drop the source port from the trace.py filter:
./trace.py 'ipsource=172.16.134.1&ipdestination=172.16.135.1&udpdestinationport=53'
Drop the -k and -s options from the hping3 command:
sudo hping3 -c 100000 -i u100 --udp -p 53 172.16.135.1
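Each switch selects an egress link by hashing fields in the packet headers, so connections that differ only in source port can take different equal cost paths. The actual hash functions are vendor specific; the sketch below is purely illustrative (the CRC32 hash and the four spine paths are assumptions) and just shows how varying the source port spreads flows across the fabric:
# Illustrative ECMP path selection - real switches use vendor specific hashes
import zlib

paths = ['spine1', 'spine2', 'spine3', 'spine4']

def pick_path(src, sport, dst, dport, proto='udp'):
  key = '%s|%d|%s|%d|%s' % (src, sport, dst, dport, proto)
  return paths[zlib.crc32(key.encode()) % len(paths)]

for sport in range(1111, 1116):
  print('%d -> %s' % (sport, pick_path('172.16.134.1', sport, '172.16.135.1', 53)))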
The open source trace-flow application is a graphical version of the trace.py script, written using sFlow-RT's JavaScript API (see Writing Applications). It displays the paths taken by the test traffic within a second of the start of the test.

Continuous network-wide monitoring of leaf and spine networks using sFlow leverages the capabilities of commodity switch hardware and provides the centralized visibility needed to simplify network operation and troubleshooting.
