Friday, July 15, 2016

Internet router using merchant silicon

SDN router using merchant silicon top of rack switch and Dell OS10 SDN router demo discuss how an inexpensive white box switch running Linux can be used to replace a much costlier Internet router. The key to this solution is the observation that, while the full Internet routing table of over 600,000 routes is too large to fit in white box switch hardware, only a small fraction of the routes carry most of the traffic. Traffic analytics allows the active routes to be identified and installed in the hardware.

This article describes a simple self contained solution that uses standard APIs and should be able to run on a variety of Linux based network operating systems, including: Cumulus Linux, Dell OS10, Arista EOS, and Cisco NX-OS. The distinguishing feature of this solution is its real-time response, where previous solutions respond to changes in traffic within minutes or hours, this solution updates hardware routes within seconds.

The diagram shows the elements of the solution. Standard sFlow instrumentation embedded in the merchant silicon ASIC data plane in the white box switch provides real-time information on traffic flowing through the switch. The sFlow agent is configured to send the sFlow to an instance of sFlow-RT running on the switch. The Bird routing daemon is used to handle the BGP peering sessions and to install routes in the Linux kernel using the standard netlink interface. The network operating system in turn programs the switch ASIC with the kernel routes so that packets are forwarded by the switch hardware and not by the kernel software.

The key to this solution is Bird's multi-table capabilities. The full Internet routing table learned from BGP peers is installed in a user space table that is not reflected into the kernel. A BGP session between sFlow-RT and Bird allows sFlow-RT to see the full routing table and combine it with the sFlow telemetry to perform real-time BGP route analytics and identify the currently active routes. A second BGP session allows sFlow-RT to push routes to Bird which in turn pushes the active routes to the kernel, programming the ASIC.

In this example, the following Bird configuration, /etc/bird/bird.conf, was installed on the switch:
# Please refer to the documentation in the bird-doc package or BIRD User's
# Guide on http://bird.network.cz/ for more information on configuring BIRD and
# adding routing protocols.

# Change this into your BIRD router ID. It's a world-wide unique identification
# of your router, usually one of router's IPv4 addresses.
router id 10.0.0.136;

# The Kernel protocol is not a real routing protocol. Instead of communicating
# with other routers in the network, it performs synchronization of BIRD's
# routing tables with the OS kernel.
protocol kernel {
  learn;
  scan time 2;
  import all;
  export all;   # Actually insert routes into the kernel routing table
}

# The Device protocol is not a real routing protocol. It doesn't generate any
# routes and it only serves as a module for getting information about network
# interfaces from the kernel. 
protocol device {
  scan time 60;
}

protocol direct {
  interface "*";
}

# Create a new table (disconnected from kernel/master) for peering routes
table peers;

# Create BGP sessions with peers
protocol bgp peer_65134 {
  table peers;
  igp table master;
  local as 65136;
  neighbor 10.0.0.134 as 65134;
  import all;
  export all;
}

protocol bgp peer_65135 {
  table peers;
  igp table master;
  local as 65136;
  neighbor 10.0.0.135 as 65135;
  import all;
  export all;
}

# Copy default route from peers table to master table
protocol pipe {
  table peers;
  peer table master;
  import none;
  export filter {
     if net ~ [ 0.0.0.0/0 ] then accept;
     reject;
  };
}

# Reflect peers table to sFlow-RT
protocol bgp to_sflow_rt {
  table peers;
  igp table master;
  local as 65136;
  neighbor 127.0.0.1 port 1179 as 65136;
  import none;
  export all;
}

# Receive active prefixes from sFlow-RT
protocol bgp from_sflow_rt {
  local as 65136;
  neighbor 10.0.0.136 port 1179 as 65136;
  import all;
  export none;
}
The open source Active Route Manager (ARM) application has been installed in sFlow-RT and the following sFlow-RT configuration, /usr/local/sflow-rt/conf.d/sflow-rt.conf, enables the BGP route reflector and control sessions with Bird:
bgp.start=yes
arm.reflector.ip=127.0.0.1
arm.reflector.as=65136
arm.reflector.id=0.0.0.1
arm.sflow.ip=10.0.0.136
arm.target.ip = 10.0.0.136
arm.target.as=65136
arm.target.id=0.0.0.2
arm.target.prefixes=10000
Once configured, operation is entirely automatic. As soon as traffic starts flowing to a new route, the route is identified and installed in the ASIC. If the route later becomes inactive, it is automatically removed from the ASIC to be replaced with a different active route. In this case, the maximum number of routes allowed in the ASIC has been specified as 10,000. This number can be changed to reflect the capacity of the hardware.
The Active Route Manager application has a web interface that provides up to the second visibility into the number of routes, routes installed in hardware, amount of traffic, hardware and software resource utilization etc. In addition, the sFlow-RT REST API can be used to make additional queries.

2 comments:

  1. Looks like a re-write of the SIR solution from David Barroso when he was at Spotify

    ReplyDelete
    Replies
    1. It is similar to David Barroso's SIR router (which is described in detail in the first link in the article). However, there are significant differences in the way active routes are detected and injected.

      The sFlow-RT software detects newly active routes within a second (the SIR router takes an hour). In this example routes are injected using a BGP connection and are added immediately.

      In the SIR router, a batch job created a large selective route download filter file which is installed as part of a periodic Bird configuration reload (every hour). In this example the active route cache is populated using a BGP session and routes are added and removed within seconds.

      Reducing the time between detecting and acting on a newly active prefix from an hour to a few seconds makes better use of the hardware resources since the system can respond quickly to changing traffic patterns.

      Delete