Tuesday, June 14, 2016

Merchant silicon based routing, flow analytics, and telemetry

Drivers for growth describes how switches built on Broadcom merchant silicon ASICs dominate the current generation of data center switches, reduce hardware costs, and support an open ecosystem of switch operating systems (Cumulus Linux, OpenSwitch, Dell OS10, Broadcom FASTPATH, Pica8 PicOS, Open Network Linux, etc.).

The router market is poised to be similarly disrupted with the introduction of devices based on Broadcom's Jericho ASIC, which has the capacity to handle over 1 million routes in hardware (the full Internet routing table is currently around 600,000 routes).
"An edge router is a very pricey box indeed, often costing anywhere from $100,000 to $200,000 per 100 Gb/sec port, depending on features in the router and not including optical cables that are also terribly expensive. Moreover, these routers might only be able to cram 80 ports into a half rack or full rack of space. The 7500R universal spine and 7280R universal leaf switches cost on the order of $3,000 per 100 Gb/sec port, and they are considerably denser and less expensive." - Leaving Fixed Function Switches Behind For Universal Leafs
Broadcom Jericho ASICs are currently available in Arista 7500R/7280R routers and in Cisco NCS 5000 series routers. Expect further disruption to the router market when white box versions of the 1U router hardware enter the market.
There was general enthusiasm for Broadcom Jericho based routers in a recent discussion thread, Arista Routing Solutions, on the North American Network Operators' Group (NANOG) mailing list, so merchant silicon based routers should be expected to sell well.
The Broadcom Jericho ASICs also include hardware instrumentation to support industry standard sFlow traffic monitoring and streaming telemetry. For example, the following commands enable sFlow on all ports on an Arista router:
sflow source-interface Management1
sflow destination 170.1.1.11
sflow polling-interval 30
sflow sample 65535
sflow run
See EOS System Configuration Guide for details.
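To give a feel for what the 1-in-65535 sampling rate configured above means for measurement accuracy, the following Python sketch scales a sample count up to an estimated packet total and applies the commonly cited packet sampling rule of thumb (percentage error at 95% confidence is at most 196 * sqrt(1/samples)). The function names and the sample count of 10,000 are illustrative assumptions; they are not part of EOS or of any sFlow library.

```python
import math

SAMPLING_RATE = 65535  # matches "sflow sample 65535" in the configuration above


def estimate_packets(samples, sampling_rate=SAMPLING_RATE):
    """Scale a count of sampled packets up to an estimated packet total."""
    return samples * sampling_rate


def percent_error(samples):
    """Approximate upper bound on the relative error, in percent, at 95% confidence.

    Rule of thumb for random packet sampling: %error <= 196 * sqrt(1 / samples).
    The bound depends on the number of samples, not on the sampling rate.
    """
    if samples <= 0:
        raise ValueError("need at least one sample")
    return 196.0 * math.sqrt(1.0 / samples)


if __name__ == "__main__":
    samples = 10_000  # hypothetical number of samples seen for one traffic class
    print("estimated packets:", estimate_packets(samples))
    print("error bound (%):", round(percent_error(samples), 2))
```

Because the error bound depends only on the number of samples collected, a coarse sampling rate is still useful on 100 Gb/sec ports, where large traffic classes accumulate samples quickly.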

Cisco supports standard sFlow on its merchant silicon based switch platforms (see Cisco adds sFlow support, Cisco adds sFlow support to Nexus 9K series, and Cisco SF250, SG250, SF350, SG350, SG350XG, and SG550XG series switches). Unfortunately, IOS XR on Cisco's Jericho based routers doesn't yet support sFlow. Instead, a complex set of commands is required to configure Cisco's proprietary NetFlow and streaming telemetry protocols:
RP/0/RP0/CPU0:router#config
RP/0/RP0/CPU0:router(config)#flow exporter-map exp1
RP/0/RP0/CPU0:router(config-fem)#version v9
RP/0/RP0/CPU0:router(config-fem-ver)#options interface-table timeout 300
RP/0/RP0/CPU0:router(config-fem-ver)#options sampler-table timeout 300
RP/0/RP0/CPU0:router(config-fem-ver)#template data timeout 300
RP/0/RP0/CPU0:router(config-fem-ver)#template options timeout 300
RP/0/RP0/CPU0:router(config-fem-ver)#exit 
RP/0/RP0/CPU0:router(config-fem)#transport udp 12515
RP/0/RP0/CPU0:router(config-fem)#source Loopback0
RP/0/RP0/CPU0:router(config-fem)#destination 170.1.1.11
RP/0/RP0/CPU0:router(config-fem)#exit
RP/0/RP0/CPU0:router(config)#flow monitor-map MPLS-IPv6-fmm
RP/0/RP0/CPU0:router(config-fmm)#record mpls ipv6-fields labels 3
RP/0/RP0/CPU0:router(config-fmm)#exporter exp1
RP/0/RP0/CPU0:router(config-fmm)#cache entries 10000
RP/0/RP0/CPU0:router(config-fmm)#cache permanent
RP/0/RP0/CPU0:router(config-fmm)#exit
RP/0/RP0/CPU0:router(config)#sampler-map FSM
RP/0/RP0/CPU0:router(config-sm)#random 1 out-of 65535
RP/0/RP0/CPU0:router(config-sm)#exit
And further commands are needed to enable monitoring on each interface (and there can be a large number of interfaces given the high port density of these routers):
RP/0/RP0/CPU0:router(config)#interface HundredGigE 0/3/0/0
RP/0/RP0/CPU0:router(config-if)#flow mpls monitor MPLS-IPv6-fmm sampler FSM ingress
See Netflow Configuration Guide for Cisco NCS 5500 Series Routers, IOS XR Release 6.0.x for configuration details and limitations.
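Since the per-interface commands have to be repeated for every port, one option is to generate them. The Python sketch below emits the two commands shown above, followed by an exit, for a list of interfaces. The slot number, the port count of 36, and the assumption that ports are numbered contiguously from 0 are hypothetical; the monitor and sampler names are the ones defined in the earlier configuration.

```python
def interface_names(slot, port_count):
    """Hypothetical naming scheme: HundredGigE 0/<slot>/0/<port>, ports from 0."""
    return [f"HundredGigE 0/{slot}/0/{port}" for port in range(port_count)]


def flow_monitor_commands(interfaces, monitor="MPLS-IPv6-fmm", sampler="FSM"):
    """Return the config-mode commands that enable ingress flow monitoring."""
    commands = []
    for name in interfaces:
        commands.append(f"interface {name}")
        commands.append(f"flow mpls monitor {monitor} sampler {sampler} ingress")
        commands.append("exit")  # leave interface sub-mode before the next port
    return commands


if __name__ == "__main__":
    # Assumed example: one line card in slot 3 with 36 ports.
    for line in flow_monitor_commands(interface_names(slot=3, port_count=36)):
        print(line)
```

The output is plain text intended to be reviewed before being pasted into a configuration session; the sketch does not connect to the router.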

We are still not done. Further steps are required to enable the equivalent of sFlow's streaming telemetry.

Create a policy file defining the counters to export:
{
 "Name": "Test",
 "Metadata": {
  "Version": 25,
  "Description": "This is a sample policy",
  "Comment": "This is the first draft",
  "Identifier": "data that may be sent by the encoder to the mgmt stn"
 },
 "CollectionGroups": {
  "FirstGroup": {
   "Period": 30,
   "Paths": [
    "RootOper.InfraStatistics.Interface(*).Latest.GenericCounters"
   ]
  }
 }
}
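If the policy needs to be produced for different periods or counter paths, it can be generated rather than edited by hand. The following Python sketch builds a dictionary with the same shape as the policy above and serializes it with the standard json module. The helper names build_policy and write_policy are hypothetical, and the metadata values are simply copied from the example; nothing here validates the policy against what IOS XR accepts.

```python
import json
import os


def build_policy(name, period, paths, group="FirstGroup"):
    """Build a policy dict mirroring the structure of the example policy."""
    return {
        "Name": name,
        "Metadata": {
            "Version": 25,
            "Description": "This is a sample policy",
            "Comment": "This is the first draft",
            "Identifier": "data that may be sent by the encoder to the mgmt stn",
        },
        "CollectionGroups": {
            group: {"Period": period, "Paths": list(paths)},
        },
    }


def write_policy(policy, directory="."):
    """Write the policy to <Name>.policy in directory and return the path."""
    path = os.path.join(directory, policy["Name"] + ".policy")
    with open(path, "w") as handle:
        json.dump(policy, handle, indent=1)
        handle.write("\n")
    return path


if __name__ == "__main__":
    policy = build_policy(
        "Test",
        30,
        ["RootOper.InfraStatistics.Interface(*).Latest.GenericCounters"],
    )
    print(json.dumps(policy, indent=1))
```

Calling write_policy(policy) would produce a Test.policy file in the current directory.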
Copy the policy file to the router:
$ scp Test.policy cisco@170.1.1.1:/telemetry/policies
Finally, configure the JSON encoder:
Router# configure
Router(config)#telemetry encoder json
Router(config-telemetry-json)#policy group FirstGroup
Router(config-policy-group)#policy Test
Router(config-policy-group)#destination ipv4 170.1.1.11 port 5555
Router(config-policy-group)#commit
See Cisco IOS XR Telemetry Configuration Guide for details.
Software defined analytics describes how the sFlow architecture disaggregates the flow analytics pipeline and integrates telemetry export to reduce complexity and increase flexibility. The reduced configuration complexity is clearly illustrated by the two configuration examples above.

Unlike the complex and disparate monitoring mechanisms in IOS XR, sFlow offers a simple, flexible and unified monitoring solution that exposes the full monitoring capabilities of the Broadcom Jericho ASIC. Expect a future release of IOS XR to add sFlow support, since sFlow is a natural fit for the hardware capabilities of Jericho based router platforms and adding it would provide feature parity with Cisco's merchant silicon based switches.

Finally, the real-time visibility provided by sFlow supports a number of important use cases for high performance routers, including:
  • DDoS mitigation
  • Load balancing ECMP paths
  • BGP route analytics
  • Traffic engineering
  • Usage based accounting
  • Enforcing usage quotas
