Figure 1: Two-Level Folded CLOS Network Topology Example
Broadcom Trident ASICs are popular in white box, brite-box and branded data center switches from a wide range of vendors, including: Accton, Agema, Alcatel-Lucent, Arista, Cisco, Dell, Edge-Core, Extreme, Hewlett-Packard, IBM, Juniper, Penguin Computing, and Quanta.
Figure 2: OF-DPA Programming Pipeline for ECMP
Broadcom's recently released sFlow specification, sFlow Broadcom Switch ASIC Table Utilization Structures, leverages the industry standard sFlow protocol to offer scalable, multi-vendor, network-wide visibility into the utilization of these hardware tables.
Support for the new extension has just been added to the open source Host sFlow agent, which runs on Cumulus Linux, a Debian-based Linux distribution that supports open switch hardware from Agema, Dell, Edge-Core, Penguin Computing, and Quanta. Hewlett-Packard recently announced that they will soon be selling a new line of open network switches built by Accton Technologies and supporting Cumulus Linux.
The speed with which new features can be delivered on hardware from the wide range of vendors supporting Cumulus Linux is a powerful illustration of the power of open networking. While support for the Broadcom ASIC table extension has been checked into the Host sFlow trunk, it hasn't yet made it into the Cumulus Networks binary repositories. However, Cumulus Linux is an open platform, so users are free to download the sources, then compile and install the latest software version directly from SourceForge.
The following output from the open source sflowtool command line utility shows the raw table measurements (these are in addition to the extensive set of measurements already exported via sFlow on Cumulus Linux):
bcm_asic_host_entries 4
bcm_host_entries_max 8192
bcm_ipv4_entries 0
bcm_ipv4_entries_max 0
bcm_ipv6_entries 0
bcm_ipv6_entries_max 0
bcm_ipv4_ipv6_entries 9
bcm_ipv4_ipv6_entries_max 16284
bcm_long_ipv6_entries 3
bcm_long_ipv6_entries_max 256
bcm_total_routes 10
bcm_total_routes_max 32768
bcm_ecmp_nexthops 0
bcm_ecmp_nexthops_max 2016
bcm_mac_entries 3
bcm_mac_entries_max 32768
bcm_ipv4_neighbors 4
bcm_ipv6_neighbors 0
bcm_ipv4_routes 0
bcm_ipv6_routes 0
bcm_acl_ingress_entries 842
bcm_acl_ingress_entries_max 4096
bcm_acl_ingress_counters 68
bcm_acl_ingress_counters_max 4096
bcm_acl_ingress_meters 18
bcm_acl_ingress_meters_max 8192
bcm_acl_ingress_slices 3
bcm_acl_ingress_slices_max 8
bcm_acl_egress_entries 36
bcm_acl_egress_entries_max 512
bcm_acl_egress_counters 36
bcm_acl_egress_counters_max 1024
bcm_acl_egress_meters 18
bcm_acl_egress_meters_max 512
bcm_acl_egress_slices 2
bcm_acl_egress_slices_max 2

The sflowtool output is useful for troubleshooting and is easy to parse with scripts.
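As an illustration of how little scripting is needed, the following is a minimal sketch in plain JavaScript that turns name/value text like the output above into percentage utilization figures. The function names `parseCounters` and `tableUtilization` are hypothetical helpers written for this example, not part of sflowtool or Host sFlow, and the pairing rule (a counter `X` is matched with `X_max`) is an assumption based on the naming seen in the sample. Counters whose names do not line up that way (the host table in the sample, for instance) and tables reporting a maximum of 0 are simply skipped.

```javascript
// Hypothetical helper: read whitespace-separated "name value" pairs.
// Works whether the pairs are on one line or one pair per line.
function parseCounters(text) {
  var tokens = text.trim().split(/\s+/);
  var counters = {};
  for (var i = 0; i + 1 < tokens.length; i += 2) {
    var value = Number(tokens[i + 1]);
    if (!isNaN(value)) counters[tokens[i]] = value;
  }
  return counters;
}

// Hypothetical helper: for every "<name>_max" counter with a matching
// "<name>" counter, report utilization as a percentage of the maximum.
function tableUtilization(counters) {
  var suffix = '_max';
  var result = {};
  Object.keys(counters).forEach(function (name) {
    if (name.slice(-suffix.length) !== suffix) return;
    var base = name.slice(0, -suffix.length);
    var max = counters[name];
    // Skip unused tables (max of 0) and names without a matching counter.
    if (max > 0 && counters.hasOwnProperty(base)) {
      result[base] = 100 * counters[base] / max;
    }
  });
  return result;
}

// Usage with a few values taken from the sample output.
var sample = 'bcm_acl_ingress_entries 842 bcm_acl_ingress_entries_max 4096 ' +
             'bcm_mac_entries 3 bcm_mac_entries_max 32768';
var utilization = tableUtilization(parseCounters(sample));
Object.keys(utilization).forEach(function (table) {
  console.log(table + ' ' + utilization[table].toFixed(2) + '%');
});
```

With the sample values, the ingress ACL table comes out at roughly 20.6% (842 of 4096 entries), which is the kind of figure a threshold check would act on.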
DevOps
The diagram shows how the sFlow-RT analytics engine is used to deliver metrics and events to cloud-based and on-site DevOps tools, see: Cloud analytics, InfluxDB and Grafana, Cloud Analytics, Metric export to Graphite, and Exporting events using syslog.
For example, the following sFlow-RT application simplifies monitoring of the leaf and spine network by combining measurements from all the switches, identifying the switch with the maximum utilization of each table, pushing the summaries to an operations dashboard every 15 seconds, and sending syslog events immediately when any table reaches 80% utilization:
var network_wide_metrics = [
  'max:bcm_host_utilization',
  'max:bcm_mac_utilization',
  'max:bcm_ipv4_ipv6_utilization',
  'max:bcm_total_routes_utilization',
  'max:bcm_ecmp_nexthops_utilization',
  'max:bcm_acl_ingress_utilization',
  'max:bcm_acl_ingress_meters_utilization',
  'max:bcm_acl_ingress_counters_utilization',
  'max:bcm_acl_egress_utilization',
  'max:bcm_acl_egress_meters_utilization',
  'max:bcm_acl_egress_counters_utilization'
];

var max_utilization = 80;

setIntervalHandler(function() {
  var vals = metric('ALL',network_wide_metrics);
  var graphite_metrics = {};
  for each (var val in vals) {
    if(!val.hasOwnProperty('metricValue')) continue;

    // generate syslog events for over utilized tables
    if(val.metricValue >= max_utilization) {
      var event = {
        "asic_table":val.metricName,
        "utilization":val.metricValue,
        "switchIP":val.agent
      };
      try {
        syslog(
          '10.0.0.1', // syslog collector: splunk>, logstash, etc.
          514,        // syslog port
          16,         // facility = local0
          5,          // severity = notice
          event
        );
      } catch(e) { logWarning("syslog() failed " + e); }
    }

    // add metric to graphite set
    graphite_metrics["network.podA."+val.metricName] = val.metricValue;
  }

  // send metrics to graphite
  try {
    graphite(
      '10.0.0.151', // graphite server
      2003,         // graphite carbon UDP port
      graphite_metrics
    );
  } catch(e) { logWarning("graphite() failed " + e); }
},15);

The following screen capture shows the graphs starting to appear in Graphite:
Real-time traffic analytics
A leaf and spine fabric is challenging to monitor. The fabric spreads traffic across all the switches and links in order to maximize bandwidth. Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.
Scalable traffic measurement is possible because Broadcom ASICs implement hardware support for sFlow monitoring, providing cost effective, line rate visibility that is built into the switches and scales to all port speeds (1G, 10G, 25G, 40G, 50G, 100G, ...) and the high port counts found in large leaf and spine networks.
The 2 minute video provides an overview of some of the performance challenges with leaf and spine fabrics and demonstrates Fabric View, a monitoring solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time visibility into fabric performance. Fabric visibility with Cumulus Linux describes how to set up Fabric View to monitor a Cumulus Linux leaf and spine network.
SDN
Real-time network analytics are a fundamental driver for a number of important SDN use cases, allowing the SDN controller to rapidly detect changes in traffic and respond by applying active controls. SDN fabric controller for commodity data center switches describes how control of the ACL table is the key feature needed to build scalable SDN solutions.
REST API for Cumulus Linux ACLs describes open source software that allows an SDN controller to centrally manage the ACL tables on a large scale network of switches running Cumulus Linux.
The ability to install software on the switches is transformative, giving third party developers and network operators transparent access to the full capabilities of the switch and allowing them to build solutions that efficiently handle automation challenges.
A number of SDN use cases have been demonstrated that build on Cumulus Linux to leverage the real-time visibility and control capabilities of the switch ASIC.
Visit the sFlow.com web site to learn more about SDN control of leaf and spine networks.
Finally, the SDN use cases make extensive use of the ACL table and so this brings us full circle to the importance of the Broadcom sFlow extension providing visibility into the utilization of table resources.
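To make the connection concrete, here is a minimal sketch of how a controller might consult the ingress ACL counters before pushing more rules to a switch. Everything in it is an assumption made for illustration: the helper names `aclHeadroom` and `canInstall` are hypothetical, each rule is assumed to consume exactly one table entry (real ASICs may use more, depending on the match fields and slice layout), and the 20% reserve is simply chosen to mirror the 80% alert threshold used earlier.

```javascript
// Hypothetical helper: number of ingress ACL entries still available
// after holding back a percentage of the table as a safety reserve.
function aclHeadroom(counters, reservePercent) {
  var used = counters.bcm_acl_ingress_entries;
  var max = counters.bcm_acl_ingress_entries_max;
  if (!(max > 0)) return 0; // table size unknown or unsupported
  var reserved = Math.ceil(max * reservePercent / 100);
  return Math.max(0, max - reserved - used);
}

// Hypothetical admission check: assumes one table entry per rule.
function canInstall(counters, ruleCount, reservePercent) {
  return ruleCount <= aclHeadroom(counters, reservePercent);
}

// Usage with the ingress ACL values from the sflowtool sample.
var counters = {
  bcm_acl_ingress_entries: 842,
  bcm_acl_ingress_entries_max: 4096
};
console.log('headroom: ' + aclHeadroom(counters, 20));
console.log('room for 500 rules: ' + canInstall(counters, 500, 20));
```

A check like this lets the controller decline or defer a control action instead of discovering that the table is full only when the install fails.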