Thursday, February 26, 2015

Broadcom ASIC table utilization metrics, DevOps, and SDN

Figure 1: Two-Level Folded CLOS Network Topology Example
Figure 1 from the Broadcom white paper, Engineered Elephant Flows for Boosting Application Performance in Large-Scale CLOS Networks, shows a data center leaf and spine topology. Leaf and spine networks are seeing rapid adoption since they provide the scaleability needed to cost effectively deliver the low latency, high bandwidth interconnect for cloud, big data, and high performance computing workloads.

Broadcom Trident ASICs are popular in white box, brite-box and branded data center switches from a wide range of vendors, including: Accton, Agema, Alcatel-Lucent, Arista, Cisco, Dell, Edge-Core, Extreme, Hewlett-Packard, IBM, Juniper, Penguin Computing, and Quanta.
Figure 2: OF-DPA Programming Pipeline for ECMP
Figure 2 shows the packet processing pipeline of a Broadcom ASIC. The pipeline consists of a number of linked hardware tables providing bridging, routing, access control list (ACL), and ECMP forwarding group functions. Operations teams need to be able to proactively monitor table utilizations in order to avoid performance problems associated with table exhaustion.

Broadcom's recently released sFlow specification, sFlow Broadcom Switch ASIC Table Utilization Structures, leverages the industry standard sFlow protocol to offer scaleable, multi-vendor, network wide visibility into the utilization of these hardware tables.

Support for the new extension has just been added to the open source Host sFlow agent, which runs on Cumulus Linux, a Debian based Linux distribution that supports open switch hardware from Agema, Dell, Edge-Core, Penguin Computing, and Quanta. Hewlett-Packard recently announced that they will soon be selling a new line of open network switches built by Accton Technologies and supporting Cumulus Linux.
The speed with which this new feature can be delivered on hardware from the wide range of vendors supporting Cumulus Linux is a powerful illustration of the benefits of open networking. While support for the Broadcom ASIC table extension has been checked into the Host sFlow trunk, it hasn't yet made it into the Cumulus Networks binary repositories. However, Cumulus Linux is an open platform, so users are free to download sources, then compile and install the latest software version direct from SourceForge.
The following output from the open source sflowtool command line utility shows the raw table measurements (these are in addition to the extensive set of measurements already exported via sFlow on Cumulus Linux):
bcm_asic_host_entries 4
bcm_host_entries_max 8192
bcm_ipv4_entries 0
bcm_ipv4_entries_max 0
bcm_ipv6_entries 0
bcm_ipv6_entries_max 0
bcm_ipv4_ipv6_entries 9
bcm_ipv4_ipv6_entries_max 16284
bcm_long_ipv6_entries 3
bcm_long_ipv6_entries_max 256
bcm_total_routes 10
bcm_total_routes_max 32768
bcm_ecmp_nexthops 0
bcm_ecmp_nexthops_max 2016
bcm_mac_entries 3
bcm_mac_entries_max 32768
bcm_ipv4_neighbors 4
bcm_ipv6_neighbors 0
bcm_ipv4_routes 0
bcm_ipv6_routes 0
bcm_acl_ingress_entries 842
bcm_acl_ingress_entries_max 4096
bcm_acl_ingress_counters 68
bcm_acl_ingress_counters_max 4096
bcm_acl_ingress_meters 18
bcm_acl_ingress_meters_max 8192
bcm_acl_ingress_slices 3
bcm_acl_ingress_slices_max 8
bcm_acl_egress_entries 36
bcm_acl_egress_entries_max 512
bcm_acl_egress_counters 36
bcm_acl_egress_counters_max 1024
bcm_acl_egress_meters 18
bcm_acl_egress_meters_max 512
bcm_acl_egress_slices 2
bcm_acl_egress_slices_max 2
The sflowtool output is useful for troubleshooting and is easy to parse with scripts.
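As a rough illustration of how such a script might look, the sketch below parses name/value lines like those shown above and computes a percentage for each counter that has a matching `_max` entry. The function name `tableUtilization` and the pairing rule (append `_max` to the counter name) are assumptions made for this example, not part of sflowtool. Counters whose maximum uses a different name, such as bcm_asic_host_entries and bcm_host_entries_max in the output above, would need to be mapped by hand.

```javascript
// Parse "name value" lines and compute percent utilization for every
// counter that has a matching "<name>_max" entry with a non-zero maximum.
function tableUtilization(text) {
  const values = {};
  for (const line of text.split('\n')) {
    const parts = line.trim().split(/\s+/);
    if (parts.length !== 2) continue;          // skip blank or malformed lines
    const n = Number(parts[1]);
    if (Number.isFinite(n)) values[parts[0]] = n;
  }
  const utilization = {};
  for (const name of Object.keys(values)) {
    const max = values[name + '_max'];
    if (max === undefined || max === 0) continue; // no maximum, or table unused
    utilization[name] = 100 * values[name] / max;
  }
  return utilization;
}

// A few lines taken from the sflowtool output above
const sample = [
  'bcm_acl_ingress_entries 842',
  'bcm_acl_ingress_entries_max 4096',
  'bcm_acl_egress_slices 2',
  'bcm_acl_egress_slices_max 2',
  'bcm_ipv4_entries 0',
  'bcm_ipv4_entries_max 0'
].join('\n');

const result = tableUtilization(sample);
console.log(result);
```

In this sample the egress ACL slices are already fully used (2 of 2), which is the kind of condition that is easy to miss without a utilization view.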


The diagram shows how the sFlow-RT analytics engine is used to deliver metrics and events to cloud based and on-site DevOps tools, see: Cloud analytics, InfluxDB and Grafana, Cloud Analytics, Metric export to Graphite, and Exporting events using syslog.

For example, the following sFlow-RT application simplifies monitoring of the leaf and spine network by combining measurements from all the switches, identifying the switch with the maximum utilization of each table, pushing the summaries to an operations dashboard every 15 seconds, and sending syslog events immediately when any table exceeds 80% utilization:
var network_wide_metrics = [
  // names of the network-wide table utilization metrics to track
];

var max_utilization = 80;

setIntervalHandler(function() {
  var vals = metric('ALL',network_wide_metrics);
  var graphite_metrics = {};
  for each (var val in vals) {
    if(!val.hasOwnProperty('metricValue')) continue;

    // generate syslog events for over utilized tables
    if(val.metricValue >= max_utilization) {
      var event = {
        // fields describing the over utilized table
      };
      try {
        syslog(
          '',         // syslog collector: splunk>, logstash, etc.
          514,        // syslog port
          16,         // facility = local0
          5,          // severity = notice
          event
        );
      } catch(e) { logWarning("syslog() failed " + e); }
    }

    // add metric to graphite set
    graphite_metrics["network.podA."+val.metricName] = val.metricValue;
  }

  // send metrics to graphite
  try {
    graphite(
      '',            // graphite server
      2003,          // graphite carbon UDP port
      graphite_metrics
    );
  } catch(e) { logWarning("graphite() failed " + e); }
}, 15);
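The summarization step in the script, finding the switch with the highest utilization of each table and flagging anything at or above the threshold, can be sketched in plain JavaScript outside of sFlow-RT. The switch names, table names, and readings below are made up for illustration, and the helper functions `networkWideMax` and `overThreshold` are hypothetical; inside sFlow-RT this work is done by the metric() call.

```javascript
// Hypothetical per-switch utilization readings (percent), keyed by switch name
const readings = {
  leaf1:  { acl_ingress: 21, host: 3 },
  leaf2:  { acl_ingress: 85, host: 7 },
  spine1: { acl_ingress: 12, host: 1 }
};

// For each table, keep the highest utilization seen and the switch reporting it
function networkWideMax(readings) {
  const summary = {};
  for (const [agent, tables] of Object.entries(readings)) {
    for (const [table, value] of Object.entries(tables)) {
      if (!(table in summary) || value > summary[table].value) {
        summary[table] = { value: value, agent: agent };
      }
    }
  }
  return summary;
}

// Names of the tables whose network-wide maximum is at or above the threshold
function overThreshold(summary, threshold) {
  return Object.keys(summary).filter(t => summary[t].value >= threshold);
}

const summary = networkWideMax(readings);
const alerts = overThreshold(summary, 80);
console.log(summary, alerts);
```

Reducing the readings to one value per table means the dashboard needs only a handful of series for the whole pod, while the recorded switch name still points the operator at the device to investigate.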
The following screen capture shows the graphs starting to appear in Graphite:

Real-time traffic analytics

The table utilization metrics are only a part of the visibility that sFlow provides into the performance of a leaf and spine network.

A leaf and spine fabric is challenging to monitor. The fabric spreads traffic across all the switches and links in order to maximize bandwidth. Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.
Scaleable traffic measurement is possible because Broadcom ASICs implement hardware support for sFlow monitoring, providing cost effective, line rate visibility that is built into the switches and scales to all port speeds (1G, 10G, 25G, 40G, 50G, 100G, ...) and the high port counts found in large leaf and spine networks.
The 2 minute video provides an overview of some of the performance challenges with leaf and spine fabrics and demonstrates Fabric View - a monitoring solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time visibility into fabric performance. Fabric visibility with Cumulus Linux describes how to set up Fabric View to monitor a Cumulus Linux leaf and spine network.


Real-time network analytics are a fundamental driver for a number of important SDN use cases, allowing the SDN controller to rapidly detect changes in traffic and respond by applying active controls. SDN fabric controller for commodity data center switches describes how control of the ACL table is the key feature needed to build scaleable SDN solutions.

REST API for Cumulus Linux ACLs describes open source software to allow an SDN controller to centrally manage the ACL tables on a large scale network of switches running Cumulus Linux.
The ability to install software on the switches is transformative, giving third party developers and network operators transparent access to the full capabilities of the switch and allowing them to build solutions that efficiently handle automation challenges.
A number of SDN use cases have been demonstrated that build on Cumulus Linux to leverage the real-time visibility and control capabilities of the switch ASIC:
Visit the web site to learn more about SDN control of leaf and spine networks.

Finally, the SDN use cases make extensive use of the ACL table and so this brings us full circle to the importance of the Broadcom sFlow extension providing visibility into the utilization of table resources.

Thursday, February 5, 2015

Cloud analytics

Librato is an example of a cloud based analytics service (now part of SolarWinds). Librato provides an easy to use REST API for pushing metrics into their cloud service. The web portal makes it simple to combine and trend data and build and share dashboards.

This article describes a proof of concept demonstrating how Librato's cloud service can be used to cost effectively monitor large scale cloud infrastructure by leveraging standard sFlow instrumentation. Librato offers a free 30 day trial, making it easy to evaluate solutions based on this demonstration.
The diagram shows the measurement pipeline. Standard sFlow measurements from hosts, hypervisors, virtual machines, containers, load balancers, web servers and network switches stream to the sFlow-RT real-time analytics engine. Metrics are pushed from sFlow-RT to Librato using the REST API.

Over 40 vendors implement the sFlow standard and compatible products are listed on The open source Host sFlow agent exports standard sFlow metrics from hosts. For additional background, the Velocity conference talk provides an introduction to sFlow and a case study from a large social networking site.

Librato's service is priced based on the number of data points that they need to store. For example, a Host sFlow agent reports approximately 50 measurements per node. Collecting all the measurements from a cluster of 100 servers would generate 5000 metrics and cost $1,000 per month if metrics are stored at 15 second intervals.
There are important scaleability and cost advantages to placing the sFlow-RT analytics engine in front of the metrics collection service. For example, in large scale cloud environments the metrics for each member of a dynamic pool aren't necessarily worth trending since virtual machines are frequently added and removed. Instead, sFlow-RT tracks all the members of the pool, calculates summary statistics for the pool, and logs the summary statistics. This pre-processing can significantly reduce storage requirements, reducing costs and increasing query performance. The sFlow-RT analytics software also calculates traffic flow metrics, hot/missed Memcache keys, and top URLs, exports events via syslog to Splunk, Logstash, etc., and provides access to detailed metrics through its REST API.
The following steps were involved in setting up the proof of concept.

First register for free trial at

Find or build a server with Java 1.7+ and install sFlow-RT:
tar -xvzf sflow-rt.tar.gz
cd sflow-rt
Edit the init.js script and add the following lines (modifying the user and token from your Librato account):
var url = "";
var user = "";
var token = "55add91c806fb5f634ad1a334789a32e8d10a597815e6865aa84f0749324450e";

setIntervalHandler(function() {
  var metrics = ['min:load_one','q1:load_one','med:load_one',
                 'q3:load_one','max:load_one'];
  var vals = metric('ALL',metrics,{os_name:['linux']});
  var gauges = {};
  for each (var val in vals) {
     gauges[val.metricName] = {
       "value": val.metricValue,
       "source": "Linux_Pool"
     };
  }
  var body = {"gauges":gauges};
  http(url,'post', 'application/json', JSON.stringify(body), user, token);
} , 15);
Now start sFlow-RT:
Cluster performance metrics describes the summary metrics that sFlow-RT can calculate. In this case, the load average minimum, maximum, and quartiles for the cluster are being calculated and pushed to Librato every 15 seconds.
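To make the summary statistics concrete, here is a minimal sketch in plain JavaScript of reducing a pool's load averages to the five values pushed to Librato. The `quantile` and `poolSummary` helpers are hypothetical, the sample values are invented, and the linear interpolation used for the quartiles is an assumption; sFlow-RT's own quartile calculation may differ. The sketch also assumes the pool has at least one member.

```javascript
// Quantile of an ascending-sorted array using linear interpolation
// (an assumed method, chosen for simplicity)
function quantile(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Reduce per-host values to five summary gauges, named in the same
// "statistic:metric" style used by the script above
function poolSummary(name, values) {
  const sorted = values.slice().sort((a, b) => a - b);
  return {
    ['min:' + name]: sorted[0],
    ['q1:' + name]: quantile(sorted, 0.25),
    ['med:' + name]: quantile(sorted, 0.5),
    ['q3:' + name]: quantile(sorted, 0.75),
    ['max:' + name]: sorted[sorted.length - 1]
  };
}

// Invented one-minute load averages from five hosts in the pool
const summary = poolSummary('load_one', [0.5, 0.1, 0.9, 0.3, 0.7]);
console.log(summary);
```

However many hosts join or leave the pool, the output stays at five gauges, which is what keeps the number of stored series constant.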

Install Host sFlow agents on the physical or virtual machines in your cluster and direct them to send metrics to the sFlow-RT host. The installation steps can be easily automated using orchestration tools like Puppet, Chef, Ansible, etc.

Physical and virtual switches in the cluster can be configured to send sFlow to sFlow-RT in order to add traffic metrics to the mix, exporting metrics that characterize traffic between service tiers etc. However, in public cloud environments, traffic flow information is typically not available. The articles, Amazon Elastic Compute Cloud (EC2) and Rackspace cloudservers describe how Host sFlow agents can be configured to monitor traffic between virtual machines in the cloud.
Metrics should start appearing in Librato as soon as the Host sFlow agents are started.

In this example, sFlow-RT is exporting 5 metrics to summarize the cluster performance, reducing the total monthly cost of monitoring the cluster from $1,000 to $1. Of course there are likely to be more metrics that you will want to track, but the ability to selectively log high value metrics provides a way to control costs and maximize benefits.
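The cost comparison can be checked with a few lines of arithmetic. The per-metric price below is not a published Librato rate; it is simply what the figures in this article imply ($1,000 per month for 5000 metrics at 15 second intervals), and the variable names are chosen for this example.

```javascript
// Figures taken from the article
const nodes = 100;
const metricsPerNode = 50;
const rawMonthlyCost = 1000;   // dollars per month for all raw metrics
const summaryMetrics = 5;      // min, q1, med, q3, max of load_one

// Derived values
const rawMetrics = nodes * metricsPerNode;               // metrics without pre-processing
const impliedCostPerMetric = rawMonthlyCost / rawMetrics; // dollars per metric per month
const summaryMonthlyCost = summaryMetrics * impliedCostPerMetric;
const reductionFactor = rawMetrics / summaryMetrics;

console.log({ rawMetrics, impliedCostPerMetric, summaryMonthlyCost, reductionFactor });
```

The same calculation gives a quick estimate of what each additional summary metric would add to the monthly bill under these assumptions.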