Saturday, April 2, 2016

Internet Exchange (IX) Metrics

IX Metrics has been released on GitHub, https://github.com/sflow-rt/ix-metrics. The application provides real-time monitoring of traffic between members in an Internet Exchange (IX).

Close monitoring of exchange traffic is critical to operations:
  1. Ensure that there is sufficient capacity to accommodate new and existing members.
  2. Ensure that all traffic sources are accounted for and that there are no unauthorized connections.
  3. Ensure that only allowed traffic types are present.
  4. Ensure that non-unicast traffic is strictly controlled.
  5. Ensure that packet size policies are enforced to avoid loss due to MTU mismatches.
IX Metrics imports information about exchange members using the IX Member List JSON Schema. The member information is used to create traffic analytics and traffic is checked against the schema to identify errors, for example, if a member is using a MAC address that isn't listed.
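For example, an abbreviated member entry might look like the following sketch (the field set is illustrative, and the AS number, IP address, and MAC address are documentation placeholders; consult the published schema for the authoritative structure):
{
  "member_list": [
    {
      "asnum": 64500,
      "name": "Example Network",
      "connection_list": [
        {
          "vlan_list": [
            {
              "ipv4": {
                "address": "203.0.113.10",
                "mac_addresses": ["00:00:5e:00:53:01"]
              }
            }
          ]
        }
      ]
    }
  ]
}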

The measurements from the exchange infrastructure are useful to members since they make it easy to see how much traffic members are exchanging with each other through their peering relationships. This information is easy to collect using the exchange infrastructure, but much harder for members to determine independently.

The sFlow standard has long been a popular method of monitoring exchanges for a number of reasons:
  1. sFlow instrumentation is built into high capacity data center switches used in exchanges.
  2. Exchanges handle large amounts of traffic, many Terabits per second in the largest exchanges, and sFlow instrumentation has the scalability to monitor the entire infrastructure.
  3. Exchanges typically operate at layer 2 (i.e. traffic is switched, not routed, between members) and sFlow provides detailed visibility into layer 2 traffic, including: packet sizes, Ethernet protocols, MAC addresses, VLANs, etc.
Leveraging the real-time analytics capabilities of sFlow-RT allows the IX Metrics application to provide up-to-the-second visibility into exchange traffic. The IX Metrics application can also export metrics to InfluxDB to support operations and member-facing dashboards (built using Grafana, for example). In addition, export of notifications via syslog supports integration with Security Information and Event Management (SIEM) tools like Logstash.

IX Metrics is open source software that can easily be modified to integrate with other tools, and additional applications can be installed on the sFlow-RT platform alongside the IX Metrics application.

Wednesday, March 16, 2016

Network and system analytics as a Docker microservice

Microservices describes why the industry standard sFlow instrumentation embedded within cloud infrastructure is uniquely able to provide visibility into microservice deployments.

The sFlow-RT analytics engine is well suited to deployment as a Docker microservice since the application is stateless and presents network and system analytics as a RESTful service.

The following steps demonstrate how to create a containerized deployment of sFlow-RT.

First, create a directory for the project and edit the Dockerfile:
mkdir sflow-rt
cd sflow-rt
vi Dockerfile
Add the following contents to Dockerfile:
# Base image with a Java 7 runtime for sFlow-RT
FROM   centos:centos6
RUN    yum install -y java-1.7.0-openjdk
# Install the sFlow-RT analytics engine
RUN    rpm -i http://www.inmon.com/products/sFlow-RT/sflow-rt-2.0-1072.noarch.rpm
# 8008 = HTTP/REST API, 6343/udp = sFlow telemetry
EXPOSE 8008 6343/udp
# Start the service and keep the container in the foreground
CMD    /etc/init.d/sflow-rt start && tail -f /dev/null
Build the project:
docker build -t sflow-rt .
Run the service:
docker run -p 8008:8008 -p 6343:6343/udp -d sflow-rt
Access the API at http://docker_host:8008/ to verify that the service is running.
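For example, the following command should return the installed software version, confirming that the REST API is up (replace docker_host with the address of your Docker host):
curl http://docker_host:8008/version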

Update July 22, 2016: sFlow-RT can now be run from Docker Hub, see sflow/sflow-rt for instructions.

Now configure sFlow agents to send data to the docker_host on port 6343:
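For example, a minimal Host sFlow (hsflowd) configuration sketch, assuming the Docker host's address is 10.0.0.1:
sflow {
  # sFlow datagrams are sent to the collector on the default port (6343)
  collector { ip = 10.0.0.1 }
}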
A number of articles on this blog provide examples of using the sFlow-RT REST API.
The diagram shows how new and existing cloud based or locally hosted orchestration, operations, and security tools can leverage sFlow-RT's analytics service to gain real-time visibility. The solution is extremely scalable: a single sFlow-RT instance can monitor thousands of servers and the network devices connecting them.

Saturday, March 12, 2016

Microservices

Figure 1: Visibility and the software defined data center
In the land of microservices, the network is the king(maker) by Sudip Chakrabarti, Lightspeed Venture Partners, makes the case that visibility into network traffic is the key to monitoring, managing and securing applications that are composed of large numbers of communicating services running in virtual machines or containers.
While I genuinely believe that the network will play an immensely strategic role in the microservices world, inspecting and storing billions of API calls on a daily basis will require significant computing and storage resources. In addition, deep packet inspection could be challenging at line rates; so, sampling, at the expense of full visibility, might be an alternative. Finally, network traffic analysis must be combined with service-level telemetry data (that we already collect today) in order to get a comprehensive and in-depth picture of the distributed application.
Sampling isn't just an alternative; sampling is the key to making large scale microservice visibility a reality. Shrink ray describes how sampling acts as a scaling function, reducing the task of monitoring large scale microservice infrastructure from an intractable measurement and big data problem to a lightweight, real-time, data center wide visibility solution for monitoring, managing, optimizing and securing the infrastructure.
Figure 2: sFlow Host Structures
Industry standard sFlow is the multi-vendor method for distributed sampling of network traffic. The sFlow standard is model based - models of entities such as interfaces, switches, routers, forwarding state, hosts, virtual machines, messages, etc. are used to define standard measurements that describe their operation. Standardized measurements embedded within the infrastructure ensure consistent reporting that is independent of the specific vendors and application stacks deployed in the data center. Push vs Pull describes how sFlow's push based streaming telemetry addresses the challenge of monitoring large scale cloud environments where services and hosts are constantly being added, removed, started and stopped. In addition, sFlow Host Structures describes how the data model allows telemetry streams from independent sources in network, server and application entities to be combined at the sFlow receiver to provide end to end visibility into the microservice interactions and the compute and networking services on which they depend.

The challenge in delivering network visibility to microservice management tools is not technical - the solution is fully deployable today:
  • Applications - e.g. Apache, NGINX, Tomcat, HAproxy, ADC (F5, A10, ..), Memcache, ...
  • Virtual Servers - e.g. Xen, Hyper-V, KVM, Docker, JVM, ...
  • Virtual Network - e.g. Open vSwitch, Linux Bridge, macvlan, ...
  • Servers - e.g. Linux, Windows, FreeBSD, Solaris, AIX
  • Network - e.g. Cisco Nexus 9k/3k, Arista, Juniper QFX/EX, Dell, HPE, Brocade, Cumulus, Big Switch, Pica8, Quanta, ... – visit sFlow.org for a complete list
Network, system and application teams working together can enable sFlow instrumentation that is already embedded throughout the infrastructure to achieve comprehensive visibility into microservice interactions.
Incorporating sFlow analytics into the microservices architecture is straightforward. The sFlow-RT analytics engine processes the raw telemetry streams, combines data using the data model, and delivers visibility as a REST based microservice that is easily consumed by new and existing cloud based or locally hosted orchestration, operations, and security tools.
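For example, a single REST call retrieves a metric summarized across all monitored hosts (the sflow-rt hostname is a placeholder for wherever the analytics engine is running):
curl http://sflow-rt:8008/metric/ALL/sum:load_one/json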

Saturday, February 27, 2016

Open vSwitch version 2.5 released

The recent Open vSwitch version 2.5 release includes significant network virtualization enhancements:
   - sFlow agent now reports tunnel and MPLS structures.
   ...
   - Add experimental version of OVN.  OVN, the Open Virtual Network, is a
     system to support virtual network abstraction.  OVN complements the
     existing capabilities of OVS to add native support for virtual network
     abstractions, such as virtual L2 and L3 overlays and security groups.
The sFlow Tunnel Structures specification enhances visibility into network virtualization by capturing encapsulation / decapsulation actions performed by tunnel end points. In many network virtualization implementations, VXLAN, GRE, and Geneve tunnels are terminated in Open vSwitch, so the new feature has broad application.

The second related feature is the inclusion of the Open Virtual Network (OVN), providing a simple method of building virtual networks for OpenStack and Docker.

A number of articles on this blog provide additional background.

Friday, February 26, 2016

Linux bridge, macvlan, ipvlan, adapters

The open source Host sFlow project added a feature to efficiently monitor traffic on Linux host network interfaces: network adapters, Linux bridge, macvlan, ipvlan, etc. Implementation of high performance sFlow traffic monitoring is made possible by the inclusion of random packet sampling support in the Berkeley Packet Filter (BPF) implementation in recent Linux kernels (3.19 or later).

In addition to the new BPF capability, hsflowd has a couple of other ways to monitor traffic:
  • iptables: add a statistic rule to the iptables firewall to sample packets (see the example after this list)
  • Open vSwitch: has built-in sFlow instrumentation that can be configured by hsflowd.
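For example, the following hypothetical rule randomly samples roughly 1 in 400 packets to an NFLOG channel; the group number and sampling probability shown are placeholders that must match the corresponding settings in hsflowd.conf:
sudo iptables -I INPUT -m statistic --mode random --probability 0.0025 \
  -j NFLOG --nflog-group 5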
The BPF sampling mechanism is less complex to configure than iptables and can be used to monitor any Linux network device, including: network adapters (e.g. eth0) and the Linux bridge (e.g. docker0). Monitoring a network adapter also provides visibility into lightweight macvlan and ipvlan network virtualization technologies that are likely to become more prevalent in the Linux container ecosystem, see Using Docker with macvlan Interfaces.

The following commands build and install hsflowd on an Ubuntu 14.04 host:
sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install libpcap-dev
sudo apt-get install git
git clone https://github.com/sflow/host-sflow
cd host-sflow
make
sudo make install
Installing Host sFlow on a Linux server provides basic instructions for configuring the Host sFlow agent (hsflowd). To monitor traffic on the host, edit the /etc/hsflowd.conf file to configure the sFlow collector and enable packet sampling on eth0:
pcap { dev = eth0 }
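For example, a minimal complete configuration might look like the following, where the collector address 10.0.0.1 is a placeholder for the sFlow analyzer:
sflow {
  collector { ip = 10.0.0.1 }
  pcap { dev = eth0 }
}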
Now start the daemon:
sudo hsflowd start
At this point, packets traversing eth0 will be sampled and sent as part of the standard sFlow telemetry stream to an sFlow analyzer. For example, using sFlow-RT with the top-flows application as the sFlow analyzer produces a real-time table of the largest traffic flows.
There are numerous server monitoring agents available in the open source community that export host statistics (CPU, memory, disk) similar to those of the Host sFlow agent. Host sFlow differs by also including network traffic visibility, using the same packet sampling mechanism supported by most data center switches. Significant advances are extending visibility into the physical network; for example, Broadcom BroadView Instrumentation tracks buffer utilization and microbursts that affect application performance.
A common standard for monitoring physical and virtual network and server infrastructure reduces operational complexity. Visibility into network activity is critical to understanding the performance of scale out applications that drive large amounts of East-West traffic. Host sFlow, along with support for sFlow in the physical network, delivers scalable data center wide telemetry to SDN and DevOps tools so that they can better orchestrate the allocation of resources to maximize performance and reduce costs.

Sunday, February 21, 2016

CloudFlare DDoS Mitigation Pipeline

The Usenix Enigma 2016 talk from Marek Majkowski describes CloudFlare's automated DDoS mitigation solution. CloudFlare provides reverse proxy services for millions of web sites and their customers are frequently targets of DDoS attacks. The talk is well worth watching in its entirety to learn about their experiences.
Network switches stream standard sFlow data to CloudFlare's "Gatebot" Reactive Automation component, which analyzes the data to identify attack vectors. Berkeley Packet Filter (BPF) rules are constructed to target specific attacks and apply customer specific mitigation policies. The rules are automatically installed in iptables firewalls on the CloudFlare servers.
The chart shows that over a three month period CloudFlare's mitigation system handled between 30 and 300 attacks per day.
Attack volumes mitigated regularly hit 100 million packets per second and reach peaks of over 150 million packets per second. These large attacks can cause significant damage and automated mitigation is critical to reducing their impact.

Elements of the CloudFlare solution are readily accessible to anyone interested in building DDoS mitigation solutions. Industry standard sFlow instrumentation is widely supported by switch vendors. Download sFlow-RT analytics software and combine real-time DDoS detection with business policies to automate mitigation actions. A number of DDoS mitigation examples described on this blog provide a useful starting point for implementing an automated DDoS mitigation solution.
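As a starting point, the following sFlow-RT script is a minimal sketch of the detection side (the flow definition, threshold value, and names are illustrative, not CloudFlare's actual configuration):
// define a flow signature: frames per second by destination IP, UDP traffic only
setFlow('udp_flood', {keys:'ipdestination', value:'frames', filter:'ipprotocol=17'});

// raise an event when any single destination exceeds 100,000 packets per second
setThreshold('ddos_udp', {metric:'udp_flood', value:100000, byFlow:true, timeout:2});

setEventHandler(function(evt) {
  // evt.flowKey is the targeted address; mitigation logic, e.g. constructing
  // a BPF rule and installing it in iptables, would be invoked here
  logInfo("DDoS mitigation triggered for " + evt.flowKey);
}, ['ddos_udp']);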

Monday, February 1, 2016

SignalFx

SignalFx is an example of a cloud based analytics service. SignalFx provides a REST API for uploading metrics and a web portal that makes it simple to combine and trend data and to build and share dashboards.

This article describes a proof of concept demonstrating how SignalFx's cloud service can be used to cost effectively monitor large scale cloud infrastructure by leveraging standard sFlow instrumentation. SignalFx offers a free 14 day trial, making it easy to evaluate solutions based on this demonstration.

The diagram shows the measurement pipeline. Standard sFlow measurements from hosts, hypervisors, virtual machines, containers, load balancers, web servers and network switches stream to the sFlow-RT real-time analytics engine. Metrics are pushed from sFlow-RT to SignalFx using the REST API.

Over 40 vendors implement the sFlow standard and compatible products are listed on sFlow.org. The open source Host sFlow agent exports standard sFlow metrics from hosts, virtual machines, containers and local services. For additional background, the Velocity conference talk provides an introduction to sFlow and a case study from a large social networking site.

SignalFx's service is priced based on the number of data points that they need to store and they estimate a cost of $15 per host per month to record comprehensive host statistics at 10 second granularity. Collecting metrics from a cluster of 1,000 hosts would cost as much as $15,000 per month.
There are important scalability and cost advantages to placing the sFlow-RT analytics engine in front of the metrics collection service. For example, in large scale cloud environments the metrics for each member of a dynamic pool aren't necessarily worth trending since virtual machines are frequently added and removed. Instead, sFlow-RT tracks all the members of the pool, calculates summary statistics for the pool, and logs the summary statistics. This pre-processing can significantly reduce storage requirements, reducing costs and increasing query performance. The sFlow-RT analytics software also calculates traffic flow metrics, hot/missed Memcache keys, and top URLs; exports events via syslog to Splunk, Logstash, etc.; and provides access to detailed metrics through its REST API.
The following steps were involved in setting up the proof of concept.

First register for free trial at SignalFx.com.

Download and install sFlow-RT.

Create a signalfx.js script in the sFlow-RT home directory with the following lines (use the token from your SignalFx account):
var url = "https://ingest.signalfx.com/v2/datapoint";
var token = "YOUR_APP_API_TOKEN";

setIntervalHandler(function() {
  // summary statistics to calculate across the Linux hosts in the cluster
  var metrics = ['min:load_one','q1:load_one','med:load_one',
                 'q3:load_one','max:load_one'];
  var vals = metric('ALL',metrics,{os_name:['linux']});
  var gauges = [];
  for each (var val in vals) {
     gauges.push({
       // replace characters that SignalFx may not accept in metric names,
       // e.g. the ':' in 'min:load_one' becomes '_'
       metric: val.metricName.replace(/[^a-zA-Z0-9_]/g,"_"),
       dimensions:{cluster:"Linux"},
       value: val.metricValue
     });
  }
  var body = {"gauge":gauges};
  var req = {
    url:url,
    operation:'post',
    headers: {
      'Content-Type':'application/json',
      'X-SF-TOKEN':token
    },
    body: JSON.stringify(body)
  };
  // POST the data points to SignalFx, logging a warning on failure
  try { http2(req); }
  catch(e) { logWarning("metric upload failed " + e); }
}, 10);
Add the following sFlow-RT configuration entry to load the script:
script.file=signalfx.js
Now start sFlow-RT. Cluster performance metrics describes the summary metrics that sFlow-RT can calculate. In this case, the load average minimum, maximum, and quartiles for the cluster are calculated and pushed to SignalFx every 10 seconds.

Install Host sFlow agents on the physical or virtual machines in your cluster and direct them to send metrics to the sFlow-RT host. The installation steps can be easily automated using orchestration tools like Puppet, Chef, Ansible, etc.

Physical and virtual switches in the cluster can be configured to send sFlow to sFlow-RT in order to add traffic metrics to the mix, exporting metrics that characterize traffic between service tiers, etc. However, in public cloud environments, traffic flow information is typically not available. The articles Amazon Elastic Compute Cloud (EC2) and Rackspace cloudservers describe how Host sFlow agents can be configured to monitor traffic between virtual machines in the cloud.
Metrics should start appearing in SignalFx as soon as the Host sFlow agents are started.

In this example, sFlow-RT is exporting 5 metrics to summarize the cluster performance, reducing the total monthly cost of monitoring the 1,000 host cluster to less than $15 per month. Of course there are likely to be more metrics that you will want to track, but the ability to selectively log high value metrics provides a way to control costs and maximize benefits.

If you are managing physical infrastructure then sFlow provides a simple way to incorporate network telemetry. For example, add the following metrics to the script to summarize network health (see the sketch after this list):
  • max:ifinutilization
  • max:ifoututilization
  • sum:ifindiscards
  • sum:ifinerrors
  • sum:ifoutdiscards
  • sum:ifouterrors
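A sketch of the extended metrics array in signalfx.js (note that the os_name filter in the original interval handler may need to be relaxed so that metrics from switch agents are included):
var metrics = ['min:load_one','q1:load_one','med:load_one',
               'q3:load_one','max:load_one',
               'max:ifinutilization','max:ifoututilization',
               'sum:ifindiscards','sum:ifinerrors',
               'sum:ifoutdiscards','sum:ifouterrors'];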
A network connecting 1,000 physical hosts would have considerably more than 1,000 switch ports and summarizing the per port statistics greatly reduces the cost of monitoring the network. For a catalog of network, host, and application metrics, see Metrics.