Thursday, June 26, 2014

Docker performance monitoring

IT’S HERE: DOCKER 1.0 announces the first production release of the Docker Linux container platform. Docker is seeing explosive growth and has already been embraced by IBM, Red Hat and Rackspace. Today the open source Host sFlow project released support for Docker, exporting standard sFlow performance metrics for Linux containers and unifying Linux containers with the broader sFlow ecosystem.
Visibility and the software defined data center
Host sFlow Docker support simplifies data center performance management by unifying monitoring of Linux containers with monitoring of virtual machines (Hyper-V, KVM/libvirt, Xen/XCP/XenServer), virtual switches (Open vSwitch, Hyper-V Virtual Switch, IBM Distributed Virtual Switch, HP FlexFabric Virtual Switch), servers (Linux, Windows, Solaris, AIX, FreeBSD), and physical networks (over 40 vendors, including: A10, Alcatel-Lucent, Arista, Brocade, Cisco, Cumulus, Extreme, F5, Hewlett-Packard, Hitachi, Huawei, IBM, Juniper, Mellanox, NEC, ZTE). In addition, standardizing metrics allows measurements to be shared among different tools, further reducing operational complexity.


The Visibility and the software defined data center talk provides additional background on the sFlow standard, along with case studies. The remainder of this article describes how to use Host sFlow to monitor a Docker server pool.

First, download, compile and install the Host sFlow agent on a Docker host (Note: The agent needs to be built from sources since Docker support is currently in the development branch):
svn checkout http://svn.code.sf.net/p/host-sflow/code/trunk host-sflow-code
cd host-sflow-code
make DOCKER=yes
make install
make schedule
service hsflowd start
Next, if SELinux is enabled, run the following commands to allow Host sFlow to retrieve network stats (or disable SELinux):
audit2allow -a -M hsflowd
semodule -i hsflowd.pp
See Installing Host sFlow on a Linux server for additional information on configuring the agent.
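For example, a minimal /etc/hsflowd.conf (a sketch assuming DNS-SD is not being used, with 10.0.0.1 as a placeholder for the address of the sFlow analyzer) could look like this:
sflow {
  DNSSD = off
  collector {
    ip = 10.0.0.1
  }
}
The same configuration file format is used on every platform the agent supports, including the Cumulus Linux switches described later in this article.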


The slide presentation describes how Docker can be used with Open vSwitch to create virtual networks connecting containers. In addition to providing advanced SDN capabilities, Open vSwitch includes sFlow instrumentation, delivering detailed visibility into network traffic between containers and to the outside network.

The Host sFlow agent makes it easy to enable sFlow on Open vSwitch. Simply enable the sflowovsd daemon and the Host sFlow configuration settings will be automatically applied to the Open vSwitch.
service sflowovsd start
There are a number of tools that consume and report on sFlow data, and these should be able to report on Docker containers since the metrics being reported are the same standard set reported for virtual machines; several examples can be found on this blog.
Looking at the big picture, the comprehensive visibility of sFlow combined with the agility of SDN and Docker lays the foundation for optimized workload placement, resource allocation, and scaling by the orchestration system, maximizing the utility of the physical network, storage and compute infrastructure.

Tuesday, June 24, 2014

Microsoft Office 365 outage

6/24/2014 Information Week - Microsoft Exchange Online Suffers Service Outage, "Service disruptions with Microsoft's Exchange Online left many companies with no email on Tuesday."

The following entry on the Microsoft Office 365 community forum describes the incident:
====================================

Closure Summary: On Tuesday, June 24, 2014, at approximately 1:11 PM UTC, engineers received reports of an issue in which some customers were unable to access the Exchange Online service. Investigation determined that a portion of the networking infrastructure entered into a degraded state. Engineers made configuration changes on the affected capacity to remediate end-user impact. The issue was successfully fixed on Tuesday, June 24, 2014, at 9:50 PM UTC.

Customer Impact: Affected customers were unable to access the Exchange Online service.

Incident Start Time: Tuesday, June 24, 2014, at 1:11 PM UTC

Incident End Time: Tuesday, June 24, 2014, at 9:50 PM UTC

=====================================
The closure summary shows that operators took 8 hours and 39 minutes to manually diagnose and remediate the problem with degraded networking infrastructure. The network related outage described in this example is not an isolated incident; other incidents described on this blog include: Packet loss, Amazon EC2 outage, Gmail outage, Delay vs utilization for adaptive control, and Multi-tenant performance isolation.

The incidents demonstrate two important points:
  1. Cloud services are critically dependent on the physical network.
  2. Manually diagnosing problems in large scale networks is a time consuming process that results in extended service outages.
The article, SDN fabric controller for commodity data center switches, describes how the performance and resilience of the physical core can be enhanced through automation. The SDN fabric controller leverages the measurement and control capabilities of commodity switches to rapidly detect and adapt to changing traffic, reducing response times from hours to seconds.

Monday, June 9, 2014

RESTful control of Cumulus Linux ACLs

Figure 1: Elephants and Mice
Elephant Detection in Virtual Switches & Mitigation in Hardware discusses a VMware and Cumulus demonstration, Elephants and Mice, in which the virtual switch on a host detects and marks large "Elephant" flows and the hardware switch enforces priority queueing to prevent Elephant flows from adversely affecting latency of small "Mice" flows.

This article demonstrates a self-contained, real-time Elephant flow marking solution that leverages the visibility and control features of Cumulus Linux.

SDN fabric controller for commodity data center switches provides some background on the capabilities of the commodity switch hardware used to run Cumulus Linux. The article describes how the measurement and control capabilities of the hardware can be used to maximize data center fabric performance.
Cumulus Linux applies ACLs from policy files in the /etc/cumulus/acl/policy.d/ directory using the cl-acltool command. Exposing these ACL configuration files through a RESTful API offers a straightforward method of remotely creating, reading, updating, deleting and listing ACLs.

For example, the following command creates a filter called ddos1 to drop a DNS amplification attack:
curl -H "Content-Type:application/json" -X PUT --data \
'["[iptables]","-A FORWARD --in-interface swp+ -d 10.10.100.10 -p udp --sport 53 -j DROP"]' \
http://10.0.0.233:8080/acl/ddos1
The filter can be retrieved:
curl http://10.0.0.233:8080/acl/ddos1
The following command lists the filter names:
curl http://10.0.0.233:8080/acl/
The filter can be deleted:
curl -X DELETE http://10.0.0.233:8080/acl/ddos1
Finally, all filters can be deleted:
curl -X DELETE http://10.0.0.233:8080/acl/
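The same operations can also be scripted; the following sketch is hypothetical client code that drives the API from Python using the requests library, mirroring the curl examples above:
# Sketch of a Python client for the ACL REST API described above.
# Assumes the requests library is installed; the URL and rule mirror the curl examples.
import json
import requests

base = 'http://10.0.0.233:8080/acl/'

# create (or update) the ddos1 filter
acl = ['[iptables]',
       '-A FORWARD --in-interface swp+ -d 10.10.100.10 -p udp --sport 53 -j DROP']
requests.put(base + 'ddos1', data=json.dumps(acl),
             headers={'Content-Type': 'application/json'})

print(requests.get(base + 'ddos1').json())   # read the filter back
print(requests.get(base).json())             # list filter names
requests.delete(base + 'ddos1')              # delete the filter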
Running the following Python script on the Cumulus switches provides a simple proof of concept implementation of the REST API:
#!/usr/bin/env python

from BaseHTTPServer import BaseHTTPRequestHandler,HTTPServer
from os import listdir,remove
from os.path import isfile
from json import dumps,loads
from subprocess import Popen,STDOUT,PIPE
import re

class ACLRequestHandler(BaseHTTPRequestHandler):
  uripat = re.compile('^/acl/([a-z0-9]+)$')
  dir = '/etc/cumulus/acl/policy.d/'
  priority = '50'
  prefix = 'rest-'
  suffix = '.rules'
  filepat = re.compile('^'+priority+prefix+'([a-z0-9]+)\\'+suffix+'$')

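  # apply the ACL policy files by running the Cumulus cl-acltool command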
  def commit(self):
    Popen(["cl-acltool","-i"],stderr=STDOUT,stdout=PIPE).communicate()[0]

  def aclfile(self,name):
    return self.dir+self.priority+self.prefix+name+self.suffix

  def wheaders(self,status):
    self.send_response(status)
    self.send_header('Content-Type','application/json')
    self.end_headers() 

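  # PUT /acl/{name}: write the JSON array of rule lines to the ACL file and apply it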
  def do_PUT(self):
    m = self.uripat.match(self.path)
    if None != m:
       name = m.group(1)
       len = int(self.headers.getheader('content-length'))
       data = self.rfile.read(len)
       lines = loads(data)
       fn = self.aclfile(name)
       f = open(fn,'w')
       f.write('\n'.join(lines) + '\n')
       f.close()
       self.commit()
       self.wheaders(201)
    else:
       self.wheaders(404)
 
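  # DELETE /acl/{name}: remove the named ACL; DELETE /acl/: remove all REST-managed ACLs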
  def do_DELETE(self):
    m = self.uripat.match(self.path)
    if None != m:
       name = m.group(1)
       fn = self.aclfile(name)
       if isfile(fn):
          remove(fn)
          self.commit()
       self.wheaders(204)
    elif '/acl/' == self.path:
       for file in listdir(self.dir):
         m = self.filepat.match(file)
         if None != m:
           remove(self.dir+file)
       self.commit()
       self.wheaders(204)
    else:
       self.wheaders(404)

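  # GET /acl/{name}: return the ACL as a JSON array of lines; GET /acl/: list ACL names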
  def do_GET(self):
    m = self.uripat.match(self.path)
    if None != m:
       name = m.group(1)
       fn = self.aclfile(name)
       if isfile(fn):
         result = [];
         with open(fn) as f:
           for line in f:
              result.append(line.rstrip('\n'))
         self.wheaders(200)
         self.wfile.write(dumps(result))
       else:
         self.wheaders(404)
    elif '/acl/' == self.path:
       result = []
       for file in listdir(self.dir):
         m = self.filepat.match(file)
         if None != m:
           name = m.group(1)
           result.append(name)
       self.wheaders(200)
       self.wfile.write(dumps(result))
    else:
       self.wheaders(404)

if __name__ == '__main__':
  server = HTTPServer(('',8080), ACLRequestHandler) 
  server.serve_forever()
Some notes on building a production-ready solution:
  1. Add authentication
  2. Add error handling
  3. The script needs to run as a daemon
  4. Scalability could be improved by asynchronously committing rules in batches (see the sketch below)
  5. Latency could be improved through use of persistent connections (SPDY, websocket)
Update December 11, 2014: An updated version of the script is now available on GitHub at https://github.com/pphaal/acl_server/
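One possible approach to note 4 (a hypothetical sketch, not part of the script above) is to decouple writing the ACL policy files from the cl-acltool commit, coalescing commits with a short timer so that a burst of REST calls triggers a single recompile:
# Hypothetical sketch for note 4: coalesce cl-acltool commits so that a burst
# of ACL changes results in one recompile rather than one per request.
import threading
from subprocess import Popen,STDOUT,PIPE

class BatchCommitter(object):
  def __init__(self, delay=1.0):
    self.delay = delay          # seconds to wait for additional changes
    self.lock = threading.Lock()
    self.timer = None

  def _commit(self):
    # install all pending ACL policy files in a single cl-acltool run
    Popen(["cl-acltool","-i"],stderr=STDOUT,stdout=PIPE).communicate()

  def schedule(self):
    # called by the request handlers in place of commit(); restarts the timer
    with self.lock:
      if self.timer: self.timer.cancel()
      self.timer = threading.Timer(self.delay, self._commit)
      self.timer.start()
The request handlers would then call schedule() instead of commit(), trading a small delay in rule installation for far fewer cl-acltool invocations under load.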

The following sFlow-RT controller application implements large flow marking using sFlow measurements from the switch and control of ACLs using the REST API:
include('extras/json2.js');

// Define large flow as greater than 100Mbits/sec for 1 second or longer
var bytes_per_second = 100000000/8;
var duration_seconds = 1;

var id = 0;
var controls = {};

setFlow('tcp',
 {keys:'ipsource,ipdestination,tcpsourceport,tcpdestinationport',
  value:'bytes', filter:'direction=ingress', t:duration_seconds}
);

setThreshold('elephant',
 {metric:'tcp', value:bytes_per_second, byFlow:true, timeout:4,
  filter:{ifspeed:[1000000000]}}
);

setEventHandler(function(evt) {
 if(controls[evt.flowKey]) return;

 var rulename = 'mark' + id++;
 var keys = evt.flowKey.split(',');
 var acl = [
'[iptables]',
'# mark Elephant',
'-t mangle -A FORWARD --in-interface swp+ -s ' + keys[0] + ' -d ' + keys[1] 
+ ' -p tcp --sport ' + keys[2] + ' --dport ' + keys[3]
+ ' -j SETQOS --set-dscp 10 --set-cos 5'
 ];
 http('http://'+evt.agent+':8080/acl/'+rulename,
      'put','application/json',JSON.stringify(acl));
 controls[evt.flowKey] = {
   agent:evt.agent,
   dataSource:evt.dataSource,
   rulename:rulename,
   time: (new Date()).getTime()
 };
},['elephant']);

setIntervalHandler(function() {
  for(var flowKey in controls) {
    var ctx = controls[flowKey];
    var val = flowValue(ctx.agent,ctx.dataSource + '.tcp',flowKey);
    if(val < 100) {
      http('http://'+ctx.agent+':8080/acl/'+ctx.rulename,'delete');
      delete controls[flowKey]; 
    }
  }
},5);
The following command line argument loads the script:
-Dscript.file=clmark.js
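For example, assuming the standard sFlow-RT start.sh launcher passes additional JVM options through and that clmark.js has been copied into the sFlow-RT installation directory, the controller could be started with:
./start.sh -Dscript.file=clmark.js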
Some notes on the script:
  1. The 100Mbits/s threshold for large flows was selected because it represents 10% of the bandwidth of the 1 Gigabit access ports on the network
  2. The setFlow filter specifies ingress flows since the goal is to mark flows as they enter the network
  3. The setThreshold filter specifies that thresholds are only applied to 1 Gigabit access ports
  4. The event handler function triggers when new Elephant flows are detected, creating and installing an ACL to mark packets in the flow with a DSCP value of 10 and a CoS value of 5
  5. The interval handler function runs every 5 seconds and removes ACLs for flows that have completed
The iperf tool can be used to generate a sequence of large flows to test the controller:
while true; do iperf -c 10.100.10.152 -i 20 -t 20; sleep 20; done
The following screen capture shows a basic test setup and results:
The screen capture shows a mixture of small "mice" flows and large "elephant" flows generated by a server connected to an edge switch (in this case a Penguin Computing Arctica switch running Cumulus Linux). The graph at the bottom right shows the mixture of unmarked large and small flows arriving at the switch. The sFlow-RT controller receives a stream of sFlow measurements from the switch and detects each elephant flow in real time, immediately installing an ACL that matches the flow and instructs the switch to mark its packets by setting the DSCP value. The traffic upstream of the switch is shown in the top right chart, where it can be clearly seen that each elephant flow has been identified and marked, while the mice have been left unmarked.

Thursday, June 5, 2014

Cumulus Networks, sFlow and data center automation

Cumulus Networks and InMon Corp. have ported the open source Host sFlow agent to the upcoming Cumulus Linux 2.1 release. The Host sFlow agent already supports the Linux, Windows, FreeBSD, Solaris, and AIX operating systems and the KVM, Xen, XCP, XenServer, and Hyper-V hypervisors, delivering a standard set of performance metrics from switches, servers, hypervisors, virtual switches, and virtual machines - see Visibility and the software defined data center.

The Cumulus Linux platform makes it possible to run the same open source agent on switches, servers, and hypervisors - providing unified end-to-end visibility across the data center. The open networking model that Cumulus is pioneering offers exciting opportunities. Cumulus Linux allows popular open source server orchestration tools to also manage the network, and the combination of real-time, data center wide analytics with orchestration make it possible to create self-optimizing data centers.

Install and configure Host sFlow agent

The following command installs the Host sFlow agent on a Cumulus Linux switch:
sudo apt-get install hsflowd
Note: Network managers may find this command odd since it is usually not possible to install third party software on switch hardware. However, what is even more radical is that Cumulus Linux allows users to download source code and compile it on their switch. Instead of being dependent on the switch vendor to fix a bug or add a feature, users are free to change the source code and contribute the changes back to the community.

The sFlow agent requires very little configuration, automatically monitoring all switch ports using the following default settings:

Link Speed     Sampling Rate    Polling Interval
1 Gbit/s       1-in-1,000       30 seconds
10 Gbit/s      1-in-10,000      30 seconds
40 Gbit/s      1-in-40,000      30 seconds
100 Gbit/s     1-in-100,000     30 seconds

Note: The default settings ensure that large flows (defined as consuming 10% of link bandwidth) are detected within approximately 1 second - see Large flow detection.
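As a rough back-of-the-envelope check (assuming packets of around 1500 bytes), the default 1-in-10,000 sampling rate on a 10 Gbit/s port yields on the order of 8 samples per second from a flow consuming 10% of the link - enough to detect the flow within about a second:
# Back-of-the-envelope check of the ~1 second detection claim for a 10 Gbit/s
# port with the default 1-in-10,000 sampling rate (assumes ~1500 byte packets).
link_bps      = 10e9               # 10 Gbit/s link
large_flow    = 0.1 * link_bps     # large flow defined as 10% of link bandwidth
packet_bits   = 1500 * 8           # assumed average packet size
sampling_rate = 10000              # default 1-in-N sampling for 10 Gbit/s

pkts_per_sec    = large_flow / packet_bits       # ~83,000 packets/s
samples_per_sec = pkts_per_sec / sampling_rate   # ~8 samples/s

print(samples_per_sec)
The same ratio holds at the other link speeds, since the default sampling rate scales with link speed.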

Once the Host sFlow agent is installed, there are two alternative configuration mechanisms that can be used to tell the agent where to send the measurements:

1. DNS Service Discovery (DNS-SD)

This is the default configuration mechanism for Host sFlow agents. DNS-SD uses a special type of DNS record (the SRV record) to allow hosts to automatically discover servers. For example, adding the following line to the site DNS zone file will enable sFlow on all the agents and direct the sFlow measurements to an sFlow analyzer. The SRV target is a hostname (here sflow-analyzer.example.com, which should resolve to the collector's address, e.g. 10.0.0.1) and 6343 is the standard sFlow port:
_sflow._udp 300 IN SRV 0 0 6343 sflow-analyzer.example.com.
No Host sFlow agent-specific configuration is required; each switch or host will automatically pick up the settings when the Host sFlow agent is installed, when the device is restarted, or if the settings on the DNS server are changed.

Default sampling rates and polling interval can be overridden by adding a TXT record to the zone file. For example, the following TXT record reduces the sampling rate on 10G links to 1-in-2000 and the polling interval to 20 seconds:
_sflow._udp 300 TXT (
"txtvers=1"
"sampling.10G=2000"
"polling=20"
)
Note: Currently defined TXT options are described on sFlow.org.

The article DNS-SD describes how DNS service discovery allows sFlow agents to automatically discover their configuration settings. The slides DNS Service Discovery from a talk at the SF Bay Area Large Scale Production Engineering Meetup provide additional background.

2. Configuration File

The Host sFlow agent is configured by editing the /etc/hsflowd.conf file. For example, the following configuration disables DNS-SD, instructs the agent to send sFlow to 10.0.0.1, reduces the sampling rate on 10G links to 1-in-2000 and the polling interval to 20 seconds:
sflow {
  DNSSD = off

  polling = 20
  sampling.10G = 2000
  collector {
    ip = 10.0.0.1
  }
}
The Host sFlow agent must be restarted for configuration changes to take effect:
sudo /etc/init.d/hsflowd restart
All hosts and switches can share the same settings and it is straightforward to use orchestration tools such as Puppet, Chef, etc. to manage the sFlow settings.

Collecting and analyzing sFlow

Figure 1: Visibility and the software defined data center
Figure 1 shows the general architecture of sFlow monitoring. Standard sFlow agents embedded within the elements of the infrastructure stream essential performance metrics to management tools, ensuring that every resource in a dynamic cloud infrastructure is immediately detected and continuously monitored.

  • Applications -  e.g. Apache, NGINX, Tomcat, Memcache, HAProxy, F5, A10 ...
  • Virtual Servers - e.g. Xen, Hyper-V, KVM ...
  • Virtual Network - e.g. Open vSwitch, Hyper-V extensible vSwitch
  • Servers - e.g. BSD, Linux, Solaris and Windows
  • Network - over 40 switch vendors, see Drivers for growth

The sFlow data from a Cumulus switch contains standard Linux performance statistics in addition to the interface counters and packet samples that you would typically get from a networking device.

Note: Enhanced visibility into host performance is important on open switch platforms since they may be running a number of user installed services that can stress the limited CPU, memory and IO resources.

For example, the following sflowtool output shows the raw data contained in an sFlow datagram from a switch running Cumulus Linux:
startDatagram =================================
datagramSourceIP 10.0.0.160
datagramSize 1332
unixSecondsUTC 1402004767
datagramVersion 5
agentSubId 100000
agent 10.0.0.233
packetSequenceNo 340132
sysUpTime 17479000
samplesInPacket 7
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 876
sourceId 2:1
counterBlock_tag 0:2001
adaptor_0_ifIndex 2
adaptor_0_MACs 1
adaptor_0_MAC_0 6c641a000459
counterBlock_tag 0:2005
disk_total 0
disk_free 0
disk_partition_max_used 0.00
disk_reads 980
disk_bytes_read 4014080
disk_read_time 1501
disk_writes 0
disk_bytes_written 0
disk_write_time 0
counterBlock_tag 0:2004
mem_total 2056589312
mem_free 1100533760
mem_shared 0
mem_buffers 33464320
mem_cached 807546880
swap_total 0
swap_free 0
page_in 35947
page_out 0
swap_in 0
swap_out 0
counterBlock_tag 0:2003
cpu_load_one 0.390
cpu_load_five 0.440
cpu_load_fifteen 0.430
cpu_proc_run 1
cpu_proc_total 95
cpu_num 2
cpu_speed 0
cpu_uptime 770774
cpu_user 160600160
cpu_nice 192970
cpu_system 77855100
cpu_idle 1302586110
cpu_wio 4650
cpuintr 0
cpu_sintr 308370
cpuinterrupts 1851322098
cpu_contexts 800650455
counterBlock_tag 0:2006
nio_bytes_in 405248572711
nio_pkts_in 394079084
nio_errs_in 0
nio_drops_in 0
nio_bytes_out 406139719695
nio_pkts_out 394667262
nio_errs_out 0
nio_drops_out 0
counterBlock_tag 0:2000
hostname cumulus
UUID fd-01-78-45-93-93-42-03-a0-5a-a3-d7-42-ac-3c-de
machine_type 7
os_name 2
os_release 3.2.46-1+deb7u1+cl2+1
endSample   ----------------------
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 876
sourceId 0:44
counterBlock_tag 0:1005
ifName swp42
counterBlock_tag 0:1
ifIndex 44
networkType 6
ifSpeed 0
ifDirection 2
ifStatus 0
ifInOctets 0
ifInUcastPkts 0
ifInMulticastPkts 0
ifInBroadcastPkts 0
ifInDiscards 0
ifInErrors 0
ifInUnknownProtos 4294967295
ifOutOctets 0
ifOutUcastPkts 0
ifOutMulticastPkts 0
ifOutBroadcastPkts 0
ifOutDiscards 0
ifOutErrors 0
ifPromiscuousMode 0
endSample   ----------------------
startSample ----------------------
sampleType_tag 0:1
sampleType FLOWSAMPLE
sampleSequenceNo 1022129
sourceId 0:7
meanSkipCount 128
samplePool 130832512
dropEvents 0
inputPort 7
outputPort 10
flowBlock_tag 0:1
flowSampleType HEADER
headerProtocol 1
sampledPacketSize 1518
strippedBytes 4
headerLen 128
headerBytes 6C-64-1A-00-04-5E-E8-E7-32-77-E2-B5-08-00-45-00-05-DC-63-06-40-00-40-06-9E-21-0A-64-0A-97-0A-64-14-96-9A-6D-13-89-4A-0C-4A-42-EA-3C-14-B5-80-10-00-2E-AB-45-00-00-01-01-08-0A-5D-B2-EB-A5-15-ED-48-B7-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35
dstMAC 6c641a00045e
srcMAC e8e73277e2b5
IPSize 1500
ip.tot_len 1500
srcIP 10.100.10.151
dstIP 10.100.20.150
IPProtocol 6
IPTOS 0
IPTTL 64
TCPSrcPort 39533
TCPDstPort 5001
TCPFlags 16
endSample   ----------------------
While sflowtool is extremely useful, there are many other open source and commercial sFlow analysis tools available.
Note: The sFlow Collectors list on sFlow.org contains a number of additional tools.

There is a great deal of variety among sFlow collectors - many focus on the network, others have a compute infrastructure focus, and yet others report on application performance. The shared sFlow measurement infrastructure delivers value in each of these areas. However, as network, storage, host and application resources are brought together and automated to create cloud data centers, a new set of sFlow analytics tools is emerging to deliver the integrated real-time visibility required to drive automation and optimize performance and efficiency across the data center.
While network administrators are likely to be familiar with sFlow, application development and operations teams may be unfamiliar with the technology. The 2012 O'Reilly Velocity conference talk provides an introduction to sFlow aimed at the DevOps community.
Cumulus Linux presents the switch as a server with a large number of network adapters, an abstraction that will be instantly familiar to anyone with server management experience. For example, displaying interface information on Cumulus Linux uses the standard Linux command:
ifconfig swp2
On the other hand, network administrators experienced with switch CLIs may find that Linux commands take a little time to get used to - the above command is roughly equivalent to:
show interfaces fastEthernet 6/1
However, the basic concepts of networking don't change and these skills are essential to designing, automating, operating and troubleshooting data center networks. Open networking platforms such as Cumulus Linux are an important piece of the automation puzzle, taking networking out of its silo and allowing a combined NetDevOps team to manage network, server, and application resources using proven monitoring and orchestration tools such as Ganglia, Graphite, Nagios, CFEngine, Puppet, Chef, Ansible, and Salt.