Saturday, May 12, 2012

Scripting languages


The Host sFlow distributed agent article describes how sFlow agents, embedded in applications such as Apache, NGINX, Tomcat, Memcached and Java, coordinate with the Host sFlow daemon (hsflowd) in order to monitor server and application performance.

Implementing sFlow monitoring natively in widely used applications is worth the effort, since the result is a highly scalable, easily deployed solution with minimal impact on performance. However, this approach is overkill for monitoring application logic implemented in scripting languages such as PHP, Python, Ruby or Perl.

Recently, a simple JSON API was added to hsflowd that makes it easy to integrate performance monitoring in scripting languages. The API is an implementation of the sFlow Application Structures draft, which defines a generalized set of application layer performance metrics.

Note: Even if you instrument your scripted application logic, you should still enable sFlow in the web server since the HTTP information exported from the web server complements application metrics to provide a more complete picture of performance, see HTTP.

Currently the API is only implemented in the Linux trunk. To try it out, you will need to check out the trunk and build hsflowd from sources:

svn co https://host-sflow.svn.sourceforge.net/svnroot/host-sflow/trunk host-sflow
cd host-sflow
make
make install
make schedule

Note: See Installing Host sFlow on a Linux server for additional configuration information. The JSON API will be included in the upcoming Host sFlow 1.21 release.

Uncommenting the following entry in the /etc/hsflowd.conf file opens a UDP port to receive JSON encoded metrics:

jsonPort = 36343

Note: It is recommended that you keep the default port, 36343, since any change to the port number will require a corresponding change in any scripts that send JSON messages. The jsonPort value is written into the /etc/hsflowd.auto file, along with other sFlow settings, so that it is available to client scripts, see Host sFlow distributed agent. The hsflowd daemon only accepts messages generated on the same host, which shouldn't be a problem since hsflowd should be installed on every host to report host performance metrics.
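Rather than hard-coding the port, a client script can pick it up from /etc/hsflowd.auto. A minimal sketch, assuming the file contains simple key=value lines (the function name and default path are illustrative, not part of the Host sFlow distribution):

```python
# Hypothetical sketch: read the JSON port from /etc/hsflowd.auto,
# assuming the file contains simple key=value lines (e.g. jsonPort=36343).
def read_json_port(path="/etc/hsflowd.auto", default=36343):
    try:
        with open(path) as f:
            for line in f:
                key, _, value = line.strip().partition("=")
                if key == "jsonPort":
                    return int(value)
    except (IOError, ValueError):
        pass  # fall back to the default if the file is missing or malformed
    return default
```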

The following is an example of the type of JSON message that a script can send to describe the outcome of a transaction:

{"flow_sample":{
  "app_name":"myapp",
  "app_operation":{
    "operation":"user.friend",
    "attributes":"id=123&handle=sith",
    "status_descr":"OK",
    "status":0,
    "req_bytes":43,
    "resp_bytes":234,
    "uS":2000},
  "app_initiator":{"actor":"123"},
  "app_target":{"actor":"231"},
  "extended_socket_ipv4":{
    "protocol":6,
    "local_ip":"10.0.0.1",
    "remote_ip":"10.0.0.23",
    "local_port":123,
    "remote_port":43032}
}}

Note: The names of the structures and attributes in the JSON message mirror the structures defined in sFlow Application Structures.

Many of the structures and attributes are optional; a message can be as simple as:

{"flow_sample":{
  "app_name":"myapp",
  "app_operation":{
    "operation":"user.friend"
  }
}}

Constructing and sending JSON messages as UDP datagrams to hsflowd is straightforward. For example, the following PHP app_operation function formats and sends the previous JSON message.

function app_operation($app_name,
                       $op_name,
                       $attributes="",
                       $status=0,
                       $status_descr="",
                       $req_bytes=0,
                       $resp_bytes=0,
                       $uS=0,
                       $sampling_rate=1) {
  // Transaction sampling: a 1-in-$sampling_rate chance of taking a measurement
  if($sampling_rate > 1) {
    if(mt_rand(1,$sampling_rate) != 1) { return; }
  }

  try {
     // Send the JSON message to hsflowd's UDP port on the local host
     $sock = @fsockopen("udp://localhost",36343,$errno,$errstr);
     if(! $sock) { return; }
     fwrite($sock, '{"flow_sample":{
 "app_name":"'.$app_name.'",
 "sampling_rate":'.$sampling_rate.',
 "app_operation":{
   "operation":"'.$op_name.'",
   "attributes":"'.$attributes.'",
   "status_descr":"'.$status_descr.'",
   "status":'.$status.',
   "req_bytes":'.$req_bytes.',
   "resp_bytes":'.$resp_bytes.',
   "uS":'.$uS.'}}}');
     fclose($sock);
  } catch(Exception $e) {} // monitoring must never break the application
}

Note: This function supports transaction sampling, i.e. if you set a sampling_rate of 10 then there is a 1-in-10 chance that the operation will actually generate a measurement. Sampling reduces the measurement overhead in high transaction rate environments while still generating useful results. Choose a fixed sampling_rate for your application: the lowest value that keeps the impact of monitoring on application performance at an acceptable level. You probably don't need to sample at all unless you are handling hundreds of operations per second. As you will see later, a low sampling_rate setting allows hsflowd to maintain more accurate counters and gives it a greater range of sampling rates to apply when it exports the data.

Including the app_operation function in your PHP application library makes instrumenting PHP application logic as simple as including a single line of code in a PHP rendered web page:

<?php app_operation("myapp","user.friend"); ?>
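An equivalent helper is just as easy in the other scripting languages mentioned earlier. For example, a Python version might look like the following (a sketch, not part of the Host sFlow distribution; the hard-coded address is an assumption for illustration):

```python
# Hypothetical Python equivalent of the PHP app_operation helper: builds
# a JSON flow_sample and sends it to hsflowd over UDP, with optional
# 1-in-N transaction sampling.
import json
import random
import socket

def app_operation(app_name, op_name, attributes="", status=0,
                  status_descr="", req_bytes=0, resp_bytes=0,
                  uS=0, sampling_rate=1, addr=("127.0.0.1", 36343)):
    # Transaction sampling: a 1-in-sampling_rate chance of measuring
    if sampling_rate > 1 and random.randint(1, sampling_rate) != 1:
        return
    msg = {"flow_sample": {
        "app_name": app_name,
        "sampling_rate": sampling_rate,
        "app_operation": {
            "operation": op_name,
            "attributes": attributes,
            "status_descr": status_descr,
            "status": status,
            "req_bytes": req_bytes,
            "resp_bytes": resp_bytes,
            "uS": uS}}}
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(json.dumps(msg).encode("utf-8"), addr)
        sock.close()
    except socket.error:
        pass  # monitoring must never break the application
```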

When hsflowd receives a JSON message, it increments per application performance counters (scaled by the sampling rate if needed). The counters are periodically exported along with the other sFlow metrics (CPU, memory, disk and network I/O) that hsflowd exports.
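The scaling is simple: a sample taken at 1-in-N represents N transactions, so each counter is incremented by the sampling rate. A toy illustration of the idea (not hsflowd's actual code; the counter names are simplified):

```python
# Simplified illustration (not hsflowd's code): scale counter increments
# by the reported sampling_rate so counters estimate total transactions.
def update_app_counters(counters, sample):
    fs = sample["flow_sample"]
    rate = fs.get("sampling_rate", 1)
    status = fs["app_operation"].get("status", 0)
    key = "status_OK" if status == 0 else "errors"
    app = counters.setdefault(fs["app_name"], {})
    app[key] = app.get(key, 0) + rate  # one 1-in-N sample counts for N
    return counters
```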

The following output from sflowtool shows the contents of an sFlow datagram containing application counters:

startDatagram =================================
datagramSourceIP 127.0.0.1
datagramSize 112
unixSecondsUTC 1336846918
datagramVersion 5
agentSubId 100000
agent 10.0.0.150
packetSequenceNo 2670
sysUpTime 77514000
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 4
sourceId 3:150002
counterBlock_tag 0:2202
application myapp
status_OK 23
errors_OTHER 0
errors_TIMEOUT 0
errors_INTERNAL_ERROR 0
errors_BAD_REQUEST 0
errors_FORBIDDEN 0
errors_TOO_LARGE 0
errors_NOT_IMPLEMENTED 0
errors_NOT_FOUND 0
errors_UNAVAILABLE 0
errors_UNAUTHORIZED 0
endSample   ----------------------
endDatagram   =================================

The application counters can be sent to tools like Ganglia and Graphite in order to trend the performance of individual application instances, or of whole clusters of applications.
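For example, forwarding a counter to Graphite only requires writing a "path value timestamp" line to its plaintext listener (a hypothetical sketch; the metric path and Graphite address are assumptions for illustration):

```python
# Hypothetical sketch: push an application counter to Graphite using the
# plaintext protocol (one "path value timestamp" line per metric).
import socket
import time

def send_to_graphite(app, counter, value, addr=("127.0.0.1", 2003)):
    line = "sflow.app.%s.%s %d %d\n" % (app, counter, value, int(time.time()))
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(addr)
    sock.sendall(line.encode("ascii"))
    sock.close()
```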

The transactions are also sampled by hsflowd based on the sampling setting in hsflowd.conf, or DNS-SD. Specific sampling rates can be set based on application name. For example, to override the default sampling rate of 400 and apply a sampling rate of 1-in-100 to myapp, use the following setting:

sampling.app.myapp=100

Note: If the transactions were sampled before being sent to hsflowd, then they will be sub-sampled to achieve the target sampling rate. For example, if the script used a sampling rate of 1-in-10, then hsflowd would apply a 1-in-10 sampling operation in order to achieve the desired 1-in-100 sampling rate.
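The sub-sampling arithmetic can be sketched in a few lines (a simplified illustration, not hsflowd's actual code):

```python
# Simplified illustration of sub-sampling (not hsflowd's actual code):
# a transaction that already survived 1-in-script_rate sampling is
# subjected to an additional 1-in-(target_rate/script_rate) test so that
# the overall rate matches the configured target.
import random

def sub_sample(script_rate, target_rate):
    residual = max(1, target_rate // script_rate)
    return random.randint(1, residual) == 1
```

For example, with script_rate=10 and target_rate=100, the residual test is 1-in-10, giving an overall rate of 1-in-100.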

The following output from sflowtool shows the contents of an sFlow datagram containing an application transaction sample:

startDatagram =================================
datagramSourceIP 127.0.0.1
datagramSize 136
unixSecondsUTC 1336846925
datagramVersion 5
agentSubId 100000
agent 10.0.0.150
packetSequenceNo 2671
sysUpTime 77521000
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:1
sampleType FLOWSAMPLE
sampleSequenceNo 1
sourceId 3:150002
meanSkipCount 10
samplePool 10
dropEvents 0
inputPort 0
outputPort 1073741823
flowBlock_tag 0:2202
flowSampleType applicationOperation
application myapp
operation user.friend
request_bytes 0
response_bytes 0
status SUCCESS
duration_uS 0
endSample   ----------------------
endDatagram   =================================

The transaction samples provide details that complement the counter samples. For example, if you were to see a rise in the errors_TIMEOUT rate, you could look at the transaction samples and determine the operations associated with the timeouts.

Note: Anyone familiar with Etsy's StatsD tool will see a similarity in the way sFlow monitoring is embedded in scripts, see Measure Anything, Measure Everything. The main difference is that sFlow application measurements contain additional structure that allows them to be part of a large scale monitoring system linking network switches, hosts and applications together. In addition, sFlow's inclusion of sampled transaction records allows metrics to be broken out into fine detail, making it possible to see how application instances interact and get to the root cause of performance problems.

Finally, the application metrics extension to the sFlow standard and the implementation in hsflowd are still in the early stages. Please try them out and provide feedback. Any comments or suggestions regarding the sFlow metrics should be directed to the sFlow.org mailing list and comments or questions relating to the scripting API should be directed to the host-sflow mailing list.

Monday, May 7, 2012

Tunnels

Figure 1: Network virtualization using tunnels
Layer 3/4 tunnels (GRE, VxLAN, STT, CAPWAP, NVGRE etc.) can be used to virtualize network services so that communication between virtual machines can be provisioned and controlled without dependencies on the underlying network.

Figure 1 shows the basic elements of a typical virtual machine networking stack (VMware, Hyper-V, XenServer, Xen, KVM etc.). Each virtual machine is connected to a software virtual switch using virtual network adapters. The virtual switch delivers packets based on destination MAC address (just like a physical switch). For example, when VM1 sends a packet to VM2, it will create an IP packet with source address vIP 1 and destination address vIP 2. The network stack on VM1 will create an Ethernet frame with source address vMAC 1 and destination address vMAC 2 with the IP packet as the frame payload. The virtual switch on Server 1 receives the Ethernet frame, examines the destination MAC address, and delivers the frame to the virtual network adapter corresponding to vMAC 2, which delivers the frame to VM2. The challenge is ensuring that virtual machines on different servers can communicate while minimizing dependencies on the underlying physical network.

Setting up a L3/4 tunnel between Server 1 and Server 2 (with tunnel endpoint addresses IP 1 and IP 2 respectively) limits the dependency on the physical infrastructure to IP connectivity between the two servers. For example, when VM1 sends a packet to VM3, the virtual switch on Server 1 recognizes that the packet is destined to a VM on Server 2 and sends the packet through the tunnel. On entering the tunnel, the original Ethernet frame from VM1 is encapsulated and the resulting IP packet is sent to Server 2. When Server 2 receives the packet, it extracts the original Ethernet frame, hands it to the virtual switch, which delivers the frame to VM3.

Network virtualization is particularly important in cloud environments where tenants need to be isolated from each other, but still share the same physical infrastructure.

Figure 2: Nicira's Distributed Virtual Network Infrastructure (DVNI)
Nicira's Distributed Virtual Network Infrastructure (DVNI) architecture, shown in Figure 2, is a good example of network virtualization using a Tunnel Mesh to connect virtual switches and overlay multiple Virtual Networks on the shared Physical Fabric. The Controller Cluster manages the virtual switches, setting up tunnels and controlling forwarding behavior.

Note: The Controller Cluster uses the OpenFlow protocol to configure virtual switches, making this an example of Software Defined Networking (SDN).

The importance of visibility in managing virtualized environments has been a constant theme on this blog, see Network visibility in the data center, System boundary and NUMA. The question is, how do you maintain visibility when tunneling is used for network virtualization?

The remainder of this article describes how the widely supported sFlow standard provides detailed visibility into tunneled traffic. You might be surprised to know that every sFlow enabled switch produced in the last 10 years is fully capable of reporting on L3/4 tunneled traffic, in spite of the fact that there is no mention of VxLAN, GRE, etc. in any of the sFlow standard documents.

The key to sFlow's adaptability is that switches export packet headers, leaving it to sFlow analysis software to decode the packet headers and report on traffic, see Choosing an sFlow analyzer.  The templates needed to extract tunnel information from packet headers are described in the Internet Drafts and RFCs that define the various tunneling protocols. For example, the following packet diagram from the internet draft VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks describes the fields present in a VxLAN packet header:

            0                   1                   2                   3
            0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     
        Outer Ethernet Header:             |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |             Outer Destination MAC Address                     |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           | Outer Destination MAC Address | Outer Source MAC Address      |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |                Outer Source MAC Address                       |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       Optional Ethertype = C-Tag 802.1Q   | Outer.VLAN Tag Information    |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           | Ethertype 0x0800              |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        Outer IP Header:
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |Version|  IHL  |Type of Service|          Total Length         |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |         Identification        |Flags|      Fragment Offset    |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |  Time to Live |    Protocol   |         Header Checksum       |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |                       Outer Source Address                    |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |                   Outer Destination Address                   |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         Outer UDP Header:
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |       Source Port = xxxx      |       Dest Port = VXLAN Port  |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |           UDP Length          |        UDP Checksum           |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         VXLAN Header:
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |R|R|R|R|I|R|R|R|            Reserved                           |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |                VXLAN Network Identifier (VNI) |   Reserved    |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             0                   1                   2                   3
             0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     
      Inner Ethernet Header:             |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Inner Destination MAC Address                     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            | Inner Destination MAC Address | Inner Source MAC Address      |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |                Inner Source MAC Address                       |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     Optional Ethertype = C-Tag [802.1Q]    | Inner.VLAN Tag Information    |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     Payload:
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            | Ethertype of Original Payload |                               |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
            |                                  Original Ethernet Payload    |
            |                                                               |
            | (Note that the original Ethernet Frame's FCS is not included) |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          Frame Check Sequence:
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |   New FCS (Frame Check Sequence) for Outer Ethernet Frame     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Examining the template, the VxLAN header captured by sFlow includes the following information:
  • outer destination MAC
  • outer source MAC
  • outer VLAN
  • outer source IP
  • outer destination IP
  • VXLAN Network Identifier
  • inner destination MAC
  • inner source MAC
  • inner VLAN
  • inner Ethertype
  • original Ethernet Payload (providing inner source, destination IP etc.)
This level of detailed visibility allows network managers to see both the outer tunnel information and the inner VM to VM traffic details.
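For example, once an analyzer has stepped past the outer Ethernet, IP and UDP headers in a captured sFlow packet header, extracting the VNI from the 8-byte VXLAN header takes only a few lines (a hypothetical sketch, not taken from any particular sFlow analyzer):

```python
# Hypothetical sketch: decode the 8-byte VXLAN header that follows the
# outer UDP header. Byte 0 carries the flags (the I bit, 0x08, marks a
# valid VNI); bytes 4-6 carry the 24-bit VXLAN Network Identifier.
def parse_vxlan_header(hdr):
    if len(hdr) < 8:
        raise ValueError("truncated VXLAN header")
    flags = hdr[0]
    if not flags & 0x08:
        raise ValueError("I flag not set: no valid VNI")
    vni = (hdr[4] << 16) | (hdr[5] << 8) | hdr[6]
    return vni
```

The inner Ethernet frame starts immediately after these 8 bytes, so the same decode logic used for ordinary packet headers can then be applied to the encapsulated VM to VM traffic.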

Generic Routing Encapsulation (GRE) is similar to VxLAN, but the encapsulated Ethernet packet is transported directly over IP, rather than UDP. The packet header is described in RFC 2784: Generic Routing Encapsulation (GRE).

Note: Visibility into tunnels is challenging if you are using NetFlow/IPFIX as your traffic monitoring protocol since you are dependent on the switch vendor for the hardware and firmware to decode, analyze and export details of the tunneled traffic, see Software defined networking. Maintaining visibility with traditional flow monitoring technologies is especially difficult in rapidly changing areas like network virtualization where a new tunneling protocol is proposed every few months.

Network visibility gets even more challenging when you throw fabric technologies like 802.1aq and TRILL into the mix. The following packet diagram from RFC 6325: Routing Bridges (RBridges): Base Protocol Specification shows the format of a TRILL header:

   Flow:
     +-----+  +-------+   +-------+       +-------+   +-------+  +----+
     | ESa +--+  RB1  +---+  RB3  +-------+  RB4  +---+  RB2  +--+ESb |
     +-----+  |ingress|   |transit|   ^   |transit|   |egress |  +----+
              +-------+   +-------+   |   +-------+   +-------+
                                      |
   Outer Ethernet Header:             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Outer Destination MAC Address  (RB4)              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Outer Destination MAC Address | Outer Source MAC Address      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                Outer Source MAC Address  (RB3)                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Ethertype = C-Tag [802.1Q-2005]| Outer.VLAN Tag Information    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   TRILL Header:
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Ethertype = TRILL             | V | R |M|Op-Length| Hop Count |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Egress (RB2) Nickname         | Ingress (RB1) Nickname        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Inner Ethernet Header:
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Inner Destination MAC Address  (ESb)              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Inner Destination MAC Address | Inner Source MAC Address      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                  Inner Source MAC Address  (ESa)              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Ethertype = C-Tag [802.1Q-2005]| Inner.VLAN Tag Information    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Payload:
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Ethertype of Original Payload |                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
      |                                  Original Ethernet Payload    |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Frame Check Sequence:
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               New FCS (Frame Check Sequence)                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

In this case, the L3/4 tunneled packet sent by a virtual switch enters the top of rack switch (RB1 ingress), where it is encapsulated in an outer Ethernet header before being sent across the switch fabric to the destination top of rack switch (RB2 egress), where it is decapsulated and delivered to the destination virtual switch.

With sFlow reporting packet headers from intermediate switches, you get visibility into both sets of outer MAC addresses (TRILL and VxLAN), as well as the inner MAC addresses associated with the VMs and TCP/IP flows between the VMs.

Note: Wireshark can decode almost every protocol you are likely to see in your network and is a great troubleshooting tool to use with sFlow, see Wireshark for details.

Finally, sFlow provides the scalability needed to maintain full, end-to-end, visibility in virtual network environments; delivering multi-layer visibility into the physical fabric as well as monitoring inter-VM traffic, in top of rack, intermediate and virtual switches. The sFlow standard is comprehensive, extending beyond network monitoring to provide unified, cloud-scale, visibility that links network, system and application performance in a single integrated system.

Tuesday, May 1, 2012

Software defined networking

Figure 1: Software Defined Networking architecture
The elements of the Software Defined Networking (SDN) architecture are shown in Figure 1. The Data Plane comprises switches connected together to form a network. However, instead of relying on proprietary software running on each switch to control its forwarding behavior, switches in a SDN architecture are controlled by a Network OS (NOS) that interacts with the switches to provide an abstract model of the network topology to Applications running on the NOS. Applications can adapt the network behavior to suit specialized requirements, for example, providing network virtualization services that allow multiple logical networks to share a single physical network - similar to the way in which a hypervisor allows multiple virtual machines to share a single physical machine.

Note: While the diagram shows the NOS as a single logical entity, it is likely to be implemented as a distributed cluster of controllers in order to provide scalability and fault tolerance.

Open APIs for programmatic access to the switches are an essential prerequisite to building a software defined network:
  1. Forwarding The OpenFlow protocol was originally developed so that academic researchers could experiment with external control of switch packet forwarding. OpenFlow quickly gained support, leading to the formation of the Open Networking Foundation (ONF) to develop and promote the OpenFlow standard.
  2. Configuration It was quickly realized that OpenFlow alone isn't sufficient - a configuration protocol is needed to assign switches to controllers, configure port settings and provision queues. Recently, the Open Networking Foundation released the initial OF-Config version 1.0 specification for configuring OpenFlow switches. OF-Config is defined as an extension of the NETCONF protocol, which provides a programmatic XML/RPC API that is well suited to SDN.
  3. Visibility Current efforts in the SDN community are focused on provisioning of network services. Going beyond merely providing connectivity, a NOS that is aware of network performance requires an API providing visibility into switch traffic. A performance aware NOS allows applications to manage resource allocation, balance loads and ensure quality of service.
The rest of this article examines different approaches to monitoring switch performance and their strengths and weaknesses within the context of software defined networking.
Figure 2: NetFlow/IPFIX and sFlow
The most common APIs for monitoring switch performance, NetFlow/IPFIX and sFlow, are shown in Figure 2:
  1. NetFlow/IPFIX Cisco NetFlow and IPFIX (the IETF standard based on NetFlow) define a protocol for exporting flow records. A flow record summarizes a set of packets that share common attributes - for example, a typical flow record includes ingress interface, source IP address, destination IP address, IP protocol, source TCP/UDP port, destination TCP/UDP port, IP ToS, start time, end time, packet count and byte count. Figure 2 shows the steps performed by the switch in order to construct flow records. First the stream of packets is likely to be sampled (particularly in high-speed switches). Next, the sampled packet header is decoded to extract key fields. A hash function is computed over the keys in order to look up the flow record in the flow cache. If an existing record is found, its values are updated, otherwise a record is created for the new flow. Records are flushed from the cache based on protocol information (e.g. if a FIN flag is seen in a TCP packet), a timeout, inactivity, or when the cache is full. The flushed records are finally sent to the traffic analysis application.
  2. sFlow With sFlow monitoring, the decode, hash, flow cache and flush functionality are no longer implemented on the switch. Instead, sampled packet headers are sent to the traffic analysis application which decodes the packets and aggregates the data. In addition, sFlow provides a polling function, periodically sending standard interface counters to the traffic analysis applications, eliminating the need for SNMP polling, see Link utilization.
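The flow cache at the heart of the NetFlow/IPFIX pipeline can be illustrated with a toy model (an illustration only, not any vendor's implementation): each sampled packet's key fields locate a record, which accumulates packet and byte counts until it is flushed.

```python
# Toy model of a NetFlow-style flow cache (illustration only): sampled
# packets are aggregated into flow records keyed on the usual 5-tuple.
# A Python dict plays the role of the hardware hash table.
def update_flow_cache(cache, pkt):
    key = (pkt["src_ip"], pkt["dst_ip"], pkt["proto"],
           pkt["src_port"], pkt["dst_port"])
    rec = cache.setdefault(key, {"packets": 0, "bytes": 0})
    rec["packets"] += 1
    rec["bytes"] += pkt["length"]
    return cache
```

With sFlow, none of this state lives on the switch; the equivalent aggregation happens in the traffic analysis application.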
Given the differences between the two technologies, the following attributes highlight features of sFlow that make it an attractive option for providing visibility in SDN environments:
  1. Software Defined With sFlow, decoding and analysis of network traffic is performed by external software, allowing the NOS to tailor measurements for consistency with its internal network model. In contrast, the decode, hash, flush pipeline needed to generate flow records is typically implemented in hardware. Hardware-based measurements are inflexible, requiring switch vendor hardware/firmware upgrades to implement new measurements. Hardware differences mean that measurements are inconsistent between vendors, or even between different products from the same vendor.
  2. Lightweight The sFlow architecture minimizes the resources consumed on the switch. In contrast, flow-based measurements consume scarce TCAM resources that would be better used to increase the number of packet forwarding entries.
  3. Stateless The sFlow architecture is stateless: measurements are not stored on the switch, but immediately exported to the traffic analyzer (NOS) where they can be acted on. Flow-based measurements rely on a stateful flow cache that delays export, making the data less suitable for implementing dynamic control functionality in the NOS, see Delay and stability.
  4. Easy to configure Very few parameters are required to configure sFlow monitoring: the address of the traffic analyzer, a counter polling interval and a packet sampling rate. Flow-based technologies require additional configuration parameters: identifying packet attributes to decode, specifying how packet attributes and values should be used to update the flow cache, flow cache size, flow cache timeouts, etc. Operational complexity acts as a significant barrier to large scale deployment, see Complexity kills.
  5. Scalable The sFlow architecture scales better than flow-based architectures, see Superlinear.
  6. Multi-Vendor The sFlow standard is supported by most switch vendors, including virtually all OpenFlow capable switches. 
  7. Merchant Silicon Hardware support for sFlow is widely implemented in merchant silicon.
In the data center, SDN is being used as a component of cloud orchestration systems like OpenStack and CloudStack that manage network, server and application resources in order to deliver Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS).  The sFlow standard supports cloud orchestration by providing unified, cloud-scale, visibility that links network, system and application performance in a single integrated system.