Monday, May 7, 2012

Tunnels

Figure 1: Network virtualization using tunnels
Layer 3/4 tunnels (GRE, VxLAN, STT, CAPWAP, NVGRE etc.) can be used to virtualize network services so that communication between virtual machines can be provisioned and controlled without dependencies on the underlying network.

Figure 1 shows the basic elements of a typical virtual machine networking stack (VMware, Hyper-V, XenServer, Xen, KVM etc.). Each virtual machine is connected to a software virtual switch using virtual network adapters. The virtual switch delivers packets based on destination MAC address (just like a physical switch). For example, when VM1 sends a packet to VM2, it will create an IP packet with source address vIP 1 and destination address vIP 2. The network stack on VM1 will create an Ethernet frame with source address vMAC 1 and destination address vMAC 2 with the IP packet as the frame payload. The virtual switch on Server 1 receives the Ethernet frame, examines the destination MAC address, and delivers the frame to the virtual network adapter corresponding to vMAC 2, which delivers the frame to VM2. The challenge is ensuring that virtual machines on different servers can communicate while minimizing dependencies on the underlying physical network.
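The delivery step just described is a lookup keyed on the destination MAC address. The following is a toy Python sketch of that lookup, not how any particular hypervisor implements it. The class, method and adapter names are hypothetical. Real virtual switches also learn addresses, flood unknown destinations and handle broadcast, none of which is modeled here.

```python
from typing import Dict, Optional


def mac_to_str(raw: bytes) -> str:
    """Format a 6-byte MAC address as colon-separated lowercase hex."""
    return ":".join(f"{b:02x}" for b in raw)


class VirtualSwitch:
    """Hypothetical model: deliver frames by destination MAC only."""

    def __init__(self) -> None:
        # Maps each attached VM's vMAC to the name of its virtual adapter.
        self.ports: Dict[str, str] = {}

    def attach(self, vmac: str, adapter: str) -> None:
        self.ports[vmac.lower()] = adapter

    def deliver(self, frame: bytes) -> Optional[str]:
        """Return the adapter that should receive the frame, or None when
        the destination MAC is not attached to this switch."""
        if len(frame) < 14:
            raise ValueError("frame shorter than an Ethernet header")
        dst = mac_to_str(frame[0:6])  # destination MAC is the first 6 bytes
        return self.ports.get(dst)


sw = VirtualSwitch()
sw.attach("02:00:00:00:00:01", "vnic-VM1")
sw.attach("02:00:00:00:00:02", "vnic-VM2")
# Ethernet frame from vMAC 1 to vMAC 2 carrying an IPv4 payload (Ethertype 0x0800).
frame = bytes.fromhex("020000000002" "020000000001" "0800") + b"payload"
print(sw.deliver(frame))
```

A `None` result is the case the next paragraph deals with: the destination VM lives on another server, so the frame has to leave through a tunnel.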

Setting up an L3/4 tunnel between Server 1 and Server 2 (with tunnel endpoint addresses IP 1 and IP 2 respectively) limits the dependency on the physical infrastructure to IP connectivity between the two servers. For example, when VM1 sends a packet to VM3, the virtual switch on Server 1 recognizes that the packet is destined for a VM on Server 2 and sends it through the tunnel. On entering the tunnel, the original Ethernet frame from VM1 is encapsulated, and the resulting IP packet is sent to Server 2. When Server 2 receives the packet, it extracts the original Ethernet frame and hands it to its virtual switch, which delivers the frame to VM3.
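Using VxLAN as the example tunnel (an assumption; GRE, STT, etc. differ in the details), encapsulation amounts to prepending an 8-byte header that carries a 24-bit network identifier to the original Ethernet frame. The outer UDP, IP and Ethernet headers would be added by the sending server's network stack. The minimal sketch below covers only that inner step, and the function names are hypothetical.

```python
import struct
from typing import Tuple

VXLAN_FLAG_I = 0x08  # "I" flag: the VNI field is valid


def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend an 8-byte VXLAN header to an Ethernet frame.

    The result is the UDP payload sent to the remote tunnel endpoint.
    """
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags byte followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(udp_payload: bytes) -> Tuple[int, bytes]:
    """Strip the VXLAN header and return (vni, original Ethernet frame)."""
    if len(udp_payload) < 8:
        raise ValueError("too short for a VXLAN header")
    flags_word, vni_word = struct.unpack("!II", udp_payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set, VNI is not valid")
    return vni_word >> 8, udp_payload[8:]


# Round trip: what Server 1 sends into the tunnel, Server 2 recovers.
frame = bytes.fromhex("020000000003" "020000000001" "0800") + b"hello"
packet = vxlan_encapsulate(5001, frame)
vni, inner = vxlan_decapsulate(packet)
```

The VNI value 5001 is arbitrary. Carrying a network identifier in every packet is what lets one tunnel mesh carry several isolated virtual networks.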

Network virtualization is particularly important in cloud environments where tenants need to be isolated from each other, but still share the same physical infrastructure.

Figure 2: Nicira's Distributed Virtual Network Infrastructure (DVNI)
Nicira's Distributed Virtual Network Infrastructure (DVNI) architecture, shown in Figure 2, is a good example of network virtualization using a Tunnel Mesh to connect virtual switches and overlay multiple Virtual Networks on the shared Physical Fabric. The Controller Cluster manages the virtual switches, setting up tunnels and controlling forwarding behavior.

Note: The Controller Cluster uses the OpenFlow protocol to configure virtual switches, making this an example of Software Defined Networking (SDN).

The importance of visibility in managing virtualized environments has been a constant theme on this blog, see Network visibility in the data center, System boundary and NUMA. The question is, how do you maintain visibility when tunneling is used for network virtualization?

The remainder of this article describes how the widely supported sFlow standard provides detailed visibility into tunneled traffic. You might be surprised to learn that every sFlow-enabled switch produced in the last 10 years is fully capable of reporting on L3/4 tunneled traffic, even though none of the sFlow standard documents mention VxLAN, GRE, etc.

The key to sFlow's adaptability is that switches export packet headers, leaving it to sFlow analysis software to decode the packet headers and report on traffic, see Choosing an sFlow analyzer.  The templates needed to extract tunnel information from packet headers are described in the Internet Drafts and RFCs that define the various tunneling protocols. For example, the following packet diagram from the internet draft VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks describes the fields present in a VxLAN packet header:

            0                   1                   2                   3
            0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     
        Outer Ethernet Header:             |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |             Outer Destination MAC Address                     |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           | Outer Destination MAC Address | Outer Source MAC Address      |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |                Outer Source MAC Address                       |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       Optional Ethertype = C-Tag 802.1Q   | Outer.VLAN Tag Information    |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           | Ethertype 0x0800              |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        Outer IP Header:
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |Version|  IHL  |Type of Service|          Total Length         |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |         Identification        |Flags|      Fragment Offset    |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |  Time to Live |    Protocol   |         Header Checksum       |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |                       Outer Source Address                    |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |                   Outer Destination Address                   |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         Outer UDP Header:
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |       Source Port = xxxx      |       Dest Port = VXLAN Port  |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |           UDP Length          |        UDP Checksum           |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         VXLAN Header:
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |R|R|R|R|I|R|R|R|            Reserved                           |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |                VXLAN Network Identifier (VNI) |   Reserved    |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             0                   1                   2                   3
             0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     
      Inner Ethernet Header:             |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Inner Destination MAC Address                     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            | Inner Destination MAC Address | Inner Source MAC Address      |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |                Inner Source MAC Address                       |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     Optional Ethertype = C-Tag [802.1Q]    | Inner.VLAN Tag Information    |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     Payload:
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            | Ethertype of Original Payload |                               |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
            |                                  Original Ethernet Payload    |
            |                                                               |
            | (Note that the original Ethernet Frame's FCS is not included) |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          Frame Check Sequence:
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |   New FCS (Frame Check Sequence) for Outer Ethernet Frame     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The template shows that a sampled VxLAN packet header captured by sFlow includes the following information:
  • outer destination MAC
  • outer source MAC
  • outer VLAN
  • outer source IP
  • outer destination IP
  • VXLAN Network Identifier
  • inner destination MAC
  • inner source MAC
  • inner VLAN
  • inner Ethertype
  • original Ethernet payload (providing the inner source and destination IP addresses, etc.)
This level of detail lets network managers see both the outer tunnel information and the inner VM-to-VM traffic.
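To illustrate how analysis software can apply the template, here is a minimal Python sketch that decodes a sampled header into the fields listed above. Several assumptions apply. The outer IP header is IPv4. The VxLAN UDP port defaults to 4789, the port IANA later assigned (the draft quoted above leaves it as "VXLAN Port", and some early implementations used other ports), so it is exposed as a parameter. The field names are hypothetical. Because sFlow samples are truncated packet headers, the decoder keeps whatever it managed to decode if the sample runs out.

```python
import struct
from typing import Any, Dict, Tuple

ETH_VLAN = 0x8100    # 802.1Q C-tag
ETH_IPV4 = 0x0800
IPPROTO_UDP = 17
VXLAN_FLAG_I = 0x08  # set when the VNI field is valid


def mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet(buf: bytes, off: int, prefix: str, out: Dict[str, Any]) -> int:
    """Record MACs, optional VLAN and Ethertype; return the payload offset."""
    dst, src, ethertype = struct.unpack_from("!6s6sH", buf, off)
    out[prefix + "dst_mac"], out[prefix + "src_mac"] = mac(dst), mac(src)
    off += 14
    if ethertype == ETH_VLAN:
        tci, ethertype = struct.unpack_from("!HH", buf, off)
        out[prefix + "vlan"] = tci & 0x0FFF  # low 12 bits are the VLAN ID
        off += 4
    out[prefix + "ethertype"] = ethertype
    return off


def ipv4_addrs(buf: bytes, off: int) -> Tuple[str, str]:
    """Return (source, destination) from the IPv4 header at buf[off:]."""
    src, dst = struct.unpack_from("!4s4s", buf, off + 12)
    return ".".join(map(str, src)), ".".join(map(str, dst))


def decode_vxlan_sample(hdr: bytes, vxlan_port: int = 4789) -> Dict[str, Any]:
    """Decode outer Ethernet / IPv4 / UDP / VXLAN / inner Ethernet / inner IPv4.

    Returns {} if the sample is not VXLAN, or the fields decoded so far if
    the sampled header is truncated.
    """
    out: Dict[str, Any] = {}
    try:
        off = parse_ethernet(hdr, 0, "outer_", out)
        if out["outer_ethertype"] != ETH_IPV4:
            return {}
        ihl = (hdr[off] & 0x0F) * 4          # IPv4 header length in bytes
        proto = hdr[off + 9]
        out["outer_src_ip"], out["outer_dst_ip"] = ipv4_addrs(hdr, off)
        if proto != IPPROTO_UDP:
            return {}
        off += ihl
        _sport, dport = struct.unpack_from("!HH", hdr, off)
        if dport != vxlan_port:
            return {}
        off += 8                             # skip the 8-byte UDP header
        flags, vni_word = struct.unpack_from("!II", hdr, off)
        if not (flags >> 24) & VXLAN_FLAG_I:
            return {}
        out["vni"] = vni_word >> 8           # VNI is the top 24 bits
        off = parse_ethernet(hdr, off + 8, "inner_", out)
        if out["inner_ethertype"] == ETH_IPV4:
            out["inner_src_ip"], out["inner_dst_ip"] = ipv4_addrs(hdr, off)
    except (struct.error, IndexError):
        pass  # truncated sample: keep the fields decoded so far
    return out
```

Nothing in this decoder depends on the switch that produced the sample, which is the point made above: supporting a new tunneling protocol only means adding another template to the analyzer.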

Generic Routing Encapsulation (GRE) is similar to VxLAN, but the encapsulated Ethernet frame is carried directly over IP rather than over UDP. The packet header is described in RFC 2784: Generic Routing Encapsulation (GRE).
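GRE packets are carried as IP protocol 47, and the RFC 2784 header is only 4 bytes, or 8 when the checksum is present. A protocol type of 0x6558 (Transparent Ethernet Bridging) indicates that the payload is an Ethernet frame. The Python sketch below decodes that header from a given offset. It deliberately rejects the key and sequence number extensions added by RFC 2890. NVGRE, for example, carries its virtual subnet ID in the key field, so a real analyzer would need to handle them. The function name is hypothetical.

```python
import struct
from typing import Any, Dict

GRE_C_BIT = 0x8000    # checksum present
GRE_KS_BITS = 0x3000  # key / sequence number bits defined by RFC 2890
PROTO_TEB = 0x6558    # Transparent Ethernet Bridging: payload is an Ethernet frame


def decode_gre(buf: bytes, off: int = 0) -> Dict[str, Any]:
    """Decode the RFC 2784 GRE header starting at buf[off:]."""
    flags_ver, proto = struct.unpack_from("!HH", buf, off)
    if flags_ver & GRE_KS_BITS:
        raise ValueError("RFC 2890 key/sequence fields not handled in this sketch")
    payload_off = off + 4
    if flags_ver & GRE_C_BIT:
        payload_off += 4  # skip Checksum and Reserved1
    return {
        "checksum_present": bool(flags_ver & GRE_C_BIT),
        "version": flags_ver & 0x0007,  # Ver field, 0 for RFC 2784 GRE
        "protocol_type": proto,
        "carries_ethernet": proto == PROTO_TEB,
        "payload_offset": payload_off,
    }
```

Once `payload_offset` is known, the inner Ethernet header can be decoded exactly as in the VxLAN case.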

Note: Visibility into tunnels is challenging if you are using NetFlow/IPFIX as your traffic monitoring protocol, since you depend on the switch vendor for the hardware and firmware to decode, analyze and export details of the tunneled traffic, see Software defined networking. Maintaining visibility with traditional flow monitoring technologies is especially difficult in rapidly changing areas like network virtualization, where a new tunneling protocol is proposed every few months.

Network visibility gets even more challenging when you throw fabric technologies like 802.1aq and TRILL into the mix. The following packet diagram from RFC 6325: Routing Bridges (RBridges): Base Protocol Specification shows the format of a TRILL header:

   Flow:
     +-----+  +-------+   +-------+       +-------+   +-------+  +----+
     | ESa +--+  RB1  +---+  RB3  +-------+  RB4  +---+  RB2  +--+ESb |
     +-----+  |ingress|   |transit|   ^   |transit|   |egress |  +----+
              +-------+   +-------+   |   +-------+   +-------+
                                      |
   Outer Ethernet Header:             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Outer Destination MAC Address  (RB4)              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Outer Destination MAC Address | Outer Source MAC Address      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                Outer Source MAC Address  (RB3)                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Ethertype = C-Tag [802.1Q-2005]| Outer.VLAN Tag Information    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   TRILL Header:
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Ethertype = TRILL             | V | R |M|Op-Length| Hop Count |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Egress (RB2) Nickname         | Ingress (RB1) Nickname        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Inner Ethernet Header:
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Inner Destination MAC Address  (ESb)              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Inner Destination MAC Address | Inner Source MAC Address      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                  Inner Source MAC Address  (ESa)              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Ethertype = C-Tag [802.1Q-2005]| Inner.VLAN Tag Information    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Payload:
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Ethertype of Original Payload |                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
      |                                  Original Ethernet Payload    |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Frame Check Sequence:
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               New FCS (Frame Check Sequence)                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

In this case, the L3/4 tunneled packet sent by a virtual switch enters the top-of-rack switch (RB1, ingress), where it is encapsulated with a TRILL header and a new outer Ethernet header before being sent across the switch fabric to the destination top-of-rack switch (RB2, egress). There it is decapsulated and delivered to the destination virtual switch.
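Decoding the TRILL header from a sampled packet follows the same pattern, using the field widths in the RFC 6325 diagram above: V (2 bits), R (2), M (1), Op-Length (5, in 4-byte units) and Hop Count (6), followed by the egress and ingress RBridge nicknames. The Python sketch below assumes the caller has already skipped the outer MAC addresses and VLAN tag and passes the offset of the TRILL Ethertype (0x22F3). The function name is hypothetical.

```python
import struct
from typing import Any, Dict

ETH_TRILL = 0x22F3  # TRILL Ethertype


def decode_trill(buf: bytes, off: int) -> Dict[str, Any]:
    """Decode the TRILL header whose Ethertype starts at buf[off:]."""
    ethertype, bits, egress, ingress = struct.unpack_from("!HHHH", buf, off)
    if ethertype != ETH_TRILL:
        raise ValueError("not a TRILL header")
    op_length = (bits >> 6) & 0x1F  # options length in 4-byte units
    return {
        "version": bits >> 14,
        "multi_destination": bool((bits >> 11) & 0x1),  # M bit
        "options_bytes": op_length * 4,
        "hop_count": bits & 0x3F,
        "egress_nickname": egress,    # e.g. RB2 in the diagram
        "ingress_nickname": ingress,  # e.g. RB1 in the diagram
        # The inner Ethernet header follows the 8-byte header and any options.
        "inner_offset": off + 8 + op_length * 4,
    }
```

From `inner_offset` onward, the sample is the VxLAN (or GRE) packet sent by the virtual switch and can be handed to the corresponding tunnel decoder.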

With sFlow reporting packet headers from intermediate switches, you get visibility into both sets of outer MAC addresses (TRILL and VxLAN), as well as the inner MAC addresses associated with the VMs and TCP/IP flows between the VMs.
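Once the headers are decoded, reporting on VM-to-VM traffic is a matter of aggregation. sFlow selects roughly 1 in N packets, so each sample stands for about N packets, and multiplying by the sampling rate gives an estimate of the total. The Python sketch below assumes each sample has already been decoded into a dict that also records its sampling rate. The field names and the choice of flow key are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def tunnel_traffic(samples: Iterable[Dict[str, Any]]) -> Counter:
    """Estimate frames per (outer src IP, outer dst IP, VNI, inner src MAC,
    inner dst MAC) from decoded packet samples."""
    totals: Counter = Counter()
    for s in samples:
        key = (s["outer_src_ip"], s["outer_dst_ip"], s["vni"],
               s["inner_src_mac"], s["inner_dst_mac"])
        # Each sample represents roughly sampling_rate frames.
        totals[key] += s["sampling_rate"]
    return totals
```

Keying on both outer and inner fields ties each virtual network conversation to the physical tunnel endpoints that carry it. Adding the TRILL nicknames to the key would extend this view to the fabric path.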

Note: Wireshark can decode almost every protocol you are likely to see in your network and is a great troubleshooting tool to use with sFlow, see Wireshark for details.

Finally, sFlow provides the scalability needed to maintain full, end-to-end visibility in virtual network environments, delivering multi-layer visibility into the physical fabric as well as monitoring inter-VM traffic in top-of-rack, intermediate and virtual switches. The sFlow standard is comprehensive, extending beyond network monitoring to provide unified, cloud-scale visibility that links network, system and application performance in a single integrated system.

4 comments:

  1. Great explanation with nice graphics.
    By the way, would you share the tools (software and/or library) used to make them?

    Regards.

  2. Thanks. I used Keynote to draw Figure 1. Figure 2 is a screen capture from the Nicira white paper.

  3. Is there an example of using sFlow-RT to see VXLAN tunnel packets and flows?

    1. The article Down the rabbit hole provides an example using GRE, but it should work for VxLAN and NVGRE.
