Thursday, January 31, 2013

Down the rabbit hole

The article, Tunnels, describes the use of tunneling protocols such as GRE, NVGRE and VXLAN to create virtual networks in cloud environments. Tunneling is also an important tool in addressing challenges posed by IPv6 migration. However, while tunnels are an effective way to virtualize networking, they pose difficult challenges for application development and operations (DevOps) teams trying to optimize network performance and for network administrators who no longer have visibility into the applications running over the physical infrastructure.

This article uses sFlow-RT to demonstrate how sFlow monitoring, build into the physical and virtual network infrastructure, can be used to provide comprehensive visibility into tunneled traffic to application, operations and networking teams.

Note: The sFlow-RT analytics module is primarily intended to be used in automated performance aware software defined networking applications. However, it also provides a rudimentary web based user interface that can be used to demonstrate the visibility into tunneled traffic offered by the sFlow standard.

Application performance

One of the reasons that tunnels are popular for network virtualization is that they provide a useful abstraction that hides the underlying physical network topology. However, while this abstraction offers significant operational flexibility, lack of visibility into the physical network can result in poorly placed workloads, inefficient use of resources, and consequent performance problems (see NUMA).

In this example, consider the problem faced by a system manager troubleshooting poor throughput between two virtual machines: 10.0.201.1 and 10.0.201.2.
Figure 1: Tracing a tunneled flow
Figure 1 shows the Flows table with the following flow definition:
  1. Name: trace
  2. Keys: ipsource,ipdestination,ipprotocol
  3. Value: frames
  4. Filter: ipsource.1=10.0.201.1&ipdestination.1=10.0.201.2
These settings define a new flow definition called trace that is looking for traffic in which the inner (tenant) addresses are 10.0.201.1 and 10.0.201.2 and asks for information on the outer IP addresses.

Note: ipsource.1 has a suffix of 1, indicating a reference to the inner address. It is possible to have nested tunnels such that the inner, inner ipsource address would be indicated as ipsource.2 etc.

Figure 2: Outer addresses of a tunneled flow
Clicking on the flow in the Flows table brings up the chart shown in Figure 2. The chart shows a flow of approximately 15K packets per second and identifies the outer ipsource, ipdestination and ipprotocol as 10.0.0.151, 10.0.0.152 and 47 respectively.

Note: The IP protocol of 47 indicates that this is a GRE tunnel.
Figure 3: All data sources observing a flow
The sFlow-RT module has a REST/HTTP API and editing the URL modifies the query to reveal additional information. Figure 3 shows the effect of changing the query from metric to dump. The dump output shows each switch (Agent) and port (Data Source) that saw the traffic. In this case the traffic was seen traversing 2 virtual switches 10.0.0.28 and 10.0.0.20, and a physical switch 10.0.0.253.

Given the switch and port information, follow up queries could be constructed to look at utilizations, errors and discards on the links to see if there are network problems affecting the traffic.

Network performance

Tunnels hide the applications using the network from network managers, making it difficult to manage capacity, assess the impact of network performance problems and maintain security.

Consider the same example, but this time from a network manager's perspective, having identified a large flow from address 10.0.0.151 to 10.0.0.152.
Figure 4: Looking into a tunnel
Figure 4 shows the Flows table with the following definition:
  1. Name: inside
  2. Keys: ipsource.1,ipdestination.1,stack
  3. Value: frames
  4. Filter: ipsource=10.0.0.151&10.0.0.152
These settings define a new flow called inside that is looking for traffic in which the outer addresses are 10.0.0.151 and 10.0.0.152 and asks for information on the inner (tenant) addresses.
Figure 5: Inner addresses in a tunneled flow
Again, clicking on the entry in the Flows table brings up the chart shown in Figure 5. The chart shows a flow of 15K packets per second and identifies the inner ipsource.1, ipdestination.1 and stack as 10.0.201.1, 10.0.201.2 and eth.ip.gre.ip.tcp respectively.

Given the inner IP addresses and stack, follow up queries can identify the TCP port, server names, application names, CPU loads etc. needed to understand the application demand driving traffic and determine possible actions (moving a virtual machine for example).

Automation

This was a trivial example, in practice tunneled topologies are more complex and cloud data centers are far too large to be managed using manual processes like the one demonstrated here. sFlow-RT provides visibility into large, complex, multi-layered environments, including: QinQ, TRILL, VXLAN, NVGRE and 6over4. Programmatic access to performance data through sFlow-RT's REST API allows cloud orchestration and software defined networking (SDN) controllers to incorporate real-time network, server and application visibility to automatically load balance and optimize workloads.

No comments:

Post a Comment