Figure 1: Sending a picture over a packet switched network
Figure 1 illustrates how data is transferred over a packet switched network. Host A is in the process of transferring a picture to Host B. The picture has been broken up into parts, and each part is sent as a separate packet. Three packets, containing parts 8, 9 and 10, are in transit and are being forwarded by switches Z, Y and X respectively.
In the example, Host A is responsible for breaking up the picture into parts and transmitting the packets. Host B is responsible for reconstructing the picture, detecting parts that are missing, corrupted or delivered out of order, and sending acknowledgement packets back to Host A, which is then responsible for resending packets if necessary.
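The division of labor between the two hosts can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how TCP or any real protocol works: the `Packet` fields, the CRC-32 checksum and the `make_packets` and `reassemble` helpers are hypothetical names chosen for the example. Host B tolerates out-of-order arrival, discards corrupted and duplicate parts, and reports which parts are still missing so that Host A knows what to resend.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical packet: destination, part number, payload and CRC-32 checksum."""
    dst: str
    part: int
    payload: bytes
    checksum: int


def make_packets(dst: str, picture: bytes, part_size: int) -> list[Packet]:
    """Host A: break the picture into numbered parts, one packet per part."""
    packets = []
    for part, start in enumerate(range(0, len(picture), part_size)):
        chunk = picture[start:start + part_size]
        packets.append(Packet(dst, part, chunk, zlib.crc32(chunk)))
    return packets


def reassemble(received: list[Packet], total_parts: int) -> tuple[bytes | None, list[int]]:
    """Host B: rebuild the picture and report which parts must be resent.

    Packets may arrive in any order. A packet whose checksum doesn't match
    its payload is treated as missing, and duplicate copies are ignored.
    """
    parts: dict[int, bytes] = {}
    for pkt in received:
        if zlib.crc32(pkt.payload) != pkt.checksum:
            continue  # corrupted in transit: discard and wait for a resend
        parts.setdefault(pkt.part, pkt.payload)  # keep the first good copy
    missing = [p for p in range(total_parts) if p not in parts]
    if missing:
        return None, missing  # an acknowledgement would tell Host A what to resend
    return b"".join(parts[p] for p in range(total_parts)), []
```

For example, splitting a 1,024-byte picture into 100-byte parts yields parts 0 to 10; if part 9 is lost in transit, `reassemble` returns `(None, [9])`, and Host A resends only that part.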
The packet switches are responsible for transmitting, or forwarding, the packets. Each packet switch examines the destination address (e.g. To: B) and sends the packet on a link that will take it closer to its destination. The switches are unaware of the contents of the packets they forward, in this case the picture fragments and part numbers. Figure 1 only shows packets relating to the image transfer between Host A and Host B, but in reality the switches will be simultaneously forwarding packets from many other hosts.
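A packet switch's job can be sketched the same way. The `Switch` class, its forwarding table and the link names below are hypothetical; the point of the sketch is that each decision is made one packet at a time, uses only the destination address, and never looks at the payload.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    dst: str        # destination address, e.g. "B"
    payload: bytes  # opaque to the switch


class Switch:
    """Hypothetical packet switch that forwards using the destination address alone."""

    def __init__(self, name: str, forwarding_table: dict[str, str]) -> None:
        self.name = name
        self.forwarding_table = forwarding_table  # destination -> outgoing link
        self.queues: dict[str, list[Packet]] = defaultdict(list)  # per-link output queues

    def forward(self, pkt: Packet) -> str:
        # The decision looks only at pkt.dst: the payload is never inspected,
        # and no per-conversation state is kept.
        link = self.forwarding_table[pkt.dst]
        self.queues[link].append(pkt)  # wait for transmission on the chosen link
        return link
```

With an assumed table such as `{"B": "to-X", "C": "to-W"}`, switch Y places a packet addressed to B on link `to-X` whatever it carries, interleaving it with packets bound for other hosts.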
Figure 2: Sorting mail
The mail sorting room shown in Figure 2 is a good analogy for the function performed by a packet switch. Letters arrive in the sorting room and are quickly placed into pigeonholes based on their destination. The mail sorters don't know or care what's in the letters; they focus on quickly reading the destination address on each envelope and placing the letter in a pigeonhole along with other letters bound for the same region, so that the letters can be sent on to another sorting facility closer to the destination.
In a packet switched network, each host and switch has a different perspective on data transfers and maintains different state in order to perform its task. Managing the performance of the communication system requires a correct understanding of the nature of the task that each element is responsible for and a way to monitor how effectively that task is being performed.
As an example of a poorly fitting model, consider "flow records", which are often presented as an intuitive way to monitor and understand traffic on packet switched networks. Continuing our example, the data transfer would be represented by two flow records: one accounting for packets from Host A to Host B and another accounting for packets from Host B to Host A.
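To see what building flow records involves, here is a minimal sketch of a flow cache. The counters (bytes, packets, duration) and the source, destination and protocol key mirror the fields a flow record typically carries; the `FlowRecord` and `FlowCache` classes are hypothetical and much simpler than a real NetFlow exporter, which typically also keys on transport ports and times out idle flows.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FlowRecord:
    src: str
    dst: str
    protocol: str
    byte_count: int = 0
    packet_count: int = 0
    first: float = 0.0  # time of first packet, in seconds
    last: float = 0.0   # time of last packet, in seconds

    @property
    def duration(self) -> float:
        return self.last - self.first


class FlowCache:
    """Hypothetical flow cache: one unidirectional record per (src, dst, protocol)."""

    def __init__(self) -> None:
        self.records: dict[tuple[str, str, str], FlowRecord] = {}

    def observe(self, src: str, dst: str, protocol: str, size: int, t: float) -> None:
        key = (src, dst, protocol)
        rec = self.records.get(key)
        if rec is None:
            # The first packet seen for this key starts a new flow record.
            rec = FlowRecord(src, dst, protocol, first=t, last=t)
            self.records[key] = rec
        rec.byte_count += size
        rec.packet_count += 1
        rec.last = t
```

Feeding it data packets from A to B and acknowledgements from B to A produces exactly two records, one per direction. Note that the cache must look up and update per-conversation state for every packet, bookkeeping that the forwarding task itself never needs.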
Figure 3: Telephone operator
There is an inherent appeal in flow records since they resemble the familiar "call records" on a telephone bill, which record the number dialed, the time the call started and the call duration. However, as the switchboard and patch cords in Figure 3 demonstrate, telephone networks are circuit switched, i.e. a dedicated circuit is set up between the two telephones, involving all the switches in the path. Considering the manual operator shows how easily a circuit switch can generate call records: the operator just needs to note when they connected the call using the patch cord, which parties they connected, and when they terminated the call by pulling the plug.
Viewing packet switches through the lens of a circuit oriented measurement is misleading. Start by considering the steps that the mail sorter in Figure 2 would have to go through in order to create flow records. The mail sorter would be required to keep track of the From: and To: address information on each letter, count the number of letters that Host A sent to Host B, and open the letters and peek inside to decide whether each letter was part of an existing conversation or the start of a new conversation. This task is extremely cumbersome and error prone, and the resulting flow records don't monitor the task that the mail sorter is actually performing; for example, flow records won't tell you how many postcards, letters and parcels were sorted.
Packet and circuit switched networks have very different characteristics, and an effective monitoring system collects the measurements that are relevant to the performance of the network in question:
- Circuit switches can handle a limited number of simultaneous connections, and if more calls are attempted, the additional calls are blocked (i.e. receive a busy signal). Blocking probabilities and the sizing of circuit switches are analyzed using Erlang calculations.
- Packet switches don't block. Instead, packets are interleaved as they are forwarded. If the number of packets arriving exceeds the forwarding capacity of the switch, packets may be delayed as they wait to be serviced, or be discarded if too many packets are already waiting in the queue. Queuing delays and packet discard probabilities are analyzed using queuing theory.
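The two styles of analysis can be compared side by side. The sketch below uses the standard Erlang B recursion for the probability that a call is blocked when A erlangs of traffic are offered to n circuits, and the textbook M/M/1 and M/M/1/K queues (Poisson arrivals, exponential service times, and in the M/M/1/K case room for K packets including the one being transmitted) as a simplified stand-in for a packet switch port. Real switch ports rarely satisfy the M/M/1 assumptions, so the numbers are illustrative only; the function names are hypothetical.

```python
def erlang_b(circuits: int, offered_erlangs: float) -> float:
    """Blocking probability of a circuit switch (Erlang B).

    Uses the numerically stable recursion B(0) = 1,
    B(n) = A * B(n-1) / (n + A * B(n-1)).
    """
    b = 1.0
    for n in range(1, circuits + 1):
        b = offered_erlangs * b / (n + offered_erlangs * b)
    return b


def mm1k_discard_probability(arrival_rate: float, service_rate: float, capacity: int) -> float:
    """Probability that an arriving packet finds an M/M/1/K queue full and is discarded."""
    rho = arrival_rate / service_rate
    if rho == 1.0:
        return 1.0 / (capacity + 1)
    return (1 - rho) * rho ** capacity / (1 - rho ** (capacity + 1))


def mm1_mean_delay(arrival_rate: float, service_rate: float) -> float:
    """Mean time a packet spends in an unbounded M/M/1 queue (waiting plus service)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable when arrivals reach service capacity")
    return 1.0 / (service_rate - arrival_rate)
```

For example, `erlang_b(2, 1.0)` returns about 0.2: with two circuits and one erlang of offered traffic, roughly one call attempt in five gets a busy signal. By contrast, `mm1_mean_delay(50.0, 100.0)` returns 0.02: a port that can serve 100 packets per second and receives 50 per second delays each packet by 20 ms on average rather than refusing any of them outright.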
To make the example in Figure 1 concrete, suppose Host A is an Apache web server, Host B is a laptop running a web browser, and the picture transfer is the response to the HTTP request:
http://www.lolcats.com/popular/159.html
The following table compares the switch, host and application measurements provided by sFlow and NetFlow:
| Element | sFlow | NetFlow |
|---|---|---|
| Switch | Each switch exports packet-oriented measurements: interface counters and randomly sampled packet headers, together with the associated forwarding decisions. | Switches export connection-oriented flow records that include source address, destination address, protocol, bytes, packets and duration. Note: many switches aren't capable of making these measurements and so go unmonitored. |
| Host | The server exports standard host metrics, including CPU, memory and disk performance. | None. NetFlow is generally only implemented in network devices. |
| Application | The web server exports standard HTTP metrics, including request counts and randomly sampled web requests that provide detailed information such as the URL, referrer, server address, client address, user, browser, request bytes, response bytes, status and duration. The web server also reports maximum, idle and active workers. | None. NetFlow is generally only implemented in network devices. |
NetFlow takes a network-centric view of measurement and tries to infer application behavior by examining packets in the network. NetFlow imposes a stateful, connection-oriented model on core devices that should be stateless. Unfortunately, the resulting flow measurements aren't a natural fit for packet switches and provide a distorted view of how these devices operate. For example, the switch makes forwarding decisions on a packet-by-packet basis, and these decisions can change over the lifetime of a flow. The packet-oriented measurements made by sFlow accurately capture forwarding decisions, but flow-oriented measurements can be misleading. As another example, building on the mail sorting analogy in Figure 2, packet-oriented measurements support analysis of small, large and jumbo frames (postcards, letters and parcels), but this detail is lost in flow records.
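The frame-size analysis mentioned above can be sketched with simple random sampling. Each frame is sampled independently with probability 1/N and each sample is weighted by N, so the estimated totals are unbiased; this is a simplified model of the random 1-in-N packet sampling that sFlow performs, not an sFlow agent's actual code. The size thresholds separating small, large and jumbo frames are assumptions chosen for illustration, and `estimate_size_mix` is a hypothetical name.

```python
import random
from collections import Counter
from typing import Iterable

# Assumed size classes in bytes: illustrative thresholds, not part of any standard.
SMALL_MAX = 128    # "postcards"
LARGE_MAX = 1518   # "letters": up to a standard untagged Ethernet frame
# Anything bigger is treated as a jumbo frame ("parcels").


def size_class(frame_bytes: int) -> str:
    if frame_bytes <= SMALL_MAX:
        return "small"
    if frame_bytes <= LARGE_MAX:
        return "large"
    return "jumbo"


def estimate_size_mix(frame_sizes: Iterable[int], sampling_rate: int,
                      rng: random.Random) -> Counter:
    """Estimate frames per size class from 1-in-N random samples.

    Each frame is sampled with probability 1/sampling_rate, so each sample
    stands for sampling_rate frames. No per-flow state is kept.
    """
    estimate = Counter()
    for size in frame_sizes:
        if rng.random() < 1.0 / sampling_rate:
            estimate[size_class(size)] += sampling_rate
    return estimate
```

With a sampling rate of 1 every frame is counted exactly; at higher rates the switch inspects only a fraction of the traffic, yet the estimate still breaks the traffic down by frame size, which a flow record's per-flow byte and packet totals cannot do.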
Flows are a useful abstraction for understanding end-to-end traffic traversing the packet switches. However, the flow abstraction describes connections created by the communication end points, and to properly measure connection performance, one needs to instrument those end points. Hosts are responsible for initiating and terminating flows and are a natural place to report flows, but the traditional flow model ignores important detail that the host can provide. For example, the host is in a position to include important details about services, such as user names, URLs, response times and status codes, as well as information about the computational resources needed to deliver the services. This information is essential for managing service capacity, utilization and response times.
While NetFlow is network centric and tries to infer information about applications from network packets (which is becoming increasingly difficult as more traffic is encrypted), the sFlow standard takes a systems approach, exposing information from the network, servers and applications in order to provide a comprehensive view of performance.
Measurement isn't simply about producing pretty charts. The ultimate goal is to be able to act on the measurements and control performance. Control requires a model of behavior that allows performance to be predicted, and measurements that characterize demand and show how closely system performance matches the predictions. The sFlow standard is well suited to automation, providing comprehensive measurements based on models of network, server and application performance. The Data center convergence, visibility and control presentation describes the critical role that measurement plays in managing costs and optimizing performance.