Saturday, May 30, 2009

Measurement overhead


How much extra traffic will network monitoring generate? The goal of network-wide visibility is to improve performance on the network, so the extra traffic generated by monitoring needs to be small and must not degrade performance.

The chart looks at the overhead in terms of measurement records reported per packet on the network. Ideally the overhead associated with monitoring should be small and constant (less than 0.1% of the traffic). Since flow-oriented monitoring (e.g. NetFlow) involves the creation and export of flow records, the overhead is determined by the average number of packets in a flow. If there are a large number of packets in a flow, the overhead will be low. However, if the number of packets per flow is small then the overhead will be high and in the worst case may result in a flow record being generated and exported for every packet on the network.

In practice, the number of packets per flow can vary enormously depending on the type of traffic being monitored. DNS traffic is usually one packet per flow, web traffic typically has 5-10 packets per flow, and video streams may have thousands of packets per flow.
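If each flow produces about one exported record, the record overhead per packet is roughly 1 divided by the average number of packets per flow. The sketch below applies that simplification to the packets-per-flow figures above. It ignores long flows being exported more than once because of active timeouts, and the packing of several records into one export datagram. The function name is hypothetical, and the value 1,000 for video is just a stand-in for "thousands".

```python
def flow_records_per_packet(packets_per_flow: float) -> float:
    """Approximate flow records exported per packet, assuming one record per flow."""
    if packets_per_flow < 1:
        raise ValueError("a flow contains at least one packet")
    return 1.0 / packets_per_flow


# Packets-per-flow figures from the text; 1,000 stands in for "thousands".
traffic_types = {"DNS": 1, "web (5 pkts)": 5, "web (10 pkts)": 10, "video": 1_000}

for name, ppf in traffic_types.items():
    overhead = flow_records_per_packet(ppf)
    print(f"{name:>14}: {overhead:.3f} records/packet ({overhead:.1%})")
```

Only the long-lived video flows get near the 0.1% target. DNS-heavy traffic, or a flood of single-packet flows, pushes the overhead toward one record per packet.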

The overhead generated by flow monitoring can become acute during a worm outbreak or when the network is subjected to a denial-of-service (DoS) attack. In both cases, large numbers of single-packet flows are created, and the additional overhead created by flow monitoring is likely to exacerbate the problem. The impact of this extra measurement traffic is made worse by the bursts of traffic that flow monitoring creates. Yet it is precisely during these times that network visibility is most needed, so that the threat can be identified and controlled.

Since sFlow is not a flow-based protocol, the overhead is completely unaffected by the number of packets per flow. sFlow's use of packet sampling limits the overhead of traffic monitoring and ensures accurate, timely, network-wide visibility without impacting network performance - even during extreme traffic situations like a denial of service attack.
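Here is a minimal model of 1-in-N random packet sampling that illustrates the point. It assumes each packet is sampled independently with probability 1/N, and N = 1,000 is an arbitrary choice. Both traffic mixes contain 200,000 packets but have very different flow structures. The sampler only ever sees individual packets, so the number of records it produces depends on the packet count and N, not on packets per flow.

```python
import random


def sampled_records(flow_sizes, sampling_rate, seed=0):
    """Return (samples taken, packets seen) for 1-in-N random packet sampling.

    Each packet is sampled independently with probability 1/sampling_rate;
    flow boundaries play no part in the decision.
    """
    rng = random.Random(seed)
    packets = sum(flow_sizes)
    samples = sum(1 for _ in range(packets) if rng.random() < 1 / sampling_rate)
    return samples, packets


N = 1_000  # hypothetical sampling rate
scenarios = {
    "DoS, 200,000 one-packet flows": [1] * 200_000,
    "video, 40 flows of 5,000 packets": [5_000] * 40,
}
for name, flows in scenarios.items():
    samples, packets = sampled_records(flows, N)
    print(f"{name}: {samples} records for {packets} packets ({samples / packets:.3%})")
```

Both scenarios contain the same number of packets and use the same random seed, so they produce exactly the same number of samples, about 0.1% of the packets. The DoS mix, by contrast, would generate one flow record per packet.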

Wednesday, May 27, 2009

Measurement traffic


The charts, based on measurements from switches in a production environment, compare NetFlow and sFlow in terms of the load that they generate on the network. The following observations can be made based on this data:
  • NetFlow monitoring generates periodic bursts of traffic; the periodicity is confirmed by the sharp spikes in the frequency chart. This behavior is typical of flow-based traffic monitoring protocols (see Exporting IP flows using IPFIX), since flow generation involves maintaining a cache of active flows on the switch and the use of timers to trigger flow export.
  • sFlow monitoring generates a random pattern of traffic with no periodicity and no bursts. The randomness is confirmed by the flat frequency chart.
Network-wide visibility involves collecting traffic data from large numbers of switches and routers. The bursts of traffic generated by flow monitoring can cause delay, packet loss and jitter that affect other traffic on the network. The periodicity observed in flow monitoring also creates the risk that, as large numbers of devices are monitored, the different streams of monitoring traffic will synchronize and reinforce each other.
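A toy model makes the contrast between the two traffic patterns concrete. It assumes a flow cache that is flushed on a fixed 60-second timer and a sampler that sends each sample as soon as it is taken. All the rates are hypothetical, and real flow exporters use more elaborate active and inactive timeouts. Both streams average 100 records per second, but their peaks are very different.

```python
import random
from statistics import mean

SECONDS = 300
EXPORT_INTERVAL = 60       # hypothetical flow-cache export timer, in seconds
NEW_FLOWS_PER_SEC = 100    # hypothetical flow arrival rate
PACKETS_PER_SEC = 2_000    # hypothetical packet rate
SAMPLING_RATE = 20         # hypothetical 1-in-20 packet sampling


def flow_export_traffic():
    """Records sent each second when the whole flow cache is flushed on a timer."""
    per_second = [0] * SECONDS
    cached = 0
    for t in range(SECONDS):
        cached += NEW_FLOWS_PER_SEC
        if (t + 1) % EXPORT_INTERVAL == 0:
            per_second[t] = cached  # every cached flow leaves in one burst
            cached = 0
    return per_second


def sampled_traffic(seed=1):
    """Samples sent each second when each packet is sampled independently."""
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(PACKETS_PER_SEC) if rng.random() < 1 / SAMPLING_RATE)
        for _ in range(SECONDS)
    ]


def peak_to_mean(series):
    return max(series) / mean(series)


flow = flow_export_traffic()
sampled = sampled_traffic()
print("flow export: peak/mean =", peak_to_mean(flow))
print("sampling:    peak/mean =", round(peak_to_mean(sampled), 2))
```

In this model every flow record leaves in a burst at a multiple of the export interval, giving a peak-to-mean ratio of 60. The sampled stream stays close to its mean.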

It is essential that the technology used to manage network traffic does not itself cause traffic problems. The random, low-level, background traffic that sFlow generates ensures that large networks can be safely monitored without any adverse effects. This behavior is no accident: sFlow was designed to be scalable, and the random packet sampling mechanism in sFlow is one of the reasons that its traffic is well behaved.

Saturday, May 23, 2009

Control



Control theory is an area of engineering and applied mathematics dealing with the behavior and control of dynamic systems. Many of the concepts can usefully be applied to network visibility and control.

The diagram shows the basic elements of a feedback controller. When controlling a network, the network is the "System" and sFlow plays the role of the "Sensor": it takes observations of the system and converts them into an estimate of the current network state (link utilizations, traffic flows etc.). The measured network state is compared to a "Reference" (usage policies, thresholds etc.), and any deviations from the desired behavior are used to trigger a control action (blocking a port, setting a rate limit etc.), changing the behavior of the network and restoring service levels.
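The loop can be sketched in a few lines. This is a hypothetical illustration, not a real sFlow API. The sensor scales sampled packet sizes by the sampling rate to estimate each host's bandwidth, and the reference is a simple per-host limit. The controller returns which hosts should be rate limited; actually pushing that action to a switch is left out.

```python
from collections import Counter


def sense(samples, sampling_rate, interval_s):
    """Sensor: estimate bits/s per source host from sampled (source, frame_bytes) pairs."""
    estimate = Counter()
    for source, frame_bytes in samples:
        # Each sample stands for roughly `sampling_rate` packets.
        estimate[source] += frame_bytes * 8 * sampling_rate / interval_s
    return estimate


def control(estimate, limit_bps):
    """Compare measured state with the reference; return hosts to rate limit."""
    return {host: "rate-limit" for host, bps in estimate.items() if bps > limit_bps}


# Hypothetical inputs: 10 s of samples taken at 1-in-1,000.
samples = [("10.0.0.1", 1500)] * 50 + [("10.0.0.2", 1000)] * 5
estimate = sense(samples, sampling_rate=1_000, interval_s=10)
actions = control(estimate, limit_bps=50_000_000)  # reference: 50 Mbit/s per host
print(estimate, actions)
```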

Control theory has concepts of stability, observability, controllability and robustness that are very general and worth thinking about in the context of network management:
  • Stability is a way of describing how well behaved a system is. If you make a small change and the system's behavior oscillates wildly then it isn't stable (routing instability and congestion are examples of instability in a network setting).
  • Observability is a way of saying, "You can't control what you can't see." If you don't incorporate traffic measurement into the network design (by specifying switches with built-in traffic monitoring) then traffic will not be observable. Every device needs to have built-in traffic monitoring if you want to ensure that the whole network is observable.
  • Controllability is something that should be considered when designing the network; deploying managed switches in each layer of the network with appropriate control capabilities (e.g. access control lists, rate-limiting, priorities etc.) ensures controllability.
  • Robustness is a measure of how resilient the control system is. The managed network should degrade gracefully during unexpected situations (failures, DoS, Slashdot etc.).
sFlow was designed to provide the network-wide visibility needed for effective traffic control. sFlow has the attributes, described in Control Systems Design, that the measurement component of a control system requires: reliability, accuracy, responsiveness, noise immunity, linearity and non-intrusiveness.

In describing the responsiveness requirement, the author states, "Slow responding measurements can not only affect the quality of control but can actually make the feedback loop unstable." sFlow's timely reporting of link utilization data and packet samples provides the responsive visibility into network traffic needed to make the information actionable. While flow-based measurements provide useful usage data for traffic accounting and reporting, they are by their nature less responsive than sFlow and less useful for control.
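A minimal simulation shows why slow measurements matter. It assumes a simple integrating loop in which each step corrects a fraction (the gain) of the measured deviation from the reference. The only thing that changes between runs is how many steps old the measurement is. The model and its parameters are illustrative assumptions, not a model of any particular network.

```python
def simulate(gain: float, delay: int, steps: int = 60) -> list:
    """Deviation from the reference over time for a loop using delayed measurements.

    e[k+1] = e[k] - gain * e[k - delay]: each step corrects part of the
    deviation, but the controller only sees it `delay` steps late.
    """
    history = [1.0] * (delay + 1)  # start one unit away from the reference
    for _ in range(steps):
        history.append(history[-1] - gain * history[-1 - delay])
    return history[delay:]


for delay in (0, 5):
    errors = simulate(gain=0.5, delay=delay)
    print(f"delay={delay}: final deviation {errors[-1]:+.3g}, "
          f"largest {max(abs(e) for e in errors):.3g}")
```

With fresh measurements the deviation halves every step. With the same gain and a five-step measurement delay, the corrections arrive too late and the loop oscillates with growing amplitude.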

Tuesday, May 19, 2009

sFlow and NetFlow


If you are interested in network-wide visibility it is easy to be confused by the different types of traffic monitoring available. Technologies such as sFlow, Cisco NetFlow®, Juniper J-Flow, NetStream and IPFIX all appear to perform similar functions, but are supported by different network equipment vendors.

The situation is simpler than it appears. In reality, there are only two basic types of traffic monitoring available:
  1. Layer-2 (L2) packet monitoring: sFlow is designed to provide network-wide visibility. Monitoring all the way to the layer-2 access ports requires a protocol that scales well and can easily be implemented on a layer-2 switch. Because sFlow is packet-based, it is able to report in detail on all types of traffic on the network.
  2. Layer-3 (L3) flow monitoring: NetFlow, J-Flow, NetStream and IPFIX are all very similar technologies. Flow monitoring is typically implemented on routers and provides information about TCP/IP connections, with limited visibility into other types of traffic.
To be scalable and cost-effective, traffic monitoring needs to be built into switches and routers. Start by taking an inventory of the devices in your network and checking what traffic monitoring they provide; you will probably find that your network is represented by one of the three diagrams above.

The diagrams show three typical scenarios based on the type of equipment in the network:
  1. All the switches and routers support sFlow and a central sFlow analyzer provides network-wide visibility. Most vendors support sFlow, so it is possible to build an sFlow capable network to meet almost any requirement.
  2. The switches support sFlow and the routers support L3 flow monitoring. In this case a traffic analyzer that supports sFlow and L3 flow monitoring will also be able to provide network-wide visibility. This situation typically occurs in multi-vendor environments where sFlow is supported by the switch vendor and flow monitoring is supported by the router vendor.
  3. The routers support L3 flow monitoring and the switches have no built-in traffic monitoring capability. In this case, only traffic through the routers is monitored, providing very limited visibility into the data center and campus. This situation is typical of single-vendor networks where the vendor exclusively supports L3 flow monitoring.
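Classifying an inventory against these scenarios is simple enough to script. In this hypothetical sketch, each device records its role and the monitoring protocols it supports. The protocol labels, the device names and the `Device` type are assumptions made for illustration.

```python
from dataclasses import dataclass, field

L3_FLOW = {"netflow", "jflow", "netstream", "ipfix"}


@dataclass
class Device:
    name: str
    role: str                                     # "switch" or "router"
    monitoring: set = field(default_factory=set)  # e.g. {"sflow"} or {"netflow"}


def scenario(inventory):
    """Return 1, 2 or 3 for the scenarios above, or None for other mixes."""
    switches = [d for d in inventory if d.role == "switch"]
    routers = [d for d in inventory if d.role == "router"]

    def has_sflow(d):
        return "sflow" in d.monitoring

    def has_l3_flow(d):
        return bool(d.monitoring & L3_FLOW)

    if all(has_sflow(d) for d in inventory):
        return 1
    if all(has_sflow(s) for s in switches) and all(has_sflow(r) or has_l3_flow(r) for r in routers):
        return 2
    if not any(s.monitoring for s in switches) and all(has_l3_flow(r) for r in routers):
        return 3
    return None


inventory = [Device("access-1", "switch", {"sflow"}), Device("edge-rtr", "router", {"netflow"})]
print(scenario(inventory))
```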
The first step to improved network visibility is to select a traffic analysis tool and enable whatever traffic monitoring is available from existing network equipment.

Making traffic monitoring a selection requirement for future network upgrades will allow you to increase network visibility over time. Selecting network equipment from one of the many vendors that support sFlow does not add to the network cost. Adding traffic monitoring later is likely to be prohibitively expensive.

Monday, May 18, 2009

Scalability and accuracy of packet sampling


This chart from Packet Sampling Basics is useful for explaining why sFlow's packet sampling mechanism provides the accuracy and scalability needed for network-wide visibility. The chart shows that the accuracy of a traffic measurement (e.g. How much bandwidth is being consumed by backup traffic?) increases rapidly as the number of samples contributing to the measurement increases.

The chart shows that the percentage accuracy is independent of the number of packets on the network. This independence is the key to sFlow's scalability. For example, a measurement will be accurate to within about 5% as long as it is based on at least 1,500 samples. Only 1,500 samples are required whether the network contains one switch or 1,000 switches, 10Mbps links or 100Gbps links.
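A normal approximation gives a 95% confidence bound of roughly percentage error ≤ 196 × √(1/c), where c is the number of samples. The sketch below assumes that bound. With 1,500 samples it gives about 5.1%, which the text rounds to 5%.

```python
import math


def sampling_error_pct(samples: int) -> float:
    """Approximate 95% confidence percentage error for an estimate from `samples` samples."""
    return 196 * math.sqrt(1 / samples)


def samples_needed(error_pct: float) -> int:
    """Smallest sample count whose error bound is at or below error_pct."""
    return math.ceil((196 / error_pct) ** 2)


for c in (100, 1_500, 10_000):
    print(f"{c:>6} samples -> +/-{sampling_error_pct(c):.1f}%")
print("samples needed for 5%:", samples_needed(5))
```

Nothing in either function refers to link speed, packet count or number of switches, which is the independence the chart illustrates.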

The accuracy of sampled data is also independent of the type of traffic: traffic can consist of a small number of large connections, many small connections, traffic can arrive in bursts or spread out over time. In all cases the accuracy is determined only by the number of samples.
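A quick simulation illustrates the point. Both hypothetical packet streams contain 100,000 packets, a quarter of them backup traffic. In one stream the backup packets arrive as a single burst from one large transfer; in the other they are spread evenly across many one-packet connections. With 1-in-20 random sampling (an arbitrary choice), both estimates are built from roughly 5,000 samples and land close to the true 25%.

```python
import random


def estimate_share(packets, sampling_rate, seed=7):
    """Estimate the fraction of 'backup' packets from 1-in-N random samples."""
    rng = random.Random(seed)
    sampled = [p for p in packets if rng.random() < 1 / sampling_rate]
    return sum(p == "backup" for p in sampled) / len(sampled), len(sampled)


TOTAL = 100_000
# Burst: one large backup transfer arrives as a single block of 25,000 packets.
bursty = ["backup"] * 25_000 + ["other"] * (TOTAL - 25_000)
# Spread out: many one-packet connections, every fourth one is backup traffic.
spread = ["backup" if i % 4 == 0 else "other" for i in range(TOTAL)]

for name, stream in (("bursty", bursty), ("spread", spread)):
    share, n = estimate_share(stream, sampling_rate=20)
    print(f"{name}: estimated backup share {share:.1%} from {n} samples (true 25.0%)")
```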

The packet sampling mechanism in sFlow is implemented in hardware, providing wire-speed performance. When a switch samples a packet, the sampled packet header and packet path information are immediately sent to the central traffic analyzer. Sending the sFlow data promptly reduces the amount of memory needed on the switch and provides the sFlow collector with a real-time view of network activity.
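The sampling decision itself is simple, which is what makes it practical in hardware. The Python below only models the logic. A skip counter is decremented for each packet; when it reaches zero, the packet is sampled and the counter is reloaded with a random value whose mean is N, so samples are not taken at strictly regular intervals. The uniform distribution for the skip value and the 128-byte header length are assumptions for illustration.

```python
import random


class SkipSampler:
    """Model of 1-in-N packet sampling with a randomized skip counter."""

    HEADER_BYTES = 128  # assumed number of header bytes copied per sample

    def __init__(self, rate: int, seed: int = 0):
        self.rate = rate
        self.rng = random.Random(seed)
        self.skip = self._next_skip()

    def _next_skip(self) -> int:
        # Uniform on 1..2N-1, which has mean N (an assumed distribution).
        return self.rng.randint(1, 2 * self.rate - 1)

    def offer(self, frame: bytes, in_port: int, out_port: int):
        """Called for every packet; returns a sample record to send now, or None."""
        self.skip -= 1
        if self.skip > 0:
            return None
        self.skip = self._next_skip()
        return {"header": frame[:self.HEADER_BYTES], "in_port": in_port, "out_port": out_port}


sampler = SkipSampler(rate=100, seed=42)
records = [r for r in (sampler.offer(bytes(200), 1, 2) for _ in range(100_000)) if r is not None]
print(len(records), "samples from 100,000 packets")
```

Over 100,000 packets at N = 100 the sampler takes roughly 1,000 samples. Each sample is returned as soon as it is taken rather than buffered.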

Using sFlow to monitor all the switches in the network provides a robust and accurate means of monitoring traffic suitable for exacting applications such as network billing and charge-back.  The redundancy that end-to-end monitoring provides ensures that very little data is lost, even when switches fail or are taken down for maintenance.

Saturday, May 16, 2009

Packet paths


Monitoring packet paths is an important component of sFlow monitoring. When a switch captures a packet header, it also captures the forwarding decision it made for the packet. The forwarding information includes the ingress and egress switch ports, VLANs and priorities, and if the packet is routed, subnet and next hop information is also provided. Additional information such as BGP AS path, MPLS tunnel and label stack will also be captured if it is relevant to the forwarding decision.

A traffic analyzer can thread together the path information from all the switches to provide a constantly updated, network-wide view of topology and of the location of every host connected to the network.
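One simple way to locate hosts from path data is sketched below. It is an illustrative heuristic, not a description of any particular analyzer. For each source MAC address, it picks the ingress port (among the ports where that MAC was seen) that carries the fewest distinct source MACs. Inter-switch links aggregate traffic from many hosts, while access ports usually carry one. The switch names, and the short host labels standing in for MAC addresses, are hypothetical.

```python
from collections import defaultdict


def locate_hosts(samples):
    """Guess each host's access port from (switch, in_port, src_mac) sample records."""
    macs_per_port = defaultdict(set)
    ports_per_mac = defaultdict(set)
    for switch, in_port, src_mac in samples:
        macs_per_port[(switch, in_port)].add(src_mac)
        ports_per_mac[src_mac].add((switch, in_port))
    # Prefer the port that sees the fewest distinct sources; break ties by name.
    return {
        mac: min(ports, key=lambda p: (len(macs_per_port[p]), p))
        for mac, ports in ports_per_mac.items()
    }


# Hypothetical samples: hosts A and C on sw1, B and D on sw2, both uplinked
# to "core"; port 24 is the uplink on each access switch.
samples = [
    ("sw1", 1, "A"), ("core", 1, "A"), ("sw2", 24, "A"),   # A -> B
    ("sw1", 2, "C"), ("core", 1, "C"), ("sw2", 24, "C"),   # C -> B
    ("sw2", 3, "B"), ("core", 2, "B"), ("sw1", 24, "B"),   # B -> A
    ("sw2", 4, "D"), ("core", 2, "D"), ("sw1", 24, "D"),   # D -> A
]
print(locate_hosts(samples))
```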

The combination of packet header and packet forwarding data provides an integrated view of network traffic. For example, sFlow makes it possible to filter on forwarding attributes (VLAN, MPLS, route) and see traffic, or filter on traffic and identify forwarding paths. The integrated view of traffic makes it easy to answer questions such as: "Which links carry voice traffic?", "Is the voice traffic getting the correct priority value?" and "Who is sending traffic on a particular VLAN?"
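For example, "Who is sending traffic on a particular VLAN?" amounts to filtering samples on a forwarding attribute and aggregating on a traffic attribute. The sample records below are hypothetical, simplified dictionaries rather than a decoded sFlow datagram, and each sample is scaled by the sampling rate to estimate bytes.

```python
from collections import Counter

SAMPLING_RATE = 512  # hypothetical 1-in-512 sampling

# Hypothetical, simplified sample records (forwarding + header fields).
samples = [
    {"vlan": 20, "src": "10.1.20.5", "length": 200},
    {"vlan": 20, "src": "10.1.20.5", "length": 200},
    {"vlan": 20, "src": "10.1.20.9", "length": 1500},
    {"vlan": 30, "src": "10.1.30.2", "length": 1500},
]


def senders_on_vlan(samples, vlan, rate=SAMPLING_RATE):
    """Filter on a forwarding attribute (VLAN), then report on traffic (bytes by source)."""
    totals = Counter()
    for s in samples:
        if s["vlan"] == vlan:
            totals[s["src"]] += s["length"] * rate  # scale each sample up by N
    return totals.most_common()


print(senders_on_vlan(samples, vlan=20))
```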

Packet headers


There are a large number of protocols that can run over a switched network (the chart, from Agilent Technologies, shows the major protocol families). It is not reasonable to expect a layer-2 switch to decode and report on all these protocols; the switch is there to forward packets and should only be concerned with the information it needs to make forwarding decisions. With sFlow, the switch simply sends the sampled Ethernet packet header to the traffic analyzer and leaves it up to the analyzer to decode the protocols.

This approach has a number of important advantages:
  1. Capturing packet headers simplifies the monitoring task on the switch, making it easy to implement in hardware.
  2. It is much easier to add new protocol decodes to a central traffic analyzer than it is to develop and deploy new switch firmware releases to add the new functionality. This is particularly true if you have a variety of switch models and vendors in your network.
  3. Packet headers are well standardized; they have to be, or you wouldn't be able to interconnect switches. If packets are decoded on the switches, there can be differences in the way switches from different vendors decode the packets and report on the data, making it difficult to combine data to provide a network-wide view.
  4. Packet headers capture the complex layering (MAC, VLAN, MPLS, VPLS, IPv6 over IPv4 etc.) that is critical to understanding how traffic flows across the network.
In order to get the full benefit of sFlow monitoring, select an sFlow collector that decodes all the protocols that you use on your network.
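To show how little the switch has to do, here is a minimal decoder of the kind that runs on the analyzer side. It handles an Ethernet header, an optional 802.1Q VLAN tag and the IPv4 addresses. The test frame is made up, and a real collector would decode many more protocols (IPv6, MPLS, tunnels and so on).

```python
import ipaddress
import struct


def decode_header(frame: bytes) -> dict:
    """Decode Ethernet, an optional 802.1Q tag and IPv4 fields from a sampled header."""
    dst, src, ethertype = struct.unpack_from("!6s6sH", frame, 0)
    fields = {"dst_mac": dst.hex(":"), "src_mac": src.hex(":")}
    offset = 14
    if ethertype == 0x8100:  # 802.1Q tag: 3-bit priority, 1-bit DEI, 12-bit VLAN ID
        tci, ethertype = struct.unpack_from("!HH", frame, offset)
        fields["priority"] = tci >> 13
        fields["vlan"] = tci & 0x0FFF
        offset += 4
    fields["ethertype"] = ethertype
    if ethertype == 0x0800 and len(frame) >= offset + 20:  # IPv4, minimum header
        fields["ip_version"] = frame[offset] >> 4
        fields["protocol"] = frame[offset + 9]
        fields["src_ip"] = str(ipaddress.IPv4Address(frame[offset + 12:offset + 16]))
        fields["dst_ip"] = str(ipaddress.IPv4Address(frame[offset + 16:offset + 20]))
    return fields


# Made-up frame: broadcast MAC, VLAN 20 with priority 5, IPv4/UDP 10.0.0.1 -> 10.0.0.2.
frame = bytes.fromhex(
    "ffffffffffff" "001122334455" "8100" "a014" "0800"
    "4500001400000000401100000a0000010a000002"
)
print(decode_header(frame))
```

Adding a new protocol means adding another branch here, in one place, rather than upgrading firmware on every switch.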

Link utilization


One of the basic tasks in monitoring network traffic is to accurately track the utilization of links in your network. A managed switch provides a standard set of counters for each interface; these can be retrieved periodically using SNMP and used to trend link utilization, packet rates, errors and discards.
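Whichever way the counters are collected, utilization is computed from the difference between two readings of an interface octet counter (for example ifInOctets, or its 64-bit counterpart ifHCInOctets, from the IF-MIB). The sketch assumes at most one counter wrap between readings, and the example values are hypothetical.

```python
def counter_delta(previous: int, current: int, bits: int = 64) -> int:
    """Increase of a counter between two readings, allowing for one wrap-around."""
    return (current - previous) % (1 << bits)


def utilization(prev_octets: int, curr_octets: int, interval_s: float,
                if_speed_bps: int, bits: int = 64) -> float:
    """Fraction of link capacity used between two readings of an octet counter."""
    octets = counter_delta(prev_octets, curr_octets, bits)
    return octets * 8 / (interval_s * if_speed_bps)


# Hypothetical 32-bit counter that wrapped between two polls 30 s apart
# on a 100 Mbit/s link: 37,500,000 octets were actually transferred.
u = utilization(4_294_967_000, 37_499_704, interval_s=30,
                if_speed_bps=100_000_000, bits=32)
print(f"{u:.1%}")
```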

sFlow provides an alternative to SNMP counter polling. The sFlow agent in the switch periodically sends, or "pushes," its own counters to the central collector. Pushing counters is much more efficient than retrieving them using SNMP, requiring 10-20 times fewer network packets to retrieve the same information. The sFlow protocol uses XDR to encode the counters. XDR is much simpler to encode and decode than the ASN.1 encoding that the SNMP protocol uses, so the CPU load on the switches and the collectors is also significantly reduced. Finally, distributing the counter polling task among the switches further reduces the load on the central collector.
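XDR (RFC 4506) encodes values in fixed-size, big-endian fields with no per-field type or length tags, which is why it is cheap to produce and parse. The sketch packs a hypothetical subset of interface counters (one 32-bit unsigned int followed by three 64-bit unsigned hypers). It is not the full sFlow counter record layout.

```python
import struct

# ">" = big-endian with no padding; "I" = XDR unsigned int (4 bytes),
# "Q" = XDR unsigned hyper (8 bytes).
COUNTER_FORMAT = ">IQQQ"  # hypothetical subset: ifIndex, ifSpeed, in/out octets


def encode_counters(if_index: int, if_speed: int, in_octets: int, out_octets: int) -> bytes:
    """Encode a few interface counters as consecutive XDR fields."""
    return struct.pack(COUNTER_FORMAT, if_index, if_speed, in_octets, out_octets)


def decode_counters(data: bytes) -> tuple:
    """Decoding is a single fixed-layout unpack; there are no tags or lengths to parse."""
    return struct.unpack(COUNTER_FORMAT, data)


encoded = encode_counters(3, 1_000_000_000, 123_456_789_012, 98_765_432_100)
print(len(encoded), encoded.hex())
print(decode_counters(encoded))
```

By contrast, the BER encoding that SNMP uses for ASN.1 wraps every value in a type tag and a length, which both the agent and the manager must process.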

The benefits of using sFlow to retrieve interface statistics become clear when you monitor large networks. Instead of requiring 5-10 servers dedicated to SNMP polling, a single sFlow analyzer can collect counters from all the interfaces in the network, providing a centralized view of utilization throughout the network, rapidly identifying any areas of congestion.

Friday, May 15, 2009

Network-wide visibility


One of the unique features of sFlow is its ability to monitor entire networks, not just selected devices or links. When configuring sFlow monitoring, enable sFlow on every switch port on every switch in your network. sFlow is implemented in hardware so it can operate at line rate without impacting switch performance.

Don't just monitor WAN links and core switches: enabling sFlow on access switches gives detailed visibility into every server in the data center and every PC on the campus, without the need to install software agents on the servers and PCs.

The scalability of sFlow extends to the traffic analyzer software. A single, well-designed sFlow analyzer should be able to monitor all the switch ports in the network. When choosing a traffic analyzer, count the total switch ports in your network and select software that will be able to monitor all of them. A modest-sized campus network may have 20,000 switch ports, and a large corporate network may have in excess of 100,000. It is a good idea to request an evaluation and perform a full-scale test to verify that the software delivers the required scalability.
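A back-of-the-envelope estimate helps when sizing the analyzer. The sketch assumes a hypothetical average of 1,000 packets per second per port, 1-in-1,000 sampling and a 20-second counter polling interval; substitute your own figures. It counts one sampling point per port, which is a simplification.

```python
def collector_load(ports: int, avg_pps_per_port: float, sampling_rate: int,
                   polling_interval_s: float) -> dict:
    """Rough records per second arriving at a single central sFlow collector."""
    return {
        "packet_samples_per_s": ports * avg_pps_per_port / sampling_rate,
        "counter_records_per_s": ports / polling_interval_s,
    }


# Port counts from the text; traffic, sampling and polling figures are assumptions.
for ports in (20_000, 100_000):
    print(ports, collector_load(ports, avg_pps_per_port=1_000,
                                sampling_rate=1_000, polling_interval_s=20))
```

At 100,000 ports the same assumptions give 100,000 packet samples and 5,000 counter records per second, the kind of load a full-scale evaluation should reproduce.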

Why an sFlow blog?

sFlow has been quietly emerging as the leading multi-vendor standard for monitoring traffic in switched networks. The first sFlow capable switches were shipped in 2001. Since then vendor support has been increasing each year and now switches from Brocade, Hewlett-Packard, Juniper Networks, Extreme Networks, 3Com, D-Link, Alcatel-Lucent, H3C, Hitachi, NEC, AlaxalA, Allied Telesis and Comtec have embedded sFlow. Chances are you already have switches in your network that support sFlow (for a complete list of switches that support sFlow, see Network Equipment).

sFlow monitoring solutions provide traffic visibility for the full range of switched networks, from an organization running a large, busy 10G network (see Amsterdam Internet Exchange) to a small office trying to see who is hogging bandwidth on their T1 link (see sFlowTrend).

Through the postings in this blog, we hope to increase awareness of sFlow and the way it can be used to solve problems that challenge network administrators every day.