Saturday, February 6, 2010

Scalability


The problem with networks, and many other types of system, is that as they get larger they also become more complex to manage. What makes the management challenge even harder is that complexity doesn't simply grow in proportion to network size, but tends to grow exponentially.


The reason for the exponential growth in complexity is that the various components of the system interact. The diagram above illustrates this effect with a very simple example. Imagine a network consisting of just two connected machines, A and B. The two possible interactions are A talks to B, or B talks to A.

Now consider the effect of adding two more machines, C and D. The additional interactions, C talks to D and D talks to C, double the complexity. However, A talks to C, C talks to A, B talks to D, D talks to B, A talks to D, D talks to A, B talks to C and C talks to B bring the total number of direct interactions to 12.

In this example, doubling the network size increased the number of possible interactions, and the complexity, by a factor of 6. For many systems this is an underestimate of the increase in complexity, since we didn't take into account indirect interactions such as A talks to D via B.
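The counting in this example can be checked with a short Python sketch. It treats each direct interaction as an ordered pair (sender, receiver) of distinct machines, so n machines give n × (n − 1) interactions. The function name is made up for illustration, and indirect interactions are ignored, as in the text.

```python
from itertools import permutations


def direct_interactions(machines):
    """Return every ordered (sender, receiver) pair of distinct machines."""
    return list(permutations(machines, 2))


two = direct_interactions(["A", "B"])
four = direct_interactions(["A", "B", "C", "D"])

# 2 interactions for two machines, 12 for four: a factor of 6.
print(len(two), len(four), len(four) // len(two))
```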

To manage risk, many companies use small scale trials as a way to pilot new systems. However, while pilot implementations can be a useful way to test basic functionality, they do not guarantee that the solution will work when deployed at full scale. Many costly information system failures result because the challenge of managing large scale system complexity was not properly addressed (see Understanding Information System Failures from the Complexity Perspective).



Network-wide visibility provides a powerful means of reducing complexity. While network complexity results from the large number of possible interactions, only a tiny fraction of the possible interactions actually occur at any given moment. Traffic visibility reduces complexity by revealing the active paths so that resources can be applied where they are needed.

In order to be effective, the measurement system itself must be scalable, delivering the complete, timely, actionable information needed to manage complexity. The sFlow standard was designed specifically for scalable, network-wide visibility and control and enjoys broad multi-vendor support. Products incorporating the sFlow standard deliver visibility throughout the physical switch, virtual switch, virtual router and cloud layers, ensuring effective management of complexity in large, dynamic, virtualized environments.

Monday, February 1, 2010

Virtual routing


The diagram shows networking elements within a virtual server. The server's physical network adapters connect to LAN switches that provide a high speed, flat, layer 2, fabric connecting servers and storage in the data center. Virtual switches provide shared access to the physical adapters, connecting the virtual network adapters in the virtual machines to the physical network.

Current hierarchical network designs confine routing to specialized hardware at the core of the network.  It is worth re-examining the place of routing given the changes in data center architecture brought about by convergence and virtualization. What if routing could be virtualized?

The performance of software routers running on commodity x86 hardware is improving: Vyatta recently announced 20 Gbps routing performance from their software routers. With network adapter support for virtualization (e.g. SR-IOV), it is now feasible to implement high-performance routing and firewall functionality in virtual machines.

Virtualization of routing offers a number of advantages:
  1. Virtualization allows services to be replicated and deployed where they are needed in the virtual infrastructure. A virtual router can easily be replicated to provide redundancy or add capacity.
  2. Virtual routing can provide better reliability and lower costs by making use of the general purpose virtual server infrastructure, eliminating the need for expensive, specialized router hardware.
  3. Distributing routing to the edge of the network reduces pressure on the core and improves scalability. 
These benefits don't just apply to routing. Many other specialized devices can also be virtualized, including firewalls, load balancers and proxies. Virtualization of layer 3-7 network devices on a high performance converged Ethernet fabric offers a flexible and dynamic infrastructure that can easily be reconfigured to meet changing demands.

To illustrate the potential of virtualized networking, consider the example of a hosted data center. In a typical hosted data center, customers have racks or partial racks of equipment installed in the data center. A typical customer will have their own router, firewall, load balancer and servers installed in the rack. A virtual rack can be constructed by deploying routing and firewall virtual machines along with general purpose virtual machines that the customer can use to deploy their applications. A virtual rack can be provisioned and maintained automatically, providing customers with much more responsive service while reducing operating costs. In addition, virtualization allows higher customer densities per physical rack, increasing the revenue that can be generated per rack.

The benefits aren't restricted to service provider networks. In enterprise data centers, the flexibility of virtualized networking allows for more efficient management and utilization of resources. However, a barrier to realizing these benefits is the current siloed approach to data center management. Close coordination is needed between network and system management teams. For example, who would be responsible for provisioning and configuring a virtual router? This type of cross functional task is a challenge for most organizations.

Integrated traffic monitoring provides the visibility needed for effective management of virtualized networks. The diagram shows some of the data paths that are possible in a virtual stack: the red line shows traffic between two physical VLANs connected by a virtual router and the gold line shows traffic routed between two virtual machines hosted on the same server. In order to provide network visibility, every networking device, physical or virtual, needs to include integrated traffic monitoring so that all traffic paths can be observed. Shared visibility into all resources in the data center ensures that each group (network, systems and storage) is aware of its impact on shared resources, eliminates finger pointing, improves coordination and lays the foundation for automating control.

There are many proprietary and standard technologies for embedded traffic monitoring. Broadly speaking, these fall into two classes: TCP/IP flow monitoring built into many routers (e.g. Cisco NetFlow) and multi-protocol packet-based monitoring built into most switches. Convergence in both the LAN (data center bridging) and the WAN (Metro Ethernet and Carrier Ethernet) is taking place using Ethernet technologies, making the sFlow standard the logical choice for visibility since it enjoys broad, multi-vendor support and is already built into most vendors' Ethernet products. Just as convergence to Ethernet simplifies connectivity, convergence to sFlow standard monitoring built into Ethernet devices simplifies management of the converged network.

Products incorporating the sFlow standard provide visibility throughout the physical switch, virtual switch, virtual router and cloud layers, delivering the end-to-end visibility needed to realize the full benefits of virtualization and convergence.

Tuesday, January 26, 2010

Xen Cloud Platform



Since cloud computing is accessed as a service, most people are unaware of the vendors providing cloud solutions. The dominant cloud solution is Xen (see Xen in the Cloud), the open source virtualization software used by leading cloud providers like Amazon, Rackspace and GoGrid to deliver cloud services to their customers.

The Xen Cloud Platform (XCP) project offers a complete cloud platform based on Xen and other open source components in order to deliver a fully integrated stack, making it simpler for service providers and enterprises to deploy cloud computing solutions.

Enterprises evaluating cloud computing solutions should consider the benefits of using the same software stack as the public cloud providers. A common software stack eases integration between public and private cloud services and facilitates migration of services between hosted and internal virtual servers. This mobility is particularly attractive when using hosted facilities as a way to create an "elastic cloud", adding capacity from service providers to increase geographic coverage, provide off-site service redundancy, or to expand capacity during periods of high demand. A shared software stack provides seamless management of resources throughout the cloud infrastructure, simplifying management, reducing operating costs and improving agility.

The Xen/XCP stack includes the Open vSwitch, providing an open source alternative to proprietary virtual switches (such as Cisco Nexus 1000v and the VMware Distributed Switch).

A previous article describes the visibility provided by sFlow monitoring built into the Open vSwitch. Implementation of the sFlow standard in the vSwitch extends the monitoring already available in most vendors' hardware switches into the virtualization layer, providing seamless visibility and control of all physical and virtual network resources in the cloud.

Many service providers already use sFlow-based billing solutions. The availability of sFlow in the Xen/XCP stack provides additional flexibility for usage based accounting and billing in virtual hosting and cloud service environments.

Finally, the detailed visibility that sFlow provides helps lower costs by optimizing resource utilization and reducing the need for over provisioning.

Monday, January 25, 2010

Open vSwitch


(diagram from Open vSwitch)

The Open vSwitch provides advanced switching capabilities for virtual servers. Currently the Open vSwitch supports Linux, Xen/XenServer, KVM and VirtualBox. The open source software is designed to be easily portable and is expected to support additional platforms in the future. The Open vSwitch is designed to integrate switching across multiple physical servers, providing an open source alternative to proprietary virtual switches such as VMware's distributed switch and Cisco's Nexus 1000v.

The recent integration of sFlow traffic monitoring in the Open vSwitch extends the visibility into virtual servers, ensuring data center visibility and control.

Note: The Open vSwitch demonstrates how to integrate the reference sFlow agent code with a virtual switch or network adapter. Integrating sFlow requires minimal support in the "fast path": only packet sampling and packet counters are needed.

The following lines, added to the Open vSwitch configuration file (ovs-vswitchd.conf), configure sampling packets at 1-in-512, polling counters every 20 seconds and sending sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343):

sflow.<bridgename>.agent    = eth0
sflow.<bridgename>.host     = 10.0.0.50:6343
sflow.<bridgename>.sampling = 512
sflow.<bridgename>.polling  = 20
sflow.<bridgename>.header   = 128

Note: Type "man ovs-vswitchd.conf" for a full list of configuration options. A previous posting discussed the selection of sampling rates.
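When the same settings have to be applied to several bridges, the configuration lines above can be generated rather than typed. The following Python sketch is only an illustration: the function name and the bridge name xenbr0 are hypothetical, and the keys and default values are simply those shown in the example above.

```python
def sflow_config(bridge, agent="eth0", collector="10.0.0.50:6343",
                 sampling=512, polling=20, header=128):
    """Render the sFlow lines for one bridge in ovs-vswitchd.conf style."""
    settings = [
        ("agent", agent),        # interface whose address identifies the agent
        ("host", collector),     # analyzer address and UDP port
        ("sampling", sampling),  # 1-in-N packet sampling
        ("polling", polling),    # counter polling interval in seconds
        ("header", header),      # value from the example above
    ]
    # Pad the key to 8 characters so the "=" signs line up as in the example.
    return "\n".join(f"sflow.{bridge}.{key:<8} = {value}"
                     for key, value in settings)


# "xenbr0" is a hypothetical bridge name; substitute your own.
print(sflow_config("xenbr0"))
```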

The following screen capture, from the free sFlowTrend application, demonstrates the visibility provided by sFlow in the Open vSwitch:



All traffic is visible: traffic between virtual machines, and traffic from the virtual machines to the outside world. In addition, sFlow is able to report on all the protocols on the network (note the layer 2, TCP and IPv6 flows in the chart), as well as information on VLANs and layer 2 priorities that is essential for managing switched traffic.

The second screen capture shows a bandwidth trend for a virtual adapter on the vSwitch:



This type of interface trending is a staple of network management, but obtaining the information is challenging in virtual environments. While SNMP is typically used to obtain this information from network equipment, servers are much less likely to be managed using SNMP and so SNMP polling is often not an option. In addition, there may be large numbers of virtual ports associated with each physical switch port. In a virtual environment with 10,000 physical switch ports you might need to monitor as many as 200,000 virtual ports. Even if SNMP agents were installed on all the servers, SNMP polling does not scale well to large numbers of interfaces. The integrated counter polling mechanism built into sFlow provides scalable monitoring of the utilization of every switch port in the network, both physical and virtual, quickly identifying problems wherever they may occur in the network.
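A rough calculation shows why polling at this scale is demanding. The Python sketch below uses the port counts from the paragraph above; the polling intervals are assumptions chosen for illustration, and the function name is made up.

```python
def counter_reads_per_second(ports, interval_seconds):
    """Average number of ports whose counters must be read each second."""
    return ports / interval_seconds


physical_ports = 10_000
virtual_ports = 200_000
total_ports = physical_ports + virtual_ports

# The intervals below are illustrative assumptions, not recommendations.
for interval in (20, 30, 60):
    rate = counter_reads_per_second(total_ports, interval)
    print(f"{interval:>2}s interval: {rate:,.0f} ports read per second")
```

With a central poller, every one of those reads is a request and response crossing the network. With sFlow, each agent exports its own counters on schedule, so the work is spread across the devices being monitored.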

Download Open vSwitch and sFlowTrend to evaluate the benefits of visibility in the virtualization layer.

Finally, the Open vSwitch also supports OpenFlow, allowing centralized control of switch forwarding. The combination of sFlow and OpenFlow in the vSwitches delivers visibility and control of the network edge.

Configuring H3C switches

The following commands configure an H3C switch (10.0.0.250), sampling packets at 1-in-512, polling counters every 30 seconds and sending sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343):

<sysname> system-view
[Sysname] sflow agent ip 10.0.0.250
[Sysname] sflow collector ip 10.0.0.50 port 6343
[Sysname] sflow version 5
[Sysname] sflow interval 30

Then for each interface:

[Sysname] interface ethernet 1/0
[Sysname-Ethernet1/0] sflow enable inbound
[Sysname-Ethernet1/0] sflow sampling-mode random
[Sysname-Ethernet1/0] sflow sampling-rate 512

A previous posting discussed the selection of sampling rates. Additional information can be found on the H3C web site.

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.

Notes: Set the sampling-mode to random on all interfaces that support random sampling. Deterministic sampling is less accurate and should be avoided. The sampling direction should be set consistently across all interfaces. In this example, enabling inbound sampling on all interfaces monitors all traffic paths through the switch and avoids double counting.
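Since the per-interface commands have to be repeated for every port, it can be convenient to generate them. The Python sketch below only assembles the command text shown above (without the prompts) for a list of interface names. The function name and the second interface name are hypothetical, and how the output is delivered to the switch is left open.

```python
def h3c_interface_commands(interfaces, sampling_rate=512):
    """Build the per-interface sFlow commands for each named interface."""
    commands = []
    for name in interfaces:
        commands += [
            f"interface {name}",
            "sflow enable inbound",        # same direction on every interface
            "sflow sampling-mode random",  # random sampling, as recommended
            f"sflow sampling-rate {sampling_rate}",
        ]
    return commands


# Interface names are examples; "ethernet 1/1" is a hypothetical second port.
for line in h3c_interface_commands(["ethernet 1/0", "ethernet 1/1"]):
    print(line)
```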

Saturday, January 23, 2010

Control costs



The visibility and control provided by sFlow can significantly reduce data center costs, both in terms of the capital cost of equipment and the operational costs of managing, powering and cooling the data center.

Poor visibility results in significant costs in two ways:
  1. Over provisioning: because demand is poorly understood, system architects act defensively, overestimating requirements and adding excessive safety margins as insurance. The excess capacity built into the system is expensive to purchase and maintain; however, since there is no visibility, these costs are hidden.
  2. Poor service: because demand is poorly understood, operational changes occur in reaction to costly service failures, resulting in rushed, poorly targeted additions of capacity that further increase wasteful over provisioning.
The network wide visibility provided by sFlow fundamentally changes the equation. Detailed visibility into demand allows resources to be targeted where they are needed, minimizing over provisioning and avoiding service failures.

The diagram graphically demonstrates the difference between these two approaches. In the case of limited visibility, the increase in demand is not detected until it is too late. Over-reacting to the performance failure leads to excessive over-provisioning. With the visibility and control provided by sFlow, the increase in demand is detected early and additional capacity is added. As demand for the service decreases, the additional resources are released so that they can be applied elsewhere.

The benefit of data center convergence and virtualization is that resources are pooled and can be allocated as needed. However, without network visibility it is impossible to fully realize the cost savings and improved performance that convergence promises.

Sunday, January 17, 2010

Developer resources


(photo from sFlow.org)

The challenge of providing data center visibility and control requires that all network devices integrate the instrumentation needed for visibility and that software developers deliver solutions that convert the measurements into actionable intelligence.

Meeting this challenge involves a broad community of individuals and companies participating in the sFlow consortium and developing products based on the sFlow standard. The rapid changes taking place in data center networking are accelerating the adoption of sFlow and introducing large numbers of developers to sFlow technology. This article provides links to resources intended to help new developers get started with sFlow.

The sFlow.org web site is the first place developers should look for information, whether adding sFlow support to a network device, or developing an application to analyze sFlow. Join the sFlow mailing list to ask questions and connect with the sFlow community.

Familiarity with the sFlow version 5 standard is essential, even if you are using one of the following software libraries. A detailed understanding of the standard is critical when integrating the agent software with switch hardware to ensure that data is correctly collected and exported. When developing an sFlow analyzer, an understanding of the measurements ensures that the data is correctly interpreted.

Agent developers should not base their agent implementations on RFC 3176. The RFC describes sFlow version 4 and has been superseded by version 5. However, when developing an sFlow analyzer, both versions of sFlow should be supported since customers may have equipment exporting sFlow version 4.
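An analyzer that accepts both versions needs to tell them apart before decoding. sFlow datagrams begin with the version number encoded as a 32-bit big-endian integer, so a first check can look like the Python sketch below. The function names are illustrative, and the rest of the datagram is left undecoded.

```python
import struct

SUPPORTED_VERSIONS = {4, 5}


def sflow_version(datagram: bytes) -> int:
    """Read the version field from the first four bytes of a datagram."""
    if len(datagram) < 4:
        raise ValueError("datagram too short to contain a version field")
    (version,) = struct.unpack("!I", datagram[:4])
    return version


def is_supported(datagram: bytes) -> bool:
    """True if the datagram claims a version this analyzer can decode."""
    try:
        return sflow_version(datagram) in SUPPORTED_VERSIONS
    except ValueError:
        return False


# Synthetic test data: only the version field is meaningful here.
fake_v5 = struct.pack("!I", 5) + bytes(24)
fake_v4 = struct.pack("!I", 4) + bytes(20)
print(sflow_version(fake_v5), sflow_version(fake_v4), is_supported(b"\x00"))
```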

In addition to the standard, the paper Packet Sampling Basics is worth reading in order to understand the theory behind sFlow's sampling mechanism.
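As a small taste of that theory, the Python sketch below scales a sample count up to an estimated frame count and applies the error bound commonly quoted for packet sampling: at 95% confidence, the percentage error is at most 196 × sqrt(1/c), where c is the number of samples in the traffic class of interest. The function names and the sample count are illustrative, and the scaling assumes the configured sampling rate is a good approximation of the effective one.

```python
import math


def estimated_frames(samples_in_class, sampling_rate):
    """Scale a sample count up by the 1-in-N sampling rate."""
    return samples_in_class * sampling_rate


def percent_error_bound(samples_in_class):
    """Upper bound on percentage error at 95% confidence."""
    return 196 * math.sqrt(1 / samples_in_class)


samples = 10_000   # illustrative number of samples seen for one class
rate = 512         # 1-in-512 sampling, as in the configuration examples

print(estimated_frames(samples, rate))          # estimated frames in the class
print(round(percent_error_bound(samples), 2))   # error bound in percent
```

Note that the bound depends only on the number of samples collected, not on the total amount of traffic, which is why sampling remains accurate on very fast links.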

The following resources are available for agent developers:
  1. Developer Tools contains a free, open-source implementation of an sFlow agent that is the basis for a number of open source and commercial sFlow agent implementations. Building an agent using the sFlow Agent library saves time and ensures that the sFlow information is correctly and efficiently encoded. In addition to the agent source code, tools for testing and validating the correct function of the agent are included.
  2. Virtual Probe, a free, open-source, software probe for monitoring virtual switch traffic based on the sFlow Agent library.
  3. sFlow Toolkit contains the sflowtool command line utility that decodes and prints sFlow records.
  4. sFlowTrend, a free, graphical sFlow analyzer. Testing the agent with sFlowTrend makes it easy to visually check the accuracy of the sFlow agent and ensure that it is correctly reporting traffic flows and interface counters.
The first challenge facing anyone interested in writing an sFlow analyzer is to get access to a source of sFlow data. In order to obtain realistic sFlow data, it is best if the sFlow is generated by devices installed in a production network. The sFlow standard is widely supported by switch vendors, so check to see if you have access to an sFlow capable device. If an sFlow capable switch is not available, the software agents listed above provide another alternative. Installing the agent software on a server will monitor traffic on the server, generating realistic sFlow data.

The following software tools are helpful for sFlow analyzer developers:
  1. sFlow Toolkit contains a command line tool, sflowtool, that decodes sFlow and can be used with scripting languages to perform sFlow analysis. The source code for sflowtool is a useful starting point for developing a C language sFlow analyzer.
  2. Net::sFlow, open-source Perl library for decoding sFlow version 4 and 5 datagrams.
  3. jsFlow, open-source Java library for decoding sFlow version 5 datagrams.
  4. Wireshark, open-source protocol analyzer. Familiarity with protocol analysis is an important skill for an sFlow developer since sFlow exports packet headers and decoding the packet headers is the responsibility of the sFlow analyzer. Examining traffic in Wireshark is a great way to develop a better understanding of protocol analysis and the information that can be extracted from the packet header.
Finally, please share links to additional resources by posting comments to this article.