Tuesday, January 26, 2010

Xen Cloud Platform


Since cloud computing is accessed as a service, most people are unaware of the vendors providing cloud solutions. The dominant cloud solution is Xen (see Xen in the Cloud), the open source virtualization software used by leading cloud providers like Amazon, Rackspace and GoGrid to deliver cloud services to their customers.

The Xen Cloud Platform (XCP) project offers a complete cloud platform based on Xen and other open source components in order to deliver a fully integrated stack, making it simpler for service providers and enterprises to deploy cloud computing solutions.

Enterprises evaluating cloud computing solutions should consider the benefits of using the same software stack as the public cloud providers. A common software stack eases integration between public and private cloud services and facilitates migration of services between hosted and internal virtual servers. This mobility is particularly attractive when using hosted facilities as a way to create an "elastic cloud", adding capacity from service providers to increase geographic coverage, provide off-site service redundancy, or to expand capacity during periods of high demand. A shared software stack provides seamless management of resources throughout the cloud infrastructure, simplifying management, reducing operating costs and improving agility.

The Xen/XCP stack includes the Open vSwitch, providing an open source alternative to proprietary virtual switches (such as Cisco Nexus 1000v and the VMware Distributed Switch).

A previous article describes the visibility provided by sFlow monitoring built into the Open vSwitch. Implementation of the sFlow standard in the vSwitch extends the monitoring already available in most vendors' hardware switches into the virtualization layer, providing seamless visibility and control of all physical and virtual network resources in the cloud.

Many service providers already use sFlow-based billing solutions. The availability of sFlow in the Xen/XCP stack provides additional flexibility for usage-based accounting and billing in virtual hosting and cloud service environments.

Finally, the detailed visibility that sFlow provides helps lower costs by optimizing resource utilization and reducing the need for over-provisioning.

Feb. 15, 2011 Update: More recent articles describing sFlow configuration in XCP and other cloud platforms can be found by clicking on the vSwitch label below.

Monday, January 25, 2010

Open vSwitch

(diagram from Open vSwitch)

The Open vSwitch provides advanced switching capabilities for virtual servers. Currently the Open vSwitch supports Linux, Xen/XenServer, KVM and VirtualBox. The open source software is designed to be easily portable and is expected to support additional platforms in the future. The Open vSwitch is designed to integrate switching across multiple physical servers, providing an open source alternative to proprietary virtual switches such as VMware's distributed switch and Cisco's Nexus 1000v.

The recent integration of sFlow traffic monitoring in the Open vSwitch extends the visibility into virtual servers, ensuring data center visibility and control.

Note: The Open vSwitch demonstrates how to integrate the reference sFlow agent code with a virtual switch or network adapter. Integrating sFlow requires minimal support in the "fast path": only packet sampling and packet counters are needed.

The following lines, added to the Open vSwitch configuration file (ovs-vswitchd.conf), configure packet sampling at 1-in-512, counter polling every 20 seconds, export of up to 128 bytes of each sampled packet header, and sending sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343):

sflow.<bridgename>.agent    = eth0
sflow.<bridgename>.host     = 10.0.0.50:6343
sflow.<bridgename>.sampling = 512
sflow.<bridgename>.polling  = 20
sflow.<bridgename>.header   = 128

Note: Type "man ovs-vswitchd.conf" for a full list of configuration options. A previous posting discussed the selection of sampling rates.

The following screen capture, from the free sFlowTrend application, demonstrates the visibility provided by sFlow in the Open vSwitch:


All traffic is visible, both traffic between virtual machines and traffic from the virtual machines to the outside world. In addition, sFlow is able to report on all the protocols on the network (note the layer 2, TCP and IPv6 flows in the chart), as well as information on VLANs and layer 2 priorities that is essential for managing switched traffic.

The second screen capture shows a bandwidth trend for a virtual adapter on the vSwitch:


This type of interface trending is a staple of network management, but obtaining the information is challenging in virtual environments. While SNMP is typically used to obtain this information from network equipment, servers are much less likely to be managed using SNMP and so SNMP polling is often not an option. In addition, there may be large numbers of virtual ports associated with each physical switch port. In a virtual environment with 10,000 physical switch ports you might need to monitor as many as 200,000 virtual ports. Even if SNMP agents were installed on all the servers, SNMP polling does not scale well to large numbers of interfaces. The integrated counter polling mechanism built into sFlow provides scalable monitoring of the utilization of every switch port in the network, both physical and virtual, quickly identifying problems wherever they may occur in the network.
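
To put rough numbers on the scale described above, the following Python sketch works through the example. It assumes every port reports counters once every 20 seconds (the polling interval used in the Open vSwitch configuration earlier in this article); the function name is purely illustrative.

```python
def counter_updates_per_second(ports: int, interval_seconds: float) -> float:
    """Average number of interface counter updates a collector handles per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return ports / interval_seconds


physical_ports = 10_000
virtual_ports = 200_000

# Ratio of virtual to physical ports in the example above.
virtual_per_physical = virtual_ports / physical_ports

# Update rate if every port, physical and virtual, reports every 20 seconds.
rate = counter_updates_per_second(physical_ports + virtual_ports, 20)

print(f"{virtual_per_physical:.0f} virtual ports per physical port")
print(f"{rate:.0f} counter updates per second")
```

With SNMP, the management station would have to issue each of those requests and wait for the responses. With sFlow, the agents push counters on their own schedule, so the collector only has to receive them.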

Download Open vSwitch and sFlowTrend to evaluate the benefits of visibility in the virtualization layer.

Finally, the Open vSwitch also supports OpenFlow, allowing centralized control of switch forwarding. The combination of sFlow and OpenFlow in the vSwitches delivers visibility and control of the network edge.

Feb. 15, 2011 Update: The configuration steps shown in this article are no longer correct; more recent versions of the Open vSwitch use the ovs-vsctl command instead. The easiest way to manage the sFlow configuration of an Open vSwitch is to install the open source Host sFlow agent, which will automatically manage sFlow settings in the Open vSwitch. For recent information on the Open vSwitch, click on the vSwitch label below.
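
For readers who want to see what the ovs-vsctl form looks like, here is a minimal Python sketch that builds the command line carrying the same settings used above. It follows the syntax shown in the ovs-vsctl manual page as best I know it, so check it against your installed version. The bridge name "br0" and the function name are assumptions for illustration, and the script only prints the command rather than running it.

```python
import shlex


def build_ovs_sflow_command(bridge, agent="eth0", target="10.0.0.50:6343",
                            sampling=512, polling=20, header=128):
    """Return the ovs-vsctl argument list that attaches an sFlow record to a bridge."""
    return [
        "ovs-vsctl",
        # Create an sFlow record and remember it as @sflow.
        "--", "--id=@sflow", "create", "sflow",
        f"agent={agent}",
        f'target="{target}"',      # the database expects a quoted string here
        f"header={header}",
        f"sampling={sampling}",
        f"polling={polling}",
        # Point the bridge at the newly created record.
        "--", "set", "bridge", bridge, "sflow=@sflow",
    ]


cmd = build_ovs_sflow_command("br0")  # "br0" is a hypothetical bridge name
print(shlex.join(cmd))                # shell-quoted form, ready to paste
```

Building the argument list separately from executing it makes it easy to review the settings before touching a live switch.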

Configuring H3C switches

The following commands configure an H3C switch (10.0.0.250), sampling packets at 1-in-512, polling counters every 30 seconds and sending sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343):

<sysname> system-view
[Sysname] sflow agent ip 10.0.0.250
[Sysname] sflow collector ip 10.0.0.50 port 6343
[Sysname] sflow version 5
[Sysname] sflow interval 30

Then for each interface:

[Sysname] interface ethernet 1/0
[Sysname-Ethernet1/0] sflow enable inbound
[Sysname-Ethernet1/0] sflow sampling-mode random
[Sysname-Ethernet1/0] sflow sampling-rate 512

A previous posting discussed the selection of sampling rates. Additional information can be found on the H3C web site.

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.

Notes: Set the sampling-mode to random on all interfaces that support random sampling. Deterministic sampling is less accurate and should be avoided. The sampling direction should be set consistently across all interfaces. In this example, enabling inbound sampling on all interfaces monitors all traffic paths through the switch and avoids double counting.
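
Since the per-interface commands have to be repeated for every port, a small script can generate them. The Python sketch below emits only the commands shown in this article; the interface names and the function name are hypothetical, and the output should be reviewed against your switch's documentation before being pasted into a session.

```python
def h3c_sflow_commands(agent_ip, collector_ip, interfaces,
                       port=6343, interval=30, sampling_rate=512):
    """Return the CLI lines, in order, that enable sFlow on the given interfaces."""
    lines = [
        "system-view",
        f"sflow agent ip {agent_ip}",
        f"sflow collector ip {collector_ip} port {port}",
        "sflow version 5",
        f"sflow interval {interval}",
    ]
    for name in interfaces:
        lines += [
            f"interface {name}",
            "sflow enable inbound",          # same direction on every interface
            "sflow sampling-mode random",    # random sampling, per the notes above
            f"sflow sampling-rate {sampling_rate}",
        ]
    return lines


# Hypothetical interface names; substitute the ports on your switch.
commands = h3c_sflow_commands("10.0.0.250", "10.0.0.50",
                              ["ethernet 1/0", "ethernet 1/1"])
print("\n".join(commands))
```

Generating the commands from one function keeps the sampling direction and rate consistent across all interfaces, which is the point of the notes above.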

Saturday, January 23, 2010

Control costs



The visibility and control provided by sFlow can significantly reduce data center costs, both in terms of the capital cost of equipment and the operational costs of managing, powering and cooling the data center.

Poor visibility results in significant costs in two ways:
  1. Over-provisioning: because demand is poorly understood, system architects act defensively, overestimating requirements and adding excessive safety margins as insurance. The excess capacity built into the system is expensive to purchase and maintain, but since there is no visibility these costs are hidden.
  2. Poor service: because demand is poorly understood, operational changes occur in reaction to costly service failures, resulting in rushed, poorly targeted additions of capacity that further increase wasteful over-provisioning.
The network wide visibility provided by sFlow fundamentally changes the equation. Detailed visibility into demand allows resources to be targeted where they are needed, minimizing over provisioning and avoiding service failures.

The diagram graphically demonstrates the difference between these two approaches. In the case of limited visibility, the increase in demand is not detected until it is too late. Over-reacting to the performance failure leads to excessive over-provisioning. With the visibility and control provided by sFlow, the increase in demand is detected early and additional capacity is added. As demand for the service decreases, the additional resources are released so that they can be applied elsewhere.

The benefit of data center convergence and virtualization is that resources are pooled and can be allocated as needed. However, without network visibility it is impossible to fully realize the cost savings and improved performance that convergence promises.

Sunday, January 17, 2010

Developer resources

(photo from sFlow.org)

The challenge of providing data center visibility and control requires that all network devices integrate the instrumentation needed for visibility and that software developers deliver solutions that convert the measurements into actionable intelligence.

Meeting this challenge involves a broad community of individuals and companies participating in the sFlow consortium and developing products based on the sFlow standard. The rapid changes taking place in data center networking are accelerating the adoption of sFlow and introducing large numbers of developers to sFlow technology. This article provides links to resources intended to help new developers get started with sFlow.

The sFlow.org web site is the first place developers should look for information, whether adding sFlow support to a network device, or developing an application to analyze sFlow. Join the sFlow mailing list to ask questions and connect with the sFlow community.

Familiarity with the sFlow version 5 standard is essential, even if you are using one of the following software libraries. A detailed understanding of the standard is critical when integrating the agent software with switch hardware to ensure that data is correctly collected and exported. When developing an sFlow analyzer, an understanding of the measurements ensures that the data is correctly interpreted.

Agent developers should not base their agent implementations on RFC 3176. The RFC describes sFlow version 4 and has been superseded by version 5. However, when developing an sFlow analyzer, both versions of sFlow should be supported since customers may have equipment exporting sFlow version 4.
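
As a starting point for handling both versions, the following Python sketch decodes only the fixed header at the front of a datagram. It assumes the header layouts given in RFC 3176 (version 4) and the sFlow version 5 specification; the function and dictionary key names are my own, and decoding the sample records that follow the header is left out.

```python
import ipaddress
import struct


def parse_datagram_header(data: bytes) -> dict:
    """Decode the fixed header fields of an sFlow version 4 or 5 datagram."""
    version, addr_type = struct.unpack_from("!II", data, 0)
    if version not in (4, 5):
        raise ValueError(f"unsupported sFlow version {version}")
    offset = 8
    if addr_type == 1:      # IPv4 agent address, 4 bytes
        addr_len = 4
    elif addr_type == 2:    # IPv6 agent address, 16 bytes
        addr_len = 16
    else:
        raise ValueError(f"unsupported address type {addr_type}")
    agent = ipaddress.ip_address(data[offset:offset + addr_len])
    offset += addr_len
    header = {"version": version, "agent": str(agent)}
    if version == 5:
        # Version 5 adds a sub-agent identifier that version 4 lacks.
        (header["sub_agent_id"],) = struct.unpack_from("!I", data, offset)
        offset += 4
    seq, uptime, count = struct.unpack_from("!III", data, offset)
    header.update(sequence=seq, uptime_ms=uptime, num_samples=count)
    return header


# Hand-built headers for illustration (no sample records follow).
v5 = struct.pack("!II4sIIII", 5, 1, bytes([10, 0, 0, 250]), 0, 1, 1000, 0)
v4 = struct.pack("!II4sIII", 4, 1, bytes([10, 0, 0, 250]), 1, 1000, 0)
print(parse_datagram_header(v5))
print(parse_datagram_header(v4))
```

Branching on the version number as early as possible keeps the rest of the analyzer from having to care which version a given device is exporting.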

In addition to the standard, the paper Packet Sampling Basics is worth reading in order to understand the theory behind sFlow's sampling mechanism.
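
To give a flavor of that theory, here is a small Python sketch of the two calculations an analyzer typically performs: scaling sampled packets up to an estimated total, and bounding the error of that estimate. The error bound uses the commonly quoted approximation that, at 95% confidence, the percentage error is at most 196 × sqrt(1/c) for c samples. Treat it as an illustration and consult the paper for the derivation and its assumptions; the function names are my own.

```python
import math


def estimate_total(samples: int, sampling_rate: int) -> int:
    """Scale a count of sampled packets up to an estimate of total packets."""
    return samples * sampling_rate


def percent_error_bound(samples: int) -> float:
    """Approximate upper bound on relative error (percent) at 95% confidence."""
    if samples <= 0:
        raise ValueError("at least one sample is required")
    return 196.0 * math.sqrt(1.0 / samples)


samples = 10_000  # sampled packets seen in some traffic class (made-up figure)
print(estimate_total(samples, 512))             # estimated total packets
print(round(percent_error_bound(samples), 2))   # error bound in percent
```

Note that the bound depends only on the number of samples collected, not on the sampling rate, which is why a 1-in-512 rate can still give accurate results on busy links.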

The following resources are available for agent developers:
  1. Developer Tools contains a free, open-source implementation of an sFlow agent that is the basis for a number of open source and commercial sFlow agent implementations. Building an agent using the sFlow Agent library saves time and ensures that the sFlow information is correctly and efficiently encoded. In addition to the agent source code, tools for testing and validating the correct function of the agent are included.
  2. Virtual Probe, a free, open-source, software probe for monitoring virtual switch traffic based on the sFlow Agent library.
  3. sFlow Toolkit contains the sflowtool command line utility, which decodes and prints sFlow records.
  4. sFlowTrend, a free, graphical sFlow analyzer. Testing the agent with sFlowTrend makes it easy to visually check the accuracy of the sFlow agent and ensure that it is correctly reporting traffic flows and interface counters.
The first challenge facing anyone interested in writing an sFlow analyzer is to get access to a source of sFlow data. In order to obtain realistic sFlow data, it is best if the sFlow is generated by devices installed in a production network. The sFlow standard is widely supported by switch vendors, so check to see if you have access to an sFlow capable device. If an sFlow capable switch is not available, the software agents listed above provide an alternative. Installing the agent software on a server will monitor traffic on the server, generating realistic sFlow data.

The following software tools are helpful for sFlow analyzer developers:
  1. sFlow Toolkit contains a command line tool, sflowtool, that decodes sFlow and can be used with scripting languages to perform sFlow analysis. The source code for sflowtool is a useful starting point for developing a C language sFlow analyzer.
  2. Net::sFlow, an open-source Perl library for decoding sFlow version 4 and 5 datagrams.
  3. jsFlow, an open-source Java library for decoding sFlow version 5 datagrams.
  4. Wireshark, an open-source protocol analyzer. Familiarity with protocol analysis is an important skill for an sFlow developer since sFlow exports packet headers and decoding the packet headers is the responsibility of the sFlow analyzer. Examining traffic in Wireshark is a great way to develop a better understanding of protocol analysis and the information that can be extracted from the packet header.
Finally, please share links to additional resources by posting comments to this article.

Wednesday, January 13, 2010

OpenFlow



OpenFlow is a new protocol for controlling how packets are forwarded through switches. Currently switches implement control and forwarding logic within the same device. OpenFlow makes use of existing forwarding hardware but moves the control logic out of the switch to a central controller. The architecture provides fine grain control of traffic, making it possible to engineer the network to implement customized security and performance management strategies.

An interesting video demo shows how OpenFlow can be used to maintain a live connection between two laptops as they move around a wireless network. Similar challenges exist in data centers in maintaining connectivity as virtual machines migrate.

The combination of OpenFlow and sFlow offers exciting possibilities. Combining the network-wide visibility of sFlow with the centralized control capabilities of OpenFlow creates a dynamic feedback control system that can be used to create intelligent, self-optimizing networks.

Wednesday, January 6, 2010

Data center traffic


Very few studies of data center traffic have been published, since the challenge of instrumentation and the confidentiality of the data create significant obstacles for researchers.

The following papers are some of the few that contain traffic data from corporate data centers:
The papers are worth reading in their entirety, but some highlights stand out:
  • "The characteristics of traffic inside the Internet enterprises remain almost wholly unexplored"
  • "80% of packets stay in data center ... Trend is towards even more internal communication"
  • 90 - 95% of the network devices in large data centers are edge devices
  • "We find that utilization is significantly higher in the core than in the aggregation and edge layers"
  • "Given the large number of unused links (40% are never used), an ideal traffic engineering scheme would split traffic across the over-utilized and under-utilized links"
  • "Our data shows that a map-reduce style data mining workload results in sparse demand matrices"
  • "At any time only a few ToR [Top of Rack] switches are bottlenecked"
  • "Today, computation constrained by network"
In order to remove the network bottlenecks that can affect the performance of applications in the data center, a number of architectures have been developed to create a non-blocking data center network, including Fat-tree and VL2.


However, eliminating over-subscription in the network is expensive, ranging from 2 to 5 times the cost of a conventional network design. In addition, the cost to power and manage the increased number of links and switches adds significantly to the operating cost of the data center.

Applying this same strategy to the road system would be the equivalent of connecting every town and city with an 8-lane freeway, no matter how small or remote the town. In practice, traffic studies guide development and roads are built where they are needed to satisfy demand. A similar, measurement-based, approach can be applied to network design.

The measurement studies show that networks already contain many unused and under-utilized links. Instead of using the brute force approach of adding capacity, an alternative strategy is to use network visibility to utilize existing bandwidth more intelligently and target upgrades where they are most needed.

Data center visibility is made possible by the sFlow standard, currently supported by most switch vendors. Network switches with sFlow deliver the visibility into utilization, topology and traffic needed for effective control of data center resources, ensuring that the benefits of convergence and virtualization can be fully realized.