Wednesday, January 6, 2010

Data center traffic

Very few studies of data center traffic have been published, because the difficulty of instrumentation and the confidentiality of the data create significant obstacles for researchers.

The following papers are some of the few that contain traffic data from corporate data centers:
The papers are worth reading in their entirety, but some highlights stand out:
  • "The characteristics of traffic inside the Internet enterprises remain almost wholly unexplored"
  • "80% of packets stay in data center ... Trend is towards even more internal communication"
  • 90 - 95% of the network devices in large data centers are edge devices
  • "We find that utilization is significantly higher in the core than in the aggregation and edge layers"
  • "Given the large number of unused links (40% are never used), an ideal traffic engineering scheme would split traffic across the over-utilized and under-utilized links"
  • "Our data shows that a map-reduce style data mining workload results in sparse demand matrices"
  • "At any time only a few ToR [Top of Rack] switches are bottlenecked"
  • "Today, computation constrained by network"
In order to remove the network bottlenecks that can affect the performance of applications in the data center, a number of architectures have been developed to create a non-blocking data center network, including Fat-tree and VL2.

However, eliminating over-subscription in the network is expensive, costing 2 to 5 times as much as a conventional network design. In addition, powering and managing the increased number of links and switches adds significantly to the operating cost of the data center.
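To make the term concrete, the following is a minimal sketch of how the over-subscription ratio of a single switch might be calculated, and how many uplinks a non-blocking design would need. The function names and the port counts in the example are hypothetical and are not taken from any of the papers.

```python
import math


def oversubscription_ratio(server_ports, server_gbps, uplink_ports, uplink_gbps):
    """Return host-facing capacity divided by uplink capacity for one switch."""
    downlink = server_ports * server_gbps
    uplink = uplink_ports * uplink_gbps
    if uplink <= 0:
        raise ValueError("uplink capacity must be positive")
    return downlink / uplink


def uplinks_for_non_blocking(server_ports, server_gbps, uplink_gbps):
    """Return the number of uplinks needed for a ratio of 1:1 or better."""
    if uplink_gbps <= 0:
        raise ValueError("uplink speed must be positive")
    return math.ceil(server_ports * server_gbps / uplink_gbps)


if __name__ == "__main__":
    # Hypothetical top of rack switch: 40 x 1 Gbps server ports, 2 x 10 Gbps uplinks.
    ratio = oversubscription_ratio(40, 1.0, 2, 10.0)
    needed = uplinks_for_non_blocking(40, 1.0, 10.0)
    print(f"over-subscription {ratio:.1f}:1, uplinks needed for 1:1 = {needed}")
```

In this illustrative case, removing over-subscription doubles the number of uplinks on every rack, with matching capacity needed in the layers above, which is where the additional capital and operating cost comes from.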

Applied to the road system, this strategy would be equivalent to connecting every town and city with an 8-lane freeway, no matter how small or remote the town. In practice, traffic studies guide development, and roads are built where they are needed to satisfy demand. A similar, measurement-based approach can be applied to network design.

The measurement studies show that networks already contain many unused and under-utilized links. Instead of using the brute force approach of adding capacity, an alternative strategy is to use network visibility to utilize existing bandwidth more intelligently and target upgrades where they are most needed.
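As a rough illustration of targeting upgrades, the sketch below sorts links into unused, under-utilized, normal and over-utilized groups based on measured peak utilization. The thresholds (10% and 70%), the link names and the sample values are all assumptions chosen for the example; appropriate values depend on the network.

```python
def classify_links(peak_utilization, low=0.10, high=0.70):
    """Group links by peak utilization (fractions between 0 and 1).

    The low and high thresholds are illustrative assumptions.
    """
    report = {"unused": [], "under_utilized": [], "normal": [], "over_utilized": []}
    for link, util in sorted(peak_utilization.items()):
        if util == 0:
            report["unused"].append(link)
        elif util < low:
            report["under_utilized"].append(link)
        elif util > high:
            report["over_utilized"].append(link)
        else:
            report["normal"].append(link)
    return report


if __name__ == "__main__":
    # Hypothetical measurements: link name -> peak utilization.
    measurements = {
        "core1-agg1": 0.85,
        "core1-agg2": 0.00,
        "agg1-tor7": 0.04,
        "agg2-tor9": 0.35,
    }
    for group, links in classify_links(measurements).items():
        print(group, links)
```

Links in the over-utilized group are candidates for an upgrade or for shifting traffic onto the unused and under-utilized links, rather than adding capacity everywhere.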

Data center visibility is made possible by the sFlow standard, currently supported by most switch vendors. Network switches with sFlow deliver the visibility into utilization, topology and traffic needed for effective control of data center resources, ensuring that the benefits of convergence and virtualization can be fully realized.
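One way such utilization figures can be derived is from two successive readings of an interface octet counter. The sketch below assumes a 64-bit counter and a link speed in bits per second; the function and parameter names are hypothetical, and the code does not decode sFlow datagrams itself.

```python
COUNTER_MODULUS = 2 ** 64  # assumes 64-bit octet counters


def link_utilization(octets_t0, octets_t1, interval_s, if_speed_bps):
    """Return utilization (0..1) from two octet counter readings.

    octets_t0 and octets_t1 are counter values taken interval_s seconds apart.
    The modulo handles a single counter wrap between the two readings.
    """
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    delta_octets = (octets_t1 - octets_t0) % COUNTER_MODULUS
    return (delta_octets * 8) / (interval_s * if_speed_bps)


if __name__ == "__main__":
    # Hypothetical readings: 1 Gbps link polled 20 seconds apart.
    util = link_utilization(0, 1_250_000_000, 20, 1_000_000_000)
    print(f"utilization: {util:.0%}")
```

Tracking this value per link over time gives the kind of input needed to find the unused and over-utilized links described above.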

1 comment:

  1. This year's edition of Cisco's annual Global Cloud Index (GCI) presents the latest findings and predictions on data centre traffic and cloud computing between 2013 and 2018.