(chart from Flyways To De-Congest Data Networks)
The following papers are among the few that publish traffic data from enterprise networks and corporate data centers:
- A First Look at Modern Enterprise Traffic (2005), broad comparison of enterprise and Internet traffic based on traces captured at Lawrence Berkeley National Laboratory
- How Healthy are Today's Enterprise Networks? (2008), uses data collected from end hosts to assess the number of unsuccessful connection attempts in an enterprise network
- Understanding Data Center Traffic Characteristics (2009), surveys 19 corporate data centers
- The Nature of Datacenter Traffic: Measurements & Analysis (2009), provides detailed information on Hadoop/Map-Reduce network traffic
- What Goes Into a Data Center? (2009), overview of data center components and management challenges, including power, servers, networking and software
- Flyways To De-Congest Data Networks (2009), examines congestion in a data center network
The papers are worth reading in their entirety, but some highlights stand out:
- "The characteristics of traffic inside the Internet enterprises remain almost wholly unexplored"
- "80% of packets stay in data center ... Trend is towards even more internal communication"
- 90-95% of the network devices in large data centers are edge devices
- "We find that utilization is significantly higher in the core than in the aggregation and edge layers"
- "Given the large number of unused links (40% are never used), an ideal traffic engineering scheme would split traffic across the over-utilized and under-utilized links"
- "Our data shows that a map-reduce style data mining workload results in sparse demand matrices"
- "At any time only a few ToR [Top of Rack] switches are bottlenecked"
- "Today, computation constrained by network"
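Several of the highlights above (links that are never used, a busier core, lightly loaded edge and aggregation layers) rest on per-link utilization measurements. The following is a minimal sketch of that kind of analysis, assuming average utilization per link is already available. The `Link` type, the thresholds and the sample values are hypothetical illustrations, not data or methods taken from the papers.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    layer: str          # "core", "aggregation" or "edge"
    utilization: float  # average fraction of capacity in use, 0.0 to 1.0


def classify(links, cold=0.1, hot=0.7):
    """Group links into unused, under-utilized and over-utilized sets.

    The cold/hot thresholds are illustrative assumptions, not values
    taken from the measurement studies.
    """
    unused = [link for link in links if link.utilization == 0.0]
    under = [link for link in links if 0.0 < link.utilization < cold]
    over = [link for link in links if link.utilization >= hot]
    return unused, under, over


def mean_utilization_by_layer(links):
    """Average utilization for each network layer."""
    by_layer = {}
    for link in links:
        by_layer.setdefault(link.layer, []).append(link.utilization)
    return {layer: sum(u) / len(u) for layer, u in by_layer.items()}


# Hypothetical measurements, for illustration only.
links = [
    Link("core-1", "core", 0.82),
    Link("core-2", "core", 0.75),
    Link("agg-1", "aggregation", 0.30),
    Link("agg-2", "aggregation", 0.05),
    Link("edge-1", "edge", 0.0),
    Link("edge-2", "edge", 0.02),
    Link("edge-3", "edge", 0.0),
    Link("edge-4", "edge", 0.40),
]

unused, under, over = classify(links)
print(f"{len(unused) / len(links):.0%} of links unused")
print("over-utilized:", [link.name for link in over])
print(mean_utilization_by_layer(links))
```

With these made-up numbers a quarter of the links carry no traffic while both core links sit above the "hot" threshold, the same shape of imbalance the studies describe.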
To remove the network bottlenecks that can limit application performance in the data center, several architectures have been developed for building non-blocking data center networks, including Fat-tree and VL2.
(table from Flyways To De-Congest Data Networks)
However, eliminating over-subscription is expensive, costing between 2 and 5 times as much as a conventional network design. In addition, powering and managing the extra links and switches adds significantly to the operating cost of the data center.
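Over-subscription is commonly expressed as the ratio of a switch's server-facing capacity to its uplink capacity; a non-blocking design needs a ratio of 1:1 at every tier. The arithmetic for a single top-of-rack switch is sketched below. The port counts and speeds are hypothetical examples, and the function names are illustrative only.

```python
import math


def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Ratio of server-facing capacity to uplink capacity at one switch."""
    return (down_ports * down_gbps) / (up_ports * up_gbps)


def uplinks_for_ratio(down_ports, down_gbps, up_gbps, target=1.0):
    """Smallest number of uplinks that brings the ratio down to `target`."""
    return math.ceil(down_ports * down_gbps / (up_gbps * target))


# Hypothetical ToR switch: 40 x 1 Gbit/s server ports, 2 x 10 Gbit/s uplinks.
ratio = oversubscription(40, 1, 2, 10)   # 2:1 over-subscribed
needed = uplinks_for_ratio(40, 1, 10)    # uplinks required for 1:1
print(f"{ratio:.1f}:1 over-subscribed, {needed} uplinks needed for non-blocking")
```

Doubling the uplinks at the edge is only part of the cost: the aggregation and core tiers must also grow to carry the extra uplink capacity at 1:1, which helps explain why non-blocking designs cost a multiple of a conventional network.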
Applying the same strategy to the road system would be equivalent to connecting every town and city with an 8-lane freeway, no matter how small or remote the town. In practice, traffic studies guide development, and roads are built where they are needed to satisfy demand. A similar measurement-based approach can be applied to network design.
The measurement studies show that networks already contain many unused and under-utilized links. Instead of the brute-force approach of adding capacity everywhere, an alternative strategy is to use network visibility to make better use of existing bandwidth and to target upgrades where they are most needed.
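One way to make better use of existing bandwidth is to spread a traffic demand across parallel paths in proportion to their spare capacity, rather than splitting it evenly. The sketch below illustrates the idea under simple assumptions: capacity and current load are known for each path, and traffic can be divided arbitrarily. It is not the traffic engineering scheme proposed in any of the papers, and the path names and figures are hypothetical. When the combined headroom cannot absorb the demand, the function reports that an upgrade is needed, which is where measurement can target new capacity.

```python
def split_by_headroom(demand_gbps, paths):
    """Divide a demand across parallel paths in proportion to spare capacity.

    paths: dict mapping path name -> (capacity_gbps, current_load_gbps)
    Returns a dict mapping path name -> share of the demand in Gbit/s.
    """
    headroom = {name: max(cap - load, 0.0) for name, (cap, load) in paths.items()}
    total = sum(headroom.values())
    if demand_gbps > total:
        # Existing links cannot absorb the demand: a candidate for an upgrade.
        raise ValueError("demand exceeds available headroom; upgrade needed")
    return {name: demand_gbps * h / total for name, h in headroom.items()}


# Hypothetical parallel paths between two racks (capacity, load) in Gbit/s.
paths = {
    "via-agg-1": (10, 9),   # nearly saturated
    "via-agg-2": (10, 1),   # lightly loaded
    "via-agg-3": (10, 0),   # currently unused
}
print(split_by_headroom(4, paths))
```

Most of the new traffic lands on the idle and lightly loaded paths, while the nearly saturated path receives only a small share.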
Data center visibility is made possible by the sFlow standard, currently supported by most switch vendors. Network switches with sFlow deliver the visibility into utilization, topology and traffic needed for effective control of data center resources, ensuring that the benefits of convergence and virtualization can be fully realized.
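sFlow agents in switches export periodic interface counters together with randomly sampled packets, where on average one packet in N is sampled. A collector estimates total traffic by scaling the samples up by N, and the accuracy of that estimate depends on how many samples were collected. The sketch below shows that arithmetic using the usual normal approximation for random sampling (a 95% error bound of about 196 × sqrt(1/c) percent for c samples). The function name and sample data are hypothetical; this is not an sFlow library API.

```python
import math


def estimate_from_samples(sampled_sizes, sampling_rate):
    """Scale random 1-in-N packet samples up to estimated totals.

    sampled_sizes: byte lengths of the packets that were sampled
    sampling_rate: N, where on average one packet in N is sampled
    Returns (estimated packets, estimated bytes, 95% error bound in percent).
    """
    c = len(sampled_sizes)
    if c == 0:
        return 0, 0, float("inf")
    est_packets = c * sampling_rate
    est_bytes = sum(sampled_sizes) * sampling_rate
    # Normal approximation: relative error of roughly 1.96 / sqrt(c) at 95%.
    error_pct = 196 * math.sqrt(1 / c)
    return est_packets, est_bytes, error_pct


# Hypothetical sample set: 100 packets sampled at a rate of 1-in-1000.
sizes = [1500] * 90 + [64] * 10
packets, octets, err = estimate_from_samples(sizes, 1000)
print(packets, octets, f"+/- {err:.1f}%")
```

Because the error bound shrinks with the number of samples rather than with link speed, sampling scales to fast links; in practice link utilization is usually read from the exported interface counters, while packet samples show which flows make up the traffic.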