Tuesday, December 24, 2013

Workload placement

Messy and organized closets are an everyday example of the efficiency that can be gained by packing items together systematically: randomly throwing items into a closet makes poor use of the space; keeping the closet organized increases the available space.

A recent CBS 60 Minutes Amazon segment describes the ultimate closet - an Amazon order fulfillment warehouse. Each vast warehouse looks like a chaotic jumble - something out of Raiders of the Lost Ark.
Even when you get up close to an individual shelf, there still appears to be no organizing principle. Interviewer Charlie Rose comments, "The products are then placed by stackers in what seems to outsiders as a haphazard way… a book on Buddhism and Zen resting next to Mrs. Potato Head…"
Amazon's Dave Clark explains that the placement is anything but haphazard: "...you look at how these items fit in the bin. They’re optimized for utilizing the available space. And we have computers and algorithmic work that tells people the areas of the building that have the most space to put product in that’s coming in at that time." The segment notes that Amazon has become so efficient with its stacking that it can now store twice as many goods in its centers as it did five years ago.

The 60 Minutes piece goes on to discuss Amazon Web Services (AWS). There are interesting parallels between managing a cloud data center and managing a warehouse (both of which Amazon does extremely well). A data center has a fixed amount of physical compute, storage, and bandwidth, but instead of having to find shelf space to store physical goods, the data center manager needs to find a server with enough spare capacity to run each new virtual machine.

Just as a physical object has a size, shape and weight that constrain where it can be placed, virtual machines have characteristics such as number of virtual CPUs, memory, storage and network bandwidth that determine how many virtual machines can be placed on each physical server (see Amazon EC2 Instances). For example, an Amazon m1.small instance provides 1 virtual CPU, 1.7 GiB RAM, and 160 GB storage. A simplistic packing scheme would allow 6 small instances to be hosted on a physical server with 8 CPU cores, 32 GiB RAM, and 1 TB disk. This allocation scheme is limited by the amount of disk space and leaves CPU cores and RAM unused.
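
As a minimal sketch of this packing arithmetic (the instance and server figures are taken from the example above; real schedulers consider many more constraints), the number of instances that fit is the minimum across the resource dimensions:

# Resource footprint of one m1.small instance and capacity of the
# physical server from the example above.
instance = {'vcpu': 1, 'ram_gib': 1.7, 'disk_gb': 160}
server = {'vcpu': 8, 'ram_gib': 32, 'disk_gb': 1000}

# The number of instances that fit is limited by the scarcest resource.
fit = min(int(server[r] / instance[r]) for r in instance)
print(fit)  # 6 -- disk (1000 // 160) is the binding constraint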

While the analogy between a data center and a warehouse is interesting, there are important differences between computational workloads and physical goods. One of the motivating factors driving the move to virtualization was the realization that most physical servers were poorly utilized. Moving to virtual machines allowed multiple workloads to be combined and run on a single physical server, increasing utilization and reducing costs. Continuing the EC2 example, if measurement revealed that the m1.small instances were only using 80 GB of storage, additional instances could be placed on the server by oversubscribing the storage.
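
Extending the same illustrative sketch, substituting the measured 80 GB for the provisioned 160 GB shifts the binding constraint from disk to CPU, allowing 8 instances instead of 6:

# Re-run the packing calculation, but size the instance by measured
# usage (80 GB of disk observed) rather than the provisioned 160 GB.
measured = {'vcpu': 1, 'ram_gib': 1.7, 'disk_gb': 80}
server = {'vcpu': 8, 'ram_gib': 32, 'disk_gb': 1000}

fit = min(int(server[r] / measured[r]) for r in measured)
print(fit)  # 8 -- CPU cores, not disk, are now the binding constraint
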
The Wired article, Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon, describes Google's internally developed workload packing software and the strategic value it has for Google's business.

Amazon has been able to double the capacity of its physical warehouses by using bar code tracking and computer orchestration algorithms. If analytics-driven workload placement can drive a similar increase in workload density in data centers, what impact would that have for a cloud hosting provider?

Suppose a data center is operating with a gross margin of 20%. Leveraging the sFlow standard for measurement doesn't add to costs, since the capability is embedded in most vendors' data center switches, and open source sFlow agents can easily be deployed on hypervisors using orchestration tools. Real-time analytics software is required to turn the raw measurements into actionable data; however, the cost of this software is a negligible part of the overall cost of running the data center. On the other hand, doubling the number of virtual machines that can be hosted in the data center (assuming there is sufficient demand to fill the additional capacity) doubles top line revenue and triples the gross margin to 60%.
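
The margin arithmetic is easy to verify; a quick sketch, normalizing current revenue to 1 and holding the data center's costs fixed:

# Normalize current revenue to 1; a 20% gross margin implies costs of 0.8.
revenue = 1.0
cost = 0.8 * revenue

# Doubling hosted VMs doubles revenue while fixed costs stay the same.
new_revenue = 2 * revenue
new_margin = (new_revenue - cost) / new_revenue
print(new_margin)  # 0.6 -- gross margin triples from 20% to 60%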

One can argue about the assumptions in this example, but under a range of assumptions and models it is clear that workload placement has great potential to increase the efficiency and profitability of cloud data centers. Where the puck is going: analytics describes the vital role of analytics in SDN orchestration stacks, including VMware (NSX), Cisco, OpenDaylight, etc. The article predicts increased merger and acquisition activity in 2014 as orchestration vendors compete by integrating analytics into their platforms.

Finally, while analytics offers attractive opportunities, a lack of visibility and poorly placed workloads carry significant risks. In SDN market predictions for New Year: NFV, OpenFlow, Open vSwitch boom, Eric Hanselman of 451 Research poses the question, "Will data center overlays hit a wall in 2014?" He then goes on to state, "There is a point at which the overlay is going to be constrained by the mechanics of the network underneath... Data center operators will want the ability to do dynamic configuration and traffic management on the physical network and tie that management and control into application-layer orchestration."
