One comment from an App Engine user stood out: "We noticed a 5x increase in server instances. I think the scaling algorithm kicked in when instance latency grew to 60 seconds. Request latency is a key component in the decision to spawn more instances, right?"
Service level agreements are typically expressed in terms of latency (also referred to as response time or delay), so response time needs to be managed. It seems intuitively obvious that monitoring response time and taking action when it increases is the right approach to service scaling. However, there are serious problems with using response time as a control metric.
This article discusses the problems with using response time to drive control decisions. The discussion has broad relevance to server scaling, cloud orchestration, load balancing, and software-defined networking, where cloud systems need to adapt to changing demand.
|Figure 1: Response Time vs Utilization (from Performance by Design)|
Problem 1: Non-linear gain

Anyone who has held their ears because of the loud screech of a public address system has experienced the effect of gain on the stability of a feedback system. As the volume on the amplifier is increased, there comes a point where the amplified sound from the speakers is picked up and re-amplified in a self-sustaining feedback loop, resulting in an ear-splitting screech. The only way to stop the sound is to turn the volume down or turn off the microphone.
|Figure 2: Step response vs loop gain (from PID controller)|
|Figure 3: Gain vs utilization|
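As a rough illustration of the curves in Figures 1 and 3, the following sketch uses the simple M/M/1 queueing formula R = S / (1 - U), where S is service time and U is utilization. The 10 ms service time is an illustrative assumption, not a value from the article; the point is that the gain (the sensitivity of response time to a change in utilization) grows without bound as the system approaches saturation.

```python
# Sketch of why response time is a non-linear control signal, assuming an
# M/M/1 queueing model: R = S / (1 - U). The service time S = 10 ms is an
# illustrative assumption.

def response_time(utilization, service_time=0.01):
    """Mean response time (seconds) for an M/M/1 queue at a given utilization."""
    return service_time / (1.0 - utilization)

def gain(utilization, service_time=0.01):
    """Sensitivity of response time to utilization: dR/dU = S / (1 - U)^2."""
    return service_time / (1.0 - utilization) ** 2

# Gain is nearly flat at low utilization, then explodes near saturation.
for u in (0.2, 0.5, 0.8, 0.95, 0.99):
    print(f"U={u:.2f}  R={response_time(u) * 1000:7.1f} ms  gain={gain(u):8.2f}")
```

A controller tuned at 50% utilization therefore sees its effective loop gain multiplied thousands of times at 99% utilization, which is exactly the self-amplifying regime described above.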
Problem 2: Non-linear delay

Delay and stability describes how delay in a feedback loop results in system instability.
|Figure 4: Effect of delay on stability (from Delay and stability)|
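The destabilizing effect of delay can be demonstrated with a toy discrete-time feedback loop, e[t+1] = e[t] - g * e[t - d], in which the controller corrects an error e using a measurement that is d steps old. This is a generic control-theory illustration (the gain and delay values are arbitrary), not App Engine's actual scaling algorithm.

```python
# Toy feedback loop: the controller reduces an error e using a measurement
# that is `delay` steps stale. Same gain, different delay, very different
# behavior. Generic illustration; not any product's actual control law.

def run(gain, delay, steps=20):
    e = [1.0] * (delay + 1)  # initial error history
    for _ in range(steps):
        e.append(e[-1] - gain * e[-1 - delay])
    return e

no_delay = run(gain=0.5, delay=0)  # error decays geometrically to zero
delayed = run(gain=0.5, delay=3)   # same gain, stale measurement: oscillates
```

With no delay the error halves each step and the loop settles; with the same gain but a three-step-old measurement the loop overshoots, reverses, and oscillates with growing amplitude.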
Response time is what is referred to as a lagging (delayed) indicator of performance. Delay is intrinsic to the measurement since response time can only be calculated when a request completes.
|Figure 5: Measurement delay vs measured response time|
Solution

Use of response time as a control variable leads to insidious performance problems: the controller appears to work well when the system is operating at low to moderate utilization, but suddenly becomes unstable if an unexpected surge in demand moves the system into the high-gain, high-delay (unstable) region. Once the system has been destabilized, it can continue to behave erratically even after the surge in demand has passed, and a full shutdown may be the only way to restore stable operation. From the Google blog: "11:10 am - We determine that App Engine’s traffic routers are trapped in a cascading failure, and that we have no option other than to perform a full restart with gradual traffic ramp-up to return to service."
The solution to controlling response time lies in recognizing that response time is a function of system utilization. Instead of basing control actions on measured response time, controls should be based on measured utilization. Utilization is an easy-to-measure, low-latency, linear metric that can be used to construct stable and responsive feedback control systems. Since response time is a function of utilization, controlling utilization effectively controls response time.
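A utilization-driven scaler might look like the following minimal sketch. The function name, the 60% utilization target, and the proportional sizing rule are illustrative assumptions, not part of the sFlow standard or any particular orchestration product.

```python
# Hypothetical utilization-driven autoscaler sketch. The 60% target and the
# proportional sizing rule are assumptions for illustration only.

def target_instances(current_instances, measured_utilization, target=0.6):
    """Size the pool so per-instance utilization returns to the target."""
    if measured_utilization <= 0:
        return current_instances
    needed = current_instances * measured_utilization / target
    return max(1, round(needed))

# Utilization climbs to 90% across 10 instances: grow the pool to 15 so each
# instance settles back near the 60% target.
print(target_instances(10, 0.9))  # 15
```

Because utilization responds immediately and linearly to added capacity, this loop avoids both the runaway gain and the stale measurements that destabilize a response-time-driven controller.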
The sFlow standard provides the multi-vendor, scalable visibility into changing demand needed to deliver stable and effective scaling, load balancing, control, and orchestration solutions.