Thanks for asking the question! You may want to review the Best practices for Autoscale documentation.
Also, it’s important to understand the flapping process:
Choose different thresholds for scale-out and scale-in based on your actual workload. We don't recommend autoscale settings like the one below, where the out and in conditions use the same or very similar threshold values.
Take this as an example:
Increase instances by 1 count when Thread Count >= 600
Decrease instances by 1 count when Thread Count <= 600
Now please consider the following process:
Assume there are two instances to begin with and then the average number of threads per instance grows to 625.
Autoscale scales out adding a third instance.
Next, assume that the average thread count across instances falls to 575.
Before scaling in, autoscale tries to estimate what the final state will be if it scaled in. For example, 575 x 3 (current instance count) = 1,725 threads in total, and 1,725 / 2 (final number of instances after scaling in) = 862.5 threads per instance. That is well above the scale-out threshold of 600, so autoscale would have to scale out again immediately after scaling in, even if the average thread count stays the same or falls only a small amount. If it then scaled out again, the whole process would repeat, leading to an infinite loop.
To avoid this situation (termed "flapping"), autoscale does not scale in at all. Instead, it skips the action and reevaluates the condition the next time the service's job executes. This can be confusing, because autoscale appears not to work when the average thread count is 575.
Estimation during a scale-in is intended to avoid "flapping" situations, where scale-in and scale-out actions continually go back and forth. Keep this behavior in mind when you choose the same thresholds for scale-out and in.
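The estimation step described above can be sketched in a few lines of Python. This is only a simplified model for illustration, not how the autoscale service is actually implemented. The function names are hypothetical, and it assumes the total load stays constant during the scale-in and that the scale-out rule fires when the metric is at or above its threshold.

```python
def projected_average_after_scale_in(avg_per_instance, instance_count, step=1):
    """Estimate the per-instance average if `step` instances were removed.

    Assumes the total load (average x instance count) does not change.
    """
    remaining = instance_count - step
    if remaining < 1:
        raise ValueError("cannot scale in below one instance")
    total_load = avg_per_instance * instance_count
    return total_load / remaining


def should_scale_in(avg_per_instance, instance_count,
                    scale_in_threshold, scale_out_threshold, step=1):
    """Return True only if scaling in would not immediately re-trigger scale-out."""
    # The scale-in rule itself must match first.
    if avg_per_instance > scale_in_threshold:
        return False
    projected = projected_average_after_scale_in(
        avg_per_instance, instance_count, step
    )
    # Skip the scale-in if the projected value would meet the scale-out rule.
    return projected < scale_out_threshold


# Thread Count example from this answer: 3 instances averaging 575 threads,
# with both thresholds set to 600.
print(projected_average_after_scale_in(575, 3))  # 862.5
print(should_scale_in(575, 3, 600, 600))         # False
```

Under this model a margin between the thresholds is not enough on its own: the metric also has to drop far enough that the projected value stays below the scale-out threshold. For example, with thresholds of 60 and 80, three instances averaging 50 project to 75 on two instances, so the scale-in would go ahead.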
We recommend choosing an adequate margin between the scale-out and in thresholds. As an example, consider the following better rule combination.
Increase instances by 1 count when CPU% >= 80
Decrease instances by 1 count when CPU% <= 60
In addition, keep the cool-down period in mind. After a scale-in or scale-out operation has happened, autoscale will not trigger another scale action until the cool-down period has elapsed, even if a rule's condition is true (for example, CPU remains high). If the cool-down is 2 minutes, then for the 2 minutes following a scale operation no rule is triggered, even if its condition holds.
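As a rough illustration of the cool-down behavior, the check can be thought of as a simple time comparison. This is a minimal sketch; the function name `in_cooldown` and its parameters are assumptions made for this example, not part of any Azure API.

```python
from datetime import datetime, timedelta


def in_cooldown(last_scale_action, now, cooldown):
    """Return True if a scale action happened less than `cooldown` ago.

    `last_scale_action` is None when no scale action has happened yet.
    """
    return last_scale_action is not None and now - last_scale_action < cooldown


# Hypothetical timeline: a scale-out at 12:00 with a 2-minute cool-down.
last_action = datetime(2024, 1, 1, 12, 0, 0)
cooldown = timedelta(minutes=2)

# One minute later the rule may still be true, but no action is triggered.
print(in_cooldown(last_action, last_action + timedelta(minutes=1), cooldown))  # True
# Once the 2 minutes have elapsed, rules can trigger again.
print(in_cooldown(last_action, last_action + timedelta(minutes=2), cooldown))  # False
```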