Horizontal Pod Autoscaler of Kubernetes is not a re-active approach, but in fact it is a proactive scaling approach. Let I explain its algorithm using its default setting:
- The cool time is 5 minutes
- Resource utilization tracing for every 15 seconds
It means that the system traces resource utilization (depend on what metrics the end-users set, e.g., CPU, storage...etc.) for every 15 seconds.
Until every 5 minutes of cooling down (no scaling actions), the controller will calculate the resource utilization in the past 5 minutes (which uses the historical data traces in every 15 seconds above). Then it estimates number of resource (i.e., number of replicas) requires for the next 5-min time window by the equation:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue /
desiredMetricValue )]
Other pro-active auto-scaler also works in the similar manner. Different points is that they may apply different techniques (queue theory, machine learning, or time series model) to estimate desiredReplicas as what done in the above equation.