Automatic scaling

Automatic scaling changes the size of a node group, the number of pods, or the amount of resources allocated to each pod, based on the resource requests of the pods running on the group's nodes. Autoscaling is available in Kubernetes version 1.15 and later.

In a Managed Service for Kubernetes cluster, three types of automatic scaling are available:

Cluster autoscaling (Cluster Autoscaler). Managed Service for Kubernetes monitors the load on the nodes and modifies the number of nodes within specified limits as required.
Horizontal pod autoscaling (Horizontal Pod Autoscaler). Kubernetes dynamically changes the number of pod replicas of an application depending on the load.
Vertical pod autoscaling (Vertical Pod Autoscaler). When load increases, Kubernetes allocates additional resources to each pod within established limits.

You can use several types of automatic scaling in the same cluster. However, using Horizontal Pod Autoscaler and Vertical Pod Autoscaler together is not recommended.

Cluster autoscaling

Cluster Autoscaler automatically modifies the number of nodes in a group depending on the load.

When creating a node group, select the automatic scaling type and set the minimum, maximum, and initial number of nodes in the group. Kubernetes periodically checks the pod status and the node load, adjusting the group size as required:

If pods can't be scheduled because of a shortage of vCPUs or RAM on the existing nodes, the number of nodes in the group gradually increases until it reaches the specified maximum.
If the load on the nodes is low and all pods can be scheduled onto fewer nodes in the group, the number of nodes gradually decreases to the specified minimum. If the pods on a node can't be relocated within the waiting period (7 minutes), the node is forcibly stopped. You can't change this waiting period.

Note

When calculating the current limits and quotas, Managed Service for Kubernetes uses the specified maximum node group size as the actual size, regardless of the current group size.

You can only enable Cluster Autoscaler when creating a node group.

Learn more about Cluster Autoscaler in the Kubernetes documentation.

Horizontal pod autoscaling

When using horizontal pod scaling, Kubernetes changes the number of pods depending on vCPU load.

When creating a Horizontal Pod Autoscaler, specify the following parameters:

Desired average percentage vCPU load for each pod.
Minimum and maximum number of pod replicas.
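
As an illustration of these two parameters, a Horizontal Pod Autoscaler manifest might look like the sketch below. It assumes a Deployment named my-app already exists; that name, the manifest name my-app-hpa, and the numeric values are hypothetical and chosen only for the example. The sketch uses the autoscaling/v1 API, where the target is expressed with targetCPUUtilizationPercentage.

```yaml
# Minimal sketch, not a recommended configuration.
# "my-app" and "my-app-hpa" are hypothetical names.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  # The workload whose replica count is adjusted.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  # Minimum and maximum number of pod replicas.
  minReplicas: 2
  maxReplicas: 10
  # Desired average vCPU load per pod, as a percentage of the CPU request.
  targetCPUUtilizationPercentage: 60
```

The percentage is measured against each pod's vCPU request, so the target pods need CPU requests for this setting to work. According to the Kubernetes documentation, the desired replica count is computed as ceil(currentReplicas × currentLoad / targetLoad). For example, 4 replicas averaging 90% load against a 60% target would scale to ceil(4 × 90 / 60) = 6 replicas.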

Horizontal pod autoscaling is available for the following controllers:

Learn more about Horizontal Pod Autoscaler in the Kubernetes documentation.

Vertical pod autoscaling

Kubernetes restricts resource allocation for each application using the limits parameters. A pod that exceeds its vCPU limit is throttled: the CPU time available to it is restricted. A pod that exceeds its RAM limit is stopped.
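
To show where these limits are set, here is a minimal sketch of a pod specification with requests and limits. The pod name, container name, image, and all values are hypothetical placeholders, not recommendations.

```yaml
# Minimal sketch: names, image, and values are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: nginx:1.25   # placeholder image
      resources:
        # Amount the scheduler reserves for the container on a node.
        requests:
          cpu: 250m
          memory: 128Mi
        # Upper bounds: exceeding the cpu limit leads to throttling,
        # exceeding the memory limit leads to the container being stopped.
        limits:
          cpu: 500m
          memory: 256Mi
```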

If required, Vertical Pod Autoscaler allocates additional vCPU and RAM resources to pods.

When creating a Vertical Pod Autoscaler, set the autoscaling option in the specification:

updateMode: "Auto" for Vertical Pod Autoscaler to manage pod resources automatically.
updateMode: "Off" for Vertical Pod Autoscaler to provide recommendations on managing pod resources without modifying them.
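
For context, the updateMode option sits under updatePolicy in the Vertical Pod Autoscaler specification. The sketch below assumes that the VerticalPodAutoscaler resource (API group autoscaling.k8s.io/v1) is available in the cluster and that a Deployment named my-app exists; both names in the manifest are hypothetical.

```yaml
# Minimal sketch: "my-app" and "my-app-vpa" are hypothetical names.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  # The workload whose pods are managed.
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    # "Auto" applies recommendations automatically;
    # "Off" only reports them without modifying pods.
    updateMode: "Auto"
```

With updateMode: "Off", the recommendations can typically be inspected with kubectl describe vpa my-app-vpa.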

Note

Vertical Pod Autoscaler will not apply new recommendations if an application is only deployed in a single pod.

Learn more about Vertical Pod Autoscaler in the Kubernetes documentation.