
Scaling

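AKS scaling operates at multiple layers: pods, nodes, and sometimes cluster topology. Stable scaling depends on accurate resource requests, readiness probes that reflect when a pod can actually serve traffic, and realistic capacity limits such as node pool maximums, quotas, and subnet size.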

Main Content

flowchart TD
    A[Demand Increase] --> B[HPA changes replicas]
    B --> C[Scheduler places pods]
    C --> D{{Enough node capacity?}}
    D -->|No| E[Cluster Autoscaler adds nodes]
    E --> C
    D -->|Yes| F[Pods become Ready]
    F --> G[VPA recommendations tune requests]

Scaling building blocks

  • Horizontal Pod Autoscaler (HPA) changes replica count.
  • Cluster Autoscaler adds nodes when pods cannot be scheduled and removes nodes that stay underutilized.
  • Vertical Pod Autoscaler (VPA) recommends or applies request changes based on observed usage.
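
To make the HPA bullet concrete, the sketch below reproduces the replica calculation described in the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric). The helper name `desired_replicas` is hypothetical, and the sketch deliberately ignores the controller's tolerance band, stabilization behavior, and the configured minimum and maximum replica counts.

```shell
# Hypothetical helper, not part of kubectl: approximates the HPA replica formula.
# $1 = current replicas, $2 = current metric value, $3 = target metric value
desired_replicas() {
    # Integer ceiling division: ceil(a / b) == (a + b - 1) / b
    echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# 4 replicas averaging 90% CPU against a 60% target
desired_replicas 4 90 60   # prints 6

# 5 replicas averaging 75% CPU against a 50% target
desired_replicas 5 75 50   # prints 8
```

Comparing the result with the REPLICAS column of `kubectl get hpa -A` can help explain why a workload scaled to a particular count, keeping in mind that the real controller also applies its bounds and tolerance.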

Operational examples

kubectl get hpa -A
kubectl top pods -A
az aks update \
    --resource-group "$RG" \
    --name "$CLUSTER_NAME" \
    --enable-cluster-autoscaler \
    --min-count 3 \
    --max-count 10

Common failure modes

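  • HPA adds replicas, but their resource requests are too large to fit on existing nodes, so the new pods stay Pending.
  • Cluster Autoscaler is enabled, but subnet IP exhaustion or subscription quotas block node growth.
  • Workloads have no CPU/memory requests, so HPA cannot compute utilization and the scheduler and Cluster Autoscaler cannot size capacity accurately.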
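
The subnet failure mode can be checked with simple arithmetic before raising the autoscaler maximum. The sketch below is a rough estimate only. It assumes Azure CNI in node-subnet mode, where each node reserves one IP for itself plus one per potential pod, it budgets one extra node for upgrade surge, and it subtracts the 5 addresses Azure reserves in every subnet. The helper names `required_ips` and `usable_ips` are hypothetical, and the value of 30 max pods per node is an assumption; other network plugins (for example, overlay modes) consume IPs differently.

```shell
# Hypothetical helpers; assumes Azure CNI node-subnet mode (see the notes above).

# $1 = autoscaler max node count, $2 = max pods per node
required_ips() {
    # One extra node is budgeted for upgrade surge;
    # each node needs 1 IP for itself plus 1 per potential pod.
    echo $(( ($1 + 1) * ($2 + 1) ))
}

# $1 = subnet prefix length (for example, 24 for a /24)
usable_ips() {
    # Azure reserves 5 addresses in every subnet.
    echo $(( (1 << (32 - $1)) - 5 ))
}

need=$(required_ips 10 30)   # max-count 10, assumed 30 pods per node
have=$(usable_ips 24)        # assumed /24 node subnet
if [ "$need" -gt "$have" ]; then
    echo "subnet too small: need $need IPs, have $have"
else
    echo "subnet fits: need $need IPs, have $have"
fi
```

With these assumed inputs the estimate is 341 required addresses against 251 usable ones, so node growth would stall before reaching the configured maximum.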
