Scaling Operations

Scaling is safe to operate only when you know which layer is changing: pod replicas, resource requests, or node capacity. This runbook covers operating those controls during normal growth and incident response.

Prerequisites

  • Metrics pipeline is available for pods and nodes.
  • Workloads have requests and limits defined.
  • Autoscaler min/max boundaries are documented.
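
The second prerequisite can be spot-checked from the command line. The following is a minimal sketch, assuming `kubectl` access to the cluster; the function names `pods_missing_requests` and `list_pods_missing_requests` are hypothetical. It relies on `kubectl` printing `<none>` in a custom column when the field is unset.

```shell
# Print namespace/pod for every row whose CPU or memory request column is "<none>".
# Expects `kubectl get pods -o custom-columns=...` output (with header row) on stdin.
pods_missing_requests() {
    awk 'NR > 1 && ($3 == "<none>" || $4 == "<none>") { print $1 "/" $2 }'
}

# Hypothetical wrapper: list pods in all namespaces and pipe them through the filter.
list_pods_missing_requests() {
    kubectl get pods -A \
        -o custom-columns='NS:.metadata.namespace,POD:.metadata.name,CPU:.spec.containers[*].resources.requests.cpu,MEM:.spec.containers[*].resources.requests.memory' \
        | pods_missing_requests
}
```

This only flags pods where no container sets a request. In a multi-container pod the column lists just the values that exist, so a pod with one unconfigured container is not caught and needs a per-container check.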

When to Use

  • Traffic growth requires more replicas or nodes.
  • Pending pods suggest cluster capacity constraints.
  • Cost review requires tuning autoscaler boundaries.

Procedure

flowchart TD
    A[Observe pressure] --> B[Check HPA and requests]
    B --> C[Check node capacity]
    C --> D[Adjust autoscaler or pool]
    D --> E[Verify readiness and latency]

kubectl get hpa -A
kubectl top nodes
kubectl top pods -A
az aks update \
    --resource-group $RG \
    --name $CLUSTER_NAME \
    --enable-cluster-autoscaler \
    --min-count 3 \
    --max-count 10
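
When reading `kubectl get hpa` output, it helps to estimate how many replicas the HPA will ask for. The sketch below applies the documented HPA scaling rule, desired = ceil(current replicas × current metric / target metric), using whole-number inputs. The function name `hpa_desired_replicas` is hypothetical, and the sketch deliberately ignores the HPA tolerance band, the min/max replica bounds, and pod readiness handling.

```shell
# Desired replicas per the HPA scaling rule:
#   desired = ceil(current_replicas * current_metric / target_metric)
# Arguments are whole numbers (for example, CPU utilization percentages).
hpa_desired_replicas() {
    current_replicas=$1
    current_metric=$2
    target_metric=$3
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (current_replicas * current_metric + target_metric - 1) / target_metric ))
}

hpa_desired_replicas 4 90 60   # 4 replicas at 90% against a 60% target, prints 6
```

If the estimate exceeds the HPA maximum, or the extra replicas will not fit on current nodes, the next step in the flow (node capacity) is where to look.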

Verification

kubectl describe hpa <hpa-name> -n <namespace>
kubectl get pods -A --field-selector=status.phase=Pending
az aks show --resource-group $RG --name $CLUSTER_NAME --query "agentPoolProfiles[].{name:name,min:minCount,max:maxCount,count:count}" --output table
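
To turn the node pool listing into a quick check, the sketch below flags pools whose current count has reached the autoscaler maximum. It assumes the same `$RG` and `$CLUSTER_NAME` variables as above; the function names `pools_at_max` and `check_pools_at_max` are hypothetical. The query uses a list rather than named keys so the column order in `tsv` output is fixed.

```shell
# Print node pools whose current count has reached the autoscaler maximum.
# Expects tab-separated rows on stdin: name, minCount, maxCount, count.
# Rows with an empty or non-numeric max (autoscaler not enabled) are skipped.
pools_at_max() {
    awk -F '\t' '$3 ~ /^[0-9]+$/ && ($4 + 0) >= ($3 + 0) { print $1 }'
}

# Hypothetical wrapper around the az query.
check_pools_at_max() {
    az aks show --resource-group "$RG" --name "$CLUSTER_NAME" \
        --query "agentPoolProfiles[].[name,minCount,maxCount,count]" --output tsv \
        | pools_at_max
}
```

A pool listed here, combined with Pending pods, suggests the autoscaler boundary rather than the autoscaler itself is the constraint.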

Rollback / Troubleshooting

  • Relax aggressive HPA targets (for example, raise the utilization target or lengthen the scale-down stabilization window) if scale churn causes instability.
  • If node growth stalls, inspect quota and subnet IP capacity.
  • If scaling works but latency remains high, investigate application bottlenecks rather than adding more nodes blindly.
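
For the subnet check, a rough capacity estimate can show whether the autoscaler maximum is reachable at all. The sketch below assumes a networking model in which each node takes one address plus a pre-allocated block of pod addresses from the node subnet; where pods do not draw from the subnet, pass 0 for `max_pods`. The function name `max_nodes_for_subnet` is hypothetical. It accounts for the 5 addresses Azure reserves in every subnet, but not for surge nodes during upgrades or other resources sharing the subnet.

```shell
# Rough upper bound on node count for one subnet.
#   prefix_len: subnet prefix length, e.g. 24 for a /24
#   max_pods:   addresses pre-allocated per node for pods (0 if pods do not use subnet IPs)
# Azure reserves 5 addresses in every subnet.
max_nodes_for_subnet() {
    prefix_len=$1
    max_pods=$2
    usable=$(( (1 << (32 - prefix_len)) - 5 ))
    # Each node needs one address for itself plus max_pods for its pods.
    echo $(( usable / (max_pods + 1) ))
}

max_nodes_for_subnet 24 30   # /24 subnet, 30 pods per node, prints 8
```

If the result is below the configured `--max-count`, node growth will stall on IP exhaustion before the autoscaler limit is reached.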
