Scaling Operations¶
Scaling is operationally safe only when you understand which layer is changing: replicas, requests, or node capacity. This runbook focuses on operating those controls during normal growth and incident response.
Prerequisites¶
- Metrics pipeline is available for pods and nodes.
- Workloads have requests and limits defined.
- Autoscaler min/max boundaries are documented.
When to Use¶
- Traffic growth requires more replicas or nodes.
- Pending pods suggest cluster capacity constraints.
- Cost review requires tuning autoscaler boundaries.
Procedure¶
flowchart TD
A[Observe pressure] --> B[Check HPA and requests]
B --> C[Check node capacity]
C --> D[Adjust autoscaler or pool]
D --> E[Verify readiness and latency] kubectl get hpa -A
kubectl top nodes
kubectl top pods -A
az aks update --resource-group $RG --name $CLUSTER_NAME --enable-cluster-autoscaler --min-count 3 --max-count 10
Verification¶
kubectl describe hpa <hpa-name> -n <namespace>
kubectl get pods -A --field-selector=status.phase=Pending
az aks show --resource-group $RG --name $CLUSTER_NAME --query "agentPoolProfiles[].{name:name,min:minCount,max:maxCount,count:count}" --output table
Rollback / Troubleshooting¶
- Reduce aggressive HPA targets if scale churn causes instability.
- If node growth stalls, inspect quota and subnet IP capacity.
- If scaling works but latency remains high, investigate application bottlenecks rather than adding more nodes blindly.