Node Pool Operations
Use node pool changes to adjust cluster capacity and isolate workloads without rebuilding the cluster. Safe node pool operations depend on drain behavior, pod disruption budgets, and quota awareness.
Prerequisites
- Cluster credentials are current.
- You understand which workloads run on the target pool.
- PodDisruptionBudgets and autoscaler settings have been reviewed.
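
The review in the last prerequisite can be sketched as two small helpers. This is a minimal sketch, not a definitive check: the function names `blocking_pdbs` and `pool_autoscaler_bounds` are hypothetical, and it assumes `$RG` and `$CLUSTER_NAME` are set as in the commands used later in this section.

```shell
# List PodDisruptionBudgets that currently allow zero voluntary disruptions.
# These are the budgets most likely to block a later drain.
blocking_pdbs() {
  kubectl get pdb --all-namespaces --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed \
  | awk '$3 == 0 { print $1 "/" $2 }'
}

# Show whether the cluster autoscaler is enabled on a pool and its bounds.
# $1 is the pool name; RG and CLUSTER_NAME are assumed to be exported already.
pool_autoscaler_bounds() {
  az aks nodepool show --resource-group "$RG" --cluster-name "$CLUSTER_NAME" \
    --name "$1" \
    --query "{autoscale:enableAutoScaling,min:minCount,max:maxCount}" --output yaml
}
```

Running `blocking_pdbs` before a drain prints one `namespace/name` line per budget with no remaining disruption allowance; empty output suggests no budget is at its limit at that moment.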
When to Use
- Adding a new workload class.
- Replacing VM sizes or OS images.
- Scaling specific workload pools independently.
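
For the last case, scaling one pool without touching the others might look like the sketch below. The wrapper name `scale_pool` is hypothetical, and `$RG` and `$CLUSTER_NAME` are assumed to be set as elsewhere in this section. As far as I know, a manual scale is rejected while the cluster autoscaler is enabled on that pool, so check the autoscaler settings first.

```shell
# Scale a single node pool to a fixed node count.
# $1 is the pool name, $2 is the desired node count.
scale_pool() {
  az aks nodepool scale --resource-group "$RG" --cluster-name "$CLUSTER_NAME" \
    --name "$1" --node-count "$2"
}
```

For example, `scale_pool apps01 5` would request five nodes in the `apps01` pool and leave every other pool unchanged.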
Procedure
flowchart TD
A[Identify target pool] --> B[Add or scale pool]
B --> C[Cordon and drain if retiring]
C --> D[Validate workload rescheduling]

az aks nodepool list --resource-group $RG --cluster-name $CLUSTER_NAME --output table
az aks nodepool add --resource-group $RG --cluster-name $CLUSTER_NAME --name apps01 --mode User --node-vm-size Standard_D4ds_v5 --node-count 3
kubectl get nodes -L kubernetes.azure.com/agentpool
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
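
When retiring a whole pool, the cordon and drain steps above have to be repeated for each node. The following is one possible sketch: the function name `drain_pool` and the 10 minute timeout are assumptions, and it selects nodes by the same `kubernetes.azure.com/agentpool` label used above.

```shell
# Cordon and drain every node in one pool, one node at a time.
# $1 is the pool name.
drain_pool() {
  pool="$1"
  nodes="$(kubectl get nodes -l "kubernetes.azure.com/agentpool=${pool}" -o name)" || return 1

  # Cordon all nodes first so evicted pods cannot land on a sibling node
  # in the pool that is being retired.
  for node in $nodes; do
    kubectl cordon "${node#node/}" || return 1
  done

  # Drain sequentially and stop at the first failure (for example a blocking PDB).
  for node in $nodes; do
    kubectl drain "${node#node/}" --ignore-daemonsets --delete-emptydir-data \
      --timeout=10m || return 1
  done
}
```

Cordoning everything before draining is a design choice rather than a requirement. Draining one node at a time keeps the blast radius small and makes it obvious which node a stalled eviction belongs to.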
Verification
az aks nodepool show --resource-group $RG --cluster-name $CLUSTER_NAME --name apps01 --query "{count:count,mode:mode,vmSize:vmSize,provisioningState:provisioningState}" --output yaml
kubectl get pods -A -o wide
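
The two checks above can be turned into a small polling loop plus a filter for pods that did not reschedule. This is a sketch under assumptions: the names `wait_for_pool` and `pending_pods`, the default of 30 attempts, and the 10 second delay are all illustrative, and `$RG` and `$CLUSTER_NAME` are assumed to be set.

```shell
# Poll the pool until provisioningState is Succeeded.
# $1 pool name, $2 max attempts (default 30), $3 seconds between attempts (default 10).
# Returns 0 on Succeeded, 1 on Failed or when attempts run out.
wait_for_pool() {
  pool="$1"; attempts="${2:-30}"; delay="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state="$(az aks nodepool show --resource-group "$RG" --cluster-name "$CLUSTER_NAME" \
      --name "$pool" --query provisioningState --output tsv)"
    if [ "$state" = "Succeeded" ]; then return 0; fi
    if [ "$state" = "Failed" ]; then return 1; fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Pods stuck in Pending are the usual sign that rescheduling did not complete.
pending_pods() {
  kubectl get pods -A --field-selector=status.phase=Pending -o wide
}
```

A typical sequence would be `wait_for_pool apps01 && pending_pods`, which only lists pending pods once the pool reports a successful provisioning state.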
Rollback / Troubleshooting
- If drain blocks, inspect PodDisruptionBudgets and unmanaged pods.
- If scale-out fails, inspect quota, subnet IPs, and autoscaler bounds.
- If workloads land on the wrong pool, inspect taints, tolerations, selectors, and affinity.
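
The inspections listed above might be started with helpers like these. They are a sketch only: the function names are hypothetical, `$LOCATION` is an assumed variable holding the cluster's Azure region, and the output is meant as a starting point for investigation rather than a diagnosis.

```shell
# Pods on a node that have no owning controller.
# kubectl drain refuses to evict these unless --force is given.
# $1 is the node name.
unmanaged_pods_on_node() {
  kubectl get pods --all-namespaces --no-headers \
    --field-selector "spec.nodeName=$1" \
    -o 'custom-columns=NS:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind' \
  | awk '$3 == "<none>" { print $1 "/" $2 }'
}

# Compute quota usage for the region; LOCATION is assumed to be set.
regional_quota() {
  az vm list-usage --location "$LOCATION" --output table
}

# Taint keys per node, to compare against workload tolerations and selectors.
node_taints() {
  kubectl get nodes -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

For a blocked drain, `unmanaged_pods_on_node <node-name>` together with `kubectl get pdb -A` covers the two causes named in the first bullet.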