Operations
This section contains day-2 runbooks for creating, changing, scaling, monitoring, and maintaining AKS clusters in production.
Main Content
flowchart TD
A[Operations] --> B[Cluster Creation]
A --> C[Node Pool Operations]
A --> D[Upgrades]
A --> E[Scaling Operations]
A --> F[Monitoring and Logging]
A --> G[Maintenance Windows]
A --> H[Credential Rotation]

| Document | Description |
|---|---|
| Cluster Creation | Build a production-ready cluster with initial baseline settings |
| Node Pool Operations | Add, scale, cordon, drain, and retire node pools safely |
| Upgrades | Upgrade Kubernetes versions and node images with validation |
| Scaling Operations | Operate HPA, VPA, and cluster autoscaler safely |
| Monitoring and Logging | Configure observability, alerts, and diagnostic collection |
| Maintenance Windows | Align upgrades and platform maintenance with business windows |
| Credential Rotation | Rotate certificates, identities, kubeconfig access, and secrets |
Advanced Topics
- Treat every operational change as a runbook with pre-checks and post-checks.
- Keep non-production clusters close enough to production that upgrades and scaling tests are meaningful.
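
The first point above, wrapping every change in pre-checks and post-checks, can be sketched as a small Python helper. This is a minimal illustration and not part of any AKS tooling: the `Runbook` class, the `scale_up` function, and the in-memory `state` dictionary are all hypothetical stand-ins. In a real runbook the checks would query the cluster (for example, node readiness) instead of a dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A check is a human-readable label plus a callable that returns True on success.
Check = Tuple[str, Callable[[], bool]]


@dataclass
class Runbook:
    """Hypothetical wrapper: run pre-checks, apply a change, run post-checks."""

    name: str
    pre_checks: List[Check] = field(default_factory=list)
    post_checks: List[Check] = field(default_factory=list)

    def run(self, change: Callable[[], None]) -> List[str]:
        """Return a log of completed steps; raise RuntimeError on a failed check."""
        log: List[str] = []
        # A failing pre-check stops the runbook before the change is applied.
        for label, check in self.pre_checks:
            if not check():
                raise RuntimeError(f"{self.name}: pre-check failed: {label}")
            log.append(f"pre-check ok: {label}")
        change()
        log.append("change applied")
        # A failing post-check signals that the change needs follow-up or rollback.
        for label, check in self.post_checks:
            if not check():
                raise RuntimeError(f"{self.name}: post-check failed: {label}")
            log.append(f"post-check ok: {label}")
        return log


# Assumed in-memory stand-in for cluster state, used only for illustration.
state = {"node_count": 3, "nodes_ready": 3}


def scale_up() -> None:
    """Hypothetical change: grow the node pool from 3 to 5 nodes."""
    state["node_count"] = 5
    state["nodes_ready"] = 5


runbook = Runbook(
    name="scale-node-pool",
    pre_checks=[("all nodes ready", lambda: state["nodes_ready"] == state["node_count"])],
    post_checks=[("pool has 5 ready nodes", lambda: state["nodes_ready"] == 5)],
)

log = runbook.run(scale_up)
print("\n".join(log))
```

Keeping the checks as labeled callables means the same runbook definition can be exercised in a non-production cluster first, which ties in with the second point above.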