Operations

This section contains day-2 runbooks for creating, changing, scaling, monitoring, and maintaining AKS clusters in production.

Main Content

```mermaid
flowchart TD
    A[Operations] --> B[Cluster Creation]
    A --> C[Node Pool Operations]
    A --> D[Upgrades]
    A --> E[Scaling Operations]
    A --> F[Monitoring and Logging]
    A --> G[Maintenance Windows]
    A --> H[Credential Rotation]
```

| Document | Description |
| --- | --- |
| Cluster Creation | Build a production-ready cluster with initial baseline settings |
| Node Pool Operations | Add, scale, cordon, drain, and retire node pools safely |
| Upgrades | Upgrade Kubernetes versions and node images with validation |
| Scaling Operations | Operate HPA, VPA, and cluster autoscaler safely |
| Monitoring and Logging | Configure observability, alerts, and diagnostic collection |
| Maintenance Windows | Align upgrades and platform maintenance with business windows |
| Credential Rotation | Rotate certificates, identities, kubeconfig access, and secrets |

Advanced Topics

  • Treat every operational change as a runbook with pre-checks and post-checks.
  • Keep non-production clusters close enough to production that upgrades and scaling tests are meaningful.
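
The runbook pattern in the first point above can be sketched in a few lines of Python. This is an illustration only, not tooling from this guide: the `Runbook` class, the check functions, and the in-memory `state` dictionary are all hypothetical stand-ins for real cluster queries and real change commands. The sketch shows one possible shape: every pre-check must pass before the change runs, and every post-check must pass before the change is reported as successful.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A check is any zero-argument function that returns True when healthy.
Check = Callable[[], bool]


@dataclass
class Runbook:
    """Hypothetical wrapper: pre-checks, then the change, then post-checks."""
    name: str
    change: Callable[[], None]
    pre_checks: List[Check] = field(default_factory=list)
    post_checks: List[Check] = field(default_factory=list)

    def run(self) -> str:
        # Stop before touching anything if a pre-check fails.
        for check in self.pre_checks:
            if not check():
                return f"aborted: pre-check {check.__name__} failed"
        self.change()
        # The change has been applied; a failed post-check needs follow-up.
        for check in self.post_checks:
            if not check():
                return f"needs-attention: post-check {check.__name__} failed"
        return "ok"


# Stand-in for cluster state (assumption: real checks would query the cluster).
state = {"node_count": 3, "nodes_ready": 3}


def all_nodes_ready() -> bool:
    return state["nodes_ready"] == state["node_count"]


def scale_to_five() -> None:
    state["node_count"] = 5
    state["nodes_ready"] = 5


def five_nodes_ready() -> bool:
    return state["nodes_ready"] == 5


def always_fails() -> bool:
    return False


result = Runbook("scale-node-pool", scale_to_five,
                 [all_nodes_ready], [five_nodes_ready]).run()

# A failing pre-check must leave the state untouched.
blocked = Runbook("blocked-change", lambda: state.update(node_count=99),
                  [always_fails]).run()
print(result, "|", blocked)
```

Returning a status string rather than raising keeps the sketch short; a real runbook would more likely log each check result and page an operator when a post-check fails.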
