Monitoring and Logging¶
AKS observability must cover cluster state, node health, workload health, and control-plane-related signals. Effective monitoring is the difference between guessing and diagnosing.
Prerequisites¶
- Log Analytics workspace or equivalent telemetry backend is available.
- Metrics Server and/or Azure Monitor pipelines are configured.
- Alert ownership and escalation paths are defined.
When to Use¶
- Building the baseline observability stack.
- Expanding alerts for new critical workloads.
- Diagnosing incidents with cluster and node evidence.
Procedure¶
flowchart TD
A[Cluster Metrics] --> B[Logs]
B --> C[Alerts]
C --> D[Dashboards]
D --> E[Incident Triage] az aks enable-addons --resource-group $RG --name $CLUSTER_NAME --addons monitoring
kubectl top nodes
kubectl top pods -A
kubectl get events -A --sort-by=.lastTimestamp
Verification¶
az aks show --resource-group $RG --name $CLUSTER_NAME --query addonProfiles.omsagent.enabled --output tsv
kubectl get pods -n kube-system
Rollback / Troubleshooting¶
- If metrics are missing, check Metrics Server and Azure Monitor agent health.
- If logs exist but are unusable, refine namespace, workload, and owner labeling.
- If alerts are noisy, fix thresholds and missing suppression logic instead of disabling visibility.