Monitoring and Logging

AKS observability must cover cluster state, node health, workload health, and control-plane signals. With that coverage in place, incidents can be diagnosed from evidence rather than guessed at.

Prerequisites

A Log Analytics workspace or equivalent telemetry backend is available.
Metrics Server and/or Azure Monitor pipelines are configured.
Alert ownership and escalation paths are defined.
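
The first two prerequisites can be checked from a shell before any change is made. The following is a minimal sketch, not a definitive preflight: `WORKSPACE_NAME` is a hypothetical variable introduced here, `require_vars` and `check_prereqs` are illustrative helper names, and the sketch assumes Metrics Server runs as the `metrics-server` deployment in `kube-system`. Alert ownership and escalation paths are organizational and cannot be verified this way.

```shell
# Preflight sketch for the prerequisites above (POSIX sh).
# WORKSPACE_NAME is a hypothetical variable; RG matches the commands in this guide.

# Return 0 only when every named variable is set and non-empty.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Return 0 when the workspace can be read and Metrics Server is deployed.
check_prereqs() {
  require_vars RG WORKSPACE_NAME || return 1
  # Prints the workspace provisioning state; fails if the workspace is not found.
  az monitor log-analytics workspace show \
    --resource-group "$RG" \
    --workspace-name "$WORKSPACE_NAME" \
    --query provisioningState --output tsv || return 1
  # Assumes the default deployment name and namespace for Metrics Server.
  kubectl get deployment metrics-server --namespace kube-system || return 1
}
```

Calling `check_prereqs` after exporting `RG` and `WORKSPACE_NAME` returns non-zero at the first failed check, so it can gate the rest of a script.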

When to Use

Building the baseline observability stack.
Expanding alerts for new critical workloads.
Diagnosing incidents with cluster and node evidence.

Procedure

flowchart TD
    A[Cluster Metrics] --> B[Logs]
    B --> C[Alerts]
    C --> D[Dashboards]
    D --> E[Incident Triage]

az aks enable-addons \
    --resource-group $RG \
    --name $CLUSTER_NAME \
    --addons monitoring
kubectl top nodes
kubectl top pods -A
kubectl get events -A --sort-by=.lastTimestamp
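
The raw `kubectl top` and `kubectl get events` output gets long on larger clusters. As a rough sketch of how it might be narrowed during triage, the helpers below keep only nodes above a utilization threshold and only Warning events. The function names and the default threshold of 80 are assumptions made for illustration, and the parsing assumes the default `kubectl top nodes` column order (NAME, CPU, CPU%, MEMORY, MEMORY%).

```shell
# Triage helpers (POSIX sh sketch). Names and the 80% default are illustrative.

# Print "node cpu% mem%" for nodes whose CPU% or MEMORY% exceeds the threshold.
hot_nodes() {
  threshold="${1:-80}"
  kubectl top nodes --no-headers | awk -v limit="$threshold" '
    {
      cpu = $3; mem = $5
      sub(/%/, "", cpu); sub(/%/, "", mem)
      # Non-numeric values (for example when metrics are missing) compare as 0.
      if (cpu + 0 > limit + 0 || mem + 0 > limit + 0) print $1, $3, $5
    }'
}

# List only Warning events across all namespaces, oldest first.
warning_events() {
  kubectl get events -A --field-selector type=Warning --sort-by=.lastTimestamp
}
```

For example, `hot_nodes 90` lists only nodes above 90% CPU or memory. A node with no reported metrics is never listed, so an empty result should be read together with the plain `kubectl top nodes` output.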

Verification

az aks show --resource-group $RG --name $CLUSTER_NAME --query addonProfiles.omsagent.enabled --output tsv
kubectl get pods -n kube-system
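
The two verification commands can be combined into a single pass/fail check for use in a script or pipeline. This is a sketch under stated assumptions: `verify_monitoring` is an illustrative name, and the pod check looks only at pod phase, which is weaker than readiness.

```shell
# Verification sketch (POSIX sh). Expects RG and CLUSTER_NAME to be set.
verify_monitoring() {
  # Same query as the az aks show command above; prints "true" when enabled.
  enabled="$(az aks show --resource-group "$RG" --name "$CLUSTER_NAME" \
    --query addonProfiles.omsagent.enabled --output tsv)" || return 1
  if [ "$enabled" != "true" ]; then
    echo "monitoring add-on not enabled (got: ${enabled:-empty})"
    return 1
  fi
  # Collect kube-system pods that are neither Running nor Succeeded.
  unhealthy="$(kubectl get pods --namespace kube-system --no-headers \
    --field-selector=status.phase!=Running,status.phase!=Succeeded 2>/dev/null)" || return 1
  if [ -n "$unhealthy" ]; then
    echo "kube-system pods not running:"
    echo "$unhealthy"
    return 1
  fi
  echo "monitoring add-on enabled; kube-system pods running"
}
```

A pod in the Running phase can still be failing readiness probes or restarting, so the READY and RESTARTS columns of `kubectl get pods -n kube-system` remain worth a manual look.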

Rollback / Troubleshooting

If metrics are missing, check Metrics Server and Azure Monitor agent health.
If logs exist but are unusable, refine namespace, workload, and owner labeling.
If alerts are noisy, fix thresholds and missing suppression logic instead of disabling visibility.
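
For the missing-metrics case, one possible first step is to ask the API server whether the metrics API is being served, then look at the component most likely at fault. The sketch below assumes Metrics Server runs as the `metrics-server` deployment and that the Azure Monitor agent runs as the `ama-logs` DaemonSet in `kube-system`; older clusters may still use the `omsagent` name, so adjust as needed. `diagnose_metrics` is an illustrative name.

```shell
# Troubleshooting sketch (POSIX sh). Resource names are assumptions; see the lead-in.
diagnose_metrics() {
  # Read the Available condition of the metrics APIService ("True" when healthy).
  available="$(kubectl get apiservice v1beta1.metrics.k8s.io \
    --output jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null)" || available=""
  if [ "$available" = "True" ]; then
    echo "metrics API is available; checking the Azure Monitor agent"
    # Assumed DaemonSet name for the agent; may be omsagent on older clusters.
    kubectl get daemonset ama-logs --namespace kube-system
  else
    echo "metrics API is unavailable; checking Metrics Server"
    kubectl get deployment metrics-server --namespace kube-system
    kubectl logs deployment/metrics-server --namespace kube-system --tail=50
  fi
}
```

If the metrics API is available but `kubectl top` still returns nothing for specific nodes, the problem is more likely on those nodes than in Metrics Server itself.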
