Monitoring and Logging

AKS observability must cover cluster state, node health, workload health, and control-plane signals such as API server and audit logs. Effective monitoring is what separates guessing from diagnosing.

Prerequisites

  • Log Analytics workspace or equivalent telemetry backend is available.
  • Metrics Server (required by kubectl top) and/or Azure Monitor pipelines are configured.
  • Alert ownership and escalation paths are defined.
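Before running the procedure, a quick preflight can confirm these prerequisites are actually in place. This is a minimal sketch, assuming bash and that the commands on this page take the resource group and cluster name from the RG and CLUSTER_NAME variables; the function name aks_preflight is hypothetical. The Metrics API check reads the Available condition of the v1beta1.metrics.k8s.io APIService, which Metrics Server registers and kubectl top depends on.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: verifies the variables and tools this
# runbook assumes, then checks that the Metrics API (served by Metrics Server)
# reports Available=True. Returns non-zero on the first failed check.
aks_preflight() {
  local var tool status

  # The procedure commands reference $RG and $CLUSTER_NAME.
  for var in RG CLUSTER_NAME; do
    if [[ -z "${!var:-}" ]]; then
      echo "missing variable: $var" >&2
      return 1
    fi
  done

  # Both CLIs must be resolvable on PATH (or as shell functions).
  for tool in az kubectl; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      return 1
    fi
  done

  # kubectl top needs the Metrics API; read its Available condition.
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null)
  if [[ "$status" != "True" ]]; then
    echo "Metrics API not available (status: ${status:-unknown})" >&2
    return 1
  fi

  echo "preflight ok"
}
```

Typical use is aks_preflight && kubectl top nodes, so the inspection commands only run once the basics are confirmed.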

When to Use

  • Building the baseline observability stack.
  • Expanding alerts for new critical workloads.
  • Diagnosing incidents with cluster and node evidence.

Procedure

flowchart TD
    A[Cluster Metrics] --> B[Logs]
    B --> C[Alerts]
    C --> D[Dashboards]
    D --> E[Incident Triage]

az aks enable-addons \
    --resource-group $RG \
    --name $CLUSTER_NAME \
    --addons monitoring
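Without a workspace argument, the command above uses a default Log Analytics workspace and creates one if none exists. To send data to the workspace named in the prerequisites instead, az aks enable-addons accepts --workspace-resource-id. This is a minimal sketch, assuming bash and that the hypothetical WORKSPACE_RG and WORKSPACE_NAME variables identify that workspace; az monitor log-analytics workspace show resolves its resource ID, and the function name is also hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable the monitoring add-on against an existing
# Log Analytics workspace rather than the default one.
# WORKSPACE_RG and WORKSPACE_NAME are assumed names for illustration.
enable_monitoring_with_workspace() {
  local workspace_id
  # Resolve the full resource ID of the existing workspace.
  workspace_id=$(az monitor log-analytics workspace show \
    --resource-group "$WORKSPACE_RG" \
    --workspace-name "$WORKSPACE_NAME" \
    --query id --output tsv) || return 1

  if [[ -z "$workspace_id" ]]; then
    echo "workspace not found: $WORKSPACE_NAME" >&2
    return 1
  fi

  az aks enable-addons \
    --resource-group "$RG" \
    --name "$CLUSTER_NAME" \
    --addons monitoring \
    --workspace-resource-id "$workspace_id"
}
```

Pinning the workspace explicitly keeps cluster telemetry next to the alert rules and queries that the owning team already maintains.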
kubectl top nodes
kubectl top pods -A
kubectl get events -A --sort-by=.lastTimestamp
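On larger clusters, the raw kubectl top and event listings above get long. This is a minimal filtering sketch, assuming bash and awk: hot_nodes reads kubectl top nodes --no-headers output and prints nodes whose CPU% or memory% meets a threshold, and warning_counts tallies Warning events per namespace from kubectl get events -A --no-headers. Both function names and the default threshold of 80 are hypothetical, and the column positions assume kubectl's default table output.

```shell
#!/usr/bin/env bash
# Hypothetical triage filters over default kubectl table output.

# Usage: kubectl top nodes --no-headers | hot_nodes [threshold]
# Columns assumed: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
hot_nodes() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '
    {
      cpu = $3; mem = $5
      sub(/%$/, "", cpu); sub(/%$/, "", mem)
      # Skip rows without numeric metrics (e.g. "<unknown>").
      if (cpu !~ /^[0-9]+$/ || mem !~ /^[0-9]+$/) next
      if (cpu + 0 >= t || mem + 0 >= t)
        printf "%s cpu=%s%% mem=%s%%\n", $1, cpu, mem
    }'
}

# Usage: kubectl get events -A --no-headers | warning_counts
# Columns assumed: NAMESPACE LAST-SEEN TYPE REASON OBJECT MESSAGE
warning_counts() {
  # Count Warning events per namespace, busiest namespace first.
  awk '$3 == "Warning" { n[$1]++ }
       END { for (ns in n) print n[ns], ns }' | sort -rn
}
```

Filtering on the client keeps the original commands unchanged; the same output can still be read in full when the filtered view is not enough.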

Verification

az aks show --resource-group $RG --name $CLUSTER_NAME --query addonProfiles.omsagent.enabled --output tsv
kubectl get pods -n kube-system
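The two verification commands can be turned into pass/fail checks for a runbook or CI job. This is a minimal sketch, assuming bash; verify_monitoring_addon and unhealthy_pods are hypothetical names. The first treats anything other than true from the az aks show query as a failure. The second reads kubectl get pods -n kube-system --no-headers and prints pods whose READY count is short or whose STATUS is neither Running nor Completed.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the monitoring add-on reports enabled.
verify_monitoring_addon() {
  local enabled
  enabled=$(az aks show --resource-group "$RG" --name "$CLUSTER_NAME" \
    --query addonProfiles.omsagent.enabled --output tsv 2>/dev/null)
  if [[ "$enabled" == "true" ]]; then
    echo "monitoring add-on enabled"
  else
    echo "monitoring add-on NOT enabled (got: ${enabled:-nothing})" >&2
    return 1
  fi
}

# Usage: kubectl get pods -n kube-system --no-headers | unhealthy_pods
# Columns assumed: NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
  awk '{
    split($2, r, "/")
    if ($3 == "Completed") next      # finished Job pods are expected
    if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3
  }'
}
```

An empty result from unhealthy_pods means every kube-system pod is fully ready, which is a quicker signal than scanning the full listing.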

Rollback / Troubleshooting

  • If metrics are missing, check Metrics Server and Azure Monitor agent health.
  • If logs exist but are hard to filter or attribute, refine namespace, workload, and owner labels.
  • If alerts are noisy, tune thresholds and add the missing suppression logic rather than disabling visibility.
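One common form of suppression logic is to fire only after a threshold has been breached for several consecutive samples, so a single spike does not page anyone. This is a minimal sketch of that idea, assuming bash and awk and a stream of one numeric sample per line (for example, CPU percentages collected once a minute); sustained_breach, its threshold, and its window length are hypothetical. In practice this is usually configured in the alerting system itself, so treat the code as an illustration of the logic only.

```shell
#!/usr/bin/env bash
# Hypothetical debounce: report an alert only when `window` consecutive
# samples are at or above `threshold`; any sample below it resets the run.
# Usage: printf '%s\n' 70 92 95 97 60 | sustained_breach 90 3
sustained_breach() {
  local threshold="$1" window="${2:-3}"
  if [[ -z "$threshold" ]]; then
    echo "usage: sustained_breach THRESHOLD [WINDOW]" >&2
    return 1
  fi
  awk -v t="$threshold" -v w="$window" '
    {
      if ($1 + 0 >= t) run++; else run = 0
      # Fire once per breach run, exactly when it reaches the window length.
      if (run == w) printf "ALERT at sample %d (value %s)\n", NR, $1
    }'
}
```

Raising the window trades detection latency for fewer false pages, which is usually a better fix than raising the threshold until real incidents slip through.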
