AKS Container Insights Diagnostics
Analyze Container Insights inventory and log data for Azure Kubernetes Service to identify unstable pods, repeated container restarts, and node-level warning signals before they become cluster-wide incidents.
Scenario
You need to find pods with repeated restarts or failed states in the last 24 hours and correlate them with the node that is hosting the workload.
KQL Query
KubePodInventory
| where TimeGenerated > ago(24h)
| summarize
    RestartCount = max(ContainerRestartCount),
    arg_max(TimeGenerated, PodStatus, ContainerStatus, Computer)
    by ClusterName, Namespace, PodName, ContainerName
| where RestartCount > 0 or PodStatus in ("Failed", "Pending", "Unknown") or ContainerStatus != "running"
| project
    ClusterName,
    Namespace,
    PodName,
    ContainerName,
    PodStatus,
    ContainerStatus,
    RestartCount,
    Node = Computer,
    LastSeen = TimeGenerated
| order by RestartCount desc, LastSeen desc
| take 15
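The scenario asks about restarts within the last 24 hours, while max(ContainerRestartCount) returns the highest value seen in the window. Assuming ContainerRestartCount is a cumulative counter as reported by Kubernetes, a variant along the following lines can estimate restarts that happened inside the window and also count samples where the container was not running. The column names RestartsInWindow and NonRunningSamples are illustrative, not part of the original query.

```kql
// Sketch: estimate restarts inside the 24h window instead of the lifetime total.
// Assumption: ContainerRestartCount only increases while the container exists.
KubePodInventory
| where TimeGenerated > ago(24h)
| summarize
    // Difference between highest and lowest counter value seen in the window
    RestartsInWindow = max(ContainerRestartCount) - min(ContainerRestartCount),
    // Number of inventory samples where the container was not running
    NonRunningSamples = countif(ContainerStatus != "running"),
    arg_max(TimeGenerated, PodStatus, ContainerStatus, Computer)
    by ClusterName, Namespace, PodName, ContainerName
| where RestartsInWindow > 0 or NonRunningSamples > 0
| project
    ClusterName,
    Namespace,
    PodName,
    ContainerName,
    RestartsInWindow,
    NonRunningSamples,
    PodStatus,
    ContainerStatus,
    Node = Computer,
    LastSeen = TimeGenerated
| order by RestartsInWindow desc, NonRunningSamples desc
```

A container that first appears inside the window with a non-zero counter will be undercounted by this approach, so treat RestartsInWindow as a lower bound.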
Data Flow
graph TD
A[KubePodInventory table] --> B[Filter by 24h]
B --> C[Capture latest pod and container state]
C --> D[Highlight restart or failed status patterns]
D --> E[Project namespace, pod, and node]
E --> F[Review top unstable workloads]
Sample Output
| ClusterName | Namespace | PodName | ContainerName | PodStatus | ContainerStatus | RestartCount | Node | LastSeen |
|---|---|---|---|---|---|---|---|---|
| aks-prod-01 | payments | api-7c9d8b6f6f-jx2lm | api | Running | waiting | 9 | aks-nodepool1-38291-vmss000003 | 2026-04-13 09:42:00Z |
| aks-prod-01 | ingress | nginx-ingress-5fd6d8b9d8-kt4rq | controller | Failed | terminated | 4 | aks-nodepool1-38291-vmss000001 | 2026-04-13 09:39:00Z |
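How to Read This
High RestartCount values usually indicate crash loops, out-of-memory kills, or liveness probe failures. Image pull problems and readiness probe failures generally do not restart the container; they tend to appear as a waiting ContainerStatus or a Pending PodStatus instead. If several affected pods map to the same Node, investigate node pressure, kubelet health, or underlying infrastructure conditions before focusing only on the application container.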
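To check whether affected pods cluster on one node, the main query's result can be rolled up per node. The following is a minimal sketch that reuses the same filter; AffectedPods and TotalRestarts are illustrative column names.

```kql
// Sketch: count unstable pods per node to spot node-level patterns.
KubePodInventory
| where TimeGenerated > ago(24h)
| summarize
    RestartCount = max(ContainerRestartCount),
    arg_max(TimeGenerated, PodStatus, ContainerStatus, Computer)
    by ClusterName, Namespace, PodName, ContainerName
| where RestartCount > 0 or PodStatus in ("Failed", "Pending", "Unknown") or ContainerStatus != "running"
| summarize
    // Namespace is included so identically named pods are not merged
    AffectedPods = dcount(strcat(Namespace, "/", PodName)),
    TotalRestarts = sum(RestartCount)
    by ClusterName, Node = Computer
| order by AffectedPods desc, TotalRestarts desc
```

A node that hosts many pods will naturally rank higher, so compare AffectedPods against the total number of pods scheduled on that node before drawing conclusions.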
Limitations
- Container Insights must be enabled and sending KubePodInventory data to the workspace.
- This query shows the latest observed state, so short-lived transient failures can be missed if they recover quickly.
- Node condition details may require correlation with KubeNodeInventory, InsightsMetrics, or Kubernetes events outside this query pack.
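As one possible starting point for the node correlation mentioned above, the sketch below looks up the latest reported status in KubeNodeInventory for nodes that hosted a restarting or failed pod. It assumes the Computer column identifies the same node in both tables; UnstableNodes and NodeStatus are illustrative names, and the Status values are left unfiltered because their exact format depends on what the agent reports.

```kql
// Sketch: latest node status for nodes that hosted unstable pods.
// Assumption: Computer matches between KubePodInventory and KubeNodeInventory.
let UnstableNodes =
    KubePodInventory
    | where TimeGenerated > ago(24h)
    | where ContainerRestartCount > 0 or PodStatus in ("Failed", "Pending", "Unknown")
    | distinct Computer;
KubeNodeInventory
| where TimeGenerated > ago(24h)
| where Computer in (UnstableNodes)
| summarize arg_max(TimeGenerated, Status) by ClusterName, Computer
| project
    ClusterName,
    Node = Computer,
    NodeStatus = Status,
    LastSeen = TimeGenerated
| order by Node asc
```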