Troubleshooting

This section is a hypothesis-driven AKS troubleshooting hub. Use it to move from symptom to evidence, then to a focused playbook.

Main Content

flowchart TD
    A[Observe Symptom] --> B[Decision Tree]
    B --> C[First 10 Minutes]
    C --> D[Playbook]
    D --> E[Diagnostic Commands]
    E --> F[Validated Root Cause]

| Need | Start Here |
|---|---|
| Understand AKS failure surfaces | Architecture Overview |
| Route a symptom quickly | Decision Tree |
| Know what evidence to gather | Evidence Map |
| Apply a mental framework | Mental Model |
| Jump to symptom cards | Quick Diagnosis Cards |
| Respond in the first minutes | First 10 Minutes |
| Execute detailed runbooks | Playbooks |

Quick Routing Areas

  • Pods: image pulls, crashes, scheduling failures.
  • Connectivity: ingress routing, Services, DNS, and egress.
  • Nodes: readiness, pressure, and IP exhaustion.
  • Operations: upgrade failures, scaling failures, and maintenance side effects.
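
The routing areas above can be sketched as a small keyword lookup, which may help when triaging alert text or ticket titles in bulk. This is a minimal illustration, not part of the hub itself: the `ROUTING_AREAS` table, the keyword choices, and the `route_symptom` function are all hypothetical and would need tuning against real symptoms.

```python
# Hypothetical keyword table: the areas come from the list above, but the
# keywords are illustrative assumptions, not an official mapping.
ROUTING_AREAS = {
    "Pods": ("imagepullbackoff", "errimagepull", "crashloopbackoff",
             "pending", "unschedulable"),
    "Connectivity": ("ingress", "service", "dns", "egress"),
    "Nodes": ("notready", "pressure", "ip exhaustion"),
    "Operations": ("upgrade", "scal", "maintenance"),
}


def route_symptom(symptom: str) -> list[str]:
    """Return every routing area whose keywords appear in the symptom text.

    A symptom can match more than one area; an empty list means the
    symptom needs manual routing through the decision tree.
    """
    text = symptom.lower()
    return [
        area
        for area, keywords in ROUTING_AREAS.items()
        if any(keyword in text for keyword in keywords)
    ]


if __name__ == "__main__":
    for example in (
        "Pod stuck in ImagePullBackOff",
        "DNS lookups time out",
        "Node NotReady after upgrade",
    ):
        print(example, "->", route_symptom(example))
```

Returning a list rather than a single area keeps cross-cutting symptoms visible: a node that goes NotReady after an upgrade plausibly belongs to both Nodes and Operations, and the evidence gathered later decides which hypothesis survives.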
