Troubleshooting
This section is a hypothesis-driven AKS troubleshooting hub. Use it to move from symptom to evidence, then to a focused playbook.
Main Content
flowchart TD
A[Observe Symptom] --> B[Decision Tree]
B --> C[First 10 Minutes]
C --> D[Playbook]
D --> E[Diagnostic Commands]
E --> F[Validated Root Cause]

| Need | Start Here |
|---|---|
| Understand AKS failure surfaces | Architecture Overview |
| Route a symptom quickly | Decision Tree |
| Know what evidence to gather | Evidence Map |
| Apply a mental framework | Mental Model |
| Jump to symptom cards | Quick Diagnosis Cards |
| Respond in the first minutes | First 10 Minutes |
| Execute detailed runbooks | Playbooks |
Quick Routing Areas
- Pods: image pulls, crashes, scheduling failures.
- Connectivity: ingress routing, Services, DNS, and egress.
- Nodes: readiness, pressure, and IP exhaustion.
- Operations: upgrade failures, scaling failures, and maintenance side effects.
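The routing areas above can be sketched as a small lookup that maps a free-text symptom to one of the four areas. This is only an illustrative sketch, not part of any AKS tooling: the `ROUTING_KEYWORDS` table and the `route_symptom` function are hypothetical names, and the keyword lists are assumptions chosen from common Kubernetes status strings and the bullets above. When nothing matches, it falls back to the Decision Tree.

```python
# Hypothetical keyword table: the area names come from the list above,
# the keywords are illustrative assumptions, not an exhaustive catalogue.
ROUTING_KEYWORDS = {
    "Pods": ("imagepull", "crashloop", "failedscheduling", "pending"),
    "Connectivity": ("ingress", "service", "dns", "egress"),
    "Nodes": ("notready", "pressure", "ip exhaustion"),
    "Operations": ("upgrade", "scal", "maintenance"),
}

# Returned when no keyword matches, so the symptom still gets routed somewhere.
FALLBACK = "Decision Tree"


def route_symptom(symptom: str) -> str:
    """Return the first routing area whose keywords appear in the symptom text."""
    text = symptom.lower()
    for area, keywords in ROUTING_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return area
    return FALLBACK


if __name__ == "__main__":
    for example in (
        "Pod stuck in ImagePullBackOff",
        "DNS lookups time out",
        "Node NotReady with MemoryPressure",
        "Cluster upgrade failed",
        "Something looks odd",
    ):
        print(f"{example!r} -> {route_symptom(example)}")
```

Areas are checked in the order listed, so a symptom that mentions several areas routes to the first match. Treat the result as a starting hypothesis to confirm with evidence, not as a diagnosis.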