Image Pull Failure
1. Summary
A pod cannot start because the node cannot pull the required image. In AKS, this usually points to registry authentication, image reference, network reachability, or policy issues.
Diagnostic flow: Symptom → Hypotheses → Evidence → Disprove weak paths → Mitigation.
2. Common Misreadings
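- Treating the first visible symptom (ImagePullBackOff) as the root cause. The back-off only means that earlier pull attempts failed.
- Assuming a pod restart proves the issue is fixed. A pod can start from an image already cached on the node while the registry problem persists.
- Assuming the cluster is healthy because only one namespace is affected. Nodes, registry access, and network paths are shared across namespaces.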
3. Competing Hypotheses
- H1: Image name or tag is wrong.
- H2: ACR or external registry authentication failed.
- H3: The node cannot reach the registry endpoint.
- H4: Admission policy or image allow-list blocked the pull.
4. What to Check First
kubectl describe pod <pod-name> -n <namespace>
kubectl get secret -n <namespace>
az aks check-acr --resource-group $RG --name $CLUSTER_NAME --acr <acr-name>.azurecr.io
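kubectl describe pod prints every event, which can bury the pull error. As a minimal sketch, assuming bash with kubectl on the PATH, the hypothetical function pull_waiting_containers lists only the containers whose waiting reason is ErrImagePull or ImagePullBackOff, along with the image each one requested. It reads .status.containerStatuses only, so init containers are not covered.

```bash
# Hypothetical helper: print "container<TAB>image<TAB>reason" for each regular
# container (init containers are not covered) that is waiting on an image pull.
# Usage: pull_waiting_containers <pod-name> <namespace>
pull_waiting_containers() {
  local pod="$1" ns="$2"
  # jsonpath emits one tab-separated line per container status;
  # awk keeps only the lines whose waiting reason is a pull failure.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.image}{"\t"}{.state.waiting.reason}{"\n"}{end}' \
    | awk -F '\t' '$3 == "ErrImagePull" || $3 == "ImagePullBackOff"'
}
```

The image column is the reference the kubelet is actually trying to pull, which is the value to compare against the registry passed to az aks check-acr.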
5. Evidence to Collect
- Pod events showing ErrImagePull or ImagePullBackOff.
- Image reference in the workload manifest.
- ACR integration or imagePullSecrets configuration.
- Node egress and DNS evidence if registry reachability is suspected.
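When reviewing the image reference, it helps to see exactly which registry, repository, and tag the node will request: a reference with no registry host defaults to Docker Hub, and one with neither tag nor digest defaults to latest, both common ways to hit H1. A minimal bash sketch using only parameter expansion; split_image_ref is a hypothetical name, and it deliberately skips Docker Hub's implicit library/ prefix.

```bash
# Hypothetical helper: split an image reference into registry, repository, tag, and digest.
# Simplification: Docker Hub's implicit "library/" prefix is not added.
split_image_ref() {
  local ref="$1" registry rest name tag="" digest=""
  # A digest, if present, follows '@'.
  if [[ "$ref" == *@* ]]; then
    digest="${ref#*@}"
    ref="${ref%%@*}"
  fi
  # The first path component is a registry host only if it contains '.' or ':' or is 'localhost'.
  local first="${ref%%/*}"
  if [[ "$ref" == */* && ( "$first" == *.* || "$first" == *:* || "$first" == localhost ) ]]; then
    registry="$first"
    rest="${ref#*/}"
  else
    registry="docker.io"
    rest="$ref"
  fi
  # A tag follows the last ':' in the final path component.
  if [[ "${rest##*/}" == *:* ]]; then
    tag="${rest##*:}"
    name="${rest%:*}"
  else
    name="$rest"
  fi
  # With neither tag nor digest, the runtime requests ":latest".
  if [[ -z "$tag" && -z "$digest" ]]; then
    tag="latest"
  fi
  printf 'registry=%s repository=%s tag=%s digest=%s\n' "$registry" "$name" "$tag" "$digest"
}
```

For example, split_image_ref myacr.azurecr.io/team/app:1.2.3 prints registry=myacr.azurecr.io repository=team/app tag=1.2.3 digest=, so a registry that differs from the ACR attached to the cluster stands out immediately.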
6. Validation and Disproof by Hypothesis
- If manifest unknown or not found appears, the node reached the registry and the request was accepted, which largely disproves H2-H4; fix the image reference (H1).
- If unauthorized appears, focus on identity or secret issues (H2).
- If timeouts or name-resolution failures appear, prioritize network investigation (H3).
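These triage rules can be encoded as a small lookup that maps an event message to a hypothesis. A minimal bash sketch: classify_pull_failure is hypothetical, and its substrings are illustrative assumptions based on common containerd and registry messages rather than an exhaustive list. The H4 patterns assume a validating admission webhook (for example a policy engine) that reports denials in ReplicaSet or pod events.

```bash
# Hypothetical helper: map an image-pull event message to a hypothesis label.
# The substrings below are illustrative, not an exhaustive catalog of errors.
classify_pull_failure() {
  local msg
  # Lowercase the message so matching is case-insensitive.
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$msg" in
    *"manifest unknown"*|*"not found"*)
      echo "H1" ;;   # wrong image name or tag
    *unauthorized*|*"authentication required"*)
      echo "H2" ;;   # registry authentication
    *"i/o timeout"*|*"no such host"*|*"context deadline exceeded"*|*"connection refused"*)
      echo "H3" ;;   # network or DNS reachability
    *"admission webhook"*|*"denied the request"*)
      echo "H4" ;;   # assumed admission-policy denial wording
    *)
      echo "unclassified" ;;
  esac
}
```

The patterns are checked in order, so a message naming a missing manifest is labeled H1 even if it contains other text; feed it the message text from the pod's events.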
7. Likely Root Cause Patterns
- Missing AcrPull role assignment for the cluster's kubelet identity.
- Image tag deleted or never pushed.
- Private DNS or proxy issues affecting registry access.
- Secret drift after credential rotation.
8. Immediate Mitigations
- Correct the image reference.
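- Reattach the cluster to ACR (for example, az aks update --attach-acr <acr-name>) or fix imagePullSecrets.
- Test registry resolution from a node or debug pod.
- Restart the deployment only after the root cause is fixed; restarting earlier just creates new pods that fail the same way.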
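The last two mitigations can be combined so that a restart happens only after the registry resolves from inside the cluster. A minimal bash sketch with several assumptions: the debug image busybox:1.36 must itself be pullable (mirror it into a reachable registry if the cluster cannot reach Docker Hub), kubectl run is assumed to return the debug container's exit status when attached with -i, and busybox nslookup is assumed to exit non-zero when resolution fails. DNS is only one gate, so authentication and image existence still need checking. Both function names are hypothetical.

```bash
# Hypothetical helpers. Assumption: busybox:1.36 is pullable by the cluster.
# Resolve the registry hostname from a short-lived debug pod (--rm deletes it afterwards).
check_registry_dns() {
  local registry="$1" ns="${2:-default}"
  kubectl run "regcheck-$$" -n "$ns" --rm -i --restart=Never \
    --image=busybox:1.36 -- nslookup "$registry"
}

# Restart the deployment only if the registry resolves; otherwise report and fail.
restart_if_registry_resolves() {
  local deploy="$1" registry="$2" ns="${3:-default}"
  if check_registry_dns "$registry" "$ns"; then
    kubectl rollout restart "deployment/$deploy" -n "$ns"
  else
    echo "registry $registry did not resolve from namespace $ns; not restarting" >&2
    return 1
  fi
}
```

A typical call would be restart_if_registry_resolves <deployment-name> <acr-name>.azurecr.io <namespace>. Running the check in the workload's own namespace keeps it subject to the same network policies as the failing pods.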
9. Prevention
- Standardize image publishing and retention.
- Prefer managed identity-based ACR access over imagePullSecrets, which can drift after credential rotation.
- Add pre-deployment image existence checks in CI.
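A pre-deployment image existence check can run in CI before the manifests are applied. A minimal bash sketch, assuming plain YAML manifests with one image: value per line; it is not a YAML parser and will miss templated or multi-line values. The existence test is delegated to a command passed as arguments, defaulting to docker manifest inspect, which assumes the CI runner is already logged in to the registry (for ACR, for example with az acr login). Both function names are hypothetical.

```bash
# Hypothetical helpers for a CI gate. Not a YAML parser: reads "image:" lines only.
# Print each image reference found in a Kubernetes manifest read from stdin.
list_manifest_images() {
  grep -E '^[[:space:]-]*image:[[:space:]]*[^[:space:]]' \
    | sed -E 's/^[[:space:]-]*image:[[:space:]]*//; s/[[:space:]]+#.*$//; s/[[:space:]]+$//' \
    | tr -d "\"'"
}

# Check every image with a command (default: docker manifest inspect); return 1 if any is missing.
# Usage: check_images_exist [command...] < deployment.yaml
check_images_exist() {
  if [ "$#" -eq 0 ]; then
    set -- docker manifest inspect
  fi
  # The subshell's exit status becomes the function's return status.
  list_manifest_images | (
    missing=0
    while IFS= read -r image; do
      # </dev/null stops the checker from consuming the image list on stdin.
      if "$@" "$image" </dev/null >/dev/null 2>&1; then
        echo "ok      $image"
      else
        echo "MISSING $image"
        missing=1
      fi
    done
    exit "$missing"
  )
}
```

Passing the checker as arguments keeps the gate registry-agnostic: the same loop works with any command that exits non-zero when an image is absent.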