CrashLoop¶

1. Summary¶

A pod starts and repeatedly exits or fails health checks. The immediate symptom is restart churn, but the real cause may be application configuration, dependency startup, or probe design.

flowchart TD
    A[Symptom] --> B[Hypotheses]
    B --> C[Evidence]
    C --> D[Disprove weak paths]
    D --> E[Mitigation]

2. Common Misreadings¶

The first visible symptom is the root cause.
Restarting the pod proves the issue is fixed.
If one namespace is affected, the cluster is healthy.

3. Competing Hypotheses¶

H1: The container process exits due to application error.
H2: Liveness/readiness/startup probes are misconfigured.
H3: Required configuration or secret data is missing.
H4: The process is OOMKilled or resource starved.

4. What to Check First¶

kubectl get pods -A
kubectl logs <pod-name> -n <namespace> --previous
kubectl describe pod <pod-name> -n <namespace>

5. Evidence to Collect¶

Previous container logs.
Exit code and termination reason.
Probe configuration and timing.
Secret, ConfigMap, and dependency readiness state.

6. Validation and Disproof by Hypothesis¶

If exit code and stack trace exist, H1 is strongest.
If logs are clean but probes fail, disprove application crash first and inspect probes.
If termination reason is OOMKilled, prioritize requests/limits and memory use.

7. Likely Root Cause Patterns¶

Invalid app config or missing secret.
Startup work taking longer than probe thresholds.
Memory limits too low for workload behavior.
Dependency endpoints unavailable at startup.

8. Immediate Mitigations¶

Scale down noisy rollout if needed.
Fix configuration or probe settings.
Increase limits only if evidence supports it.
Use startupProbe for slow initialization instead of weakening liveness blindly.

9. Prevention¶

Add startup validation in CI/CD.
Keep probe design workload-specific.
Capture restart alerts with namespace and owner labels.

CrashLoop¶

1. Summary¶

2. Common Misreadings¶

3. Competing Hypotheses¶

4. What to Check First¶

5. Evidence to Collect¶

6. Validation and Disproof by Hypothesis¶

7. Likely Root Cause Patterns¶

8. Immediate Mitigations¶

9. Prevention¶

See Also¶

Sources¶