Revision Provisioning Failure Lab
Reproduce a revision that is created but never becomes ready due to startup probe misconfiguration.
Lab Metadata
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Estimated Duration | 20-30 minutes |
| Tier | Consumption |
| Failure Mode | Revision created but startup probe fails repeatedly |
| Skills Practiced | Revision diagnostics, probe configuration, system log analysis |
1) Background
This lab demonstrates what happens when a revision is accepted by the Azure Container Apps control plane but never stabilizes. The trigger misconfigures a startup probe to target a non-existent path, causing the probe to fail repeatedly. The revision exists and containers may start, but the platform marks the revision as unhealthy because it never passes the startup probe.
This pattern is distinct from API validation failures (which reject the update before creating a revision) and from port mismatches (covered in a separate lab).
Architecture
```mermaid
flowchart TD
    A[Deploy baseline app] --> B[Healthy revision running]
    B --> C[Trigger: Add bad startup probe]
    C --> D[New revision created]
    D --> E[Startup probe targets /nonexistent]
    E --> F[Probe fails repeatedly]
    F --> G[Revision never becomes Ready]
    G --> H[Fix: Disable startup probe]
    H --> I[New revision becomes Healthy]
```
Revision created ≠ Revision ready
A revision can exist in the system but remain in a Failed or Degraded state if health probes never pass. Always check revision health state, not just existence.
API validation vs runtime failure
Some configuration errors (like referencing a non-existent secret) are now rejected at the API layer with ContainerAppSecretRefNotFound. This lab focuses on errors that pass API validation but fail at runtime.
2) Hypothesis
IF a startup probe is configured to target a path that returns 404 or times out, THEN the revision will be created but never become ready, and system logs will show ProbeFailed events until the probe configuration is fixed.
| Variable | Control State | Experimental State |
|---|---|---|
| Startup probe path | Not configured or valid path | Non-existent path such as /nonexistent-health-endpoint (returns 404) |
| Latest revision health | Healthy | Degraded or Failed |
| System logs | Normal startup events | ProbeFailed events |
| Recovery path | No action required | Disable startup probe and deploy new revision |
3) Runbook
Deploy baseline infrastructure
Prerequisites:
- Azure CLI with the Container Apps extension
- Permissions to deploy Container Apps resources
```bash
az extension add --name containerapp --upgrade
az login
export RG="rg-aca-lab-revprov"
export LOCATION="koreacentral"
az group create --name "$RG" --location "$LOCATION"
az deployment group create \
  --name "lab-revprov" \
  --resource-group "$RG" \
  --template-file "./labs/revision-provisioning-failure/infra/main.bicep" \
  --parameters baseName="labrevprov"
```
Expected output:
- Resource group creation succeeds.
- Deployment completes with a Succeeded state.
Capture deployment outputs
```bash
export APP_NAME="$(az deployment group show \
  --resource-group "$RG" \
  --name "lab-revprov" \
  --query "properties.outputs.containerAppName.value" \
  --output tsv)"
export ENVIRONMENT_NAME="$(az deployment group show \
  --resource-group "$RG" \
  --name "lab-revprov" \
  --query "properties.outputs.environmentName.value" \
  --output tsv)"
```
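If either deployment output is missing, the later commands fail with confusing errors. A minimal guard is sketched below; `require_vars` is a hypothetical helper introduced here for illustration and is not part of the lab scripts.

```bash
# Hypothetical helper (not part of the lab scripts): fail fast when a
# required variable is unset or empty.
require_vars() {
  local name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "Missing value for $name" >&2
      return 1
    fi
  done
}
```

Calling `require_vars APP_NAME ENVIRONMENT_NAME` right after the exports should return non-zero if either output was not captured.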
Verify baseline health
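The baseline check can be sketched with `az containerapp revision list`, the command named in the Expected Evidence tables. The wrapper name `list_revisions` is an assumption made here so the same call can be reused in later steps.

```bash
# Hypothetical wrapper: list the app's active revisions as a table.
# Relies on APP_NAME and RG exported earlier in the runbook.
list_revisions() {
  az containerapp revision list \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --output table
}
```

Run `list_revisions` and compare the HealthState column with the output below.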
Expected output:
```text
CreatedTime                Active    Replicas    TrafficWeight    HealthState    ProvisioningState    Name
-------------------------  --------  ----------  ---------------  -------------  -------------------  ---------------------------
2026-04-06T12:00:00+00:00  True      1           100              Healthy        Provisioned          ca-labrevprov-xxxxx--abc123
```
Trigger the failure
The trigger script adds a startup probe targeting a non-existent path:
```bash
az containerapp update \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --set-env-vars "PROBE_TRIGGER=$(date +%s)" \
  --container-name app \
  --startup-probe-path "/nonexistent-health-endpoint" \
  --startup-probe-port 80 \
  --startup-probe-failure-threshold 3 \
  --startup-probe-period-seconds 5
```
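To confirm that the probe actually landed on the app template, the stored probe configuration can be read back. This is a sketch that assumes the `app` container is the first container in the template; `show_probes` is a hypothetical name.

```bash
# Hypothetical wrapper: print the probes stored on the first container
# of the app template (assumed here to be the `app` container).
show_probes() {
  az containerapp show \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --query "properties.template.containers[0].probes" \
    --output json
}
```

After the trigger, `show_probes` would be expected to include a startup probe whose path is `/nonexistent-health-endpoint`.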
Observe the failure
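A possible way to observe both the failing revision and the previous one is to list revisions with `--all`, which also includes inactive revisions. The wrapper name `list_all_revisions` is an assumption for illustration.

```bash
# Hypothetical wrapper: list active and inactive revisions so the failing
# revision can be compared with the earlier healthy one.
list_all_revisions() {
  az containerapp revision list \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --all \
    --output table
}
```

Run `list_all_revisions` a few times after the trigger; the health state can take a short while to settle.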
Expected output shows the new revision in a non-Healthy state:
```text
CreatedTime                Active    Replicas    TrafficWeight    HealthState    ProvisioningState    Name
-------------------------  --------  ----------  ---------------  -------------  -------------------  ---------------------------
2026-04-06T12:05:00+00:00  True      0           100              Degraded       Provisioned          ca-labrevprov-xxxxx--def456
2026-04-06T12:00:00+00:00  False     1           0                Healthy        Provisioned          ca-labrevprov-xxxxx--abc123
```
Check system logs for probe failures:
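One way to run this check is sketched below. It assumes the environment sends logs to a Log Analytics workspace (so that the `ContainerAppSystemLogs_CL` table exists) and that the `az monitor log-analytics query` command is available in your CLI. The function name `query_probe_failures`, the 30-minute window, and the 20-row limit are choices made here for illustration.

```bash
# Hypothetical wrapper: query recent system log events for the app.
# Assumes the environment is wired to a Log Analytics workspace.
query_probe_failures() {
  local workspace_id

  # Resolve the workspace (customer) ID from the Container Apps environment.
  workspace_id="$(az containerapp env show \
    --name "$ENVIRONMENT_NAME" \
    --resource-group "$RG" \
    --query "properties.appLogsConfiguration.logAnalyticsConfiguration.customerId" \
    --output tsv)"

  # Show the most recent system events for this app, newest first.
  az monitor log-analytics query \
    --workspace "$workspace_id" \
    --analytics-query "ContainerAppSystemLogs_CL
      | where ContainerAppName_s == '$APP_NAME'
      | where TimeGenerated > ago(30m)
      | order by TimeGenerated desc
      | take 20
      | project Reason_s, Log_s" \
    --output table
}
```

Log Analytics ingestion can lag behind the event itself, so an empty result shortly after the trigger is not conclusive. For a more immediate view, `az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type system --tail 50` streams recent system logs directly.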
Expected log evidence:
```text
Reason_s             Log_s
-------------------  -----------------------------------------------------------------
ProbeFailed          Startup probe failed: HTTP probe failed with status code: 404
ContainerRestart     Container 'app' was restarted
```
Fix the issue
The verify script removes the bad probe configuration by deploying a new revision with the startup probe disabled, then confirms recovery:
```bash
az containerapp update \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --set-env-vars "PROBE_FIX=$(date +%s)" \
  --container-name app \
  --startup-probe-disabled
```
Verify recovery
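Recovery can be checked by reading the health state of the latest revision. The sketch below is one possible approach rather than the contents of the lab's verify script; `latest_revision_health` is a hypothetical helper.

```bash
# Hypothetical helper: print the health state of the app's latest revision.
latest_revision_health() {
  local revision

  # Look up the name of the most recently created revision.
  revision="$(az containerapp show \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --query "properties.latestRevisionName" \
    --output tsv)"

  # Read that revision's health state.
  az containerapp revision show \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --revision "$revision" \
    --query "properties.healthState" \
    --output tsv
}
```

After the fix, `latest_revision_health` should eventually print `Healthy`; repeat the call if the new revision is still starting.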
Expected output:
4) Experiment Log
| Step | Action | Expected | Actual | Pass/Fail |
|---|---|---|---|---|
| 1 | Deploy baseline infrastructure | Deployment succeeds | | |
| 2 | Verify baseline health | Revision is Healthy | | |
| 3 | Run trigger.sh | New revision created with bad probe | | |
| 4 | Check revision list | New revision is Degraded/Failed | | |
| 5 | Check system logs | ProbeFailed events visible | | |
| 6 | Run verify.sh | Probe removed, new revision created | | |
| 7 | Verify recovery | Latest revision is Healthy | | |
Expected Evidence
During failure
| Evidence Source | Expected State |
|---|---|
| az containerapp revision list | Latest revision shows Degraded or Failed |
| az containerapp logs show --type system | ProbeFailed with 404 status code |
| Replica count | 0 or unstable |
After fix
| Evidence Source | Expected State |
|---|---|
| az containerapp revision list | Latest revision shows Healthy |
| System logs | Normal startup events |
| ./verify.sh | PASS |
Clean Up
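A minimal clean-up sketch, assuming every lab resource lives in the resource group created at the start of the runbook. `cleanup_lab` is a hypothetical helper; the guard is there so an unset `RG` never reaches the delete call.

```bash
# Hypothetical helper: delete the lab resource group and everything in it.
cleanup_lab() {
  if [ -z "${RG:-}" ]; then
    echo "RG is not set; nothing deleted" >&2
    return 1
  fi
  # --yes skips the confirmation prompt; --no-wait returns before deletion finishes.
  az group delete --name "$RG" --yes --no-wait
}
```

Run `cleanup_lab` once the experiment log is complete. Deletion continues in the background after the command returns.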
Related Playbook
See Also
- Probe and Port Mismatch Lab — covers port mismatch; this lab covers probe path mismatch
- Container Start Failure Playbook