
First 10 Minutes: Quick Triage Checklist

Use this ordered checklist when a Container App is down, unhealthy, or unreachable. Run each step in sequence and stop when you find the first confirmed failure.

flowchart TD
    START["App Down / Unhealthy"] --> R["1) Revision Status"]
    R --> REP["2) Replica Status"]
    REP --> LOG["3) Container Logs"]
    LOG --> IMG["4) Image Pull"]
    IMG --> ING["5) Ingress Configuration"]
    ING --> PROBE["6) Health Probes"]
    PROBE --> REGAUTH["7) Registry Authentication"]
    REGAUTH --> SEC["8) Secrets and Config"]
    SEC --> NET["9) Environment and Network"]
    NET --> DEP["10) Dependencies"]

Run from a clean shell session

Export variables once to avoid command mistakes:

RG="rg-myapp"
APP_NAME="ca-myapp"
ENVIRONMENT_NAME="cae-myapp"
ACR_NAME="acrmyapp"
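A typo in one of these variables makes every later command fail in confusing ways. The sketch below fails fast if any variable is unset or empty; `require_vars` is a hypothetical helper, not part of the Azure CLI.

```shell
# require_vars: hypothetical helper (not an Azure CLI command).
# Prints each unset or empty variable name to stderr and returns 1 if any
# are missing, 0 otherwise.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_vars RG APP_NAME ENVIRONMENT_NAME ACR_NAME || exit 1
```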

1) Revision Status

az containerapp show --name "$APP_NAME" --resource-group "$RG" --query "properties.provisioningState" --output tsv

Expected baseline from a healthy deployment:

Succeeded

Then list the revisions with their health and running state:

az containerapp revision list --name "$APP_NAME" --resource-group "$RG" --query "[].{name:name,active:properties.active,health:properties.healthState,running:properties.runningState,created:properties.createdTime}" --output table

Observed output pattern:

Name               Active    Health    Running    Created
-----------------  --------  --------  ---------  -------------------------
ca-myapp--0000001  True      Healthy   Running    2026-04-04T11:30:41+00:00
  • Look for the latest revision with health=Healthy and running=Running.
  • Failure patterns: a provisioning state of Failed, a revision reported as Failed or Degraded, or a latest revision that is inactive.
  • If failed → go to Revision Provisioning Failure.
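With many revisions, the table is easy to misread. A minimal sketch that flags active revisions whose state looks wrong, assuming tab-separated input with the columns name, active, healthState, runningState. `flag_unhealthy_revisions` is a hypothetical helper; treating any state that starts with `Running` as healthy is an assumption.

```shell
# flag_unhealthy_revisions: hypothetical helper.
# Reads tab-separated rows (name, active, healthState, runningState) on stdin.
# Prints the name of each ACTIVE revision that is not Healthy or whose running
# state does not start with "Running". Returns 1 if any were printed.
# Inactive revisions are skipped, so check separately that the latest
# revision is active.
flag_unhealthy_revisions() {
  awk -F '\t' '
    tolower($2) == "true" && ($3 != "Healthy" || $4 !~ /^Running/) {
      print $1
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage (a list projection keeps the tsv column order fixed):
#   az containerapp revision list --name "$APP_NAME" --resource-group "$RG" \
#     --query "[].[name,properties.active,properties.healthState,properties.runningState]" \
#     --output tsv | flag_unhealthy_revisions
```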

2) Replica Status

az containerapp replica list --name "$APP_NAME" --resource-group "$RG" --query "[].{replica:name,runningState:properties.runningState,created:properties.createdTime}" --output table

Observed output pattern:

Replica                                RunningState    Created
-------------------------------------  --------------  -------------------------
ca-myapp--0000001-646779b4c5-bhc2v     Running         2026-04-04T11:30:52+00:00
  • Look for replicas that remain in Running state.
  • Failure patterns: repeated short-lived replicas, no replicas created, restart loops.
  • If failed → go to Container Start Failure.
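A single listing cannot show short-lived replicas; comparing two snapshots can. The sketch below prints replica names that appear in a later snapshot but not in an earlier one. `new_replicas` and the file names are hypothetical, and new names can also come from normal scale-out, so treat the result as a hint rather than proof of a restart loop.

```shell
# new_replicas: hypothetical helper.
# $1 = file with replica names from the earlier snapshot (one per line)
# $2 = file with replica names from the later snapshot
# Prints names that are present only in the later snapshot.
new_replicas() {
  awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' "$1" "$2"
}

# Usage (the 60-second gap is an arbitrary choice):
#   snapshot() {
#     az containerapp replica list --name "$APP_NAME" --resource-group "$RG" \
#       --query "[].name" --output tsv
#   }
#   snapshot > before.txt; sleep 60; snapshot > after.txt
#   new_replicas before.txt after.txt
```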

3) Container Logs

az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type console --tail 50

For continuous streaming, add --follow and press Ctrl+C to exit.

Observed healthy startup console sequence (Gunicorn):

Starting application...
PORT=8000
Workers=auto
[2026-04-04 11:30:53 +0000] [7] [INFO] Starting gunicorn 25.3.0
[2026-04-04 11:30:53 +0000] [7] [INFO] Listening at: http://0.0.0.0:8000 (7)
[2026-04-04 11:30:53 +0000] [7] [INFO] Using worker: sync
[2026-04-04 11:30:54 +0000] [8] [INFO] Booting worker with pid: 8
  • Look for Python tracebacks, startup command failures, bind errors, and missing configuration.
  • Failure patterns: ModuleNotFoundError, Address already in use, connection refused, crash loops.
  • If failed → go to Container Start Failure.
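The failure patterns above can be searched for mechanically. A minimal sketch, assuming a Python app as in the Gunicorn example; `scan_console_log` is a hypothetical helper and its pattern list is illustrative, not exhaustive.

```shell
# scan_console_log: hypothetical helper.
# Reads console log text on stdin and prints lines matching common startup
# failure signatures (case-insensitive). Returns 0 if any line matched,
# 1 if none did.
scan_console_log() {
  grep -E -i 'Traceback \(most recent call last\)|ModuleNotFoundError|Address already in use|connection refused'
}

# Usage:
#   az containerapp logs show --name "$APP_NAME" --resource-group "$RG" \
#     --type console --tail 50 | scan_console_log
```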

4) Image Pull

az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type system
az acr repository show-tags --name "$ACR_NAME" --repository "$APP_NAME" --output table

Observed pull success pattern:

TimeGenerated              Reason_s      Log_s
-------------------------  ------------  ---------------------------------------------------------------
2026-04-04T12:54:11.477Z   PullingImage  Pulling image '<acr-name>.azurecr.io/myapp:v1.0.0'
2026-04-04T12:54:11.477Z   PulledImage   Successfully pulled image in 2.42s. Image size: 58720256 bytes.
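  • Confirm the image tag exists and system logs do not show pull/auth errors. The show-tags command assumes the repository is named after the app; if the pull log shows a different repository (myapp in the sample above), pass that name to --repository instead.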
  • Failure patterns: ImagePullBackOff, manifest unknown, unauthorized, denied.
  • If failed → go to Image Pull Failure.
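To check one specific tag rather than reading the whole list, a sketch along these lines may help. `tag_exists` is a hypothetical helper, and `v1.0.0` is taken from the sample pull log.

```shell
# tag_exists: hypothetical helper.
# Reads one tag per line on stdin; returns 0 only if a line equals $1 exactly
# (so "v1.0" does not match "v1.0.0").
tag_exists() {
  grep -F -x -q -- "$1"
}

# Usage:
#   az acr repository show-tags --name "$ACR_NAME" --repository "$APP_NAME" \
#     --output tsv | tag_exists "v1.0.0" || echo "tag not found"
```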

5) Ingress Configuration

az containerapp show --name "$APP_NAME" --resource-group "$RG" --query "properties.configuration.ingress" --output json
  • Confirm the external setting matches your access model and targetPort matches the port the app listens on (8000 in the Gunicorn log from step 3).
  • Failure patterns: ingress disabled, wrong targetPort, internal app tested from public internet.
  • If failed → go to Ingress Not Reachable.
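A wrong targetPort is easiest to confirm by comparing it with the port the app reports in its console log. The sketch below extracts that port from a Gunicorn-style `Listening at:` line like the one in step 3. `listening_port` is a hypothetical helper, and other servers log their bind address differently.

```shell
# listening_port: hypothetical helper.
# Reads console log text on stdin and prints the port from the last
# "Listening at: scheme://host:port" line. Prints nothing if no line matches.
listening_port() {
  sed -n 's/.*Listening at: [a-z]*:\/\/[^:]*:\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Usage:
#   target=$(az containerapp show --name "$APP_NAME" --resource-group "$RG" \
#     --query "properties.configuration.ingress.targetPort" --output tsv)
#   actual=$(az containerapp logs show --name "$APP_NAME" --resource-group "$RG" \
#     --type console --tail 50 | listening_port)
#   [ "$target" = "$actual" ] || echo "targetPort=$target but app listens on $actual"
```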

6) Health Probes

az containerapp show --name "$APP_NAME" --resource-group "$RG" --query "properties.template.containers[0].probes" --output json
  • Confirm liveness/readiness probe paths and ports are valid; startup probe timeout fits app boot time.
  • Failure patterns: probe path returns 404/500, startup timeout too short, wrong probe port.
  • If failed → go to Probe Failure and Slow Start.

Probe defaults can still fail

Apps with migrations, cold dependency checks, or large model loads often need a longer startup probe window.
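To judge whether the startup window fits the boot time, estimate how long the probe tolerates failures. A rough sketch, assuming Kubernetes-style semantics where the window is about initialDelaySeconds + periodSeconds × failureThreshold; `startup_window_seconds` is a hypothetical helper and the result is an approximation, not a platform guarantee.

```shell
# startup_window_seconds: hypothetical helper.
# $1 = initialDelaySeconds, $2 = periodSeconds, $3 = failureThreshold
# Prints the approximate number of seconds the app has to start before the
# startup probe gives up.
startup_window_seconds() {
  initial_delay=$1
  period=$2
  failure_threshold=$3
  echo $(( initial_delay + period * failure_threshold ))
}

# Usage: read the three values from the probe JSON returned by the command
# in this step, then compare the result with the observed boot time.
#   startup_window_seconds 5 10 30
```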

7) Registry Authentication

az containerapp show --name "$APP_NAME" --resource-group "$RG" --query "identity" --output json
az role assignment list --scope "$(az acr show --name "$ACR_NAME" --query id --output tsv)" --assignee "$(az containerapp show --name "$APP_NAME" --resource-group "$RG" --query identity.principalId --output tsv)" --output table
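  • Confirm the managed identity exists and has the AcrPull role at the registry scope. The role query above reads identity.principalId, which is populated only for a system-assigned identity.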
  • Failure patterns: no principal ID, missing AcrPull, ACR firewall blocks environment egress.
  • If failed → go to Managed Identity Auth Failure and Image Pull Failure.
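The first two failure patterns can be told apart with a small wrapper around the role list. A minimal sketch; `check_acr_role` is a hypothetical helper that expects role definition names on stdin, one per line.

```shell
# check_acr_role: hypothetical helper.
# $1    = principal ID of the app's managed identity (may be empty)
# stdin = role definition names assigned to that principal, one per line
# Prints "ok", "no principal ID", or "missing AcrPull"; returns 0 only for "ok".
check_acr_role() {
  principal_id=$1
  if [ -z "$principal_id" ]; then
    echo "no principal ID"
    return 1
  fi
  if grep -F -x -q 'AcrPull'; then
    echo "ok"
  else
    echo "missing AcrPull"
    return 1
  fi
}

# Usage:
#   pid=$(az containerapp show --name "$APP_NAME" --resource-group "$RG" \
#     --query identity.principalId --output tsv)
#   az role assignment list --assignee "$pid" \
#     --scope "$(az acr show --name "$ACR_NAME" --query id --output tsv)" \
#     --query "[].roleDefinitionName" --output tsv | check_acr_role "$pid"
```

Assignments made on a parent scope (resource group or subscription) should only appear if `--include-inherited` is added to the role query.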

8) Secrets and Config

az containerapp secret list --name "$APP_NAME" --resource-group "$RG"
az containerapp show --name "$APP_NAME" --resource-group "$RG" --query "properties.template.containers[0].env" --output json
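One thing worth checking with these two outputs is whether an environment variable references a secret that does not exist. The sketch below prints such references. `missing_secret_refs` and the file names are hypothetical; the usage assumes env entries carry a `secretRef` field when they point at a secret.

```shell
# missing_secret_refs: hypothetical helper.
# $1 = file with defined secret names (one per line)
# $2 = file with secretRef values used by the container's env (one per line)
# Prints every referenced name that has no matching defined secret.
missing_secret_refs() {
  awk 'FILENAME == ARGV[1] { defined[$0] = 1; next } !($0 in defined)' "$1" "$2"
}

# Usage:
#   az containerapp secret list --name "$APP_NAME" --resource-group "$RG" \
#     --query "[].name" --output tsv > secrets.txt
#   az containerapp show --name "$APP_NAME" --resource-group "$RG" \
#     --query "properties.template.containers[0].env[?secretRef].secretRef" \
#     --output tsv > refs.txt
#   missing_secret_refs secrets.txt refs.txt
```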

9) Environment and Network

az containerapp env show --name "$ENVIRONMENT_NAME" --resource-group "$RG" --output json
az network private-endpoint list --resource-group "$RG" --output table
  • Confirm environment is healthy and network dependencies (private DNS/private endpoints) are correctly configured.
  • Failure patterns: DNS resolution failures, blocked NSG outbound rules, missing private DNS link.
  • If failed → go to Internal DNS and Private Endpoint Failure.

10) Dependencies

az containerapp exec --name "$APP_NAME" --resource-group "$RG" --command "python -c 'import socket; print(socket.gethostbyname(\"example.database.windows.net\"))'"
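If the dependency is meant to be reached through a private endpoint, the address printed by the lookup should be private. A public address usually suggests the private DNS zone is not linked to the environment's network. The sketch below classifies an IPv4 address against the RFC 1918 ranges; `is_private_ipv4` is a hypothetical helper and does not validate that its input is a well-formed address.

```shell
# is_private_ipv4: hypothetical helper.
# Returns 0 if $1 falls in 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16,
# and 1 otherwise.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: pass the address printed by the lookup in this step.
#   is_private_ipv4 "10.0.1.4" && echo "private" || echo "public: check private DNS link"
```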

Escalate with Context

Observed healthy system lifecycle sequence for reference:

ContainerAppUpdate    → Updating containerApp: ca-myapp
RevisionCreation      → Creating new revision
PullingImage          → Pulling image '<acr-name>.azurecr.io/myapp:v1.0.0'
PulledImage           → Successfully pulled image in 2.42s (58720256 bytes)
ContainerCreated      → Created container 'ca-myapp'
ContainerStarted      → Started container 'ca-myapp'
ProbeFailed (Warning) → Probe of StartUp failed (multiple times during startup)
RevisionReady         → Revision ready
ContainerAppReady     → Running state reached

If the checklist does not isolate root cause, continue with Troubleshooting Methodology and include:

  • failing revision name
  • exact error text from system/console logs
  • ingress mode and target port
  • dependency endpoint(s) that failed
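To keep escalations consistent, the four items above can be collected into one block of text. A minimal sketch; `escalation_summary` and its argument order are hypothetical.

```shell
# escalation_summary: hypothetical helper.
# $1 = failing revision name
# $2 = exact error text from system/console logs
# $3 = ingress mode (for example external or internal)
# $4 = target port
# $5 = dependency endpoint(s) that failed
escalation_summary() {
  printf 'failing revision: %s\n' "$1"
  printf 'error text: %s\n' "$2"
  printf 'ingress: %s, target port %s\n' "$3" "$4"
  printf 'failed dependency endpoints: %s\n' "$5"
}

# Usage:
#   escalation_summary "ca-myapp--0000001" "ModuleNotFoundError: ..." \
#     "external" "8000" "example.database.windows.net"
```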
