Container App Job Execution Failure

1. Summary

Symptom

Job execution state is Failed or TimedOut.
Scheduled jobs skip expected run windows.
Console logs are empty or show startup/auth errors.

Why this scenario is confusing

Container Apps Jobs can fail before the workload does useful work, fail near the timeout boundary, or retry without ever fixing the real dependency problem. It is easy to blame the cron trigger or the job platform when the actual issue is trigger metadata, timeout policy, or missing configuration.

Troubleshooting decision flow

flowchart TD
    A[Job does not complete successfully] --> B{Execution created?}
    B -->|No| C[Fix trigger schedule or event configuration]
    B -->|Yes| D{Timed out before workload end?}
    D -->|Yes| E[Increase timeout and tune retry policy]
    D -->|No| F[Fix runtime errors, secrets, or identity access]
    C --> G[Re-run job and verify success]
    E --> G
    F --> G
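
The decision flow above can also be written as a small function, which is handy when triaging many executions at once. This is a minimal sketch: `ExecutionObservation` and `next_action` are hypothetical names introduced here, and the two boolean inputs are assumed to have been determined by hand from the execution list and logs.

```python
from dataclasses import dataclass


@dataclass
class ExecutionObservation:
    """What was observed for one expected run (hypothetical structure)."""
    execution_created: bool
    timed_out: bool = False


def next_action(obs: ExecutionObservation) -> str:
    """Map an observation to the branch of the decision flow it falls into."""
    if not obs.execution_created:
        return "Fix trigger schedule or event configuration"
    if obs.timed_out:
        return "Increase timeout and tune retry policy"
    return "Fix runtime errors, secrets, or identity access"


if __name__ == "__main__":
    print(next_action(ExecutionObservation(execution_created=True, timed_out=True)))
```

Every branch ends the same way as in the flowchart: re-run the job and verify success.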

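2. Common Misreadings

"Cron trigger is broken." More often the schedule is written for the wrong time zone (cron expressions for scheduled jobs are evaluated in UTC) or run constraints conflict with overlapping executions.
"Retries mean eventual success." Retries only help with transient faults. Against a persistently failing dependency, repeated retries amplify downstream failures.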

3. Competing Hypotheses

Hypothesis	Typical Evidence For	Typical Evidence Against
H1: Trigger configuration incorrect	No executions for expected schedule/event; manual execution succeeds with same image	Executions are created on schedule and fail only after startup
H2: Timeout too low	Execution stops near timeout boundary	Runtime completes within configured timeout
H3: Missing secret or environment dependency	Job logs show auth/config failures	All env and secret checks pass

4. What to Check First

Metrics

Job execution success ratio and retry count over time.
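
A success ratio can be approximated from the JSON form of the execution list (`--output json` instead of `--output table`). The sketch below assumes each entry carries its state under `properties.status` and that successful runs report `Succeeded`; verify both against your own output. `summarize_executions` is a hypothetical helper, and retry counts are not derived here.

```python
import json
from collections import Counter


def summarize_executions(executions):
    """Count executions per status and compute the share that succeeded.

    Assumes each item looks like {"name": ..., "properties": {"status": ...}}.
    """
    statuses = Counter(
        (item.get("properties") or {}).get("status", "Unknown")
        for item in executions
    )
    total = sum(statuses.values())
    ratio = statuses.get("Succeeded", 0) / total if total else 0.0
    return {"total": total, "by_status": dict(statuses), "success_ratio": ratio}


if __name__ == "__main__":
    # Sample data standing in for the parsed CLI output.
    sample = json.loads(
        '[{"name": "a", "properties": {"status": "Succeeded"}},'
        ' {"name": "b", "properties": {"status": "Failed"}}]'
    )
    print(summarize_executions(sample))
```

Tracking this summary per day makes a sudden drop in the ratio easier to line up with a deployment or configuration change.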

Logs

let AppName = "job-myapp";
ContainerAppSystemLogs_CL
| where ContainerAppName_s == AppName
| where Log_s has_any ("job", "execution", "timeout", "retry", "failed")
| project TimeGenerated, RevisionName_s, Log_s
| order by TimeGenerated desc

Platform Signals

az containerapp job execution list --name "$APP_NAME" --resource-group "$RG" --output table
az containerapp job show --name "$APP_NAME" --resource-group "$RG" --output json

5. Evidence to Collect

Required Evidence

Evidence	Command/Query	Purpose
Execution list	`az containerapp job execution list --name "$APP_NAME" --resource-group "$RG" --output table`	Confirm whether executions are being created and how they end
Job definition	`az containerapp job show --name "$APP_NAME" --resource-group "$RG" --output json`	Inspect trigger type, timeout, retry, and configuration
Execution details	`az containerapp job execution show --name "$APP_NAME" --resource-group "$RG" --job-execution-name "<execution-name>" --output json`	Capture detailed state for a failed or timed out run
System logs	`az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type system`	Check platform-side execution, timeout, and retry signals
Console logs	`az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type console`	Check workload-side startup, auth, or configuration failures
Job lifecycle KQL	KQL on `ContainerAppSystemLogs_CL`	Correlate execution events and failures over time

Useful Context

Trigger type (manual, schedule, event)
Expected runtime versus configured timeout
Retry policy and downstream side effects
Recent changes to secret references, identity permissions, or trigger metadata

Observed successful job lifecycle sequence (baseline):

SuccessfulCreate    → Successfully created pod for Job Execution
AssigningReplica    → Replica scheduled to run on a node
PullingImage        → Pulling image '<acr-name>.azurecr.io/myapp-job:v1.0.0'
PulledImage         → Successfully pulled image in 2.42s (58720256 bytes)
ContainerCreated    → Created container 'job-container'
ContainerStarted    → Started container 'job-container'
ContainerTerminated → Container terminated with exit code '0'
Completed           → Execution has successfully completed
PodDeletion         → Pod exited with status Succeeded
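
One way to use this baseline is to find the first stage that a failed execution never reached. The sketch below is an illustration only: `first_missing_stage` is a hypothetical helper, and the list of observed event reasons is assumed to have been extracted from the system logs by hand or by query.

```python
# Event reasons from the successful baseline above, in order.
BASELINE = [
    "SuccessfulCreate",
    "AssigningReplica",
    "PullingImage",
    "PulledImage",
    "ContainerCreated",
    "ContainerStarted",
    "ContainerTerminated",
    "Completed",
    "PodDeletion",
]


def first_missing_stage(observed_reasons):
    """Return the earliest baseline stage absent from the observed events.

    Returns None when every baseline stage was observed.
    """
    seen = set(observed_reasons)
    for stage in BASELINE:
        if stage not in seen:
            return stage
    return None


if __name__ == "__main__":
    print(first_missing_stage(["SuccessfulCreate", "AssigningReplica", "PullingImage"]))
```

Treat a gap as a pointer to where to look next, not as proof of the cause: an execution that stops before `PulledImage` suggests an image or registry problem, while one that reaches `ContainerStarted` points at the workload itself.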

6. Validation and Disproof by Hypothesis

H1: Trigger configuration incorrect

Signals that support:

No executions for the expected schedule or event.
Manual execution succeeds with the same image.
Trigger schedule or event metadata does not align with the expected run window.
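
A schedule that does not align with the expected run window is often a time zone problem, because cron expressions for scheduled jobs are evaluated in UTC. The sketch below converts an intended local wall-clock time into the minute and hour fields of a daily UTC cron expression. `daily_cron_in_utc` is a hypothetical helper; it assumes a fixed UTC offset and deliberately ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def daily_cron_in_utc(local_hour, local_minute, utc_offset_hours):
    """Build a daily cron expression in UTC for a local wall-clock time.

    utc_offset_hours is the fixed local offset from UTC (e.g. 9 or -5).
    The calendar date used below is arbitrary; only the time of day matters.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime(2024, 1, 15, local_hour, local_minute, tzinfo=local_tz)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


if __name__ == "__main__":
    # 02:30 local time at UTC+9 is 17:30 UTC on the previous day.
    print(daily_cron_in_utc(2, 30, 9))
```

Note that the conversion can cross midnight, so a schedule that also restricts the day of week or day of month needs those fields shifted as well.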

Signals that weaken:

Executions are created on schedule.
The same trigger metadata has been stable and successful.
Failures occur after the execution starts rather than before creation.

What to verify:

az containerapp job execution list --name "$APP_NAME" --resource-group "$RG" --output table
az containerapp job show --name "$APP_NAME" --resource-group "$RG" --output json

Disproof logic: If executions are being created at the expected times and the same workload still fails after startup, the trigger configuration is not the primary fault domain.
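
To apply this check mechanically, compare the run windows you expected against the start times reported by the execution list. The sketch below is one possible approach under stated assumptions: `missed_windows` is a hypothetical helper, the expected windows are supplied by you, and the five-minute tolerance is an arbitrary default rather than a platform guarantee.

```python
from datetime import datetime, timedelta


def missed_windows(expected, actual_starts, tolerance=timedelta(minutes=5)):
    """Return expected run times with no execution start within the tolerance."""
    missed = []
    for window in expected:
        if not any(abs(start - window) <= tolerance for start in actual_starts):
            missed.append(window)
    return missed


if __name__ == "__main__":
    expected = [datetime(2024, 1, 15, hour, 0) for hour in (0, 1, 2)]
    actual = [datetime(2024, 1, 15, 0, 0, 20), datetime(2024, 1, 15, 2, 0, 45)]
    for window in missed_windows(expected, actual):
        print("No execution near", window.isoformat())
```

An empty result means executions were created when expected, which weakens H1 and shifts attention to what happens after startup.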

H2: Timeout too low

Signals that support:

Execution stops near the timeout boundary.
Workload consistently needs longer than the configured timeout.
Retries restart the same long-running work without completion.

Signals that weaken:

Runtime completes well within the configured timeout.
Failures happen immediately at startup.
Logs show configuration or authentication errors before useful work begins.

What to verify:

az containerapp job execution show --name "$APP_NAME" --resource-group "$RG" --job-execution-name "<execution-name>" --output json
az containerapp job show --name "$APP_NAME" --resource-group "$RG" --output json

let AppName = "job-myapp";
ContainerAppSystemLogs_CL
| where ContainerAppName_s == AppName
| where Log_s has_any ("job", "execution", "timeout", "retry", "failed")
| project TimeGenerated, RevisionName_s, Log_s
| order by TimeGenerated desc

Disproof logic: If failed executions terminate far earlier than the timeout or fail for a clear runtime reason, increasing timeout will not resolve the incident.
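
The "near the timeout boundary" test can be made concrete by dividing the elapsed time of an execution by the configured timeout. This sketch assumes the execution JSON exposes ISO 8601 `startTime` and `endTime` values and that the job definition exposes the timeout in seconds (commonly `replicaTimeout`); check both field names against your own output. The helper names and the 0.95 threshold are illustrative choices.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timeout_usage(start_time, end_time, timeout_seconds):
    """Fraction of the configured timeout consumed by the execution."""
    elapsed = (parse_ts(end_time) - parse_ts(start_time)).total_seconds()
    return elapsed / timeout_seconds


def looks_like_timeout(start_time, end_time, timeout_seconds, threshold=0.95):
    """True when the execution ran for at least `threshold` of the timeout."""
    return timeout_usage(start_time, end_time, timeout_seconds) >= threshold


if __name__ == "__main__":
    print(looks_like_timeout("2024-01-15T02:00:00Z", "2024-01-15T02:29:50Z", 1800))
```

If retries are enabled, one execution may span several replica attempts, so its total duration can exceed a per-replica timeout; in that case compare each attempt separately where the logs allow it.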

H3: Missing secret or environment dependency

Signals that support:

Job logs show auth or configuration failures.
Console logs are empty, or the container fails immediately after startup.
Secret references, env values, or identity access changed recently.

Signals that weaken:

All env and secret checks pass.
Manual and scheduled executions succeed with the same configuration.
No auth or config failures appear in logs.

What to verify:

az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type system
az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type console
az containerapp job show --name "$APP_NAME" --resource-group "$RG" --output json

Disproof logic: If secrets, environment values, and identity permissions are correct and the workload still fails only near the timeout boundary, focus on timeout or trigger behavior instead.
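
The env and secret checks can be partly automated against the JSON job definition. The sketch below assumes secrets are listed under `properties.configuration.secrets` and container environment variables under `properties.template.containers[].env`, with `secretRef` naming a secret; confirm that layout in your own output. `find_config_gaps` and the list of required variable names are hypothetical. It does not verify secret values or identity permissions.

```python
def find_config_gaps(job, required_env):
    """List required env vars that are undefined and secretRefs with no secret."""
    props = job.get("properties") or {}
    secret_names = {
        s["name"] for s in (props.get("configuration") or {}).get("secrets") or []
    }
    gaps = []
    for container in (props.get("template") or {}).get("containers") or []:
        env = container.get("env") or []
        defined = {entry["name"] for entry in env}
        # Required variables the container template never defines.
        for name in sorted(set(required_env) - defined):
            gaps.append(f"{container['name']}: env '{name}' is not defined")
        # Variables that point at a secret the job does not declare.
        for entry in env:
            ref = entry.get("secretRef")
            if ref and ref not in secret_names:
                gaps.append(
                    f"{container['name']}: env '{entry['name']}' "
                    f"references unknown secret '{ref}'"
                )
    return gaps


if __name__ == "__main__":
    job = {"properties": {"configuration": {"secrets": []},
                          "template": {"containers": [{"name": "job-container", "env": []}]}}}
    print(find_config_gaps(job, ["DB_CONN"]))
```

An empty list only shows that the definition is internally consistent; a wrong secret value or a missing role assignment still has to be ruled out from the console logs.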

7. Likely Root Cause Patterns

Pattern	Frequency	First Signal	Typical Resolution
Trigger metadata drift	Common	No execution created when expected	Fix schedule, event configuration, or run constraints
Timeout set below real runtime	Common	Execution ends near timeout boundary	Increase timeout and tune retry policy
Missing secret or env reference	Common	Console logs show auth/config failure	Correct secret references and configuration
Identity permission issue	Occasional	Startup/auth failure in job logs	Fix managed identity access
Retry storm against broken dependency	Occasional	Repeated retries without useful work	Repair dependency and make retries safer
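
The cost of a retry storm is easy to underestimate, so a rough worst-case count helps when deciding on a retry limit. This is a back-of-the-envelope sketch: `worst_case_attempts` is a hypothetical helper, and it assumes the retry limit counts retries after the first attempt and that every attempt calls the broken dependency.

```python
def worst_case_attempts(executions_per_day, retry_limit, parallelism=1):
    """Upper bound on daily attempts when every attempt fails.

    Each replica makes one initial attempt plus up to `retry_limit` retries.
    """
    attempts_per_replica = 1 + retry_limit
    return executions_per_day * parallelism * attempts_per_replica


if __name__ == "__main__":
    # A job scheduled every 5 minutes runs 288 times a day.
    print(worst_case_attempts(288, 3))
```

With these assumptions, a job scheduled every five minutes with a retry limit of 3 can hit a failing dependency up to 1152 times a day instead of 288.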
