Jobs Operations

This guide covers day-2 operations for Container Apps Jobs: listing executions, drilling into failures, replaying work, and tracking job health over time.

Prerequisites

  • Azure Container Apps environment and Job already deployed
  • Azure CLI and the Container Apps extension installed on the operator workstation
  • Log Analytics workspace connected to the environment for longer-term history

Set the shell variables used by the commands in this guide:

export RG="rg-aca-prod"
export JOB_NAME="job-orders-reconcile"
export EXECUTION_NAME="job-orders-reconcile-abc123"
export WORKSPACE_ID="<log-analytics-workspace-id>"
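
Every command in this guide depends on the variables above, so a small guard can fail fast when one is unset. The following is a minimal sketch; the function name require_vars is hypothetical and is not part of the Azure CLI.

```shell
# Hypothetical helper (not an Azure CLI feature): report every unset or empty variable.
require_vars() {
  _missing=0
  for _name in "$@"; do
    # Indirect lookup of the variable named by $_name (POSIX-compatible).
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "missing required variable: $_name" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Usage:
#   require_vars RG JOB_NAME EXECUTION_NAME WORKSPACE_ID || exit 1
```

The helper checks all names before returning, so one run lists every missing variable instead of stopping at the first.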

When to Use

Use this runbook when you need to:

  • inspect the most recent Job executions
  • stop a bad execution
  • replay a failed run
  • determine whether success rate, duration, or retry behavior is drifting

Procedure

1. List recent executions

az containerapp job execution list \
  --name "$JOB_NAME" \
  --resource-group "$RG" \
  --output table
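
When the table is long, a per-status count gives a quicker picture than reading rows. The sketch below assumes each execution object exposes name and properties.status; confirm that shape with --output json on your extension version. The helper name count_by_status is hypothetical.

```shell
# Hypothetical helper: read "name<TAB>status" lines on stdin and print a count per status.
count_by_status() {
  awk -F '\t' 'NF >= 2 { n[$2]++ } END { for (s in n) printf "%s\t%d\n", s, n[s] }' | sort
}

# Usage (requires the Azure CLI; the queried field names are an assumption to verify):
#   az containerapp job execution list --name "$JOB_NAME" --resource-group "$RG" \
#     --query "[].[name, properties.status]" --output tsv | count_by_status
```

Keeping the counting step separate from the az call means it can be exercised with sample input before it is pointed at production.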

2. Inspect a specific execution

az containerapp job execution show \
  --name "$JOB_NAME" \
  --resource-group "$RG" \
  --job-execution-name "$EXECUTION_NAME" \
  --output json

3. Stop an in-flight execution when needed

az containerapp job stop \
  --name "$JOB_NAME" \
  --resource-group "$RG" \
  --job-execution-name "$EXECUTION_NAME"

Confirm CLI command availability against your installed extension

The list, show, and stop commands above reflect the command names expected in current versions of the Container Apps extension. Verify them against the extension version you run in production, for example with az containerapp job --help, before codifying them in automation.

4. Replay a failed execution manually

A replay does not resume the failed execution; it starts a new execution from the same job definition.

az containerapp job start \
  --name "$JOB_NAME" \
  --resource-group "$RG"

Before replaying, confirm whether you also need to:

  • requeue or unlock an input item
  • clean up partial output from the failed run
  • reduce parallelism or retries for the replay window
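
The checklist above can be enforced with a thin wrapper around the start command. This is a sketch only: replay_job, its "yes" confirmation argument, and the DRY_RUN variable are hypothetical conventions, not Azure CLI features.

```shell
# Hypothetical wrapper: refuses to replay until the operator confirms the pre-checks.
# Set DRY_RUN=1 to print the command instead of running it.
replay_job() {
  _job="$1"; _rg="$2"; _confirmed="${3:-no}"
  if [ "$_confirmed" != "yes" ]; then
    echo "refusing replay: confirm input requeue, output cleanup, and retry settings first" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "az containerapp job start --name $_job --resource-group $_rg"
    return 0
  fi
  az containerapp job start --name "$_job" --resource-group "$_rg"
}

# Usage:
#   replay_job "$JOB_NAME" "$RG" yes
```

Requiring an explicit confirmation makes an accidental replay from shell history less likely.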

5. Query logs for a job or execution

When your workspace schema is known, filter directly on the job and execution columns. If the schema differs across workspaces, use a defensive query that tolerates different column names.

let TargetJob = "job-orders-reconcile";
let TargetExecution = "job-orders-reconcile-abc123";
ContainerAppSystemLogs_CL
| extend JobName = tostring(column_ifexists("JobName_s", column_ifexists("ContainerAppName_s", "")))
| extend ExecutionName = tostring(column_ifexists("ExecutionName_s", column_ifexists("ExecutionId_g", "")))
| where JobName == TargetJob
| where isempty(TargetExecution) or ExecutionName == TargetExecution
| project TimeGenerated, JobName, ExecutionName, Reason=tostring(column_ifexists("Reason_s", "")), Log=tostring(column_ifexists("Log_s", ""))
| order by TimeGenerated desc

Exact Log Analytics column names vary by workspace schema

Existing repository KQL examples use JobName_s and ExecutionName_s in ContainerAppSystemLogs_CL. Re-check the actual columns in your workspace before you build dashboards or alerts around a fixed schema.

6. Track success rate, duration, and retry activity

Success and failure trend:

ContainerAppSystemLogs_CL
| extend JobName = tostring(column_ifexists("JobName_s", column_ifexists("ContainerAppName_s", "")))
| extend Reason = tostring(column_ifexists("Reason_s", ""))
| where JobName == "job-orders-reconcile"
| where Reason in ("Completed", "Failed")
| summarize Executions=count() by Reason, bin(TimeGenerated, 1h)
| order by TimeGenerated asc

Retry activity trend:

ContainerAppSystemLogs_CL
| extend JobName = tostring(column_ifexists("JobName_s", column_ifexists("ContainerAppName_s", "")))
| extend Reason = tostring(column_ifexists("Reason_s", ""))
| where JobName == "job-orders-reconcile"
| where Reason has "Retry"
| summarize RetryEvents=count() by bin(TimeGenerated, 1h)
| order by TimeGenerated asc

Duration example from structured application logs:

ContainerAppConsoleLogs_CL
| extend Payload = parse_json(Log_s)
| extend ExecutionName = tostring(Payload.execution_name)
| extend DurationMs = todouble(Payload.duration_ms)
| where tostring(Payload.message) == "Job execution completed"
| summarize P50Ms=percentile(DurationMs, 50), P95Ms=percentile(DurationMs, 95), MaxMs=max(DurationMs) by bin(TimeGenerated, 1h)
| order by TimeGenerated asc
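
The duration query above only works if the job writes one JSON log line per execution with message, execution_name, and duration_ms fields. A minimal sketch of such a line from a shell entrypoint follows. The helper name log_completion is hypothetical, and the CONTAINER_APP_JOB_EXECUTION_NAME environment variable is an assumption that you should confirm is set in your job's containers.

```shell
# Hypothetical entrypoint helper: emit one JSON line with the fields the query reads.
# Assumes the values contain no characters that need JSON escaping.
# Arguments: DURATION_MS [EXECUTION_NAME]
log_completion() {
  _duration_ms="$1"
  # Fall back to the platform-provided variable (assumed), then to "unknown".
  _execution="${2:-${CONTAINER_APP_JOB_EXECUTION_NAME:-unknown}}"
  printf '{"message":"Job execution completed","execution_name":"%s","duration_ms":%d}\n' \
    "$_execution" "$_duration_ms"
}

# Usage:
#   log_completion 1532
```

Writing the line to standard output lets the platform collect it as a console log without any extra agent.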

Verification

Use the control loop below after any replay or stop action.

flowchart TD
    A[List executions] --> B[Inspect failed or long-running execution]
    B --> C{Stop or replay needed?}
    C -->|Stop| D[Stop execution]
    C -->|Replay| E[Start new execution]
    D --> F[Query logs and metrics]
    E --> F
    F --> G[Confirm success rate and duration return to baseline]

Basic verification command:

az containerapp job execution list \
  --name "$JOB_NAME" \
  --resource-group "$RG" \
  --output table
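
To avoid re-running the list command by hand after a replay, the check can be polled until the execution reaches a terminal state. The sketch below assumes the execution exposes properties.status and that Running and Processing are the non-terminal values; verify both against your extension's JSON output. The function names and the STATUS_CMD override are hypothetical.

```shell
# Default status lookup through the Azure CLI (the queried field name is an assumption).
get_execution_status() {
  az containerapp job execution show \
    --name "$JOB_NAME" --resource-group "$RG" \
    --job-execution-name "$1" \
    --query "properties.status" --output tsv
}

# Hypothetical poller. Arguments: EXECUTION_NAME [MAX_POLLS] [INTERVAL_SECONDS]
# Prints the final status. Returns 0 on Succeeded, 1 on any other terminal or
# empty status, and 2 when the poll limit is reached.
wait_for_execution() {
  _exec="$1"; _max_polls="${2:-30}"; _interval="${3:-10}"
  _i=0
  while [ "$_i" -lt "$_max_polls" ]; do
    # STATUS_CMD may name a replacement lookup function, for example in tests.
    _status=$("${STATUS_CMD:-get_execution_status}" "$_exec")
    case "$_status" in
      Running|Processing) sleep "$_interval" ;;
      *) echo "$_status"
         if [ "$_status" = "Succeeded" ]; then return 0; else return 1; fi ;;
    esac
    _i=$((_i + 1))
  done
  echo "Timeout"
  return 2
}

# Usage:
#   wait_for_execution "$EXECUTION_NAME" 30 10
```

The distinct return codes let calling automation tell a failed execution apart from one that is still running when the poll limit is hit.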

Rollback / Troubleshooting

  • If a replay starts reprocessing bad input, stop it and quarantine the input item.
  • If failures are data-dependent, reduce retries and use the dead-letter path instead of repeated replay.
  • If logs are insufficient, update the job image to emit explicit execution correlation fields before the next incident.

Use Jobs Troubleshooting for symptom-based triage.
