Container Apps Jobs¶

Azure Container Apps Jobs run finite background work instead of continuously serving traffic. Use jobs when execution has a clear start and finish, such as scheduled cleanup, data import, queue processing, and file transformation.

What are Jobs?¶

A Container App is optimized for long-running API or worker services. A Container Apps Job is optimized for one execution unit that can complete, fail, and retry according to policy.

Use jobs when:

Work is bounded and can be retried safely.
You need manual, scheduled, or event-driven triggers.
Scale behavior is tied to execution count, not HTTP traffic.

Use apps when:

You need always-on request handling.
You expose an ingress endpoint to clients.
You maintain long-lived process state in memory.

Jobs must be idempotent

Retries and parallel executions can reprocess the same work item. Design input handling so duplicate execution does not corrupt state.

Execution Models¶

Manual trigger¶

Manual jobs are ideal for one-off tasks: backfills, schema migrations, reprocessing failed records, or operator-driven maintenance.

az containerapp job start \
  --name "$JOB_NAME" \
  --resource-group "$RG"

Scheduled¶

Scheduled jobs run by cron expression and are useful for recurring operations such as nightly compaction, report generation, and stale artifact cleanup.

az containerapp job create \
  --name "$JOB_NAME" \
  --resource-group "$RG" \
  --environment "$ENVIRONMENT_NAME" \
  --trigger-type "Schedule" \
  --cron-expression "0 */6 * * *" \
  --image "$ACR_NAME.azurecr.io/python-job:v1"

Event-driven¶

Event-driven jobs react to external signals such as queue depth or blob events. This model is suited for asynchronous throughput pipelines where scale and cost efficiency matter.

az containerapp job create \
  --name "$JOB_NAME" \
  --resource-group "$RG" \
  --environment "$ENVIRONMENT_NAME" \
  --trigger-type "Event" \
  --scale-rule-name "queue-processor" \
  --scale-rule-type "azure-servicebus" \
  --scale-rule-metadata "queueName=jobs" "messageCount=10" "namespace=<servicebus-namespace>.servicebus.windows.net" \
  --image "$ACR_NAME.azurecr.io/python-job:v1"

App vs Job Decision Matrix¶

Decision area	Container App	Container Apps Job
Primary workload	API/service traffic	Batch or async task execution
Lifetime model	Long-running process	Finite run with completion
Trigger model	HTTP/event-driven scaling	Manual, cron, event trigger
Ingress requirement	Common	Usually none
Retry behavior	App-level logic	Job execution retry policy
Cost profile	Baseline runtime + scale	Pay during execution windows
Best examples	Public API, internal service	ETL, cleanup, periodic reporting

Configuration Patterns¶

Timeout and retry settings¶

Set --replica-timeout based on worst-case execution plus headroom.
Use --replica-retry-limit for transient failures only.
Ensure your code is idempotent before increasing retry counts.

Parallelism and replica completion¶

--parallelism controls concurrent replicas for one execution.
--replica-completion-count controls how many successful replicas mark completion.
Start with conservative values, then scale after verifying external dependency limits.

Trigger configuration¶

Manual: prefer for operator-controlled execution.
Scheduled: use UTC cron and document business timezone assumptions.
Event-driven: verify scale rule metadata and identity permissions to event source.

Identity and Secrets¶

Jobs use the same identity patterns as apps:

Prefer user-assigned or system-assigned managed identity.
Assign least-privilege RBAC to data stores and messaging services.
Store configuration in environment variables and sensitive values in secrets.
Use Key Vault references for centralized secret lifecycle management.

Monitoring Job Executions¶

Track executions and troubleshoot failures with Azure CLI:

az containerapp job execution list \
  --name "$JOB_NAME" \
  --resource-group "$RG" \
  --output table

az containerapp job logs show \
  --name "$JOB_NAME" \
  --resource-group "$RG"

Monitor these signals:

Execution success/failure ratio
Retry count trend
Duration distribution (p50/p95)
Dependency-specific errors (authentication, throttling, timeout)

Set timeouts from measured runtime

Start from observed p95 execution duration and add headroom. Avoid unlimited timeout behavior that hides stuck executions.

Common Patterns¶

Data processing pipeline¶

Ingest file or queue item, transform it, then write canonical output. Keep each step idempotent so retries are safe.

Scheduled cleanup¶

Run daily cleanup to remove expired blobs, temporary records, or stale artifacts. Add dry-run mode for safe validation.

Event-driven media processing¶

Trigger on new uploads, transcode or enrich content, and emit completion metadata for downstream consumers.

Job Lifecycle¶

flowchart LR
    A[Trigger] --> B[Execution Created]
    B --> C[Replica Starts]
    C --> D{Result}
    D -->|Success| E[Completed]
    D -->|Failure| F{Retry Limit Reached?}
    F -->|No| C
    F -->|Yes| G[Failed]