Skip to content

Operations

This section covers production operations for Azure Container Apps. It is language-agnostic and focuses on platform behavior, reliability, and cost control in running systems.

Variable naming in this section

Operations guides use production-style variable names (e.g., RG="rg-aca-prod") to reflect real operational contexts. Tutorial guides use demo-style names (e.g., RG="rg-aca-python-demo"). Substitute your own resource names as appropriate.

Prerequisites

  • An existing Container Apps environment and app
  • Azure CLI with Container Apps extension installed
  • Permissions to view and update Container App resources
export RG="rg-aca-prod"
export APP_NAME="app-python-api-prod"
export ENVIRONMENT_NAME="aca-env-prod"

az extension add --name containerapp --upgrade
az account show --output table

Main Content

Operations Documents

Document Description
Deployment CI/CD patterns, image build, registry authentication, production rollouts
Networking VNet deployment, private endpoints, egress controls
Revision Management Revision lifecycle, traffic splitting, rollback procedures
Monitoring Log Analytics, metrics, distributed tracing, alerting
Scaling KEDA scale rules, manual scaling, concurrency limits
Alerts SLO-driven alerts for availability, latency, and resource usage
Image Pull and Registry Private registry authentication, managed identity pull
Secret Rotation Credential rotation without downtime
Recovery Failed revision handling, replica restarts, regional failover

Quick Operational Commands

az containerapp show --resource-group $RG --name $APP_NAME --output json
az containerapp restart --resource-group $RG --name $APP_NAME
az containerapp revision list --resource-group $RG --name $APP_NAME --output table
az containerapp logs show --resource-group $RG --name $APP_NAME --type system --follow

Verification Steps

Validate that the operations baseline is healthy before changing configuration.

az containerapp show \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --query "{name:name,environmentId:properties.managedEnvironmentId,provisioningState:properties.provisioningState,runningStatus:properties.runningStatus}" \
  --output json

Example output (PII masked):

{
  "name": "app-python-api-prod",
  "environmentId": "/subscriptions/<subscription-id>/resourceGroups/rg-aca-prod/providers/Microsoft.App/managedEnvironments/aca-env-prod",
  "provisioningState": "Succeeded",
  "runningStatus": "Running"
}

Operations Control Loop

Use a repeatable control loop so every operational change is observable, reversible, and documented.

flowchart LR
    A[Baseline Health Check] --> B[Apply Change]
    B --> C[Validate Revision and Replicas]
    C --> D[Observe Metrics and Logs]
    D --> E{SLO Healthy?}
    E -->|Yes| F[Record Outcome and Close]
    E -->|No| G[Rollback and Escalate]
    G --> A

Operational Cadence Matrix

Cadence Primary Goal Required Commands Exit Criteria
Per deployment Prevent bad revision promotion az containerapp revision list, az containerapp ingress traffic show New revision healthy with expected traffic
Daily Catch platform drift early az containerapp show, az containerapp env show Running status is healthy and config matches baseline
Weekly Validate scale and alert posture az monitor metrics list, az monitor scheduled-query list Alerts are enabled and thresholds are current
Monthly Recovery readiness rollback simulation + runbook review Recovery target time met in drill

Use pre-change and post-change snapshots

Capture key fields before and after updates (ingress, scale rules, identity, revision mode). A small JSON diff dramatically reduces incident triage time.

Treat operations changes as production releases

Any update to scale rules, traffic weights, ingress, secrets, or identity can change customer impact immediately. Always run health verification and rollback checks after each change.

Baseline Snapshot Commands

az containerapp show \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --query "{name:name,latestRevision:properties.latestRevisionName,provisioningState:properties.provisioningState,runningStatus:properties.runningStatus}" \
  --output json

az containerapp ingress traffic show \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --output json

Operational Ownership Checklist

Capability Primary Owner Backup Owner Evidence
Deployment and rollback Application team Platform team Successful revision promotion logs
Alert tuning SRE/operations Application team Alert history and threshold review notes
Secret rotation Security/app owner Operations Rotation runbook and audit log entries
Recovery drills Operations lead Incident commander Quarterly drill report

Keep runbooks co-located with services

Store the operational runbook path in each service repository so on-call engineers can find the latest rollback and recovery guidance without context switching.

Operational readiness minimum:

  • Every production app has a tested rollback command set.
  • Every sev1 alert maps to a named incident owner.

Advanced Topics

  • Build an SLO-based operating model mapping each control to measurable service outcomes.
  • Keep runbooks and IaC synchronized so recovery steps are deterministic during incidents.
  • Validate production controls regularly through game days and restore exercises.

Language-Specific Details

For language-specific operational guidance, see: - Python Guide

See Also

Sources