Operations¶
This section covers production operations for Azure Container Apps. It is language-agnostic and focuses on platform behavior, reliability, and cost control in running systems.
Variable naming in this section
Operations guides use production-style variable names (e.g., RG="rg-aca-prod") to reflect real operational contexts. Tutorial guides use demo-style names (e.g., RG="rg-aca-python-demo"). Substitute your own resource names as appropriate.
Prerequisites¶
- An existing Container Apps environment and app
- Azure CLI with the Container Apps extension installed
- Permissions to view and update Container App resources
export RG="rg-aca-prod"
export APP_NAME="app-python-api-prod"
export ENVIRONMENT_NAME="aca-env-prod"
az extension add --name containerapp --upgrade
az account show --output table
Main Content¶
Operations Documents¶
| Document | Description |
|---|---|
| Deployment | CI/CD patterns, image build, registry authentication, production rollouts |
| Networking | VNet deployment, private endpoints, egress controls |
| Revision Management | Revision lifecycle, traffic splitting, rollback procedures |
| Monitoring | Log Analytics, metrics, distributed tracing, alerting |
| Scaling | KEDA scale rules, manual scaling, concurrency limits |
| Alerts | SLO-driven alerts for availability, latency, and resource usage |
| Image Pull and Registry | Private registry authentication, managed identity pull |
| Secret Rotation | Credential rotation without downtime |
| Recovery | Failed revision handling, replica restarts, regional failover |
Quick Operational Commands¶
az containerapp show --resource-group "$RG" --name "$APP_NAME" --output json
az containerapp revision restart --resource-group "$RG" --name "$APP_NAME" --revision "<revision-name>"
az containerapp revision list --resource-group "$RG" --name "$APP_NAME" --output table
az containerapp logs show --resource-group "$RG" --name "$APP_NAME" --type system --follow
Verification Steps¶
Validate that the operations baseline is healthy before changing configuration.
az containerapp show \
--name "$APP_NAME" \
--resource-group "$RG" \
--query "{name:name,environmentId:properties.managedEnvironmentId,provisioningState:properties.provisioningState,runningStatus:properties.runningStatus}" \
--output json
Example output (PII masked):
{
"name": "app-python-api-prod",
"environmentId": "/subscriptions/<subscription-id>/resourceGroups/rg-aca-prod/providers/Microsoft.App/managedEnvironments/aca-env-prod",
"provisioningState": "Succeeded",
"runningStatus": "Running"
}
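The same check can gate automation. The following is a minimal sketch, assuming the healthy values are the ones shown in the example output (`Succeeded` and `Running`). `check_app_health` is a hypothetical helper name, not an Azure CLI command.

```bash
# check_app_health: hypothetical helper, not part of the Azure CLI.
# Returns 0 only when the app reports the two values shown in the
# example output above; returns 1 otherwise, or 2 if a CLI call fails.
check_app_health() {
  local app="$1" rg="$2" provisioning running

  # One scalar per query keeps the tsv output trivial to parse.
  provisioning="$(az containerapp show \
    --name "$app" --resource-group "$rg" \
    --query "properties.provisioningState" --output tsv)" || return 2
  running="$(az containerapp show \
    --name "$app" --resource-group "$rg" \
    --query "properties.runningStatus" --output tsv)" || return 2

  echo "provisioningState=$provisioning runningStatus=$running"
  [ "$provisioning" = "Succeeded" ] && [ "$running" = "Running" ]
}
```

A possible call before a change window: `check_app_health "$APP_NAME" "$RG" || echo "baseline unhealthy, do not apply changes"`.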
Operations Control Loop¶
Use a repeatable control loop so every operational change is observable, reversible, and documented.
flowchart LR
A[Baseline Health Check] --> B[Apply Change]
B --> C[Validate Revision and Replicas]
C --> D[Observe Metrics and Logs]
D --> E{SLO Healthy?}
E -->|Yes| F[Record Outcome and Close]
E -->|No| G[Rollback and Escalate]
G --> A
Operational Cadence Matrix¶
| Cadence | Primary Goal | Required Commands | Exit Criteria |
|---|---|---|---|
| Per deployment | Prevent bad revision promotion | az containerapp revision list, az containerapp ingress traffic show | New revision healthy with expected traffic |
| Daily | Catch platform drift early | az containerapp show, az containerapp env show | Running status is healthy and config matches baseline |
| Weekly | Validate scale and alert posture | az monitor metrics list, az monitor scheduled-query list | Alerts are enabled and thresholds are current |
| Monthly | Recovery readiness | rollback simulation + runbook review | Recovery target time met in drill |
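The per-deployment exit criterion in the matrix above can be checked in a script. This is a minimal sketch, assuming a good revision reports `properties.healthState` as `Healthy` and that `RG` and `APP_NAME` are exported as in the prerequisites. `check_latest_revision` is a hypothetical helper name.

```bash
# check_latest_revision: hypothetical helper, not part of the Azure CLI.
# Assumes RG and APP_NAME are exported and that "Healthy" is the value
# a good revision reports in properties.healthState.
check_latest_revision() {
  local latest health

  # Resolve the name of the most recently created revision.
  latest="$(az containerapp show \
    --name "$APP_NAME" --resource-group "$RG" \
    --query "properties.latestRevisionName" --output tsv)" || return 2

  # Read the health state of that revision.
  health="$(az containerapp revision show \
    --name "$APP_NAME" --resource-group "$RG" --revision "$latest" \
    --query "properties.healthState" --output tsv)" || return 2

  echo "revision=$latest healthState=$health"
  [ "$health" = "Healthy" ]
}
```

The function only covers revision health. Traffic weights still need a separate look with `az containerapp ingress traffic show`.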
Use pre-change and post-change snapshots
Capture key fields before and after updates (ingress, scale rules, identity, revision mode). A small JSON diff shows exactly which fields changed, which shortens incident triage.
Treat operations changes as production releases
Any update to scale rules, traffic weights, ingress, secrets, or identity can affect customers immediately. Always run health verification and rollback checks after each change.
Baseline Snapshot Commands¶
az containerapp show \
--name "$APP_NAME" \
--resource-group "$RG" \
--query "{name:name,latestRevision:properties.latestRevisionName,provisioningState:properties.provisioningState,runningStatus:properties.runningStatus}" \
--output json
az containerapp ingress traffic show \
--name "$APP_NAME" \
--resource-group "$RG" \
--output json
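The snapshot commands above can be wrapped so that a pre-change and a post-change file are written and compared. This is a minimal sketch. `snapshot_app`, `diff_snapshots`, and the `./snapshots` directory are hypothetical names, and the queried fields are one reasonable choice for ingress, scale rules, identity, and revision mode.

```bash
# snapshot_app and diff_snapshots: hypothetical helpers, not Azure CLI commands.
# Assumes RG and APP_NAME are exported as in the prerequisites.
snapshot_app() {
  local label="$1" out_dir="${2:-./snapshots}"
  mkdir -p "$out_dir" || return 1

  # Capture only the fields that most often explain a behavior change.
  az containerapp show \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --query "{ingress:properties.configuration.ingress,scale:properties.template.scale,identity:identity,revisionMode:properties.configuration.activeRevisionsMode}" \
    --output json > "$out_dir/$label.json"
}

diff_snapshots() {
  local out_dir="${1:-./snapshots}"
  # diff exits 1 when the files differ, which is expected after a change.
  diff -u "$out_dir/pre.json" "$out_dir/post.json"
}
```

A typical sequence would be `snapshot_app pre`, apply the change, `snapshot_app post`, then `diff_snapshots` and attach the output to the change record.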
Operational Ownership Checklist¶
| Capability | Primary Owner | Backup Owner | Evidence |
|---|---|---|---|
| Deployment and rollback | Application team | Platform team | Successful revision promotion logs |
| Alert tuning | SRE/operations | Application team | Alert history and threshold review notes |
| Secret rotation | Security/app owner | Operations | Rotation runbook and audit log entries |
| Recovery drills | Operations lead | Incident commander | Quarterly drill report |
Keep runbooks co-located with services
Store the operational runbook path in each service repository so on-call engineers can find the latest rollback and recovery guidance without context switching.
Operational readiness minimum:
- Every production app has a tested rollback command set.
- Every sev1 alert maps to a named incident owner.
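One possible shape for a rollback command set is sketched below. It assumes the app runs in multiple-revision mode, that the known-good revision is still active, and that `RG` and `APP_NAME` are exported. `rollback_traffic` is a hypothetical helper name.

```bash
# rollback_traffic: hypothetical helper, not part of the Azure CLI.
# Assumes multiple-revision mode and an active known-good revision.
rollback_traffic() {
  local good_revision="$1"
  if [ -z "$good_revision" ]; then
    echo "usage: rollback_traffic <known-good-revision-name>" >&2
    return 64
  fi

  # Send all ingress traffic to the known-good revision.
  az containerapp ingress traffic set \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --revision-weight "${good_revision}=100" || return 1

  # Show the resulting weights so the operator can confirm the shift.
  az containerapp ingress traffic show \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --output table
}
```

Running this against a non-production app during a drill is one way to keep the rollback path tested.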
Advanced Topics¶
- Build an SLO-based operating model that maps each control to measurable service outcomes.
- Keep runbooks and IaC synchronized so recovery steps are deterministic during incidents.
- Validate production controls regularly through game days and restore exercises.
Language-Specific Details¶
For language-specific operational guidance, see:
- Python Guide