Disaster Recovery¶
Azure Container Apps disaster recovery is an architecture decision rather than a single platform toggle: you must decide whether zonal resilience, multi-region deployment, or both are required for the workload.
Prerequisites¶
- A documented RTO and RPO for the workload
- Infrastructure as Code for recreating environments in multiple regions
- Replication plans for stateful dependencies such as databases, secrets, and images
export PRIMARY_LOCATION="eastus"
export SECONDARY_LOCATION="centralus"
export FRONT_DOOR_NAME="afd-aca-prod"
When to Use¶
- When a single-region outage exceeds workload tolerance
- When you need a documented active-active or active-passive pattern
- When you need a clear distinction between zone redundancy and region failover
Procedure¶
- Decide whether the failure domain to address is zone, region, or both.
- Keep the application stateless where possible.
- Replicate backend dependencies before assuming traffic failover is safe.
- Put a global routing layer in front of regional environments if region failover is required.
Common production pattern:
- one Container Apps environment per region
- one app deployment per region
- Azure Front Door or Traffic Manager in front
- replicated backend state such as databases, Key Vault strategy, and ACR replication
Microsoft Learn explicitly documents that "Container Apps is a single-region service. If the region becomes unavailable, your environment and apps are also unavailable." For multi-region resiliency, Microsoft Learn directs you to deploy environments across multiple regions and "configure load balancing and failover policies by using a service like Azure Front Door or Azure Traffic Manager."
flowchart TD
A[Client] --> B[Azure Front Door]
B --> C[Primary region ACA environment]
B --> D[Secondary region ACA environment]
C --> E[Replicated backend services]
D --> E Verification¶
- Confirm regional deployments are functionally equivalent.
- Confirm the global routing layer can stop sending traffic to an unhealthy region.
- Confirm backend state replication meets the workload RPO.
Rollback / Troubleshooting¶
- If failover is manual, document exact operator actions and DNS impact.
- If a failed region remains in rotation, inspect global health probe and origin status.
- If backend state diverges, delay failback until data is reconciled.