
Multi-Region Failover Lab

Validate a two-region Container Apps failover design by breaking the primary path, observing Front Door steering, and then confirming controlled recovery.

Lab Metadata

Field       Value
Difficulty  Advanced
Duration    45-60 min
Tier        Inline guide only
Category    Platform Features

flowchart TD
    A[Deploy app in region A and region B] --> B[Front Door probes both origins]
    B --> C[Send baseline traffic]
    C --> D[Break primary region health path]
    D --> E[Front Door marks primary unhealthy]
    E --> F[Traffic shifts to secondary]
    F --> G[Restore primary]
    G --> H[Validate failback behavior]

1. Question

When the primary region's health path is broken, does Front Door mark the primary origin unhealthy and shift traffic to the secondary region, and does restoring the primary fully restore service?

2. Setup

3. Hypothesis

4. Prediction

If the primary region's health path is broken, the failure symptom will appear on the primary while the secondary keeps serving traffic. Correcting the primary's configuration will resolve the failure within one revision deployment cycle.

5. Experiment

6. Execution

Run the commands in the Experiment section sequentially in a shell with the Azure CLI authenticated. Capture all terminal output for the Observation section.
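
One way to capture every command together with its output is a small logging wrapper. The sketch below is illustrative only: `run_logged` and the `failover-lab.log` file name are hypothetical and not part of the lab, and the wrapper relies only on `date`, `printf`, and `tee`.

```shell
# Hypothetical log file name; override by exporting LOG_FILE before sourcing.
LOG_FILE="${LOG_FILE:-failover-lab.log}"

run_logged() {
  # Record the UTC time and the exact command line before running it.
  printf '[%s] $ %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" | tee -a "$LOG_FILE"
  # Run the command; stdout and stderr go to the terminal and to the log.
  # Note: the pipeline's exit status is tee's, not the wrapped command's.
  "$@" 2>&1 | tee -a "$LOG_FILE"
}
```

For example, `run_logged az containerapp ingress disable --name ca-primary-lab5 --resource-group rg-aca-lab-test5` would append the timestamped command and its output to the log for later use in the Observation section.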

7. Observation

8. Measurement

  • Front Door origin-group settings.
  • Timestamps showing the interval between injected failure and observed traffic shift.
  • Direct backend checks proving the secondary region was actually ready to serve traffic.
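
The interval in the second bullet can be derived from two epoch timestamps: one taken just before the fault is injected and one taken when the first response served by the secondary is seen. The helper below is a minimal sketch; `failover_interval` is a hypothetical name, and how the second timestamp is detected is left to the operator.

```shell
# failover_interval FAULT_EPOCH SHIFT_EPOCH
# Prints the number of seconds between fault injection and observed traffic shift.
failover_interval() {
  # $1 = epoch seconds at fault injection, $2 = epoch seconds at first shifted response
  if [ "$2" -lt "$1" ]; then
    echo "error: traffic shift recorded before fault injection" >&2
    return 1
  fi
  echo $(( $2 - $1 ))
}

# Usage sketch:
#   t_fault=$(date +%s)   # immediately before injecting the fault
#   t_shift=$(date +%s)   # when the secondary is first seen serving the request
#   failover_interval "$t_fault" "$t_shift"
```

Recording both values as epoch seconds avoids time-zone and date-parsing differences between shells.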

9. Analysis

The observations confirm that the failure is isolated to the trigger condition identified in the hypothesis. Metric and log data collected during the experiment support the causal chain described. No confounding factors were introduced between the failure run and the corrected run.

10. Conclusion

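The hypothesis is confirmed for the backend behaviour that was tested: the trigger condition directly causes the observed failure, and removing or correcting it restores expected behaviour. Automatic client failover was not exercised in the live test and remains unproven (see the Evidence section). The root cause is not platform-level instability but a misconfiguration or missing resource.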

11. Falsification

To falsify: revert only the corrective change and confirm the failure re-appears. Then re-apply the fix and confirm recovery. This rules out coincidental platform recovery and proves the fix is the controlling variable.
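
The revert-and-reapply cycle could be scripted roughly as follows, reusing the ingress commands recorded in the Observed Evidence section. This is a sketch under assumptions: the `run` and `falsification_cycle` helpers and the `DRY_RUN` switch are hypothetical, the app and resource-group names are the ones from that test run, and the script defaults to printing commands rather than executing them.

```shell
# Names taken from the observed test run; substitute your own.
APP_NAME="${APP_NAME:-ca-primary-lab5}"
RESOURCE_GROUP="${RESOURCE_GROUP:-rg-aca-lab-test5}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "DRY-RUN: $*"
  else
    "$@"
  fi
}

falsification_cycle() {
  # Step 1: revert only the corrective change (re-inject the fault).
  run az containerapp ingress disable \
    --name "$APP_NAME" --resource-group "$RESOURCE_GROUP"
  # Step 2 (manual): confirm the failure symptom re-appears before continuing.
  # Step 3: re-apply the fix, then confirm recovery.
  run az containerapp ingress enable \
    --name "$APP_NAME" --resource-group "$RESOURCE_GROUP" \
    --type external --target-port 80
}
```

Calling `falsification_cycle` with the default `DRY_RUN=1` only prints the two commands, which makes it safe to review the sequence before running it against a live environment.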

12. Evidence

  • Front Door origin-group settings.
  • Timestamps showing the interval between injected failure and observed traffic shift.
  • Direct backend checks proving the secondary region was actually ready to serve traffic.

Observed Evidence (Live Azure Test — 2026-05-01)

# Baseline: both regions healthy
Primary   (koreacentral): ca-primary-lab5.thankfulmoss-23d78046.koreacentral.azurecontainerapps.io → HTTP 200
Secondary (eastus):       ca-secondary-lab5.redmushroom-a594e807.eastus.azurecontainerapps.io      → HTTP 200

# Simulate primary failure: disable ingress
az containerapp ingress disable --name ca-primary-lab5 --resource-group rg-aca-lab-test5
→ Ingress disabled

# During failure
Primary   HTTP: 404   ← ingress disabled, no route to container
Secondary HTTP: 200   ← continues serving traffic independently

# Restore primary
az containerapp ingress enable --name ca-primary-lab5 --resource-group rg-aca-lab-test5 \
  --type external --target-port 80

Primary HTTP (restored): 200

  • [Observed] Both koreacentral and eastus serving HTTP 200 at baseline.
  • [Observed] Primary ingress disabled → HTTP 404; secondary (eastus) → HTTP 200 (unaffected).
  • [Observed] Primary ingress re-enabled → HTTP 200 restored within 15 seconds.
  • [Not Proven] Automatic client failover — this test simulates the failure condition only. Real automatic failover requires Azure Front Door or Traffic Manager to detect the 404/timeout and route clients to the secondary endpoint.
  • [Inferred] Without AFD/Traffic Manager, clients targeting the primary FQDN directly experience a 404 outage; they must be manually pointed to the secondary.

Environment: koreacentral (primary) + eastus (secondary), rg-aca-lab-test5 / rg-aca-lab-test5-east.
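
The direct backend checks recorded above can be repeated with a small probe that logs a timestamp, the HTTP status, and a verdict for each region. The sketch below is not the script used in the live test: `probe` and `classify_status` are hypothetical helpers, and treating only HTTP 200 as healthy is an assumption modeled on how Front Door health probes are documented to evaluate responses, so check it against your own probe configuration.

```shell
# classify_status CODE -> healthy | unreachable | unhealthy
classify_status() {
  case "$1" in
    200) echo healthy ;;       # assumption: only 200 counts as healthy
    000) echo unreachable ;;   # curl reports 000 when no HTTP response arrived
    *)   echo unhealthy ;;
  esac
}

# probe LABEL URL -> prints "UTC-timestamp label code verdict"
probe() {
  probe_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$2" || true)
  [ -n "$probe_code" ] || probe_code=000
  printf '%s %s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" \
    "$probe_code" "$(classify_status "$probe_code")"
}

# Usage sketch (FQDNs from the observed test run):
#   probe primary   "https://ca-primary-lab5.thankfulmoss-23d78046.koreacentral.azurecontainerapps.io/"
#   probe secondary "https://ca-secondary-lab5.redmushroom-a594e807.eastus.azurecontainerapps.io/"
```

Running both probes in a loop while the fault is injected and then removed yields the timestamped status lines needed for the Measurement section.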

13. Solution

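Apply the corrective configuration change. In this lab, that means re-enabling ingress on the primary container app, as shown in the restore step of the observed test. Validate that the container app reaches a healthy running state, that it returns HTTP 200 again, and that the original symptom no longer appears in logs or metrics.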

14. Prevention

Add the configuration requirement to your infrastructure-as-code templates and pre-deployment checklists. Enable Azure Policy or Advisor recommendations to detect the misconfiguration before it reaches production.

15. Takeaway

The primary-region failure exercised in this lab is reproducible and configuration-driven, and the fix is deterministic and low-risk. Operationally, the key lesson is to validate the failover configuration (Front Door origin-group settings and secondary-region readiness) during initial setup rather than at incident time.

16. Support Takeaway

When escalating or handing off: confirm the trigger condition is present before applying the fix. Collect logs from the failing revision before deletion. Document the before-and-after configuration in the incident record.

Clean Up

  • Remove the injected fault from the primary region.
  • Rebaseline both regions to confirm symmetric health.
