Revision Failover and Rollback Lab

Practice safe rollback by intentionally creating an unhealthy revision and routing traffic back to a healthy one.

Lab Metadata

| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Estimated Duration | 20-30 minutes |
| Tier | Consumption |
| Failure Mode | Latest revision unhealthy after ingress target port is changed to the wrong value |
| Skills Practiced | Revision management, rollback, traffic shifting, system log analysis |

1) Background

This lab starts with a healthy revision, then introduces a wrong ingress target port on a new revision. In multi-revision mode, rollback is primarily a traffic decision: keep a healthy revision available and shift traffic away from the bad one while you correct the misconfiguration.

Traffic shifting is usually faster than rebuilding during an incident, but it only works if at least one known-good revision remains healthy.
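Because rollback by traffic shifting depends on the app keeping more than one revision active, a pre-check along the following lines may be useful before starting. This is a minimal sketch: `ensure_multiple_revisions` is a hypothetical helper name, and it assumes `APP_NAME` and `RG` are already set as in the runbook.

```shell
# Sketch: confirm the app keeps more than one revision active before relying
# on traffic shifting. ensure_multiple_revisions is a hypothetical helper name.
ensure_multiple_revisions() {
    local app="$1" rg="$2" mode
    mode="$(az containerapp show \
        --name "$app" \
        --resource-group "$rg" \
        --query "properties.configuration.activeRevisionsMode" \
        --output tsv)" || return 1
    # Compare lowercased in case the service reports the mode with different casing.
    if [ "$(printf '%s' "$mode" | tr '[:upper:]' '[:lower:]')" = "multiple" ]; then
        echo "multiple-revision mode already enabled"
    else
        echo "current mode is '${mode}'; switching to multiple"
        az containerapp revision set-mode \
            --name "$app" \
            --resource-group "$rg" \
            --mode multiple
    fi
}

# Usage: ensure_multiple_revisions "$APP_NAME" "$RG"
```

If the lab template already enables multiple revisions, the helper only reports the current mode and changes nothing.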

Architecture

flowchart LR
    A[Revision N healthy] --> B[Deploy revision N+1 with wrong target port]
    B --> C[Revision N+1 becomes unhealthy]
    C --> D[Requests fail or return 5xx]
    D --> E[Route traffic back to revision N]
    E --> F[Service stabilized]

2) Hypothesis

IF a new revision is created with the ingress targetPort changed from 8000 to 9999, THEN that revision will become unhealthy, while the previous healthy revision remains available and can serve traffic once traffic is routed back to it.

| Variable | Control State | Experimental State |
|---|---|---|
| Active revisions mode | Multiple revisions enabled | Multiple revisions enabled |
| Latest revision target port | 8000 | 9999 |
| Latest revision health | Healthy | Non-Healthy |
| Traffic routing outcome | Stable on healthy revision | Requires traffic reassignment to healthy revision |
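The target port values in the control and experimental states can be read from the ingress configuration at each stage and recorded in the experiment log. The sketch below assumes `APP_NAME` and `RG` are set; `target_port_matches` is a hypothetical helper name, not part of the lab scripts.

```shell
# Sketch: compare the app's current ingress target port with an expected value.
# target_port_matches is a hypothetical helper name.
target_port_matches() {
    local app="$1" rg="$2" expected="$3" actual
    actual="$(az containerapp ingress show \
        --name "$app" \
        --resource-group "$rg" \
        --query "targetPort" \
        --output tsv)" || return 1
    echo "targetPort=${actual} (expected ${expected})"
    # The exit status reports whether the observed port matches.
    [ "$actual" = "$expected" ]
}

# Usage (control state):      target_port_matches "$APP_NAME" "$RG" 8000
# Usage (experimental state): target_port_matches "$APP_NAME" "$RG" 9999
```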

3) Runbook

Deploy baseline infrastructure

export RG="rg-aca-lab-revision"
export LOCATION="koreacentral"

az extension add --name containerapp --upgrade
az login

az group create --name "$RG" --location "$LOCATION"

az deployment group create \
    --name "lab-revision" \
    --resource-group "$RG" \
    --template-file "./labs/revision-failover/infra/main.bicep" \
    --parameters baseName="labrevision"

Expected output pattern: deployment shows Succeeded.

Capture deployment outputs

export APP_NAME="$(az deployment group show \
    --resource-group "$RG" \
    --name "lab-revision" \
    --query "properties.outputs.containerAppName.value" \
    --output tsv)"

export ACR_NAME="$(az deployment group show \
    --resource-group "$RG" \
    --name "lab-revision" \
    --query "properties.outputs.containerRegistryName.value" \
    --output tsv)"

export ENVIRONMENT_NAME="$(az deployment group show \
    --resource-group "$RG" \
    --name "lab-revision" \
    --query "properties.outputs.environmentName.value" \
    --output tsv)"

Expected output: no output; variables are set.
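Since these commands print nothing on success, an empty value (for example from a mistyped output name) can go unnoticed until a later step fails. A small guard such as the following sketch could catch that early; `require_vars` is a hypothetical helper name.

```shell
# Sketch: fail fast if any captured deployment output is empty.
# require_vars is a hypothetical helper name.
require_vars() {
    local name value missing=0
    for name in "$@"; do
        # Indirect lookup of the variable named by $name (works in POSIX sh and bash).
        eval "value=\${${name}:-}"
        if [ -z "$value" ]; then
            echo "missing or empty: ${name}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: require_vars APP_NAME ACR_NAME ENVIRONMENT_NAME || exit 1
```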

Confirm baseline healthy revision

az containerapp revision list --name "$APP_NAME" --resource-group "$RG" --output table

Expected output pattern:

Name               Active    TrafficWeight    HealthState
-----------------  --------  ---------------  -----------
ca-myapp--0000001  True      100              Healthy

Trigger the bad rollout

./labs/revision-failover/trigger.sh

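The trigger script performs these actions. ACR_LOGIN_SERVER, ACR_USERNAME, and ACR_PASSWORD are not captured in the steps above, so they are presumably set inside the script: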

az acr build --registry "$ACR_NAME" --image "${APP_NAME}:v1" ./workload

az containerapp update \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --image "${ACR_LOGIN_SERVER}/${APP_NAME}:v1" \
    --target-port 8000 \
    --registry-server "$ACR_LOGIN_SERVER" \
    --registry-username "$ACR_USERNAME" \
    --registry-password "$ACR_PASSWORD"

sleep 40

az containerapp update --name "$APP_NAME" --resource-group "$RG" --target-port 9999
sleep 40

az containerapp revision list --name "$APP_NAME" --resource-group "$RG" --output table
az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type system --tail 20

Expected output: a new revision appears with unhealthy status and system logs show probe or connection failures related to the wrong target port.
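To pick out the failing revision without reading the full table, the revision list can be reduced to name, health, and traffic weight. This is a sketch under the assumption that `properties.healthState` and `properties.trafficWeight` are populated for every revision; both function names are hypothetical.

```shell
# Sketch: print name, health, and traffic weight per revision as TSV, oldest first.
# revision_health_summary and non_healthy_revisions are hypothetical helper names.
revision_health_summary() {
    az containerapp revision list \
        --name "$1" \
        --resource-group "$2" \
        --query "sort_by([].{name:name,created:properties.createdTime,health:properties.healthState,weight:properties.trafficWeight}, &created)[].[name,health,weight]" \
        --output tsv
}

# Print only the names of revisions whose health is anything other than Healthy.
non_healthy_revisions() {
    revision_health_summary "$1" "$2" | awk -F '\t' '$2 != "Healthy" { print $1 }'
}

# Usage: non_healthy_revisions "$APP_NAME" "$RG"
```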

Investigate the failure signal

az containerapp logs show \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --type system

Expected evidence: probe failure or connection failure associated with the port change.
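System logs can be noisy, so a coarse keyword filter may help surface the relevant lines. The keyword list below is a guess at likely wording, not an exhaustive or authoritative set of platform messages, and `filter_port_signals` is a hypothetical helper name.

```shell
# Sketch: keep only log lines that mention probes, target ports, or connection
# problems. The keyword list is an assumption; adjust it to the messages observed.
filter_port_signals() {
    grep -iE 'probe|target ?port|connection refused|timed out'
}

# Usage:
# az containerapp logs show --name "$APP_NAME" --resource-group "$RG" \
#     --type system --tail 100 --format text | filter_port_signals
```

An empty result does not prove the revision is healthy; it may only mean the platform used different wording than the filter expects.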

Roll traffic back to a healthy revision

export HEALTHY_REVISION="$(az containerapp revision list \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --query "sort_by([?properties.healthState=='Healthy'].{name:name,created:properties.createdTime}, &created)[-1].name" \
    --output tsv)"

az containerapp ingress traffic set \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --revision-weight "${HEALTHY_REVISION}=100"

Expected output: traffic update succeeds and the healthy revision handles requests.
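If the query finds no healthy revision, `HEALTHY_REVISION` is empty and the traffic command would receive a malformed weight such as `=100`. A guarded variant might look like the sketch below; `rollback_to_revision` is a hypothetical helper name, and it prints the resulting traffic split for confirmation.

```shell
# Sketch: refuse to shift traffic when no healthy revision name is available.
# rollback_to_revision is a hypothetical helper name.
rollback_to_revision() {
    local app="$1" rg="$2" revision="$3"
    if [ -z "$revision" ]; then
        echo "no healthy revision found; refusing to change traffic" >&2
        return 1
    fi
    az containerapp ingress traffic set \
        --name "$app" \
        --resource-group "$rg" \
        --revision-weight "${revision}=100" || return 1
    # Show the resulting traffic split so the rollback can be confirmed.
    az containerapp ingress traffic show \
        --name "$app" \
        --resource-group "$rg" \
        --output table
}

# Usage: rollback_to_revision "$APP_NAME" "$RG" "$HEALTHY_REVISION"
```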

Restore the correct target port and verify stabilization

./labs/revision-failover/verify.sh

The verify script confirms the latest revision is unhealthy, finds a healthy revision for rollback, then runs:

az containerapp ingress traffic set --name "$APP_NAME" --resource-group "$RG" --revision-weight "${HEALTHY_REVISION}=100"
az containerapp update --name "$APP_NAME" --resource-group "$RG" --target-port 8000
sleep 40
az containerapp revision list --name "$APP_NAME" --resource-group "$RG" --query "sort_by([].{name:name,created:properties.createdTime,health:properties.healthState}, &created)[-1].health" --output tsv

Expected output pattern:

RevisionUpdate        → New revision updated
RevisionDeactivating  → Prior bad revision deactivated
RevisionReady         → Stable revision ready
ContainerAppReady     → Running state reached
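The fixed `sleep 40` may be too short or too long depending on how quickly the corrected revision starts. One alternative is to poll the newest revision's health until it reaches the expected value. This is a sketch: the helper names are hypothetical, the retry count and interval are arbitrary, and the query mirrors the one the verify script uses.

```shell
# Sketch: poll the newest revision's health instead of relying on a fixed sleep.
# latest_revision_health and wait_for_latest_health are hypothetical helper names.
latest_revision_health() {
    az containerapp revision list \
        --name "$1" \
        --resource-group "$2" \
        --query "sort_by([].{created:properties.createdTime,health:properties.healthState}, &created)[-1].health" \
        --output tsv
}

wait_for_latest_health() {
    # Arguments: app, resource group, wanted health, max tries (12), interval seconds (5).
    local app="$1" rg="$2" want="$3" tries="${4:-12}" interval="${5:-5}" health="" i=0
    while [ "$i" -lt "$tries" ]; do
        health="$(latest_revision_health "$app" "$rg")"
        if [ "$health" = "$want" ]; then
            echo "latest revision is ${health}"
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "timed out; last observed health: ${health:-unknown}" >&2
    return 1
}

# Usage after restoring port 8000: wait_for_latest_health "$APP_NAME" "$RG" Healthy
```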

4) Experiment Log

| Step | Action | Expected | Actual | Pass/Fail |
|---|---|---|---|---|
| 1 | Deploy baseline | Single healthy revision | | |
| 2 | Capture outputs | Variables populated | | |
| 3 | Run trigger.sh | New unhealthy revision appears | | |
| 4 | Review system logs | Port or probe failure evidence appears | | |
| 5 | Shift traffic to healthy revision | Healthy revision serves traffic | | |
| 6 | Run verify.sh | Corrected revision becomes healthy | | |

Expected Evidence

| Evidence Source | Expected State |
|---|---|
| `az containerapp revision list --name "$APP_NAME" --resource-group "$RG" --output table` | Healthy baseline revision exists before trigger; latest revision becomes non-healthy after targetPort changes to 9999 |
| `az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type system` | Probe failure or connection failure related to wrong target port |
| `az containerapp ingress traffic set --name "$APP_NAME" --resource-group "$RG" --revision-weight "${HEALTHY_REVISION}=100"` | Traffic can be restored to a healthy revision without rebuilding first |
| `./labs/revision-failover/verify.sh` | Rollback path succeeds and latest post-fix revision health improves |

Clean Up

az group delete --name "$RG" --yes --no-wait
