Revision Failover and Rollback Lab

Practice safe rollback by intentionally creating an unhealthy revision and routing traffic back to a healthy one.

Lab Metadata

Attribute	Value
Difficulty	Intermediate
Estimated Duration	20-30 minutes
Tier	Consumption
Failure Mode	Latest revision unhealthy after ingress target port is changed to the wrong value
Skills Practiced	Revision management, rollback, traffic shifting, system log analysis

1) Background

This lab starts with a healthy revision, then introduces a wrong ingress target port on a new revision. In multi-revision mode, rollback is primarily a traffic decision: keep a healthy revision available and shift traffic away from the bad one while you correct the misconfiguration.

Traffic shifting is usually faster than rebuilding during an incident, but it only works if at least one known-good revision remains healthy.
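The precondition above (at least one known-good revision stays healthy) can be checked before an incident forces the question. The following is a minimal sketch, assuming `APP_NAME` and `RG` are exported as they are later in the runbook; the function names are hypothetical helpers and not part of the lab scripts.

```shell
# Sketch: count revisions that currently report Healthy.
# Assumes APP_NAME and RG are exported (see the runbook steps).
count_healthy_revisions() {
    az containerapp revision list \
        --name "$APP_NAME" \
        --resource-group "$RG" \
        --query "length([?properties.healthState=='Healthy'])" \
        --output tsv
}

# Hypothetical helper: succeed only if at least one rollback target exists.
has_rollback_target() {
    local healthy
    healthy="$(count_healthy_revisions)" || return 1
    [ "${healthy:-0}" -ge 1 ]
}
```

Running `has_rollback_target || echo "no healthy revision to fall back to"` before a risky change makes the dependency explicit.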

Architecture

flowchart LR
    A[Revision N healthy] --> B[Deploy revision N+1 with wrong target port]
    B --> C[Revision N+1 becomes unhealthy]
    C --> D[Requests fail or return 5xx]
    D --> E[Route traffic back to revision N]
    E --> F[Service stabilized]

2) Hypothesis

IF a new revision is created with the ingress `targetPort` changed from `8000` to `9999`, THEN the latest revision will report a health state other than `Healthy`, while the previous healthy revision can still receive traffic once traffic is routed back to it.

Variable	Control State	Experimental State
Active revisions mode	Multiple revisions enabled	Multiple revisions enabled
Latest revision target port	`8000`	`9999`
Latest revision health	`Healthy`	Non-`Healthy`
Traffic routing outcome	Stable on healthy revision	Requires traffic reassignment to healthy revision
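Both columns of the table assume multiple revisions mode, so it may be worth confirming that setting before running the experiment. This is a sketch only: it assumes `APP_NAME` and `RG` are exported, and the function names are hypothetical.

```shell
# Sketch: read the app's active revisions mode.
get_revision_mode() {
    az containerapp show \
        --name "$APP_NAME" \
        --resource-group "$RG" \
        --query "properties.configuration.activeRevisionsMode" \
        --output tsv
}

# Hypothetical guard: fail unless the mode is Multiple (any letter case on the first letter).
require_multiple_revisions() {
    local mode
    mode="$(get_revision_mode)" || return 1
    case "$mode" in
        [Mm]ultiple) return 0 ;;
        *) echo "Revision mode is '${mode}', expected Multiple" >&2
           return 1 ;;
    esac
}
```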

3) Runbook

Deploy baseline infrastructure

export RG="rg-aca-lab-revision"
export LOCATION="koreacentral"

az extension add --name containerapp --upgrade
az login

az group create --name "$RG" --location "$LOCATION"

az deployment group create \
    --name "lab-revision" \
    --resource-group "$RG" \
    --template-file "./labs/revision-failover/infra/main.bicep" \
    --parameters baseName="labrevision"

Expected output pattern: the deployment's `provisioningState` is `Succeeded`.
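To confirm the deployment result without scanning the full JSON output, the provisioning state can be queried directly. A minimal sketch, assuming `RG` is exported and the deployment name `lab-revision` from the command above; `deployment_state` is a hypothetical helper name.

```shell
# Sketch: print only the provisioning state of the lab deployment.
deployment_state() {
    az deployment group show \
        --resource-group "$RG" \
        --name "lab-revision" \
        --query "properties.provisioningState" \
        --output tsv
}

# Hypothetical guard: succeed only when the deployment finished successfully.
deployment_succeeded() {
    [ "$(deployment_state)" = "Succeeded" ]
}
```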

Capture deployment outputs

export APP_NAME="$(az deployment group show \
    --resource-group "$RG" \
    --name "lab-revision" \
    --query "properties.outputs.containerAppName.value" \
    --output tsv)"

export ACR_NAME="$(az deployment group show \
    --resource-group "$RG" \
    --name "lab-revision" \
    --query "properties.outputs.containerRegistryName.value" \
    --output tsv)"

export ENVIRONMENT_NAME="$(az deployment group show \
    --resource-group "$RG" \
    --name "lab-revision" \
    --query "properties.outputs.environmentName.value" \
    --output tsv)"

Expected output: no output; variables are set.
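Because the capture commands print nothing, an empty output value goes unnoticed until a later command fails. The sketch below checks that each variable is non-empty; `require_vars` is a hypothetical helper, and it relies on `printenv`, which only sees exported variables (matching the `export` statements above).

```shell
# Sketch: fail fast if any named, exported variable is empty or unset.
require_vars() {
    local name missing=0
    for name in "$@"; do
        if [ -z "$(printenv "$name")" ]; then
            echo "Missing value for ${name}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `require_vars APP_NAME ACR_NAME ENVIRONMENT_NAME` right after the capture step reports every missing value at once.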

Confirm baseline healthy revision

az containerapp revision list --name "$APP_NAME" --resource-group "$RG" --output table

Expected output pattern:

Name               Active    TrafficWeight    HealthState
-----------------  --------  ---------------  -----------
ca-myapp--0000001  True      100              Healthy

Trigger the bad rollout

./labs/revision-failover/trigger.sh

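The trigger script performs these actions. `ACR_LOGIN_SERVER`, `ACR_USERNAME`, and `ACR_PASSWORD` are not exported by the capture step above, so set them first if you run these commands by hand: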

az acr build --registry "$ACR_NAME" --image "${APP_NAME}:v1" ./workload

az containerapp update \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --image "${ACR_LOGIN_SERVER}/${APP_NAME}:v1" \
    --target-port 8000 \
    --registry-server "$ACR_LOGIN_SERVER" \
    --registry-username "$ACR_USERNAME" \
    --registry-password "$ACR_PASSWORD"

sleep 40

az containerapp update --name "$APP_NAME" --resource-group "$RG" --target-port 9999
sleep 40

az containerapp revision list --name "$APP_NAME" --resource-group "$RG" --output table
az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type system --tail 20

Expected output: a new revision appears with unhealthy status and system logs show probe or connection failures related to the wrong target port.
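The update command above references `ACR_LOGIN_SERVER`, `ACR_USERNAME`, and `ACR_PASSWORD`, which the capture step does not set. One way they might be resolved from `ACR_NAME` is sketched below. This assumes the registry's admin user is enabled, which this page does not confirm, and `load_acr_credentials` is a hypothetical helper rather than part of `trigger.sh`.

```shell
# Sketch: resolve registry values from ACR_NAME.
# Assumption: the registry has its admin user enabled.
load_acr_credentials() {
    ACR_LOGIN_SERVER="$(az acr show --name "$ACR_NAME" \
        --query "loginServer" --output tsv)" || return 1
    ACR_USERNAME="$(az acr credential show --name "$ACR_NAME" \
        --query "username" --output tsv)" || return 1
    ACR_PASSWORD="$(az acr credential show --name "$ACR_NAME" \
        --query "passwords[0].value" --output tsv)" || return 1
    export ACR_LOGIN_SERVER ACR_USERNAME ACR_PASSWORD
}
```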

Investigate the failure signal

az containerapp logs show \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --type system

Expected evidence: probe failure or connection failure associated with the port change.
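System logs can be noisy, so narrowing them to lines that mention probes, ports, or connections may help surface the evidence faster. The filter below is a rough sketch: the keyword list is an assumption about how the failure is worded, and `filter_port_signals` is a hypothetical name.

```shell
# Sketch: keep only log lines mentioning probes, ports, or connections.
# Reads from stdin; matching is case-insensitive and substring-based,
# so unrelated words such as "report" also match.
filter_port_signals() {
    grep -Ei 'probe|port|connection'
}
```

A typical use is to pipe the command above into it, for example `az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type system --tail 50 | filter_port_signals`.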

Roll traffic back to a healthy revision

export HEALTHY_REVISION="$(az containerapp revision list \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --query "sort_by([?properties.healthState=='Healthy'].{name:name,created:properties.createdTime}, &created)[-1].name" \
    --output tsv)"

az containerapp ingress traffic set \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --revision-weight "${HEALTHY_REVISION}=100"

Expected output: traffic update succeeds and the healthy revision handles requests.
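If the query finds no healthy revision, `HEALTHY_REVISION` is empty and the traffic command receives `=100` as its weight argument. A guarded variant is sketched below, assuming `APP_NAME` and `RG` are exported; `rollback_to` and `show_traffic` are hypothetical helper names.

```shell
# Sketch: refuse to shift traffic when no healthy revision was found.
rollback_to() {
    local target="$1"
    if [ -z "$target" ]; then
        echo "No healthy revision found; traffic left unchanged" >&2
        return 1
    fi
    az containerapp ingress traffic set \
        --name "$APP_NAME" \
        --resource-group "$RG" \
        --revision-weight "${target}=100"
}

# Print the resulting traffic split for confirmation.
show_traffic() {
    az containerapp ingress traffic show \
        --name "$APP_NAME" \
        --resource-group "$RG" \
        --output table
}
```

Running `rollback_to "$HEALTHY_REVISION" && show_traffic` confirms the weights only after a successful shift.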

Restore the correct target port and verify stabilization

./labs/revision-failover/verify.sh

The verify script confirms the latest revision is unhealthy, finds a healthy revision for rollback, then runs:

az containerapp ingress traffic set --name "$APP_NAME" --resource-group "$RG" --revision-weight "${HEALTHY_REVISION}=100"
az containerapp update --name "$APP_NAME" --resource-group "$RG" --target-port 8000
sleep 40
az containerapp revision list --name "$APP_NAME" --resource-group "$RG" --query "sort_by([].{name:name,created:properties.createdTime,health:properties.healthState}, &created)[-1].health" --output tsv

Expected system log event pattern:

RevisionUpdate        → New revision updated
RevisionDeactivating  → Prior bad revision deactivated
RevisionReady         → Stable revision ready
ContainerAppReady     → Running state reached
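The scripts wait a fixed 40 seconds before checking health, which can be too short or needlessly long. A polling loop is one alternative. The sketch below reuses the health query from the verify step and assumes `APP_NAME` and `RG` are exported; the function names, attempt count, and delay are illustrative choices, not values from the lab scripts.

```shell
# Sketch: report the health state of the most recently created revision.
latest_revision_health() {
    az containerapp revision list \
        --name "$APP_NAME" \
        --resource-group "$RG" \
        --query "sort_by([].{created:properties.createdTime,health:properties.healthState}, &created)[-1].health" \
        --output tsv
}

# Usage: wait_for_health <wanted-state> [attempts] [delay-seconds]
# Returns 0 as soon as the latest revision reports the wanted state,
# or 1 after the last attempt.
wait_for_health() {
    local wanted="$1" attempts="${2:-12}" delay="${3:-10}" i=0
    while [ "$i" -lt "$attempts" ]; do
        if [ "$(latest_revision_health)" = "$wanted" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_health Healthy` after restoring port 8000 returns as soon as the corrected revision is ready.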

4) Experiment Log

Step	Action	Expected
1	Deploy baseline	Single healthy revision
2	Capture outputs	Variables populated
3	Run `trigger.sh`	New unhealthy revision appears
4	Review system logs	Port or probe failure evidence appears
5	Shift traffic to healthy revision	Healthy revision serves traffic
6	Run `verify.sh`	Corrected revision becomes healthy

Expected Evidence

Evidence Source	Expected State
`az containerapp revision list --name "$APP_NAME" --resource-group "$RG" --output table`	Healthy baseline revision exists before trigger; latest revision becomes non-healthy after `targetPort` changes to `9999`
`az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type system`	Probe failure or connection failure related to wrong target port
`az containerapp ingress traffic set --name "$APP_NAME" --resource-group "$RG" --revision-weight "${HEALTHY_REVISION}=100"`	Traffic can be restored to a healthy revision without rebuilding first
`./labs/revision-failover/verify.sh`	Rollback path succeeds and the latest post-fix revision reports `Healthy`

Clean Up

az group delete --name "$RG" --yes --no-wait
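Because `--no-wait` returns before deletion finishes, the resource group may still exist for a while. A small check is sketched below, assuming `RG` is still exported; `group_deleted` is a hypothetical helper name.

```shell
# Sketch: succeed once the resource group no longer exists.
# `az group exists` prints "true" or "false".
group_deleted() {
    [ "$(az group exists --name "$RG")" = "false" ]
}
```

Running `group_deleted && echo "cleanup complete"` a few minutes later confirms that nothing is left behind.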
