Lab: Immutable Update Rollback¶
Practice diagnosing an immutable deployment that launches a replacement batch but rolls back before promotion because the new instances never pass health checks.
Lab Metadata¶
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Duration | 45 minutes |
| Tier | Load-balanced web server environment |
| Failure Mode | Immutable deployment replacement batch stays unhealthy and EB aborts promotion |
| Skills Practiced | Deployment policy analysis, batch health verification, Auto Scaling activity review, rollback interpretation |
1) Background¶
1.1 Why this lab exists¶
Immutable deployments reduce blast radius but consume extra capacity and depend on replacement instances becoming healthy. This lab teaches how to prove why the replacement batch never became promotable.
1.2 Platform behavior model¶
During an immutable update, EB launches a temporary Auto Scaling group, deploys the new version there, and waits for healthy instances before switching traffic. If that batch remains unhealthy, EB discards it and preserves the old fleet.
1.3 Diagram (Mermaid)¶
flowchart TD
A[Immutable deployment selected] --> B[Launch temporary batch]
B --> C[Deploy new version on replacement instances]
C --> D{Replacement batch healthy?}
D -->|No| E[Abort promotion]
E --> F[Terminate temporary batch]
F --> G[Rollback to original fleet] 2) Hypothesis¶
2.1 Original hypothesis¶
The immutable update rolls back because replacement instances start, but the new application version never becomes healthy enough for promotion.
2.2 Causal chain¶
Immutable policy -> extra instances launched -> bad version on temporary batch -> target health never stabilizes -> EB aborts immutable update -> original instances remain in service.
2.3 Proof criteria¶
- Deployment policy shows
Immutable. - Auto Scaling and EB events show temporary replacement capacity creation and later termination.
- Target health for the replacement batch is unhealthy until rollback.
2.4 Disproof criteria¶
- Replacement batch becomes healthy and rollback instead results from quota, launch, or unrelated control-plane failure.
3) Runbook¶
- Confirm immutable deployment policy.
aws elasticbeanstalk describe-configuration-settings \
--application-name "$APP_NAME" \
--environment-name "$ENV_NAME"
- Deploy the failing immutable update.
- Watch EB events and scaling activities.
eb events --environment-name "$ENV_NAME" --all
aws autoscaling describe-scaling-activities \
--auto-scaling-group-name "$ASG_NAME" \
--max-items 50
- Validate replacement batch health.
aws elbv2 describe-target-health --target-group-arn "$TARGET_GROUP_ARN"
aws elasticbeanstalk describe-environment-health \
--environment-name "$ENV_NAME" \
--attribute-names Status Color Causes InstancesHealth
- Inspect deployment logs from one failed replacement instance.
eb logs --environment-name "$ENV_NAME" --all
sudo less /var/log/eb-engine.log
sudo less /var/log/web.stdout.log
4) Experiment Log¶
| Time (UTC) | Observation | Evidence |
|---|---|---|
| 11:00 | Environment configured for immutable updates | describe-configuration-settings |
| 11:05 | Temporary instances launch for replacement batch | describe-scaling-activities |
| 11:11 | Replacement targets remain unhealthy | describe-target-health |
| 11:14 | EB aborts immutable update and terminates new batch | eb events |
| 11:18 | Original fleet still serves traffic and health returns to Ok | describe-environment-health |
Expected Evidence¶
Before Trigger (Baseline)¶
- Existing instances are healthy.
- Current deployment policy is immutable.
During Incident¶
- Temporary capacity appears in Auto Scaling activities.
- New targets do not pass health checks.
- EB event stream reports immutable rollback or abort.
After Recovery¶
- Temporary instances disappear.
- Original version remains active.
- Environment health returns to its pre-change state.
Evidence Timeline (Mermaid sequence diagram)¶
sequenceDiagram
participant User
participant EB as Elastic Beanstalk
participant ASG as Auto Scaling
participant ALB
User->>EB: Deploy immutable update
EB->>ASG: Launch temporary batch
ASG->>ALB: Register new targets
ALB-->>EB: Targets unhealthy
EB->>ASG: Terminate temporary batch
EB-->>User: Immutable update rolled back Evidence Chain: Why This Proves the Hypothesis¶
The environment only rolls back after the replacement batch fails health evaluation. Auto Scaling evidence proves the extra immutable capacity existed, and ALB plus EB health evidence proves that the replacement fleet, not the original fleet, was the failing component.
Clean Up¶
eb terminate "$ENV_NAME"
aws cloudformation delete-stack --stack-name "$STACK_NAME" --region "$AWS_REGION"