Lab: Load Balancer 5xx

Reproduce a case where the Application Load Balancer (ALB) returns 502 or 504 because the backend instances return an invalid response, reset the connection, or respond too slowly for the ALB to complete the request.

Lab Metadata

Attribute	Value
Difficulty	Intermediate
Duration	40 minutes
Tier	Load-balanced web server environment
Failure Mode	ALB returns `502` or `504` while instances are reachable
Skills Practiced	ALB access log analysis, target health inspection, timeout correlation, backend log review

1) Background

1.1 Why this lab exists

Load balancer 5xx errors are frequently misread as instance crashes. This lab shows how to prove whether the ALB returns them because backend responses are malformed, reset, or too slow.

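1.2 Platform behavior model

The ALB sits between clients and the Elastic Beanstalk (EB) instances. If a target closes the connection or returns an invalid response, the ALB emits 502. If a target does not respond before the ALB idle timeout elapses (60 seconds by default), the ALB emits 504. Both can happen while the instances still appear online.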
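
As a rough reading aid, the mapping in the behavior model can be written as a small lookup. This is a sketch only: the function name `classify_elb_status` is hypothetical, and the descriptions summarize the model above rather than list every possible cause.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an ALB status code to the backend behavior
# this lab associates with it. Not an exhaustive list of causes.
classify_elb_status() {
    case "$1" in
        502) echo "502: target closed the connection or returned an invalid response" ;;
        504) echo "504: target did not respond before the ALB timed out" ;;
        5[0-9][0-9]) echo "$1: other 5xx, outside the scope of this lab" ;;
        *) echo "$1: not a 5xx response" ;;
    esac
}
```

Calling `classify_elb_status 504` prints the timeout description, which points the investigation at slow handlers rather than crashed processes.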

1.3 Diagram (Mermaid)

flowchart LR
    A[Client] --> B[ALB]
    B --> C[EB instance]
    C -->|slow or bad response| B
    B --> D[502 or 504 to client]
    D --> E[EB health warning or degraded]

2) Hypothesis

2.1 Original hypothesis

The ALB returns 5xx because the backend application response behavior is invalid or too slow for successful proxying.

2.2 Causal chain

Request forwarded to target -> target times out, resets, or returns bad upstream response -> ALB emits 502 or 504 -> user sees edge-side failure.

2.3 Proof criteria

CloudWatch shows a rise in HTTPCode_ELB_5XX_Count, or ALB access logs show 502 or 504 in elb_status_code.
Target health or application logs reveal the backend cause.
Reproducing the request directly against the instance shows the same slow or broken behavior.
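
The direct-versus-ALB comparison in the last criterion could be scripted roughly as follows. This is a minimal sketch under stated assumptions: `probe` and `compare_paths` are hypothetical helper names, `$INSTANCE_URL` is a hypothetical variable for an address that reaches the instance without the ALB (for example `http://localhost` from an SSH session on the instance), and the 70-second default is chosen only to outlast the default 60-second ALB idle timeout.

```shell
#!/usr/bin/env bash
# probe URL [MAX_SECONDS]: print "<http_code> <total_seconds>" for one request.
# curl reports 000 when no HTTP response arrived (timeout, reset, refused).
probe() {
    curl --silent --output /dev/null --max-time "${2:-70}" \
        --write-out '%{http_code} %{time_total}\n' "$1" || true
}

# compare_paths ALB_CODE DIRECT_CODE: say which criterion the pair supports.
compare_paths() {
    local alb="$1" direct="$2"
    case "$alb" in
        502|504) ;;
        *) echo "inconclusive: no 502/504 at the ALB"; return 0 ;;
    esac
    case "$direct" in
        2[0-9][0-9]|3[0-9][0-9])
            echo "disproof: backend answers normally when the ALB is bypassed" ;;
        *)
            echo "proof: backend fails the same request without the ALB" ;;
    esac
}

# Example use (INSTANCE_URL is an assumption, not defined by the lab):
#   alb_code="$(probe "$BASE_URL/" | cut -d' ' -f1)"
#   direct_code="$(probe "$INSTANCE_URL/" | cut -d' ' -f1)"
#   compare_paths "$alb_code" "$direct_code"
```

Keeping the decision in a separate function makes it easy to record both status codes in the experiment log before interpreting them.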

2.4 Disproof criteria

Backend responses are normal and the issue instead comes from listener, certificate, or client routing configuration.

3) Runbook

Deploy the baseline version and confirm successful requests.

eb deploy "$ENV_NAME" --staged
curl --silent --show-error --location "$BASE_URL/"

Trigger the broken backend behavior.

bash "trigger.sh"

Review ALB and EB health evidence.

aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name HTTPCode_ELB_5XX_Count \
    --dimensions Name=LoadBalancer,Value="$LOAD_BALANCER_DIMENSION" \
    --statistics Sum \
    --start-time 2026-04-07T16:00:00Z \
    --end-time 2026-04-07T16:20:00Z \
    --period 60

aws elbv2 describe-target-health --target-group-arn "$TARGET_GROUP_ARN"
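
To tell a 502 spike from a 504 spike, the combined 5xx metric can be split into the per-code metrics that the `AWS/ApplicationELB` namespace also publishes. The sketch below assumes `LOAD_BALANCER_DIMENSION` is set as in the command above; the function name `elb_5xx_breakdown` is hypothetical.

```shell
#!/usr/bin/env bash
# elb_5xx_breakdown START END: print one "<metric><TAB><sum>" line per metric
# for the given window. START and END use the same ISO 8601 format as the
# runbook command. Requires LOAD_BALANCER_DIMENSION in the environment.
elb_5xx_breakdown() {
    local start="$1" end="$2" metric
    for metric in HTTPCode_ELB_502_Count HTTPCode_ELB_504_Count HTTPCode_Target_5XX_Count; do
        printf '%s\t' "$metric"
        # The JMESPath sum() collapses the per-minute datapoints into one total.
        aws cloudwatch get-metric-statistics \
            --namespace AWS/ApplicationELB \
            --metric-name "$metric" \
            --dimensions Name=LoadBalancer,Value="$LOAD_BALANCER_DIMENSION" \
            --statistics Sum \
            --start-time "$start" \
            --end-time "$end" \
            --period 60 \
            --query 'sum(Datapoints[].Sum)' \
            --output text
    done
}
```

Run it with the same start and end times used for the combined metric. A nonzero `HTTPCode_ELB_504_Count` with a flat `HTTPCode_Target_5XX_Count` suggests the targets are timing out rather than returning their own 5xx responses.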

Collect backend logs.

eb logs "$ENV_NAME" --all
# Run the next two commands on the instance, for example after: eb ssh "$ENV_NAME"
sudo less /var/log/nginx/error.log
sudo less /var/log/web.stdout.log

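Query the nginx logs with Logs Insights if log streaming to CloudWatch Logs is enabled. ALB access logs are delivered to S3, so Logs Insights does not cover them.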

aws logs start-query \
    --log-group-name "/aws/elasticbeanstalk/$ENV_NAME/var/log/nginx/access.log" \
    --start-time 1775577600 \
    --end-time 1775578800 \
    --query-string 'fields @timestamp, @message | filter @message like / 502 / or @message like / 504 / | sort @timestamp desc | limit 20'
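
`aws logs start-query` only returns a query ID; the matching rows come from `aws logs get-query-results` once the query finishes. A minimal polling sketch follows. The function name `wait_for_query`, the 2-second delay, and the default of 30 polls are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# wait_for_query QUERY_ID [MAX_POLLS]: poll until the query leaves the
# Scheduled/Running states, then print the result rows as JSON.
wait_for_query() {
    local query_id="$1" max_polls="${2:-30}" polls=0 status
    while [ "$polls" -lt "$max_polls" ]; do
        status="$(aws logs get-query-results --query-id "$query_id" \
            --query status --output text)"
        case "$status" in
            Complete)
                aws logs get-query-results --query-id "$query_id" \
                    --query results --output json
                return 0 ;;
            Scheduled|Running)
                sleep 2 ;;
            *)
                echo "query ended with status: $status" >&2
                return 1 ;;
        esac
        polls=$((polls + 1))
    done
    echo "query still running after $max_polls polls" >&2
    return 1
}
```

To use it, append `--query queryId --output text` to the `start-query` command, capture the output in a variable, and pass that variable to `wait_for_query`.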

4) Experiment Log

Time (UTC)	Observation	Evidence
16:00	Baseline requests succeed	`curl` output
16:05	Broken route or timeout behavior triggered	`trigger.sh` output
16:09	ALB 5xx count rises	CloudWatch metric
16:11	Backend logs show timeout or upstream failure signature	nginx or app logs
16:14	Fixing backend restores successful responses	retest output

Expected Evidence

Before Trigger (Baseline)

ALB 5xx count is zero or near zero.
Target health is healthy.
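
The target health part of the baseline can be turned into a pass/fail check. The sketch below assumes `TARGET_GROUP_ARN` is set as in the runbook; the function name `all_targets_healthy` is hypothetical.

```shell
#!/usr/bin/env bash
# all_targets_healthy: succeed only if every registered target reports the
# "healthy" state. Requires TARGET_GROUP_ARN in the environment.
all_targets_healthy() {
    local states state
    states="$(aws elbv2 describe-target-health \
        --target-group-arn "$TARGET_GROUP_ARN" \
        --query 'TargetHealthDescriptions[].TargetHealth.State' \
        --output text)"
    if [ -z "$states" ]; then
        echo "no targets registered" >&2
        return 1
    fi
    # Text output separates the states with whitespace, so word splitting is intended.
    for state in $states; do
        if [ "$state" != "healthy" ]; then
            echo "target state: $state" >&2
            return 1
        fi
    done
}
```

Running `all_targets_healthy && echo "baseline ok"` before the trigger gives a clear starting point for the experiment log.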

During Incident

502 or 504 responses appear at the ALB.
Backend logs show matching timeouts or malformed responses.
EB health may move to Warning or Degraded.
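
If access logging is enabled on the ALB, the 502 and 504 entries can be pulled out of a downloaded log file with a short filter. This sketch relies on the documented field order of ALB access log entries (`time` is field 2, `target_processing_time` field 7, `elb_status_code` field 9, `target_status_code` field 10); the function name `summarize_alb_5xx` is hypothetical.

```shell
#!/usr/bin/env bash
# summarize_alb_5xx: read ALB access log lines on stdin and print
# "<time> <elb_status_code> <target_status_code> <target_processing_time>"
# for every entry where the ALB returned 502 or 504.
summarize_alb_5xx() {
    awk '$9 == "502" || $9 == "504" { print $2, $9, $10, $7 }'
}
```

ALB access logs are stored as gzip files in S3, so a typical call after downloading one is `gunzip -c <log-file>.gz | summarize_alb_5xx`. A `target_status_code` of `-` combined with a `target_processing_time` of `-1` indicates the target never produced a response, which fits the timeout case.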

After Recovery

ALB 5xx metric falls back to baseline.
Direct and ALB-routed requests both succeed.

Evidence Timeline (Mermaid sequence diagram)

sequenceDiagram
    participant Client
    participant ALB
    participant App
    Client->>ALB: HTTP request
    ALB->>App: Forward request
    App-->>ALB: Timeout or bad upstream response
    ALB-->>Client: 502 or 504
    Client->>App: Inspect logs and retry after fix

Evidence Chain: Why This Proves the Hypothesis

The proxy-layer symptom and backend-layer logs point to the same failing request path. Because the ALB 5xx spike aligns with backend errors and clears after backend correction, the load balancer is exposing an application-side failure rather than originating it.
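
One way to check the alignment is to count upstream failures per minute in the nginx error log and compare them with the per-minute ALB 5xx metric. The sketch below assumes the default nginx error log timestamp format (`YYYY/MM/DD HH:MM:SS`) and an instance clock set to UTC; the function name `upstream_errors_per_minute` is hypothetical, and the three matched phrases are common nginx upstream error messages, not a complete list.

```shell
#!/usr/bin/env bash
# upstream_errors_per_minute: read nginx error.log lines on stdin and print
# "<date> <HH:MM> <count>" for each minute that contains upstream failures.
upstream_errors_per_minute() {
    awk '/upstream timed out|upstream prematurely closed|while connecting to upstream/ {
             minute = $1 " " substr($2, 1, 5)   # drop the seconds
             count[minute]++
         }
         END { for (m in count) print m, count[m] }' | sort
}
```

On the instance this could be run as `sudo cat /var/log/nginx/error.log | upstream_errors_per_minute`. Minutes with high counts should line up with the minutes where the ALB 5xx metric is nonzero if the hypothesis holds.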

Clean Up

eb terminate "$ENV_NAME"
aws cloudformation delete-stack --stack-name "$STACK_NAME" --region "$AWS_REGION"
