Lab: CPU and Memory Exhaustion

Generate sustained CPU pressure and memory growth in the application process so you can practice distinguishing resource saturation from deployment and network issues.

Lab Metadata

| Attribute | Value |
| --- | --- |
| Difficulty | Advanced |
| Duration | 45 minutes |
| Tier | Single-instance or load-balanced web server environment |
| Failure Mode | Application process consumes excessive CPU and memory, causing slow responses and restarts |
| Skills Practiced | EC2 resource inspection, EB health interpretation, application log analysis, CloudWatch metric correlation |

1) Background

1.1 Why this lab exists

CPU and memory incidents often look like generic slowness at first. This lab teaches how to prove compute saturation with direct evidence from instance state and logs.

1.2 Platform behavior model

When the application process consumes too much CPU or memory, request latency increases, worker processes may be killed or restarted, and EB health can degrade from Ok to Warning, Degraded, or Severe depending on user impact.

1.3 Diagram (Mermaid)

flowchart TD
    A[Trigger workload] --> B[App CPU usage spikes]
    A --> C[App memory grows]
    B --> D[Slow request handling]
    C --> E[Process recycle or kill]
    D --> F[EB degraded health]
    E --> F

2) Hypothesis

2.1 Original hypothesis

The incident is driven by instance-level CPU and memory exhaustion caused by the application workload.

2.2 Causal chain

Heavy request pattern -> process CPU and memory rise -> response time and error rate worsen -> web process restarts or becomes unstable -> EB health degrades.

2.3 Proof criteria

CloudWatch metrics show CPU growth and memory-related symptoms during the trigger window.
Process-level inspection shows the application consuming abnormal resources.
Application or system logs show worker restart, termination, or timeout behavior.

2.4 Disproof criteria

CPU and memory stay normal while errors still occur, indicating a different bottleneck such as database, listener, or health configuration.

3) Runbook

Establish baseline resource usage.

aws elasticbeanstalk describe-environment-health \
    --environment-name "$ENV_NAME" \
    --attribute-names Status Color Causes ApplicationMetrics InstancesHealth

eb logs "$ENV_NAME" --all

Trigger CPU and memory stress in the application.

bash "trigger.sh"

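Inspect CloudWatch and EB health changes. The `--start-time` and `--end-time` values below are examples; replace them with the UTC window of your own trigger run.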

aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value="$INSTANCE_ID" \
    --statistics Average Maximum \
    --start-time 2026-04-07T04:00:00Z \
    --end-time 2026-04-07T04:20:00Z \
    --period 60

aws elasticbeanstalk describe-environment-health \
    --environment-name "$ENV_NAME" \
    --attribute-names Status Color Causes InstancesHealth
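The fixed timestamps in the CloudWatch call above only match one particular run. As a minimal sketch, assuming GNU `date` is available (as on typical Linux hosts), the window can be derived from the current time instead. `utc_minutes_ago` is a hypothetical helper name and is not part of the lab scripts.

```shell
# Hypothetical helper: print the UTC time N minutes ago in ISO 8601 form.
# Assumes GNU date, which accepts relative expressions with -d.
utc_minutes_ago() {
  date -u -d "$1 minutes ago" +%Y-%m-%dT%H:%M:%SZ
}

# Example: a 20-minute window that ends now.
START_TIME="$(utc_minutes_ago 20)"
END_TIME="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
echo "window: $START_TIME .. $END_TIME"
```

The resulting values can then be passed as `--start-time "$START_TIME" --end-time "$END_TIME"` to `aws cloudwatch get-metric-statistics`.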

Inspect instance-level evidence.

top
free --mega
sudo less /var/log/web.stdout.log
sudo less /var/log/nginx/error.log
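`top` and `free` are interactive or point-in-time views, so it can help to capture numbers you can paste into the experiment log. The following is a rough sketch, assuming a Linux instance with `/proc/meminfo` and procps `ps`. The function names `mem_used_pct` and `top_cpu_processes` are hypothetical, introduced only for illustration.

```shell
# Print used memory as a percentage: (MemTotal - MemAvailable) / MemTotal.
# Reads /proc/meminfo by default; another file can be passed for testing.
mem_used_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%.1f\n", (total - avail) * 100 / total }' \
    "${1:-/proc/meminfo}"
}

# List the five processes using the most CPU, with their memory share.
# Assumes procps ps, which supports --sort.
top_cpu_processes() {
  ps -eo pid,comm,%cpu,%mem --sort=-%cpu | head -n 6
}
```

Running `mem_used_pct` and `top_cpu_processes` once before the trigger and again during the incident gives a before/after pair for the process-level proof criterion.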

Query streamed app logs for restart clues if available. `--start-time` and `--end-time` take epoch seconds; set them to cover your trigger window.

aws logs start-query \
    --log-group-name "/aws/elasticbeanstalk/$ENV_NAME/var/log/web.stdout.log" \
    --start-time 1712480400 \
    --end-time 1712482800 \
    --query-string 'fields @timestamp, @message | filter @message like /(?i)killed|timeout|memory|worker/ | sort @timestamp desc | limit 20'
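`aws logs start-query` only returns a query ID; the matching log lines are fetched separately with `aws logs get-query-results`. Below is a minimal sketch of that follow-up step, assuming GNU `date` for the epoch conversion. `to_epoch` and `wait_for_query` are hypothetical helper names, not part of the lab.

```shell
# Hypothetical helper: convert an ISO 8601 UTC timestamp to epoch seconds,
# the format start-query expects. Assumes GNU date.
to_epoch() {
  date -u -d "$1" +%s
}

# Hypothetical helper: poll a Logs Insights query until it leaves the
# Scheduled/Running states, then print whatever status it ended in.
wait_for_query() {
  query_id="$1"
  while :; do
    status="$(aws logs get-query-results --query-id "$query_id" \
      --query status --output text)"
    case "$status" in
      Scheduled|Running) sleep 2 ;;
      *) echo "$status"; return 0 ;;
    esac
  done
}
```

To use it, capture the ID by adding `--query queryId --output text` to the `start-query` call, run `wait_for_query "$QUERY_ID"`, and once it prints `Complete`, run `aws logs get-query-results --query-id "$QUERY_ID"` to read the rows.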

4) Experiment Log

| Time (UTC) | Observation | Evidence |
| --- | --- | --- |
| 14:00 | Environment healthy and responsive | baseline health check |
| 14:06 | Stress trigger starts CPU-intensive and memory-heavy path | `trigger.sh` output |
| 14:09 | CPU utilization spikes and request latency rises | CloudWatch metrics |
| 14:11 | Web process logs show worker timeout or recycle | `web.stdout.log` |
| 14:15 | Health degrades until load stops or process recovers | `describe-environment-health` |
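Timeline rows like these are easier to trust when they are recorded automatically rather than from memory. One possible sketch, assuming enhanced health reporting is enabled on the environment so that the `HealthStatus` attribute is populated; `log_health` is a hypothetical helper name.

```shell
# Hypothetical helper: print one tab-separated row with the current UTC
# time (HH:MM) and the environment's health status.
log_health() {
  status="$(aws elasticbeanstalk describe-environment-health \
    --environment-name "$1" \
    --attribute-names HealthStatus \
    --query HealthStatus --output text)"
  printf '%s\t%s\n' "$(date -u +%H:%M)" "$status"
}
```

For example, `while sleep 60; do log_health "$ENV_NAME" >> timeline.tsv; done` in a second terminal collects one row per minute while the trigger runs.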

Expected Evidence

Before Trigger (Baseline)

CPU utilization is low to moderate, with no sustained spikes.
No memory-related process restarts.
Health is Ok.

During Incident

CPU approaches saturation.
Memory pressure or worker recycle messages appear.
Response time and health state worsen in the same interval.

After Recovery

Resource usage returns toward baseline.
Application stabilizes and health improves.

Evidence Timeline (Mermaid sequence diagram)

sequenceDiagram
    participant Load as Trigger Script
    participant App
    participant EC2
    participant EB as Elastic Beanstalk
    Load->>App: CPU and memory heavy requests
    App->>EC2: Resource consumption increases
    EC2-->>App: Slow scheduling / memory pressure
    App-->>EB: Slow or failed responses
    EB-->>Load: Degraded health visible

Evidence Chain: Why This Proves the Hypothesis

The evidence aligns across three layers: instance metrics show saturation, process logs show instability, and user-facing health degrades in the same window. That multi-layer timing supports CPU and memory exhaustion as the primary mechanism.

Clean Up

eb terminate "$ENV_NAME"
aws cloudformation delete-stack --stack-name "$STACK_NAME" --region "$AWS_REGION"
