CPU Throttling¶
Use this playbook when latency rises under burst traffic, startup becomes slow under load, or replicas stay healthy but CPU saturation prevents them from keeping up.
Symptom¶
- Request latency climbs during bursts even though the app stays nominally available.
- Health probes begin to fail only when traffic or background work increases.
- CPU metrics stay pinned near the configured limit while replica count lags demand.
- Environment or revision events suggest quota pressure rather than an application crash.
Possible Causes¶
- Per-replica CPU allocation is too small for the request burst.
- HTTP concurrency is too high, so each replica accepts more work than its CPU budget can sustain.
- Scale-out is too slow for the burst profile.
- Environment-level quota pressure reduces available headroom.
- Startup or warm-up logic competes with live traffic for the same CPU budget.
Diagnosis Steps¶
flowchart TD
A[High latency or probe failures] --> B[Check CPU metrics and replica count]
B --> C{CPU near limit?}
C -->|No| D[Look for memory, network, or dependency bottlenecks]
C -->|Yes| E[Compare requests, replicas, and scale settings]
E --> F{Replica count increased in time?}
F -->|No| G[Scale rule or min replica issue]
F -->|Yes| H[Per-replica CPU budget too low]
G --> I[Adjust scale rule thresholds or pre-scale]
H --> J[Increase CPU or move to a larger profile] -
Confirm the current resource envelope and scale settings.
az containerapp show \ --name "$APP_NAME" \ --resource-group "$RG" \ --query "{resources:properties.template.containers[0].resources,scale:properties.template.scale}" \ --output json az containerapp env list-usages \ --subscription "<subscription-id>" \ --resource-group "$RG" \ --name "$CONTAINER_ENV" -
Check whether CPU usage is saturating while requests continue to rise.
-
Correlate request pressure, replica count, and throttling symptoms in Log Analytics.
let AppName = "ca-myapp"; ContainerAppSystemLogs_CL | where ContainerAppName_s == AppName | where TimeGenerated > ago(1h) | where Reason_s has_any ("ProbeFailed", "ReplicaStarted", "ReplicaReady") or Log_s has_any ("timeout", "probe", "throttle", "cpu") | project TimeGenerated, RevisionName_s, ReplicaName_s, Reason_s, Log_s | order by TimeGenerated desc -
Inspect whether the scale rule is allowing too much work per replica before additional replicas appear.
| Command or Query | Why it is used |
|---|---|
az containerapp show --query "{resources...,scale...}" | Confirms the current CPU allocation and scaling configuration. |
az containerapp env list-usages ... | Checks whether the environment is already near its quota boundary. |
az monitor metrics list --metric UsageNanoCores ... | Verifies whether the app is CPU-bound rather than merely slow. |
KQL against ContainerAppSystemLogs_CL | Correlates probe failures and replica events with the CPU saturation window. |
Resolution¶
-
Increase per-replica CPU when each request genuinely needs more compute.
-
Reduce HTTP concurrency or lower scale thresholds so new replicas arrive before each replica is overloaded.
- Increase
minReplicasfor known burst windows to avoid slow scale-out. - Move the workload to a workload profile with more predictable CPU headroom if burst traffic is normal.
- If startup is CPU-heavy, separate warm-up work from live request handling or extend startup timing appropriately.
Prevention¶
- Baseline
UsageNanoCores, latency, and replica count together, not in isolation. - Keep load-test evidence for the chosen CPU setting and scale thresholds.
- Alert on sustained high CPU plus rising latency, not on CPU alone.
- Avoid very high HTTP concurrency on CPU-intensive endpoints.
- Revisit workload profile and quota assumptions before major traffic events.