HTTP Scaler in Azure Container Apps¶
The HTTP scaler is the default platform fit for interactive APIs and web frontends. It scales a revision based on concurrent request pressure instead of waiting for resource saturation.
HTTP rule shape¶
The HTTP scale rule uses concurrentRequests inside the rule metadata.
template:
scale:
minReplicas: 0
maxReplicas: 5
rules:
- name: http-rule
http:
metadata:
concurrentRequests: "100"
Documented defaults from Microsoft Learn:
concurrentRequestsdefault: 10- documented minimum: 1
flowchart TD
A[Incoming HTTP requests] --> B[Ingress measures concurrent load]
B --> C[HTTP scaler compares concurrentRequests threshold]
C --> D[Desired replica count]
D --> E[More or fewer revision replicas]
E --> F[Observed latency and throughput] Cold start and scale-to-zero¶
The HTTP scaler can scale a revision down to zero when minReplicas is 0.
That is attractive for:
- low-frequency APIs
- admin tools
- internal services with loose first-request latency targets
It is risky for:
- public APIs with strict latency SLOs
- apps with heavy startup paths
- services that must respond instantly after idle periods
When to prefer HTTP over CPU or memory¶
Prefer HTTP scaling when the user-facing bottleneck is request concurrency.
| Signal | Best for | Limitation |
|---|---|---|
| HTTP concurrency | Interactive web and API workloads | Can still produce cold-start bias if minReplicas is 0 |
| CPU | Sustained compute pressure | Reacts later than direct request pressure |
| Memory | Sustained memory pressure | Usually a lagging signal for traffic bursts |
Use HTTP scaling for request-driven apps first
Microsoft Learn explicitly recommends preferring HTTP scale rules to CPU or memory rules when possible.
Practical tuning guidance¶
- Lower
concurrentRequestsfor CPU-heavy handlers. - Raise
minReplicasabove 0 when cold starts are user-visible. - Keep
maxReplicasaligned with downstream capacity, not just frontend demand.
az containerapp update \
--name "$APP_NAME" \
--resource-group "$RG" \
--min-replicas 1 \
--max-replicas 20 \
--scale-rule-name "http-concurrency" \
--scale-rule-type http \
--scale-rule-metadata "concurrentRequests=50"
See Also¶
- Scaling Overview
- Scaling Rules Reference
- CPU & Memory Scalers
- Scaling Best Practices
- HTTP Scaling Not Triggering