Skip to content

HTTP Scaler in Azure Container Apps

The HTTP scaler is the default platform fit for interactive APIs and web frontends. It scales a revision based on concurrent request pressure instead of waiting for resource saturation.

HTTP rule shape

The HTTP scale rule uses concurrentRequests inside the rule metadata.

template:
  scale:
    minReplicas: 0
    maxReplicas: 5
    rules:
      - name: http-rule
        http:
          metadata:
            concurrentRequests: "100"

Documented defaults from Microsoft Learn:

  • concurrentRequests default: 10
  • documented minimum: 1
flowchart TD
    A[Incoming HTTP requests] --> B[Ingress measures concurrent load]
    B --> C[HTTP scaler compares concurrentRequests threshold]
    C --> D[Desired replica count]
    D --> E[More or fewer revision replicas]
    E --> F[Observed latency and throughput]

Cold start and scale-to-zero

The HTTP scaler can scale a revision down to zero when minReplicas is 0.

That is attractive for:

  • low-frequency APIs
  • admin tools
  • internal services with loose first-request latency targets

It is risky for:

  • public APIs with strict latency SLOs
  • apps with heavy startup paths
  • services that must respond instantly after idle periods

When to prefer HTTP over CPU or memory

Prefer HTTP scaling when the user-facing bottleneck is request concurrency.

Signal Best for Limitation
HTTP concurrency Interactive web and API workloads Can still produce cold-start bias if minReplicas is 0
CPU Sustained compute pressure Reacts later than direct request pressure
Memory Sustained memory pressure Usually a lagging signal for traffic bursts

Use HTTP scaling for request-driven apps first

Microsoft Learn explicitly recommends preferring HTTP scale rules to CPU or memory rules when possible.

Practical tuning guidance

  • Lower concurrentRequests for CPU-heavy handlers.
  • Raise minReplicas above 0 when cold starts are user-visible.
  • Keep maxReplicas aligned with downstream capacity, not just frontend demand.
az containerapp update \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --min-replicas 1 \
  --max-replicas 20 \
  --scale-rule-name "http-concurrency" \
  --scale-rule-type http \
  --scale-rule-metadata "concurrentRequests=50"

See Also

Sources