CPU and Memory Scalers in Azure Container Apps

CPU and memory scalers protect a revision when the real bottleneck is sustained resource pressure. They are useful, but they should rarely be your only scaling signal for user-facing traffic.

Rule shape

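CPU and memory scalers are defined as custom rules with the scaler type set to cpu or memory. The example below targets 70% average CPU utilization and 80% average memory utilization, with the replica count bounded between 1 and 10.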

template:
  scale:
    minReplicas: 1
    maxReplicas: 10
    rules:
      - name: cpu-rule
        custom:
          type: cpu
          metadata:
            type: Utilization
            value: "70"
      - name: memory-rule
        custom:
          type: memory
          metadata:
            type: Utilization
            value: "80"

Why resource scalers react late, as a Mermaid flowchart:

flowchart TD
    A[Workload demand rises] --> B[Request or queue pressure increases]
    B --> C[CPU and memory usage increase later]
    C --> D[Resource scaler requests more replicas]
    D --> E[Revision returns toward utilization target]

What Microsoft Learn confirms

  • CPU scaling can add replicas when average CPU utilization reaches the configured threshold.
  • Memory scaling can add replicas when average memory utilization reaches the configured threshold.
  • CPU and memory scaling do not allow scale-to-zero.
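
The configured value works as a utilization target rather than a hard trigger. As a rough sketch, assuming the rules are evaluated with the standard Kubernetes Horizontal Pod Autoscaler formula (an assumption about the platform's internals, not something stated in the list above), the desired replica count can be estimated as follows. The symbols N and U are introduced here for illustration only.

```latex
% N = replica count, U = average utilization in percent (illustrative symbols)
N_{\text{desired}} = \left\lceil N_{\text{current}} \times \frac{U_{\text{current}}}{U_{\text{target}}} \right\rceil

% Worked example: 4 replicas at 90% average CPU against a 70% target
N_{\text{desired}} = \left\lceil 4 \times \frac{90}{70} \right\rceil = \lceil 5.14 \rceil = 6
```

The result is then clamped between minReplicas and maxReplicas, so with the 1 to 10 range used earlier the estimate stays at 6. This sketch ignores tolerance bands and stabilization windows, which can delay or suppress small adjustments.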

Workload profiles requirement

Dedicated workload profile requirement is unverified in current Microsoft Learn documentation

Microsoft Learn documents workload profiles and documents CPU and memory scale rules, but the current Learn pages do not state that CPU or memory scalers require a Dedicated workload profile. If your architecture review depends on that claim, treat it as unverified and validate against the current product documentation before enforcing it as policy.

What Learn does confirm is that workload profiles define the compute and billing model for the environment:

  • Consumption
  • Dedicated
  • Consumption + Dedicated mix
  • Flex Consumption preview behavior

When to use CPU and memory rules

Use CPU or memory scaling when:

  • requests are CPU-heavy and sustained
  • memory growth tracks real work
  • you need a protective signal against saturation

Do not rely on CPU or memory alone when:

  • incoming work is visible sooner through HTTP or queue backlog
  • bursts arrive faster than resource counters react

Common gotchas

  • Lagging signal: resource utilization usually rises after demand arrives.
  • Oscillation risk: thresholds that are poorly matched to the workload, combined with a low maxReplicas, can cause the replica count to flap.
  • No scale-to-zero: these rules keep at least one replica.

To add a CPU rule from the Azure CLI:

az containerapp update \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --min-replicas 1 \
  --max-replicas 10 \
  --scale-rule-name "cpu-protect" \
  --scale-rule-type cpu \
  --scale-rule-metadata "type=Utilization" "value=70"
