
Scaling Operations

This guide explains how to run and tune scaling for a Container App in production, covering manual replica control, KEDA-based autoscaling, and scale-to-zero behavior.

Prerequisites

  • An existing Container App in a managed environment
  • Baseline performance targets (latency, throughput, queue delay)

Set the variables used by the commands in this guide:

export RG="rg-aca-prod"
export APP_NAME="app-python-api-prod"
export ENVIRONMENT_NAME="aca-env-prod"
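Every command below interpolates these variables, so an empty value can point `az` at the wrong target or fail with a confusing error. A minimal bash guard can stop a script early. The function name `require_vars` is hypothetical and not part of the Azure CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail fast if any named variable is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable whose
    # name is stored in $name, or "" if it is unset (safe under set -u).
    if [[ -z "${!name:-}" ]]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Call `require_vars RG APP_NAME ENVIRONMENT_NAME || exit 1` at the top of an operations script.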

Manual Scaling for Controlled Events

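Use manual scaling for maintenance windows or expected short-term load. The following command raises the replica floor to 3 and caps scale-out at 10; to hold a fixed count instead, set --min-replicas and --max-replicas to the same value.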

az containerapp update \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --min-replicas 3 \
  --max-replicas 10
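For a maintenance window that needs an exact replica count, setting `--min-replicas` and `--max-replicas` to the same value holds the app at that count. The sketch below wraps this in a hypothetical `pin_replicas` function. Setting the (also hypothetical) `DRY_RUN` variable makes it print the command instead of running it, which helps during change review.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pin the app to exactly N replicas (min == max).
# If DRY_RUN is non-empty, print the az command instead of executing it.
pin_replicas() {
  local count="$1"
  # Reject anything that is not a non-negative integer before touching Azure.
  if [[ ! "$count" =~ ^[0-9]+$ ]]; then
    echo "invalid replica count: $count" >&2
    return 1
  fi
  local cmd=(az containerapp update
    --name "$APP_NAME"
    --resource-group "$RG"
    --min-replicas "$count"
    --max-replicas "$count")
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Restore the normal min/max bounds once the window ends; a pinned app does not autoscale.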

Check current replica settings:

az containerapp show \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --query "properties.template.scale" \
  --output json

KEDA Rule Operations

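Scale on HTTP concurrency. When concurrent HTTP requests exceed the concurrentRequests value, another replica is added, up to --max-replicas: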

az containerapp update \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --scale-rule-name "http-concurrency" \
  --scale-rule-type "http" \
  --scale-rule-metadata "concurrentRequests=100" \
  --min-replicas 1 \
  --max-replicas 20
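To reason about what a concurrentRequests value means for replica count, a rough estimate is desired ≈ ceil(total concurrent requests / concurrentRequests), clamped to the min/max bounds. This is an approximation that assumes the average-per-replica behavior of the Kubernetes HPA that KEDA builds on, and it ignores tolerance, stabilization windows, and scale-rate limits. `estimate_http_replicas` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Rough estimate of the replicas an HTTP concurrency rule would request.
# Assumes desired = ceil(concurrent / target), clamped to [min, max].
estimate_http_replicas() {
  local concurrent="$1" target="$2" min="$3" max="$4"
  if (( target < 1 )); then
    echo "target must be a positive integer" >&2
    return 1
  fi
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b for positive b.
  local desired=$(( (concurrent + target - 1) / target ))
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}
```

With concurrentRequests=100, min 1 and max 20, 250 concurrent requests map to 3 replicas; halving the target to 50 maps the same load to 5.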

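Example queue scaler operation (Azure Service Bus). The namespace value is the namespace name, not its host name, and the scaler needs credentials; here they come from an app secret referenced through --scale-rule-auth (the secret must already exist on the app):

az containerapp update \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --scale-rule-name "sb-queue" \
  --scale-rule-type "azure-servicebus" \
  --scale-rule-metadata "queueName=orders" "messageCount=50" "namespace=<servicebus-namespace>" \
  --scale-rule-auth "connection=<connection-string-secret-name>"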
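messageCount sets how many active messages each replica is expected to handle, so it also shapes how quickly a backlog clears. The back-of-the-envelope sketch below assumes the scaler reaches roughly ceil(backlog / messageCount) replicas, capped by max-replicas, and that you have measured per-replica throughput yourself. `estimate_drain_seconds` and all of its inputs are hypothetical, not values reported by Azure.

```shell
#!/usr/bin/env bash
# Rough backlog drain time for a queue-driven app (hypothetical inputs):
#   backlog          active messages in the queue
#   message_count    the scaler's messageCount target per replica
#   max_replicas     the app's max-replicas guardrail
#   per_replica_rate messages one replica processes per second (measured)
estimate_drain_seconds() {
  local backlog="$1" message_count="$2" max_replicas="$3" per_replica_rate="$4"
  if (( message_count < 1 || per_replica_rate < 1 )); then
    echo "message_count and per_replica_rate must be positive" >&2
    return 1
  fi
  # Replicas the scaler would aim for, bounded by the guardrail and at least 1.
  local replicas=$(( (backlog + message_count - 1) / message_count ))
  if (( replicas > max_replicas )); then replicas=$max_replicas; fi
  if (( replicas < 1 )); then replicas=1; fi
  local throughput=$(( replicas * per_replica_rate ))
  # Ceiling division so a partial final second still counts.
  echo $(( (backlog + throughput - 1) / throughput ))
}
```

With a 1,200-message backlog, messageCount=50, max-replicas 10 and 4 messages per second per replica, the estimate is 30 seconds. In that case the max-replicas cap binds (the scaler would want 24 replicas), so lowering messageCount alone would not shorten the drain.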

Use Azure Monitor metrics to tune thresholds:

az monitor metrics list \
  --resource "/subscriptions/<subscription-id>/resourceGroups/$RG/providers/Microsoft.App/containerApps/$APP_NAME" \
  --metric "Requests" \
  --interval "PT1M" \
  --output table
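The `--resource` argument above is the app's full ARM resource ID. A small bash sketch that assembles it from its parts avoids typos in the provider path; `container_app_resource_id` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Build the ARM resource ID of a Container App from its components.
container_app_resource_id() {
  local subscription_id="$1" resource_group="$2" app_name="$3"
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.App/containerApps/%s\n' \
    "$subscription_id" "$resource_group" "$app_name"
}
```

If you are already signed in, `az containerapp show --name "$APP_NAME" --resource-group "$RG" --query id --output tsv` returns the same ID without assembling it by hand, and `az account show --query id --output tsv` returns the current subscription ID.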

Scale-to-Zero Operations

Enable scale-to-zero for event-driven or intermittent workloads:

az containerapp update \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --min-replicas 0 \
  --max-replicas 10

Use this mode only when cold start impact is acceptable: after an idle period, the first request waits while a new replica starts.
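One way to make that judgment explicit is to compare a measured cold-start time with the latency budget for the first request after idle. The bash sketch below encodes that rule; `choose_min_replicas`, its inputs, and the default warm floor of 1 are hypothetical choices, not Azure defaults.

```shell
#!/usr/bin/env bash
# Hypothetical decision helper: allow scale-to-zero (min 0) only when the
# measured cold start fits inside the first-request latency budget;
# otherwise keep a warm floor (default 1 replica).
choose_min_replicas() {
  local cold_start_ms="$1" first_request_budget_ms="$2" warm_min="${3:-1}"
  if (( cold_start_ms <= first_request_budget_ms )); then
    echo 0
  else
    echo "$warm_min"
  fi
}
```

The result can feed the update command directly, for example `--min-replicas "$(choose_min_replicas 4500 2000)"`.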

Verification Steps

List the running replicas:

az containerapp replica list \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --output table

Confirm the effective scale bounds and rules:

az containerapp show \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --query "{minReplicas:properties.template.scale.minReplicas,maxReplicas:properties.template.scale.maxReplicas,rules:properties.template.scale.rules}" \
  --output json

Example output (PII masked):

{
  "minReplicas": 0,
  "maxReplicas": 20,
  "rules": [
    {
      "name": "http-concurrency",
      "http": {
        "metadata": {
          "concurrentRequests": "100"
        }
      }
    }
  ]
}
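To make this verification repeatable, a small bash sketch can compare the live bounds with the ones you intended to apply. `scale_field` and `check_scale_bounds` are hypothetical helper names; `scale_field` reads one property using the same `properties.template.scale` query path as the command above and needs a signed-in Azure CLI.

```shell
#!/usr/bin/env bash
# Read one field of the live scale settings (requires az login).
scale_field() {
  az containerapp show \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --query "properties.template.scale.$1" \
    --output tsv
}

# Compare observed bounds with the intended ones; non-zero exit on drift.
check_scale_bounds() {
  local got_min="$1" got_max="$2" want_min="$3" want_max="$4"
  if [[ "$got_min" != "$want_min" || "$got_max" != "$want_max" ]]; then
    echo "scale drift: got min=$got_min max=$got_max, want min=$want_min max=$want_max" >&2
    return 1
  fi
  echo "scale bounds OK: min=$got_min max=$got_max"
}
```

For the output above, `check_scale_bounds "$(scale_field minReplicas)" "$(scale_field maxReplicas)" 0 20` would pass; its exit status can gate a pipeline step.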

Scaling Decision Framework

flowchart LR
    A[Traffic or Queue Increase] --> B{Workload Type}
    B -->|HTTP interactive| C[HTTP concurrency rule]
    B -->|Queue driven| D[Event scaler rule]
    B -->|Scheduled batch| E[Container Apps Job]
    C --> F[Set min/max replicas]
    D --> F
    E --> G[Set parallelism and retry]

| Symptom | Primary knob | First adjustment | Validation signal |
| --- | --- | --- | --- |
| p95 latency rises while CPU is moderate | HTTP concurrency threshold | Lower the concurrentRequests target | Latency drops without excessive replicas |
| Queue delay grows steadily | Queue message threshold | Lower the messageCount target | Queue depth recovers within SLO |
| Cost spike overnight | Min replicas and max guardrail | Reduce min-replicas and cap max-replicas | Cost trend normalizes with acceptable latency |
| Frequent cold starts | Minimum replicas | Raise min-replicas from 0 to 1-2 | Startup-related errors decrease |

Tune one variable at a time

Change only one scaler parameter per evaluation cycle so you can attribute impact correctly.
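A pre-apply check can enforce this rule by counting how many scaler parameters differ between the baseline and the proposed settings. The bash sketch below is hypothetical (`count_changed_params` and the space-separated `key=value` input format are illustrative choices); it needs bash 4+ for associative arrays and does not count parameters that are dropped from the proposed list.

```shell
#!/usr/bin/env bash
# Count parameters whose value differs between two space-separated
# key=value lists: $1 is the baseline, $2 is the proposed change.
count_changed_params() {
  local -A baseline=()
  local pair key changed=0
  # Unquoted expansion is intentional: it splits the list on whitespace.
  for pair in $1; do
    baseline["${pair%%=*}"]="${pair#*=}"
  done
  for pair in $2; do
    key="${pair%%=*}"
    # A key missing from the baseline counts as a change.
    if [[ "${baseline[$key]-<unset>}" != "${pair#*=}" ]]; then
      changed=$((changed + 1))
    fi
  done
  echo "$changed"
}
```

A wrapper could refuse to run `az containerapp update` when the result is greater than 1.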

Scaling alone cannot fix application bottlenecks

If database limits or downstream API quotas are saturated, adding replicas may increase failure rate. Validate dependency capacity before aggressive scale-out.
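For connection-limited dependencies, the check can be a short calculation: every replica opens its own connection pool, so the dependency's limit implies an upper bound on max-replicas. The bash sketch below assumes you know the dependency's connection limit, the pool size per replica, and how many connections other clients need; `max_replicas_for_dependency` and those inputs are hypothetical.

```shell
#!/usr/bin/env bash
# Upper bound on max-replicas implied by a downstream connection limit.
#   dependency_limit  total connections the dependency accepts
#   pool_per_replica  connections each replica may open
#   reserved          connections kept for other clients (default 0)
max_replicas_for_dependency() {
  local dependency_limit="$1" pool_per_replica="$2" reserved="${3:-0}"
  if (( pool_per_replica < 1 )); then
    echo "pool_per_replica must be positive" >&2
    return 1
  fi
  local available=$(( dependency_limit - reserved ))
  if (( available < pool_per_replica )); then
    echo 0
    return 0
  fi
  # Floor division: only whole replicas fit within the remaining capacity.
  echo $(( available / pool_per_replica ))
}
```

With a 200-connection limit, 20 connections reserved, and a pool of 10 per replica, the bound is 18 replicas, so `--max-replicas 20` would already risk exhausting connections at full scale-out.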

Troubleshooting

Autoscaling does not trigger

  • Confirm scaler metadata values and key names.
  • Check whether incoming load actually reaches the configured thresholds.
  • Validate identity/secret references for external event sources.

Review the app's system logs for platform-level scaling and startup errors:

az containerapp logs show \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --type system \
  --follow false

Advanced Topics

  • Combine multiple KEDA rules and set max replica guardrails.
  • Separate interactive and batch workloads into different apps.
  • Define pre-warming strategies for predictable peak windows.
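When several rules are combined, it helps to know how their signals interact. Assuming the Kubernetes HPA behavior that KEDA builds on (with several metrics, the HPA computes a replica count per metric and uses the highest), the combined result is roughly the largest per-rule proposal, bounded by min and max. `combine_rule_proposals` is a hypothetical helper for that reasoning.

```shell
#!/usr/bin/env bash
# Approximate combined replica count for multiple scale rules:
# the largest per-rule proposal, bounded by min and max.
# Usage: combine_rule_proposals MIN MAX PROPOSAL...
combine_rule_proposals() {
  local min="$1" max="$2"
  shift 2
  local desired="$min" proposal
  for proposal in "$@"; do
    if (( proposal > desired )); then desired=$proposal; fi
  done
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}
```

If the HTTP rule proposes 3 replicas and the Service Bus rule proposes 7, the app runs about 7; a proposal of 30 is cut to a max of 20, which is why the max replica guardrail matters once rules are combined.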
