# Scaling Operations

This guide explains how to operate scaling in production, including manual replica control, KEDA-based autoscaling, and scale-to-zero behavior.

## Prerequisites
- Existing Container App in a managed environment
- Baseline performance targets (latency, throughput, queue delay)
```bash
# Shared variables used throughout this guide
export RG="rg-aca-prod"
export APP_NAME="app-python-api-prod"
export ENVIRONMENT_NAME="aca-env-prod"
```
## Manual Scaling for Controlled Events

Use manual scaling for maintenance windows or expected short-term load:

```bash
az containerapp update \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --min-replicas 3 \
  --max-replicas 10
```
Check the current replica settings:

```bash
az containerapp show \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --query "properties.template.scale" \
  --output json
```
## KEDA Rule Operations

Scale based on HTTP concurrency:

```bash
az containerapp update \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --scale-rule-name "http-concurrency" \
  --scale-rule-type "http" \
  --scale-rule-metadata "concurrentRequests=100" \
  --min-replicas 1 \
  --max-replicas 20
```
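The HTTP scaler targets a per-replica concurrency, so you can estimate the replica count a rule will converge to before applying it. A minimal sketch with hypothetical load figures (neither value comes from a real measurement):

```bash
# Hypothetical peak load across the whole app.
TOTAL_CONCURRENT=750
THRESHOLD=100          # matches concurrentRequests in the rule above

# KEDA aims for THRESHOLD concurrent requests per replica,
# so the expected count is roughly ceil(total / threshold).
EXPECTED=$(( (TOTAL_CONCURRENT + THRESHOLD - 1) / THRESHOLD ))
echo "expected replicas: $EXPECTED"   # prints: expected replicas: 8
```

If the expected count exceeds `--max-replicas`, requests queue at the cap; raise the cap or the threshold deliberately rather than letting it clip silently.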
Example queue scaler operation (Azure Service Bus):

```bash
az containerapp update \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --scale-rule-name "sb-queue" \
  --scale-rule-type "azure-servicebus" \
  --scale-rule-metadata "queueName=orders" "messageCount=50" "namespace=<servicebus-namespace>.servicebus.windows.net"
```
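The `azure-servicebus` scaler targets `messageCount` messages per replica, so its steady-state replica count can also be estimated by hand. A sketch with an assumed queue depth (substitute a real value from your monitoring):

```bash
# Assumed queue depth; not taken from any real queue.
QUEUE_DEPTH=430
MESSAGE_COUNT=50   # matches messageCount in the rule above

# Expected replicas ~= ceil(queue depth / messageCount).
EXPECTED=$(( (QUEUE_DEPTH + MESSAGE_COUNT - 1) / MESSAGE_COUNT ))
echo "expected replicas: $EXPECTED"   # prints: expected replicas: 9
```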
Use Azure Monitor metrics to tune thresholds:

```bash
az monitor metrics list \
  --resource "/subscriptions/<subscription-id>/resourceGroups/$RG/providers/Microsoft.App/containerApps/$APP_NAME" \
  --metric "Requests" \
  --interval "PT1M" \
  --output table
```
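Metric output becomes useful for tuning once it is normalized to a per-replica rate. A sketch with hypothetical numbers (neither value comes from a real query):

```bash
REQ_PER_MIN=12000   # hypothetical Requests metric over one minute
REPLICAS=6          # hypothetical current replica count

PER_REPLICA_MIN=$(( REQ_PER_MIN / REPLICAS ))
PER_REPLICA_SEC=$(( PER_REPLICA_MIN / 60 ))
echo "per replica: ${PER_REPLICA_MIN}/min (~${PER_REPLICA_SEC}/s)"
```

Note that request rate and request concurrency are not the same quantity: concurrency is roughly rate multiplied by average latency (Little's law), so convert before comparing the result against a `concurrentRequests` target.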
## Scale-to-Zero Operations

Enable scale-to-zero for event-driven or intermittent workloads:

```bash
az containerapp update \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --min-replicas 0 \
  --max-replicas 10
```
Use this mode only when cold start impact is acceptable.
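One way to limit cold-start exposure is to gate the minimum replica count on the time of day, keeping a warm replica during busy hours and allowing scale-to-zero otherwise. A sketch assuming a hypothetical 08:00-18:00 UTC busy window:

```bash
# Assumed busy window in UTC; adjust to your traffic pattern.
HOUR=$(date -u +%H)
if [ "$HOUR" -ge 8 ] && [ "$HOUR" -lt 18 ]; then
  MIN_REPLICAS=1   # keep one warm replica during the busy window
else
  MIN_REPLICAS=0   # allow scale-to-zero off-hours
fi
echo "using --min-replicas $MIN_REPLICAS"
# then apply it, e.g.:
#   az containerapp update --name "$APP_NAME" --resource-group "$RG" \
#     --min-replicas "$MIN_REPLICAS"
```

Run from a scheduled pipeline, this trades a small baseline cost during the window for predictable first-request latency.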
## Verification Steps

```bash
az containerapp show \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --query "{minReplicas:properties.template.scale.minReplicas,maxReplicas:properties.template.scale.maxReplicas,rules:properties.template.scale.rules}" \
  --output json
```
Example output (PII masked):

```json
{
  "minReplicas": 0,
  "maxReplicas": 20,
  "rules": [
    {
      "name": "http-concurrency",
      "custom": {
        "type": "http",
        "metadata": {
          "concurrentRequests": "100"
        }
      }
    }
  ]
}
```
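The same `--query` mechanism can back a guardrail check in a deployment pipeline, failing the run if the replica cap drifts past policy. A sketch with a hard-coded value standing in for the live query:

```bash
MAX_ALLOWED=20

# In a real pipeline this would come from the live app, e.g.:
#   az containerapp show --name "$APP_NAME" --resource-group "$RG" \
#     --query "properties.template.scale.maxReplicas" -o tsv
CURRENT_MAX=20

if [ "$CURRENT_MAX" -gt "$MAX_ALLOWED" ]; then
  echo "maxReplicas $CURRENT_MAX exceeds guardrail $MAX_ALLOWED" >&2
  exit 1
fi
echo "replica guardrail ok"
```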
## Scaling Decision Framework

```mermaid
flowchart LR
    A[Traffic or Queue Increase] --> B{Workload Type}
    B -->|HTTP interactive| C[HTTP concurrency rule]
    B -->|Queue driven| D[Event scaler rule]
    B -->|Scheduled batch| E[Container Apps Job]
    C --> F[Set min/max replicas]
    D --> F
    E --> G[Set parallelism and retry]
```

| Symptom | Primary Knob | First Adjustment | Validation Signal |
|---|---|---|---|
| p95 latency rises while CPU is moderate | HTTP concurrency threshold | Lower the `concurrentRequests` target | Latency drops without excessive replicas |
| Queue delay grows steadily | Queue message threshold | Decrease the `messageCount` trigger | Queue depth recovers within SLO |
| Cost spike overnight | Min replicas and max guardrail | Reduce `--min-replicas` and cap `--max-replicas` | Cost trend normalizes with acceptable latency |
| Frequent cold starts | Minimum replicas | Raise `--min-replicas` from 0 to 1-2 | Startup-related errors decrease |
**Tune one variable at a time.** Change only one scaler parameter per evaluation cycle so you can attribute impact correctly.

**Scaling cannot fix application bottlenecks alone.** If database limits or downstream API quotas are saturated, adding replicas may increase the failure rate. Validate dependency capacity before aggressive scale-out.
## Troubleshooting

### Autoscaling does not trigger
- Confirm scaler metadata values and key names.
- Check if incoming load actually reaches configured thresholds.
- Validate identity/secret references for external event sources.
Inspect the environment's system logs for scaler activity:

```bash
az containerapp logs show \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --type system \
  --follow false
```
## Advanced Topics
- Combine multiple KEDA rules and set max replica guardrails.
- Separate interactive and batch workloads into different apps.
- Define pre-warming strategies for predictable peak windows.