# Scaling in Azure Container Apps with KEDA
Azure Container Apps uses KEDA (Kubernetes Event-Driven Autoscaling) to scale replicas based on demand signals such as HTTP requests, queue depth, and custom metrics.
This model enables both reactive scale-out and cost-efficient scale-in, including scale-to-zero in supported scenarios.
## How KEDA-Based Scaling Works
```mermaid
flowchart LR
    M[Metric Source\nHTTP / Queue / Custom] --> K[KEDA Scaler]
    K --> D[Desired Replica Count]
    D --> R[Container App Revision Replicas]
    R --> O[Observed Throughput/Latency]
    O --> M
```

KEDA continuously evaluates scale rules and updates the desired replica count within the configured bounds.
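Conceptually, each evaluation cycle answers "how many replicas would meet the per-replica target?", then clamps the answer to the configured bounds. A rough Python sketch of that decision (function and parameter names are illustrative, not an actual KEDA API):

```python
import math

def desired_replicas(metric_value: float, target_per_replica: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Scale to meet the per-replica target, clamped to [min, max]."""
    if metric_value <= 0:
        needed = 0  # no pending work: allow scale-to-zero if min_replicas is 0
    else:
        needed = math.ceil(metric_value / target_per_replica)
    return max(min_replicas, min(max_replicas, needed))

# 95 queued messages at a target of 20 per replica -> 5 replicas
print(desired_replicas(95, 20, min_replicas=0, max_replicas=10))  # 5
# No pending work with min_replicas=0 -> scale to zero
print(desired_replicas(0, 20, min_replicas=0, max_replicas=10))   # 0
```

The clamp is why the min/max settings in the next section act as guardrails: the metric proposes a count, but the bounds always win.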
**Note: Scaling is revision-scoped**

Scale decisions apply to the active revision(s) receiving traffic. During progressive rollouts, verify scaling behavior for each revision in the active traffic mix.
## Min and Max Replicas
- minReplicas: lower bound of warm capacity.
- maxReplicas: upper bound to protect cost and downstream dependencies.
Think of these as your scaling guardrails:
| Setting | Primary Effect | Common Use |
|---|---|---|
| minReplicas = 0 | Lowest idle cost, potential cold starts | Event-driven/background workloads |
| minReplicas > 0 | Faster response, warm baseline | Public APIs with latency targets |
| maxReplicas tuned low | Controls blast radius | Protect fragile dependencies |
| maxReplicas tuned high | Handles bursts | High-volume services with resilient backends |
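These guardrails map to the `scale` section of a Container App template. A minimal sketch (values are illustrative, not recommendations):

```yaml
properties:
  template:
    scale:
      minReplicas: 1   # keep one warm replica for latency-sensitive traffic
      maxReplicas: 10  # cap cost and protect downstream dependencies
```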
## Scale Rule Types (Conceptual)
| Rule Type | Trigger Signal | Typical Workload |
|---|---|---|
| HTTP | Concurrency/request pressure | APIs and web frontends |
| Queue/Event | Queue depth or event lag | Workers and async processing |
| CPU/Memory (supporting signal) | Resource pressure | Compute-heavy containers |
| Custom metrics | Domain KPI | Advanced autoscaling strategies |
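Concretely, each rule names a trigger and a per-replica target. A sketch of an HTTP rule and an Azure Storage queue rule in Container Apps YAML (queue name, target values, and the secret reference are placeholders):

```yaml
scale:
  minReplicas: 0
  maxReplicas: 30
  rules:
    - name: http-rule
      http:
        metadata:
          concurrentRequests: "50"  # target concurrent requests per replica
    - name: queue-rule
      azureQueue:
        queueName: orders           # placeholder queue name
        queueLength: 20             # target messages per replica
        auth:
          - secretRef: queue-connection-string  # placeholder secret name
            triggerParameter: connection
```

When multiple rules are defined, the rule demanding the most replicas wins, so each rule can be tuned to its own signal independently.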
## Practical Example: API + Worker Pattern
```mermaid
graph TD
    U[Users] --> API[API App\nminReplicas: 1]
    API --> Q[Queue]
    Q --> W[Worker App\nminReplicas: 0]
```

- API keeps one warm replica for predictable latency.
- Worker scales from zero when queue depth rises.
- Both apps can scale independently even inside one environment.
## Common Scaling Trade-offs
- Lower idle cost vs cold-start sensitivity.
- Aggressive scale-out vs downstream database saturation.
- High max replicas vs budget predictability.
Good scaling design balances user experience, system stability, and cost controls.
**Warning: Max replicas without dependency limits can cause outages**
Aggressive scale-out can overload databases, caches, or third-party APIs. Set max replicas based on downstream capacity, not only frontend demand.
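One way to pick a defensible ceiling is to work backward from downstream limits rather than from expected traffic. A hedged sketch (the function name, the sample numbers, and the 20% headroom are illustrative assumptions):

```python
import math

def max_replicas_for_downstream(downstream_max_connections: int,
                                connections_per_replica: int,
                                headroom: float = 0.2) -> int:
    """Cap replicas so total connections stay below the downstream
    limit, reserving a fraction of capacity as headroom."""
    usable = downstream_max_connections * (1 - headroom)
    return max(1, math.floor(usable / connections_per_replica))

# A database allowing 500 connections, 10 per replica, 20% headroom -> cap at 40
print(max_replicas_for_downstream(500, 10))  # 40
```

A ceiling derived this way keeps a burst of frontend demand from translating into connection exhaustion at the database.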
## Advanced Topics
- Coordinated scaling policies for multi-service pipelines.
- Using custom metrics to scale on business throughput, not just infrastructure signals.
- Managing revision-level scaling behavior during canary traffic splits.
## See Also
- How Container Apps Works
- Environments and Apps
- Networking
- Revision Management and Traffic Splitting
- KEDA open-source scalers documentation