Scaling in Azure Container Apps with KEDA

Azure Container Apps uses KEDA (Kubernetes Event-Driven Autoscaling) to scale replicas based on demand signals such as HTTP requests, queue depth, and custom metrics.

This model enables both reactive scale-out and cost-efficient scale-in, including scale-to-zero in supported scenarios.

How KEDA-Based Scaling Works

```mermaid
flowchart LR
    M[Metric Source\nHTTP / Queue / Custom] --> K[KEDA Scaler]
    K --> D[Desired Replica Count]
    D --> R[Container App Revision Replicas]
    R --> O[Observed Throughput/Latency]
    O --> M
```

KEDA continuously evaluates the configured scale rules and adjusts the desired replica count, always staying within the configured minimum and maximum bounds.
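The core of this loop can be sketched as a simple calculation. The snippet below is a simplified illustration of the replica-count formula KEDA-style autoscalers use (ceiling of metric value divided by the per-replica target, clamped to the configured bounds); it is not the actual KEDA implementation.

```python
import math

def desired_replicas(metric_value: float, target_per_replica: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified autoscaling decision: how many replicas are needed so
    each one carries at most `target_per_replica` of the metric, clamped
    to the [min_replicas, max_replicas] guardrails."""
    if metric_value <= 0:
        raw = 0  # no demand: allows scale-to-zero when min_replicas is 0
    else:
        raw = math.ceil(metric_value / target_per_replica)
    return max(min_replicas, min(raw, max_replicas))

# 250 concurrent requests, target 100 per replica, bounds 1..10 -> 3 replicas
print(desired_replicas(250, 100, 1, 10))
```

Note how the bounds dominate the metric: with a queue of 5,000 messages and `max_replicas=10`, the result is still 10, which is exactly the "guardrail" behavior described below.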

Scaling is revision-scoped

Scale decisions apply to the active revision(s) receiving traffic. During progressive rollouts, evaluate scaling behavior for each revision in the active traffic mix, since each revision scales independently.

Min and Max Replicas

  • minReplicas: lower bound of warm capacity.
  • maxReplicas: upper bound to protect cost and downstream dependencies.

Think of these as your scaling guardrails:

| Setting | Primary Effect | Common Use |
| --- | --- | --- |
| minReplicas = 0 | Lowest idle cost, potential cold starts | Event-driven/background workloads |
| minReplicas > 0 | Faster response, warm baseline | Public APIs with latency targets |
| maxReplicas tuned low | Controls blast radius | Protects fragile dependencies |
| maxReplicas tuned high | Handles bursts | High-volume services with resilient backends |
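In a Container Apps template, these guardrails live in the `scale` section. The fragment below is a minimal sketch (property names follow the Container Apps scale schema; verify values against the current Azure documentation for your API version):

```yaml
# Fragment of a Container Apps template -- scaling guardrails only.
properties:
  template:
    scale:
      minReplicas: 1    # warm baseline for predictable latency
      maxReplicas: 10   # cap to protect cost and downstream dependencies
```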

Scale Rule Types (Conceptual)

| Rule Type | Trigger Signal | Typical Workload |
| --- | --- | --- |
| HTTP | Concurrency/request pressure | APIs and web frontends |
| Queue/Event | Queue depth or event lag | Workers and async processing |
| CPU/Memory (supporting signal) | Resource pressure | Compute-heavy containers |
| Custom metrics | Domain KPI | Advanced autoscaling strategies |

Note that CPU and memory rules cannot scale to zero; a rule set that relies only on resource signals keeps at least one replica running.
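An HTTP rule, for example, targets a number of concurrent requests per replica. This is a minimal sketch of such a rule (the `http` rule shape and `concurrentRequests` metadata key follow the Container Apps scale schema; the rule name and values are illustrative):

```yaml
scale:
  minReplicas: 1
  maxReplicas: 20
  rules:
    - name: http-rule            # illustrative name
      http:
        metadata:
          concurrentRequests: "50"   # target concurrent requests per replica
```

With this rule, sustained traffic of roughly 500 concurrent requests would drive the app toward 10 replicas, still inside the 1–20 bounds.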

Practical Example: API + Worker Pattern

```mermaid
graph TD
    U[Users] --> API[API App\nminReplicas: 1]
    API --> Q[Queue]
    Q --> W[Worker App\nminReplicas: 0]
```

  • API keeps one warm replica for predictable latency.
  • Worker scales from zero when queue depth rises.
  • Both apps can scale independently even inside one environment.
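The worker side of this pattern is typically expressed as a custom rule backed by a KEDA scaler. The fragment below is a sketch for a Service Bus queue using the KEDA `azure-servicebus` scaler; the queue name and secret reference are hypothetical placeholders:

```yaml
# Worker app: scale from zero on queue depth.
scale:
  minReplicas: 0                    # idle cost is zero
  maxReplicas: 5
  rules:
    - name: queue-depth             # illustrative name
      custom:
        type: azure-servicebus      # KEDA scaler type
        metadata:
          queueName: orders         # hypothetical queue name
          messageCount: "20"        # target messages per replica
        auth:
          - secretRef: sb-connection    # hypothetical secret holding the connection string
            triggerParameter: connection
```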

Common Scaling Trade-offs

  • Lower idle cost vs cold-start sensitivity.
  • Aggressive scale-out vs downstream database saturation.
  • High max replicas vs budget predictability.

Good scaling design balances user experience, system stability, and cost controls.

Max replicas without dependency limits can cause outages

Aggressive scale-out can overload databases, caches, or third-party APIs. Set max replicas based on downstream capacity, not only frontend demand.
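One way to ground that guidance: derive the replica cap from the tightest downstream limit rather than picking it by feel. The helper below is a back-of-the-envelope sketch (the function name, the 80% headroom default, and the connection-count framing are assumptions for illustration):

```python
def safe_max_replicas(downstream_limit: int, per_replica_usage: int,
                      headroom: float = 0.8) -> int:
    """Cap replicas so aggregate usage of a downstream resource (e.g.
    database connections) stays under a headroom fraction of its hard
    limit. Always allows at least one replica."""
    budget = downstream_limit * headroom          # usable share of the limit
    return max(1, int(budget // per_replica_usage))

# DB allows 500 connections, each replica opens a pool of 20:
# 80% headroom -> 400 connections -> cap at 20 replicas.
print(safe_max_replicas(500, 20))
```

If the computed cap is lower than the replica count your traffic model demands, that is a signal to add pooling, caching, or capacity downstream before raising `maxReplicas`.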

Advanced Topics

  • Coordinated scaling policies for multi-service pipelines.
  • Using custom metrics to scale on business throughput, not just infrastructure signals.
  • Managing revision-level scaling behavior during canary traffic splits.
