Min Replica Change Impact
Reducing minReplicas (e.g., 100 → 50) is a common Day-2 operation for cost optimization. This page documents the actual runtime behavior, verified with a live Azure deployment.
Prerequisites
- Existing Container App with active traffic
- Azure CLI with the `containerapp` extension
When to Use
- Reducing guaranteed replica count for cost savings
- Adjusting scaling floor after peak traffic period ends
- Right-sizing after initial over-provisioning
Key Question
Does reducing `minReplicas` cause downtime?
No. The change is applied without restarting or immediately terminating existing replicas. The KEDA autoscaler keeps evaluating current load and, once its cooldown window (about 5 minutes by default) has passed, gradually scales down to the new minimum.
Procedure
Change Min Replicas
| Command/Parameter | Purpose |
|---|---|
| `az containerapp update` | Updates container app configuration |
| `--min-replicas 2` | Sets the new minimum replica count |
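As a minimal sketch, the update described in the table above could be wrapped in a small shell function. The function name `set_min_replicas` and the placeholder names `my-app` and `my-rg` are illustrative assumptions, not values from the lab; only `az containerapp update` and `--min-replicas` come from this page.

```bash
# Sketch: lower the scaling floor of an existing Container App.
# Usage: set_min_replicas <app-name> <resource-group> <min-replicas>
set_min_replicas() {
  local app="$1" rg="$2" min="$3"
  # Reject anything that is not a non-negative integer before calling Azure.
  case "$min" in
    ''|*[!0-9]*)
      echo "min-replicas must be a non-negative integer" >&2
      return 2
      ;;
  esac
  az containerapp update \
    --name "$app" \
    --resource-group "$rg" \
    --min-replicas "$min"
}

# Example call (hypothetical app and resource group names):
# set_min_replicas my-app my-rg 2
```

Validating the argument first avoids sending a malformed request; the function otherwise passes the values straight through to the CLI.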
Monitor Replica Count
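One way to watch the replica count while the change settles is to poll `az containerapp replica list` and count the entries it returns. The sketch below is an assumption about how such monitoring could be done, not the exact commands used in the lab; the function names and the `my-app` / `my-rg` placeholders are hypothetical. Without `--revision`, the command reports replicas of the latest revision.

```bash
# Print the current replica count of a Container App.
# Usage: replica_count <app-name> <resource-group>
replica_count() {
  az containerapp replica list \
    --name "$1" \
    --resource-group "$2" \
    --query "length(@)" \
    --output tsv
}

# Print a timestamped replica count <samples> times, <interval> seconds apart.
# Usage: watch_replicas <app-name> <resource-group> <samples> <interval-seconds>
watch_replicas() {
  local app="$1" rg="$2" samples="$3" interval="$4" i=1
  while [ "$i" -le "$samples" ]; do
    echo "$(date -u +%H:%M:%S) replicas=$(replica_count "$app" "$rg")"
    # Sleep between samples, but not after the last one.
    if [ "$i" -lt "$samples" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Example call (hypothetical names): 20 samples, 30 seconds apart (~10 minutes).
# watch_replicas my-app my-rg 20 30
```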
Behavior Summary
flowchart TD
A[Admin changes minReplicas 5→2] --> B[New revision created]
B --> C[Existing 5 replicas continue serving]
C --> D{Traffic still active?}
D -->|Yes| E[Autoscaler maintains current count]
D -->|No| F[Cooldown period ~5min]
F --> G[Gradual scale-down begins]
G --> H[Reaches new min=2]

| Phase | Duration | Behavior |
|---|---|---|
| Configuration update | ~20s | CLI returns; new revision created |
| Active traffic period | Indefinite | Replicas stay at current count (autoscaler evaluates load) |
| Cooldown after traffic stops | ~5 min (KEDA default 300s) | No scale-down yet |
| Gradual termination | 3–5 min | Replicas terminated one at a time with connection draining |
| Steady state | — | Replica count = new minReplicas |
Validated Results
Lab Validation: 2026-05-18, az CLI 2.73.0, Korea Central
Test Environment: External ingress Container App, mcr.microsoft.com/k8se/quickstart:latest, single-revision mode.
| # | Test | Result |
|---|---|---|
| 1 | Initial replica count (min=5) | ✅ 5 replicas running |
| 2 | Continuous HTTP requests during change (120 requests) | ✅ All returned HTTP 200 |
| 3 | Non-200 responses during/after change | ✅ Zero errors |
| 4 | Replica count immediately after change | 5 (unchanged) |
| 5 | Replica count after ~5min cooldown | 4 (gradual reduction started) |
| 6 | Replica count after ~8min | 2 (reached new minimum) |
| 7 | App responsiveness at min=2 | ✅ HTTP 200 |
Key Findings:
- [Observed] Zero downtime — all 120 requests during the min replica change returned HTTP 200.
- [Observed] Existing replicas are NOT immediately terminated. The autoscaler waits for the KEDA cooldown period (~5 min) before starting gradual scale-down.
- [Observed] Scale-down is gradual, not instant: the sampled counts went 5 → 4 → 2 rather than dropping straight from 5 to 2, consistent with replicas being drained one at a time.
- [Observed] Total time from change to reaching new minimum: approximately 8 minutes (with no traffic).
- [Observed] If traffic is still active during the change, the autoscaler keeps replicas at the level needed to handle load, regardless of the new lower minimum.
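A check like the "120 requests, all HTTP 200" result above can be reproduced with a simple request loop run while the update is applied. This is a sketch under assumptions: the lab's actual load generator is not shown on this page, and the function name, the 5-second timeout, and the example URL are hypothetical.

```bash
# Send <count> GET requests to <url>, pausing <interval> seconds after each,
# and print how many did not return HTTP 200.
# Usage: count_non_200 <url> <count> <interval-seconds>
count_non_200() {
  local url="$1" count="$2" interval="$3" failures=0 i=1 code
  while [ "$i" -le "$count" ]; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
    # If curl itself fails (timeout, connection reset), record the request as 000.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || code="000"
    [ "$code" = "200" ] || failures=$((failures + 1))
    i=$((i + 1))
    sleep "$interval"
  done
  echo "$failures"
}

# Example call (hypothetical URL): 120 requests, one per second.
# count_non_200 "https://my-app.example.com/" 120 1
```

A printed value of `0` corresponds to the zero-error outcome reported in the table.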
Impact Assessment for Production (100 → 50)
| Concern | Impact | Explanation |
|---|---|---|
| Immediate downtime | None | Existing replicas continue serving |
| Request failures during change | None | No connection resets or 5xx |
| Cold starts | Possible later | If traffic drops and stays low, only 50 replicas remain. Sudden spike needs scale-up time. |
| Scaling speed back up | ~30s per replica | If load increases, KEDA scales up from 50 (not from 0) |
Peak Traffic Consideration
If your service consistently needs 80+ replicas under normal load, reducing min to 50 means the autoscaler must scale up 30+ replicas on every traffic spike. Factor in the ~30s/replica scale-up time for your latency SLO.
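To put a number on that consideration, the sketch below multiplies the replica gap by the per-replica start time. It assumes, pessimistically, that replicas start strictly one after another; this page does not say whether scale-up is sequential or parallel, so treat the result as an upper bound rather than a measurement. The function name is hypothetical.

```bash
# Upper-bound estimate of scale-up time, ASSUMING replicas start sequentially.
# Usage: estimate_scale_up_seconds <needed-replicas> <min-replicas> <seconds-per-replica>
estimate_scale_up_seconds() {
  local needed="$1" min="$2" per_replica="$3" gap
  gap=$((needed - min))
  # No scale-up is needed when the floor already covers the load.
  [ "$gap" -gt 0 ] || gap=0
  echo $((gap * per_replica))
}

# 80 replicas needed, floor of 50, ~30s per replica:
estimate_scale_up_seconds 80 50 30   # prints 900
```

If replicas actually start in parallel, the real figure would be closer to a single ~30s start; measuring a ramp-up in your own environment is the only way to know which applies.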
Recommended Approach
- Change min replicas during a low-traffic window
- Monitor p95 latency for the next 24h
- If latency spikes appear during traffic ramp-up, raise min replicas back to the previous value
Rollback
If issues are observed after reducing min replicas, set `--min-replicas` back to the previous value with `az containerapp update`.
Scale-up toward the previous minimum begins right away; new replicas start within ~30 seconds.
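The rollback can be sketched as two small helpers: one that records the configured minimum before the change, and one that restores it. The function names and the `my-app` / `my-rg` placeholders are hypothetical; the query path `properties.template.scale.minReplicas` is where the scale floor appears in `az containerapp show` output, but verify it against your own app before relying on it.

```bash
# Print the currently configured minReplicas of a Container App.
# Usage: get_min_replicas <app-name> <resource-group>
get_min_replicas() {
  az containerapp show \
    --name "$1" \
    --resource-group "$2" \
    --query "properties.template.scale.minReplicas" \
    --output tsv
}

# Restore a previously recorded minimum.
# Usage: rollback_min_replicas <app-name> <resource-group> <previous-min>
rollback_min_replicas() {
  az containerapp update \
    --name "$1" \
    --resource-group "$2" \
    --min-replicas "$3"
}

# Example flow (hypothetical names):
# previous_min=$(get_min_replicas my-app my-rg)   # record before lowering the floor
# rollback_min_replicas my-app my-rg "$previous_min"
```

Recording the value before the change avoids having to remember or look up the old floor during an incident.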