HTTP Scaling Not Triggering¶
1. Summary¶
Symptom¶
Request load increases significantly but replica count remains flat or scales too slowly. Users experience high latency while the app appears under-provisioned. Load tests do not produce expected scale-out behavior. KEDA scalers are configured but seem inactive.
Why this scenario is confusing¶
HTTP scaling involves multiple components: KEDA metrics server, stabilization windows, threshold calculations, and min/max boundaries. A misconfiguration in any of these can make it look like "scaling is broken" when it's actually working as configured—just not as intended.
Troubleshooting decision flow¶
graph TD
A[Symptom: Latency high, replicas flat] --> B{maxReplicas reached?}
B -->|Yes| H1[H1: Max replicas too low]
B -->|No| C{HTTP scale rule exists?}
C -->|No| H2[H2: No HTTP scale rule configured]
C -->|Yes| D{Traffic hitting ingress?}
D -->|No| H3[H3: Test traffic bypassing app]
D -->|Yes| E{Threshold realistic for load?}
E -->|No| H4[H4: Threshold too high]
E -->|Yes| H5[H5: Stabilization window or polling delay]
2. Common Misreadings¶
- "KEDA is broken" — Most incidents are min/max boundaries, threshold mismatches, or test configuration issues.
- "Any traffic should instantly scale out" — HTTP scaling has polling intervals (15-30s) and stabilization windows.
- "Replicas didn't increase, so something is wrong" — If load is below threshold, no scaling is expected behavior.
- "I set concurrentRequests=10, so 100 requests should give me 10 replicas" — It's concurrent requests per replica, not total requests.
- "Scale worked in staging but not production" — Different traffic patterns, DNS, or ingress configurations.
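The per-replica misreading above is worth making concrete. A minimal sketch of the math, using integer ceiling division in shell (the values are illustrative, not from a real app):

```shell
# concurrentRequests is a per-replica concurrency target, so 100 requests only
# imply 10 replicas if all 100 are in flight at the same moment
threshold=10      # configured concurrentRequests
in_flight=100     # concurrent (not cumulative) requests
# ceil(in_flight / threshold) via integer arithmetic
echo $(( (in_flight + threshold - 1) / threshold ))
```

If those 100 requests arrive spread over a minute with only ~8 in flight at once, the desired replica count is 1, not 10.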
3. Competing Hypotheses¶
| Hypothesis | Typical Evidence For | Typical Evidence Against |
|---|---|---|
| H1: Max replicas too low | Replica count capped at configured max during load | Replicas remain well below max |
| H2: No HTTP scale rule configured | No scale rules in template, only min/max | HTTP rule exists in configuration |
| H3: Test traffic bypassing app | No request logs during load test | App logs show request surge |
| H4: Threshold too high | Concurrent requests < threshold × replica count | Traffic exceeds threshold significantly |
| H5: Stabilization window delay | Scale-out happens but delayed by 30-60 seconds | Immediate scaling when expected |
4. What to Check First¶
Metrics¶
- Request rate and concurrent connections over time
- P95/P99 latency correlation with replica count
- Replica count timeline during load test
Logs¶
let AppName = "ca-myapp";
ContainerAppSystemLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(1h)
| where Reason_s has_any ("KEDAScaler", "Scaling", "Replica")
or Log_s has_any ("scale", "keda", "replica", "http")
| project TimeGenerated, RevisionName_s, Reason_s, Log_s
| order by TimeGenerated desc
Platform Signals¶
# Check scale configuration
az containerapp show --name "$APP_NAME" --resource-group "$RG" \
--query "properties.template.scale" --output json
# Check current replica count
az containerapp replica list --name "$APP_NAME" --resource-group "$RG" --output table
# Check system logs for scaling events
az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type system
5. Evidence to Collect¶
Required Evidence¶
| Evidence | Command/Query | Purpose |
|---|---|---|
| Scale config | az containerapp show ... --query template.scale | Verify min/max/rules |
| Replica count | az containerapp replica list | Current state |
| System logs | KQL for scaling events | KEDA activity |
| Request logs | Console logs during load | Traffic confirmation |
| Ingress metrics | Azure Portal metrics | Request rate |
Useful Context¶
- Expected concurrent requests during peak
- Load test tool and configuration
- Target FQDN being tested
- Previous scaling behavior (if any)
6. Validation and Disproof by Hypothesis¶
H1: Max replicas too low¶
Signals that support:
- Replica count reaches exactly maxReplicas and stays there
- Latency keeps increasing while at max
- System logs show no further scale-out attempts
Signals that weaken:
- Replica count stays well below max
- Max is set high (e.g., 30) but only 2 replicas running
What to verify:
# Check min/max configuration
az containerapp show --name "$APP_NAME" --resource-group "$RG" \
--query "properties.template.scale.{min:minReplicas,max:maxReplicas}" --output json
# Check if currently at max
CURRENT=$(az containerapp replica list --name "$APP_NAME" --resource-group "$RG" --query "length(@)" --output tsv)
MAX=$(az containerapp show --name "$APP_NAME" --resource-group "$RG" --query "properties.template.scale.maxReplicas" --output tsv)
echo "Current: $CURRENT, Max: $MAX"
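To turn the two values above into a verdict, a small follow-up check can compare them directly. This is a sketch with hard-coded sample values standing in for the az CLI outputs captured above:

```shell
# sample values; in practice reuse the CURRENT/MAX variables populated by the
# az CLI commands above
CURRENT=5
MAX=5
if [ "$CURRENT" -ge "$MAX" ]; then
  echo "capped: replica count is at maxReplicas (H1 confirmed)"
else
  echo "below max: H1 is unlikely, check H2-H5"
fi
```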
// Check replica count over time
let AppName = "ca-myapp";
ContainerAppSystemLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(2h)
| where Reason_s has_any ("AssigningReplica", "TerminatingReplica", "Scaling")
| summarize count() by bin(TimeGenerated, 5m), Reason_s
| render timechart
Fix:
# Increase max replicas
az containerapp update --name "$APP_NAME" --resource-group "$RG" \
--max-replicas 20
H2: No HTTP scale rule configured¶
Signals that support:
- Scale rules array is empty or null
- Only min/max specified without rules
- Scale to zero is disabled (minReplicas >= 1), but there is no rule to scale out beyond that floor
Signals that weaken:
- HTTP rule present in configuration
- KEDAScalersStarted events in system logs
What to verify:
# Check if HTTP scale rule exists
az containerapp show --name "$APP_NAME" --resource-group "$RG" \
--query "properties.template.scale.rules" --output json
# Expected output for HTTP rule:
# [
# {
# "name": "http-rule",
# "http": {
# "metadata": {
# "concurrentRequests": "10"
# }
# }
# }
# ]
// Check for KEDA scaler activity
let AppName = "ca-myapp";
ContainerAppSystemLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(1h)
| where Reason_s == "KEDAScalersStarted"
| project TimeGenerated, Log_s
Fix:
# Add HTTP scale rule
az containerapp update --name "$APP_NAME" --resource-group "$RG" \
--scale-rule-name "http-rule" \
--scale-rule-type "http" \
--scale-rule-http-concurrency 10
# Or with full scale configuration
az containerapp update --name "$APP_NAME" --resource-group "$RG" \
--min-replicas 1 \
--max-replicas 10 \
--scale-rule-name "http-rule" \
--scale-rule-type "http" \
--scale-rule-http-concurrency 10
H3: Test traffic bypassing app¶
Signals that support:
- Load test shows requests sent, but no app logs
- Testing wrong URL (e.g., old revision, wrong environment)
- DNS not resolving to correct environment
Signals that weaken:
- App logs show request surge matching test
- Latency increases correlate with test timing
What to verify:
# Get correct FQDN
APP_FQDN=$(az containerapp show --name "$APP_NAME" --resource-group "$RG" \
--query "properties.configuration.ingress.fqdn" --output tsv)
echo "Correct FQDN: $APP_FQDN"
# Test connectivity
curl -s -o /dev/null -w "%{http_code}" "https://${APP_FQDN}/health"
# Check DNS resolution
nslookup "$APP_FQDN"
// Check if requests are hitting the app
let AppName = "ca-myapp";
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(30m)
| where Log_s has_any ("GET", "POST", "request", "200", "404")
| summarize RequestCount=count() by bin(TimeGenerated, 1m)
| render timechart
Fix:
- Verify load test targets correct FQDN
- Check if ingress is external (for internet tests) or internal (for VNet tests)
- Ensure revision receiving traffic is the one with scale rules
H4: Threshold too high¶
Signals that support:
- concurrentRequests set very high (e.g., 100) but traffic is lower
- Replica count stays at min even under load
- Math doesn't work out: 50 concurrent / 100 threshold = ceil(0.5) = 1 replica, so the count stays pinned at the minimum
Signals that weaken:
- Threshold is low (e.g., 10) and traffic exceeds it significantly
- Scale-out does happen, just slowly
What to verify:
# Check configured threshold
az containerapp show --name "$APP_NAME" --resource-group "$RG" \
--query "properties.template.scale.rules[?http].http.metadata.concurrentRequests" --output tsv
# Calculate expected replicas
# Formula: ceil(concurrent_requests / threshold)
# Example: 75 concurrent / 10 threshold = 8 replicas needed
Fix:
# Lower the threshold for more aggressive scaling
az containerapp update --name "$APP_NAME" --resource-group "$RG" \
--scale-rule-name "http-rule" \
--scale-rule-type "http" \
--scale-rule-http-concurrency 5 # Lower = more aggressive scaling
H5: Stabilization window delay¶
Signals that support:
- Scale-out happens but takes 30-60+ seconds
- KEDA metrics visible but replica changes lag
- Bursty traffic triggers scaling late
Signals that weaken:
- No scaling even after extended load period
- Scale events don't appear at all
What to verify:
// Check timing between load and scale events
let AppName = "ca-myapp";
ContainerAppSystemLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(1h)
| where Reason_s has_any ("KEDAScaler", "AssigningReplica", "Scaling")
| project TimeGenerated, Reason_s, Log_s
| order by TimeGenerated asc
Explanation:
KEDA has built-in stabilization to prevent thrashing:
- Polling interval: metrics are checked every 15-30 seconds
- Scale-up stabilization: metrics must stay above the threshold before scale-out
- Scale-down stabilization: a longer wait (5+ minutes) before reducing replicas
This is expected behavior, not a bug.
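A toy model of the effect, purely for intuition (this is not KEDA's actual algorithm; the real behavior combines the polling interval with the HPA stabilization window): scaling only fires once the metric has exceeded the threshold on several consecutive polls, so a burst that subsides quickly never triggers scale-out.

```shell
# toy model: require the metric to exceed the threshold on 3 consecutive
# polls before emitting a scale-out signal
threshold=10
needed=3
streak=0
for metric in 4 12 15 18 20; do   # simulated per-poll concurrent requests
  if [ "$metric" -gt "$threshold" ]; then
    streak=$((streak + 1))
  else
    streak=0
  fi
  [ "$streak" -ge "$needed" ] && echo "scale-out signal at metric=$metric"
done
```

In this simulation the signal first fires on the fourth poll, which is why sustained load scales out while a short spike does not.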
7. Likely Root Cause Patterns¶
| Pattern | Frequency | First Signal | Typical Resolution |
|---|---|---|---|
| Threshold too high | Very common | Replicas stay at min | Lower concurrentRequests |
| Max replicas too low | Common | Capped at max during load | Increase maxReplicas |
| No scale rule | Common | Empty rules array | Add HTTP scale rule |
| Wrong target URL | Occasional | No app logs during test | Fix load test target |
| Stabilization delay | Expected | Delayed but eventual scale | Not a bug; optimize threshold |
8. Immediate Mitigations¶
- Quick scale out (manual): temporarily raise the floor, e.g. az containerapp update --name "$APP_NAME" --resource-group "$RG" --min-replicas 5
- Lower threshold for faster response: az containerapp update --name "$APP_NAME" --resource-group "$RG" --scale-rule-name "http-rule" --scale-rule-type "http" --scale-rule-http-concurrency 5
- Increase max if capped: az containerapp update --name "$APP_NAME" --resource-group "$RG" --max-replicas 20
- Add HTTP rule if missing: apply the H2 fix above (--scale-rule-name, --scale-rule-type http, --scale-rule-http-concurrency)
9. Prevention¶
- Keep baseline load profiles and target thresholds documented per service
- Set up alerts on latency increase without replica growth
- Review scale settings during release readiness
- Load test in staging with realistic traffic patterns before production
- Monitor replica count alongside latency in dashboards
HTTP Scaling Calculation Reference¶
Desired Replicas = ceil(Current Concurrent Requests / concurrentRequests threshold)
Example:
- concurrentRequests: 10
- Current load: 75 concurrent requests
- Desired replicas: ceil(75/10) = 8
Actual replicas = max(minReplicas, min(Desired, maxReplicas))
| Concurrent Requests | Threshold | Desired Replicas |
|---|---|---|
| 10 | 10 | 1 |
| 25 | 10 | 3 |
| 100 | 10 | 10 |
| 100 | 50 | 2 |
| 100 | 100 | 1 |
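The table above, plus the min/max clamp from the formula, can be reproduced with integer ceiling division. A minimal sketch (min=1 and max=10 are assumed for illustration):

```shell
# reproduce the desired-replica column, then clamp to min/max replicas
min=1
max=10
for pair in "10 10" "25 10" "100 10" "100 50" "100 100"; do
  set -- $pair                              # $1 = concurrent, $2 = threshold
  desired=$(( ($1 + $2 - 1) / $2 ))         # ceil($1 / $2)
  actual=$desired
  [ "$actual" -lt "$min" ] && actual=$min   # floor at minReplicas
  [ "$actual" -gt "$max" ] && actual=$max   # cap at maxReplicas
  echo "concurrent=$1 threshold=$2 desired=$desired actual=$actual"
done
```

With these min/max values no clamping occurs; set max=5 and the 100/10 row would cap at 5, which is exactly the H1 scenario.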
See Also¶
- Event Scaler Mismatch
- CrashLoop OOM and Resource Pressure
- Ingress Not Reachable
- Scaling Events KQL
- Replica Count Over Time KQL
- Scale Rule Mismatch Lab