HTTP Scaling Not Triggering¶
1. Summary¶
Symptom¶
Request load increases significantly but replica count remains flat or scales too slowly. Users experience high latency while the app appears under-provisioned. Load tests do not produce expected scale-out behavior. KEDA scalers are configured but seem inactive.
Why this scenario is confusing¶
HTTP scaling involves multiple components: KEDA metrics server, stabilization windows, threshold calculations, and min/max boundaries. A misconfiguration in any of these can make it look like "scaling is broken" when it's actually working as configured—just not as intended.
Troubleshooting decision flow¶
graph TD
A[Symptom: Latency high, replicas flat] --> B{maxReplicas reached?}
B -->|Yes| H1[H1: Max replicas too low]
B -->|No| C{HTTP scale rule exists?}
C -->|No| H2[H2: No HTTP scale rule configured]
C -->|Yes| D{Traffic hitting ingress?}
D -->|No| H3[H3: Test traffic bypassing app]
D -->|Yes| E{Threshold realistic for load?}
E -->|No| H4[H4: Threshold too high]
E -->|Yes| H5[H5: Stabilization window or polling delay]
2. Common Misreadings¶
- "KEDA is broken" — Most incidents are min/max boundaries, threshold mismatches, or test configuration issues.
- "Any traffic should instantly scale out" — HTTP scaling has polling intervals (15-30s) and stabilization windows.
- "Replicas didn't increase, so something is wrong" — If load is below threshold, no scaling is expected behavior.
- "I set concurrentRequests=10, so 100 requests should give me 10 replicas" — It's concurrent requests per replica, not total requests.
- "Scale worked in staging but not production" — Different traffic patterns, DNS, or ingress configurations.
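The per-replica misreading above is worth making concrete. A minimal sketch of the math, using integer ceiling division in shell (the values are illustrative, not from a real app):

```shell
# concurrentRequests is a per-replica concurrency target, so 100 requests only
# imply 10 replicas if all 100 are in flight at the same moment
threshold=10      # configured concurrentRequests
in_flight=100     # concurrent (not cumulative) requests
# ceil(in_flight / threshold) via integer arithmetic
echo $(( (in_flight + threshold - 1) / threshold ))
```

If those 100 requests arrive spread over a minute with only ~8 in flight at once, the desired replica count is 1, not 10.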
3. Competing Hypotheses¶
| Hypothesis | Typical Evidence For | Typical Evidence Against |
|---|---|---|
| H1: Max replicas too low | Replica count capped at configured max during load | Replicas remain well below max |
| H2: No HTTP scale rule configured | No scale rules in template, only min/max | HTTP rule exists in configuration |
| H3: Test traffic bypassing app | No request logs during load test | App logs show request surge |
| H4: Threshold too high | Concurrent requests < threshold × replica count | Traffic exceeds threshold significantly |
| H5: Stabilization window delay | Scale-out happens but delayed by 30-60 seconds | Immediate scaling when expected |
4. What to Check First¶
Metrics¶
- Request rate and concurrent connections over time
- P95/P99 latency correlation with replica count
- Replica count timeline during load test
Logs¶
let AppName = "ca-myapp";
ContainerAppSystemLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(1h)
| where Reason_s has_any ("KEDAScaler", "Scaling", "Replica")
or Log_s has_any ("scale", "keda", "replica", "http")
| project TimeGenerated, RevisionName_s, Reason_s, Log_s
| order by TimeGenerated desc
Platform Signals¶
# Check scale configuration
az containerapp show --name "$APP_NAME" --resource-group "$RG" \
--query "properties.template.scale" --output json
# Check current replica count
az containerapp replica list --name "$APP_NAME" --resource-group "$RG" --output table
# Check system logs for scaling events
az containerapp logs show --name "$APP_NAME" --resource-group "$RG" --type system
5. Evidence to Collect¶
Required Evidence¶
| Evidence | Command/Query | Purpose |
|---|---|---|
| Scale config | az containerapp show ... --query template.scale | Verify min/max/rules |
| Replica count | az containerapp replica list | Current state |
| System logs | KQL for scaling events | KEDA activity |
| Request logs | Console logs during load | Traffic confirmation |
| Ingress metrics | Azure Portal metrics | Request rate |
Useful Context¶
- Expected concurrent requests during peak
- Load test tool and configuration
- Target FQDN being tested
- Previous scaling behavior (if any)
6. Validation and Disproof by Hypothesis¶
H1: Max replicas too low¶
Signals that support:
- Replica count reaches exactly maxReplicas and stays there
- Latency keeps increasing while at max
- System logs show no further scale-out attempts
Signals that weaken:
- Replica count stays well below max
- Max is set high (e.g., 30) but only 2 replicas running
What to verify:
# Check min/max configuration
az containerapp show --name "$APP_NAME" --resource-group "$RG" \
--query "properties.template.scale.{min:minReplicas,max:maxReplicas}" --output json
# Check if currently at max
CURRENT=$(az containerapp replica list --name "$APP_NAME" --resource-group "$RG" --query "length(@)" --output tsv)
MAX=$(az containerapp show --name "$APP_NAME" --resource-group "$RG" --query "properties.template.scale.maxReplicas" --output tsv)
echo "Current: $CURRENT, Max: $MAX"
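To turn the two values above into a verdict, a small follow-up check can compare them directly. This is a sketch with hard-coded sample values standing in for the az CLI outputs captured above:

```shell
# sample values; in practice reuse the CURRENT/MAX variables populated by the
# az CLI commands above
CURRENT=5
MAX=5
if [ "$CURRENT" -ge "$MAX" ]; then
  echo "capped: replica count is at maxReplicas (H1 confirmed)"
else
  echo "below max: H1 is unlikely, check H2-H5"
fi
```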
// Check replica count over time
let AppName = "ca-myapp";
ContainerAppSystemLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(2h)
| where Reason_s has_any ("AssigningReplica", "TerminatingReplica", "Scaling")
| summarize count() by bin(TimeGenerated, 5m), Reason_s
| render timechart
Fix:
# Increase max replicas
az containerapp update --name "$APP_NAME" --resource-group "$RG" \
--max-replicas 20
H2: No HTTP scale rule configured¶
Signals that support:
- Scale rules array is empty or null
- Only min/max specified without rules
- Scale to zero is disabled (minReplicas >= 1), but there is no rule to scale out beyond that floor
Signals that weaken:
- HTTP rule present in configuration
- KEDAScalersStarted events in system logs
What to verify:
# Check if HTTP scale rule exists
az containerapp show --name "$APP_NAME" --resource-group "$RG" \
--query "properties.template.scale.rules" --output json
# Expected output for HTTP rule:
# [
# {
# "name": "http-rule",
# "http": {
# "metadata": {
# "concurrentRequests": "10"
# }
# }
# }
# ]
// Check for KEDA scaler activity
let AppName = "ca-myapp";
ContainerAppSystemLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(1h)
| where Reason_s == "KEDAScalersStarted"
| project TimeGenerated, Log_s
Fix:
# Add HTTP scale rule
az containerapp update --name "$APP_NAME" --resource-group "$RG" \
--scale-rule-name "http-rule" \
--scale-rule-type "http" \
--scale-rule-http-concurrency 10
# Or with full scale configuration
az containerapp update --name "$APP_NAME" --resource-group "$RG" \
--min-replicas 1 \
--max-replicas 10 \
--scale-rule-name "http-rule" \
--scale-rule-type "http" \
--scale-rule-http-concurrency 10
H3: Test traffic bypassing app¶
Signals that support:
- Load test shows requests sent, but no app logs
- Testing wrong URL (e.g., old revision, wrong environment)
- DNS not resolving to correct environment
Signals that weaken:
- App logs show request surge matching test
- Latency increases correlate with test timing
What to verify:
# Get correct FQDN
APP_FQDN=$(az containerapp show --name "$APP_NAME" --resource-group "$RG" \
--query "properties.configuration.ingress.fqdn" --output tsv)
echo "Correct FQDN: $APP_FQDN"
# Test connectivity
curl -s -o /dev/null -w "%{http_code}" "https://${APP_FQDN}/health"
# Check DNS resolution
nslookup "$APP_FQDN"
// Check if requests are hitting the app
let AppName = "ca-myapp";
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(30m)
| where Log_s has_any ("GET", "POST", "request", "200", "404")
| summarize RequestCount=count() by bin(TimeGenerated, 1m)
| render timechart
Fix:
- Verify load test targets correct FQDN
- Check if ingress is external (for internet tests) or internal (for VNet tests)
- Ensure revision receiving traffic is the one with scale rules
H4: Threshold too high¶
Signals that support:
- concurrentRequests set very high (e.g., 100) but traffic is lower
- Replica count stays at min even under load
- Math doesn't work out: 50 concurrent / 100 threshold = ceil(0.5) = 1 replica, so the count stays pinned at the minimum
Signals that weaken:
- Threshold is low (e.g., 10) and traffic exceeds it significantly
- Scale-out does happen, just slowly
What to verify:
# Check configured threshold
az containerapp show --name "$APP_NAME" --resource-group "$RG" \
--query "properties.template.scale.rules[?http].http.metadata.concurrentRequests" --output tsv
# Calculate expected replicas
# Formula: ceil(concurrent_requests / threshold)
# Example: 75 concurrent / 10 threshold = 8 replicas needed
Fix:
# Lower the threshold for more aggressive scaling
az containerapp update --name "$APP_NAME" --resource-group "$RG" \
--scale-rule-name "http-rule" \
--scale-rule-type "http" \
--scale-rule-http-concurrency 5 # Lower = more aggressive scaling
H5: Stabilization window delay¶
Signals that support:
- Scale-out happens but takes 30-60+ seconds
- KEDA metrics visible but replica changes lag
- Bursty traffic triggers scaling late
Signals that weaken:
- No scaling even after extended load period
- Scale events don't appear at all
What to verify:
// Check timing between load and scale events
let AppName = "ca-myapp";
ContainerAppSystemLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(1h)
| where Reason_s has_any ("KEDAScaler", "AssigningReplica", "Scaling")
| project TimeGenerated, Reason_s, Log_s
| order by TimeGenerated asc
Explanation:
KEDA has built-in stabilization to prevent thrashing:
- Polling interval: metrics are checked every 15-30 seconds
- Scale-up stabilization: metrics must stay above the threshold before scale-out
- Scale-down stabilization: a longer wait (5+ minutes) before reducing replicas
This is expected behavior, not a bug.
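A toy model of the effect, purely for intuition (this is not KEDA's actual algorithm; the real behavior combines the polling interval with the HPA stabilization window): scaling only fires once the metric has exceeded the threshold on several consecutive polls, so a burst that subsides quickly never triggers scale-out.

```shell
# toy model: require the metric to exceed the threshold on 3 consecutive
# polls before emitting a scale-out signal
threshold=10
needed=3
streak=0
for metric in 4 12 15 18 20; do   # simulated per-poll concurrent requests
  if [ "$metric" -gt "$threshold" ]; then
    streak=$((streak + 1))
  else
    streak=0
  fi
  [ "$streak" -ge "$needed" ] && echo "scale-out signal at metric=$metric"
done
```

In this simulation the signal first fires on the fourth poll, which is why sustained load scales out while a short spike does not.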
7. Likely Root Cause Patterns¶
| Pattern | Frequency | First Signal | Typical Resolution |
|---|---|---|---|
| Threshold too high | Very common | Replicas stay at min | Lower concurrentRequests |
| Max replicas too low | Common | Capped at max during load | Increase maxReplicas |
| No scale rule | Common | Empty rules array | Add HTTP scale rule |
| Wrong target URL | Occasional | No app logs during test | Fix load test target |
| Stabilization delay | Expected | Delayed but eventual scale | Not a bug; optimize threshold |
8. Immediate Mitigations¶
- Quick scale out (manual): temporarily raise the floor, e.g. az containerapp update --name "$APP_NAME" --resource-group "$RG" --min-replicas 5
- Lower threshold for faster response: az containerapp update --name "$APP_NAME" --resource-group "$RG" --scale-rule-name "http-rule" --scale-rule-type "http" --scale-rule-http-concurrency 5
- Increase max if capped: az containerapp update --name "$APP_NAME" --resource-group "$RG" --max-replicas 20
- Add HTTP rule if missing: apply the H2 fix above (--scale-rule-name, --scale-rule-type http, --scale-rule-http-concurrency)
9. Prevention¶
- Keep baseline load profiles and target thresholds documented per service
- Set up alerts on latency increase without replica growth
- Review scale settings during release readiness
- Load test in staging with realistic traffic patterns before production
- Monitor replica count alongside latency in dashboards
HTTP Scaling Calculation Reference¶
Desired Replicas = ceil(Current Concurrent Requests / concurrentRequests threshold)
Example:
- concurrentRequests: 10
- Current load: 75 concurrent requests
- Desired replicas: ceil(75/10) = 8
Actual replicas = max(minReplicas, min(Desired, maxReplicas))
| Concurrent Requests | Threshold | Desired Replicas |
|---|---|---|
| 10 | 10 | 1 |
| 25 | 10 | 3 |
| 100 | 10 | 10 |
| 100 | 50 | 2 |
| 100 | 100 | 1 |
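The table above, plus the min/max clamp from the formula, can be reproduced with integer ceiling division. A minimal sketch (min=1 and max=10 are assumed for illustration):

```shell
# reproduce the desired-replica column, then clamp to min/max replicas
min=1
max=10
for pair in "10 10" "25 10" "100 10" "100 50" "100 100"; do
  set -- $pair                              # $1 = concurrent, $2 = threshold
  desired=$(( ($1 + $2 - 1) / $2 ))         # ceil($1 / $2)
  actual=$desired
  [ "$actual" -lt "$min" ] && actual=$min   # floor at minReplicas
  [ "$actual" -gt "$max" ] && actual=$max   # cap at maxReplicas
  echo "concurrent=$1 threshold=$2 desired=$desired actual=$actual"
done
```

With these min/max values no clamping occurs; set max=5 and the 100/10 row would cap at 5, which is exactly the H1 scenario.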
See Also¶
- Event Scaler Mismatch
- CrashLoop OOM and Resource Pressure
- Ingress Not Reachable
- Scaling Events KQL
- Replica Count Over Time KQL
- Scale Rule Mismatch Lab