Scaling Rule Conflicts: HTTP, CPU, and Queue Scalers¶
Status: Draft - Awaiting Execution
Experiment designed but not yet executed. This draft targets real Azure Container Apps scaling issue patterns reported in GitHub issues #468, #536, and #972: mixed scaling rules behaving unexpectedly, scale-out being driven by one signal while another appears ignored, and scale-in failing when connections or metrics remain artificially active.
1. Question¶
How do multiple Azure Container Apps scaling rules interact in practice? Specifically:
- what happens when HTTP scaling and CPU scaling are configured together
- what happens when HTTP scaling and a queue-driven KEDA scaler are configured together
- why does scale-in sometimes not happen even after user traffic or queue backlog appears to be gone
2. Why this matters¶
This is a frequent support pattern because customers often assume multiple scale rules combine intuitively: HTTP handles request bursts, CPU handles heavy work, and queue scalers handle background load. In reality, the observed behavior is often surprising:
- HTTP traffic appears to stop scaling once CPU scaling is added
- the app scales out correctly but stays over-provisioned long after load stops
- tuning
cooldownPeriodappears ineffective because the scaler never actually sees an inactive state activation*thresholds are misunderstood, causing scalers to remain active or never activate at all
Support engineers need evidence to answer three recurring questions:
- whether HTTP ingress scaling and KEDA-driven custom scaling are handled by the same control path
- whether multiple rules are truly additive or whether one rule dominates practical behavior
- whether failed scale-in is a platform cooldown problem, an active-connection problem, or a metric-threshold problem
3. Customer symptom¶
Typical ticket phrasing:
- "HTTP autoscaling stopped working after I added a CPU rule."
- "The app scales out, but never scales back in."
- "
cooldownPeriodis set, but replicas remain high forever." - "Queue length is zero, yet the app still does not scale down."
- "Everything looks idle except for a few long-lived HTTP/2 clients."
4. Hypothesis¶
- HTTP scaling uses a different internal mechanism than KEDA external/custom scalers, even though both surface as scale rules in Container Apps.
- When HTTP and CPU rules coexist, practical control will often be dominated by the non-HTTP rule once replicas are already active, making HTTP behavior appear ignored.
- When HTTP and queue-based KEDA rules coexist, scale-out may be driven by whichever signal reaches its threshold first, but scale-in will wait until all active signals fall below deactivation conditions.
- HTTP/2 keep-alive or long-lived streams can block scale-in even after meaningful user traffic stops because the ingress layer still considers connections active.
- Lowering
cooldownPeriodalone will not help if metrics never cross back below the effective activation/deactivation boundary. - Misconfigured
activationValue/activationMessageCount/ threshold values frequently create false scale-in expectations.
5. Environment¶
| Parameter | Value |
|---|---|
| Service | Azure Container Apps |
| SKU / Plan | Consumption |
| Region | Korea Central |
| Runtime | Python 3.11 custom container |
| OS | Linux |
| App shape | Single HTTP app with normal, CPU-heavy, long-lived connection, and queue helper endpoints |
| Queue service | Azure Service Bus queue |
| Ingress | External, target port 8080, HTTP/2 enabled for dedicated scenarios |
| Scale rules under test | HTTP, CPU, Azure Service Bus queue |
| Logging | Log Analytics + Container Apps system/console logs |
| Date tested | Not yet executed |
6. Variables¶
Experiment type: Config / behavior comparison across scaling-rule combinations
Controlled:
- same Container Apps environment and region
- same application image and revision baseline
- same CPU/memory sizing unless intentionally changed for threshold sensitivity
- same request generator cadence per scenario
- same queue producer rate per scenario
- same observation window for scale-out and scale-in
- same Log Analytics workspace and KQL queries
Independent variables:
- scaling rule set: HTTP only, CPU only, HTTP+CPU, HTTP+Service Bus
- HTTP concurrency threshold
- CPU utilization threshold
- Service Bus message count threshold
cooldownPeriodandpollingIntervalactivationMessageCounton queue rule- client connection mode: short-lived HTTP/1.1 vs persistent HTTP/2
Observed:
- replica count over time
- revision/replica activation and termination timestamps
- HTTP status, latency, and concurrency seen by the test client
- CPU usage during
/cpuendpoint execution - Service Bus queue depth and dequeue rate
- KEDA/system log events related to scaler activation/deactivation
- whether scale-in occurs, and how long after load removal
7. Instrumentation¶
Planned evidence sources:
- ContainerAppSystemLogs_CL for scaling decisions, replica lifecycle, and scaler-related warnings
- ContainerAppConsoleLogs_CL for app-level markers (
REQUEST_START,CPU_WORK_START,STREAM_OPEN,STREAM_CLOSE,QUEUE_SEND) - Azure CLI for current scale rule configuration, replica inventory, and revision state
- Azure Monitor metrics for CPU and replica count
- Service Bus metrics and queue length inspection for backlog-driven scenarios
- Synthetic load scripts for HTTP/1.1, HTTP/2, CPU-heavy requests, and queue message generation
Recommended application log markers:
REQUEST_NORMALREQUEST_CPU_HEAVYSTREAM_OPENSTREAM_CLOSEQUEUE_ENQUEUEQUEUE_DEQUEUE
Log Analytics queries¶
// Scaling and replica lifecycle timeline
ContainerAppSystemLogs_CL
| where ContainerAppName_s == "ca-scale-rule-conflicts"
| project TimeGenerated, RevisionName_s, ReplicaName_s, Reason_s, Log_s
| order by TimeGenerated asc
// Application-side markers from the test app
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-scale-rule-conflicts"
| project TimeGenerated, RevisionName_s, Log_s
| where Log_s has_any ("REQUEST_", "CPU_WORK_", "STREAM_", "QUEUE_")
| order by TimeGenerated asc
// Suspected scaler / cooldown related entries
ContainerAppSystemLogs_CL
| where ContainerAppName_s == "ca-scale-rule-conflicts"
| where Log_s has_any ("scale", "KEDA", "cooldown", "HTTP", "cpu", "servicebus")
| project TimeGenerated, Reason_s, Log_s, ReplicaName_s
| order by TimeGenerated asc
8. Procedure¶
8.1 Infrastructure setup¶
# Resource group
az group create \
--name rg-aca-scale-rule-conflicts-lab \
--location koreacentral
# Log Analytics workspace
az monitor log-analytics workspace create \
--resource-group rg-aca-scale-rule-conflicts-lab \
--workspace-name law-aca-scale-rule-conflicts \
--location koreacentral
LAW_ID=$(az monitor log-analytics workspace show \
--resource-group rg-aca-scale-rule-conflicts-lab \
--workspace-name law-aca-scale-rule-conflicts \
--query customerId -o tsv)
LAW_KEY=$(az monitor log-analytics workspace get-shared-keys \
--resource-group rg-aca-scale-rule-conflicts-lab \
--workspace-name law-aca-scale-rule-conflicts \
--query primarySharedKey -o tsv)
# Container Apps environment
az containerapp env create \
--name cae-scale-rule-conflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--location koreacentral \
--logs-workspace-id "$LAW_ID" \
--logs-workspace-key "$LAW_KEY"
# Azure Container Registry
az acr create \
--name acrscaleruleconflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--sku Basic \
--admin-enabled true \
--location koreacentral
# Service Bus namespace and queue for KEDA-backed queue scenarios
az servicebus namespace create \
--name sbscaleruleconflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--location koreacentral \
--sku Standard
az servicebus queue create \
--resource-group rg-aca-scale-rule-conflicts-lab \
--namespace-name sbscaleruleconflicts \
--name work-items
SB_CONNECTION_STRING=$(az servicebus namespace authorization-rule keys list \
--resource-group rg-aca-scale-rule-conflicts-lab \
--namespace-name sbscaleruleconflicts \
--name RootManageSharedAccessKey \
--query primaryConnectionString -o tsv)
8.2 Test application¶
The app must expose four behaviors in a single revision so that only scaling rules change between scenarios:
/— lightweight normal endpoint/cpu?seconds=30— CPU-heavy endpoint that spins for a controllable duration/stream?seconds=300— long-lived streaming endpoint for persistent connection tests/enqueue?count=100— helper endpoint that sends messages to Service Bus for queue scenarios
app.py¶
import json
import math
import os
import threading
import time
from datetime import datetime, timezone
from flask import Flask, Response, jsonify, request
from azure.servicebus import ServiceBusClient, ServiceBusMessage
app = Flask(__name__)
SB_CONNECTION_STRING = os.environ.get("SERVICEBUS_CONNECTION_STRING")
SB_QUEUE_NAME = os.environ.get("SERVICEBUS_QUEUE_NAME", "work-items")
def log_event(event, **kwargs):
payload = {
"ts": datetime.now(timezone.utc).isoformat(),
"event": event,
**kwargs,
}
print(json.dumps(payload), flush=True)
@app.route("/")
def index():
log_event("REQUEST_NORMAL")
return jsonify({"status": "ok", "mode": "normal"})
@app.route("/cpu")
def cpu_heavy():
seconds = float(request.args.get("seconds", "30"))
stop_at = time.monotonic() + seconds
iterations = 0
log_event("CPU_WORK_START", seconds=seconds)
value = 0.0001
while time.monotonic() < stop_at:
value = math.sqrt(value + 123.456789)
iterations += 1
log_event("CPU_WORK_END", seconds=seconds, iterations=iterations)
return jsonify({"status": "ok", "mode": "cpu", "seconds": seconds, "iterations": iterations})
@app.route("/stream")
def stream():
seconds = int(request.args.get("seconds", "300"))
interval = int(request.args.get("interval", "5"))
log_event("STREAM_OPEN", seconds=seconds, interval=interval)
def generate():
started = time.monotonic()
try:
while time.monotonic() - started < seconds:
yield f"data: heartbeat {datetime.now(timezone.utc).isoformat()}\\n\\n"
time.sleep(interval)
finally:
log_event("STREAM_CLOSE", seconds=seconds)
return Response(generate(), mimetype="text/event-stream")
@app.route("/enqueue", methods=["POST"])
def enqueue():
count = int(request.args.get("count", "100"))
if not SB_CONNECTION_STRING:
return jsonify({"error": "SERVICEBUS_CONNECTION_STRING not configured"}), 500
with ServiceBusClient.from_connection_string(SB_CONNECTION_STRING) as client:
with client.get_queue_sender(SB_QUEUE_NAME) as sender:
for i in range(count):
sender.send_messages(ServiceBusMessage(f"msg-{i}"))
log_event("QUEUE_ENQUEUE", count=count)
return jsonify({"status": "ok", "enqueued": count})
if __name__ == "__main__":
app.run(host="0.0.0.0", port=8080)
worker.py¶
import json
import os
import time
from datetime import datetime, timezone
from azure.servicebus import ServiceBusClient
SB_CONNECTION_STRING = os.environ["SERVICEBUS_CONNECTION_STRING"]
SB_QUEUE_NAME = os.environ.get("SERVICEBUS_QUEUE_NAME", "work-items")
def log_event(event, **kwargs):
payload = {
"ts": datetime.now(timezone.utc).isoformat(),
"event": event,
**kwargs,
}
print(json.dumps(payload), flush=True)
with ServiceBusClient.from_connection_string(SB_CONNECTION_STRING) as client:
receiver = client.get_queue_receiver(queue_name=SB_QUEUE_NAME, max_wait_time=5)
with receiver:
while True:
messages = receiver.receive_messages(max_message_count=10, max_wait_time=5)
if not messages:
log_event("QUEUE_IDLE")
time.sleep(2)
continue
for message in messages:
log_event("QUEUE_DEQUEUE", message_id=str(message.message_id))
time.sleep(2)
receiver.complete_message(message)
requirements.txt¶
Dockerfile¶
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py worker.py ./
EXPOSE 8080
CMD ["sh", "-c", "python worker.py & gunicorn --bind 0.0.0.0:8080 --workers 1 --threads 8 --timeout 0 app:app"]
8.3 Build and push image¶
az acr build \
--registry acrscaleruleconflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--image scale-rule-conflicts:v1 \
--file Dockerfile .
8.4 Baseline deployment¶
ACR_USER=$(az acr credential show \
--name acrscaleruleconflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--query username -o tsv)
ACR_PASS=$(az acr credential show \
--name acrscaleruleconflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--query "passwords[0].value" -o tsv)
az containerapp create \
--name ca-scale-rule-conflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--environment cae-scale-rule-conflicts \
--image acrscaleruleconflicts.azurecr.io/scale-rule-conflicts:v1 \
--registry-server acrscaleruleconflicts.azurecr.io \
--registry-username "$ACR_USER" \
--registry-password "$ACR_PASS" \
--target-port 8080 \
--ingress external \
--min-replicas 0 \
--max-replicas 10 \
--cpu 0.5 \
--memory 1.0Gi \
--env-vars SERVICEBUS_CONNECTION_STRING="$SB_CONNECTION_STRING" SERVICEBUS_QUEUE_NAME=work-items \
--scale-rule-name http-rule \
--scale-rule-type http \
--scale-rule-http-concurrency 10
APP_URL=$(az containerapp show \
--name ca-scale-rule-conflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--query properties.configuration.ingress.fqdn -o tsv)
8.5 Scenario matrix¶
| Scenario | Scale rules | Primary load source | Expected discriminator |
|---|---|---|---|
| 1. HTTP only | HTTP concurrency only | GET / burst |
Baseline for normal HTTP scale-out/scale-in |
| 2. CPU only | CPU rule only | GET /cpu |
Baseline for KEDA-style resource scaling |
| 3. HTTP + CPU | Both HTTP and CPU | mixed / + /cpu |
Determine whether HTTP scaling appears subordinated or ignored |
| 4. HTTP + Service Bus | HTTP + Service Bus queue | / burst + queue backlog |
Determine whether both signals can keep replicas active |
| 5. HTTP/2 scale-in | HTTP rule only or HTTP + CPU | long-lived /stream clients |
Determine whether active connections prevent scale-in |
8.6 Scale rule update commands per scenario¶
Scenario 1: HTTP only¶
az containerapp update \
--name ca-scale-rule-conflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--min-replicas 0 \
--max-replicas 10 \
--scale-rule-name http-rule \
--scale-rule-type http \
--scale-rule-http-concurrency 10
Scenario 2: CPU only¶
az containerapp update \
--name ca-scale-rule-conflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--scale-rule-name cpu-rule \
--scale-rule-type cpu \
--scale-rule-metadata type=Utilization value=60
Scenario 3: HTTP + CPU¶
Use YAML because multiple scale rules are easier to express explicitly.
properties:
template:
scale:
minReplicas: 0
maxReplicas: 10
rules:
- name: http-rule
http:
metadata:
concurrentRequests: "10"
- name: cpu-rule
custom:
type: cpu
metadata:
type: Utilization
value: "60"
az containerapp show \
--name ca-scale-rule-conflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--output yaml > scenario-http-cpu.yaml
# Replace the scale block with the YAML above, then apply:
az containerapp update \
--name ca-scale-rule-conflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--yaml scenario-http-cpu.yaml
Scenario 4: HTTP + Service Bus queue¶
properties:
configuration:
secrets:
- name: sb-connection
value: <service-bus-connection-string>
template:
scale:
minReplicas: 0
maxReplicas: 10
rules:
- name: http-rule
http:
metadata:
concurrentRequests: "10"
- name: sb-rule
custom:
type: azure-servicebus
auth:
- secretRef: sb-connection
triggerParameter: connection
metadata:
namespace: sbscaleruleconflicts
queueName: work-items
messageCount: "5"
activationMessageCount: "0"
az containerapp show \
--name ca-scale-rule-conflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--output yaml > scenario-http-sb.yaml
# Replace the scale block and secret value, then apply:
az containerapp update \
--name ca-scale-rule-conflicts \
--resource-group rg-aca-scale-rule-conflicts-lab \
--yaml scenario-http-sb.yaml
Scenario 5: cooldown and activation retest¶
Repeat Scenarios 3 and 4 with these controlled variations:
- reduce
cooldownPeriodto30 - increase
cooldownPeriodto300 - change queue
activationMessageCountfrom0to5 - change HTTP load from short-lived requests to long-lived
/streamclients
8.7 Load scripts¶
Short-lived HTTP burst¶
#!/usr/bin/env bash
set -euo pipefail
URL="$1"
TOTAL="${2:-200}"
CONCURRENCY="${3:-40}"
seq "$TOTAL" | xargs -P "$CONCURRENCY" -I{} \
curl -skS "https://$URL/" -o /dev/null -w '%{http_code} %{time_total}\n'
CPU-heavy traffic¶
#!/usr/bin/env bash
set -euo pipefail
URL="$1"
REQUESTS="${2:-20}"
SECONDS_PER_REQUEST="${3:-30}"
PARALLEL="${4:-10}"
seq "$REQUESTS" | xargs -P "$PARALLEL" -I{} \
curl -skS "https://$URL/cpu?seconds=$SECONDS_PER_REQUEST" -o /dev/null -w '%{http_code} %{time_total}\n'
Queue backlog creation¶
#!/usr/bin/env bash
set -euo pipefail
URL="$1"
COUNT="${2:-200}"
curl -skS -X POST "https://$URL/enqueue?count=$COUNT"
Long-lived HTTP/2 clients¶
#!/usr/bin/env bash
set -euo pipefail
URL="$1"
CLIENTS="${2:-5}"
SECONDS="${3:-300}"
for n in $(seq 1 "$CLIENTS"); do
curl --http2 -skN "https://$URL/stream?seconds=$SECONDS&interval=5" \
> "stream-client-$n.log" 2>&1 &
done
wait
8.8 Per-scenario execution steps¶
For each scenario:
- Apply the intended scale-rule configuration.
- Confirm the active revision and scale rule block with
az containerapp show. - Wait for the app to scale to baseline (
0or1replicas as configured). - Start the relevant load script.
- Record replica count every 15 seconds with
az containerapp replica list. - Stop the load source and continue observing for at least 15 minutes.
- Query system and console logs immediately after the observation window.
- Repeat at least three independent runs per scenario.
9. Expected signal¶
- Scenario 1: HTTP only should scale out during request bursts and scale in after requests stop and active connections disappear.
- Scenario 2: CPU only should scale out only when
/cputraffic drives sustained CPU above threshold; scale-in should follow after CPU drops and cooldown expires. - Scenario 3: HTTP + CPU should reveal whether CPU activity keeps replicas active even after HTTP burst traffic is gone, making HTTP scale-in appear broken.
- Scenario 4: HTTP + Service Bus should show whether queue backlog or lingering worker activity keeps replicas alive after front-end traffic stops.
- Scenario 5: HTTP/2 should show that long-lived connections delay or block scale-in even with low request rate.
- Retesting with altered
cooldownPeriodshould show little or no effect when the scaler remains logically active.
10. Results¶
Pending execution.
Capture the following tables during execution.
10.1 Replica timeline summary¶
| Scenario | Run | First scale-out time | Peak replicas | Load stop time | Scale-in complete time | Scale-in blocked? |
|---|---|---|---|---|---|---|
| HTTP only | 1 | |||||
| CPU only | 1 | |||||
| HTTP + CPU | 1 | |||||
| HTTP + Service Bus | 1 | |||||
| HTTP/2 keep-alive | 1 |
10.2 Dominant-signal observations¶
| Scenario | HTTP traffic present | CPU above threshold | Queue backlog present | Expected inactive point | Actual inactive point |
|---|---|---|---|---|---|
| HTTP + CPU | n/a | ||||
| HTTP + Service Bus | n/a | ||||
| HTTP/2 keep-alive | optional | n/a |
10.3 Representative log excerpts¶
- system log excerpt showing scale-out trigger
- system log excerpt showing failed or delayed scale-in
- console log excerpt showing
STREAM_OPENwithoutSTREAM_CLOSE - queue worker logs showing
QUEUE_IDLEdespite lingering replicas
11. Interpretation¶
Pending execution. Interpret results using explicit evidence tags.
Planned interpretation questions:
- Observed / Measured: Did HTTP-only and CPU-only baselines behave normally on their own?
- Observed / Inferred: In mixed-rule scenarios, did one metric remain active and therefore prevent overall scale-in?
- Observed / Strongly Suggested: Did HTTP/2 streams correlate with blocked scale-in despite near-zero request volume?
- Not Proven / Unknown: If HTTP scaling appears ignored, is the cause controller precedence, metric sampling cadence, or threshold design?
12. What this proves¶
After execution, this experiment should be able to prove only the following kinds of claims:
- whether the tested rule combinations in this environment scaled out as expected
- whether scale-in required all active signals to become inactive
- whether long-lived HTTP/2 connections correlated with delayed scale-in
- whether
cooldownPeriodchanges alone changed behavior when metrics remained active
13. What this does NOT prove¶
Even after execution, this experiment will not by itself prove:
- the exact internal Azure control-plane implementation behind HTTP scaling
- that all Container Apps regions or environments behave identically
- that Event Hub scaling bugs are identical to Service Bus scaling behavior
- that every apparent conflict is a platform bug rather than a threshold-design issue
Issue #972 is included as motivation for queue-scaler scale-down anomalies, but this design uses Service Bus for easier controlled reproduction. Any similarity to Event Hub behavior would be correlated, not automatically equivalent.
14. Support takeaway¶
For cases involving unexpected Container Apps scaling behavior, support engineers should check:
- whether multiple scale signals are configured and which one is still active
- whether the customer expects
cooldownPeriodto override an active metric - whether HTTP/2, SSE, gRPC, or other long-lived connections are keeping ingress activity alive
- whether queue scaler
activation*values and thresholds are aligned with actual idle conditions - whether the customer has baseline evidence from single-rule testing before combining rules
Practical triage order:
- reproduce with HTTP only
- reproduce with CPU only or queue only
- combine rules only after each single-rule baseline is understood
- inspect active connections and scaler thresholds before escalating as a platform defect
15. Reproduction notes¶
- Keep revision mode fixed during all runs to avoid mixing revision-transition behavior with scaling behavior.
- Use the same image and same environment for every scenario.
- Wait long enough after stopping load; scale-in investigations often fail because observation stops too early.
- For HTTP/2 tests, prefer clients that keep connections open intentionally; short curl bursts may not reproduce the issue.
- Record exact rule configuration for every run because small threshold differences can completely change conclusions.
- If scale rules must be edited through YAML, archive the exact applied YAML alongside captured logs.