Scaling Rule Conflicts: HTTP, CPU, and Queue Scalers¶

Status: Draft - Awaiting Execution

Experiment designed but not yet executed. This draft targets real Azure Container Apps scaling issue patterns reported in GitHub issues #468, #536, and #972: mixed scaling rules behaving unexpectedly, scale-out being driven by one signal while another appears ignored, and scale-in failing when connections or metrics remain artificially active.

1. Question¶

How do multiple Azure Container Apps scaling rules interact in practice? Specifically:

what happens when HTTP scaling and CPU scaling are configured together
what happens when HTTP scaling and a queue-driven KEDA scaler are configured together
why does scale-in sometimes not happen even after user traffic or queue backlog appears to be gone

2. Why this matters¶

This is a frequent support pattern because customers often assume multiple scale rules combine intuitively: HTTP handles request bursts, CPU handles heavy work, and queue scalers handle background load. In reality, the observed behavior is often surprising:

HTTP traffic appears to stop scaling once CPU scaling is added
the app scales out correctly but stays over-provisioned long after load stops
tuning cooldownPeriod appears ineffective because the scaler never actually sees an inactive state
activation* thresholds are misunderstood, causing scalers to remain active or never activate at all

Support engineers need evidence to answer three recurring questions:

whether HTTP ingress scaling and KEDA-driven custom scaling are handled by the same control path
whether multiple rules are truly additive or whether one rule dominates practical behavior
whether failed scale-in is a platform cooldown problem, an active-connection problem, or a metric-threshold problem

3. Customer symptom¶

Typical ticket phrasing:

"HTTP autoscaling stopped working after I added a CPU rule."
"The app scales out, but never scales back in."
"cooldownPeriod is set, but replicas remain high forever."
"Queue length is zero, yet the app still does not scale down."
"Everything looks idle except for a few long-lived HTTP/2 clients."

4. Hypothesis¶

HTTP scaling uses a different internal mechanism than KEDA external/custom scalers, even though both surface as scale rules in Container Apps.
When HTTP and CPU rules coexist, practical control will often be dominated by the non-HTTP rule once replicas are already active, making HTTP behavior appear ignored.
When HTTP and queue-based KEDA rules coexist, scale-out may be driven by whichever signal reaches its threshold first, but scale-in will wait until all active signals fall below deactivation conditions.
HTTP/2 keep-alive or long-lived streams can block scale-in even after meaningful user traffic stops because the ingress layer still considers connections active.
Lowering cooldownPeriod alone will not help if metrics never cross back below the effective activation/deactivation boundary.
Misconfigured activationValue / activationMessageCount / threshold values frequently create false scale-in expectations.

5. Environment¶

Parameter	Value
Service	Azure Container Apps
SKU / Plan	Consumption
Region	Korea Central
Runtime	Python 3.11 custom container
OS	Linux
App shape	Single HTTP app with normal, CPU-heavy, long-lived connection, and queue helper endpoints
Queue service	Azure Service Bus queue
Ingress	External, target port `8080`, HTTP/2 enabled for dedicated scenarios
Scale rules under test	HTTP, CPU, Azure Service Bus queue
Logging	Log Analytics + Container Apps system/console logs
Date tested	Not yet executed

6. Variables¶

Experiment type: Config / behavior comparison across scaling-rule combinations

Controlled:

same Container Apps environment and region
same application image and revision baseline
same CPU/memory sizing unless intentionally changed for threshold sensitivity
same request generator cadence per scenario
same queue producer rate per scenario
same observation window for scale-out and scale-in
same Log Analytics workspace and KQL queries

Independent variables:

scaling rule set: HTTP only, CPU only, HTTP+CPU, HTTP+Service Bus
HTTP concurrency threshold
CPU utilization threshold
Service Bus message count threshold
cooldownPeriod and pollingInterval
activationMessageCount on queue rule
client connection mode: short-lived HTTP/1.1 vs persistent HTTP/2

Observed:

replica count over time
revision/replica activation and termination timestamps
HTTP status, latency, and concurrency seen by the test client
CPU usage during /cpu endpoint execution
Service Bus queue depth and dequeue rate
KEDA/system log events related to scaler activation/deactivation
whether scale-in occurs, and how long after load removal

7. Instrumentation¶

Planned evidence sources:

ContainerAppSystemLogs_CL for scaling decisions, replica lifecycle, and scaler-related warnings
ContainerAppConsoleLogs_CL for app-level markers (REQUEST_START, CPU_WORK_START, STREAM_OPEN, STREAM_CLOSE, QUEUE_SEND)
Azure CLI for current scale rule configuration, replica inventory, and revision state
Azure Monitor metrics for CPU and replica count
Service Bus metrics and queue length inspection for backlog-driven scenarios
Synthetic load scripts for HTTP/1.1, HTTP/2, CPU-heavy requests, and queue message generation

Recommended application log markers:

REQUEST_NORMAL
REQUEST_CPU_HEAVY
STREAM_OPEN
STREAM_CLOSE
QUEUE_ENQUEUE
QUEUE_DEQUEUE

Log Analytics queries¶

// Scaling and replica lifecycle timeline
ContainerAppSystemLogs_CL
| where ContainerAppName_s == "ca-scale-rule-conflicts"
| project TimeGenerated, RevisionName_s, ReplicaName_s, Reason_s, Log_s
| order by TimeGenerated asc

// Application-side markers from the test app
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-scale-rule-conflicts"
| project TimeGenerated, RevisionName_s, Log_s
| where Log_s has_any ("REQUEST_", "CPU_WORK_", "STREAM_", "QUEUE_")
| order by TimeGenerated asc

// Suspected scaler / cooldown related entries
ContainerAppSystemLogs_CL
| where ContainerAppName_s == "ca-scale-rule-conflicts"
| where Log_s has_any ("scale", "KEDA", "cooldown", "HTTP", "cpu", "servicebus")
| project TimeGenerated, Reason_s, Log_s, ReplicaName_s
| order by TimeGenerated asc

8. Procedure¶

8.1 Infrastructure setup¶

# Resource group
az group create \
  --name rg-aca-scale-rule-conflicts-lab \
  --location koreacentral

# Log Analytics workspace
az monitor log-analytics workspace create \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --workspace-name law-aca-scale-rule-conflicts \
  --location koreacentral

LAW_ID=$(az monitor log-analytics workspace show \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --workspace-name law-aca-scale-rule-conflicts \
  --query customerId -o tsv)

LAW_KEY=$(az monitor log-analytics workspace get-shared-keys \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --workspace-name law-aca-scale-rule-conflicts \
  --query primarySharedKey -o tsv)

# Container Apps environment
az containerapp env create \
  --name cae-scale-rule-conflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --location koreacentral \
  --logs-workspace-id "$LAW_ID" \
  --logs-workspace-key "$LAW_KEY"

# Azure Container Registry
az acr create \
  --name acrscaleruleconflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --sku Basic \
  --admin-enabled true \
  --location koreacentral

# Service Bus namespace and queue for KEDA-backed queue scenarios
az servicebus namespace create \
  --name sbscaleruleconflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --location koreacentral \
  --sku Standard

az servicebus queue create \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --namespace-name sbscaleruleconflicts \
  --name work-items

SB_CONNECTION_STRING=$(az servicebus namespace authorization-rule keys list \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --namespace-name sbscaleruleconflicts \
  --name RootManageSharedAccessKey \
  --query primaryConnectionString -o tsv)

8.2 Test application¶

The app must expose four behaviors in a single revision so that only scaling rules change between scenarios:

/ — lightweight normal endpoint
/cpu?seconds=30 — CPU-heavy endpoint that spins for a controllable duration
/stream?seconds=300 — long-lived streaming endpoint for persistent connection tests
/enqueue?count=100 — helper endpoint that sends messages to Service Bus for queue scenarios

app.py¶

import json
import math
import os
import threading
import time
from datetime import datetime, timezone

from flask import Flask, Response, jsonify, request
from azure.servicebus import ServiceBusClient, ServiceBusMessage

app = Flask(__name__)

SB_CONNECTION_STRING = os.environ.get("SERVICEBUS_CONNECTION_STRING")
SB_QUEUE_NAME = os.environ.get("SERVICEBUS_QUEUE_NAME", "work-items")


def log_event(event, **kwargs):
    payload = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event,
        **kwargs,
    }
    print(json.dumps(payload), flush=True)


@app.route("/")
def index():
    log_event("REQUEST_NORMAL")
    return jsonify({"status": "ok", "mode": "normal"})


@app.route("/cpu")
def cpu_heavy():
    seconds = float(request.args.get("seconds", "30"))
    stop_at = time.monotonic() + seconds
    iterations = 0
    log_event("CPU_WORK_START", seconds=seconds)
    value = 0.0001
    while time.monotonic() < stop_at:
        value = math.sqrt(value + 123.456789)
        iterations += 1
    log_event("CPU_WORK_END", seconds=seconds, iterations=iterations)
    return jsonify({"status": "ok", "mode": "cpu", "seconds": seconds, "iterations": iterations})


@app.route("/stream")
def stream():
    seconds = int(request.args.get("seconds", "300"))
    interval = int(request.args.get("interval", "5"))
    log_event("STREAM_OPEN", seconds=seconds, interval=interval)

    def generate():
        started = time.monotonic()
        try:
            while time.monotonic() - started < seconds:
                yield f"data: heartbeat {datetime.now(timezone.utc).isoformat()}\\n\\n"
                time.sleep(interval)
        finally:
            log_event("STREAM_CLOSE", seconds=seconds)

    return Response(generate(), mimetype="text/event-stream")


@app.route("/enqueue", methods=["POST"])
def enqueue():
    count = int(request.args.get("count", "100"))
    if not SB_CONNECTION_STRING:
        return jsonify({"error": "SERVICEBUS_CONNECTION_STRING not configured"}), 500

    with ServiceBusClient.from_connection_string(SB_CONNECTION_STRING) as client:
        with client.get_queue_sender(SB_QUEUE_NAME) as sender:
            for i in range(count):
                sender.send_messages(ServiceBusMessage(f"msg-{i}"))

    log_event("QUEUE_ENQUEUE", count=count)
    return jsonify({"status": "ok", "enqueued": count})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

worker.py¶

import json
import os
import time
from datetime import datetime, timezone

from azure.servicebus import ServiceBusClient

SB_CONNECTION_STRING = os.environ["SERVICEBUS_CONNECTION_STRING"]
SB_QUEUE_NAME = os.environ.get("SERVICEBUS_QUEUE_NAME", "work-items")


def log_event(event, **kwargs):
    payload = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event,
        **kwargs,
    }
    print(json.dumps(payload), flush=True)


with ServiceBusClient.from_connection_string(SB_CONNECTION_STRING) as client:
    receiver = client.get_queue_receiver(queue_name=SB_QUEUE_NAME, max_wait_time=5)
    with receiver:
        while True:
            messages = receiver.receive_messages(max_message_count=10, max_wait_time=5)
            if not messages:
                log_event("QUEUE_IDLE")
                time.sleep(2)
                continue

            for message in messages:
                log_event("QUEUE_DEQUEUE", message_id=str(message.message_id))
                time.sleep(2)
                receiver.complete_message(message)

requirements.txt¶

flask==3.1.0
gunicorn==23.0.0
azure-servicebus==7.14.2

Dockerfile¶

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py worker.py ./
EXPOSE 8080
CMD ["sh", "-c", "python worker.py & gunicorn --bind 0.0.0.0:8080 --workers 1 --threads 8 --timeout 0 app:app"]

8.3 Build and push image¶

az acr build \
  --registry acrscaleruleconflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --image scale-rule-conflicts:v1 \
  --file Dockerfile .

8.4 Baseline deployment¶

ACR_USER=$(az acr credential show \
  --name acrscaleruleconflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --query username -o tsv)

ACR_PASS=$(az acr credential show \
  --name acrscaleruleconflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --query "passwords[0].value" -o tsv)

az containerapp create \
  --name ca-scale-rule-conflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --environment cae-scale-rule-conflicts \
  --image acrscaleruleconflicts.azurecr.io/scale-rule-conflicts:v1 \
  --registry-server acrscaleruleconflicts.azurecr.io \
  --registry-username "$ACR_USER" \
  --registry-password "$ACR_PASS" \
  --target-port 8080 \
  --ingress external \
  --min-replicas 0 \
  --max-replicas 10 \
  --cpu 0.5 \
  --memory 1.0Gi \
  --env-vars SERVICEBUS_CONNECTION_STRING="$SB_CONNECTION_STRING" SERVICEBUS_QUEUE_NAME=work-items \
  --scale-rule-name http-rule \
  --scale-rule-type http \
  --scale-rule-http-concurrency 10

APP_URL=$(az containerapp show \
  --name ca-scale-rule-conflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --query properties.configuration.ingress.fqdn -o tsv)

8.5 Scenario matrix¶

Scenario	Scale rules	Primary load source	Expected discriminator
1. HTTP only	HTTP concurrency only	`GET /` burst	Baseline for normal HTTP scale-out/scale-in
2. CPU only	CPU rule only	`GET /cpu`	Baseline for KEDA-style resource scaling
3. HTTP + CPU	Both HTTP and CPU	mixed `/` + `/cpu`	Determine whether HTTP scaling appears subordinated or ignored
4. HTTP + Service Bus	HTTP + Service Bus queue	`/` burst + queue backlog	Determine whether both signals can keep replicas active
5. HTTP/2 scale-in	HTTP rule only or HTTP + CPU	long-lived `/stream` clients	Determine whether active connections prevent scale-in

8.6 Scale rule update commands per scenario¶

Scenario 1: HTTP only¶

az containerapp update \
  --name ca-scale-rule-conflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --min-replicas 0 \
  --max-replicas 10 \
  --scale-rule-name http-rule \
  --scale-rule-type http \
  --scale-rule-http-concurrency 10

Scenario 2: CPU only¶

az containerapp update \
  --name ca-scale-rule-conflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --scale-rule-name cpu-rule \
  --scale-rule-type cpu \
  --scale-rule-metadata type=Utilization value=60

Scenario 3: HTTP + CPU¶

Use YAML because multiple scale rules are easier to express explicitly.

properties:
  template:
    scale:
      minReplicas: 0
      maxReplicas: 10
      rules:
        - name: http-rule
          http:
            metadata:
              concurrentRequests: "10"
        - name: cpu-rule
          custom:
            type: cpu
            metadata:
              type: Utilization
              value: "60"

az containerapp show \
  --name ca-scale-rule-conflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --output yaml > scenario-http-cpu.yaml

# Replace the scale block with the YAML above, then apply:
az containerapp update \
  --name ca-scale-rule-conflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --yaml scenario-http-cpu.yaml

Scenario 4: HTTP + Service Bus queue¶

properties:
  configuration:
    secrets:
      - name: sb-connection
        value: <service-bus-connection-string>
  template:
    scale:
      minReplicas: 0
      maxReplicas: 10
      rules:
        - name: http-rule
          http:
            metadata:
              concurrentRequests: "10"
        - name: sb-rule
          custom:
            type: azure-servicebus
            auth:
              - secretRef: sb-connection
                triggerParameter: connection
            metadata:
              namespace: sbscaleruleconflicts
              queueName: work-items
              messageCount: "5"
              activationMessageCount: "0"

az containerapp show \
  --name ca-scale-rule-conflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --output yaml > scenario-http-sb.yaml

# Replace the scale block and secret value, then apply:
az containerapp update \
  --name ca-scale-rule-conflicts \
  --resource-group rg-aca-scale-rule-conflicts-lab \
  --yaml scenario-http-sb.yaml

Scenario 5: cooldown and activation retest¶

Repeat Scenarios 3 and 4 with these controlled variations:

reduce cooldownPeriod to 30
increase cooldownPeriod to 300
change queue activationMessageCount from 0 to 5
change HTTP load from short-lived requests to long-lived /stream clients

8.7 Load scripts¶

Short-lived HTTP burst¶

#!/usr/bin/env bash
set -euo pipefail

URL="$1"
TOTAL="${2:-200}"
CONCURRENCY="${3:-40}"

seq "$TOTAL" | xargs -P "$CONCURRENCY" -I{} \
  curl -skS "https://$URL/" -o /dev/null -w '%{http_code} %{time_total}\n'

CPU-heavy traffic¶

#!/usr/bin/env bash
set -euo pipefail

URL="$1"
REQUESTS="${2:-20}"
SECONDS_PER_REQUEST="${3:-30}"
PARALLEL="${4:-10}"

seq "$REQUESTS" | xargs -P "$PARALLEL" -I{} \
  curl -skS "https://$URL/cpu?seconds=$SECONDS_PER_REQUEST" -o /dev/null -w '%{http_code} %{time_total}\n'

Queue backlog creation¶

#!/usr/bin/env bash
set -euo pipefail

URL="$1"
COUNT="${2:-200}"

curl -skS -X POST "https://$URL/enqueue?count=$COUNT"

Long-lived HTTP/2 clients¶

#!/usr/bin/env bash
set -euo pipefail

URL="$1"
CLIENTS="${2:-5}"
SECONDS="${3:-300}"

for n in $(seq 1 "$CLIENTS"); do
  curl --http2 -skN "https://$URL/stream?seconds=$SECONDS&interval=5" \
    > "stream-client-$n.log" 2>&1 &
done

wait

8.8 Per-scenario execution steps¶

For each scenario:

Apply the intended scale-rule configuration.
Confirm the active revision and scale rule block with az containerapp show.
Wait for the app to scale to baseline (0 or 1 replicas as configured).
Start the relevant load script.
Record replica count every 15 seconds with az containerapp replica list.
Stop the load source and continue observing for at least 15 minutes.
Query system and console logs immediately after the observation window.
Repeat at least three independent runs per scenario.

9. Expected signal¶

Scenario 1: HTTP only should scale out during request bursts and scale in after requests stop and active connections disappear.
Scenario 2: CPU only should scale out only when /cpu traffic drives sustained CPU above threshold; scale-in should follow after CPU drops and cooldown expires.
Scenario 3: HTTP + CPU should reveal whether CPU activity keeps replicas active even after HTTP burst traffic is gone, making HTTP scale-in appear broken.
Scenario 4: HTTP + Service Bus should show whether queue backlog or lingering worker activity keeps replicas alive after front-end traffic stops.
Scenario 5: HTTP/2 should show that long-lived connections delay or block scale-in even with low request rate.
Retesting with altered cooldownPeriod should show little or no effect when the scaler remains logically active.

10. Results¶

Pending execution.

Capture the following tables during execution.

10.1 Replica timeline summary¶

Scenario	Run	First scale-out time	Peak replicas	Load stop time	Scale-in complete time	Scale-in blocked?
HTTP only	1
CPU only	1
HTTP + CPU	1
HTTP + Service Bus	1
HTTP/2 keep-alive	1

10.2 Dominant-signal observations¶

Scenario	CPU above threshold	Queue backlog present
HTTP + CPU		n/a
HTTP + Service Bus	n/a
HTTP/2 keep-alive	optional	n/a

10.3 Representative log excerpts¶

system log excerpt showing scale-out trigger
system log excerpt showing failed or delayed scale-in
console log excerpt showing STREAM_OPEN without STREAM_CLOSE
queue worker logs showing QUEUE_IDLE despite lingering replicas

11. Interpretation¶

Pending execution. Interpret results using explicit evidence tags.

Planned interpretation questions:

Observed / Measured: Did HTTP-only and CPU-only baselines behave normally on their own?
Observed / Inferred: In mixed-rule scenarios, did one metric remain active and therefore prevent overall scale-in?
Observed / Strongly Suggested: Did HTTP/2 streams correlate with blocked scale-in despite near-zero request volume?
Not Proven / Unknown: If HTTP scaling appears ignored, is the cause controller precedence, metric sampling cadence, or threshold design?

12. What this proves¶

After execution, this experiment should be able to prove only the following kinds of claims:

whether the tested rule combinations in this environment scaled out as expected
whether scale-in required all active signals to become inactive
whether long-lived HTTP/2 connections correlated with delayed scale-in
whether cooldownPeriod changes alone changed behavior when metrics remained active

13. What this does NOT prove¶

Even after execution, this experiment will not by itself prove:

the exact internal Azure control-plane implementation behind HTTP scaling
that all Container Apps regions or environments behave identically
that Event Hub scaling bugs are identical to Service Bus scaling behavior
that every apparent conflict is a platform bug rather than a threshold-design issue

Issue #972 is included as motivation for queue-scaler scale-down anomalies, but this design uses Service Bus for easier controlled reproduction. Any similarity to Event Hub behavior would be correlated, not automatically equivalent.

14. Support takeaway¶

For cases involving unexpected Container Apps scaling behavior, support engineers should check:

whether multiple scale signals are configured and which one is still active
whether the customer expects cooldownPeriod to override an active metric
whether HTTP/2, SSE, gRPC, or other long-lived connections are keeping ingress activity alive
whether queue scaler activation* values and thresholds are aligned with actual idle conditions
whether the customer has baseline evidence from single-rule testing before combining rules

Practical triage order:

reproduce with HTTP only
reproduce with CPU only or queue only
combine rules only after each single-rule baseline is understood
inspect active connections and scaler thresholds before escalating as a platform defect

15. Reproduction notes¶

Keep revision mode fixed during all runs to avoid mixing revision-transition behavior with scaling behavior.
Use the same image and same environment for every scenario.
Wait long enough after stopping load; scale-in investigations often fail because observation stops too early.
For HTTP/2 tests, prefer clients that keep connections open intentionally; short curl bursts may not reproduce the issue.
Record exact rule configuration for every run because small threshold differences can completely change conclusions.
If scale rules must be edited through YAML, archive the exact applied YAML alongside captured logs.