HTTP Concurrency Cliffs on Flex Consumption¶

Status: Planned

1. Question¶

At what HTTP concurrency level does a single Flex Consumption instance begin to degrade (increased latency, errors, or worker restarts), and is this cliff predictable and consistent across runs?

2. Why this matters¶

Flex Consumption allows configuring http.maxConcurrentRequests in host.json. Setting this too high overloads a single instance; setting it too low wastes instances. Customers need to know the practical concurrency ceiling — not the theoretical limit — so they can configure scaling triggers appropriately. The "cliff" behavior (sudden degradation rather than gradual) is particularly dangerous because monitoring may not catch it until requests are already failing.

3. Customer symptom¶

"Everything is fine at 50 concurrent requests but at 80 it suddenly falls apart."
"We increased maxConcurrentRequests to 200 and now we're seeing 500 errors."
"Latency is stable until we hit some threshold, then p99 goes from 200ms to 30 seconds."

4. Hypothesis¶

Each Flex Consumption instance has a practical concurrency ceiling that depends on the function's resource consumption (CPU, memory, I/O). Beyond this ceiling:

Latency increases non-linearly (cliff behavior, not gradual degradation)
The cliff point is reproducible across runs (±10% variance)
The cliff point correlates with CPU saturation or memory pressure, not just request count

5. Environment¶

Parameter	Value
Service	Azure Functions
SKU / Plan	Flex Consumption
Region	Korea Central
Runtime	Python 3.11
OS	Linux
Date tested	—

6. Variables¶

Experiment type: Performance

Controlled:

Concurrency ramp: 10, 25, 50, 75, 100, 150, 200 concurrent requests
Function payload: CPU-light (JSON parse), CPU-medium (image resize), I/O-bound (HTTP call)
maxConcurrentRequests: 100, 200, unbounded
Duration per concurrency level: 5 minutes steady state

Observed:

Response latency distribution (p50, p95, p99) per concurrency level
Error rate per concurrency level
Worker restart events
Memory and CPU consumption per instance

Independent run definition: Fresh deployment, single instance pinned (always_ready=1, max scale=1), identical ramp profile

Planned runs per configuration: 5

Warm-up exclusion rule: First 2 minutes at each concurrency level

Primary metric: p95 latency; meaningful effect threshold: 2× increase from previous concurrency step

Comparison method: Per-run cliff detection; Mann-Whitney U comparing cliff-point concurrency across runs

7. Instrumentation¶

Application Insights: request traces, performance counters
Custom function middleware: per-request timing, concurrent request counter
Azure Monitor: ProcessCpuPercentage, ProcessMemoryMB
k6 load generator with step-up concurrency profile

8. Procedure¶

8.1 Infrastructure Setup¶

export SUBSCRIPTION_ID="<subscription-id>"
export RG="rg-http-concurrency-cliffs-lab"
export LOCATION="koreacentral"
export APP_NAME="func-http-concurrency-cliffs-$RANDOM"
export STORAGE_NAME="sthttpcliff$RANDOM"
export APP_INSIGHTS_NAME="appi-http-concurrency-cliffs"
export LOG_WORKSPACE_NAME="law-http-concurrency-cliffs"

az account set --subscription "$SUBSCRIPTION_ID"

az group create --name "$RG" --location "$LOCATION"

az storage account create \
  --resource-group "$RG" \
  --name "$STORAGE_NAME" \
  --location "$LOCATION" \
  --sku Standard_LRS \
  --kind StorageV2 \
  --allow-blob-public-access false \
  --allow-shared-key-access false \
  --min-tls-version TLS1_2

az monitor log-analytics workspace create \
  --resource-group "$RG" \
  --workspace-name "$LOG_WORKSPACE_NAME" \
  --location "$LOCATION"

az monitor app-insights component create \
  --resource-group "$RG" \
  --app "$APP_INSIGHTS_NAME" \
  --workspace "$LOG_WORKSPACE_NAME" \
  --location "$LOCATION" \
  --application-type web

az functionapp create \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --storage-account "$STORAGE_NAME" \
  --runtime python \
  --runtime-version 3.11 \
  --functions-version 4 \
  --flexconsumption-location "$LOCATION"

az functionapp identity assign \
  --resource-group "$RG" \
  --name "$APP_NAME"

PRINCIPAL_ID=$(az functionapp identity show \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --query principalId --output tsv)

STORAGE_ID=$(az storage account show \
  --resource-group "$RG" \
  --name "$STORAGE_NAME" \
  --query id --output tsv)

az role assignment create \
  --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Blob Data Contributor" \
  --scope "$STORAGE_ID"

az role assignment create \
  --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Queue Data Contributor" \
  --scope "$STORAGE_ID"

8.2 Application Code¶

import json
import threading
import time
from datetime import datetime, timezone

import azure.functions as func

app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)
active_requests = 0
lock = threading.Lock()


@app.route(route="probe", methods=["GET"])
def probe(req: func.HttpRequest) -> func.HttpResponse:
    global active_requests
    mode = req.params.get("mode", "cpu-light")
    started = time.perf_counter()

    with lock:
        active_requests += 1
        current_concurrency = active_requests

    try:
        if mode == "cpu-medium":
            end = time.perf_counter() + 0.2
            x = 0
            while time.perf_counter() < end:
                x += 1
        elif mode == "io-bound":
            time.sleep(0.2)
        else:
            payload = json.loads('{"hello":"world","n":1}')
            _ = payload.get("hello")

        exec_ms = round((time.perf_counter() - started) * 1000, 2)
        body = {
            "timestamp_utc": datetime.now(timezone.utc).isoformat(),
            "mode": mode,
            "execution_ms": exec_ms,
            "observed_concurrency": current_concurrency,
        }
        return func.HttpResponse(
            body=json.dumps(body),
            status_code=200,
            mimetype="application/json",
            headers={
                "x-execution-ms": str(exec_ms),
                "x-observed-concurrency": str(current_concurrency),
            },
        )
    finally:
        with lock:
            active_requests -= 1

version: "2.0"
logging:
  applicationInsights:
    samplingSettings:
      isEnabled: true
      excludedTypes: Request
extensions:
  http:
    routePrefix: "api"
    maxConcurrentRequests: 100
    maxOutstandingRequests: 200
functionTimeout: "00:10:00"

8.3 Deploy¶

mkdir -p app-http-concurrency-cliffs

cat > app-http-concurrency-cliffs/function_app.py <<'PY'
# paste Python from section 8.2
PY

cat > app-http-concurrency-cliffs/host.json <<'JSON'
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensions": {
    "http": {
      "routePrefix": "api",
      "maxConcurrentRequests": 100,
      "maxOutstandingRequests": 200
    }
  },
  "functionTimeout": "00:10:00"
}
JSON

cat > app-http-concurrency-cliffs/requirements.txt <<'TXT'
azure-functions
TXT

az functionapp config appsettings set \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --settings \
    AzureWebJobsStorage__accountName="$STORAGE_NAME" \
    FUNCTIONS_EXTENSION_VERSION="~4" \
    FUNCTIONS_WORKER_RUNTIME="python"

(cd app-http-concurrency-cliffs && zip -r ../app-http-concurrency-cliffs.zip .)

az functionapp deployment source config-zip \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --src app-http-concurrency-cliffs.zip

az functionapp restart --resource-group "$RG" --name "$APP_NAME"

8.4 Test Execution¶

export FUNCTION_URL="https://$APP_NAME.azurewebsites.net/api/probe"

cat > k6-cliff.js <<'JS'
import http from "k6/http";
import { check, sleep } from "k6";

const mode = __ENV.MODE || "cpu-light";
const target = __ENV.TARGET;

export const options = {
  scenarios: {
    stepped: {
      executor: "ramping-vus",
      startVUs: 10,
      stages: [
        { duration: "5m", target: 10 },
        { duration: "5m", target: 25 },
        { duration: "5m", target: 50 },
        { duration: "5m", target: 75 },
        { duration: "5m", target: 100 },
        { duration: "5m", target: 150 },
        { duration: "5m", target: 200 }
      ]
    }
  }
};

export default function () {
  const res = http.get(`${target}?mode=${mode}`);
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(0.1);
}
JS

# 1) Run cpu-light profile with maxConcurrentRequests=100
k6 run --env TARGET="$FUNCTION_URL" --env MODE="cpu-light" k6-cliff.js

# 2) Run cpu-medium profile
k6 run --env TARGET="$FUNCTION_URL" --env MODE="cpu-medium" k6-cliff.js

# 3) Run io-bound profile
k6 run --env TARGET="$FUNCTION_URL" --env MODE="io-bound" k6-cliff.js

# 4) Raise host HTTP concurrency and repeat matrix
cat > app-http-concurrency-cliffs/host.json <<'JSON'
{
  "version": "2.0",
  "extensions": {
    "http": {
      "routePrefix": "api",
      "maxConcurrentRequests": 200,
      "maxOutstandingRequests": 400
    }
  }
}
JSON

(cd app-http-concurrency-cliffs && zip -r ../app-http-concurrency-cliffs.zip .)
az functionapp deployment source config-zip \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --src app-http-concurrency-cliffs.zip

k6 run --env TARGET="$FUNCTION_URL" --env MODE="cpu-light" k6-cliff.js

# 5) For reproducibility, execute each profile 5 times and keep raw outputs
for run in 1 2 3 4 5; do
  k6 run --env TARGET="$FUNCTION_URL" --env MODE="cpu-medium" k6-cliff.js \
    --summary-export "summary-cpu-medium-run${run}.json"
done

8.5 Data Collection¶

APP_INSIGHTS_ID=$(az monitor app-insights component show \
  --resource-group "$RG" \
  --app "$APP_INSIGHTS_NAME" \
  --query appId --output tsv)

az monitor app-insights query \
  --app "$APP_INSIGHTS_ID" \
  --analytics-query "requests | where timestamp > ago(8h) | where name has '/api/probe' | summarize p50=percentile(duration,50), p95=percentile(duration,95), p99=percentile(duration,99), errors=countif(success == false) by bin(timestamp, 5m) | order by timestamp asc" \
  --output table

az monitor app-insights query \
  --app "$APP_INSIGHTS_ID" \
  --analytics-query "traces | where timestamp > ago(8h) | where customDimensions has 'observed_concurrency' | project timestamp, message, customDimensions | order by timestamp asc" \
  --output table

az monitor metrics list \
  --resource "/subscriptions/<subscription-id>/resourceGroups/$RG/providers/Microsoft.Web/sites/$APP_NAME" \
  --metric "Requests" "Http5xx" "AverageResponseTime" \
  --interval PT1M \
  --aggregation Average Maximum Total \
  --output table

az monitor metrics list \
  --resource "/subscriptions/<subscription-id>/resourceGroups/$RG/providers/Microsoft.Web/sites/$APP_NAME" \
  --metric "CpuPercentage" "MemoryWorkingSet" \
  --interval PT1M \
  --aggregation Average Maximum \
  --output table

8.6 Cleanup¶

az group delete --name "$RG" --yes --no-wait

9. Expected signal¶

Latency remains stable up to a concurrency threshold, then increases sharply
The cliff point varies by function type: CPU-light ~100-150, CPU-medium ~50-75, I/O-bound ~75-100
Error rate transitions from 0% to >5% within a single concurrency step at the cliff
Cliff point is consistent across 5 runs (±10 concurrent requests)

10. Results¶

Awaiting execution.

11. Interpretation¶

Awaiting execution.

12. What this proves¶

Awaiting execution.

13. What this does NOT prove¶

Awaiting execution.

14. Support takeaway¶

Awaiting execution.

15. Reproduction notes¶

Pin to a single instance using always_ready=1 and maximum instance count=1 to isolate per-instance behavior
Flex Consumption instance sizes may vary; check actual CPU/memory allocation
Worker process recycling can reset the concurrency state mid-test

HTTP Concurrency Cliffs on Flex Consumption¶

1. Question¶

2. Why this matters¶

3. Customer symptom¶

4. Hypothesis¶

5. Environment¶

6. Variables¶

7. Instrumentation¶

8. Procedure¶

8.1 Infrastructure Setup¶

8.2 Application Code¶

8.3 Deploy¶

8.4 Test Execution¶

8.5 Data Collection¶

8.6 Cleanup¶

9. Expected signal¶

10. Results¶

11. Interpretation¶

12. What this proves¶

13. What this does NOT prove¶

14. Support takeaway¶

15. Reproduction notes¶

16. Related guide / official docs¶