Skip to content

HTTP Concurrency Cliffs on Flex Consumption

Status: Planned

1. Question

At what HTTP concurrency level does a single Flex Consumption instance begin to degrade (increased latency, errors, or worker restarts), and is this cliff predictable and consistent across runs?

2. Why this matters

Flex Consumption allows configuring http.maxConcurrentRequests in host.json. Setting this too high overloads a single instance; setting it too low wastes instances. Customers need to know the practical concurrency ceiling — not the theoretical limit — so they can configure scaling triggers appropriately. The "cliff" behavior (sudden degradation rather than gradual) is particularly dangerous because monitoring may not catch it until requests are already failing.

3. Customer symptom

  • "Everything is fine at 50 concurrent requests but at 80 it suddenly falls apart."
  • "We increased maxConcurrentRequests to 200 and now we're seeing 500 errors."
  • "Latency is stable until we hit some threshold, then p99 goes from 200ms to 30 seconds."

4. Hypothesis

Each Flex Consumption instance has a practical concurrency ceiling that depends on the function's resource consumption (CPU, memory, I/O). Beyond this ceiling:

  1. Latency increases non-linearly (cliff behavior, not gradual degradation)
  2. The cliff point is reproducible across runs (±10% variance)
  3. The cliff point correlates with CPU saturation or memory pressure, not just request count

5. Environment

Parameter Value
Service Azure Functions
SKU / Plan Flex Consumption
Region Korea Central
Runtime Python 3.11
OS Linux
Date tested

6. Variables

Experiment type: Performance

Controlled:

  • Concurrency ramp: 10, 25, 50, 75, 100, 150, 200 concurrent requests
  • Function payload: CPU-light (JSON parse), CPU-medium (image resize), I/O-bound (HTTP call)
  • maxConcurrentRequests: 100, 200, unbounded
  • Duration per concurrency level: 5 minutes steady state

Observed:

  • Response latency distribution (p50, p95, p99) per concurrency level
  • Error rate per concurrency level
  • Worker restart events
  • Memory and CPU consumption per instance

Independent run definition: Fresh deployment, single instance pinned (always_ready=1, max scale=1), identical ramp profile

Planned runs per configuration: 5

Warm-up exclusion rule: First 2 minutes at each concurrency level

Primary metric: p95 latency; meaningful effect threshold: 2× increase from previous concurrency step

Comparison method: Per-run cliff detection; Mann-Whitney U comparing cliff-point concurrency across runs

7. Instrumentation

  • Application Insights: request traces, performance counters
  • Custom function middleware: per-request timing, concurrent request counter
  • Azure Monitor: ProcessCpuPercentage, ProcessMemoryMB
  • k6 load generator with step-up concurrency profile

8. Procedure

8.1 Infrastructure Setup

export SUBSCRIPTION_ID="<subscription-id>"
export RG="rg-http-concurrency-cliffs-lab"
export LOCATION="koreacentral"
export APP_NAME="func-http-concurrency-cliffs-$RANDOM"
export STORAGE_NAME="sthttpcliff$RANDOM"
export APP_INSIGHTS_NAME="appi-http-concurrency-cliffs"
export LOG_WORKSPACE_NAME="law-http-concurrency-cliffs"

az account set --subscription "$SUBSCRIPTION_ID"

az group create --name "$RG" --location "$LOCATION"

az storage account create \
  --resource-group "$RG" \
  --name "$STORAGE_NAME" \
  --location "$LOCATION" \
  --sku Standard_LRS \
  --kind StorageV2 \
  --allow-blob-public-access false \
  --allow-shared-key-access false \
  --min-tls-version TLS1_2

az monitor log-analytics workspace create \
  --resource-group "$RG" \
  --workspace-name "$LOG_WORKSPACE_NAME" \
  --location "$LOCATION"

az monitor app-insights component create \
  --resource-group "$RG" \
  --app "$APP_INSIGHTS_NAME" \
  --workspace "$LOG_WORKSPACE_NAME" \
  --location "$LOCATION" \
  --application-type web

az functionapp create \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --storage-account "$STORAGE_NAME" \
  --runtime python \
  --runtime-version 3.11 \
  --functions-version 4 \
  --flexconsumption-location "$LOCATION"

az functionapp identity assign \
  --resource-group "$RG" \
  --name "$APP_NAME"

PRINCIPAL_ID=$(az functionapp identity show \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --query principalId --output tsv)

STORAGE_ID=$(az storage account show \
  --resource-group "$RG" \
  --name "$STORAGE_NAME" \
  --query id --output tsv)

az role assignment create \
  --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Blob Data Contributor" \
  --scope "$STORAGE_ID"

az role assignment create \
  --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Queue Data Contributor" \
  --scope "$STORAGE_ID"

8.2 Application Code

import json
import threading
import time
from datetime import datetime, timezone

import azure.functions as func

app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)
active_requests = 0
lock = threading.Lock()


@app.route(route="probe", methods=["GET"])
def probe(req: func.HttpRequest) -> func.HttpResponse:
    global active_requests
    mode = req.params.get("mode", "cpu-light")
    started = time.perf_counter()

    with lock:
        active_requests += 1
        current_concurrency = active_requests

    try:
        if mode == "cpu-medium":
            end = time.perf_counter() + 0.2
            x = 0
            while time.perf_counter() < end:
                x += 1
        elif mode == "io-bound":
            time.sleep(0.2)
        else:
            payload = json.loads('{"hello":"world","n":1}')
            _ = payload.get("hello")

        exec_ms = round((time.perf_counter() - started) * 1000, 2)
        body = {
            "timestamp_utc": datetime.now(timezone.utc).isoformat(),
            "mode": mode,
            "execution_ms": exec_ms,
            "observed_concurrency": current_concurrency,
        }
        return func.HttpResponse(
            body=json.dumps(body),
            status_code=200,
            mimetype="application/json",
            headers={
                "x-execution-ms": str(exec_ms),
                "x-observed-concurrency": str(current_concurrency),
            },
        )
    finally:
        with lock:
            active_requests -= 1
version: "2.0"
logging:
  applicationInsights:
    samplingSettings:
      isEnabled: true
      excludedTypes: Request
extensions:
  http:
    routePrefix: "api"
    maxConcurrentRequests: 100
    maxOutstandingRequests: 200
functionTimeout: "00:10:00"

8.3 Deploy

mkdir -p app-http-concurrency-cliffs

cat > app-http-concurrency-cliffs/function_app.py <<'PY'
# paste Python from section 8.2
PY

cat > app-http-concurrency-cliffs/host.json <<'JSON'
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensions": {
    "http": {
      "routePrefix": "api",
      "maxConcurrentRequests": 100,
      "maxOutstandingRequests": 200
    }
  },
  "functionTimeout": "00:10:00"
}
JSON

cat > app-http-concurrency-cliffs/requirements.txt <<'TXT'
azure-functions
TXT

az functionapp config appsettings set \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --settings \
    AzureWebJobsStorage__accountName="$STORAGE_NAME" \
    FUNCTIONS_EXTENSION_VERSION="~4" \
    FUNCTIONS_WORKER_RUNTIME="python"

(cd app-http-concurrency-cliffs && zip -r ../app-http-concurrency-cliffs.zip .)

az functionapp deployment source config-zip \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --src app-http-concurrency-cliffs.zip

az functionapp restart --resource-group "$RG" --name "$APP_NAME"

8.4 Test Execution

export FUNCTION_URL="https://$APP_NAME.azurewebsites.net/api/probe"

cat > k6-cliff.js <<'JS'
import http from "k6/http";
import { check, sleep } from "k6";

const mode = __ENV.MODE || "cpu-light";
const target = __ENV.TARGET;

export const options = {
  scenarios: {
    stepped: {
      executor: "ramping-vus",
      startVUs: 10,
      stages: [
        { duration: "5m", target: 10 },
        { duration: "5m", target: 25 },
        { duration: "5m", target: 50 },
        { duration: "5m", target: 75 },
        { duration: "5m", target: 100 },
        { duration: "5m", target: 150 },
        { duration: "5m", target: 200 }
      ]
    }
  }
};

export default function () {
  const res = http.get(`${target}?mode=${mode}`);
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(0.1);
}
JS

# 1) Run cpu-light profile with maxConcurrentRequests=100
k6 run --env TARGET="$FUNCTION_URL" --env MODE="cpu-light" k6-cliff.js

# 2) Run cpu-medium profile
k6 run --env TARGET="$FUNCTION_URL" --env MODE="cpu-medium" k6-cliff.js

# 3) Run io-bound profile
k6 run --env TARGET="$FUNCTION_URL" --env MODE="io-bound" k6-cliff.js

# 4) Raise host HTTP concurrency and repeat matrix
cat > app-http-concurrency-cliffs/host.json <<'JSON'
{
  "version": "2.0",
  "extensions": {
    "http": {
      "routePrefix": "api",
      "maxConcurrentRequests": 200,
      "maxOutstandingRequests": 400
    }
  }
}
JSON

(cd app-http-concurrency-cliffs && zip -r ../app-http-concurrency-cliffs.zip .)
az functionapp deployment source config-zip \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --src app-http-concurrency-cliffs.zip

k6 run --env TARGET="$FUNCTION_URL" --env MODE="cpu-light" k6-cliff.js

# 5) For reproducibility, execute each profile 5 times and keep raw outputs
for run in 1 2 3 4 5; do
  k6 run --env TARGET="$FUNCTION_URL" --env MODE="cpu-medium" k6-cliff.js \
    --summary-export "summary-cpu-medium-run${run}.json"
done

8.5 Data Collection

APP_INSIGHTS_ID=$(az monitor app-insights component show \
  --resource-group "$RG" \
  --app "$APP_INSIGHTS_NAME" \
  --query appId --output tsv)

az monitor app-insights query \
  --app "$APP_INSIGHTS_ID" \
  --analytics-query "requests | where timestamp > ago(8h) | where name has '/api/probe' | summarize p50=percentile(duration,50), p95=percentile(duration,95), p99=percentile(duration,99), errors=countif(success == false) by bin(timestamp, 5m) | order by timestamp asc" \
  --output table

az monitor app-insights query \
  --app "$APP_INSIGHTS_ID" \
  --analytics-query "traces | where timestamp > ago(8h) | where customDimensions has 'observed_concurrency' | project timestamp, message, customDimensions | order by timestamp asc" \
  --output table

az monitor metrics list \
  --resource "/subscriptions/<subscription-id>/resourceGroups/$RG/providers/Microsoft.Web/sites/$APP_NAME" \
  --metric "Requests" "Http5xx" "AverageResponseTime" \
  --interval PT1M \
  --aggregation Average Maximum Total \
  --output table

az monitor metrics list \
  --resource "/subscriptions/<subscription-id>/resourceGroups/$RG/providers/Microsoft.Web/sites/$APP_NAME" \
  --metric "CpuPercentage" "MemoryWorkingSet" \
  --interval PT1M \
  --aggregation Average Maximum \
  --output table

8.6 Cleanup

az group delete --name "$RG" --yes --no-wait

9. Expected signal

  • Latency remains stable up to a concurrency threshold, then increases sharply
  • The cliff point varies by function type: CPU-light ~100-150, CPU-medium ~50-75, I/O-bound ~75-100
  • Error rate transitions from 0% to >5% within a single concurrency step at the cliff
  • Cliff point is consistent across 5 runs (±10 concurrent requests)

10. Results

Awaiting execution.

11. Interpretation

Awaiting execution.

12. What this proves

Awaiting execution.

13. What this does NOT prove

Awaiting execution.

14. Support takeaway

Awaiting execution.

15. Reproduction notes

  • Pin to a single instance using always_ready=1 and maximum instance count=1 to isolate per-instance behavior
  • Flex Consumption instance sizes may vary; check actual CPU/memory allocation
  • Worker process recycling can reset the concurrency state mid-test