Replica Load Imbalance

Use this playbook when one replica becomes hot while others stay underused, or when a steady workload produces uneven latency across otherwise healthy replicas.

Symptom

One replica shows higher CPU, memory, or latency than the rest.
Request throughput does not spread evenly during a steady load test.
Sticky-session traffic or long-lived connections keep returning to the same replica.
Scale-out happens, but the new replicas do not meaningfully reduce the load on the hottest replica.

Possible Causes

Session affinity is concentrating user flows on a subset of replicas.
Long-lived WebSocket, gRPC, or streaming connections pin work to earlier replicas.
The HTTP concurrency threshold is too high, so a hot replica accepts too much work before more replicas are added.
Revision traffic weights are correct, so the imbalance originates at the replica level inside a single revision rather than in traffic splitting.
Downstream caching or state locality causes specific replicas to do disproportionate work.

Diagnosis Steps

flowchart TD
    A[Uneven replica utilization] --> B[Check ingress, session affinity, and connection type]
    B --> C{Sticky sessions or long-lived connections?}
    C -->|Yes| D[Expected concentration on some replicas]
    C -->|No| E[Check scale rules and concurrency]
    E --> F{New replicas arriving late?}
    F -->|Yes| G[Tune KEDA threshold or min replicas]
    F -->|No| H[Inspect app-level state and cache locality]
    D --> I[Disable affinity for stateless paths or isolate sticky workloads]
    H --> J[Refactor state handling or spread work more evenly]

Confirm ingress behavior and whether sticky sessions are enabled.

az containerapp show \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --query "properties.configuration.ingress" \
    --output json
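To answer the first branch of the flowchart directly, the query can be narrowed to the affinity setting. The following is a minimal sketch, assuming the ingress configuration exposes `stickySessions.affinity` with the values `sticky` or `none`; the helper name `affinity_mode` is hypothetical and not part of the Azure CLI.

```shell
# Hypothetical helper: print "sticky", "none", or "unknown" for one app.
# Assumes the affinity mode lives at ingress.stickySessions.affinity.
affinity_mode() (
    mode="$(az containerapp show \
        --name "$1" \
        --resource-group "$2" \
        --query "properties.configuration.ingress.stickySessions.affinity" \
        --output tsv)" || { echo unknown; exit 1; }

    # Treat an empty or null result as "affinity not enabled".
    case "$mode" in
        sticky) echo sticky ;;
        *)      echo none ;;
    esac
)
```

Call it as `affinity_mode "$APP_NAME" "$RG"`. A result of `sticky` means some concentration on individual replicas is expected behavior rather than a fault.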

Check scale rules and replica limits.

az containerapp show \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --query "properties.template.scale" \
    --output json
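When reading the scale output, it can help to estimate how many replicas a given concurrency threshold would ask for under the observed load. The sketch below uses a simplified model, replicas = ceil(in-flight requests / threshold) clamped to the replica limits; the real scaler also applies polling intervals and cooldowns, so treat the result as a rough guide. The function name `estimate_replicas` is hypothetical.

```shell
# Hypothetical helper: rough replica count for a concurrency threshold.
# Args: in-flight requests, concurrency threshold, min replicas, max replicas.
estimate_replicas() (
    in_flight="$1"; threshold="$2"; min="$3"; max="$4"

    # Integer ceiling of in_flight / threshold.
    want=$(( (in_flight + threshold - 1) / threshold ))

    # Clamp to the configured replica limits.
    if [ "$want" -lt "$min" ]; then want="$min"; fi
    if [ "$want" -gt "$max" ]; then want="$max"; fi
    echo "$want"
)

estimate_replicas 120 100 1 10   # threshold 100: prints 2
estimate_replicas 120 10 1 10    # threshold 10: prints 10 (capped at max)
```

With 120 in-flight requests, a threshold of 100 asks for only two replicas, so each replica absorbs a large amount of work before any scale-out occurs. A lower threshold requests more replicas for the same load.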

Compare request timing and replica events.

az monitor metrics list \
    --resource "/subscriptions/<subscription-id>/resourceGroups/$RG/providers/Microsoft.App/containerApps/$APP_NAME" \
    --metric Requests \
    --aggregation Total \
    --offset 1h
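Once request totals are available per replica, a single number makes the imbalance easier to track across test runs. The sketch below assumes you already have one line per replica in the form `<replica-name> <request-total>`; how those totals are produced (for example, by splitting the metric on a replica dimension or by counting in application logs) is left open. The helper name `replica_imbalance` is hypothetical.

```shell
# Hypothetical helper: read "<replica-name> <request-total>" lines on stdin
# and print the hottest replica with its ratio to the mean request count.
# A ratio of 1.00 means the load is spread perfectly evenly.
replica_imbalance() {
    awk '
        NF >= 2 {
            v = $2 + 0
            total += v
            n++
            if (n == 1 || v > max) { max = v; hot = $1 }
        }
        END {
            if (n == 0 || total == 0) { print "no data"; exit }
            printf "%s %.2f\n", hot, max / (total / n)
        }'
}

# Example with made-up totals: the mean is 200, the hottest replica has 300.
printf 'replica-a 300\nreplica-b 100\nreplica-c 200\n' | replica_imbalance
# prints: replica-a 1.50
```

A ratio that stays well above 1 during a steady load test matches the second symptom listed above.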

let AppName = "ca-myapp";
ContainerAppSystemLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(2h)
| where Reason_s has_any ("ReplicaStarted", "ReplicaReady")
   or Log_s has_any ("affinity", "session", "connection")
| project TimeGenerated, RevisionName_s, ReplicaName_s, Reason_s, Log_s
| order by TimeGenerated desc

If multiple revisions are active, make sure the problem is not mistaken for a traffic-splitting issue.

az containerapp show \
    --name "$APP_NAME" \
    --resource-group "$RG" \
    --query "properties.configuration.ingress.traffic" \
    --output json

Command or Query	Why it is used
`az containerapp show --query properties.configuration.ingress`	Reveals session affinity and ingress behavior that can bias request distribution.
`az containerapp show --query properties.template.scale`	Shows whether scaling thresholds delay new replica creation.
`az monitor metrics list --metric Requests ...`	Establishes the load window that must be correlated with replica behavior.
KQL against system logs	Maps request concentration symptoms to replica lifecycle events.

Resolution

Disable sticky sessions for stateless routes, or isolate stateful endpoints where affinity is required.
Lower per-replica concurrency or scale thresholds so new replicas become useful sooner.
Increase minReplicas during steady high-volume periods.
Reduce long-lived connection concentration by separating streaming traffic from normal HTTP traffic.
If state locality is deliberate, adjust expectations and monitor by replica cohort instead of assuming perfect balance.
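The first three actions above can be sketched with the Azure CLI as follows. This assumes the `containerapp` command group provides `ingress sticky-sessions set` and the `--scale-rule-*` options on `update`. The wrapper names, the rule name, and the numeric values are illustrative only; pass the name of the HTTP rule shown by the scale query, and verify how `update` treats any other existing rules before applying this to a production app.

```shell
# Hypothetical wrapper: turn off session affinity for a stateless app.
disable_affinity() (
    az containerapp ingress sticky-sessions set \
        --name "$1" \
        --resource-group "$2" \
        --affinity none
)

# Hypothetical wrapper: raise the replica floor and lower the HTTP
# concurrency threshold so new replicas are requested sooner.
# Args: app name, resource group, rule name, min replicas, concurrency.
tune_scale() (
    az containerapp update \
        --name "$1" \
        --resource-group "$2" \
        --min-replicas "$4" \
        --scale-rule-name "$3" \
        --scale-rule-type http \
        --scale-rule-http-concurrency "$5"
)
```

For example, `disable_affinity "$APP_NAME" "$RG"` followed by `tune_scale "$APP_NAME" "$RG" http-rule 3 20`, where `http-rule`, `3`, and `20` are placeholders to be replaced with values that fit the workload.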

Prevention

Decide explicitly whether affinity is required before enabling it.
Test with both short-lived and long-lived request patterns.
Validate scale rules using steady-state as well as burst traffic.
Keep revision traffic-splitting and replica-level load balancing conceptually separate in incident reviews.
Instrument per-replica behavior in application telemetry where possible.
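As one way to instrument per-replica behavior, each log line can carry the replica name so that queries can group by it. The sketch below assumes the platform sets a `CONTAINER_APP_REPLICA_NAME` environment variable inside the container and falls back to `unknown` when it is absent; the function name `log_with_replica` is hypothetical.

```shell
# Hypothetical helper: prefix a log message with a UTC timestamp and the
# replica name, assumed to be provided in CONTAINER_APP_REPLICA_NAME.
log_with_replica() {
    printf '%s replica=%s %s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "${CONTAINER_APP_REPLICA_NAME:-unknown}" \
        "$*"
}

log_with_replica "request handled path=/orders"
```

The same idea applies in any application language: read the replica identifier once at startup and attach it as a dimension to every metric and log record.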