Memory Pressure and Worker Degradation (Azure App Service Linux)¶

1. Summary¶

Symptom¶

Latency and error rates gradually worsen over uptime even when CPU is not saturated. Requests that were stable after deployment become slower after several hours, then often recover after restart or recycle. In severe windows, the app shows 502/503 bursts, worker restarts, or OOM-like behavior.

Why this scenario is confusing¶

Teams commonly expect memory incidents to present as immediate crashes. In App Service Linux, memory pressure can first appear as worker degradation: slower GC cycles, queue buildup, intermittent timeouts, and delayed responses. Because CPU can stay moderate, responders may incorrectly scale on CPU alone and miss plan-level memory contention shared across apps.

Troubleshooting decision flow¶

graph TD
    A[Symptom: Performance degrades over uptime, restart helps] --> B{What to check first?}
    B --> C[Memory trend climbs with stable traffic]
    B --> D[Multiple apps in same plan degrade]
    B --> E[Frequent worker timeout/restart signatures]
    B --> F[Large payloads/retries increase in-flight memory]
    C --> H1[H1: Application memory leak or unbounded cache growth]
    D --> H2[H2: Plan-level memory contention]
    E --> H3[H3: Worker/process model mismatch]
    F --> H4[H4: Dependency/runtime amplifies memory pressure]

2. Common Misreadings¶

"CPU is healthy, so platform capacity is healthy."
"Only one app is affected, therefore the App Service Plan is not relevant."
"Restart fixed it, so the issue is gone" (without proving root cause).
"No persistent 5xx means no user-impacting performance issue."
"Memory leak always means obvious out-of-memory crash" (ignoring slow degradation patterns).

3. Competing Hypotheses¶

H1: Application memory leak or unbounded cache growth in Python/Node worker processes causes progressive memory retention and GC overhead.
H2: Plan-level memory contention (noisy neighbor pattern) where multiple apps in the same plan consume shared memory headroom, degrading one another.
H3: Worker/process model mismatch (too many workers/threads for available memory) causing thrash, frequent restarts, and queueing.
H4: Dependency and runtime behavior amplifies memory pressure (large payload buffering, retry storms, long-lived objects), creating degraded workers before hard OOM.

4. What to Check First¶

Metrics¶

App Service Plan MemoryPercentage and CpuPercentage over the same timeline as the latency degradation.
HTTP latency distribution using AppServiceHTTPLogs.TimeTaken (P50/P95/P99).
Restart/recycle frequency and instance count trend during the incident window.

Portal view: Metrics blade for memory vs CPU divergence detection¶

The decisive pivot for this playbook is scope. Change Scope from the app to the parent App Service Plan, because MemoryPercentage and CpuPercentage are plan-level metrics, not app-level. Querying them on the app scope returns empty data and produces a false "no memory pressure" conclusion.

Once scoped to the plan, add MemoryPercentage and CpuPercentage, both with Avg aggregation, to the same chart over a 24-hour window. The canonical degradation signature this playbook diagnoses is MemoryPercentage rising steadily with uptime while CpuPercentage stays moderate and flat (the divergence pattern listed in the Normal vs Abnormal Comparison table below).

Overlay both metrics on one chart by plotting multiple metrics, then add a second chart with Http Response Time on the app scope, split by Instance, to confirm that latency climbs in lockstep with plan memory.
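The divergence signature described above can also be checked on exported metric samples rather than by eye. The following is a minimal sketch, assuming the plan-level MemoryPercentage and CpuPercentage averages have already been exported into two equally spaced lists. The function names, thresholds, and sample values are hypothetical and are not platform guidance.

```python
from statistics import mean, pstdev


def slope_per_sample(values):
    """Least-squares slope of values against their sample index."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def shows_divergence(memory_pct, cpu_pct, min_mem_slope=0.05, max_cpu_stdev=5.0):
    """True when memory trends upward while CPU stays roughly flat.

    Both thresholds are illustrative assumptions to tune per workload.
    """
    memory_rising = slope_per_sample(memory_pct) >= min_mem_slope
    cpu_flat = pstdev(cpu_pct) <= max_cpu_stdev
    return memory_rising and cpu_flat


# Hypothetical equally spaced averages over part of an uptime window.
memory = [52.0, 58.5, 63.0, 69.5, 75.0, 81.0, 86.5]
cpu = [35.0, 37.5, 36.0, 38.0, 36.5, 39.0, 37.0]

print(shows_divergence(memory, cpu))
```

A slope test is used instead of a fixed memory threshold because the signal of interest is growth with uptime, not a single high reading.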

Logs¶

AppServiceConsoleLogs for OOM, worker timeout, restart loop, GC pressure, heap warnings.
AppServiceAppLogs for framework/runtime warnings (allocation spikes, request body size warnings, retry storms).
AppServiceHTTPLogs for path-specific slowdown and status code drift (200 -> 499/5xx patterns).

Platform Signals¶

AppServicePlatformLogs for container restart/recycle events and health check consequences.
Correlation with deployment, scale changes, and app setting updates.
Shared-plan context: whether sibling apps show simultaneous stress.

5. Evidence to Collect¶

Required Evidence¶

KQL: latency trend and endpoint distribution from AppServiceHTTPLogs.
KQL: memory-pressure keywords and worker lifecycle events from AppServiceConsoleLogs and AppServicePlatformLogs.
KQL: application-level warning/error bursts from AppServiceAppLogs.
Azure Monitor metric exports for App Service Plan memory/CPU and affected app instance counts.

Useful Context¶

Runtime details: Python/Node version, Gunicorn/PM2/startup command, worker/thread count.
Recent code changes affecting object lifetime, caching, streaming, payload size, retries.
Plan topology: number of apps in the same App Service Plan and recent utilization trends.
Incident timing: when degradation starts after startup and how quickly restart restores baseline.

Sample Log Patterns¶

AppServiceHTTPLogs (memory-pressure lab)¶

2026-04-04T11:23:04Z  GET  /diag/env    200  4
2026-04-04T11:23:03Z  GET  /diag/stats  200  18
2026-04-04T11:21:50Z  GET  /heavy       200  1384
2026-04-04T11:21:49Z  GET  /heavy       200  1153
2026-04-04T11:21:49Z  GET  /heavy       200  1019
2026-04-04T11:21:49Z  GET  /heavy       200  950
2026-04-04T11:21:49Z  GET  /heavy       200  920
2026-04-04T11:21:48Z  GET  /leak        200  4808

AppServiceConsoleLogs (worker model clues)¶

2026-04-04T11:14:07Z  [2026-04-04 11:14:07 +0000] [1891] [INFO] Starting gunicorn 24.1.1
2026-04-04T11:14:07Z  [2026-04-04 11:14:07 +0000] [1891] [INFO] Listening at: http://0.0.0.0:8000 (1891)
2026-04-04T11:14:07Z  [2026-04-04 11:14:07 +0000] [1891] [INFO] Using worker: sync
2026-04-04T11:14:07Z  [2026-04-04 11:14:07 +0000] [1892] [INFO] Booting worker with pid: 1892
2026-04-04T11:14:07Z  [2026-04-04 11:14:07 +0000] [1893] [INFO] Booting worker with pid: 1893
2026-04-04T11:14:07Z  [2026-04-04 11:14:07 +0000] [1894] [INFO] Booting worker with pid: 1894
2026-04-04T11:14:07Z  [2026-04-04 11:14:07 +0000] [1895] [INFO] Booting worker with pid: 1895

AppServicePlatformLogs (recycle sequence)¶

2026-04-04T11:14:30Z  Informational  Container is terminating. Grace period: 5 seconds.
2026-04-04T11:14:30Z  Informational  Stopping container: f19d98813a89_<app-name>.
2026-04-04T11:14:36Z  Informational  Container is terminated. Total time elapsed: 5545 ms.
2026-04-04T11:14:36Z  Informational  Site: <app-name> stopped.

How to Read This

/heavy requests take roughly 920-1384 ms and /leak takes 4808 ms, yet every response is HTTP 200. That is a degradation signature, not an availability-only incident. With the sync worker class and only four workers, a few long-running calls can occupy every worker slot and amplify queue delay.
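To make the per-endpoint contrast concrete, the sample rows can be grouped by path. This is a small sketch that reuses the eight AppServiceHTTPLogs rows shown above; the `summarize` helper is a hypothetical name introduced for illustration.

```python
from collections import defaultdict

# Rows copied from the AppServiceHTTPLogs sample: (path, status, TimeTaken in ms).
ROWS = [
    ("/diag/env", 200, 4),
    ("/diag/stats", 200, 18),
    ("/heavy", 200, 1384),
    ("/heavy", 200, 1153),
    ("/heavy", 200, 1019),
    ("/heavy", 200, 950),
    ("/heavy", 200, 920),
    ("/leak", 200, 4808),
]


def summarize(rows):
    """Group rows by path and report count, mean, and max latency in ms."""
    by_path = defaultdict(list)
    for path, _status, time_taken_ms in rows:
        by_path[path].append(time_taken_ms)
    return {
        path: {"count": len(v), "mean_ms": sum(v) / len(v), "max_ms": max(v)}
        for path, v in by_path.items()
    }


summary = summarize(ROWS)
# Print the slowest endpoints first.
for path, stats in sorted(summary.items(), key=lambda kv: kv[1]["max_ms"], reverse=True):
    print(f"{path:12} count={stats['count']} mean={stats['mean_ms']:.1f} max={stats['max_ms']}")
```

Every row carries status 200, so an alert keyed only on status codes would not fire on this sample; the latency columns carry the signal.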

KQL Queries with Example Output¶

Query 1: Endpoint latency fingerprint during incident window¶

AppServiceHTTPLogs
| where TimeGenerated between (datetime(2026-04-04 11:21:45) .. datetime(2026-04-04 11:23:05))
| project TimeGenerated, CsMethod, CsUriStem, ScStatus, TimeTaken
| order by TimeGenerated desc

Example Output

TimeGenerated	CsMethod	CsUriStem	ScStatus	TimeTaken
2026-04-04 11:23:04	GET	/diag/env	200	4
2026-04-04 11:23:03	GET	/diag/stats	200	18
2026-04-04 11:21:50	GET	/heavy	200	1384
2026-04-04 11:21:49	GET	/heavy	200	1153
2026-04-04 11:21:49	GET	/heavy	200	1019
2026-04-04 11:21:49	GET	/heavy	200	950
2026-04-04 11:21:49	GET	/heavy	200	920
2026-04-04 11:21:48	GET	/leak	200	4808

How to Read This

Health endpoints (/diag/env, /diag/stats) remain fast while workload endpoints degrade. This weakens global network outage hypotheses and strengthens endpoint-specific pressure hypotheses (memory growth, heavy compute, queueing).

Query 2: Worker model and boot evidence from console logs¶

AppServiceConsoleLogs
| where TimeGenerated between (datetime(2026-04-04 11:14:00) .. datetime(2026-04-04 11:14:10))
| project TimeGenerated, Level, ResultDescription
| order by TimeGenerated desc

Example Output

TimeGenerated	Level	ResultDescription
2026-04-04 11:14:07	Error	[2026-04-04 11:14:07 +0000] [1895] [INFO] Booting worker with pid: 1895
2026-04-04 11:14:07	Error	[2026-04-04 11:14:07 +0000] [1894] [INFO] Booting worker with pid: 1894
2026-04-04 11:14:07	Error	[2026-04-04 11:14:07 +0000] [1893] [INFO] Booting worker with pid: 1893
2026-04-04 11:14:07	Error	[2026-04-04 11:14:07 +0000] [1892] [INFO] Booting worker with pid: 1892
2026-04-04 11:14:07	Error	[2026-04-04 11:14:07 +0000] [1891] [INFO] Using worker: sync
2026-04-04 11:14:07	Error	[2026-04-04 11:14:07 +0000] [1891] [INFO] Listening at: http://0.0.0.0:8000 (1891)
2026-04-04 11:14:07	Error	[2026-04-04 11:14:07 +0000] [1891] [INFO] Starting gunicorn 24.1.1

How to Read This

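With the sync worker class, each worker handles one blocking request at a time, so four workers cap in-process concurrency at four. Long /heavy and /leak calls can occupy every worker quickly even when CPU is not pegged. The Error value in the Level column most likely reflects Gunicorn writing its log to stderr rather than a failure; the messages themselves are tagged [INFO].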
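The saturation argument above can be turned into simple arithmetic. The sketch below assumes each sync worker serves exactly one request at a time and uses the mean of the five /heavy rows from the HTTP log sample (1085.2 ms). The arrival rate is a hypothetical figure, not a value from the source.

```python
def max_throughput_rps(workers, mean_service_time_s):
    """Upper bound on requests/second when each worker serves one request at a time."""
    return workers / mean_service_time_s


def utilization(arrival_rps, workers, mean_service_time_s):
    """Fraction of worker capacity consumed; values near or above 1.0 imply queueing."""
    return arrival_rps * mean_service_time_s / workers


WORKERS = 4            # four "Booting worker" lines in the console log sample
HEAVY_MEAN_S = 1.0852  # mean of the five /heavy rows in the HTTP log sample
ARRIVAL_RPS = 5.0      # hypothetical arrival rate, chosen only for illustration

print(f"ceiling: {max_throughput_rps(WORKERS, HEAVY_MEAN_S):.2f} req/s")
print(f"utilization at {ARRIVAL_RPS} req/s: {utilization(ARRIVAL_RPS, WORKERS, HEAVY_MEAN_S):.2f}")
```

Under these assumptions the ceiling is about 3.69 requests per second for /heavy traffic alone, so a modest arrival rate is enough to build a queue without CPU ever appearing saturated.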

Query 3: Platform recycle timeline around pressure event¶

AppServicePlatformLogs
| where TimeGenerated between (datetime(2026-04-04 11:14:25) .. datetime(2026-04-04 11:14:40))
| project TimeGenerated, Level, Message
| order by TimeGenerated desc

Example Output

TimeGenerated	Level	Message
2026-04-04 11:14:36	Informational	Site: <app-name> stopped.
2026-04-04 11:14:36	Informational	Container is terminated. Total time elapsed: 5545 ms.
2026-04-04 11:14:30	Informational	Stopping container: f19d98813a89_<app-name>.
2026-04-04 11:14:30	Informational	Container is terminating. Grace period: 5 seconds.

How to Read This

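These rows show one container stop sequence. A single sequence is only one lifecycle event; repeated sequences inside the incident window indicate churn. If latency improves immediately after a stop/start cycle and then degrades again with uptime, treat memory/worker degradation as primary until disproven.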

CLI Investigation Commands¶

az webapp config show --resource-group <resource-group> --name <app-name>
az webapp config appsettings list --resource-group <resource-group> --name <app-name>
az webapp log tail --resource-group <resource-group> --name <app-name>
az monitor metrics list --resource <app-service-plan-resource-id> --metric "CpuPercentage,MemoryPercentage" --interval PT1M --aggregation Average

Example Output (sanitized)

$ az webapp config show --resource-group <resource-group> --name <app-name>
{
  "linuxFxVersion": "PYTHON|3.12",
  "alwaysOn": true,
  "http20Enabled": true
}

$ az webapp config appsettings list --resource-group <resource-group> --name <app-name>
[
  {"name": "WEBSITES_PORT", "value": "8000"},
  {"name": "SCM_DO_BUILD_DURING_DEPLOYMENT", "value": "false"}
]

$ az monitor metrics list --resource <app-service-plan-resource-id> --metric "CpuPercentage,MemoryPercentage" --interval PT1M --aggregation Average
timestamp                  CpuPercentage_Average   MemoryPercentage_Average
-------------------------  ----------------------  ------------------------
2026-04-04T11:21:00Z       36.2                    83.7
2026-04-04T11:22:00Z       39.8                    86.4

How to Read This

If memory remains high while CPU is moderate and HTTP latency climbs, scaling by CPU signal alone will miss the failure mode. Revisit worker count, memory profile, and endpoint behavior.

Normal vs Abnormal Comparison¶

Signal	Normal (Healthy)	Abnormal (Memory/Worker Degradation)
`/heavy` latency	Mostly sub-second, stable tail	Repeated 920-1384 ms spikes under moderate load
`/leak` latency	Rare and bounded	Multi-second outlier (for example 4808 ms)
Health endpoint latency	Low and stable	Still low (can remain deceptively healthy)
Gunicorn worker mode	Matches workload profile and capacity	`sync` workers saturated by long-running calls
Platform lifecycle	Infrequent stop/start events	Recurrent container termination/restart correlation
CPU vs memory trend	CPU and memory proportional to load	CPU moderate, memory elevated and climbing

6. Validation and Disproof by Hypothesis¶

H1: Application memory leak or unbounded cache growth¶

Signals that support
- Memory trend rises with uptime while traffic volume is relatively stable.
- Latency gradually worsens before any restart event.
- Restart temporarily restores latency and error rates.
- Console/app logs mention OOM, memory allocation failures, or aggressive GC cycles.
Signals that weaken
- Memory remains flat across uptime windows.
- Latency degrades immediately after deployment regardless of uptime length.
- Restart does not produce temporary improvement.

What to verify

KQL (latency and status trend):

AppServiceHTTPLogs
| where TimeGenerated > ago(24h)
| summarize req=count(), p95=percentile(TimeTaken,95), p99=percentile(TimeTaken,99), errors=countif(ScStatus >= 500) by bin(TimeGenerated, 5m)
| order by TimeGenerated asc

KQL (memory symptom keywords from console):

AppServiceConsoleLogs
| where TimeGenerated > ago(24h)
| where ResultDescription has_any ("OutOfMemory", "OOM", "Killed", "worker timeout", "memory", "GC")
| project TimeGenerated, ResultDescription
| order by TimeGenerated desc

CLI (plan memory and cpu):

az monitor metrics list --resource <app-service-plan-resource-id> --metric "MemoryPercentage,CpuPercentage" --interval PT1M --aggregation Average
az webapp log tail --resource-group <resource-group> --name <app-name>

H2: Plan-level memory contention across multiple apps¶

Signals that support
- Multiple apps on the same App Service Plan degrade in overlapping windows.
- Plan memory remains high even when the affected app has moderate traffic.
- Incidents align with a sibling app deployment or load surge.
- Recycling one app helps briefly, but pressure returns until total plan demand drops.
Signals that weaken
- Other plan apps remain stable with no latency or restart signal.
- Plan memory headroom remains comfortably below pressure levels.
- Isolated dedicated plan shows no recurrence.

What to verify

KQL (platform restart/recycle timeline):

AppServicePlatformLogs
| where TimeGenerated > ago(24h)
| where Message has_any ("restart", "recycle", "container", "health check")
| project TimeGenerated, ContainerId, OperationName, Message
| order by TimeGenerated desc

CLI (apps sharing plan and plan metadata):

az appservice plan show --resource-group <resource-group> --name <plan-name>
az webapp list --resource-group <resource-group> --query "[?serverFarmId!=null].{name:name,serverFarmId:serverFarmId,state:state}" --output table
az monitor metrics list --resource <app-service-plan-resource-id> --metric "MemoryPercentage" --interval PT5M --aggregation Maximum

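Verify whether affected and sibling apps share incident timestamps and memory pressure windows. The az webapp list query above returns every app in the resource group that has a plan, so compare serverFarmId values to identify which apps actually share the affected plan.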
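Checking for shared incident windows amounts to an interval-overlap test. The sketch below assumes degradation windows have already been extracted per app (for example from latency queries); the app names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def overlap(a, b):
    """Return the overlapping duration of two (start, end) windows, or zero."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(end - start, timedelta(0))


def overlapping_pairs(windows_by_app):
    """Yield (app_a, app_b, duration) for every pair of apps whose windows overlap."""
    apps = sorted(windows_by_app)
    for i, app_a in enumerate(apps):
        for app_b in apps[i + 1:]:
            total = sum(
                (overlap(wa, wb) for wa in windows_by_app[app_a] for wb in windows_by_app[app_b]),
                timedelta(0),
            )
            if total > timedelta(0):
                yield app_a, app_b, total


# Hypothetical degradation windows per app on the same plan.
windows = {
    "app-a": [(datetime(2026, 4, 4, 11, 15), datetime(2026, 4, 4, 11, 30))],
    "app-b": [(datetime(2026, 4, 4, 11, 20), datetime(2026, 4, 4, 11, 40))],
    "app-c": [(datetime(2026, 4, 4, 9, 0), datetime(2026, 4, 4, 9, 10))],
}

pairs = list(overlapping_pairs(windows))
for app_a, app_b, duration in pairs:
    print(app_a, app_b, duration)
```

Overlap alone does not prove contention; it only tells you which sibling apps are worth comparing against the plan memory timeline.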

H3: Worker/process model is overcommitted for memory budget¶

Signals that support
- Startup command configures high worker/thread count relative to SKU memory.
- Frequent worker exits/timeouts with moderate CPU.
- Tail latency worsens with concurrency bursts and short recovery after recycle.
- Logs show repeated worker boot/restart patterns.
Signals that weaken
- Conservative worker settings with sustained stability under equivalent load tests.
- No worker timeout/restart signatures in logs.
- Latency follows dependency slowness independent of concurrency level.

What to verify

KQL (worker lifecycle and timeout signatures):

AppServiceConsoleLogs
| where TimeGenerated > ago(12h)
| where ResultDescription has_any ("WORKER TIMEOUT", "Booting worker", "Worker exiting", "signal 9", "Killed")
| summarize events=count() by bin(TimeGenerated, 5m)
| order by TimeGenerated asc

CLI (runtime and startup config):

az webapp config show --resource-group <resource-group> --name <app-name>
az webapp config appsettings list --resource-group <resource-group> --name <app-name>

Validate effective process settings (workers, threads, timeout) against measured memory per worker and plan limits.
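Validating worker settings against a memory budget can be sketched as a small calculation. All inputs below are hypothetical: the total memory, the reserved amount, and the per-worker resident size would need to come from your own plan and measurements, and the 25% headroom target is an assumption.

```python
import math


def max_workers(total_memory_mb, reserved_mb, per_worker_mb, headroom_fraction=0.25):
    """Largest worker count that keeps projected memory under the headroom target.

    reserved_mb covers everything that is not a worker of this app
    (platform overhead, sibling apps on the same plan).
    """
    usable_mb = total_memory_mb * (1 - headroom_fraction) - reserved_mb
    return max(math.floor(usable_mb / per_worker_mb), 0)


# Hypothetical figures: replace with measured values for your plan and app.
TOTAL_MB = 3584
RESERVED_MB = 900
PER_WORKER_MB = 350

print(max_workers(TOTAL_MB, RESERVED_MB, PER_WORKER_MB))
```

With Gunicorn, the result would bound the value passed to `--workers`. Measure per-worker memory after warm-up and under load, since a freshly booted worker usually understates steady-state usage.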

H4: Dependency/runtime behavior amplifies memory pressure¶

Signals that support
- Slow periods align with high-volume endpoints returning large payloads or buffering request bodies.
- App logs show retry storms, large object serialization, or unbounded in-memory aggregation.
- HTTP latency and 499/5xx increase before restart, not only during startup.
- Memory pressure worsens when dependency latency increases (larger in-flight object lifetime).
Signals that weaken
- Large-payload endpoints are quiet during incidents.
- Dependency latency is stable while memory pressure still rises linearly.
- Reduced retry limits do not change memory profile.

What to verify

KQL (path and payload-related latency shape):

AppServiceHTTPLogs
| where TimeGenerated > ago(12h)
| summarize req=count(), p95=percentile(TimeTaken,95), p99=percentile(TimeTaken,99) by CsUriStem, ScStatus
| top 20 by p99 desc

KQL (application warnings and memory-affecting behavior):

AppServiceAppLogs
| where TimeGenerated > ago(12h)
| where ResultDescription has_any ("retry", "payload", "buffer", "allocation", "memory", "gc", "timeout")
| project TimeGenerated, CustomLevel, ResultDescription, Logger
| order by TimeGenerated desc

CLI (restart for controlled validation window):

az webapp restart --resource-group <resource-group> --name <app-name>
az monitor metrics list --resource <app-service-plan-resource-id> --metric "MemoryPercentage" --interval PT1M --aggregation Average
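The H4 claim that slower dependencies enlarge in-flight memory follows from Little's law: average concurrency equals arrival rate multiplied by time in the system. The sketch below uses hypothetical numbers for arrival rate, latency, and per-request memory to show the scaling. With sync workers the concurrency held inside the process is capped by the worker count, so the estimate applies mainly to threaded or async worker models.

```python
def in_flight_requests(arrival_rps, mean_latency_s):
    """Little's law: average concurrent requests = arrival rate x time in system."""
    return arrival_rps * mean_latency_s


def in_flight_memory_mb(arrival_rps, mean_latency_s, mb_per_request):
    """Approximate memory held by requests that are still being processed."""
    return in_flight_requests(arrival_rps, mean_latency_s) * mb_per_request


# Hypothetical workload: 20 req/s, each request holding about 8 MB while in flight.
ARRIVAL_RPS = 20
MB_PER_REQUEST = 8

for latency_s in (0.2, 2.0):
    held = in_flight_memory_mb(ARRIVAL_RPS, latency_s, MB_PER_REQUEST)
    print(f"latency {latency_s:.1f} s -> about {held:.0f} MB held in flight")
```

Under these assumptions a tenfold increase in dependency latency produces a tenfold increase in memory held by in-flight requests, with no change in traffic volume.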

7. Likely Root Cause Patterns¶

Pattern A: Gradual heap retention in application code
- Common in Python/Node when caches are unbounded, large objects remain referenced, or per-request data leaks into process scope.
Pattern B: Shared-plan headroom collapse
- One or more sibling apps consume memory spikes, reducing effective capacity and degrading unrelated apps in the same plan.
Pattern C: Over-aggressive worker count for SKU size
- Higher worker concurrency increases baseline resident memory and pushes plan into frequent pressure cycles.
Pattern D: Slow dependency causes in-flight memory expansion
- More concurrent in-flight requests hold larger object graphs longer, compounding GC cost and tail latency.

Investigation Notes¶

App Service Linux performance incidents can be memory-first even when CPU appears healthy.
Always align evidence by time window; individual signals in isolation can be misleading.
A restart that helps is a useful signal, not a root cause.
Plan-level memory is shared capacity; app-level tuning without plan context is often incomplete.
Validate both application behavior and process model choices before concluding platform fault.

Quick Conclusion¶

When App Service Linux response times degrade over uptime and improve after restart, treat memory pressure and worker degradation as primary hypotheses early. Correlate AppServiceHTTPLogs, AppServiceConsoleLogs, AppServicePlatformLogs, AppServiceAppLogs, and plan metrics in one timeline to separate leak patterns, plan contention, worker overcommit, and dependency-amplified pressure. Stabilize with low-risk mitigations, then implement durable memory budgeting, isolation, and workload design changes to prevent recurrence.

8. Immediate Mitigations¶

Reduce worker/process count to stabilize memory footprint (temporary, production-safe if traffic is moderate).
Scale up App Service Plan SKU to add memory headroom quickly (temporary, production-safe, cost impact).
Move high-memory sibling app to a separate plan to remove contention (production-safe, operational change risk).
Apply bounded cache limits and shorter object retention windows (production-safe).
Restart affected app during incident to recover service while investigation continues (temporary, risk-bearing: brief disruption and cold-start effect).
Reduce retry fan-out and cap request payload sizes in hot paths (production-safe with behavior validation).
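The bounded-cache mitigation in the list above can be illustrated with a small least-recently-used cache. This is a sketch rather than a recommendation of a specific library; `BoundedCache` is a hypothetical class name, and the limit counts entries, not bytes.

```python
from collections import OrderedDict


class BoundedCache:
    """LRU cache with a hard entry limit; the least recently used entry is evicted first."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __len__(self):
        return len(self._data)


cache = BoundedCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.put("c", 3)  # evicts "b", the least recently used entry
print(len(cache), cache.get("b"), cache.get("a"))
```

For caching function results, `functools.lru_cache(maxsize=...)` from the standard library gives similar bounded behavior with less code. An entry limit only bounds memory if individual entries are themselves bounded in size.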

9. Prevention¶

Establish memory budgets per worker and choose concurrency settings from load-test data, not defaults.
Add leak detection and periodic heap profiling in pre-production and canary slots.
Implement bounded caches with explicit eviction policy and size controls.
Isolate critical workloads into dedicated App Service Plans to eliminate cross-app memory contention.
Track SLOs using P95/P99 latency plus memory trend and restart frequency correlation alerts.
Refactor large buffering code paths to streaming patterns where possible.
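The last prevention item, moving from buffering to streaming, can be sketched with a chunked reader. The example assumes a file-like object with a `read(size)` method; the helper names and the 64 KiB chunk size are hypothetical choices.

```python
import io


def iter_chunks(stream, chunk_size=64 * 1024):
    """Yield fixed-size chunks so only one chunk is held in memory at a time."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def count_bytes_streaming(stream, chunk_size=64 * 1024):
    """Process a payload incrementally instead of calling stream.read() once."""
    total = 0
    largest_chunk = 0
    for chunk in iter_chunks(stream, chunk_size):
        total += len(chunk)
        largest_chunk = max(largest_chunk, len(chunk))
    return total, largest_chunk


payload = io.BytesIO(b"x" * 1_000_000)  # stand-in for a large request or response body
total, largest = count_bytes_streaming(payload)
print(total, largest)
```

The working set per request is bounded by the chunk size rather than the payload size, which keeps per-worker memory predictable when payloads grow.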

Limitations¶

This playbook focuses on Azure App Service Linux and OSS runtime patterns only.
It does not replace framework-specific memory profiling guidance for each language ecosystem.
Kernel-level host diagnostics are abstracted by the platform and may not be directly visible.
