Lab Guide: Out of Memory Crash Under Blob Processing Load

This lab reproduces a Python Azure Functions out-of-memory incident under sustained blob-trigger load. You will intentionally use a memory-inefficient buffering path, drive concurrency in phases, collect telemetry, and verify that a streaming fix plus concurrency controls removes crash-loop behavior.

Lab Metadata

Field	Value
Difficulty	Advanced
Duration	60-90 min
Hosting plan tested	Premium EP1 (Linux)
Trigger type	Blob trigger
Runtime	Python 3.11 / Functions v4
Azure services	Azure Functions, Storage Account, Application Insights
Skills practiced	Hypothesis-driven troubleshooting, KQL correlation, memory pressure diagnosis, safe remediation validation

What this lab is designed to prove

In Python Azure Functions, memory exhaustion usually surfaces as worker process termination (often exit code 137) followed by restart traces, not as an exception type from another language runtime such as .NET.

This lab validates a Python-appropriate chain:

- Memory usage rises in performanceCounters (Process / Private Bytes).
- Function logs show Python MemoryError or abrupt invocation interruption.
- Host traces show worker lifecycle events (Worker process exited, Restarting worker process, Worker process started).
- After remediation, the same load profile no longer produces crash signatures.

1) Background

Azure Functions instances run with finite memory. In this lab, the app uses a blob-processing anti-pattern (buffering the full payload in memory) under high concurrency. That pattern multiplies per-invocation memory usage across overlapping invocations until the Python worker becomes unstable.

On Linux-hosted Python apps, severe memory pressure commonly ends with the worker process being OOM-killed by the OS (exit code 137) or Python raising MemoryError in function execution logs. The host runtime then restarts the worker process.
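The value 137 follows the common POSIX shell convention in which a process terminated by signal N is reported as 128 + N, so 137 corresponds to signal 9 (SIGKILL). A minimal sketch of that arithmetic, where the helper name `describe_exit_code` is hypothetical and not part of any Azure tooling:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a shell-style exit code into a readable label."""
    if code > 128:
        signum = code - 128  # shell convention: 128 + signal number
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number is not defined on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))  # on Linux: killed by SIGKILL
```

An exit code of 137 alone does not prove an OOM kill, since SIGKILL can come from other sources; that is why the lab correlates it with memory counters.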

Because this lab uses EP1, thresholds should match EP1 capacity characteristics, not Consumption-size limits.

Failure progression model

flowchart TD
    A[Blob load increases] --> B[Function buffers payload in memory]
    B --> C[Concurrent invocations overlap]
    C --> D[Private Bytes climbs rapidly]
    D --> E[Python worker under memory pressure]
    E --> F["MemoryError and/or worker exit code 137"]
    F --> G[Host logs worker exited]
    G --> H[Host restarts worker process]
    H --> I[Latency spikes and temporary failures]
    I --> J[Streaming fix + lower concurrency]
    J --> K[Memory stabilizes and restarts stop]

EP1 memory bands used in this lab

State	Worker Private Bytes (EP1)	Interpretation
Healthy	350-700 MB	Normal operating band
Degraded	1.2-2.0 GB	Rising pressure; latency and retries increase
Critical	2.5 GB+	Near EP1 limit (3.5 GB), crash risk high
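When reading counter exports, the bands above can be applied mechanically. The following is a rough sketch only: it assumes 1 GB = 1024 MB (matching the 1024-based conversion used in the lab queries) and, because the table leaves gaps between bands, it assigns any value between two bands to the lower one. The function name is hypothetical.

```python
# Lower bounds of the bands from the table, in MB (assumes 1 GB = 1024 MB).
DEGRADED_FROM_MB = 1.2 * 1024
CRITICAL_FROM_MB = 2.5 * 1024


def classify_private_bytes_mb(memory_mb: float) -> str:
    """Map a Private Bytes sample (in MB) to a lab memory band.

    Assumption: values that fall in a gap between bands are
    reported as the lower band.
    """
    if memory_mb >= CRITICAL_FROM_MB:
        return "critical"
    if memory_mb >= DEGRADED_FROM_MB:
        return "degraded"
    return "healthy"


for sample_gb in (0.6, 1.84, 2.67):
    print(sample_gb, classify_private_bytes_mb(sample_gb * 1024))
```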

Why this lab can be misread

Common misdiagnoses:

1. Treating dependency failures as the primary cause when they are downstream effects.
2. Looking only at request latency without memory and worker-lifecycle correlation.
3. Searching for .NET exception strings in a Python worker incident.

2) Hypothesis

Formal statement

If the Python blob function buffers full payloads in memory while high concurrency is enabled on EP1, then process memory (Private Bytes) will enter a critical band and produce Python worker instability (MemoryError, worker exit 137, worker restarts). Replacing buffering with streaming and reducing concurrency will remove those signatures under equivalent test load.
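A back-of-the-envelope estimate shows why the hypothesis is plausible. The sketch below is deliberately naive: it assumes each buffered invocation holds exactly one blob-sized buffer, ignores interpreter overhead and intermediate copies, and uses a hypothetical 4 MB chunk size for the streaming case. The blob size (100 MB), concurrency values (48 and 12), and the 3.5 GB EP1 figure come from this lab.

```python
EP1_MEMORY_GB = 3.5  # EP1 limit quoted in the memory bands table


def peak_buffer_gb(per_invocation_mb: float, concurrency: int) -> float:
    """Worst-case payload memory if every invocation overlaps."""
    return per_invocation_mb * concurrency / 1024


# Buffering path: 100 MB blobs held whole, 48 concurrent invocations.
buffered_gb = peak_buffer_gb(100, 48)

# Streaming path: assumed 4 MB chunk per invocation, 12 concurrent invocations.
streamed_gb = peak_buffer_gb(4, 12)

print(f"buffered: {buffered_gb:.2f} GB (limit {EP1_MEMORY_GB} GB)")
print(f"streamed: {streamed_gb:.2f} GB (limit {EP1_MEMORY_GB} GB)")
```

Under these assumptions the buffering path alone exceeds the instance limit before any other memory use is counted, while the streaming path stays far below it.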

Causal chain

flowchart LR
    A[In-memory blob buffering] --> B[High per-invocation memory cost]
    B --> C[Concurrent invocation overlap]
    C --> D[Private Bytes enters critical band]
    D --> E[MemoryError or OOM kill exit 137]
    E --> F[Worker restart cycle]
    F --> G["Latency/failure burst"]
    G --> H[Streaming + lower concurrency]
    H --> I[Stabilized memory and recovery]

Proof criteria

FunctionAppLogs in incident window include Python-style memory failure patterns (for example MemoryError) or abrupt invocation failures aligned to worker exits.
traces show worker lifecycle around the same times (Worker process exited, Restarting worker process, Worker process started).
performanceCounters shows Private Bytes entering EP1 critical band (2.5 GB+).
requests and dependencies degrade before/during worker exits.
After remediation, equivalent load no longer produces MemoryError/exit 137 restart signatures.

Disproof criteria

Worker lifecycle is stable while failures are explained entirely by unrelated dependency saturation.
Private Bytes stays below the critical band across repeated high-load phases.
Streaming remediation does not materially improve restart and latency profiles.

3) Runbook

Prerequisites

Azure CLI authenticated and correct subscription selected.
Functions Core Tools and Python 3.11 available.
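Permissions to query Application Insights and, for the FunctionAppLogs queries, the Log Analytics workspace that receives the function app's diagnostic logs.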

az account show --output table
func --version
python3 --version

Variables

RG="rg-func-lab-oom"
LOCATION="koreacentral"
STORAGE_NAME="stfuncoomlab001"
PLAN_NAME="plan-func-oom-ep1"
APP_NAME="func-oom-lab-001"
AI_NAME="appi-func-oom-001"

3.1 Deploy baseline infrastructure (EP1)

az group create --name "$RG" --location "$LOCATION"
az storage account create --name "$STORAGE_NAME" --resource-group "$RG" --location "$LOCATION" --sku Standard_LRS --kind StorageV2
az monitor app-insights component create --app "$AI_NAME" --location "$LOCATION" --resource-group "$RG" --kind web --application-type web
az functionapp plan create --name "$PLAN_NAME" --resource-group "$RG" --location "$LOCATION" --sku EP1 --is-linux true
az functionapp create --name "$APP_NAME" --resource-group "$RG" --plan "$PLAN_NAME" --runtime python --runtime-version 3.11 --functions-version 4 --storage-account "$STORAGE_NAME" --app-insights "$AI_NAME"

3.2 Deploy memory-buffering version

az functionapp deployment source config-zip --name "$APP_NAME" --resource-group "$RG" --src "./artifacts/oom-buffering-app.zip"
az functionapp config appsettings set --name "$APP_NAME" --resource-group "$RG" --settings "FUNCTIONS_WORKER_RUNTIME=python" "BLOB_BATCH_SIZE=32" "MAX_CONCURRENT_INVOCATIONS=48"
az functionapp restart --name "$APP_NAME" --resource-group "$RG"
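The package source is not shown in this lab, so how `BLOB_BATCH_SIZE` and `MAX_CONCURRENT_INVOCATIONS` are consumed is an assumption: this sketch treats them as app-defined settings that the function code reads from the environment. The names `read_int_setting` and `InvocationGate`, and the fallback value 12, are hypothetical.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def read_int_setting(name: str, default: int) -> int:
    """Read a positive integer app setting, falling back to a default."""
    try:
        value = int(os.environ.get(name, ""))
    except ValueError:
        return default
    return value if value > 0 else default


class InvocationGate:
    """Caps how many pieces of work run at the same time."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def run(self, work, *args):
        with self._slots:  # blocks while `limit` callers are already inside
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return work(*args)
            finally:
                with self._lock:
                    self.active -= 1


# Demo: 48 short tasks on 24 threads never exceed the configured cap.
limit = read_int_setting("MAX_CONCURRENT_INVOCATIONS", 12)
gate = InvocationGate(limit)
with ThreadPoolExecutor(max_workers=24) as pool:
    list(pool.map(lambda _: gate.run(time.sleep, 0.005), range(48)))
print(f"limit={limit} observed peak={gate.peak}")
```

A gate like this bounds overlap inside one worker process only; it does not change how many instances the plan scales out to.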

Lab artifacts

The ZIP packages and load files referenced in this lab are pre-built assets. Prepare them before starting:

oom-buffering-app.zip: Python function that reads entire blob into memory
oom-streaming-app.zip: Streaming version that processes in chunks
load/phase1, load/phase2, load/phase3: Blob files of increasing size (100 MB, 500 MB, 1 GB+)
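The contents of the two packages are not reproduced here. As an illustration of the difference they are described as having, the sketch below contrasts a whole-payload read with a chunked read over any file-like object that supports `read(size)`. The SHA-256 digest stands in for real processing, and the 4 MiB chunk size and function names are hypothetical. Whether chunked reads actually lower resident memory in a deployed function depends on how the stream is delivered to the code, which this sketch does not address.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical chunk size (4 MiB)


def digest_buffered(stream: BinaryIO) -> str:
    """Anti-pattern: read() with no size holds the whole payload at once."""
    data = stream.read()
    return hashlib.sha256(data).hexdigest()


def digest_streaming(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Process fixed-size chunks so only one chunk is held at a time."""
    hasher = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        hasher.update(chunk)
    return hasher.hexdigest()


# Both paths produce the same result; only peak memory differs.
payload = b"x" * (1024 * 1024)
same = digest_buffered(io.BytesIO(payload)) == digest_streaming(
    io.BytesIO(payload), chunk_size=64 * 1024
)
print("digests match:", same)
```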

3.3 Capture baseline evidence (T0 to T0+15m)

Set a fixed window anchor for all queries:

let appName = "func-oom-lab-001";
let t0 = datetime(2026-04-05 01:30:00Z);
let tEnd = datetime(2026-04-05 02:40:00Z);

Query A: Host lifecycle baseline (`traces`)

let appName = "func-oom-lab-001";
let t0 = datetime(2026-04-05 01:30:00Z);
let t1 = datetime(2026-04-05 01:45:00Z);
traces
| where timestamp between (t0 .. t1)
| where cloud_RoleName == appName
| where message has_any ("Starting Host", "Host started", "Job host started", "Worker process started")
| project timestamp, severityLevel, message
| order by timestamp asc

Query B: Request baseline (`requests`)

let appName = "func-oom-lab-001";
let t0 = datetime(2026-04-05 01:30:00Z);
let t1 = datetime(2026-04-05 01:45:00Z);
requests
| where timestamp between (t0 .. t1)
| where cloud_RoleName == appName
// duration is already stored in milliseconds
| summarize
    total=count(),
    failures=countif(success == false),
    p95Ms=round(percentile(duration, 95), 2)
  by bin(timestamp, 5m)
| extend failureRatePercent=round(100.0 * failures / total, 2)
| order by timestamp asc

Query C: Memory baseline (`performanceCounters`)

let appName = "func-oom-lab-001";
let t0 = datetime(2026-04-05 01:30:00Z);
let t1 = datetime(2026-04-05 01:45:00Z);
performanceCounters
| where timestamp between (t0 .. t1)
| where cloud_RoleName == appName
| where counter == "Private Bytes"
| summarize avgMemoryMB=round(avg(value) / (1024.0 * 1024.0), 1), maxMemoryMB=round(max(value) / (1024.0 * 1024.0), 1) by bin(timestamp, 5m)
| order by timestamp asc

Query D: Python memory failure signature (`FunctionAppLogs`)

let appName = "func-oom-lab-001";
let t0 = datetime(2026-04-05 01:30:00Z);
let t1 = datetime(2026-04-05 01:45:00Z);
FunctionAppLogs
| where TimeGenerated between (t0 .. t1)
| where AppName == appName
| where Message has_any ("MemoryError", "Worker process exited", "Restarting worker process")
| project TimeGenerated, FunctionName, Level, Message
| order by TimeGenerated asc

CLI execution examples:

az monitor app-insights query --apps "$AI_NAME" --resource-group "$RG" --analytics-query "let appName='${APP_NAME}'; let t0=datetime(2026-04-05 01:30:00Z); let t1=datetime(2026-04-05 01:45:00Z); performanceCounters | where timestamp between (t0 .. t1) | where cloud_RoleName == appName | where counter == 'Private Bytes' | summarize avgMemoryMB=round(avg(value)/(1024.0*1024.0),1), maxMemoryMB=round(max(value)/(1024.0*1024.0),1) by bin(timestamp,5m) | order by timestamp asc" --output table
az monitor app-insights query --apps "$AI_NAME" --resource-group "$RG" --analytics-query "let appName='${APP_NAME}'; let t0=datetime(2026-04-05 01:30:00Z); let t1=datetime(2026-04-05 01:45:00Z); requests | where timestamp between (t0 .. t1) | where cloud_RoleName == appName | summarize total=count(), failures=countif(success == false), p95Ms=round(percentile(duration,95),2) by bin(timestamp,5m) | extend failureRatePercent=round(100.0*failures/total,2) | order by timestamp asc" --output table
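Long single-line queries are easy to break with shell quoting. One possible alternative, sketched below, keeps the KQL readable as a multi-line string and passes it as a single argument in an argument list, so no shell escaping is involved. The helper name `build_query_command` is hypothetical; the CLI options are the ones already used in this section.

```python
import shlex


def build_query_command(app_insights_name: str, resource_group: str, kql: str) -> list[str]:
    """Build the az argument list for one Application Insights query."""
    # Collapse the multi-line query into one line, as in the examples above.
    one_line = " ".join(line.strip() for line in kql.strip().splitlines())
    return [
        "az", "monitor", "app-insights", "query",
        "--apps", app_insights_name,
        "--resource-group", resource_group,
        "--analytics-query", one_line,
        "--output", "table",
    ]


MEMORY_BASELINE_KQL = """
let appName = "func-oom-lab-001";
let t0 = datetime(2026-04-05 01:30:00Z);
let t1 = datetime(2026-04-05 01:45:00Z);
performanceCounters
| where timestamp between (t0 .. t1)
| where cloud_RoleName == appName
| where counter == "Private Bytes"
| summarize maxMemoryMB=round(max(value) / (1024.0 * 1024.0), 1) by bin(timestamp, 5m)
| order by timestamp asc
"""

command = build_query_command("appi-func-oom-001", "rg-func-lab-oom", MEMORY_BASELINE_KQL)
print(shlex.join(command))  # shell-safe rendering for copy and paste
```

The list can be handed to `subprocess.run(command, check=True)` on a machine where the Azure CLI is installed and authenticated.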

3.4 Trigger controlled incident phases

Use a fixed timeline to avoid drift:

Baseline: 01:30-01:45 (T0 to T0+15m)
Load phase 1 (moderate): 01:45-02:00
Load phase 2 (high): 02:00-02:15
Incident (OOM crashes): 02:15-02:25
Remediation action: 02:25
Recovery: 02:25-02:40
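To keep every query window tied to the same anchor, the phase boundaries can be derived from T0 instead of typed by hand. This is a small sketch under the timeline listed above; the dictionary keys and helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

T0 = datetime(2026, 4, 5, 1, 30, tzinfo=timezone.utc)

# Offsets in minutes from T0, taken from the fixed timeline.
PHASE_OFFSETS_MIN = {
    "baseline": (0, 15),
    "load_phase_1": (15, 30),
    "load_phase_2": (30, 45),
    "incident": (45, 55),
    "recovery": (55, 70),
}


def phase_window(phase: str, t0: datetime = T0) -> tuple[datetime, datetime]:
    """Return the (start, end) timestamps for a named phase."""
    start_min, end_min = PHASE_OFFSETS_MIN[phase]
    return t0 + timedelta(minutes=start_min), t0 + timedelta(minutes=end_min)


def kql_datetime(ts: datetime) -> str:
    """Render a UTC timestamp as a KQL datetime literal."""
    return ts.strftime("datetime(%Y-%m-%d %H:%M:%SZ)")


for name in PHASE_OFFSETS_MIN:
    start, end = phase_window(name)
    print(f"{name}: {kql_datetime(start)} .. {kql_datetime(end)}")
```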

az storage container create --name "input" --account-name "$STORAGE_NAME" --auth-mode login
az storage blob upload-batch --account-name "$STORAGE_NAME" --destination "input" --source "./load/phase1" --auth-mode login
az storage blob upload-batch --account-name "$STORAGE_NAME" --destination "input" --source "./load/phase2" --auth-mode login
az storage blob upload-batch --account-name "$STORAGE_NAME" --destination "input" --source "./load/phase3" --auth-mode login

3.5 Collect incident evidence (T0+15m to T0+55m)

Query E: Python `MemoryError` and invocation failures (`FunctionAppLogs`)

let appName = "func-oom-lab-001";
let tStart = datetime(2026-04-05 01:45:00Z);
let tEnd = datetime(2026-04-05 02:25:00Z);
FunctionAppLogs
| where TimeGenerated between (tStart .. tEnd)
| where AppName == appName
| where Message has_any ("MemoryError", "invocation failed", "Out of memory")
| project TimeGenerated, FunctionName, Level, Message
| order by TimeGenerated asc

Expected pattern example:

2026-04-05T02:16:41.105Z  BlobBufferProcessor  Error  MemoryError: cannot allocate bytes object
2026-04-05T02:19:12.442Z  BlobBufferProcessor  Error  MemoryError: cannot allocate bytes object

Query F: Worker lifecycle crash signatures (`traces`)

let appName = "func-oom-lab-001";
let tStart = datetime(2026-04-05 01:45:00Z);
let tEnd = datetime(2026-04-05 02:25:00Z);
traces
| where timestamp between (tStart .. tEnd)
| where cloud_RoleName == appName
| where message has_any ("Worker process exited", "exit code 137", "Restarting worker process", "Worker process started")
| project timestamp, severityLevel, message
| order by timestamp asc

Expected pattern example:

2026-04-05T02:16:41.512Z  Error        Worker process exited with code 137
2026-04-05T02:16:45.994Z  Warning      Restarting worker process after unexpected exit
2026-04-05T02:16:49.144Z  Information  Worker process started
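Once the lifecycle rows are exported, the gap between each exit and the next start gives a rough measure of per-crash downtime. The sketch below assumes rows formatted like the expected pattern above (timestamp, severity, message separated by whitespace); the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(text: str) -> datetime:
    """Parse timestamps such as 2026-04-05T02:16:41.512Z as UTC."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def restart_gaps_seconds(lines: list[str]) -> list[float]:
    """Pair each 'exited' row with the next 'started' row and return the gaps."""
    gaps = []
    exited_at = None
    for line in lines:
        stamp, _severity, message = line.split(None, 2)
        ts = parse_timestamp(stamp)
        if "Worker process exited" in message:
            exited_at = ts
        elif "Worker process started" in message and exited_at is not None:
            gaps.append((ts - exited_at).total_seconds())
            exited_at = None
    return gaps


sample = [
    "2026-04-05T02:16:41.512Z  Error        Worker process exited with code 137",
    "2026-04-05T02:16:45.994Z  Warning      Restarting worker process after unexpected exit",
    "2026-04-05T02:16:49.144Z  Information  Worker process started",
]
print(restart_gaps_seconds(sample))
```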

Query G: Request degradation (`requests`)

let appName = "func-oom-lab-001";
let tStart = datetime(2026-04-05 01:45:00Z);
let tEnd = datetime(2026-04-05 02:25:00Z);
requests
| where timestamp between (tStart .. tEnd)
| where cloud_RoleName == appName
// duration is already stored in milliseconds
| summarize
    total=count(),
    failures=countif(success == false),
    p95Ms=round(percentile(duration, 95), 2)
  by bin(timestamp, 5m)
| extend failureRatePercent=round(100.0 * failures / total, 2)
| order by timestamp asc

Query H: Dependency retry pressure (`dependencies`)

let appName = "func-oom-lab-001";
let tStart = datetime(2026-04-05 01:45:00Z);
let tEnd = datetime(2026-04-05 02:25:00Z);
dependencies
| where timestamp between (tStart .. tEnd)
| where cloud_RoleName == appName
// duration is already stored in milliseconds
| summarize
    total=count(),
    failed=countif(success == false),
    p95Ms=round(percentile(duration, 95), 2)
  by type, target, bin(timestamp, 5m)
| extend failureRatePercent=round(100.0 * failed / total, 2)
| order by timestamp asc

Query I: EP1 memory pressure progression (`performanceCounters`)

let appName = "func-oom-lab-001";
let tStart = datetime(2026-04-05 01:45:00Z);
let tEnd = datetime(2026-04-05 02:25:00Z);
performanceCounters
| where timestamp between (tStart .. tEnd)
| where cloud_RoleName == appName
| where counter == "Private Bytes"
| summarize avgMemoryMB=round(avg(value) / (1024.0 * 1024.0), 1), maxMemoryMB=round(max(value) / (1024.0 * 1024.0), 1) by bin(timestamp, 5m)
| order by timestamp asc

CLI execution examples:

az monitor app-insights query --apps "$AI_NAME" --resource-group "$RG" --analytics-query "let appName='${APP_NAME}'; let tStart=datetime(2026-04-05 01:45:00Z); let tEnd=datetime(2026-04-05 02:25:00Z); traces | where timestamp between (tStart .. tEnd) | where cloud_RoleName == appName | where message has_any ('Worker process exited','exit code 137','Restarting worker process','Worker process started') | project timestamp,severityLevel,message | order by timestamp asc" --output table
# FunctionAppLogs is a Log Analytics table, so this query targets the workspace that receives the app's diagnostic logs.
# WORKSPACE_ID (the workspace GUID) is an assumed variable that is not defined in the Variables section.
az monitor log-analytics query --workspace "$WORKSPACE_ID" --analytics-query "let appName='${APP_NAME}'; let tStart=datetime(2026-04-05 01:45:00Z); let tEnd=datetime(2026-04-05 02:25:00Z); FunctionAppLogs | where TimeGenerated between (tStart .. tEnd) | where AppName == appName | where Message has_any ('MemoryError','invocation failed','Out of memory') | project TimeGenerated,FunctionName,Level,Message | order by TimeGenerated asc" --output table
az monitor app-insights query --apps "$AI_NAME" --resource-group "$RG" --analytics-query "let appName='${APP_NAME}'; let tStart=datetime(2026-04-05 01:45:00Z); let tEnd=datetime(2026-04-05 02:25:00Z); performanceCounters | where timestamp between (tStart .. tEnd) | where cloud_RoleName == appName | where counter == 'Private Bytes' | summarize avgMemoryMB=round(avg(value)/(1024.0*1024.0),1), maxMemoryMB=round(max(value)/(1024.0*1024.0),1) by bin(timestamp,5m) | order by timestamp asc" --output table

3.6 Apply remediation and verify (T0+55m to T0+70m)

Deploy streaming implementation.
Reduce batch size and max concurrency.
Re-run the same load profile and compare windows.

az functionapp deployment source config-zip --name "$APP_NAME" --resource-group "$RG" --src "./artifacts/oom-streaming-app.zip"
az functionapp config appsettings set --name "$APP_NAME" --resource-group "$RG" --settings "BLOB_BATCH_SIZE=8" "MAX_CONCURRENT_INVOCATIONS=12"
az functionapp restart --name "$APP_NAME" --resource-group "$RG"

Post-fix verification queries:

let appName = "func-oom-lab-001";
let tStart = datetime(2026-04-05 02:25:00Z);
let tEnd = datetime(2026-04-05 02:40:00Z);
FunctionAppLogs
| where TimeGenerated between (tStart .. tEnd)
| where AppName == appName
| where Message has_any ("MemoryError", "Out of memory")
| summarize count()

let appName = "func-oom-lab-001";
let tStart = datetime(2026-04-05 02:25:00Z);
let tEnd = datetime(2026-04-05 02:40:00Z);
requests
| where timestamp between (tStart .. tEnd)
| where cloud_RoleName == appName
| summarize
    p95Ms=round(percentile(duration, 95), 2),
    failureRatePercent=round(100.0 * countif(success == false) / count(), 2)
  by bin(timestamp, 5m)
| order by timestamp asc

let appName = "func-oom-lab-001";
let tStart = datetime(2026-04-05 02:25:00Z);
let tEnd = datetime(2026-04-05 02:40:00Z);
traces
| where timestamp between (tStart .. tEnd)
| where cloud_RoleName == appName
| where message has_any ("Worker process exited", "Restarting worker process")
| summarize count()

4) Experiment Log

Artifact inventory

Artifact	Location	Purpose
Buffering deployment package	`./artifacts/oom-buffering-app.zip`	Reproduce failure path
Streaming deployment package	`./artifacts/oom-streaming-app.zip`	Validate remediation
Load phases	`./load/phase1` `./load/phase2` `./load/phase3`	Controlled escalation
Query export bundle	`./evidence/oom/kql-session.json`	Preserve evidence chain
Timeline worksheet	`./evidence/oom/timeline.csv`	Keep event correlation reproducible

Timeline anchor and phases

Phase	Window (UTC)	Goal
Baseline	01:30-01:45	Establish healthy memory and latency
Load phase 1	01:45-02:00	Introduce moderate pressure
Load phase 2	02:00-02:15	Push toward degraded band
Incident	02:15-02:25	Observe crash-loop signatures
Remediation	02:25	Deploy streaming + lower concurrency
Recovery	02:25-02:40	Verify stabilization

Baseline observations (01:30-01:45)

Metric	Observation	Interpretation
Private Bytes	420-610 MB	Healthy EP1 band
Request p95	420-620 ms	Healthy
Failure rate	0.0-0.4%	Healthy
Worker lifecycle	No exit/restart events	Stable worker
Python memory errors	None	No memory fault

Condensed incident timeline (key events)

Time (UTC)	Phase	Signal	Observation	Interpretation
01:46	Load 1	`performanceCounters`	Private Bytes 0.92 GB	Early growth
01:50	Load 1	`requests`	p95 1.62 s, fail 2.1%	Degradation starts
01:54	Load 1	`dependencies`	fail 3.7%	Retry pressure rising
01:58	Load 1	`performanceCounters`	Private Bytes 1.28 GB	Enters degraded band
02:02	Load 2	`requests`	p95 3.91 s, fail 6.4%	Service impact visible
02:06	Load 2	`performanceCounters`	Private Bytes 1.84 GB	High degraded zone
02:10	Load 2	`dependencies`	fail 9.2%	Cascading effects
02:13	Load 2	`performanceCounters`	Private Bytes 2.32 GB	Near critical threshold
02:16	Incident	`FunctionAppLogs`	`MemoryError: cannot allocate bytes object`	Python memory failure
02:16	Incident	`traces`	Worker process exited with code 137	OOM kill signature
02:16	Incident	`traces`	Restarting worker process	Host recovery loop starts
02:17	Incident	`traces`	Worker process started	Restart complete
02:19	Incident	`FunctionAppLogs`	`MemoryError` repeated	Fault persists under load
02:19	Incident	`requests`	p95 10.21 s, fail 22.8%	Severe outage behavior
02:21	Incident	`performanceCounters`	Private Bytes 2.67 GB	EP1 critical band
02:22	Incident	`traces`	Worker process exited with code 137	Crash loop repeats
02:23	Incident	`traces`	Restarting worker process	Recurrent instability
02:25	Remediation	Runbook action	Deploy streaming package	Causal intervention
02:25	Remediation	Runbook action	Set batch=8, concurrency=12	Pressure reduction
02:28	Recovery	`performanceCounters`	Private Bytes 1.34 GB	Leaves critical band
02:31	Recovery	`requests`	p95 1.58 s, fail 2.3%	Fast recovery trend
02:34	Recovery	`performanceCounters`	Private Bytes 0.98 GB	Near healthy target
02:36	Recovery	`FunctionAppLogs`	No `MemoryError` records	Primary fault removed
02:38	Recovery	`traces`	No worker exit/restart	Crash loop resolved
02:40	Recovery	`requests`	p95 0.92 s, fail 0.6%	Stabilized

Representative log excerpts (Python-consistent)

[FunctionAppLogs]
2026-04-05T02:16:41.105Z Error BlobBufferProcessor MemoryError: cannot allocate bytes object
2026-04-05T02:16:41.151Z Error BlobBufferProcessor Invocation failed after 00:00:09.102
2026-04-05T02:19:12.442Z Error BlobBufferProcessor MemoryError: cannot allocate bytes object
2026-04-05T02:19:12.501Z Error BlobBufferProcessor Invocation failed after 00:00:10.088
2026-04-05T02:28:30.014Z Information BlobBufferProcessor Streaming implementation active
2026-04-05T02:31:05.327Z Information BlobBufferProcessor Invocation completed in 00:00:01.102

[traces]
2026-04-05T02:16:41.512Z Error Worker process exited with code 137
2026-04-05T02:16:45.994Z Warning Restarting worker process after unexpected exit
2026-04-05T02:16:49.144Z Information Worker process started
2026-04-05T02:22:07.523Z Error Worker process exited with code 137
2026-04-05T02:22:12.016Z Warning Restarting worker process after unexpected exit
2026-04-05T02:22:15.145Z Information Worker process started
2026-04-05T02:25:33.044Z Information Host started (412ms)

Core finding

The incident aligns with Python worker memory exhaustion, not a .NET exception path. During high-load phases, Private Bytes enters EP1 critical levels, then MemoryError and worker exit 137 appear with restart loops in the same window. After streaming remediation and concurrency reduction, memory, latency, and restart signals return to stable bands.

Hypothesis verdict

Criterion	Verdict	Evidence
Python memory failure signature appears	Supported	`MemoryError` records in `FunctionAppLogs`
Worker lifecycle crash signatures align	Supported	`exit code 137` + restart traces
EP1 memory enters critical band	Supported	`Private Bytes` exceeds 2.5 GB
Post-fix restart loop disappears	Supported	No exit/restart events in recovery window
Post-fix latency/failures normalize	Supported	p95 and failure rate trend to healthy

Final verdict: Hypothesis supported.

Expected Evidence

Before Trigger (Baseline)

Signal	Expected Value
`FunctionAppLogs` memory errors	0
Worker exit/restart traces	0
`performanceCounters` Private Bytes	350-700 MB
Request p95 (`requests`)	under 700 ms
Dependency failure rate (`dependencies`)	under 1%

During Incident

Signal	Expected Value
`FunctionAppLogs` `MemoryError`	1+ records in incident window
Worker lifecycle traces	exit `137` + restart sequence
`performanceCounters` Private Bytes	2.5 GB+ (EP1 critical)
Request p95 (`requests`)	above 8 s
Failure rate (`requests`)	above 15%

After Recovery

Signal	Expected Value
`FunctionAppLogs` `MemoryError`	0
Worker exit/restart traces	0
`performanceCounters` Private Bytes	trending below 1.2 GB
Request p95 (`requests`)	under 1.2 s
Failure rate (`requests`)	under 1%
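The After Recovery expectations can be turned into an explicit pass/fail check so the verdict does not depend on eyeballing charts. This is a sketch only: the `WindowSummary` structure and function name are hypothetical, and the thresholds are copied from the table above, with "trending below 1.2 GB" simplified to a single-value comparison.

```python
from dataclasses import dataclass


@dataclass
class WindowSummary:
    """Aggregated signals for one observation window."""
    memory_errors: int
    worker_restarts: int
    private_bytes_gb: float
    p95_seconds: float
    failure_rate_percent: float


def recovery_checks(window: WindowSummary) -> dict[str, bool]:
    """Evaluate each After Recovery expectation separately."""
    return {
        "no_memory_errors": window.memory_errors == 0,
        "no_worker_restarts": window.worker_restarts == 0,
        "private_bytes_below_1_2_gb": window.private_bytes_gb < 1.2,
        "p95_under_1_2_s": window.p95_seconds < 1.2,
        "failure_rate_under_1_percent": window.failure_rate_percent < 1.0,
    }


# Values taken from the condensed timeline (02:34-02:40 and 02:19-02:22 rows).
recovered = WindowSummary(0, 0, 0.98, 0.92, 0.6)
incident = WindowSummary(2, 2, 2.67, 10.21, 22.8)

print("recovery passes:", all(recovery_checks(recovered).values()))
print("incident passes:", all(recovery_checks(incident).values()))
```

Returning one boolean per criterion, rather than a single verdict, makes it clear which signal failed when a rerun does not recover cleanly.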

Evidence Timeline

gantt
    title OOM Lab Evidence Timeline (EP1)
    dateFormat  HH:mm
    axisFormat  %H:%M
    section Baseline
    Baseline capture                :done, b1, 01:30, 15m
    section Load
    Load phase 1 (moderate)         :done, l1, 01:45, 15m
    Load phase 2 (high)             :done, l2, 02:00, 15m
    section Incident
    OOM crash-loop window           :done, i1, 02:15, 10m
    section Recovery
    Remediation applied             :done, r1, 02:25, 1m
    Recovery validation             :done, r2, 02:25, 15m

Evidence Chain: Why This Proves the Hypothesis

Falsification logic

The hypothesis is supported only when the first three signals below align in the same time window and the fourth holds after remediation:

Memory pressure reaches EP1 critical thresholds (Private Bytes 2.5 GB+).
Python failure signatures (MemoryError) appear.
Host-level worker lifecycle shows exit 137 and restart events.
After changing to streaming + lower concurrency, those signatures disappear while latency and failure rate improve.

If any one of these fails to correlate temporally, rerun the phase and reassess alternate causes.
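The temporal part of this check can be made explicit. The sketch below takes the first observed time of each incident signal and reports which ones fall outside a chosen window; the signal names and helper are hypothetical, and the timestamps are the ones recorded in the experiment log.

```python
from datetime import datetime, timezone


def utc(hour: int, minute: int, second: int = 0) -> datetime:
    """Timestamp on the lab date (2026-04-05) in UTC."""
    return datetime(2026, 4, 5, hour, minute, second, tzinfo=timezone.utc)


def signals_outside_window(
    signal_times: dict[str, datetime], start: datetime, end: datetime
) -> list[str]:
    """Return the names of signals whose timestamp is not within [start, end]."""
    return [name for name, ts in signal_times.items() if not (start <= ts <= end)]


incident_signals = {
    "private_bytes_critical": utc(2, 21),
    "memory_error": utc(2, 16, 41),
    "worker_exit_137": utc(2, 16, 41),
}

outliers = signals_outside_window(incident_signals, utc(2, 15), utc(2, 25))
print("uncorrelated signals:", outliers or "none")
```

An empty result means the three incident signals share the incident window; any name returned points at the signal to re-examine before accepting the hypothesis.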

Clean Up

az group delete --name "$RG" --yes --no-wait

Out-of-Memory Worker Crash Playbook

Sources

https://learn.microsoft.com/azure/azure-functions/functions-scale
https://learn.microsoft.com/azure/azure-functions/functions-reference-python
https://learn.microsoft.com/azure/azure-functions/functions-monitoring
https://learn.microsoft.com/azure/azure-monitor/app/data-model-complete
https://learn.microsoft.com/azure/azure-monitor/logs/log-query-overview