Incident Playbooks

Symptom-oriented troubleshooting guides for Azure Functions.

Each playbook follows a hypothesis-driven structure: start from the symptom, list competing hypotheses, collect evidence, validate or disprove each hypothesis, and identify the root cause.

graph TD
    A[Reported symptom] --> B{Primary symptom area}
    B --> C[Execution]
    B --> D[Performance]
    B --> E["Queue / Event Processing"]
    B --> F[Deployment]
    B --> G[Triggers]
    B --> H[Scaling]
    B --> I["Auth / Config"]
    C --> C1[Functions not executing]
    C --> C2[Functions failing with errors]
    D --> D1["High latency / slow responses"]
    E --> E1[Queue messages piling up]
    E --> E2[Blob trigger not firing]
    F --> F1[Deployment failures]
    G --> G1["Timeout / Execution Limit"]
    G --> G2["Event Hub / Service Bus Lag"]
    H --> H1["Out of Memory / Worker Crash"]
    H --> H2[Durable Orchestration Stuck]
    I --> I1["Managed Identity / RBAC Failure"]
    I --> I2[App Settings Misconfiguration]

Execution

Playbook	Symptom
Functions Not Executing	Events arrive but invocation count is near zero
Functions Failing with Errors	Exception count and 5xx increase quickly

Performance

Playbook	Symptom
High Latency / Slow Responses	P95 latency spikes and timeout rate increases

Queue / Event Processing

Playbook	Symptom
Queue Messages Piling Up	Queue depth and message age rise steadily
Blob Trigger Not Firing	Blob uploads succeed but invocations never appear

Deployment

Playbook	Symptom
Deployment Failures	Deployment fails or app degrades immediately after release

Triggers

Playbook	Symptom
Timeout / Execution Limit	Functions terminate early or hit maximum execution duration
Event Hub / Service Bus Lag	Event-driven processing falls behind and checkpoint lag grows

Scaling

Playbook	Symptom
Out of Memory / Worker Crash	Workers restart or fail under memory pressure
Durable Orchestration Stuck	Durable orchestrations hang or replay excessively

Auth / Config

Playbook	Symptom
Managed Identity / RBAC Failure	Identity-based access fails after RBAC or scope changes
App Settings Misconfiguration	Functions fail due to missing, wrong, or stale application settings

How to Use These Playbooks

1. Identify the primary symptom that your incident matches.
2. Open the corresponding playbook.
3. Follow the hypothesis-driven workflow: What you observe → Hypotheses → Checks → Interpretation → Fix.
4. Run the KQL queries embedded in the playbook; there is no need to switch to a separate query library.
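
The workflow above (What you observe → Hypotheses → Checks → Interpretation → Fix) can be tracked during an incident with a small record-keeping structure. The following is a minimal Python sketch, not part of the playbooks themselves: the class names (`Hypothesis`, `Investigation`, `Verdict`) and the example hypotheses in the usage section are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    UNTESTED = "untested"
    SUPPORTED = "supported"
    DISPROVED = "disproved"


@dataclass
class Hypothesis:
    statement: str                      # candidate explanation for the symptom
    check: str                          # evidence to collect, e.g. a KQL query from the playbook
    verdict: Verdict = Verdict.UNTESTED
    fix: str = ""                       # remediation to apply if the hypothesis holds


@dataclass
class Investigation:
    symptom: str                        # "What you observe"
    hypotheses: list[Hypothesis] = field(default_factory=list)

    def record(self, statement: str, supported: bool) -> None:
        """Store the interpretation of a check for one hypothesis."""
        for h in self.hypotheses:
            if h.statement == statement:
                h.verdict = Verdict.SUPPORTED if supported else Verdict.DISPROVED
                return
        raise KeyError(statement)

    def remaining(self) -> list[Hypothesis]:
        """Hypotheses that still need evidence."""
        return [h for h in self.hypotheses if h.verdict is Verdict.UNTESTED]

    def root_causes(self) -> list[Hypothesis]:
        """Hypotheses that the evidence supports."""
        return [h for h in self.hypotheses if h.verdict is Verdict.SUPPORTED]


if __name__ == "__main__":
    # Illustrative values only; real hypotheses and checks come from the playbook.
    inv = Investigation(
        symptom="Queue depth and message age rise steadily",
        hypotheses=[
            Hypothesis("Host is not scaling out", "Compare instance count to queue depth",
                       fix="Review scale settings"),
            Hypothesis("Poison messages are retried", "Inspect dequeue counts",
                       fix="Route failing messages to a poison queue"),
        ],
    )
    inv.record("Host is not scaling out", supported=False)
    inv.record("Poison messages are retried", supported=True)
    for cause in inv.root_causes():
        print(f"{cause.statement} -> {cause.fix}")
```

Keeping disproved hypotheses in the record, rather than deleting them, preserves the evidence trail for a later incident review.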

Troubleshooting Workflow

Start with First 10 Minutes, follow the Methodology, run the KQL queries embedded in each playbook, and practice hands-on with the Lab Guides.

See Also