Troubleshooting¶

Use this section when Azure Functions workloads are degraded, failing, or behaving unexpectedly. It is designed for incident response first, then root-cause analysis and prevention.

Operations Guide

For monitoring setup and alert configuration, see Monitoring and Alerts.

What this section covers¶

First 10 Minutes: incident triage checklist for rapid stabilization.
Decision Tree: visual routing from symptom to investigation path.
Mental Model: conceptual framework for Azure Functions troubleshooting.
Playbooks: scenario runbooks with symptoms, diagnosis, and fixes.
Methodology: repeatable troubleshooting workflow for complex incidents.
KQL Query Library: ready-to-use Application Insights and Log Analytics queries.
Lab Guides: hands-on failure simulations to practice response.

Suggested incident flow¶

Start with First 10 Minutes to verify platform health and blast radius.
Move to Playbooks for scenario-specific diagnosis paths.
Use KQL Query Library to validate hypotheses with telemetry.
Apply Methodology to avoid guesswork and reduce mean time to recovery (MTTR).
Rehearse with Lab Guides to improve operational readiness.

Troubleshooting mental model¶

Classify the symptom first; the category narrows down where to collect evidence.

Category	Examples	First Check	Typical Evidence
Request path issue	5xx, timeout, 403, connection refused	`requests` + `exceptions` tables	HTTP status codes, error types
App startup issue	Host not starting, container ping failure, health check timeout	`traces` table (host lifecycle)	`Host started` missing, startup duration
Runtime degradation	Memory pressure, GIL contention, thread pool starvation	`customMetrics`, process metrics	CPU/memory trends, cold start frequency
Dependency / outbound issue	DNS failure, SNAT exhaustion, private endpoint unreachable	`dependencies` table	Failed dependency calls, target resolution
Deployment / recycle event	Post-deploy failures, slot swap issues, config drift	Activity Log, `traces`	Deploy events, host restart events

About customMetrics

The customMetrics table contains metrics explicitly emitted by your application or SDK. Only a few metrics (for example, FunctionExecutionCount, FunctionExecutionUnits) are emitted automatically by the Azure Functions runtime. Queue-related metrics and custom business metrics require explicit instrumentation.
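Some teams encode this classification in chat-ops or alert-enrichment tooling so that triage starts consistently. The Python sketch below shows one way to do that. It is an illustration under stated assumptions: the `Category` type, the `classify` helper, and the keyword lists are hypothetical, distilled from the table above rather than taken from any official taxonomy. Matching is naive substring search, so expect false positives (for example, `gil` also matches `fragile`).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str
    first_check: str
    keywords: tuple[str, ...]


# Hypothetical keyword lists (assumptions, not an official taxonomy).
# Order matters: the first category with a matching keyword wins, so more
# specific categories come before the generic request-path bucket
# (e.g. "health check timeout" should land in startup, not request path).
CATEGORIES = (
    Category("App startup issue", "traces table (host lifecycle)",
             ("host not starting", "ping", "health check")),
    Category("Dependency / outbound issue", "dependencies table",
             ("dns", "snat", "private endpoint")),
    Category("Deployment / recycle event", "Activity Log, traces",
             ("deploy", "slot swap", "config drift")),
    Category("Runtime degradation", "customMetrics, process metrics",
             ("memory", "gil", "thread pool")),
    Category("Request path issue", "requests + exceptions tables",
             ("5xx", "timeout", "403", "connection refused")),
)


def classify(symptom: str) -> Category | None:
    """Return the first category whose keyword appears in the symptom text."""
    text = symptom.lower()
    for category in CATEGORIES:
        if any(keyword in text for keyword in category.keywords):
            return category
    return None


# Prints: traces table (host lifecycle)
print(classify("Health check timeout after restart").first_check)
```

Returning `None` for unmatched symptoms is deliberate: an unclassified symptom should fall through to the Evidence Map rather than be forced into the wrong bucket.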

Decision tree¶

```mermaid
flowchart TD
    A[Issue detected] --> B{Is it a 5xx issue?}
    B -->|Yes| C{Intermittent or constant?}
    C -->|Constant| D[Check host startup + recent deploy]
    C -->|Intermittent| E{Recent deployment?}
    E -->|Yes| F["Compare before/after metrics, consider rollback"]
    E -->|No| G{Dependency-correlated?}
    G -->|Yes| H[Check dependency health + outbound networking]
    G -->|No| I[Check concurrency + memory + cold start]
    B -->|No| J{Trigger not firing?}
    J -->|Yes| K[Check listener status + connection config]
    J -->|No| L{Performance degradation?}
    L -->|Yes| M[Check dependencies + scaling + storage]
    L -->|No| N[Review evidence-map for matching symptoms]
```
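If you want the same routing inside an incident bot or runbook script, the tree translates directly into nested conditionals. The following Python sketch is a minimal, hypothetical encoding: the `Observation` fields and the `next_step` function are illustrative names, and each boolean assumes a responder has already answered the corresponding question in the diagram.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Observation:
    """Answers to the decision-tree questions (hypothetical field names)."""
    is_5xx: bool = False
    constant: bool = False              # only consulted when is_5xx is True
    recent_deploy: bool = False
    dependency_correlated: bool = False
    trigger_not_firing: bool = False
    performance_degraded: bool = False


def next_step(obs: Observation) -> str:
    """Walk the decision tree top-down and return the investigation path."""
    if obs.is_5xx:
        if obs.constant:
            return "Check host startup + recent deploy"
        # Intermittent 5xx: deployment first, then dependencies.
        if obs.recent_deploy:
            return "Compare before/after metrics, consider rollback"
        if obs.dependency_correlated:
            return "Check dependency health + outbound networking"
        return "Check concurrency + memory + cold start"
    if obs.trigger_not_firing:
        return "Check listener status + connection config"
    if obs.performance_degraded:
        return "Check dependencies + scaling + storage"
    return "Review evidence-map for matching symptoms"


print(next_step(Observation(is_5xx=True, recent_deploy=True)))
```

Keeping the branch order identical to the diagram matters: for intermittent 5xx, a recent deployment is checked before dependency correlation, so a bad release is ruled out before chasing downstream services.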

Representative log patterns (quick reference)¶

Pattern	Indicates	Severity	Next Action
`Container didn't respond to HTTP pings`	Host startup failure	Critical	Check host logs and recent deploy activity
`Storage operation failed: (403) Forbidden`	Storage authorization failure	Critical	Check managed identity role assignments, RBAC scope, and storage firewall rules
`Host started (Nms)` with N > 10000	Severe cold start (host startup over 10 seconds)	Warning	Check dependency initialization path and hosting plan
`Message has been dequeued 'N' time(s)`	Poison message loop	Warning	Check handler idempotency and `maxDequeueCount`
`getaddrinfo ENOTFOUND`	DNS resolution failure	Critical	Check VNet integration and private DNS zones
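When exported host logs are too large to scan by eye, the same patterns can be applied line by line. The Python sketch below is an assumption-laden illustration, not an official parser: the regular expressions are approximations of the quick-reference patterns (exact log wording can vary by runtime version and language worker), and `Finding`, `PATTERNS`, and `triage` are hypothetical names. The cold-start row is handled separately because it needs a numeric comparison rather than a plain text match.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class Finding(NamedTuple):
    indicates: str
    severity: str
    next_action: str


# Approximations of the quick-reference patterns (assumptions; verify
# against your own logs before relying on them).
PATTERNS = [
    (re.compile(r"Container .*didn't respond to HTTP pings"),
     Finding("Host startup failure", "Critical",
             "Check host logs and recent deploy activity")),
    (re.compile(r"Storage operation failed: \(403\) Forbidden"),
     Finding("Storage authorization failure", "Critical",
             "Check managed identity assignments and RBAC scope")),
    (re.compile(r"Message has been dequeued '?\d+'? time\(s\)"),
     Finding("Poison message loop", "Warning",
             "Check handler idempotency and maxDequeueCount")),
    (re.compile(r"getaddrinfo ENOTFOUND"),
     Finding("DNS resolution failure", "Critical",
             "Check VNet integration and private DNS zones")),
]

HOST_STARTED = re.compile(r"Host started \((\d+)ms\)")
SLOW_START_MS = 10_000  # threshold from the quick-reference table


def triage(line: str) -> Finding | None:
    """Return the first finding that matches a single log line, if any."""
    started = HOST_STARTED.search(line)
    if started:
        # Normal startups are not findings; only flag slow ones.
        if int(started.group(1)) > SLOW_START_MS:
            return Finding("Severe cold start", "Warning",
                           "Check dependency initialization path and hosting plan")
        return None
    for pattern, finding in PATTERNS:
        if pattern.search(line):
            return finding
    return None


print(triage("Host started (12500ms)"))
```

Returning the first match keeps the output focused on one next action per line; if a line could plausibly match several patterns, order `PATTERNS` by severity.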

Quick investigation flow¶

For architecture context, see Troubleshooting Architecture.
For "where do I look first?", see Evidence Map.
For a fast triage sequence, start with First 10 Minutes.

Updated section map¶

Document	Coverage
First 10 Minutes	Time-boxed triage checks for active incidents
Decision Tree	Visual routing from symptom to investigation path
Mental Model	Conceptual framework for Azure Functions troubleshooting
Playbooks	Scenario-based diagnostics and mitigations
Methodology	Reproducible Observe → Hypothesize → Test → Fix → Verify workflow
KQL Query Library	Reusable telemetry and evidence queries
Troubleshooting Architecture	Component boundaries and failure-domain context
Evidence Map	Symptom-to-evidence lookup for first-query selection
Lab Guides	Failure drills for response readiness
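The Observe → Hypothesize → Test → Fix → Verify workflow is easiest to follow when skipped steps are visible. As an illustration only, the Python sketch below models an incident log that rejects out-of-order phases. `Phase`, `ALLOWED`, and `Investigation` are hypothetical names, and the two backward transitions (a refuted hypothesis returns to Hypothesize; a failed verification returns to Observe) are assumptions about how the loop is used, not something the Methodology page prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    OBSERVE = "observe"
    HYPOTHESIZE = "hypothesize"
    TEST = "test"
    FIX = "fix"
    VERIFY = "verify"


# Allowed transitions. The backward edges are assumptions: a refuted
# hypothesis goes back to HYPOTHESIZE instead of FIX, and a failed
# verification restarts from fresh observations.
ALLOWED = {
    None: {Phase.OBSERVE},
    Phase.OBSERVE: {Phase.HYPOTHESIZE},
    Phase.HYPOTHESIZE: {Phase.TEST},
    Phase.TEST: {Phase.FIX, Phase.HYPOTHESIZE},
    Phase.FIX: {Phase.VERIFY},
    Phase.VERIFY: {Phase.OBSERVE},
}


@dataclass
class Investigation:
    """Hypothetical incident log that rejects out-of-order phases."""
    phase: Phase | None = None
    notes: list[tuple[Phase, str]] = field(default_factory=list)

    def advance(self, phase: Phase, note: str) -> None:
        """Record a note for the next phase, or raise if the jump is not allowed."""
        if phase not in ALLOWED[self.phase]:
            current = self.phase.value if self.phase else "start"
            raise ValueError(f"cannot move from {current} to {phase.value}")
        self.phase = phase
        self.notes.append((phase, note))


inv = Investigation()
inv.advance(Phase.OBSERVE, "5xx spike on HTTP trigger")
inv.advance(Phase.HYPOTHESIZE, "SNAT exhaustion")
inv.advance(Phase.TEST, "dependency failures flat, hypothesis refuted")
inv.advance(Phase.HYPOTHESIZE, "regression from latest deployment")
```

Trying `inv.advance(Phase.FIX, ...)` directly after a new hypothesis raises `ValueError`, which is the point: a fix should only follow a test that supports the hypothesis.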

Scope and source policy¶

Guidance in this section follows Microsoft Learn documentation for Azure Functions, App Service, Application Insights, and Azure Monitor.
Product behavior, limits, and trigger specifics should always be validated against the linked Learn references.
Examples use masked identifiers (<subscription-id>, xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) so that no real subscription, tenant, or resource identifiers, or other personally identifiable information, are exposed.
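If you share query results or log excerpts in tickets, you can apply the same masking convention before pasting. The Python sketch below is a minimal illustration, assuming identifiers appear as standard GUIDs and subscription IDs appear in ARM resource paths (`/subscriptions/<id>/...`). `mask_identifiers` is a hypothetical helper; it only catches GUID-shaped values and is not a complete PII scrubber (names, emails, and IP addresses pass through untouched).

```python
import re

# GUID-shaped values (tenant IDs, client IDs, operation IDs, ...).
GUID = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
# Subscription IDs inside ARM resource paths get a named placeholder.
SUBSCRIPTION_PATH = re.compile(r"(/subscriptions/)[0-9a-fA-F-]{36}", re.IGNORECASE)
MASK = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"


def mask_identifiers(text: str) -> str:
    """Mask subscription IDs in resource paths first, then any remaining GUIDs."""
    text = SUBSCRIPTION_PATH.sub(r"\1<subscription-id>", text)
    return GUID.sub(MASK, text)


print(mask_identifiers(
    "/subscriptions/12345678-1234-1234-1234-1234567890ab/resourceGroups/rg"
))
```

Masking the subscription path before the generic GUID pass keeps resource paths readable (`/subscriptions/<subscription-id>/...`), and because neither placeholder looks like a GUID, running the function twice gives the same result.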
