Troubleshooting

Use this section when Azure Functions workloads are degraded, failing, or behaving unexpectedly. It is designed for incident response first, then root-cause analysis and prevention.

Operations Guide

For monitoring setup and alert configuration, see Monitoring and Alerts.

What this section covers

  • First 10 Minutes: incident triage checklist for rapid stabilization.
  • Decision Tree: visual routing from symptom to investigation path.
  • Mental Model: conceptual framework for Azure Functions troubleshooting.
  • Playbooks: scenario runbooks with symptoms, diagnosis, and fixes.
  • Methodology: repeatable troubleshooting workflow for complex incidents.
  • KQL Query Library: ready-to-use Application Insights and Log Analytics queries.
  • Lab Guides: hands-on failure simulations to practice response.

Suggested incident flow

  1. Start with First 10 Minutes to verify platform health and blast radius.
  2. Move to Playbooks for scenario-specific diagnosis paths.
  3. Use KQL Query Library to validate hypotheses with telemetry.
  4. Apply Methodology to avoid guesswork and reduce MTTR.
  5. Rehearse with Lab Guides to improve operational readiness.

Troubleshooting mental model

Use this classification first to narrow where to collect evidence.

| Category | Examples | First Check | Typical Evidence |
|---|---|---|---|
| Request path issue | 5xx, timeout, 403, connection refused | requests + exceptions tables | HTTP status codes, error types |
| App startup issue | Host not starting, container ping failure, health check timeout | traces table (host lifecycle) | Host started missing, startup duration |
| Runtime degradation | Memory pressure, GIL contention, thread pool starvation | customMetrics, process metrics | CPU/memory trends, cold start frequency |
| Dependency / outbound issue | DNS failure, SNAT exhaustion, private endpoint unreachable | dependencies table | Failed dependency calls, target resolution |
| Deployment / recycle event | Post-deploy failures, slot swap issues, config drift | Activity Log, traces | Deploy events, host restart events |
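
The classification above can be treated as a lookup from symptom to first check. The following is a minimal sketch in Python, assuming simple keyword matching. The `Category` type, the `classify` helper, and the keyword lists are hypothetical names chosen for illustration (the keywords are lifted from the Examples column), not part of any Azure SDK.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Category:
    name: str
    first_check: str
    keywords: Tuple[str, ...]  # hypothetical keywords taken from the Examples column


CATEGORIES = (
    Category("Request path issue", "requests + exceptions tables",
             ("5xx", "timeout", "403", "connection refused")),
    Category("App startup issue", "traces table (host lifecycle)",
             ("host not starting", "container ping", "health check")),
    Category("Runtime degradation", "customMetrics, process metrics",
             ("memory pressure", "gil contention", "thread pool")),
    Category("Dependency / outbound issue", "dependencies table",
             ("dns", "snat", "private endpoint")),
    Category("Deployment / recycle event", "Activity Log, traces",
             ("post-deploy", "slot swap", "config drift")),
)


def classify(symptom: str) -> List[Category]:
    """Return every category whose keywords appear in the symptom text."""
    text = symptom.lower()
    return [c for c in CATEGORIES if any(k in text for k in c.keywords)]


if __name__ == "__main__":
    for category in classify("Intermittent 5xx after slot swap"):
        print(f"{category.name}: start with {category.first_check}")
```

A symptom can match more than one category, so the helper returns a list rather than a single answer. When that happens, collect evidence for each match before narrowing down.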

About customMetrics

The customMetrics table contains metrics explicitly emitted by your application or SDK. Only a few metrics (for example, FunctionExecutionCount, FunctionExecutionUnits) are emitted automatically by the Azure Functions runtime. Queue-related metrics and custom business metrics require explicit instrumentation.

Decision tree

flowchart TD
    A[Issue detected] --> B{Is it a 5xx issue?}
    B -->|Yes| C{Intermittent or constant?}
    C -->|Constant| D[Check host startup + recent deploy]
    C -->|Intermittent| E{Recent deployment?}
    E -->|Yes| F[Compare before/after metrics, consider rollback]
    E -->|No| G{Dependency-correlated?}
    G -->|Yes| H[Check dependency health + outbound networking]
    G -->|No| I[Check concurrency + memory + cold start]
    B -->|No| J{Trigger not firing?}
    J -->|Yes| K[Check listener status + connection config]
    J -->|No| L{Performance degradation?}
    L -->|Yes| M[Check dependencies + scaling + storage]
    L -->|No| N[Review evidence-map for matching symptoms]
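
For teams that want to embed the same routing in a runbook tool, the flowchart can be expressed as a plain function. This is only a sketch: the `route` function and its boolean parameters are hypothetical names, and the returned strings are the leaf labels copied from the diagram.

```python
def route(
    is_5xx: bool,
    constant: bool = False,
    recent_deploy: bool = False,
    dependency_correlated: bool = False,
    trigger_not_firing: bool = False,
    performance_degraded: bool = False,
) -> str:
    """Walk the decision tree and return the suggested investigation path."""
    if is_5xx:
        if constant:
            return "Check host startup + recent deploy"
        # Intermittent 5xx: look at deployments first, then dependencies.
        if recent_deploy:
            return "Compare before/after metrics, consider rollback"
        if dependency_correlated:
            return "Check dependency health + outbound networking"
        return "Check concurrency + memory + cold start"
    if trigger_not_firing:
        return "Check listener status + connection config"
    if performance_degraded:
        return "Check dependencies + scaling + storage"
    return "Review evidence-map for matching symptoms"


if __name__ == "__main__":
    print(route(is_5xx=True, dependency_correlated=True))
```

The branch order mirrors the diagram, so the non-5xx flags are ignored whenever `is_5xx` is true.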

Representative log patterns (quick reference)

| Pattern | Indicates | Severity | Next Action |
|---|---|---|---|
| Container didn't respond to HTTP pings | Host startup failure | Critical | Check host logs and recent deploy activity |
| Storage operation failed: (403) Forbidden | Storage auth broken | Critical | Check managed identity assignments and RBAC scope |
| Host started (>10000ms) | Severe cold start | Warning | Check dependency initialization path and hosting plan |
| Message has been dequeued 'N' time(s) | Poison message loop | Warning | Check handler idempotency and maxDequeueCount |
| getaddrinfo ENOTFOUND | DNS resolution failure | Critical | Check VNet integration and private DNS zones |
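
The patterns above can also be matched against exported log lines. The sketch below uses Python's standard `re` module. The regular expressions assume log lines contain the text exactly as shown in the table, which may differ from what your host version emits. `LogPattern`, `match_line`, and `SLOW_START_MS` are hypothetical names introduced for this example.

```python
import re
from typing import NamedTuple, Optional, Pattern

SLOW_START_MS = 10000  # threshold taken from the "Host started (>10000ms)" row


class LogPattern(NamedTuple):
    regex: Pattern[str]
    indicates: str
    severity: str
    next_action: str


# Assumed line formats, based on the representative patterns in the table.
PATTERNS = [
    LogPattern(re.compile(r"Container didn't respond to HTTP pings"),
               "Host startup failure", "Critical",
               "Check host logs and recent deploy activity"),
    LogPattern(re.compile(r"Storage operation failed: \(403\) Forbidden"),
               "Storage auth broken", "Critical",
               "Check managed identity assignments and RBAC scope"),
    LogPattern(re.compile(r"Host started \((\d+)ms\)"),
               "Severe cold start", "Warning",
               "Check dependency initialization path and hosting plan"),
    LogPattern(re.compile(r"Message has been dequeued '\d+' time\(s\)"),
               "Poison message loop", "Warning",
               "Check handler idempotency and maxDequeueCount"),
    LogPattern(re.compile(r"getaddrinfo ENOTFOUND"),
               "DNS resolution failure", "Critical",
               "Check VNet integration and private DNS zones"),
]


def match_line(line: str) -> Optional[LogPattern]:
    """Return the first known pattern found in a log line, or None."""
    for pattern in PATTERNS:
        found = pattern.regex.search(line)
        if found is None:
            continue
        # A "Host started" line only counts when startup exceeds the threshold.
        if pattern.indicates == "Severe cold start" and int(found.group(1)) <= SLOW_START_MS:
            continue
        return pattern
    return None


if __name__ == "__main__":
    hit = match_line("Host started (12500ms)")
    if hit:
        print(f"[{hit.severity}] {hit.indicates}: {hit.next_action}")
```

Keeping the table as data makes it straightforward to add patterns as new failure signatures are confirmed during incidents.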

Quick investigation flow

Updated section map

| Document | Coverage |
|---|---|
| First 10 Minutes | Time-boxed triage checks for active incidents |
| Decision Tree | Visual routing from symptom to investigation path |
| Mental Model | Conceptual framework for Azure Functions troubleshooting |
| Playbooks | Scenario-based diagnostics and mitigations |
| Methodology | Reproducible Observe → Hypothesize → Test → Fix → Verify workflow |
| KQL Query Library | Reusable telemetry and evidence queries |
| Troubleshooting Architecture | Component boundaries and failure-domain context |
| Evidence Map | Symptom-to-evidence lookup for first-query selection |
| Lab Guides | Failure drills for response readiness |

Scope and source policy

  • Guidance in this section follows Microsoft Learn documentation for Azure Functions, App Service, Application Insights, and Azure Monitor.
  • Product behavior, limits, and trigger specifics should always be validated against the linked Learn references.
  • Examples use masked identifiers (<subscription-id>, xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) to avoid exposing PII.

See Also

Sources