Functions Diagnostics (Execution Failures and Timeouts)

Analyze Azure Functions telemetry in Application Insights to identify slow executions, timeout patterns, and the functions producing the most failures during a recent incident window.

Scenario

You need to identify which functions are failing or running longer than expected in the last 24 hours so that you can separate host saturation from function-specific code issues.

KQL Query

// Assumes the classic Application Insights schema, where requests.duration is a real in milliseconds.
requests
| where timestamp > ago(24h)
// Heuristic filter: `has` matches whole terms, so adjust the strings to your app naming.
| where cloud_RoleName has "func" or operation_Name has "Function"
| extend FunctionName = coalesce(operation_Name, name)
| summarize
    InvocationCount = count(),
    FailureCount = countif(success == false),
    AvgDurationMs = avg(duration),
    P95DurationMs = percentile(duration, 95)
    by FunctionName
| extend FailureRate = round(todouble(FailureCount) * 100.0 / InvocationCount, 2)
// Keep functions with any failure or a P95 above 10 seconds.
| where FailureCount > 0 or P95DurationMs > 10000
| order by FailureCount desc, P95DurationMs desc
| take 15
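
To help separate host saturation from function-specific issues, the same window can be broken down by host instance and hour. This is a minimal sketch, not part of the original query. It assumes that `cloud_RoleInstance` identifies the Functions host instance and that `duration` is stored in milliseconds. The `DistinctFunctions` column is an illustrative addition.

```kusto
// Sketch: per-instance, per-hour view of the same 24h window.
// Assumption: cloud_RoleInstance maps to a Functions host instance.
requests
| where timestamp > ago(24h)
| where cloud_RoleName has "func" or operation_Name has "Function"
| summarize
    InvocationCount = count(),
    FailureCount = countif(success == false),
    P95DurationMs = percentile(duration, 95),
    DistinctFunctions = dcount(operation_Name)
    by cloud_RoleInstance, bin(timestamp, 1h)
// Same thresholds as the main query: any failure or a P95 above 10 seconds.
| where FailureCount > 0 or P95DurationMs > 10000
| order by timestamp asc, P95DurationMs desc
```

If a single instance shows high P95 across many distinct functions in the same hour, host-level pressure is a more likely explanation than a defect in one function.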

Data Flow

graph TD
    A[Application Insights requests table] --> B[Filter by 24h]
    B --> C[Map records to function names]
    C --> D[Summarize failures and durations]
    D --> E[Flag slow or failing functions]
    E --> F[Prioritize top offenders]

Sample Output

FunctionName	InvocationCount	FailureCount	AvgDurationMs	P95DurationMs	FailureRate
ProcessOrders	482	37	1840	15320	7.68
CleanupTimer	96	6	9230	61100	6.25
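
The FailureRate column can be checked by hand from the counts in the sample rows. The sketch below feeds those two rows into a `datatable` and applies the same rounding expression as the query. The table literal is only for illustration.

```kusto
// Recompute FailureRate from the sample counts shown above.
datatable(FunctionName: string, InvocationCount: long, FailureCount: long)
[
    "ProcessOrders", 482, 37,
    "CleanupTimer", 96, 6
]
| extend FailureRate = round(todouble(FailureCount) * 100.0 / InvocationCount, 2)
// ProcessOrders: 37 * 100 / 482 = 7.676..., rounded to 7.68
// CleanupTimer:  6 * 100 / 96  = 6.25
```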

How to Read This

High FailureRate with low duration often points to dependency or input validation errors, while high P95DurationMs suggests timeout pressure, cold start amplification, or downstream latency. When a timer or queue-triggered function is slow but not failing often, check backlog growth and host concurrency limits before focusing only on exceptions.
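
The two patterns described above, fast failures and executions approaching the timeout, can be counted separately. This is a rough sketch under stated assumptions: `TimeoutMs` is a hypothetical value that should be set to match the `functionTimeout` configured for your app, the 90% and 1-second cutoffs are arbitrary illustrations, and `duration` is assumed to be in milliseconds.

```kusto
// Assumption: 5-minute timeout; replace with your app's configured functionTimeout.
let TimeoutMs = 300000.0;
requests
| where timestamp > ago(24h)
| where cloud_RoleName has "func" or operation_Name has "Function"
| extend FunctionName = coalesce(operation_Name, name)
| summarize
    InvocationCount = count(),
    // Executions that used at least 90% of the assumed timeout budget.
    NearTimeoutCount = countif(duration >= 0.9 * TimeoutMs),
    // Failures that completed in under one second (illustrative cutoff).
    FastFailureCount = countif(success == false and duration < 1000),
    MaxDurationMs = max(duration)
    by FunctionName
| where NearTimeoutCount > 0 or FastFailureCount > 0
| order by NearTimeoutCount desc, FastFailureCount desc
```

A function with many fast failures and no near-timeout executions is a candidate for dependency or input validation checks, while the reverse suggests timeout pressure or downstream latency.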

Limitations

This query assumes Azure Functions telemetry is landing in Application Insights requests data.
Function naming can vary depending on trigger type, host version, and custom telemetry enrichment.
Timeout root cause analysis often requires joining with traces, exceptions, or dependency telemetry for full context.
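
As a starting point for the join mentioned in the last limitation, failed requests can be correlated with the `exceptions` table through `operation_Id`. This is a hedged sketch rather than a complete root cause query. It assumes exceptions are collected for the same app and share an operation with the failed request. The output column names are illustrative.

```kusto
// Sketch: which exception types accompany failed function executions.
requests
| where timestamp > ago(24h)
| where success == false
| where cloud_RoleName has "func" or operation_Name has "Function"
| extend FunctionName = coalesce(operation_Name, name)
| project operation_Id, FunctionName
| join kind=inner (
    exceptions
    | where timestamp > ago(24h)
    | project operation_Id, ExceptionType = type, outerMessage
  ) on operation_Id
| summarize
    ExceptionCount = count(),
    SampleMessage = take_any(outerMessage)
    by FunctionName, ExceptionType
| order by ExceptionCount desc
| take 20
```

Failed requests with no matching exception row are dropped by the inner join, so a short result does not by itself mean the failures are unexplained.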