Dependency Queries

KQL queries for analyzing outbound dependency failures and queue processing latency.

flowchart TD
    A[Dependency signals] --> B[Failure rate by target]
    A --> C[Status code distribution]
    A --> D[Queue latency metrics]
    B --> E[Identify failing dependency]
    C --> E
    D --> E

Dependency call failures

let appName = "func-myapp-prod";
AppDependencies
| where TimeGenerated > ago(1h)
| where AppRoleName =~ appName
| summarize
    Calls=count(),
    Failed=countif(Success == false),
    FailureRatePercent=round(100.0 * countif(Success == false) / count(), 2),
    P95Ms=round(percentile(DurationMs, 95), 2)
  by Target, DependencyType
| order by Failed desc, P95Ms desc

Example result:

Target	DependencyType	Calls	Failed	FailureRatePercent	P95Ms
api.partner.internal	HTTP	28	0	0.00	1260

How to interpret:

Indicator	Normal	Warning	Critical
Dependency failure rate	< 0.5%	0.5-2%	> 2%
Dependency P95 latency (P95Ms)	< 300ms	300-1000ms	> 1000ms
Share of failed calls from one target	No single target > 30%	One target 30-60%	One target > 60%
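To apply the first two rows of this table to exported query results, a minimal Python sketch follows. It assumes the boundary values belong to the Warning band (for example, exactly 2% is Warning, not Critical), which the table leaves ambiguous; the function names are hypothetical.

```python
def band(value, warn_at, crit_above):
    """Normal below warn_at, Warning from warn_at to crit_above inclusive, else Critical."""
    if value < warn_at:
        return "Normal"
    if value <= crit_above:
        return "Warning"
    return "Critical"


def classify_dependency_row(calls, failed, p95_ms):
    """Classify one row of the dependency call failures query."""
    # Same arithmetic as FailureRatePercent in the KQL query.
    failure_rate = 100.0 * failed / calls if calls else 0.0
    return {
        "FailureRatePercent": round(failure_rate, 2),
        "failure_rate": band(failure_rate, 0.5, 2.0),
        "p95": band(p95_ms, 300, 1000),
    }
```

For the example row (28 calls, 0 failed, P95 1260 ms) the failure rate is Normal but the latency is Critical, so that target still deserves a look even though no call failed.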

Normal vs abnormal

Normal: Failures are sparse across multiple targets with low latency.

Abnormal: A single target shows both a high failure rate and high latency. Treat that target as the primary source of the blast radius and investigate it first.
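The concentration row of the table asks how much of the total failure count comes from a single target. A minimal sketch, assuming rows of (Target, Failed) pairs taken from the query output and treating exactly 30% as Warning (an assumption, since the table's Normal and Warning bands both touch 30%):

```python
from collections import Counter


def failure_concentration(rows):
    """rows: iterable of (target, failed_count).

    Returns (top_target, share_percent, band) for the target with the most failures.
    """
    failed_by_target = Counter()
    for target, failed in rows:
        failed_by_target[target] += failed
    total = sum(failed_by_target.values())
    if total == 0:
        return None, 0.0, "Normal"
    target, count = failed_by_target.most_common(1)[0]
    share = 100.0 * count / total
    if share > 60:
        level = "Critical"
    elif share >= 30:
        level = "Warning"
    else:
        level = "Normal"
    return target, round(share, 1), level
```

A Critical result here, combined with a high failure rate and P95 for the same target, matches the abnormal pattern described above.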

Dependency failures by status code

let appName = "func-myapp-prod";
AppDependencies
| where TimeGenerated > ago(2h)
| where AppRoleName =~ appName
| where Success == false
| summarize Count=count() by Target, ResultCode, DependencyType
| order by Count desc

How to interpret:

Status Code	Meaning	Typical Cause
401	Unauthorized	Managed identity token missing, expired, or issued for the wrong audience
403	Forbidden	RBAC role missing or firewall blocking
404	Not found	Wrong endpoint URL or resource deleted
408/504	Timeout	Dependency overloaded or network latency
429	Throttled	Rate limit exceeded on downstream service
500	Server error	Downstream service failure
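To see which failure class dominates for each target, the status code query output can be bucketed with the meanings from this table. A minimal sketch, assuming rows of (Target, ResultCode, Count); codes the table does not list (for example 502 or 503) are counted as "Other" rather than guessed.

```python
from collections import defaultdict

# Meanings from the status code table; unlisted codes fall into "Other".
MEANING_BY_CODE = {
    "401": "Unauthorized",
    "403": "Forbidden",
    "404": "Not found",
    "408": "Timeout",
    "504": "Timeout",
    "429": "Throttled",
    "500": "Server error",
}


def triage_failures(rows):
    """rows: iterable of (target, result_code, count).

    Returns {target: {meaning: count}} so each target's dominant failure class stands out.
    """
    summary = defaultdict(lambda: defaultdict(int))
    for target, result_code, count in rows:
        # ResultCode is a string column; normalize in case numbers were exported.
        meaning = MEANING_BY_CODE.get(str(result_code).strip(), "Other")
        summary[target][meaning] += count
    return {target: dict(by_meaning) for target, by_meaning in summary.items()}
```

A target dominated by "Throttled" points at downstream rate limits, while one dominated by "Unauthorized" or "Forbidden" points at identity or RBAC configuration.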

Queue processing latency

Custom instrumentation required

The following queue metrics (QueueMessageAgeMs, QueueProcessingLatencyMs, QueueDequeueDelayMs) are not emitted by the Azure Functions runtime by default. Your application must explicitly emit these using TelemetryClient.TrackMetric() (C#) or the OpenTelemetry SDK. If you have not added custom instrumentation, these queries will return empty results. For built-in queue monitoring, use Azure Storage metrics via az monitor metrics list.
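The note above says these metrics must be emitted explicitly. A minimal Python sketch of that instrumentation using the OpenTelemetry metrics API follows. The metric definitions are assumptions, since this page does not define them: QueueMessageAgeMs is taken as the time from enqueue to handler start, and QueueProcessingLatencyMs as handler start to finish; QueueDequeueDelayMs is omitted because its meaning depends on your application. The meter name and the "queue" attribute are hypothetical. The enqueue time could come from the trigger binding (for example, `insertion_time` on `azure.functions.QueueMessage` in the Python programming model), and all three timestamps must be timezone-aware so they can be subtracted.

```python
from datetime import datetime, timedelta, timezone

try:
    # OpenTelemetry metrics API. Without a configured SDK MeterProvider and
    # exporter these instruments are no-ops, so the sketch runs either way.
    from opentelemetry import metrics

    _meter = metrics.get_meter("queue-instrumentation")  # hypothetical meter name
    _age_hist = _meter.create_histogram("QueueMessageAgeMs", unit="ms")
    _latency_hist = _meter.create_histogram("QueueProcessingLatencyMs", unit="ms")
except ImportError:
    _age_hist = _latency_hist = None

_ONE_MS = timedelta(milliseconds=1)


def queue_timings_ms(inserted_at, started_at, finished_at):
    """Assumed definitions: age = handler start - enqueue; latency = finish - start."""
    return {
        "QueueMessageAgeMs": (started_at - inserted_at) / _ONE_MS,
        "QueueProcessingLatencyMs": (finished_at - started_at) / _ONE_MS,
    }


def record_queue_timings(inserted_at, started_at, queue_name, finished_at=None):
    """Compute the timings (finish defaults to now, UTC) and record them when OTel is present."""
    finished_at = finished_at or datetime.now(timezone.utc)
    timings = queue_timings_ms(inserted_at, started_at, finished_at)
    if _age_hist is not None:
        attributes = {"queue": queue_name}  # hypothetical attribute key
        _age_hist.record(timings["QueueMessageAgeMs"], attributes=attributes)
        _latency_hist.record(timings["QueueProcessingLatencyMs"], attributes=attributes)
    return timings
```

For the values to reach AppMetrics, an exporter such as the Azure Monitor OpenTelemetry distro must be configured; this sketch assumes the exporter keeps the instrument name as the metric Name that the query filters on.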

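let appName = "func-myapp-prod";
AppMetrics
| where TimeGenerated > ago(2h)
| where AppRoleName =~ appName
| where Name in ("QueueMessageAgeMs", "QueueProcessingLatencyMs", "QueueDequeueDelayMs")
// AppMetrics rows are pre-aggregated (Sum, Min, Max, ItemCount), so Sum / ItemCount is each row's mean
| summarize
    AvgMs=round(sum(Sum) / sum(ItemCount), 2),
    P95Ms=round(percentile(Sum / ItemCount, 95), 2),  // approximate: P95 of per-row means
    MaxMs=max(Max)
  by MetricName=Name, bin(TimeGenerated, 5m)
| order by TimeGenerated desc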

Example result:

MetricName	TimeGenerated	AvgMs	P95Ms	MaxMs
QueueProcessingLatencyMs	2026-04-04T09:10:00Z	420	860	1,430
QueueProcessingLatencyMs	2026-04-04T09:05:00Z	5,220	12,480	28,200
QueueMessageAgeMs	2026-04-04T09:05:00Z	41,800	79,200	124,000
QueueDequeueDelayMs	2026-04-04T09:05:00Z	3,880	7,120	11,340

The example result above is illustrative. Against the reference application, this query returns no results because that application does not emit custom queue metrics. In production, you see data here only if your application explicitly emits QueueMessageAgeMs, QueueProcessingLatencyMs, or QueueDequeueDelayMs via TelemetryClient.TrackMetric() or the OpenTelemetry SDK.

How to interpret:

Indicator	Normal	Warning	Critical
QueueProcessingLatencyMs Avg	< 1000ms	1000-5000ms	> 5000ms
QueueMessageAgeMs P95	< 10000ms	10000-60000ms	> 60000ms
QueueDequeueDelayMs Avg	< 500ms	500-2000ms	> 2000ms
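Each row of this table grades one statistic of one metric, so a result row can be classified by looking up which column applies. A minimal sketch, assuming rows shaped like the example result (MetricName, AvgMs, P95Ms, MaxMs) and treating boundary values as Warning:

```python
# MetricName -> (statistic the table grades, Warning starts at, Critical above).
QUEUE_THRESHOLDS = {
    "QueueProcessingLatencyMs": ("AvgMs", 1000, 5000),
    "QueueMessageAgeMs": ("P95Ms", 10000, 60000),
    "QueueDequeueDelayMs": ("AvgMs", 500, 2000),
}


def classify_queue_row(row):
    """row: dict for one metric in one 5-minute bin."""
    stat, warn_at, crit_above = QUEUE_THRESHOLDS[row["MetricName"]]
    value = row[stat]
    if value < warn_at:
        return "Normal"
    if value <= crit_above:
        return "Warning"
    return "Critical"
```

Applied to the example result, the 09:10 QueueProcessingLatencyMs row is Normal, while every 09:05 row is Critical.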

Normal vs abnormal

Normal: AvgMs and P95Ms move together at low values.

Abnormal: A short-window spike in which QueueMessageAgeMs and QueueProcessingLatencyMs jump together (as in the 09:05 bin of the example result) indicates throughput collapse or scale-out lag.
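To find such bins automatically, one option is to flag any time bin where both metrics reach their Warning threshold at once. This is a minimal sketch under that assumption ("jump together" read as "both elevated in the same bin"); a comparison against a per-metric baseline would be a reasonable alternative.

```python
from collections import defaultdict

# Metric -> (statistic, Warning threshold), from the interpretation table.
WARN_AT = {
    "QueueProcessingLatencyMs": ("AvgMs", 1000),
    "QueueMessageAgeMs": ("P95Ms", 10000),
}


def collapse_candidates(rows):
    """Return the TimeGenerated bins where every metric in WARN_AT is at or above Warning."""
    elevated = defaultdict(set)
    for row in rows:
        rule = WARN_AT.get(row["MetricName"])
        if rule and row[rule[0]] >= rule[1]:
            elevated[row["TimeGenerated"]].add(row["MetricName"])
    return sorted(t for t, names in elevated.items() if names == set(WARN_AT))
```

Run over the example result, only the 09:05 bin qualifies; the 09:10 bin has normal processing latency and no elevated message age.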

Storage dependency health

let appName = "func-myapp-prod";
AppDependencies
| where TimeGenerated > ago(1h)
| where AppRoleName =~ appName
| where DependencyType in ("Azure blob", "Azure queue", "Azure table")
| summarize
    Calls=count(),
    Failed=countif(Success == false),
    P95Ms=round(percentile(DurationMs, 95), 2)
  by Target, DependencyType
| order by Failed desc

How to interpret:

Storage is critical infrastructure for Azure Functions: the host depends on it for leases, trigger state, and Durable Functions state. Treat any storage failure pattern as high priority.
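Because any storage failure counts as high priority, triage here can simply surface every storage target with at least one failed call, worst first. A minimal sketch, assuming rows shaped like the storage query output (Target, DependencyType, Calls, Failed); the type filter is redundant for that query but lets the same helper run on the broader dependency query output.

```python
STORAGE_TYPES = {"azure blob", "azure queue", "azure table"}


def storage_alerts(rows):
    """Return (Target, DependencyType, Failed) for failing storage targets, most failures first."""
    failing = [
        r for r in rows
        if r["DependencyType"].lower() in STORAGE_TYPES and r["Failed"] > 0
    ]
    # Sort by failure count, then failure ratio (Calls >= Failed > 0, so no division by zero).
    failing.sort(key=lambda r: (r["Failed"], r["Failed"] / r["Calls"]), reverse=True)
    return [(r["Target"], r["DependencyType"], r["Failed"]) for r in failing]
```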
