Dependency Queries
KQL queries for analyzing outbound dependency failures and queue processing latency.
flowchart TD
A[Dependency signals] --> B[Failure rate by target]
A --> C[Status code distribution]
A --> D[Queue latency metrics]
B --> E[Identify failing dependency]
C --> E
D --> E

Dependency call failures
let appName = "func-myapp-prod";
AppDependencies
| where TimeGenerated > ago(1h)
| where AppRoleName =~ appName
// AppDependencies column names are case-sensitive: Success, DurationMs, Target, DependencyType.
| summarize
    Calls = count(),
    Failed = countif(Success == false),
    FailureRatePercent = round(100.0 * countif(Success == false) / count(), 2),
    P95Ms = round(percentile(DurationMs, 95), 2)
    by target = Target, type = DependencyType
| order by Failed desc, P95Ms desc
Example result:
| target | type | Calls | Failed | FailureRatePercent | P95Ms |
|---|---|---|---|---|---|
| api.partner.internal | HTTP | 28 | 0 | 0.00 | 1260 |
How to interpret:
| Indicator | Normal | Warning | Critical |
|---|---|---|---|
| Dependency failure rate | < 0.5% | 0.5-2% | > 2% |
| Dependency P95Ms | < 300ms | 300-1000ms | > 1000ms |
| Failed calls concentration by target | No single target > 30% | One target 30-60% | One target > 60% |
Normal vs abnormal
Normal: Failures are sparse across multiple targets with low latency.
Abnormal: A single target has both a high failure rate and high latency. Treat that target as the most likely source of the problem and investigate it first.
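The thresholds above can be applied mechanically to the query output. The following is a minimal Python sketch, not part of the original queries. The function names `classify`, `top_failed_share`, and `assess` are hypothetical, rows are assumed to be dicts keyed by the query's output column names, and a value that falls exactly on a boundary is treated as Warning, which is an assumption because the table does not say.

```python
def classify(value, warn_from, crit_above):
    """Map a value to Normal/Warning/Critical using two thresholds.

    Assumption: values equal to either threshold count as Warning.
    """
    if value < warn_from:
        return "Normal"
    if value <= crit_above:
        return "Warning"
    return "Critical"


def top_failed_share(rows):
    """Return (target, percent of all failed calls) for the worst target."""
    failed_by_target = {}
    for r in rows:
        failed_by_target[r["target"]] = failed_by_target.get(r["target"], 0) + r["Failed"]
    total_failed = sum(failed_by_target.values())
    if total_failed == 0:
        return None, 0.0
    worst = max(failed_by_target, key=failed_by_target.get)
    return worst, round(100.0 * failed_by_target[worst] / total_failed, 2)


def assess(rows):
    """Classify each (target, type) row and the failed-call concentration."""
    per_row = {
        (r["target"], r["type"]): {
            "failure_rate": classify(r["FailureRatePercent"], 0.5, 2.0),
            "p95": classify(r["P95Ms"], 300, 1000),
        }
        for r in rows
    }
    worst, share = top_failed_share(rows)
    return per_row, worst, classify(share, 30, 60)


# Row copied from the example result above.
rows = [{"target": "api.partner.internal", "type": "HTTP", "Calls": 28,
         "Failed": 0, "FailureRatePercent": 0.0, "P95Ms": 1260}]
print(assess(rows))
```

For the example row, the failure rate is Normal but the P95 latency of 1260 ms is Critical, so a target can need attention even when no calls fail.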
Dependency failures by status code
let appName = "func-myapp-prod";
AppDependencies
| where TimeGenerated > ago(2h)
| where AppRoleName =~ appName
// Keep only failed calls, then count them per target, result code, and dependency type.
| where Success == false
| summarize Count = count() by target = Target, ResultCode, type = DependencyType
| order by Count desc
How to interpret:
| Status Code | Meaning | Typical Cause |
|---|---|---|
| 401 | Unauthorized | Managed identity token expired or role not assigned |
| 403 | Forbidden | RBAC role missing or firewall blocking |
| 404 | Not found | Wrong endpoint URL or resource deleted |
| 408/504 | Timeout | Dependency overloaded or network latency |
| 429 | Throttled | Rate limit exceeded on downstream service |
| 500 | Server error | Downstream service failure |
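When the query returns many rows, it can help to attach the likely cause from the table to each row. This is a small Python sketch under stated assumptions: the `annotate` helper and the sample rows are hypothetical, rows are assumed to be dicts keyed by the query's output columns, and the cause strings are copied from the table above rather than from any Azure API.

```python
# Likely causes copied from the status code table above.
LIKELY_CAUSE = {
    "401": "Managed identity token expired or role not assigned",
    "403": "RBAC role missing or firewall blocking",
    "404": "Wrong endpoint URL or resource deleted",
    "408": "Dependency overloaded or network latency",
    "504": "Dependency overloaded or network latency",
    "429": "Rate limit exceeded on downstream service",
    "500": "Downstream service failure",
}


def annotate(rows):
    """Return copies of the rows with a LikelyCause field added."""
    annotated = []
    for r in rows:
        code = str(r["ResultCode"]).strip()  # accept string or numeric codes
        cause = LIKELY_CAUSE.get(code, "Not covered by the table; inspect manually")
        annotated.append({**r, "LikelyCause": cause})
    return annotated


# Hypothetical sample rows, for illustration only.
sample = [
    {"target": "api.partner.internal", "ResultCode": "429", "type": "HTTP", "Count": 12},
    {"target": "api.partner.internal", "ResultCode": "502", "type": "HTTP", "Count": 1},
]
for row in annotate(sample):
    print(row["ResultCode"], "->", row["LikelyCause"])
```

Codes that the table does not list are labeled for manual inspection instead of being guessed.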
Queue processing latency
Custom instrumentation required
The following queue metrics (QueueMessageAgeMs, QueueProcessingLatencyMs, QueueDequeueDelayMs) are not emitted by the Azure Functions runtime by default. Your application must explicitly emit these using TelemetryClient.TrackMetric() (C#) or the OpenTelemetry SDK. If you have not added custom instrumentation, these queries will return empty results. For built-in queue monitoring, use Azure Storage metrics via az monitor metrics list.
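let appName = "func-myapp-prod";
AppMetrics
| where TimeGenerated > ago(2h)
| where AppRoleName =~ appName
| where Name in ("QueueMessageAgeMs", "QueueProcessingLatencyMs", "QueueDequeueDelayMs")
// AppMetrics rows are pre-aggregated (Sum, ItemCount, Min, Max); P95Ms is computed over per-row means and is approximate.
| extend ValueMs = Sum / ItemCount
| summarize AvgMs = sum(Sum) / sum(ItemCount), P95Ms = percentile(ValueMs, 95), MaxMs = max(Max) by MetricName = Name, bin(TimeGenerated, 5m)
| order by TimeGenerated desc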
Example result:
| MetricName | TimeGenerated | AvgMs | P95Ms | MaxMs |
|---|---|---|---|---|
| QueueProcessingLatencyMs | 2026-04-04T09:10:00Z | 420 | 860 | 1,430 |
| QueueProcessingLatencyMs | 2026-04-04T09:05:00Z | 5,220 | 12,480 | 28,200 |
| QueueMessageAgeMs | 2026-04-04T09:05:00Z | 41,800 | 79,200 | 124,000 |
| QueueDequeueDelayMs | 2026-04-04T09:05:00Z | 3,880 | 7,120 | 11,340 |
The rows above illustrate the output shape only. Against the reference application this query returned no results, because that application does not emit custom queue metrics. In production, data appears here only if your application explicitly emits QueueMessageAgeMs, QueueProcessingLatencyMs, or QueueDequeueDelayMs via TelemetryClient.TrackMetric() or the OpenTelemetry SDK.
How to interpret:
| Indicator | Normal | Warning | Critical |
|---|---|---|---|
| QueueProcessingLatencyMs Avg | < 1000ms | 1000-5000ms | > 5000ms |
| QueueMessageAgeMs P95 | < 10000ms | 10000-60000ms | > 60000ms |
| QueueDequeueDelayMs Avg | < 500ms | 500-2000ms | > 2000ms |
Normal vs abnormal
Normal: AvgMs and P95Ms move together at low values.
Abnormal: A short-window spike in which QueueMessageAgeMs and QueueProcessingLatencyMs jump together indicates throughput collapse or scaling lag.
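As a worked example, the thresholds can be checked against the example rows to find the time bins that match the abnormal pattern. This Python sketch is illustrative only: `level` and `abnormal_windows` are hypothetical names, and "jump together" is interpreted here as both metrics being above Normal in the same 5-minute bin, which is an assumption.

```python
# Thresholds from the interpretation table: metric -> (statistic, warn_from, crit_above)
THRESHOLDS = {
    "QueueProcessingLatencyMs": ("AvgMs", 1000, 5000),
    "QueueMessageAgeMs": ("P95Ms", 10000, 60000),
    "QueueDequeueDelayMs": ("AvgMs", 500, 2000),
}


def level(value, warn_from, crit_above):
    """Assumption: values equal to a threshold count as Warning."""
    if value < warn_from:
        return "Normal"
    if value <= crit_above:
        return "Warning"
    return "Critical"


def abnormal_windows(rows):
    """Return bins where message age and processing latency are both above Normal."""
    by_bin = {}
    for r in rows:
        if r["MetricName"] not in THRESHOLDS:
            continue  # ignore metrics the table does not cover
        stat, warn, crit = THRESHOLDS[r["MetricName"]]
        by_bin.setdefault(r["TimeGenerated"], {})[r["MetricName"]] = level(r[stat], warn, crit)
    return sorted(
        t for t, levels in by_bin.items()
        if levels.get("QueueMessageAgeMs", "Normal") != "Normal"
        and levels.get("QueueProcessingLatencyMs", "Normal") != "Normal"
    )


# Rows copied from the example result above.
rows = [
    {"MetricName": "QueueProcessingLatencyMs", "TimeGenerated": "2026-04-04T09:10:00Z", "AvgMs": 420, "P95Ms": 860, "MaxMs": 1430},
    {"MetricName": "QueueProcessingLatencyMs", "TimeGenerated": "2026-04-04T09:05:00Z", "AvgMs": 5220, "P95Ms": 12480, "MaxMs": 28200},
    {"MetricName": "QueueMessageAgeMs", "TimeGenerated": "2026-04-04T09:05:00Z", "AvgMs": 41800, "P95Ms": 79200, "MaxMs": 124000},
    {"MetricName": "QueueDequeueDelayMs", "TimeGenerated": "2026-04-04T09:05:00Z", "AvgMs": 3880, "P95Ms": 7120, "MaxMs": 11340},
]
print(abnormal_windows(rows))
```

With the example rows, only the 09:05 bin is flagged: both QueueMessageAgeMs P95 and QueueProcessingLatencyMs Avg are Critical there, while the 09:10 bin is back to Normal.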
Storage dependency health
let appName = "func-myapp-prod";
AppDependencies
| where TimeGenerated > ago(1h)
| where AppRoleName =~ appName
// Restrict to Azure Storage dependency types.
| where DependencyType in ("Azure blob", "Azure queue", "Azure table")
| summarize
    Calls = count(),
    Failed = countif(Success == false),
    P95Ms = round(percentile(DurationMs, 95), 2)
    by target = Target, type = DependencyType
| order by Failed desc
How to interpret:
Storage is critical infrastructure for Azure Functions (leases, triggers, Durable Functions state). Any storage failure pattern should be treated as high priority.