Correlation Queries¶
KQL queries for correlating signals across telemetry sources to connect symptoms to root causes.
flowchart LR
A[Collect OperationId] --> B[Query AppRequests]
B --> C[Join dependencies]
C --> D[Join exceptions and traces]
D --> E[Build timeline]
E --> F[Validate root-cause hypothesis] Single invocation correlation¶
Use when you already have an OperationId from a failed request.
let opId = "<operation-id>";
union isfuzzy=true
(
AppRequests
| where OperationId == opId
| project TimeGenerated, itemType="request", name=OperationName, success, ResultCode, duration, details=tostring(url)
),
(
AppDependencies
| where OperationId == opId
| project TimeGenerated, itemType="dependency", name=target, success, ResultCode, duration, details=tostring(data)
),
(
AppExceptions
| where OperationId == opId
| project TimeGenerated, itemType="exception", name=type, success=bool(false), ResultCode="", duration=real(null), details=outerMessage
),
(
AppTraces
| where OperationId == opId
| project TimeGenerated, itemType="trace", name="trace", success=bool(true), ResultCode="", duration=real(null), details=Message
)
| order by TimeGenerated asc
Example result:
| TimeGenerated | itemType | name | success | ResultCode | duration | details |
|---|---|---|---|---|---|---|
| 2026-04-04T11:32:30.000Z | request | Functions.ErrorHandler | false | 500 | 12.80 | https://func-myapp-prod.azurewebsites.net/api/exceptions/unhandled |
| 2026-04-04T11:32:30.000Z | trace | trace | true | Executing 'Functions.ErrorHandler' (Reason='This function was programmatically called via the host APIs.', Id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) | ||
| 2026-04-04T11:32:30.000Z | trace | trace | true | Unhandled exception endpoint requested | ||
| 2026-04-04T11:32:30.000Z | exception | Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcException | false | Exception while executing function: Functions.ErrorHandler | ||
| 2026-04-04T11:32:30.000Z | trace | trace | true | Executed 'Functions.ErrorHandler' (Failed, Id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, Duration=8ms) |
How to interpret:
| Indicator | Normal | Warning | Critical |
|---|---|---|---|
| request -> dependency -> trace sequence | Complete and successful | Complete with retries | Broken by exception/failure |
| First failing component in timeline | None | Delayed dependency | Immediate dependency/auth failure |
| Total invocation timeline | < 1000ms | 1000-5000ms | > 5000ms |
Normal vs abnormal
Normal: Request and dependencies succeed, no exception event, short timeline.
Abnormal: Request failure follows dependency 403 and exception record in the same OperationId, proving downstream auth or connectivity as root cause.
Latency vs error correlation¶
Correlate rising latency with rising error rates to identify whether latency precedes errors (dependency bottleneck) or errors precede latency (retry storms).
let appName = "func-myapp-prod";
AppRequests
| where TimeGenerated > ago(2h)
| where AppRoleName =~ appName
| where OperationName startswith "Functions."
| summarize
P95Ms = round(percentile(duration, 95), 2),
ErrorRate = round(100.0 * countif(success == false) / count(), 2),
Invocations = count()
by bin(TimeGenerated, 5m)
| order by TimeGenerated asc
Example result:
| TimeGenerated | P95Ms | ErrorRate | Invocations |
|---|---|---|---|
| 2026-04-04T10:00:00Z | 245 | 0.00 | 120 |
| 2026-04-04T10:05:00Z | 280 | 0.00 | 135 |
| 2026-04-04T10:10:00Z | 1250 | 0.50 | 142 |
| 2026-04-04T10:15:00Z | 3800 | 8.20 | 98 |
| 2026-04-04T10:20:00Z | 6200 | 22.40 | 64 |
How to interpret:
| Pattern | Meaning | Root Cause Direction |
|---|---|---|
| Latency rises before errors | Dependency slowdown causing timeouts | Investigate downstream dependencies |
| Errors rise before latency | Application failures causing retry storms | Investigate application exceptions |
| Both rise simultaneously | Capacity saturation | Investigate scaling and resource limits |
| Latency stable but errors spike | Application logic error (fast failures) | Investigate code changes and config |
How to Read This
Plot P95Ms and ErrorRate on the same timeline. If latency climbs 2-3 bins before errors appear, the root cause is almost always a dependency bottleneck. If errors appear first, look at application code or configuration changes.
Restarts vs latency correlation¶
Correlate host restart events with latency spikes to identify whether restarts cause cold start latency or are caused by unhealthy state.
let appName = "func-myapp-prod";
let restarts = AppTraces
| where TimeGenerated > ago(6h)
| where AppRoleName =~ appName
| where Message has "Host started"
| summarize RestartCount = count() by bin(TimeGenerated, 5m);
let latency = AppRequests
| where TimeGenerated > ago(6h)
| where AppRoleName =~ appName
| where OperationName startswith "Functions."
| summarize P95Ms = round(percentile(duration, 95), 2), Invocations = count() by bin(TimeGenerated, 5m);
restarts
| join kind=fullouter latency on TimeGenerated
| project TimeGenerated = coalesce(TimeGenerated, TimeGenerated1), RestartCount = coalesce(RestartCount, 0), P95Ms = coalesce(P95Ms, 0.0), Invocations = coalesce(Invocations, 0)
| order by TimeGenerated asc
How to interpret:
| Pattern | Meaning |
|---|---|
| Restart → latency spike → recovery | Normal cold start behavior |
| Latency spike → restart | Unhealthy state caused restart (OOM, timeout) |
| Repeated restart + sustained high latency | Crash loop — investigate host logs |
| Restart with no latency change | Graceful restart or scale event |