# Alert Not Firing

## 1. Summary
An Azure Monitor alert rule exists, the underlying symptom appears real, but no alert is created or no notification reaches the responders. This playbook applies to metric alerts, scheduled query alerts, and the handoff from rule evaluation to action group delivery. Use it when teams say "the threshold definitely happened" yet the incident history shows no matching alert or only a partial alert lifecycle.
Microsoft Learn troubleshooting guidance separates this scenario into three layers: the monitored signal, the alert rule evaluation logic, and the notification path. Most "did not fire" incidents are not platform outages. They are timing mismatches, aggregation misunderstandings, disabled rules, alert processing rules, or action group delivery failures.
Typical incident window: 5-15 minutes for metric alerts and 10-20 minutes for scheduled query rules, depending on evaluation frequency and window size. Time to resolution: 30 minutes to 2 hours depending on whether the miss is signal logic, rule state, suppression, or receiver delivery.
Use it when:
- A metric clearly spiked, but the metric alert did not activate.
- A log query returns rows in Logs, but the scheduled query rule did not fire.
- An alert activated in history, but email, webhook, ITSM, or automation never triggered.
- Maintenance suppression, auto-mitigation, or dimension configuration may have changed the outcome.
```mermaid
flowchart TD
    A[Alert did not fire] --> B{Did the signal truly meet rule logic?}
    B -->|No| C[Fix threshold, aggregation, window, or query]
    B -->|Yes| D{Was the rule enabled and evaluated?}
    D -->|No| E[Fix rule state or scope]
    D -->|Yes| F{Was notification suppressed or failed?}
    F -->|Suppressed| G[Review alert processing rules]
    F -->|Failed| H[Review action group delivery]
    F -->|No evidence| I[Check evaluation timing and ingestion delay]
```

## 2. Common Misreadings
| Observation | Often Misread As | Actually Means |
|---|---|---|
| Portal chart shows a spike | Alert should have fired automatically | Metric alerts evaluate aggregation, frequency, dimensions, and threshold, not screenshots of the chart. |
| Log query returns rows right now | The scheduled query rule should have fired earlier | The rule ran on an earlier window and may have seen no rows yet because of ingestion delay or different time grain. |
| Alert history is empty | Notification failed | The rule may never have activated, so the investigation must start before action groups. |
| Alert is in history but no email arrived | Rule logic is wrong | The rule may have worked and the failure is in action group delivery or suppression. |
| Dynamic threshold or dimensions were enabled | Azure Monitor is unpredictable | Those features change how the signal is evaluated and can suppress what looks obvious in a raw chart. |
| Auto-mitigated alert resolved quickly | The alert never happened | The alert may have activated and resolved before responders saw the notification. |
## 3. Competing Hypotheses
| Hypothesis | Likelihood | Key Discriminator |
|---|---|---|
| The signal never met the exact rule logic | High | Manual replay of the same aggregation, dimensions, and window disproves the assumption that the threshold was crossed. |
| The alert rule was disabled, mis-scoped, or unhealthy | High | CLI shows disabled state, wrong scopes, or unexpected rule configuration. |
| Evaluation timing or ingestion delay prevented activation | Medium | Log data appears later than the rule window, or the metric spike was shorter than the evaluation criteria. |
| An alert processing rule suppressed notifications | Medium | Alert history shows activation or expected conditions, but processing rules target the scope and time. |
| Action group delivery failed | Medium | Alert history exists but email, webhook, SMS, or automation execution failed. |
| Dimensions or dynamic thresholds changed the expected result | Low | The rule uses per-dimension evaluation or adaptive thresholds that differ from the operator's mental model. |
## 4. What to Check First

- Show metric alert rule state, scope, window, and evaluation frequency
- If it is a scheduled query rule, inspect evaluation settings and scopes
- Replay a control query against the workspace when the rule is log-based
- List the action groups wired to the rule
- Review alert processing rules that can suppress notifications
- Confirm the target resource and dimensions actually match the rule scope
## 5. Evidence to Collect

### 5.1 KQL Queries
```kusto
// Alert-related Azure Activity entries for recent rule operations
AzureActivity
| where TimeGenerated > ago(7d)
| where OperationNameValue has_any ("Microsoft.Insights/metricAlerts", "Microsoft.Insights/scheduledQueryRules", "Microsoft.Insights/actionGroups")
| project TimeGenerated, OperationNameValue, ActivityStatusValue, ResourceGroup, ResourceId, Caller
| order by TimeGenerated desc
```
| Column | Example data | Interpretation |
|---|---|---|
| OperationNameValue | Microsoft.Insights/scheduledQueryRules/write | Confirms when the rule was created or updated. |
| ActivityStatusValue | Succeeded | A recent config change may align with the missed alert. |
| ResourceId | /subscriptions/&lt;subscription-id&gt;/resourceGroups/rg-monitor/providers/Microsoft.Insights/scheduledQueryRules/cpu-breach | Use this to confirm the exact rule under investigation. |
| Caller | &lt;object-id&gt; | Identify who or what changed the rule. |
**How to Read This**

Start with recent writes or deletes. Many missed alerts follow an innocent change to frequency, scopes, dimensions, or action groups rather than a platform malfunction.
```kusto
// Check whether action-group delivery failures were recorded
AzureActivity
| where TimeGenerated > ago(7d)
| where OperationNameValue == "Microsoft.Insights/actionGroups/notification/action"
| summarize Attempts=count(), Failures=countif(ActivityStatusValue == "Failed") by ResourceGroup, ActivityStatusValue, bin(TimeGenerated, 1h)
| order by TimeGenerated desc
```
| Column | Example data | Interpretation |
|---|---|---|
| Attempts | 12 | Notification pipeline was exercised. |
| Failures | 3 | Nonzero values point to delivery issues, not evaluation logic. |
| ActivityStatusValue | Failed | Failures justify checking action group endpoints and receivers. |
| TimeGenerated | hourly bucket | Correlate to the expected incident window. |
**How to Read This**

If this query is empty and alert history is also empty, stop blaming the action group. The rule probably never activated.
```kusto
// Example replay pattern for a scheduled query alert signal
Perf
| where TimeGenerated between (ago(2h) .. now())
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| order by TimeGenerated asc
```
| Column | Example data | Interpretation |
|---|---|---|
| Computer | vm-prod-01 | Evaluate whether the rule should have fired per machine or across the fleet. |
| TimeGenerated | 5-minute bucket | Match this to the alert evaluation frequency and window. |
| AvgCPU | 91.7 | Compare to the actual rule threshold and operator. |
| Series shape | brief spike | A short spike can look severe in a chart but still fail the alert's evaluation logic. |
**How to Read This**

Replay the signal exactly as the rule evaluates it. If the rule uses 5-minute averages and two evaluation periods, a one-minute spike is not enough evidence.
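The same replay discipline can be sketched in code. This is an illustrative model of how "average over a window, require consecutive breaches" evaluation behaves, not the platform's actual algorithm: the 90% threshold, 5-minute window, and two-period breach count below are assumptions chosen to mirror the examples in this playbook.

```python
# Illustrative model of metric-alert style evaluation (assumed parameters,
# not the real Azure Monitor implementation).
from statistics import mean

def rule_fires(samples, window=5, threshold=90.0, required_breaches=2):
    """samples: one value per minute. Average each `window`-minute bucket,
    fire only when `required_breaches` consecutive buckets exceed `threshold`."""
    buckets = [mean(samples[i:i + window])
               for i in range(0, len(samples) - window + 1, window)]
    streak = 0
    for avg in buckets:
        streak = streak + 1 if avg > threshold else 0
        if streak >= required_breaches:
            return True
    return False

# A one-minute spike to 100% inside an otherwise quiet 10-minute stretch:
spike = [20, 20, 100, 20, 20, 20, 20, 20, 20, 20]
# Sustained load across two full evaluation windows:
sustained = [95] * 10

print(rule_fires(spike))      # the chart shows a spike, but the rule stays quiet
print(rule_fires(sustained))  # two consecutive breached windows -> fires
```

The point of the sketch: the spike series averages to roughly 36% over its window, so nothing fires even though the raw chart looks alarming.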
```kusto
// Estimate ingestion delay for a log-search alert source table
Perf
| where TimeGenerated > ago(6h)
| extend DelayMinutes = datetime_diff('minute', ingestion_time(), TimeGenerated)
| summarize AvgDelay=avg(DelayMinutes), P95Delay=percentile(DelayMinutes, 95), MaxDelay=max(DelayMinutes)
```
| Column | Example data | Interpretation |
|---|---|---|
| AvgDelay | 1.6 | Typical ingestion lag in minutes. |
| P95Delay | 9 | If this approaches the rule window, misses become plausible. |
| MaxDelay | 17 | Large delay spikes justify widening alert windows or fixing the source pipeline. |
**How to Read This**

Scheduled query rules are only as good as the data they can see in their evaluation window. If delay is close to or larger than that window, late data can make the rule look broken.
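A quick way to reason about this is to compare delay percentiles against the window directly. The sketch below is a back-of-envelope check under stated assumptions (nearest-rank percentile, a 50% safety margin), not an official formula:

```python
# Back-of-envelope check: is the evaluation window safe given observed
# ingestion delays? The 50% margin is an assumed rule of thumb.
import math

def percentile(values, p):
    """Nearest-rank percentile, enough for this sanity check."""
    ranked = sorted(values)
    idx = min(len(ranked) - 1, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[idx]

def window_is_safe(delay_minutes, window_minutes, margin=0.5):
    """Flag the window as unsafe when P95 ingestion delay consumes more
    than `margin` of the evaluation window."""
    return percentile(delay_minutes, 95) < window_minutes * margin

# Made-up delay samples shaped like the query output above (minutes):
delays = [1, 2, 1, 3, 2, 9, 2, 1, 17, 2]
print(window_is_safe(delays, window_minutes=10))
```

With a P95 delay of 17 minutes against a 10-minute window, the check fails, which matches the guidance above: widen the window or fix the source pipeline.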
### 5.2 CLI Investigation
```bash
# Inspect a metric alert rule
az monitor metrics alert show \
  --resource-group $RG \
  --name $ALERT_RULE_NAME \
  --output json
```
Sample output:

```json
{
  "enabled": true,
  "evaluationFrequency": "PT1M",
  "scopes": [
    "/subscriptions/<subscription-id>/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachines/vm-prod-01"
  ],
  "severity": 2,
  "windowSize": "PT5M"
}
```
Interpretation:

- `enabled` must be true.
- `evaluationFrequency` and `windowSize` define what a real threshold crossing means.
- Incorrect `scopes` are a common explanation for "this alert never fires."
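Those two checks are easy to automate against the CLI output. The sketch below parses the JSON shown above; the field names come from that sample output, while the incident resource ID is a hypothetical value used for illustration.

```python
# Sanity-check `az monitor metrics alert show --output json` before blaming
# the platform. Field names follow the sample output; the incident resource
# ID is an illustrative assumption.
import json

def diagnose(rule_json, incident_resource_id):
    rule = json.loads(rule_json)
    problems = []
    if not rule.get("enabled"):
        problems.append("rule is disabled")
    scopes = [s.lower() for s in rule.get("scopes", [])]
    if incident_resource_id.lower() not in scopes:
        problems.append("incident resource is not in the rule's scopes")
    return problems

sample = '''{"enabled": true,
 "evaluationFrequency": "PT1M",
 "scopes": ["/subscriptions/sub-1/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachines/vm-prod-01"],
 "windowSize": "PT5M"}'''

# Scope drift: the rule watches vm-prod-01, but the incident was on vm-prod-02.
print(diagnose(sample,
    "/subscriptions/sub-1/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachines/vm-prod-02"))
```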
```bash
# Inspect a scheduled query rule
az monitor scheduled-query show \
  --resource-group $RG \
  --name $ALERT_RULE_NAME \
  --output json
```
Sample output:

```json
{
  "enabled": true,
  "evaluationFrequency": "PT5M",
  "scopes": [
    "/subscriptions/<subscription-id>/resourceGroups/rg-monitor/providers/Microsoft.OperationalInsights/workspaces/law-prod"
  ],
  "windowSize": "PT10M"
}
```
Interpretation:

- Replay the rule against the same `windowSize` and `evaluationFrequency`.
- Disabled or mis-scoped rules are configuration problems, not alerting bugs.
- Keep this output beside the manual KQL replay to avoid comparing different windows.
```bash
# List alert processing rules that may suppress notifications
az monitor alert-processing-rule list \
  --resource-group $RG \
  --output json
```
Sample output:
[
{
"enabled": true,
"name": "suppress-maintenance-window",
"scopes": [
"/subscriptions/<subscription-id>/resourceGroups/rg-prod"
]
}
]
Interpretation:
- If scope and schedule overlap the incident, notifications may be intentionally suppressed.
- This does not necessarily stop the alert from evaluating; it changes downstream delivery.
- Always compare suppression scope with the rule target resource.
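The scope-and-schedule comparison is mechanical enough to script. The sketch below uses assumed rule fields (`scopes`, `start`, `end`) and illustrative timestamps; real processing rules express schedules in more elaborate ways, but the interval-overlap logic is the same.

```python
# Does a suppression rule cover this incident? Scope matching here is a
# simple resource-ID prefix test; the schedule fields are assumptions
# standing in for the real processing-rule schedule schema.
from datetime import datetime

def suppression_applies(rule, resource_id, incident_start, incident_end):
    """A rule applies when the resource sits under one of its scopes AND
    its schedule overlaps the incident interval."""
    in_scope = any(resource_id.lower().startswith(s.lower())
                   for s in rule["scopes"])
    overlaps = rule["start"] < incident_end and incident_start < rule["end"]
    return rule["enabled"] and in_scope and overlaps

rule = {
    "enabled": True,
    "scopes": ["/subscriptions/sub-1/resourceGroups/rg-prod"],
    "start": datetime(2024, 5, 1, 1, 0),
    "end": datetime(2024, 5, 1, 3, 0),
}
vm = ("/subscriptions/sub-1/resourceGroups/rg-prod"
      "/providers/Microsoft.Compute/virtualMachines/vm-prod-01")

# Incident at 02:00-02:30 on the suppressed day -> notifications suppressed.
print(suppression_applies(rule, vm,
                          datetime(2024, 5, 1, 2, 0),
                          datetime(2024, 5, 1, 2, 30)))
```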
```bash
# Inspect the action group tied to the rule
az monitor action-group show \
  --resource-group $RG \
  --name $ACTION_GROUP_NAME \
  --output json
```
Sample output:

```json
{
  "enabled": true,
  "emailReceivers": [
    {
      "name": "oncall-email",
      "status": "Enabled"
    }
  ],
  "webhookReceivers": [
    {
      "name": "incident-webhook",
      "serviceUri": "https://example.invalid/alerts"
    }
  ]
}
```
Interpretation:
- Disabled or misconfigured receivers explain missing notifications after activation.
- Check webhook endpoints and automation permissions if delivery is inconsistent.
- Keep the action group investigation separate from rule evaluation logic.
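A receiver triage pass over that JSON can be scripted too. This sketch checks only the two receiver types in the sample output above; receiver schemas vary by type, so treat the `status` and `serviceUri` checks as assumptions to adapt.

```python
# Flag suspicious receivers in `az monitor action-group show` output.
# Field names follow the sample above; the checks are illustrative, not a
# complete validation of every receiver type.
import json

def suspect_receivers(action_group_json):
    ag = json.loads(action_group_json)
    suspects = []
    if not ag.get("enabled"):
        suspects.append("action group is disabled")
    for r in ag.get("emailReceivers", []):
        if r.get("status") != "Enabled":
            suspects.append(f"email receiver '{r['name']}' is not enabled")
    for r in ag.get("webhookReceivers", []):
        if not r.get("serviceUri"):
            suspects.append(f"webhook receiver '{r['name']}' has no serviceUri")
    return suspects

sample = '''{"enabled": true,
 "emailReceivers": [{"name": "oncall-email", "status": "Disabled"}],
 "webhookReceivers": [{"name": "incident-webhook", "serviceUri": ""}]}'''

print(suspect_receivers(sample))
```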
## 6. Validation and Disproof by Hypothesis
### Hypothesis 1: The signal never met the exact rule logic

- **Proves if:** Section 5.1 Query 3 fails to cross the actual threshold across the actual window and dimensions.
- **Disproves if:** Manual replay exactly matches the rule logic and should have activated.
- **Test with:** Section 5.1 Query 3 plus Section 5.2 CLI command 1 or 2.
### Hypothesis 2: The alert rule was disabled, mis-scoped, or unhealthy

- **Proves if:** CLI shows `enabled: false`, wrong scope, or configuration drift.
- **Disproves if:** Rule state and scope are correct.
- **Test with:** Section 5.2 CLI command 1 or 2 and Section 5.1 Query 1.
### Hypothesis 3: Evaluation timing or ingestion delay prevented activation

- **Proves if:** Data arrives after the rule window or only brief spikes occur inside a longer evaluation window.
- **Disproves if:** Delay is low and the signal sustained the threshold across the required period.
- **Test with:** Section 5.1 Queries 3 and 4.
### Hypothesis 4: An alert processing rule suppressed notifications

- **Proves if:** Processing rules target the same scope and time as the missed incident.
- **Disproves if:** No relevant processing rule exists.
- **Test with:** Section 5.2 CLI command 3.
### Hypothesis 5: Action group delivery failed

- **Proves if:** Alert history or Azure Activity shows activation but notification actions failed.
- **Disproves if:** There is no evidence the alert ever activated.
- **Test with:** Section 5.1 Query 2 plus Section 5.2 CLI command 4.
### Hypothesis 6: Dimensions or dynamic thresholds changed the expected result

- **Proves if:** The rule uses dimensions or adaptive logic that differ from the operator's broad chart view.
- **Disproves if:** The rule uses simple static thresholds with no dimension splits.
- **Test with:** Section 5.2 CLI command 1 or 2 and manual signal replay per dimension.
## 7. Likely Root Cause Patterns
| Pattern | Evidence | Resolution |
|---|---|---|
| Threshold replay was done with the wrong window | Operator uses raw chart, but rule uses averaged evaluation periods | Replay with exact rule window, frequency, and dimension handling. |
| Rule was disabled or scope drifted after maintenance | Azure Activity shows a recent write and CLI reveals changed settings | Restore the intended enabled state and target scope. |
| Scheduled query alert window was too short for ingestion delay | KQL results appear after the expected evaluation window | Increase evaluation window or fix source-side delay. |
| Alert processing rule suppressed expected notification | Rule history exists or conditions were met, but suppression scope matches the resource | Narrow or disable the suppression during the relevant period. |
| Action group receiver failed after activation | Notification attempts show failures while rule configuration is otherwise correct | Repair receiver configuration, endpoint health, or permissions. |
### Normal vs Abnormal Comparison
| Metric/Log | Normal State | Abnormal State | Threshold |
|---|---|---|---|
| Metric alert evaluation | Signal exceeds threshold across the configured windowSize and evaluationFrequency | Signal spikes briefly but does not satisfy the actual rule window | Rule-specific |
| Scheduled query alert data freshness | Source table data arrives inside the evaluation window | Ingestion delay pushes rows outside the evaluation window | Delay near windowSize |
| Alert rule state | enabled is true and scope matches intended target | Disabled rule or wrong scope/resource | Any mismatch |
| Alert processing rules | No overlapping suppression for the incident scope and time | Scope and schedule suppress expected notifications | Any overlapping suppress rule |
| Action group delivery | Receivers enabled and reachable | Alert activates but notifications fail or never leave the action group | Any failed delivery |
## 8. Immediate Mitigations

- Re-enable the alert rule if it was disabled.
- Widen the scheduled query alert window when late data is the issue.
- Disable or narrow an overlapping alert processing rule.
- Repair action group receivers and revalidate delivery.
- If dimensions caused the miss, temporarily simplify to the exact monitored target.
## 9. Prevention
Prevent missed alerts by validating the whole alert path, not just the threshold. Replay the rule logic against real historical data before relying on it in production.
Keep alert rule configuration review in change management.
```bash
az monitor scheduled-query show \
  --resource-group $RG \
  --name $ALERT_RULE_NAME \
  --query "{enabled:enabled,scopes:scopes,evaluationFrequency:evaluationFrequency,windowSize:windowSize}"
```
Avoid windows shorter than normal ingestion latency for log alerts.
```bash
az monitor scheduled-query update \
  --resource-group $RG \
  --name $ALERT_RULE_NAME \
  --window-size 15m
```
Keep suppression rules narrow in scope and duration.
Test action groups as part of onboarding and after receiver changes.
Finally, document whether each alert is per-resource, per-dimension, or fleet-wide. Most operator confusion comes from expecting one model while the rule actually uses another.
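The difference between those models is worth making concrete. In the sketch below, a fleet-wide average never fires while a per-dimension evaluation catches the one hot machine; the threshold and samples are illustrative assumptions.

```python
# Same samples, two evaluation models: fleet-wide average vs per-dimension.
# Threshold and data are illustrative assumptions.
from statistics import mean

samples = {
    "vm-prod-01": [97, 96, 98],   # one hot machine
    "vm-prod-02": [10, 12, 11],
    "vm-prod-03": [15, 14, 13],
}
THRESHOLD = 90.0

# Fleet-wide: average every sample together, then compare once.
fleet_avg = mean(v for vals in samples.values() for v in vals)
fleet_fires = fleet_avg > THRESHOLD

# Per-dimension: evaluate each machine's own average independently.
per_dimension_fires = {vm: mean(vals) > THRESHOLD
                       for vm, vals in samples.items()}

print(fleet_fires)            # the fleet average hides the hot machine
print(per_dimension_fires)    # only the per-dimension model catches vm-prod-01
```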
## Sources
- Microsoft Learn: Troubleshoot Azure Monitor alerts
- Microsoft Learn: Troubleshoot metric alerts in Azure Monitor
- Microsoft Learn: Alerts processing rules in Azure Monitor
- Microsoft Learn: Action groups in Azure Monitor
- Microsoft Learn: Log search alerts in Azure Monitor
- Microsoft Learn: Perf table reference