Alert Rule Management
Azure Monitor alert rules only help when they are consistent, actionable, and easy to verify after every change. This runbook covers day-2 operations for metric alerts, scheduled query alerts, and their action groups.
flowchart TD
Signal[Metric or log signal] --> Rule[Alert rule]
Rule --> Evaluation[Evaluation engine]
Evaluation --> ActionGroup[Action group]
ActionGroup --> Human[Ops notifications]
ActionGroup --> Automation[Webhook or automation]
Prerequisites
- Azure CLI authenticated with az login.
- A target resource that already emits metrics or logs.
- At least one action group for notifications.
- Permissions:
  - Monitoring Contributor for alert changes.
  - Log Analytics Reader or better for log-query testing.
- Variables used below:
RG="rg-monitoring-prod"
VM_ID="/subscriptions/<subscription-id>/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachines/vm-prod-01"
WORKSPACE_ID="/subscriptions/<subscription-id>/resourceGroups/rg-monitoring-prod/providers/Microsoft.OperationalInsights/workspaces/law-ops-central"
ACTION_GROUP_ID="/subscriptions/<subscription-id>/resourceGroups/rg-monitoring-prod/providers/microsoft.insights/actionGroups/ag-oncall-team"
ALERT_RULE_NAME="alert-vm-high-cpu"
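A mistyped variable tends to surface only as a confusing CLI error several steps later, so a quick pre-flight check can help. The sketch below is illustrative only: is_resource_id and check_alert_variables are hypothetical helper names, not Azure CLI features, and the pattern checks only the general shape of a resource ID, not whether the resource exists.

```shell
# Hypothetical helper (not part of the Azure CLI): succeeds when the argument
# has the general shape of an ARM resource ID.
is_resource_id() {
  case "$1" in
    /subscriptions/?*/resourceGroups/?*/providers/?*/?*/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical pre-flight check for the variables defined above.
# Prints one message per problem to stderr and returns non-zero on any failure.
check_alert_variables() {
  check_status=0
  [ -n "${RG:-}" ] || { echo "RG is empty" >&2; check_status=1; }
  [ -n "${ALERT_RULE_NAME:-}" ] || { echo "ALERT_RULE_NAME is empty" >&2; check_status=1; }
  for entry in "VM_ID=${VM_ID:-}" "WORKSPACE_ID=${WORKSPACE_ID:-}" "ACTION_GROUP_ID=${ACTION_GROUP_ID:-}"; do
    var_name=${entry%%=*}
    var_value=${entry#*=}
    is_resource_id "$var_value" || {
      echo "$var_name is not a resource ID: '$var_value'" >&2
      check_status=1
    }
  done
  return "$check_status"
}
```

Running check_alert_variables || exit 1 at the top of a script stops it before any az command is issued with a malformed value.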
When to Use
- A new workload needs baseline monitoring.
- Alert noise is increasing and thresholds need tuning.
- Notifications must be rerouted to a different action group.
- A rule must be disabled during planned maintenance.
- A production incident requires confirming whether the rule really evaluates as expected.
Procedure
Step 1: Inventory existing alert rules and action groups
List current metric alerts in the resource group before creating or changing anything.
az monitor metrics alert list \
--resource-group $RG \
--query "[].{name:name,enabled:enabled,severity:severity,scopes:scopes}" \
--output table
Name Enabled Severity Scopes
---------------------- --------- ---------- ------------------------------------------------------
alert-vm-high-cpu True 2 ['/subscriptions/<subscription-id>/.../virtualMachines/vm-prod-01']
alert-vm-heartbeat True 1 ['/subscriptions/<subscription-id>/.../virtualMachines/vm-prod-01']
Then confirm that the target action group exists and is enabled:
az monitor action-group show \
--ids $ACTION_GROUP_ID \
--query "{name:name,shortName:groupShortName,enabled:enabled}" \
--output json
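When notifications need rerouting, it helps to know which rules currently point at a given action group. The following is a sketch under assumptions: rules_for_action_group_query and list_rules_for_action_group are hypothetical helper names, the filter assumes each metric alert exposes its targets under actions[].actionGroupId, and the JMESPath comparison is case-sensitive, so the ID must match the casing that Azure returns.

```shell
# Hypothetical helper: build a JMESPath filter that keeps only metric alert
# rules whose actions reference the given action group ID.
# Assumption: the comparison is case-sensitive.
rules_for_action_group_query() {
  printf "[?actions[?actionGroupId=='%s']].name" "$1"
}

# Hypothetical wrapper around the list command from Step 1.
# $1 = resource group, $2 = action group resource ID.
list_rules_for_action_group() {
  az monitor metrics alert list \
    --resource-group "$1" \
    --query "$(rules_for_action_group_query "$2")" \
    --output tsv
}
```

For example, list_rules_for_action_group "$RG" "$ACTION_GROUP_ID" prints one rule name per line. If it prints nothing, compare the casing of the ID against the output of az monitor metrics alert show before concluding that no rule uses the group.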
Step 2: Create a metric alert with explicit evaluation settings
Create the rule with an explicit window size, frequency, severity, and action group.
az monitor metrics alert create \
--name $ALERT_RULE_NAME \
--resource-group $RG \
--scopes $VM_ID \
--condition "avg Percentage CPU > 85" \
--window-size 5m \
--evaluation-frequency 1m \
--severity 2 \
--action $ACTION_GROUP_ID \
--description "CPU is above 85 percent for 5 minutes on vm-prod-01" \
--output json
{
"enabled": true,
"evaluationFrequency": "PT1M",
"name": "alert-vm-high-cpu",
"scopes": [
"/subscriptions/<subscription-id>/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachines/vm-prod-01"
],
"severity": 2,
"windowSize": "PT5M"
}
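Because the window size and evaluation frequency are set explicitly, a script can sanity-check the pair before calling the CLI. This is a minimal sketch, assuming the window should be at least as long as the frequency and that durations use a single unit (m, h, or d) as in the command above. duration_minutes and window_covers_frequency are hypothetical helper names.

```shell
# Hypothetical helper: convert a single-unit duration such as 5m, 1h, or 1d
# to whole minutes. Anything else is rejected.
duration_minutes() {
  amount=${1%?}            # everything except the last character
  unit=${1#"$amount"}      # the last character
  case "$amount" in
    "" | *[!0-9]*) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
  case "$unit" in
    m) echo "$amount" ;;
    h) echo $((amount * 60)) ;;
    d) echo $((amount * 1440)) ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

# Succeeds when the window ($1) is at least as long as the frequency ($2).
# Returns 2 when either value cannot be parsed.
window_covers_frequency() {
  window=$(duration_minutes "$1") || return 2
  frequency=$(duration_minutes "$2") || return 2
  [ "$window" -ge "$frequency" ]
}
```

A call such as window_covers_frequency 5m 1m succeeds for the values used in this step, while swapping the two arguments fails.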
Step 3: Add or update a scheduled query alert for log-based detection
Use scheduled query alerts for conditions that cannot be expressed as a single Azure Monitor metric. The rule below fires when the query returns no Heartbeat rows for vm-prod-01 inside the five-minute window.
az monitor scheduled-query create \
--name "alert-heartbeat-missing-sev2" \
--resource-group "$RG" \
--scopes "$WORKSPACE_ID" \
--condition "count 'HeartbeatRows' < 1" \
--condition-query "HeartbeatRows=Heartbeat | where Computer == \"vm-prod-01\"" \
--evaluation-frequency "5m" \
--window-size "5m" \
--severity 2 \
--skip-query-validation true \
--action-groups $ACTION_GROUP_ID \
--description "Trigger when vm-prod-01 has no fresh heartbeat data for five minutes." \
--output json
{
"actions": {
"actionGroups": [
"/subscriptions/<subscription-id>/resourceGroups/rg-monitoring-prod/providers/microsoft.insights/actionGroups/ag-oncall-team"
]
},
"enabled": true,
"evaluationFrequency": "PT5M",
"name": "alert-heartbeat-missing-sev2",
"severity": 2,
"windowSize": "PT5M"
}
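Cross-check the raw signal before relying on the rule. az monitor log-analytics query expects the workspace GUID (customer ID) rather than the ARM resource ID, so resolve it first.
WORKSPACE_GUID=$(az monitor log-analytics workspace show --ids "$WORKSPACE_ID" --query customerId --output tsv)
az monitor log-analytics query \
--workspace "$WORKSPACE_GUID" \
--analytics-query "Heartbeat | where Computer == 'vm-prod-01' and TimeGenerated > ago(5m) | count" \
--output table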
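The same heartbeat check is often needed for more than one machine. The sketch below parameterizes the query shown above. heartbeat_count_query and check_heartbeats are hypothetical helper names, and the wrapper assumes its first argument is the workspace GUID (customer ID) that az monitor log-analytics query expects.

```shell
# Hypothetical helper: build the heartbeat count query for one computer.
# $1 = computer name, $2 = lookback as a KQL timespan literal such as 5m.
heartbeat_count_query() {
  printf "Heartbeat | where Computer == '%s' and TimeGenerated > ago(%s) | count" "$1" "$2"
}

# Hypothetical wrapper: run the query for every computer name passed after
# the workspace GUID. Assumption: $1 is the workspace GUID (customer ID).
check_heartbeats() {
  workspace_guid=$1
  shift
  for computer in "$@"; do
    echo "== $computer"
    az monitor log-analytics query \
      --workspace "$workspace_guid" \
      --analytics-query "$(heartbeat_count_query "$computer" 5m)" \
      --output table
  done
}
```

A count of 0 for a machine means no heartbeat arrived in the lookback period, which is the condition the scheduled query alert is meant to catch.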
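Step 4: Tune or disable rules during maintenance and noise reduction
Update rules instead of deleting them when you want to preserve history and configuration intent. The command below disables the rule and rewrites its description only. To change the evaluated threshold or window as well, also pass --add-condition with --remove-conditions, or --window-size, so that the description and the rule stay in agreement.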
az monitor metrics alert update \
--name $ALERT_RULE_NAME \
--resource-group $RG \
--description "CPU is above 90 percent for 10 minutes on vm-prod-01" \
--enabled false \
--output json
{
"description": "CPU is above 90 percent for 10 minutes on vm-prod-01",
"enabled": false,
"name": "alert-vm-high-cpu"
}
Re-enable the rule when the maintenance window closes:
az monitor metrics alert update \
--name $ALERT_RULE_NAME \
--resource-group $RG \
--enabled true \
--output json
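Forgetting to re-enable a rule after maintenance is a common failure mode, so the disable and enable calls can be wrapped around the maintenance command itself. This is a sketch, not a definitive implementation: with_alert_disabled is a hypothetical helper name, and it does not cover interruption by a signal such as Ctrl-C.

```shell
# Hypothetical wrapper: disable a metric alert, run a maintenance command,
# then re-enable the alert even if the command fails.
# $1 = rule name, $2 = resource group, remaining arguments = command to run.
with_alert_disabled() {
  rule_name=$1
  resource_group=$2
  shift 2
  az monitor metrics alert update \
    --name "$rule_name" --resource-group "$resource_group" \
    --enabled false --output none || return 1
  "$@"
  maintenance_status=$?
  az monitor metrics alert update \
    --name "$rule_name" --resource-group "$resource_group" \
    --enabled true --output none || return 1
  # Report the maintenance command's own exit status to the caller.
  return "$maintenance_status"
}
```

For example, with_alert_disabled "$ALERT_RULE_NAME" "$RG" ./patch-vm.sh runs a (hypothetical) patch script with the alert muted and returns the script's exit status.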
Step 5: Review rule state and recent firing history
Validate the final state of both metric and scheduled query alerts.
az monitor metrics alert show \
--name $ALERT_RULE_NAME \
--resource-group $RG \
--query "{name:name,enabled:enabled,severity:severity,windowSize:windowSize,evaluationFrequency:evaluationFrequency}" \
--output json
{
"enabled": true,
"evaluationFrequency": "PT1M",
"name": "alert-vm-high-cpu",
"severity": 2,
"windowSize": "PT5M"
}
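Check the activity log for recent changes to alert rules. These entries record configuration operations, not fired alert instances. The filter is case-sensitive, so it matches the capitalized operation names shown in the output.
az monitor activity-log list \
--resource-group $RG \
--offset 1d \
--query "[?contains(operationName.localizedValue, 'Alert') || contains(operationName.localizedValue, 'Scheduled Query')].{time:eventTimestamp,status:status.value,operation:operationName.localizedValue}" \
--output table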
Time Status Operation
--------------------------- --------- ------------------------------------
2026-04-05T08:42:09.000000Z Succeeded Create or Update Metric Alert Rule
2026-04-05T08:44:31.000000Z Succeeded Create or Update Scheduled Query Rule
Verification
List rules again and confirm state, severity, and naming conventions.
az monitor metrics alert list \
--resource-group $RG \
--query "[].{name:name,enabled:enabled,severity:severity}" \
--output table
Name Enabled Severity
---------------------- --------- ----------
alert-vm-high-cpu True 2
alert-vm-heartbeat True 1
Confirm the scheduled query alert as well:
az monitor scheduled-query show \
--name "alert-heartbeat-missing-sev2" \
--resource-group $RG \
--query "{name:name,enabled:enabled,windowSize:windowSize,evaluationFrequency:evaluationFrequency}" \
--output json
{
"enabled": true,
"evaluationFrequency": "PT5M",
"name": "alert-heartbeat-missing-sev2",
"windowSize": "PT5M"
}
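The checks above can be turned into a pass/fail assertion for use in a pipeline. The sketch below assumes that --query enabled --output tsv prints the flag as a single word. alert_enabled_state and assert_alert_enabled are hypothetical helper names, and the kind labels metric and log are conventions of this sketch only.

```shell
# Hypothetical helper: print the enabled flag of a rule.
# $1 = kind (metric or log), $2 = rule name, $3 = resource group.
alert_enabled_state() {
  case "$1" in
    metric)
      az monitor metrics alert show \
        --name "$2" --resource-group "$3" --query enabled --output tsv ;;
    log)
      az monitor scheduled-query show \
        --name "$2" --resource-group "$3" --query enabled --output tsv ;;
    *)
      echo "unknown rule kind: $1" >&2
      return 2 ;;
  esac
}

# Succeeds only when the rule reports itself as enabled.
assert_alert_enabled() {
  state=$(alert_enabled_state "$1" "$2" "$3") || return 2
  case "$state" in
    true | True) return 0 ;;
    *) echo "$2 is not enabled (state: $state)" >&2; return 1 ;;
  esac
}
```

A verification script might call assert_alert_enabled metric "$ALERT_RULE_NAME" "$RG" and assert_alert_enabled log "alert-heartbeat-missing-sev2" "$RG", failing the run if either returns non-zero.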
Rollback / Troubleshooting
Disable a noisy metric alert immediately:
az monitor metrics alert update \
--name $ALERT_RULE_NAME \
--resource-group $RG \
--enabled false \
--output json
If the scheduled query alert itself is faulty, delete it so that it can be recreated from Step 3:
az monitor scheduled-query delete \
--name "alert-heartbeat-missing-sev2" \
--resource-group $RG \
--yes
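Deleting a rule discards its definition, so it can be worth saving a copy first. The following sketch exports the rule to a JSON file and deletes it only if the export produced content. backup_and_delete_query_alert is a hypothetical helper name, and the saved file is meant as a reference when recreating the rule, not as input to a specific restore command.

```shell
# Hypothetical helper: save a scheduled query alert definition, then delete
# the rule. $1 = rule name, $2 = resource group, $3 = backup directory.
backup_and_delete_query_alert() {
  backup_file="$3/$1.json"
  az monitor scheduled-query show \
    --name "$1" --resource-group "$2" --output json > "$backup_file" || return 1
  # Refuse to delete when the export is empty.
  [ -s "$backup_file" ] || {
    echo "backup is empty, refusing to delete $1" >&2
    return 1
  }
  az monitor scheduled-query delete \
    --name "$1" --resource-group "$2" --yes
}
```

For example, backup_and_delete_query_alert "alert-heartbeat-missing-sev2" "$RG" ./backups writes ./backups/alert-heartbeat-missing-sev2.json before removing the rule. The backup directory must already exist.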
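Automation
Alert rule hygiene should be scripted rather than handled only in the portal. Without --resource-group, the list command covers every metric alert in the current subscription, which makes it a convenient source for a scheduled inventory.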
az monitor metrics alert list \
--query "[].{name:name,resourceGroup:resourceGroup,enabled:enabled,severity:severity}" \
--output json
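One hygiene check that lends itself to scripting is finding rules that were disabled and never re-enabled. This is a minimal sketch, assuming the list output exposes name, resourceGroup, and enabled as in the query above. report_disabled_alerts is a hypothetical helper name.

```shell
# Hypothetical helper: print every disabled metric alert in the current
# subscription as "disabled: <resource-group>/<rule-name>".
report_disabled_alerts() {
  az monitor metrics alert list \
    --query "[?enabled==\`false\`].[name, resourceGroup]" \
    --output tsv |
    while IFS="$(printf '\t')" read -r name resource_group; do
      echo "disabled: $resource_group/$name"
    done
}
```

Run on a schedule, the output can be posted to the team channel or compared with a list of approved maintenance windows.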