Alerts and Metrics¶
Azure App Service provides a rich set of platform metrics that help you monitor the health and performance of your application. You can use these metrics to create alert rules that notify you of issues in real time.
Data Flow Diagram¶
graph TD
Resource[Azure App Service] -->|Push| MetricsStore[Azure Monitor Metrics Store]
MetricsStore -->|Aggregation| Dashboards[Azure Portal Dashboards]
MetricsStore -->|Rule Evaluation| AlertEngine[Azure Monitor Alert Engine]
AlertEngine -->|Trigger| ActionGroup["Action Groups (Email, SMS, Webhook)"]
Key Metrics for App Service¶
The following platform metrics are critical for monitoring App Service performance:
- Http5xx: Number of HTTP requests resulting in a 5xx server error.
- AverageResponseTime: Average time taken for the app to serve requests, in seconds.
- CpuPercentage: CPU utilization of the App Service plan instances.
- MemoryPercentage: Memory utilization of the App Service plan instances.
- Requests: Total number of HTTP requests processed.
Configuration Examples¶
Creating an Alert Rule via CLI¶
The following command creates a metric alert rule that triggers when the total HTTP 5xx error count exceeds 10 within a 5-minute window.
az monitor metrics alert create \
--name "High-HTTP-5xx-Errors" \
--resource-group "my-resource-group" \
--scopes "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Web/sites/{appName}" \
--condition "total Http5xx > 10" \
--window-size "5m" \
--evaluation-frequency "1m" \
--description "Alert when HTTP 5xx errors are high" \
--action "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Insights/actionGroups/{actionGroupName}"
KQL Query Examples¶
While metrics are used for alerting, you can also query them using KQL in the AzureMetrics table if you have enabled metric export in diagnostic settings.
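If export is not yet enabled, a diagnostic setting can route platform metrics to a Log Analytics workspace so the queries below return data. A minimal sketch; the workspace name `my-workspace` and all resource IDs are placeholders:

```shell
# Route all platform metrics for the app to a Log Analytics workspace.
# After this, metric rows appear in the AzureMetrics table (with some delay).
az monitor diagnostic-settings create \
  --name "export-metrics" \
  --resource "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Web/sites/my-app-service" \
  --workspace "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.OperationalInsights/workspaces/my-workspace" \
  --metrics '[{"category": "AllMetrics", "enabled": true}]'
```

Note that metric alerts themselves evaluate the metrics store directly and do not require this export; it is only needed for KQL analysis.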
Query High CPU Utilization¶
Identify App Service plans experiencing high CPU usage. CpuPercentage is reported at the plan level, so results group by the plan resource.
AzureMetrics
| where MetricName == "CpuPercentage"
| summarize AverageCpu = avg(Average) by Resource, bin(TimeGenerated, 5m)
| where AverageCpu > 80
| render timechart
Analyze Response Time Spikes¶
Compare average response times across different time intervals.
AzureMetrics
| where MetricName == "AverageResponseTime"
| summarize avg(Average) by bin(TimeGenerated, 15m)
| render timechart
Compare Error Rate With Total Traffic¶
If metric export is enabled, compare Http5xx to total request volume before deciding whether a short spike warrants paging.
let Requests =
AzureMetrics
| where TimeGenerated > ago(1h)
| where MetricName == "Requests"
| summarize TotalRequests=sum(Total) by bin(TimeGenerated, 5m);
let Errors =
AzureMetrics
| where TimeGenerated > ago(1h)
| where MetricName == "Http5xx"
| summarize TotalErrors=sum(Total) by bin(TimeGenerated, 5m);
Requests
| join kind=leftouter Errors on TimeGenerated
| extend TotalErrors = coalesce(TotalErrors, 0)
| extend ErrorRate = todouble(TotalErrors) / iff(TotalRequests == 0, 1, todouble(TotalRequests)) * 100
| project TimeGenerated, TotalRequests, TotalErrors, ErrorRate
| order by TimeGenerated asc
Find Apps Near Memory Limits¶
AzureMetrics
| where TimeGenerated > ago(2h)
| where MetricName == "MemoryPercentage"
| summarize AvgMemory=avg(Average), PeakMemory=max(Maximum) by Resource, bin(TimeGenerated, 15m)
| where PeakMemory > 85
| order by PeakMemory desc
Sample output:
TimeGenerated Resource AvgMemory PeakMemory
------------------------- --------------------------- --------- ----------
2026-04-06T00:45:00Z my-app-service-plan-prod 78.4 91.2
2026-04-06T01:00:00Z my-app-service-plan-prod 80.7 93.8
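When diagnostic-setting export is not enabled, a similar memory check can be run against the metrics store directly. A sketch using the CLI, assuming the placeholder plan name my-app-service-plan:

```shell
# Pull plan-level memory for the last two hours in 15-minute buckets,
# with both average and peak values per bucket.
az monitor metrics list \
  --resource "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Web/serverfarms/my-app-service-plan" \
  --metric "MemoryPercentage" \
  --interval "PT15M" \
  --aggregation "Average" "Maximum" \
  --offset "2h" \
  --output table
```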
Monitoring Baseline¶
For App Service, start with a small set of metrics that directly map to user impact and capacity:
- Availability / customer impact
    - Http5xx
    - Requests
    - AverageResponseTime
- Capacity / scaling pressure
    - CpuPercentage
    - MemoryPercentage
    - DiskQueueLength, if the workload depends on storage-intensive operations
- Deployment confidence
- Traffic trend after deployment
- Restart count or instance churn from logs and activity history
The main rule is to alert on sustained conditions, not single-minute spikes. App Service plans can absorb short bursts that do not justify an incident.
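One way to encode "sustained, not single-minute spikes" in a rule is a dynamic-threshold condition that only fires after several consecutive violations. A sketch, assuming the same placeholder resource names used elsewhere on this page; the sensitivity and violation counts are illustrative, not recommendations:

```shell
# Dynamic thresholds learn the metric's normal range instead of using a
# fixed number. Requiring 3 violations out of 4 evaluation windows means
# a single anomalous 5-minute bucket will not page anyone.
az monitor metrics alert create \
  --name "appsvc-latency-sustained-anomaly" \
  --resource-group "my-resource-group" \
  --scopes "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Web/sites/my-app-service" \
  --condition "avg AverageResponseTime > dynamic medium 3 of 4" \
  --window-size "5m" \
  --evaluation-frequency "5m" \
  --severity 3 \
  --description "Latency is elevated for 3 of the last 4 evaluation windows" \
  --action "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Insights/actionGroups/ag-app-oncall"
```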
Verify Available Metrics Before Creating Alerts¶
List metric definitions¶
az monitor metrics list-definitions \
--resource "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Web/sites/my-app-service" \
--output table
Sample output:
Name PrimaryAggregationType Unit
--------------------- ------------------------ ------------
Requests Total Count
Http5xx Total Count
AverageResponseTime Average Seconds
CpuTime Total Seconds
MemoryWorkingSet Average Bytes
Query recent metric values¶
az monitor metrics list \
--resource "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Web/sites/my-app-service" \
--metric "Http5xx" "AverageResponseTime" \
--interval "PT5M" \
--aggregation "Total" "Average"
Sample output:
{
"cost": 0,
"timespan": "2026-04-06T00:00:00Z/2026-04-06T01:00:00Z",
"value": [
{
"name": { "value": "Http5xx" },
"timeseries": [
{
"data": [
{ "timeStamp": "2026-04-06T00:55:00Z", "total": 0 },
{ "timeStamp": "2026-04-06T01:00:00Z", "total": 12 }
]
}
]
}
]
}
Practical Alert Rules¶
Alert on sustained HTTP 5xx failures¶
Use this as the first production alert because it directly correlates with failed user requests.
az monitor metrics alert create \
--name "appsvc-http5xx-sustained" \
--resource-group "my-resource-group" \
--scopes "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Web/sites/my-app-service" \
--condition "total Http5xx > 20" \
--window-size "5m" \
--evaluation-frequency "1m" \
--severity 2 \
--description "App Service is returning sustained HTTP 5xx responses" \
--action "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Insights/actionGroups/ag-app-oncall"
Alert on high average response time¶
az monitor metrics alert create \
--name "appsvc-latency-high" \
--resource-group "my-resource-group" \
--scopes "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Web/sites/my-app-service" \
--condition "avg AverageResponseTime > 2" \
--window-size "10m" \
--evaluation-frequency "5m" \
--severity 3 \
--description "Average response time is above 2 seconds" \
--action "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Insights/actionGroups/ag-app-oncall"
Alert on App Service plan CPU pressure¶
Plan-level alerts help when multiple apps share the same compute resources.
az monitor metrics alert create \
--name "appsvc-plan-cpu-high" \
--resource-group "my-resource-group" \
--scopes "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Web/serverfarms/my-app-service-plan" \
--condition "avg CpuPercentage > 80" \
--window-size "15m" \
--evaluation-frequency "5m" \
--severity 2 \
--description "App Service plan CPU usage is above 80 percent" \
--action "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Insights/actionGroups/ag-platform-oncall"
Alert Tuning Guidance¶
- Use app-level alerts for request failures and latency.
- Use plan-level alerts for CPU and memory saturation across shared apps.
- Use severity 2 for customer-impacting failures.
- Use severity 3 or 4 for early warning signals such as latency growth.
- Combine metrics with platform logs or Application Insights during triage instead of creating too many overlapping rules.
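Before adding new rules, it helps to audit what already exists and to disable, rather than delete, rules that are under tuning. A sketch with placeholder names:

```shell
# List all metric alert rules in the resource group to spot overlap.
az monitor metrics alert list \
  --resource-group "my-resource-group" \
  --output table

# Temporarily silence a noisy rule while adjusting its threshold,
# keeping its configuration intact.
az monitor metrics alert update \
  --name "appsvc-latency-high" \
  --resource-group "my-resource-group" \
  --enabled false
```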
Triage Workflow¶
When a metric alert fires, review evidence in this order:
- Requests and failures
- Did traffic increase?
- Is the error count material or just one instance?
- Latency
- Are users seeing slow responses before outright failures?
- Plan capacity
- Is the problem isolated to one app or the whole plan?
- Application telemetry
- Do dependency failures or exceptions explain the metric spike?
- Platform logs
- Was there a restart, deployment, or storage issue?
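For the platform-log step, the activity log is a quick first check for restarts, deployments, or configuration changes that line up with the alert window. A sketch; the resource group name is a placeholder:

```shell
# List activity-log events from the last two hours and keep only the
# fields useful for correlating with a metric alert.
az monitor activity-log list \
  --resource-group "my-resource-group" \
  --offset 2h \
  --query "[].{Time:eventTimestamp, Operation:operationName.localizedValue, Status:status.value}" \
  --output table
```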
Workbook Suggestions¶
For each production app, create a dashboard or workbook with these tiles:
- Requests and Http5xx trend for the last 24 hours
- AverageResponseTime percentile trend after deployments
- CPU and memory by App Service plan instance
- Deployment markers from Activity Log or release pipeline events
- Drill-through links to Application Insights request and exception queries
Common Mistakes¶
- Alerting on Requests == 0 for apps that do not receive continuous traffic
- Using only plan-level CPU alerts and missing app-specific failures
- Treating single-minute Http5xx bursts as incidents without checking traffic volume
- Creating separate alerts for every metric without a triage runbook