
Monitoring

This guide describes how to monitor Azure Functions in production using Azure Monitor and Application Insights. It combines metrics, logs, traces, and dashboards into a practical operational workflow.

Platform Guide

For scaling architecture and plan comparison, see Scaling.

Language Guide

For Python deployment specifics, see the Python Tutorial.

Prerequisites

  • A running Function App on the Consumption, Flex Consumption, Premium, or Dedicated plan.
  • An Application Insights resource connected to the app.
  • Access to Azure Monitor metrics and Log Analytics query permissions.
  • Azure CLI installed and authenticated.
  • Shell variables set for the resource placeholders used in the commands:
RG=<resource-group>
APP_NAME=<app-name>
SUBSCRIPTION_ID=<subscription-id>

| Command/Parameter | Purpose |
| --- | --- |
| RG | Sets the resource group name variable |
| APP_NAME | Sets the function app name variable |
| SUBSCRIPTION_ID | Sets the Azure subscription identifier variable |

When to Use

Use the signal that best answers the operational question.

| Scenario | Primary approach | Why | Secondary approach |
| --- | --- | --- | --- |
| Traffic increase/decrease | Metrics | Fast trend view with low query cost | Logs |
| Error spike after deployment | Logs (requests, exceptions) | Rich failure context | Traces |
| Latency regression | Metrics + KQL percentile | Quantify p95/p99 drift | Live Metrics |
| External dependency incident | Dependencies | Clear target/result visibility | Exceptions |
| Host recycle/cold start analysis | Traces | Runtime lifecycle evidence | Instance metrics |
| Configuration change impact | Activity logs | Control-plane history | Logs + traces |

Application Insights Blade

[Observed] The Application Insights blade shows the linked Application Insights resource with a View Application Insights data link for full telemetry access. The Change your resource section allows switching or disconnecting the monitoring resource:

[Screenshot: Application Insights blade showing the linked resource]

[Inferred] Verify the Application Insights link is active before troubleshooting telemetry gaps. If no resource is linked, function execution data will not appear in Application Insights queries.

Logs (Log Analytics) Blade

[Observed] The Logs blade opens the KQL query editor with a New Query tab, table selector, and time range filter. The Queries hub provides pre-built query templates:

[Screenshot: Logs blade showing the KQL query editor]

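[Inferred] Use this blade to run KQL queries directly against Function App logs. Tables such as FunctionAppLogs, requests, traces, and exceptions are the usual starting points for troubleshooting. FunctionAppLogs is populated only when diagnostic settings route Function Application Logs to a Log Analytics workspace. The time range filter defaults to 24 hours.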

Diagnostic Settings Blade

[Observed] The Diagnostic settings blade shows the log categories available for streaming export: Function Application Logs, Access Audit Logs, IPSecurity Audit logs, App Service Authentication logs, and AllMetrics. In this capture, no diagnostic settings are configured yet:

[Screenshot: Diagnostic settings blade showing available log categories]

[Inferred] Configure diagnostic settings to stream platform logs to a Log Analytics workspace, a storage account, or an event hub for long-term retention and compliance. Without diagnostic settings, platform-level logs are not persisted. This does not apply to Application Insights telemetry, which is stored separately.

Procedure

Monitoring architecture

Azure Functions emits multiple telemetry streams:

  • Platform metrics in Azure Monitor (execution count, failures, instance activity).
  • Application telemetry in Application Insights (requests, dependencies, traces, exceptions).
  • Activity logs for control-plane changes.

Use all three for complete operational visibility.

flowchart TD
    A[Function App Runtime] --> B[Azure Monitor Metrics]
    A --> C[Application Insights]
    A --> D[Activity Log]
    B --> E[Metric Alerts]
    C --> F["KQL / Log Analytics"]
    C --> G[Workbooks]
    F --> H[Log Alerts]
    E --> I[Action Group]
    H --> I

Enable Application Insights

Set the connection string in app settings:

az functionapp config appsettings set \
    --resource-group "$RG" \
    --name "$APP_NAME" \
    --settings APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=<masked>;IngestionEndpoint=https://<region>.in.applicationinsights.azure.com/"

| Command/Parameter | Purpose |
| --- | --- |
| az functionapp config appsettings set | Creates or updates app settings on the function app |
| --resource-group "$RG" | Specifies the resource group |
| --name "$APP_NAME" | Specifies the function app name |
| --settings | Sets APPLICATIONINSIGHTS_CONNECTION_STRING so telemetry is routed to the correct Application Insights resource |

Prefer connection strings over legacy instrumentation-key-only configuration.

Core metrics to track

Track a small set of high-signal metrics first:

| Signal | Why it matters |
| --- | --- |
| Execution count | Detect traffic shifts and workload volume |
| Execution duration | Detect latency regressions and cold start symptoms |
| Failure count/rate | Detect runtime and dependency instability |
| Instance count | Observe scale behavior per plan |
| Queue or backlog depth | Detect processing lag in event-driven flows |

Backlog metrics

Queue-length and lag metrics usually come from the messaging service (for example, Storage Queue or Service Bus), not only from the Function App resource.

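Query metrics with Azure CLI. Metric names and availability vary by hosting plan. The two execution metrics below are the ones reported for the Consumption plan: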

APP_ID=$(az functionapp show \
    --resource-group "$RG" \
    --name "$APP_NAME" \
    --query id \
    --output tsv)

az monitor metrics list \
    --resource "$APP_ID" \
    --metric "Function Execution Count" "Function Execution Units" \
    --interval PT5M \
    --aggregation Total Average \
    --start-time 2026-04-05T00:00:00Z \
    --end-time 2026-04-05T01:00:00Z \
    --output table

| Command/Parameter | Purpose |
| --- | --- |
| az functionapp show | Gets the resource ID of the function app |
| az monitor metrics list | Retrieves numerical platform metrics |
| --resource "$APP_ID" | Target resource for metric retrieval |
| --metric | Metrics to query (execution count and execution units) |
| --interval PT5M | Aggregates data into 5-minute time grains |
| --aggregation Total Average | Aggregation types to return |
| --start-time / --end-time | ISO 8601 timestamps for the query window |
| --output table | Formats the metrics as a table |

Sample output (PII masked):

Cost    Interval    Metric                    TimeStamp                   Total    Average
0       PT5M        Function Execution Count  2026-04-05T00:00:00Z       184      6.13
0       PT5M        Function Execution Units  2026-04-05T00:00:00Z       42       1.40
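
For scripted checks, the same query can be run with `--output json` and rolled up in code. The following is a minimal Python sketch. It assumes the JSON layout `value[].name.localizedValue` and `value[].timeseries[].data[].total` returned by `az monitor metrics list`, which is the same layout the JMESPath filter in the Verification section walks. The function names are illustrative. Function Execution Units are reported in MB-milliseconds, so the sketch divides by 1,024,000 (1024 MB × 1000 ms) to express them in GB-seconds.

```python
import json

# Function Execution Units are MB-milliseconds; 1 GB-s = 1024 MB * 1000 ms.
MB_MS_PER_GB_S = 1_024_000


def metric_totals(payload: str) -> dict:
    """Sum the 'total' aggregation per metric in `az monitor metrics list` JSON."""
    doc = json.loads(payload)
    totals = {}
    for metric in doc.get("value", []):
        name = metric["name"]["localizedValue"]
        totals[name] = sum(
            # Empty intervals may omit 'total' or set it to null; count them as 0.
            point.get("total") or 0.0
            for series in metric.get("timeseries", [])
            for point in series.get("data", [])
        )
    return totals


def execution_units_to_gb_seconds(units_mb_ms: float) -> float:
    """Convert Function Execution Units (MB-ms) to GB-seconds."""
    return units_mb_ms / MB_MS_PER_GB_S
```

Save the CLI output with `--output json > metrics.json` and pass the file contents to `metric_totals`. This yields one total per metric for the query window.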

Live Metrics stream

Use Live Metrics during deployments and incidents for near real-time visibility:

  1. Open Application Insights.
  2. Select Live Metrics.
  3. Watch request rate, failures, and server response time during rollout.

This is especially useful during slot swaps and traffic ramp-up windows.

Log Analytics and KQL basics

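Application Insights data is queryable with KQL. The queries below use the Application Insights table names (requests, dependencies, traces, exceptions) and can be run from the Logs blade of the Application Insights resource.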

Recent failed invocations

requests
| where timestamp > ago(1h)
| where success == false
| project timestamp, name, resultCode, duration, operation_Id
| order by timestamp desc

Slow operations over time

requests
| where timestamp > ago(24h)
| summarize p95_duration=percentile(duration, 95), avg_duration=avg(duration) by bin(timestamp, 5m)
| render timechart
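
To sanity-check a percentile chart offline, for example against an exported sample of `requests`, you can reproduce the same 5-minute binning in Python. This is a minimal sketch. It assumes rows of `(timestamp, duration_ms)` tuples with timezone-aware timestamps. It computes an exact nearest-rank percentile, whereas KQL's `percentile()` returns an estimate, so expect small differences.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta, timezone

BIN = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to its 5-minute bin, like bin(timestamp, 5m)."""
    return EPOCH + ((ts - EPOCH) // BIN) * BIN


def nearest_rank(values: list, pct: int) -> float:
    """Exact nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def duration_by_bin(rows) -> dict:
    """rows: iterable of (timestamp, duration_ms). Returns {bin_start: (p95, avg)}."""
    buckets = defaultdict(list)
    for ts, duration in rows:
        buckets[bin_start(ts)].append(duration)
    return {
        start: (nearest_rank(durations, 95), sum(durations) / len(durations))
        for start, durations in sorted(buckets.items())
    }
```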

Exceptions by type

exceptions
| where timestamp > ago(7d)
| summarize failures=count() by type, outerMessage
| order by failures desc

End-to-end correlation

union requests, dependencies, traces, exceptions
| where operation_Id == "<operation-id>"
| project timestamp, itemType, name, message, resultCode, duration
| order by timestamp asc
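
The correlation query can also be applied to exported telemetry rows when you need to post-process a trace. A typical goal is to find the step that most likely caused a failed request. This is a minimal Python sketch. It assumes each row is a dict that uses the KQL column names plus a `success` flag on requests and dependencies. It also assumes timestamps are uniformly formatted UTC ISO 8601 strings, so string order equals time order. The helper names are hypothetical.

```python
def operation_timeline(rows: list, operation_id: str) -> list:
    """Keep rows for one operation_Id, ordered by timestamp ascending (as in the KQL)."""
    matching = [r for r in rows if r.get("operation_Id") == operation_id]
    return sorted(matching, key=lambda r: r["timestamp"])


def root_cause_candidate(timeline: list):
    """Earliest exception or failed dependency; falls back to a failed request.

    A request row is timestamped at its start, so it sorts before the child
    dependencies and exceptions that usually explain the failure.
    """
    for row in timeline:
        if row["itemType"] == "exception":
            return row
        if row["itemType"] == "dependency" and row.get("success") is False:
            return row
    for row in timeline:
        if row["itemType"] == "request" and row.get("success") is False:
            return row
    return None
```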

Host startup events

traces
| where timestamp > ago(24h)
| where message has_any ("Host started", "Host initialized", "Stopping JobHost")
| project timestamp, severityLevel, cloud_RoleName, message
| order by timestamp desc

Dependency health by target

dependencies
| where timestamp > ago(6h)
| summarize total_calls=count(), failed_calls=countif(success == false), p95_duration=percentile(duration, 95) by target, type
| extend failure_rate = toreal(failed_calls) / iif(total_calls == 0, 1.0, toreal(total_calls))
| order by failure_rate desc, failed_calls desc
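
The same aggregation is easy to reproduce over exported dependency rows. Doing so also shows that the `iif` guard is defensive: `summarize ... by` creates a group only when at least one row exists, so `total_calls` is never zero. This is a minimal Python sketch. It assumes dicts with `target`, `type`, and boolean `success` keys. p95 is omitted for brevity, and `dependency_health` is a hypothetical name.

```python
from collections import defaultdict


def dependency_health(calls: list) -> list:
    """Group dependency rows by (target, type) and rank by failure rate, then failures."""
    groups = defaultdict(list)
    for call in calls:
        groups[(call["target"], call["type"])].append(call["success"])
    report = []
    for (target, dep_type), outcomes in groups.items():
        total = len(outcomes)  # never 0: a group exists only if it has a row
        failed = outcomes.count(False)
        report.append({
            "target": target,
            "type": dep_type,
            "total_calls": total,
            "failed_calls": failed,
            "failure_rate": failed / total,
        })
    report.sort(key=lambda r: (r["failure_rate"], r["failed_calls"]), reverse=True)
    return report
```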

Dashboards and workbooks

Build a workbook that answers these operational questions:

  • Is availability stable?
  • Are failures isolated to a function, dependency, or region?
  • Did a deployment change latency or error distribution?
  • Is queue backlog growing faster than throughput?

Recommended workbook visuals:

  • Timechart of request count and failure rate.
  • P95/P99 duration trend by function name.
  • Exceptions by type and operation.
  • Dependency failure trend for external calls.
  • Queue depth trend alongside execution rate.
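
The backlog question can be answered from two aligned series: queue depth and messages processed. This is a minimal Python sketch. It assumes that queue depth comes from a metric exposed by the messaging service, sampled at the end of each interval. It also assumes each execution processes one message, which does not hold for batched triggers. The three-interval window and the function name are illustrative. Inflow is inferred as the change in depth plus the messages processed.

```python
def backlog_report(depths: list, processed: list, window: int = 3):
    """Infer inflow per interval and flag a backlog that outpaces throughput.

    depths[i]: queue depth at the end of interval i.
    processed[i]: messages processed during interval i.
    Returns (inflow per interval, True if inflow exceeded throughput in each
    of the last `window` intervals).
    """
    if len(depths) != len(processed):
        raise ValueError("depths and processed must cover the same intervals")
    inflow = [depths[i] - depths[i - 1] + processed[i] for i in range(1, len(depths))]
    recent = list(zip(inflow, processed[1:]))[-window:]
    growing = len(recent) == window and all(arrived > done for arrived, done in recent)
    return inflow, growing
```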

Sampling and data volume control

Adjust Application Insights sampling in host.json when telemetry volume grows.

{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "maxTelemetryItemsPerSecond": 5,
        "excludedTypes": "Request;Exception"
      }
    }
  }
}

Keep request and exception data unsampled for reliable incident triage.
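
A pre-deployment check can catch a host.json that would let sampling drop request or exception data. This is a minimal Python sketch. It assumes the host.json layout shown above. It treats a missing `isEnabled` flag as sampling being on, which is an assumption about the host default. `sampling_gaps` is a hypothetical helper.

```python
import json

# Telemetry types this guide recommends keeping unsampled.
MUST_KEEP = {"Request", "Exception"}


def sampling_gaps(host_json_text: str) -> set:
    """Return the must-keep telemetry types that sampling could still drop."""
    host = json.loads(host_json_text)
    settings = (
        host.get("logging", {})
        .get("applicationInsights", {})
        .get("samplingSettings", {})
    )
    # Assumption: a missing isEnabled flag means sampling is enabled.
    if not settings.get("isEnabled", True):
        return set()
    excluded = {
        item.strip()
        for item in (settings.get("excludedTypes") or "").split(";")
        if item.strip()
    }
    return MUST_KEEP - excluded
```

An empty result means request and exception telemetry stays unsampled. Any type listed in a non-empty result is subject to sampling.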

Operational monitoring decision flow

flowchart TD
    A[Alert or Incident] --> B{"Availability/latency issue?"}
    B -- Yes --> C[Check metrics first]
    B -- No --> D[Check logs and traces]
    C --> E{Anomaly detected?}
    E -- Yes --> F[Run focused KQL]
    E -- No --> G[Review Activity Log]
    D --> H{Error signature found?}
    H -- Yes --> I[Correlate by operation_Id]
    H -- No --> J[Use Live Metrics and widen time range]
    F --> K[Mitigate and tune alerts]
    G --> K
    I --> K
    J --> K
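
The decision flow can also be encoded for a runbook script or chat bot that suggests the next step. This is a minimal Python sketch. It assumes the responder supplies the three yes/no answers. The returned labels mirror the flowchart nodes, and `next_step` is a hypothetical helper. Every branch then continues to "Mitigate and tune alerts".

```python
def next_step(availability_or_latency: bool, metric_anomaly: bool = False,
              error_signature: bool = False) -> str:
    """Walk the operational decision flow and return the next action label."""
    if availability_or_latency:
        # Availability/latency issues: check metrics first.
        return "Run focused KQL" if metric_anomaly else "Review Activity Log"
    # Otherwise: check logs and traces.
    if error_signature:
        return "Correlate by operation_Id"
    return "Use Live Metrics and widen time range"
```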

Operational monitoring routine

Daily:

  • Check failure trend and top exception signatures.
  • Verify queue backlog and processing lag.

Per deployment:

  • Monitor Live Metrics during release window.
  • Compare before/after latency and failure ratio.

Weekly:

  • Review dashboard trends and adjust alert sensitivity.
  • Validate telemetry cost and sampling strategy.

Portal Walkthrough

This section shows the key Azure Portal blades for monitoring a Function App. All captures are from a live Consumption (Y1) deployment with PII masked per AGENTS.md rules.

Metrics Explorer

[Observed] The Metrics blade provides a chart builder for Azure Monitor platform metrics. Select metrics such as Function Execution Count, Function Execution Units, or Http5xx to visualize trends. The time range selector in the top-right corner defaults to Last 24 hours (Automatic):

[Screenshot: Metrics Explorer blade with an empty chart and the metric selector]

[Inferred] If the chart shows "Not configured", no metric has been selected yet. Click Add metric to begin building the chart. For incident triage, start with Function Execution Count and Http5xx in separate charts.

Log Stream

[Observed] The Log stream blade shows real-time stdout/stderr output from the Function App host. When connected, the status bar reads Connected! and new log lines appear as they are emitted:

[Screenshot: Log stream blade showing Connected status with a dark console background]

[Inferred] Log stream is useful during deployments and incident triage to confirm that the host is running and processing requests. If the stream shows no output, check whether the app has scaled to zero (Consumption plan) or whether logging is disabled in host.json.

Diagnose and Solve Problems

[Observed] The Diagnose and solve problems blade provides built-in troubleshooting categories: Availability and Performance, Configuration and Management, Deployment, Networking, and Diagnostic Tools. A Risk alerts section surfaces active warnings (e.g., "Availability: 1 Warning"):

[Screenshot: Diagnose and solve problems blade showing five troubleshooting categories and risk alerts]

[Inferred] This blade is the Portal equivalent of the hypothesis-driven playbooks in this guide. Use it as a starting point when you do not yet know the failure category.

Verification

Validate that monitoring is working end-to-end after changes.

  1. Trigger at least one function invocation.
  2. Confirm metrics appear in 5-minute bins.
  3. Confirm logs and traces are queryable.
  4. Confirm workbook visuals show data.
  5. Confirm alert rules evaluate without data-source errors.

Metric verification command:

az monitor metrics list \
    --resource "$APP_ID" \
    --metric "Function Execution Count" \
    --interval PT5M \
    --aggregation Total \
    --start-time 2026-04-05T00:00:00Z \
    --end-time 2026-04-05T00:30:00Z \
    --query "value[0].timeseries[0].data[?total > \`0\`].[timeStamp,total]" \
    --output table

| Command/Parameter | Purpose |
| --- | --- |
| az monitor metrics list | Queries the specified metric |
| --metric "Function Execution Count" | Counts function invocations |
| --query | JMESPath filter that returns only intervals with more than zero executions |
| --output table | Formats results as a table |
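
In a CI pipeline, it can be easier to capture `--output json` and apply the same filter in code. This is a minimal Python sketch. It assumes the JSON layout that the JMESPath query above walks. `active_intervals` is a hypothetical helper. An empty result means the verification step should fail or be retried once metric ingestion catches up.

```python
import json


def active_intervals(payload: str) -> list:
    """Python equivalent of value[0].timeseries[0].data[?total > `0`].[timeStamp,total]."""
    doc = json.loads(payload)
    metrics = doc.get("value") or []
    if not metrics or not metrics[0].get("timeseries"):
        return []  # no data points yet
    points = metrics[0]["timeseries"][0].get("data", [])
    return [(p["timeStamp"], p["total"]) for p in points if (p.get("total") or 0) > 0]
```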

Log verification command:

requests
| where timestamp > ago(15m)
| summarize total_requests=count(), failed_requests=countif(success == false)

Expected result: total_requests is greater than 0, metric timestamps align with the test traffic, and calls to external services appear in the dependencies table.

Rollback / Troubleshooting

Missing telemetry in Application Insights

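Symptoms:

  • Metrics appear, but the requests table is empty.
  • Live Metrics shows no incoming data.

Checks:

  • Verify that APPLICATIONINSIGHTS_CONNECTION_STRING exists.
  • Verify that the IngestionEndpoint (region) value matches the connection string shown on the Application Insights resource.
  • Restart the Function App after configuration changes.
  • Confirm that outbound network rules allow traffic to the telemetry ingestion endpoint.
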
az functionapp config appsettings list \
    --resource-group "$RG" \
    --name "$APP_NAME" \
    --query "[?name=='APPLICATIONINSIGHTS_CONNECTION_STRING'].value" \
    --output tsv

| Command/Parameter | Purpose |
| --- | --- |
| az functionapp config appsettings list | Lists application settings |
| --query | Extracts the value of the Application Insights connection string |
| --output tsv | Returns the raw string value |
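
The value returned by the command above can be checked offline. This is a minimal Python sketch. It assumes the `Key=Value;Key=Value` connection string format shown in the Enable Application Insights step. It also assumes the regional host pattern `<region>.in.applicationinsights.azure.com`, where some hosts carry a numbered suffix such as `<region>-<n>`. The helper names and the `expected_region` input are hypothetical, and the checks are heuristics, not a full validation.

```python
from urllib.parse import urlparse


def parse_connection_string(conn: str) -> dict:
    """Split 'Key=Value;Key=Value' into a dict, ignoring empty segments."""
    parts = {}
    for segment in conn.split(";"):
        if not segment.strip():
            continue
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    return parts


def connection_string_problems(conn: str, expected_region: str) -> list:
    """Return human-readable issues found in an Application Insights connection string."""
    parts = parse_connection_string(conn)
    problems = []
    if not parts.get("InstrumentationKey"):
        problems.append("missing InstrumentationKey")
    endpoint = parts.get("IngestionEndpoint")
    if not endpoint:
        problems.append("missing IngestionEndpoint")
        return problems
    host = urlparse(endpoint).hostname or ""
    label = host.split(".")[0]
    # Assumed host shapes: '<region>.in...' or '<region>-<n>.in...'.
    if label != expected_region and not label.startswith(expected_region + "-"):
        problems.append(f"IngestionEndpoint host {host!r} does not match region {expected_region!r}")
    return problems
```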

Sampling too aggressive

Symptoms:

  • Request counts in logs are much lower than platform metrics.
  • Exception evidence is sparse during incidents.

Fix:

  • Inspect the host.json sampling settings.
  • Exclude Request;Exception from sampling.
  • Increase maxTelemetryItemsPerSecond temporarily during investigations.
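
Application Insights stores an `itemCount` value on each retained row, so summing `itemCount` rather than counting rows estimates the pre-sampling volume. This distinguishes sampling from telemetry that never arrived. This is a minimal Python sketch. It assumes exported `requests` rows and a platform Function Execution Count for the same window. The 10% tolerance and the function name are illustrative. The comparison holds only if each invocation produces exactly one request row.

```python
def diagnose_count_gap(request_rows: list, platform_count: int, tolerance: float = 0.1) -> str:
    """Classify why logged requests fall short of the platform execution count.

    Rows without an itemCount field are counted as representing one item.
    """
    floor = platform_count * (1 - tolerance)
    raw = len(request_rows)
    weighted = sum(row.get("itemCount", 1) for row in request_rows)
    if raw >= floor:
        return "counts agree"
    if weighted >= floor:
        return "sampling"  # itemCount restores the volume, so rows were sampled out
    return "telemetry loss"  # even the weighted count is short: check config and ingestion
```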

Rollback:

  • Revert to last known-good sampling settings.
  • Redeploy and re-run verification queries.

Common blind spots

  • Monitoring only HTTP success and ignoring non-HTTP triggers.
  • Missing downstream dependency metrics.
  • Over-sampling that removes needed forensic signals.
  • No version marker in logs, making release impact hard to isolate.
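
One way to close the last gap is to stamp every log line with a release identifier. This is a minimal Python sketch using the standard `logging` module, whose output the Python worker forwards through the Functions host to Application Insights as traces. `APP_VERSION` is a hypothetical app setting. For simplicity, the sketch prefixes the message text instead of adding structured custom dimensions.

```python
import logging
import os
from typing import Optional

# Hypothetical app setting holding the release identifier (for example, a build number).
APP_VERSION = os.environ.get("APP_VERSION", "unknown")


class VersionAdapter(logging.LoggerAdapter):
    """Prefix every message with the deployed version so logs can be split by release."""

    def process(self, msg, kwargs):
        return f"[version={self.extra['version']}] {msg}", kwargs


def get_logger(name: str, version: Optional[str] = None) -> logging.LoggerAdapter:
    """Wrap a standard logger so each record carries the version marker."""
    return VersionAdapter(logging.getLogger(name), {"version": version or APP_VERSION})
```

Inside a function, `log = get_logger(__name__)` followed by `log.info("processed %d items", count)` emits `[version=<release>] processed ... items`. Traces can then be filtered with `message contains "version=<release>"` to compare releases.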
