Functions Not Executing¶

1. Summary¶

Use this playbook when trigger sources are active but one or more Azure Functions are not executing. Most incidents fall into one of six hypotheses: disabled function, listener startup failure, host restart loop, trigger misbinding, source-side delivery gap, or deployment/runtime regression. Primary objective: restore safe execution quickly while collecting enough evidence to prevent recurrence.

Decision Flow¶

flowchart TD
    A["Functions not executing"] --> B{"Enabled state healthy?"}
    B -->|No| C["Remove disable setting and restart"]
    B -->|Yes| D{"Listener starts?"}
    D -->|No| E["Fix trigger auth/connection/binding"]
    D -->|Yes| F{"Source events present?"}
    F -->|No| G["Fix publisher/schedule/subscription"]
    F -->|Yes| H{"Host stable?"}
    H -->|No| I["Investigate restart loop/runtime init"]
    H -->|Yes| J["Check deployment/config regression"]
    C --> K["Verify invocation recovery"]
    E --> K
    G --> K
    I --> K
    J --> K

Severity guidance¶

Condition	Severity	Action priority
Critical business trigger halted	SEV-1	Immediate mitigation + incident commander
Non-critical batch trigger halted	SEV-2	Restore within current shift
Low-impact maintenance trigger halted	SEV-3	Restore in planned response window
### Signal snapshot
Signal	Normal	Incident
---	---	---
Invocations	Correlate with source events	Near zero on impacted function
Listener traces	`Listener started` appears	`unable to start` or missing listener trace
Function state	Enabled	Disabled via metadata/app setting
Host lifecycle	Stable startup cadence	Frequent restart cycling

2. Common Misreadings¶

Misreading	Why incorrect	Correct interpretation
"App is healthy, so function must run."	App up does not guarantee listener up	Validate listener startup traces
"No exceptions means no issue."	Listener failures may exist only in `traces`	Query host/lifecycle logs first
"Scale out first."	Scaling does not fix disabled or broken config	Verify control-plane and trigger path
"Queue quiet means no work."	Source visibility can lag or be partial	Correlate source metrics and invocations
"Always code bug."	Config/RBAC drift often causes this pattern	Validate settings and permissions before rollback

3. Competing Hypotheses¶

ID	Hypothesis	Confirming signal	Disproving signal
H1	Function disabled by setting	`IsDisabled=true` or disable trace	Enabled and no disable trace
H2	Listener startup failure (auth/connection)	`listener ... unable to start`	Listener stable after restart
H3	Host restart loop	Dense start/shutdown timeline	Low restart count and stable host
H4	Binding/extension mismatch	Indexing/binding errors at startup	Clean indexing and active listener
H5	Source not delivering events	No source-side activity	Source active while invocations zero
H6	Deployment/runtime regression	Onset after release/config change	No temporal correlation

4. What to Check First¶

Confirm target function is enabled.
Confirm listener startup traces exist after latest host start.
Confirm source activity exists in same interval.
Confirm host is not cycling.

Quick portal checks¶

Function App -> Functions -> target function: Enabled is On.
Function App -> Monitor: invocation line not flat while source is active.
Application Insights -> Logs: listener startup and host lifecycle traces present.

Quick CLI checks¶

az functionapp function list --name "$APP_NAME" --resource-group "$RG" --output table
az functionapp config appsettings list --name "$APP_NAME" --resource-group "$RG" --output table
az monitor log-analytics query --workspace "$WORKSPACE_ID" --analytics-query "traces | where timestamp > ago(30m) | where cloud_RoleName =~ '$APP_NAME' | where message has_any ('disabled','listener','Host started','unable to start') | project timestamp, severityLevel, message | order by timestamp desc" --output table

Example output¶

Name             Trigger    IsDisabled
---------------  ---------  ----------
QueueProcessor   queue      true
health           http       false

Name                                              Value
------------------------------------------------  ----------------------------------------------------------------
AzureWebJobs.QueueProcessor.Disabled              true
AzureWebJobsStorage                               DefaultEndpointsProtocol=https;AccountName=***;AccountKey=***

timestamp                    severityLevel  message
---------------------------  -------------  ---------------------------------------------------------------------------------
2026-04-04T12:22:53.000000Z  3              Process reporting unhealthy: azure.functions.webjobs.storage: Unhealthy (AuthorizationPermissionMismatch)
2026-04-04T12:16:06.000000Z  2              The listener for function 'Functions.scheduled_cleanup' was unable to start.
2026-04-04T12:15:42.000000Z  2              Function 'QueueProcessor' is disabled via app setting 'AzureWebJobs.QueueProcessor.Disabled'.

5. Evidence to Collect¶

Collect all artifacts from one bounded time range (ago(30m) for triage, ago(2h) for verification).

Minimum evidence pack¶

Artifact	Required capture	Purpose
Function list	CLI table output	Confirm enabled state/trigger type
App settings	CLI table output	Find disable and connection issues
Listener/host traces	KQL results	Validate startup path
Invocation summary	KQL results	Confirm blast radius
Source activity	Metrics or producer evidence	Separate source outage from execution outage
Change timeline	Deploy/config timestamps	Validate regression hypothesis
### Sample Log Patterns
# Abnormal: Function disabled [2024-01-15T10:30:00Z] Function 'QueueProcessor' is disabled via app setting 'AzureWebJobs.QueueProcessor.Disabled' # Abnormal: Listener startup failure [2024-01-15T10:30:05Z] Listener for 'QueueTrigger' was unable to start. Microsoft.WindowsAzure.Storage: Connection refused. # Normal: Healthy listener [2024-01-15T10:30:00Z] Listener started for function 'QueueProcessor' [2024-01-15T10:30:00Z] Host started (234ms) # Real-world: Listener failure from RBAC permission removal (FC1, 2026-04-04) [2026-04-04T12:15:42Z] The listener for function 'Functions.scheduled_cleanup' was unable to start. [2026-04-04T12:16:06Z] The listener for function 'Functions.scheduled_cleanup' was unable to start. [2026-04-04T12:22:53Z] Process reporting unhealthy: azure.functions.webjobs.storage: Unhealthy (AuthorizationPermissionMismatch)
### KQL Queries with Example Output
#### Query 1: Function execution summary
`let appName = "$APP_NAME"; requests \| where timestamp > ago(1h) \| where cloud_RoleName =~ appName \| where operation_Name startswith "Functions." \| summarize Invocations = count(), Failures = countif(success == false), FailureRatePercent = round(100.0 * countif(success == false) / count(), 2), P95Ms = percentile(duration, 95) by FunctionName = operation_Name \| order by Failures desc, P95Ms desc`
FunctionName	Invocations	Failures
---	---	---
Functions.health	50	0
Functions.unhandled_error	15	15

How to Read This

A function that is absent from this table while others appear means it received zero invocations — indicating a trigger-path failure, not an app-wide outage.

Query 2: Failed invocations with error details¶

let appName = "$APP_NAME";
requests
| where timestamp > ago(2h)
| where cloud_RoleName =~ appName
| where operation_Name startswith "Functions."
| where success == false
| project timestamp, operation_Id, FunctionName = operation_Name, resultCode, duration
| join kind=leftouter (
    exceptions
    | where timestamp > ago(2h)
    | where cloud_RoleName =~ appName
    | project operation_Id, ExceptionType = type, ExceptionMessage = outerMessage
) on operation_Id
| order by timestamp desc

| timestamp | operation_Id | FunctionName | resultCode | duration | ExceptionType | ExceptionMessage | |---|---|---|---|---|---|---| | 2026-04-04T11:33:00Z | xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | Functions.ErrorHandler | 500 | 13.092 | Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcException | Exception while executing function: Functions.ErrorHandler | | 2026-04-04T11:32:45Z | xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | Functions.ErrorHandler | 500 | 13.092 | Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcException | Exception while executing function: Functions.ErrorHandler |

How to Read This

Use this to separate true execution failures from non-execution incidents. If this table is empty while source is active, focus on listener/state hypotheses.

Query 8: Host startup/shutdown events¶

let appName = "$APP_NAME";
traces
| where timestamp > ago(12h)
| where cloud_RoleName =~ appName
| where message has_any ("Host started", "Job host started", "Host shutdown", "Host is shutting down", "Stopping JobHost", "Host lock lease")
| project timestamp, severityLevel, message
| order by timestamp desc

| timestamp | severityLevel | message | |---|---|---| | 2026-04-04T11:36:20Z | 1 | Host lock lease acquired by instance ID 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' | | 2026-04-04T11:32:30Z | 1 | Host started (64ms) | | 2026-04-04T11:32:30Z | 1 | Job host started | | 2026-04-04T11:32:30Z | 1 | Starting Host (HostId=func-myapp-prod, Version=4.1047.100.26071, InstanceId=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) |

How to Read This

Repeated startup/shutdown in short intervals weakens listener stability and often explains zero-invocation windows.

CLI Investigation Commands¶

# Function inventory and state
az functionapp function list --name "$APP_NAME" --resource-group "$RG" --output table
# App settings for disable and connection values
az functionapp config appsettings list --name "$APP_NAME" --resource-group "$RG" --output table
# Listener and startup trace scan
az monitor log-analytics query --workspace "$WORKSPACE_ID" --analytics-query "traces | where timestamp > ago(30m) | where cloud_RoleName =~ '$APP_NAME' | where message has_any ('disabled','listener','Host started','unable to start') | project timestamp, message | order by timestamp desc" --output table
# Runtime configuration
az functionapp config show --name "$APP_NAME" --resource-group "$RG" --output json
# Source activity sample (Storage)
az monitor metrics list --resource "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RG/providers/Microsoft.Storage/storageAccounts/<storage-account-name>" --metric "Transactions" --interval PT5M --aggregation Total --output table

Example command output:

Name                                     Value
---------------------------------------  --------------------------------------------------------------
FUNCTIONS_EXTENSION_VERSION              ~4
FUNCTIONS_WORKER_RUNTIME                 python
AzureWebJobs.QueueProcessor.Disabled     true
QueueConnection                          DefaultEndpointsProtocol=https;AccountName=***;AccountKey=***

timestamp                    message
---------------------------  --------------------------------------------------------------------------------
2026-04-04T12:16:06.000000Z  The listener for function 'Functions.QueueProcessor' was unable to start.
2026-04-04T12:15:42.000000Z  Function 'QueueProcessor' is disabled via app setting 'AzureWebJobs.QueueProcessor.Disabled'.

Normal vs Abnormal Comparison¶

Dimension	Normal	Abnormal	Meaning
Function state	Enabled	Disabled	Trigger never starts
Listener startup	`Listener started` present	`unable to start` or absent	Trigger path broken
Host lifecycle	Stable start pattern	Frequent restart cycling	Runtime instability/regression
Source vs invocations	Correlated	Source active but invocations zero	Non-execution incident confirmed
Error concentration	Sparse mixed noise	Repeated same auth/config error	Deterministic failure mode

6. Validation and Disproof by Hypothesis¶

H1: Function disabled via app setting¶

Signals that support - IsDisabled=true in function list. - AzureWebJobs.<FunctionName>.Disabled=true exists in settings. - Trace contains explicit disable message. Signals that weaken - Enabled state and no disable setting. - Listener starts and function executes in same interval. What to verify

az functionapp function list --name "$APP_NAME" --resource-group "$RG" --output table
az functionapp config appsettings list --name "$APP_NAME" --resource-group "$RG" --output table

let appName = "$APP_NAME";
traces
| where timestamp > ago(2h)
| where cloud_RoleName =~ appName
| where message has "is disabled via app setting"
| project timestamp, message
| order by timestamp desc

Example output: | timestamp | message | |---|---| | 2026-04-04T12:15:42Z | Function 'QueueProcessor' is disabled via app setting 'AzureWebJobs.QueueProcessor.Disabled'. | Disproof condition: - No disable setting and no disable trace in incident window.

H2: Listener startup failure due to auth/connection¶

Signals that support - Repeating listener ... unable to start trace lines. - AuthorizationPermissionMismatch, DNS, or connection-refused patterns. - Source events present while invocations remain zero. Signals that weaken - Listener starts cleanly after restart and remains stable. - Test event processes successfully. What to verify

let appName = "$APP_NAME";
traces
| where timestamp > ago(2h)
| where cloud_RoleName =~ appName
| where message has_any ("listener", "unable to start", "AuthorizationPermissionMismatch", "Connection refused")
| project timestamp, severityLevel, message
| order by timestamp desc

az functionapp config appsettings list --name "$APP_NAME" --resource-group "$RG" --output table

Example output: | timestamp | severityLevel | message | |---|---|---| | 2026-04-04T12:22:53Z | 3 | Process reporting unhealthy: azure.functions.webjobs.storage: Unhealthy (AuthorizationPermissionMismatch) | | 2026-04-04T12:16:06Z | 2 | The listener for function 'Functions.QueueProcessor' was unable to start. | Disproof condition: - Startup traces show healthy listener with no recurring auth/connectivity errors.

H3: Host restart loop suppresses execution¶

Signals that support - Frequent host start/shutdown within short intervals. - Startup spikes without matching invocation recovery. Signals that weaken - Host remains stable through incident window. - Restart frequency aligns with normal plan behavior only. What to verify

let appName = "$APP_NAME";
traces
| where timestamp > ago(12h)
| where cloud_RoleName =~ appName
| where message has_any ("Host started", "Host is shutting down", "Stopping JobHost")
| summarize Starts=countif(message has "Host started"), Stops=countif(message has "shutting down" or message has "Stopping JobHost") by bin(timestamp, 1h)
| order by timestamp desc

az monitor log-analytics query --workspace "$WORKSPACE_ID" --analytics-query "traces | where timestamp > ago(2h) | where cloud_RoleName =~ '$APP_NAME' | where message has_any ('Host started','Host is shutting down') | project timestamp, message | order by timestamp desc" --output table

Example output: | timestamp | Starts | Stops | |---|---|---| | 2026-04-04T12:00:00Z | 9 | 8 | | 2026-04-04T11:00:00Z | 2 | 1 | Disproof condition: - Restart count is low and steady; no cycle evidence.

H4: Trigger misbinding or extension mismatch¶

Signals that support - Error indexing method, Cannot bind, or extension startup errors. - Missing or wrong connection setting names. Signals that weaken - Clean indexing and expected binding initialization. - Required settings present and valid. What to verify

let appName = "$APP_NAME";
traces
| where timestamp > ago(2h)
| where cloud_RoleName =~ appName
| where message has_any ("Error indexing method", "Cannot bind", "binding", "extension")
| project timestamp, message
| order by timestamp desc

az functionapp config show --name "$APP_NAME" --resource-group "$RG" --output table
az functionapp config appsettings list --name "$APP_NAME" --resource-group "$RG" --output table

Example output: | timestamp | message | |---|---| | 2026-04-04T12:11:27Z | Error indexing method 'Functions.QueueProcessor'. Cannot bind parameter 'msg' to type ... | Disproof condition: - No binding/indexing failures and listener healthy.

H5: Source-side delivery gap¶

Signals that support - Source metrics show no incoming events. - Producer/scheduler logs indicate halted publishing. Signals that weaken - Source metrics rise during zero-invocation period. - Backlog increases while listener is unhealthy. What to verify

let appName = "$APP_NAME";
requests
| where timestamp > ago(1h)
| where cloud_RoleName =~ appName
| where operation_Name startswith "Functions."
| summarize Invocations=count() by FunctionName=operation_Name, bin(timestamp, 15m)
| order by timestamp desc

az monitor metrics list --resource "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RG/providers/Microsoft.Storage/storageAccounts/<storage-account-name>" --metric "Transactions" --interval PT5M --aggregation Total --output table

Example output: | timestamp | FunctionName | Invocations | |---|---|---| | 2026-04-04T12:15:00Z | Functions.health | 22 | Disproof condition: - Source activity confirmed while target function remains non-executing.

H6: Deployment/runtime regression¶

Signals that support - Incident starts immediately after deployment/config rollout. - Startup traces show worker initialization failures. - Rollback restores execution. Signals that weaken - No recent change event. - Same artifact healthy in equivalent environment. What to verify

az monitor log-analytics query --workspace "$WORKSPACE_ID" --analytics-query "traces | where timestamp > ago(2h) | where cloud_RoleName =~ '$APP_NAME' | where message has_any ('Worker process started and initialized','Failed to start language worker process','did not find functions with language','Host started') | project timestamp, message | order by timestamp desc" --output table
az functionapp config show --name "$APP_NAME" --resource-group "$RG" --output json

Example output:

timestamp                    message
---------------------------  --------------------------------------------------------------------------------------
2026-04-04T12:30:00.000000Z  Failed to start language worker process for runtime: python.
2026-04-04T12:30:01.000000Z  Host started (64ms)
2026-04-04T12:30:03.000000Z  Host is shutting down.

Disproof condition: - No startup/runtime errors and no timeline correlation.

7. Likely Root Cause Patterns¶

Pattern	Trigger	Confirming evidence	Typical fix
Disable flag left behind	Maintenance or manual toggle	Disable setting + disable trace	Remove setting and verify
Storage/Messaging RBAC drift	Role removed or scope changed	`AuthorizationPermissionMismatch` + listener failure	Restore role assignment
Secret/config drift	Rotation mismatch	Connection/auth errors at startup	Update setting and restart
Runtime/extension mismatch	Package/runtime change	Binding/indexing errors	Align versions and bindings
Host recycle instability	Runtime crash or environment churn	Dense start/shutdown cycles	Stabilize runtime and roll back if needed
Upstream publisher outage	Producer stopped	No source-side activity	Restore publisher and replay backlog
### Related Labs
- Cold Start Lab
- Storage Access Failure Lab

8. Immediate Mitigations¶

Apply in order and stop once recovery criteria are met. 1. Remove unintended disable setting(s). 2. Fix trigger connection/auth settings and identity permissions. 3. Restart the Function App to reinitialize listeners. 4. Roll back to last known good artifact when regression is confirmed. 5. Temporarily reduce or pause source publish rate only if downstream risk is growing.

Mitigation commands¶

az functionapp config appsettings delete --name "$APP_NAME" --resource-group "$RG" --setting-names "AzureWebJobs.QueueProcessor.Disabled"
az functionapp config appsettings set --name "$APP_NAME" --resource-group "$RG" --settings "QueueConnection=DefaultEndpointsProtocol=https;AccountName=***;AccountKey=***"
az functionapp restart --name "$APP_NAME" --resource-group "$RG"

Recovery verification¶

let appName = "$APP_NAME";
union isfuzzy=true
(
    traces
    | where timestamp > ago(30m)
    | where cloud_RoleName =~ appName
    | where message has_any ("Listener started", "Host started", "unable to start", "disabled")
    | project timestamp, Kind="trace", Detail=message
),
(
    requests
    | where timestamp > ago(30m)
    | where cloud_RoleName =~ appName
    | where operation_Name startswith "Functions."
    | summarize Invocations=count(), Failures=countif(success == false) by operation_Name
    | project timestamp=now(), Kind="request-summary", Detail=strcat(operation_Name, " invocations=", tostring(Invocations), " failures=", tostring(Failures))
)
| order by timestamp desc

Recovery criteria: - Listener started appears with no new unable to start entries. - Target function invocation count rises within expected trigger interval. - Failures remain below team threshold for 30-60 minutes.

9. Prevention¶

Configuration guardrails¶

Restrict write access for AzureWebJobs.*.Disabled settings.
Validate required trigger settings in CI before deployment.
Keep connection-setting contracts versioned and reviewed.

Monitoring guardrails¶

Alert when critical function Invocations=0 while source activity is non-zero.
Alert on repeated listener startup failures in short windows.
Alert when host restart frequency exceeds plan baseline.

Operational guardrails¶

Pair every secret rotation with listener smoke test.
Use staged rollout and canary validation for runtime/dependency changes.
Run monthly playbook drill for non-executing trigger scenarios.

Preventive checklist¶

Control	Frequency	Owner	Pass condition
Disable-setting audit	Weekly	Platform ops	No unexpected disable flags
Trigger smoke test	Weekly	App team	Test event processed end-to-end
Host restart trend review	Weekly	SRE	Restart rate within baseline
Incident drill	Monthly	Incident commander	End-to-end triage < 15 minutes

Functions Not Executing¶

1. Summary¶

Decision Flow¶

Severity guidance¶

2. Common Misreadings¶

3. Competing Hypotheses¶

4. What to Check First¶

Quick portal checks¶

Quick CLI checks¶

Example output¶

5. Evidence to Collect¶

Minimum evidence pack¶

Query 2: Failed invocations with error details¶

Query 8: Host startup/shutdown events¶

CLI Investigation Commands¶

Normal vs Abnormal Comparison¶

6. Validation and Disproof by Hypothesis¶

H1: Function disabled via app setting¶

H2: Listener startup failure due to auth/connection¶

H3: Host restart loop suppresses execution¶

H4: Trigger misbinding or extension mismatch¶

H5: Source-side delivery gap¶

H6: Deployment/runtime regression¶

7. Likely Root Cause Patterns¶

8. Immediate Mitigations¶

Mitigation commands¶

Recovery verification¶

9. Prevention¶

Configuration guardrails¶

Monitoring guardrails¶

Operational guardrails¶

Preventive checklist¶

See Also¶

Sources¶