Functions Not Executing¶
1. Summary¶
Use this playbook when trigger sources are active but one or more Azure Functions are not executing. Most incidents match one of six hypotheses: disabled function, listener startup failure, host restart loop, trigger misbinding, source-side delivery gap, or deployment/runtime regression. The primary objective is to restore safe execution quickly while collecting enough evidence to prevent recurrence.
Decision Flow¶
flowchart TD
A["Functions not executing"] --> B{"Enabled state healthy?"}
B -->|No| C["Remove disable setting and restart"]
B -->|Yes| D{"Listener starts?"}
D -->|No| E["Fix trigger auth/connection/binding"]
D -->|Yes| F{"Source events present?"}
F -->|No| G["Fix publisher/schedule/subscription"]
F -->|Yes| H{"Host stable?"}
H -->|No| I["Investigate restart loop/runtime init"]
H -->|Yes| J["Check deployment/config regression"]
C --> K["Verify invocation recovery"]
E --> K
G --> K
I --> K
J --> K
Severity guidance¶
| Condition | Severity | Action priority |
|---|---|---|
| Critical business trigger halted | SEV-1 | Immediate mitigation + incident commander |
| Non-critical batch trigger halted | SEV-2 | Restore within current shift |
| Low-impact maintenance trigger halted | SEV-3 | Restore in planned response window |
Signal snapshot¶
| Signal | Normal | Incident |
| --- | --- | --- |
| Invocations | Correlate with source events | Near zero on impacted function |
| Listener traces | Listener started appears | unable to start or missing listener trace |
| Function state | Enabled | Disabled via metadata/app setting |
| Host lifecycle | Stable startup cadence | Frequent restart cycling |
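The decision flow above is an ordered chain of four checks, where the first failing check determines the action. A minimal sketch of that ordering follows. The function name `next_action` and its yes/no arguments are illustrative assumptions, not part of any Azure tooling.

```bash
# next_action ENABLED LISTENER_STARTS SOURCE_EVENTS HOST_STABLE
# Hypothetical helper: each argument is "yes" or "no".
# Checks run in the same order as the flowchart; the first "no" wins.
next_action() {
  if [ "$1" != "yes" ]; then
    echo "Remove disable setting and restart"
  elif [ "$2" != "yes" ]; then
    echo "Fix trigger auth/connection/binding"
  elif [ "$3" != "yes" ]; then
    echo "Fix publisher/schedule/subscription"
  elif [ "$4" != "yes" ]; then
    echo "Investigate restart loop/runtime init"
  else
    echo "Check deployment/config regression"
  fi
}

# Function enabled, listener failing: prints "Fix trigger auth/connection/binding"
next_action yes no yes yes
```

Whichever branch is taken, the flow ends at the same step: verify invocation recovery.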
2. Common Misreadings¶
| Misreading | Why incorrect | Correct interpretation |
|---|---|---|
| "App is healthy, so function must run." | App up does not guarantee listener up | Validate listener startup traces |
| "No exceptions means no issue." | Listener failures may exist only in traces | Query host/lifecycle logs first |
| "Scale out first." | Scaling does not fix disabled or broken config | Verify control-plane and trigger path |
| "Queue quiet means no work." | Source visibility can lag or be partial | Correlate source metrics and invocations |
| "Always code bug." | Config/RBAC drift often causes this pattern | Validate settings and permissions before rollback |
3. Competing Hypotheses¶
| ID | Hypothesis | Confirming signal | Disproving signal |
|---|---|---|---|
| H1 | Function disabled by setting | IsDisabled=true or disable trace | Enabled and no disable trace |
| H2 | Listener startup failure (auth/connection) | listener ... unable to start | Listener stable after restart |
| H3 | Host restart loop | Dense start/shutdown timeline | Low restart count and stable host |
| H4 | Binding/extension mismatch | Indexing/binding errors at startup | Clean indexing and active listener |
| H5 | Source not delivering events | No source-side activity | Source active while invocations zero |
| H6 | Deployment/runtime regression | Onset after release/config change | No temporal correlation |
4. What to Check First¶
- Confirm target function is enabled.
- Confirm listener startup traces exist after latest host start.
- Confirm source activity exists in same interval.
- Confirm host is not cycling.
Quick portal checks¶
- Function App -> Functions -> target function: Enabled is On.
- Function App -> Monitor: invocation line not flat while source is active.
- Application Insights -> Logs: listener startup and host lifecycle traces present.
Quick CLI checks¶
az functionapp function list --name "$APP_NAME" --resource-group "$RG" --output table
az functionapp config appsettings list --name "$APP_NAME" --resource-group "$RG" --output table
az monitor log-analytics query --workspace "$WORKSPACE_ID" --analytics-query "traces | where timestamp > ago(30m) | where cloud_RoleName =~ '$APP_NAME' | where message has_any ('disabled','listener','Host started','unable to start') | project timestamp, severityLevel, message | order by timestamp desc" --output table
Example output¶
Name Trigger IsDisabled
--------------- --------- ----------
QueueProcessor queue true
health http false
Name Value
------------------------------------------------ ----------------------------------------------------------------
AzureWebJobs.QueueProcessor.Disabled true
AzureWebJobsStorage DefaultEndpointsProtocol=https;AccountName=***;AccountKey=***
timestamp severityLevel message
--------------------------- ------------- ---------------------------------------------------------------------------------
2026-04-04T12:22:53.000000Z 3 Process reporting unhealthy: azure.functions.webjobs.storage: Unhealthy (AuthorizationPermissionMismatch)
2026-04-04T12:16:06.000000Z 2 The listener for function 'Functions.scheduled_cleanup' was unable to start.
2026-04-04T12:15:42.000000Z 2 Function 'QueueProcessor' is disabled via app setting 'AzureWebJobs.QueueProcessor.Disabled'.
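When scanning trace output like the sample above, it can help to tag each message with the hypothesis it points to. The sketch below does that with plain pattern matching on message text quoted in this playbook. The function name `classify_trace` is an assumption, and a single match is only a hint: H3 needs a dense start/shutdown timeline, and H6 needs correlation with a release.

```bash
# classify_trace MESSAGE
# Hypothetical helper: prints the hypothesis ID suggested by one trace
# message, or "unclassified" when no known pattern matches.
classify_trace() {
  case "$1" in
    *"is disabled via app setting"*)
      echo "H1" ;;
    *"unable to start"*|*AuthorizationPermissionMismatch*)
      echo "H2" ;;
    *"Host is shutting down"*|*"Stopping JobHost"*)
      echo "H3" ;;
    *"Error indexing method"*|*"Cannot bind"*)
      echo "H4" ;;
    *"Failed to start language worker process"*)
      echo "H6" ;;
    *)
      echo "unclassified" ;;
  esac
}

classify_trace "The listener for function 'Functions.scheduled_cleanup' was unable to start."
```

H5 is deliberately absent: a source-side delivery gap shows up in source metrics, not in host traces.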
5. Evidence to Collect¶
Collect all artifacts from one bounded time range (ago(30m) for triage, ago(2h) for verification).
Minimum evidence pack¶
How to Read This
A function that is absent from this table while others appear received zero invocations, which indicates a trigger-path failure rather than an app-wide outage.
Query 2: Failed invocations with error details¶
let appName = "$APP_NAME";
requests
| where timestamp > ago(2h)
| where cloud_RoleName =~ appName
| where operation_Name startswith "Functions."
| where success == false
| project timestamp, operation_Id, FunctionName = operation_Name, resultCode, duration
| join kind=leftouter (
exceptions
| where timestamp > ago(2h)
| where cloud_RoleName =~ appName
| project operation_Id, ExceptionType = type, ExceptionMessage = outerMessage
) on operation_Id
| order by timestamp desc
How to Read This
Use this to separate true execution failures from non-execution incidents. If this table is empty while source is active, focus on listener/state hypotheses.
Query 8: Host startup/shutdown events¶
let appName = "$APP_NAME";
traces
| where timestamp > ago(12h)
| where cloud_RoleName =~ appName
| where message has_any ("Host started", "Job host started", "Host shutdown", "Host is shutting down", "Stopping JobHost", "Host lock lease")
| project timestamp, severityLevel, message
| order by timestamp desc
How to Read This
Repeated startup/shutdown in short intervals weakens listener stability and often explains zero-invocation windows.
CLI Investigation Commands¶
# Function inventory and state
az functionapp function list --name "$APP_NAME" --resource-group "$RG" --output table
# App settings for disable and connection values
az functionapp config appsettings list --name "$APP_NAME" --resource-group "$RG" --output table
# Listener and startup trace scan
az monitor log-analytics query --workspace "$WORKSPACE_ID" --analytics-query "traces | where timestamp > ago(30m) | where cloud_RoleName =~ '$APP_NAME' | where message has_any ('disabled','listener','Host started','unable to start') | project timestamp, message | order by timestamp desc" --output table
# Runtime configuration
az functionapp config show --name "$APP_NAME" --resource-group "$RG" --output json
# Source activity sample (Storage)
az monitor metrics list --resource "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RG/providers/Microsoft.Storage/storageAccounts/<storage-account-name>" --metric "Transactions" --interval PT5M --aggregation Total --output table
Name Value
--------------------------------------- --------------------------------------------------------------
FUNCTIONS_EXTENSION_VERSION ~4
FUNCTIONS_WORKER_RUNTIME python
AzureWebJobs.QueueProcessor.Disabled true
QueueConnection DefaultEndpointsProtocol=https;AccountName=***;AccountKey=***
timestamp message
--------------------------- --------------------------------------------------------------------------------
2026-04-04T12:16:06.000000Z The listener for function 'Functions.QueueProcessor' was unable to start.
2026-04-04T12:15:42.000000Z Function 'QueueProcessor' is disabled via app setting 'AzureWebJobs.QueueProcessor.Disabled'.
Normal vs Abnormal Comparison¶
| Dimension | Normal | Abnormal | Meaning |
|---|---|---|---|
| Function state | Enabled | Disabled | Trigger never starts |
| Listener startup | Listener started present | unable to start or absent | Trigger path broken |
| Host lifecycle | Stable start pattern | Frequent restart cycling | Runtime instability/regression |
| Source vs invocations | Correlated | Source active but invocations zero | Non-execution incident confirmed |
| Error concentration | Sparse mixed noise | Repeated same auth/config error | Deterministic failure mode |
6. Validation and Disproof by Hypothesis¶
H1: Function disabled via app setting¶
Signals that support
- IsDisabled=true in function list.
- AzureWebJobs.<FunctionName>.Disabled=true exists in settings.
- Trace contains explicit disable message.
Signals that weaken
- Enabled state and no disable setting.
- Listener starts and function executes in same interval.
What to verify
az functionapp function list --name "$APP_NAME" --resource-group "$RG" --output table
az functionapp config appsettings list --name "$APP_NAME" --resource-group "$RG" --output table
let appName = "$APP_NAME";
traces
| where timestamp > ago(2h)
| where cloud_RoleName =~ appName
| where message has "is disabled via app setting"
| project timestamp, message
| order by timestamp desc
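On apps with many settings, the disable flags are easy to miss in table output. The following sketch filters saved output of the app settings command for `AzureWebJobs.<FunctionName>.Disabled` entries set to true. It assumes two whitespace-separated columns (name, value) as in the example output earlier, and the function name `list_disabled_functions` is hypothetical.

```bash
# list_disabled_functions
# Hypothetical helper: reads "NAME VALUE" rows on stdin and prints the
# function name for every AzureWebJobs.<FunctionName>.Disabled setting
# whose value is "true" (case-insensitive). Header rows are ignored
# because they do not match the name pattern.
list_disabled_functions() {
  awk 'tolower($2) == "true" && $1 ~ /^AzureWebJobs\..+\.Disabled$/ {
         name = $1
         sub(/^AzureWebJobs\./, "", name)   # strip the setting prefix
         sub(/\.Disabled$/, "", name)       # strip the setting suffix
         print name
       }'
}

# Example with rows shaped like the sample output in this playbook:
printf '%s\n' \
  'Name Value' \
  'AzureWebJobs.QueueProcessor.Disabled true' \
  'FUNCTIONS_WORKER_RUNTIME python' | list_disabled_functions
```

In practice the input would come from piping the `az functionapp config appsettings list ... --output table` command shown above into the function.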
H2: Listener startup failure due to auth/connection¶
Signals that support
- Repeating listener ... unable to start trace lines.
- AuthorizationPermissionMismatch, DNS, or connection-refused patterns.
- Source events present while invocations remain zero.
Signals that weaken
- Listener starts cleanly after restart and remains stable.
- Test event processes successfully.
What to verify
let appName = "$APP_NAME";
traces
| where timestamp > ago(2h)
| where cloud_RoleName =~ appName
| where message has_any ("listener", "unable to start", "AuthorizationPermissionMismatch", "Connection refused")
| project timestamp, severityLevel, message
| order by timestamp desc
H3: Host restart loop suppresses execution¶
Signals that support
- Frequent host start/shutdown within short intervals.
- Startup spikes without matching invocation recovery.
Signals that weaken
- Host remains stable through incident window.
- Restart frequency aligns with normal plan behavior only.
What to verify
let appName = "$APP_NAME";
traces
| where timestamp > ago(12h)
| where cloud_RoleName =~ appName
| where message has_any ("Host started", "Host is shutting down", "Stopping JobHost")
| summarize Starts=countif(message has "Host started"), Stops=countif(message has "shutting down" or message has "Stopping JobHost") by bin(timestamp, 1h)
| order by timestamp desc
az monitor log-analytics query --workspace "$WORKSPACE_ID" --analytics-query "traces | where timestamp > ago(2h) | where cloud_RoleName =~ '$APP_NAME' | where message has_any ('Host started','Host is shutting down') | project timestamp, message | order by timestamp desc" --output table
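If the trace rows are exported to a file, the same hourly bucketing can be done offline. This is a rough sketch under two assumptions: each line starts with an ISO 8601 timestamp followed by the message, and the threshold passed to `flag_restart_loop` is a team-chosen baseline rather than a platform value. Both function names are hypothetical.

```bash
# count_host_starts_per_hour
# Reads "TIMESTAMP MESSAGE" lines on stdin and prints "HOUR STARTS",
# where HOUR is the first 13 characters of the timestamp (YYYY-MM-DDTHH).
count_host_starts_per_hour() {
  awk '/Host started/ { hour = substr($1, 1, 13); starts[hour]++ }
       END { for (h in starts) print h, starts[h] }' | sort
}

# flag_restart_loop THRESHOLD
# Reads "HOUR STARTS" lines on stdin and prints each hour whose start
# count is greater than THRESHOLD.
flag_restart_loop() {
  awk -v max="$1" '$2 > max { print $1 }'
}

# Example: two starts in the 12:00 hour, threshold of one start per hour.
printf '%s\n' \
  '2026-04-04T12:30:01.000000Z Host started (64ms)' \
  '2026-04-04T12:30:03.000000Z Host is shutting down.' \
  '2026-04-04T12:40:10.000000Z Host started (70ms)' |
  count_host_starts_per_hour | flag_restart_loop 1
```

Hours flagged here support H3 only when invocations do not recover after the starts. Restart frequency that matches normal plan behavior weakens it.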
H4: Trigger misbinding or extension mismatch¶
Signals that support
- Error indexing method, Cannot bind, or extension startup errors.
- Missing or wrong connection setting names.
Signals that weaken
- Clean indexing and expected binding initialization.
- Required settings present and valid.
What to verify
let appName = "$APP_NAME";
traces
| where timestamp > ago(2h)
| where cloud_RoleName =~ appName
| where message has_any ("Error indexing method", "Cannot bind", "binding", "extension")
| project timestamp, message
| order by timestamp desc
az functionapp config show --name "$APP_NAME" --resource-group "$RG" --output table
az functionapp config appsettings list --name "$APP_NAME" --resource-group "$RG" --output table
H5: Source-side delivery gap¶
Signals that support
- Source metrics show no incoming events.
- Producer/scheduler logs indicate halted publishing.
Signals that weaken
- Source metrics rise during zero-invocation period.
- Backlog increases while listener is unhealthy.
What to verify
let appName = "$APP_NAME";
requests
| where timestamp > ago(1h)
| where cloud_RoleName =~ appName
| where operation_Name startswith "Functions."
| summarize Invocations=count() by FunctionName=operation_Name, bin(timestamp, 15m)
| order by timestamp desc
az monitor metrics list --resource "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RG/providers/Microsoft.Storage/storageAccounts/<storage-account-name>" --metric "Transactions" --interval PT5M --aggregation Total --output table
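The two results above are compared for the same time window. The sketch below spells out that comparison for one window. The function name `delivery_verdict` and the idea of reducing each side to a single integer count are assumptions for illustration only.

```bash
# delivery_verdict SOURCE_EVENTS INVOCATIONS
# Hypothetical helper: both arguments are non-negative integer counts
# taken from the same time window.
delivery_verdict() {
  if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
    echo "H5 supported: no source-side activity"
  elif [ "$1" -gt 0 ] && [ "$2" -eq 0 ]; then
    echo "H5 weakened: source active while invocations are zero"
  else
    echo "executing: invocations present"
  fi
}

# Source shows 120 events but the function recorded no invocations.
delivery_verdict 120 0
```

When H5 is weakened, return to the listener and function-state hypotheses (H1, H2) rather than the publisher.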
H6: Deployment/runtime regression¶
Signals that support
- Incident starts immediately after deployment/config rollout.
- Startup traces show worker initialization failures.
- Rollback restores execution.
Signals that weaken
- No recent change event.
- Same artifact healthy in equivalent environment.
What to verify
az monitor log-analytics query --workspace "$WORKSPACE_ID" --analytics-query "traces | where timestamp > ago(2h) | where cloud_RoleName =~ '$APP_NAME' | where message has_any ('Worker process started and initialized','Failed to start language worker process','did not find functions with language','Host started') | project timestamp, message | order by timestamp desc" --output table
az functionapp config show --name "$APP_NAME" --resource-group "$RG" --output json
timestamp message
--------------------------- --------------------------------------------------------------------------------------
2026-04-04T12:30:00.000000Z Failed to start language worker process for runtime: python.
2026-04-04T12:30:01.000000Z Host started (64ms)
2026-04-04T12:30:03.000000Z Host is shutting down.
7. Likely Root Cause Patterns¶
| Pattern | Trigger | Confirming evidence | Typical fix |
|---|---|---|---|
| Disable flag left behind | Maintenance or manual toggle | Disable setting + disable trace | Remove setting and verify |
| Storage/Messaging RBAC drift | Role removed or scope changed | AuthorizationPermissionMismatch + listener failure | Restore role assignment |
| Secret/config drift | Rotation mismatch | Connection/auth errors at startup | Update setting and restart |
| Runtime/extension mismatch | Package/runtime change | Binding/indexing errors | Align versions and bindings |
| Host recycle instability | Runtime crash or environment churn | Dense start/shutdown cycles | Stabilize runtime and roll back if needed |
| Upstream publisher outage | Producer stopped | No source-side activity | Restore publisher and replay backlog |
Related Labs¶
- Cold Start Lab
- Storage Access Failure Lab
8. Immediate Mitigations¶
Apply in order and stop once recovery criteria are met.
1. Remove unintended disable setting(s).
2. Fix trigger connection/auth settings and identity permissions.
3. Restart the Function App to reinitialize listeners.
4. Roll back to last known good artifact when regression is confirmed.
5. Temporarily reduce or pause source publish rate only if downstream risk is growing.
Mitigation commands¶
az functionapp config appsettings delete --name "$APP_NAME" --resource-group "$RG" --setting-names "AzureWebJobs.QueueProcessor.Disabled"
az functionapp config appsettings set --name "$APP_NAME" --resource-group "$RG" --settings "QueueConnection=DefaultEndpointsProtocol=https;AccountName=***;AccountKey=***"
az functionapp restart --name "$APP_NAME" --resource-group "$RG"
Recovery verification¶
let appName = "$APP_NAME";
union isfuzzy=true
(
traces
| where timestamp > ago(30m)
| where cloud_RoleName =~ appName
| where message has_any ("Listener started", "Host started", "unable to start", "disabled")
| project timestamp, Kind="trace", Detail=message
),
(
requests
| where timestamp > ago(30m)
| where cloud_RoleName =~ appName
| where operation_Name startswith "Functions."
| summarize Invocations=count(), Failures=countif(success == false) by operation_Name
| project timestamp=now(), Kind="request-summary", Detail=strcat(operation_Name, " invocations=", tostring(Invocations), " failures=", tostring(Failures))
)
| order by timestamp desc
- Listener started appears with no new unable to start entries.
- Target function invocation count rises within expected trigger interval.
- Failures remain below team threshold for 30-60 minutes.
9. Prevention¶
Configuration guardrails¶
- Restrict write access for AzureWebJobs.*.Disabled settings.
- Validate required trigger settings in CI before deployment.
- Keep connection-setting contracts versioned and reviewed.
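One way to validate required trigger settings in CI is to compare a list of required names against the names actually configured. The sketch below assumes the configured names arrive one per line on stdin. The function name `missing_settings` and the example setting names in the call are illustrative, not a contract defined by this playbook.

```bash
# missing_settings REQUIRED...
# Hypothetical helper: reads configured setting names (one per line) on
# stdin and prints every REQUIRED name that is not present.
missing_settings() {
  present=$(cat)
  for required in "$@"; do
    # -x: whole-line match, -F: fixed string, -q: quiet
    printf '%s\n' "$present" | grep -qxF -- "$required" || echo "$required"
  done
}

# Example: QueueConnection is required but not configured.
printf '%s\n' AzureWebJobsStorage FUNCTIONS_WORKER_RUNTIME |
  missing_settings AzureWebJobsStorage QueueConnection
```

A CI step could fail the build whenever this prints anything, which catches renamed or dropped connection settings before they become a listener startup failure.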
Monitoring guardrails¶
- Alert when critical function Invocations=0 while source activity is non-zero.
- Alert on repeated listener startup failures in short windows.
- Alert when host restart frequency exceeds plan baseline.
Operational guardrails¶
- Pair every secret rotation with listener smoke test.
- Use staged rollout and canary validation for runtime/dependency changes.
- Run monthly playbook drill for non-executing trigger scenarios.
Preventive checklist¶
| Control | Frequency | Owner | Pass condition |
|---|---|---|---|
| Disable-setting audit | Weekly | Platform ops | No unexpected disable flags |
| Trigger smoke test | Weekly | App team | Test event processed end-to-end |
| Host restart trend review | Weekly | SRE | Restart rate within baseline |
| Incident drill | Monthly | Incident commander | End-to-end triage < 15 minutes |
See Also¶
- Troubleshooting Architecture
- First 10 Minutes
- KQL Query Library
- Troubleshooting Methodology
- Troubleshooting Lab Guides