Lab Guide: Storage Access Failure (AzureWebJobsStorage)¶

This Level 3 lab guide reproduces a storage authorization failure in Azure Functions and shows how to prove a full failure cascade from AzureWebJobsStorage access loss to listener startup failure, host unhealthy state, and zero trigger execution. The experiment uses the lab assets in labs/storage-access-failure/ and captures evidence with KQL, CLI, and runtime logs.

Lab Metadata¶

Field	Value
Lab focus	Storage authorization failure and host startup impact
Runtime profile	Azure Functions v4 (Linux)
Plan profile	Flex Consumption (`FC1`)
Trigger type under test	Timer trigger (`Functions.scheduled_cleanup`)
Critical dependency	`AzureWebJobsStorage`
Auth model variants	Managed identity (RBAC) and connection string
Primary blast radius	Trigger listener startup and background execution
Lab source path	`labs/storage-access-failure/`
Observability data sources	Application Insights (`traces`, `requests`, `exceptions`, `dependencies`)
Incident signature	`AuthorizationPermissionMismatch`, listener unable to start, host unhealthy
Recovery signature	Storage probe succeeded (`70ms`, `4 containers`)

What this lab is designed to prove

This lab is built to test a frequent production failure mode: storage auth drift.

It validates that when AzureWebJobsStorage cannot be accessed:

Trigger listeners fail to initialize.
Host health degrades (azure.functions.webjobs.storage: Unhealthy).
Timer/queue-style invocations stop even if the app endpoint still responds.

It also validates that restoring RBAC or connection settings re-establishes storage probes and listener startup.

1) Background¶

Azure Functions host behavior depends on storage access for multiple control-plane and runtime tasks.

This includes:

Trigger coordination metadata.
Lease ownership and lock records.
Timer schedule state persistence.
Queue trigger checkpointing.
Internal host coordination artifacts.

In this lab, storage access is intentionally broken so that host startup and listener behavior can be observed through deterministic evidence.

1.1 Storage dependency model for Azure Functions¶

AzureWebJobsStorage is not optional for most non-trivial Azure Functions apps.

Even when application code does not directly read blobs or queues, the host still needs storage for runtime operations.

flowchart TD
    A[Function host process starts] --> B[Resolve AzureWebJobsStorage configuration]
    B --> C[Authenticate to storage account]
    C --> D[Probe containers and coordination artifacts]
    D --> E[Initialize listeners and schedule state]
    E --> F[Mark host healthy and begin trigger execution]

    C --> G[Auth failure]
    G --> H[Storage probe failure]
    H --> I[Listener initialization failure]
    I --> J[Host reports unhealthy]
    J --> K[No trigger invocations]

1.2 Managed identity vs connection string behavior¶

Two dominant access models are used in Azure Functions storage integration.

Model	How auth is done	Typical failure mode	Recovery action
Managed identity + RBAC	Token-based auth for host identity	Missing/wrong role assignment, wrong scope	Re-add role assignments at correct scope
Connection string key auth	Shared key in app setting	Expired/rotated key mismatch, invalid setting	Update `AzureWebJobsStorage` connection string

For FC1 workloads, identity-based connections are commonly the default architecture choice.

In this lab, the failure trigger emulates RBAC drift in identity mode.

1.3 Required data-plane roles for host storage work¶

For common host flows, these roles are typically required for the identity used by the Function App:

Storage capability	Typical requirement
Blob container operations for host artifacts	`Storage Blob Data Owner`
Queue-related runtime operations	`Storage Queue Data Contributor`
	Storage account-level operations
	Lease and table operations for host state

The exact least-privilege set depends on trigger and extension mix, but removing these roles is sufficient to reproduce deterministic startup failures in this lab scenario.

1.4 FC1 behavior and why this scenario appears suddenly¶

Flex Consumption scales quickly and can instantiate new workers frequently.

That makes configuration and RBAC consistency critical:

A hidden RBAC drift may remain unnoticed while one worker is warm.
New worker startup forces fresh storage probes.
The first startup after drift can suddenly expose failure.

1.5 Failure progression model¶

sequenceDiagram
    participant Host as Functions Host
    participant MSI as Managed Identity Token Service
    participant SA as Storage Account
    participant Listener as Timer Listener

    Host->>MSI: Acquire token for storage
    MSI-->>Host: Access token
    Host->>SA: Probe host storage artifacts
    SA-->>Host: 403 AuthorizationPermissionMismatch
    Host->>Listener: Initialize Functions.scheduled_cleanup
    Listener-->>Host: Unable to start
    Host-->>Host: Process reporting unhealthy: azure.functions.webjobs.storage

1.6 Signal map: normal vs incident¶

Signal source	Normal	Incident
`traces` host startup	Storage probe success messages	Repeated `AuthorizationPermissionMismatch`
`traces` listener logs	Listener started / schedule acquired	`The listener ... was unable to start`
`requests` for timer-backed work	Regular success cadence	Missing invocations or repeated failures
`exceptions`	Sparse or mixed errors	Dominant unauthorized exceptions
Health-related traces	Healthy lifecycle transitions	`Process reporting unhealthy: azure.functions.webjobs.storage`

2) Hypothesis¶

2.1 Formal hypothesis statement¶

If Azure Functions loses required authorization to AzureWebJobsStorage (managed identity RBAC drift or invalid connection auth), then host storage probes fail with authorization errors, timer listeners cannot start, host health becomes unhealthy for storage, and trigger invocations halt until access is restored.

2.2 Causal chain¶

flowchart LR
    A[Role assignment removed or invalid storage auth] --> B[Storage probe returns 403 AuthorizationPermissionMismatch]
    B --> C[Listener initialization fails for Functions.scheduled_cleanup]
    C --> D[Host emits unhealthy storage signal]
    D --> E[Timer trigger invocations stop]
    E --> F[Operational symptom: no scheduled executions]

    G[Role restored] --> H[Storage probe succeeded 70ms 4 containers]
    H --> I[Listener starts]
    I --> J[Invocations resume]

2.3 Proof criteria¶

All criteria below support the hypothesis:

traces contain AuthorizationPermissionMismatch during incident window.
traces contain The listener for function 'Functions.scheduled_cleanup' was unable to start.
traces contain Process reporting unhealthy: azure.functions.webjobs.storage.
Timer invocation cadence drops to zero or error-only behavior.
After restoration, logs include storage probe success pattern (70ms, 4 containers) and listener recovery.

2.4 Disproof criteria¶

Any of the following weakens or falsifies the hypothesis:

Storage authorization errors are absent in the same incident window.
Listener startup failures persist after confirmed storage role restoration.
Invocations remain halted while probe success is stable.
Alternative root cause has stronger evidence (network isolation, DNS resolution, extension mismatch).

2.5 Competing hypotheses tested in this lab¶

Competing hypothesis	What would be observed	How this lab disambiguates
Runtime regression	Mixed failures unrelated to storage auth	Filter traces for storage auth signatures
Platform transient	Short sporadic errors without deterministic repetition	Look for repeated 403 and listener failure sequence
DNS/network issue	Connection timeout/refused patterns, not auth mismatch	Compare error codes and message families
App code defect	Function exceptions without host storage unhealthy traces	Correlate host-level lifecycle signals

2.6 Expected lab verdict¶

Expected outcome for this lab run:

Hypothesis supported during break phase.
Hypothesis recovery path supported after role restoration.
Evidence chain remains consistent without requiring speculative causes.

3) Runbook¶

This runbook is execution-focused and uses long-form CLI flags only.

3.1 Prerequisites¶

Tool or access	Check command
Azure CLI	`az version`
Logged-in Azure session	`az account show`
Bash shell	`bash --version`
App Insights query rights	`az monitor app-insights component show --resource-group "$RG" --app "$APP_NAME"`

3.2 Variables¶

Use the canonical variable names used across this guide:

RG="rg-lab-storage-access"
APP_NAME="func-lab-storageaccess"
LOCATION="koreacentral"
STORAGE_NAME="stlabstorageaccess"

Optional helper variables:

SUBSCRIPTION_ID="<subscription-id>"
LAB_PATH="labs/storage-access-failure"

3.3 Deploy infrastructure¶

Create or confirm resource group:

az group create \
  --name "$RG" \
  --location "$LOCATION"

Deploy the lab stack:

az deployment group create \
  --resource-group "$RG" \
  --template-file "$LAB_PATH/main.bicep" \
  --parameters "appName=$APP_NAME" "storageName=$STORAGE_NAME" "location=$LOCATION"

Validate app and storage resources:

az functionapp show \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --output table

az storage account show \
  --resource-group "$RG" \
  --name "$STORAGE_NAME" \
  --output table

3.4 Baseline verification (pre-failure)¶

Confirm the app is reachable and host is stable:

az functionapp function list \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --output table

Capture app settings relevant to storage:

az functionapp config appsettings list \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --output json

Check baseline host logs in short time window:

let appName = "func-lab-storageaccess";
traces
| where timestamp > ago(30m)
| where cloud_RoleName =~ appName
| where message has_any ("Host started", "listener", "storage", "probe")
| project timestamp, severityLevel, message
| order by timestamp desc

Baseline expected pattern:

No repeated 403 storage auth failures.
Listener startup does not show persistent failure.
Invocation cadence appears normal for timer schedule window.

3.5 Capture baseline evidence queries¶

3.5.1 Function execution summary¶

let appName = "func-lab-storageaccess";
requests
| where timestamp > ago(1h)
| where cloud_RoleName =~ appName
| where operation_Name startswith "Functions."
| summarize
    Invocations = count(),
    Failures = countif(success == false),
    FailureRatePercent = round(100.0 * countif(success == false) / count(), 2),
    P95Ms = round(percentile(duration, 95), 2)
  by FunctionName = operation_Name
| order by Failures desc, P95Ms desc

3.5.2 Failed invocations with exception join¶

let appName = "func-lab-storageaccess";
requests
| where timestamp > ago(2h)
| where cloud_RoleName =~ appName
| where operation_Name startswith "Functions."
| where success == false
| project timestamp, operation_Id, FunctionName = operation_Name, resultCode, duration
| join kind=leftouter (
    exceptions
    | where timestamp > ago(2h)
    | where cloud_RoleName =~ appName
    | project operation_Id, ExceptionType = type, ExceptionMessage = outerMessage
) on operation_Id
| order by timestamp desc

3.5.3 Host startup and lifecycle events¶

let appName = "func-lab-storageaccess";
traces
| where timestamp > ago(2h)
| where cloud_RoleName =~ appName
| where message has_any ("Host started", "Job host started", "listener", "storage", "unhealthy")
| project timestamp, severityLevel, message
| order by timestamp desc

3.6 Trigger the storage access failure¶

This phase intentionally introduces RBAC breakage for the Function App identity.

List current role assignments for app identity on storage scope:

APP_PRINCIPAL_ID=$(az functionapp identity show \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --query "principalId" \
  --output tsv)

STORAGE_ID=$(az storage account show \
  --resource-group "$RG" \
  --name "$STORAGE_NAME" \
  --query "id" \
  --output tsv)

az role assignment list \
  --assignee "$APP_PRINCIPAL_ID" \
  --scope "$STORAGE_ID" \
  --output table

Remove required roles (failure trigger):

az role assignment delete \
  --assignee-object-id "$APP_PRINCIPAL_ID" \
  --role "Storage Account Contributor" \
  --scope "$STORAGE_ID"

az role assignment delete \
  --assignee-object-id "$APP_PRINCIPAL_ID" \
  --role "Storage Blob Data Owner" \
  --scope "$STORAGE_ID"

az role assignment delete \
  --assignee-object-id "$APP_PRINCIPAL_ID" \
  --role "Storage Queue Data Contributor" \
  --scope "$STORAGE_ID"

az role assignment delete \
  --assignee-object-id "$APP_PRINCIPAL_ID" \
  --role "Storage Table Data Contributor" \
  --scope "$STORAGE_ID"

Restart Function App so host performs fresh storage initialization:

az functionapp restart \
  --resource-group "$RG" \
  --name "$APP_NAME"

Wait for 2-5 minutes for telemetry propagation.

3.7 Collect incident evidence¶

3.7.1 Storage auth and unhealthy traces¶

let appName = "func-lab-storageaccess";
traces
| where timestamp > ago(1h)
| where cloud_RoleName =~ appName
| where message has_any ("AuthorizationPermissionMismatch", "Unable to access AzureWebJobsStorage", "unhealthy", "Forbidden", "Unauthorized")
| project timestamp, severityLevel, message
| order by timestamp desc

3.7.2 Listener startup failures¶

let appName = "func-lab-storageaccess";
traces
| where timestamp > ago(1h)
| where cloud_RoleName =~ appName
| where message has "The listener for function 'Functions.scheduled_cleanup' was unable to start."
| project timestamp, severityLevel, message
| order by timestamp desc

3.7.3 Request and exception correlation¶

let appName = "func-lab-storageaccess";
requests
| where timestamp > ago(1h)
| where cloud_RoleName =~ appName
| where operation_Name startswith "Functions."
| project timestamp, operation_Id, operation_Name, success, resultCode, duration
| join kind=leftouter (
    exceptions
    | where timestamp > ago(1h)
    | where cloud_RoleName =~ appName
    | project operation_Id, type, outerMessage
) on operation_Id
| order by timestamp desc

3.7.4 Dependency failures to storage endpoints¶

let appName = "func-lab-storageaccess";
dependencies
| where timestamp > ago(1h)
| where cloud_RoleName =~ appName
| where target has_any ("blob.core.windows.net", "queue.core.windows.net")
| summarize Calls=count(), Failed=countif(success == false), FailureRatePercent=round(100.0 * countif(success == false) / count(), 2), P95Ms=percentile(duration, 95) by target, resultCode
| order by Failed desc, P95Ms desc

3.8 Real incident log patterns to verify¶

Use these representative patterns as validation anchors. The failure phase shows event log snippets derived from live telemetry:

3.8.1 Failure-phase log snippets¶

Process reporting unhealthy: Unhealthy. Health check entries are {
  "azure.functions.web_host.lifecycle": {"status":"Healthy"},
  "azure.functions.script_host.lifecycle": {"status":"Healthy"},
  "azure.functions.webjobs.storage": {
    "status":"Unhealthy",
    "description":"Unable to access AzureWebJobsStorage",
    "errorCode":"AuthorizationPermissionMismatch"
  }
}
[2026-04-04T12:17:54Z] Storage operation failed: Status: 403 (AuthorizationPermissionMismatch) after 325ms while probing AzureWebJobsStorage.
[2026-04-04T12:15:42Z] The listener for function 'Functions.scheduled_cleanup' was unable to start.

3.8.2 Recovery-phase log snippet¶

[2026-04-04T12:31:08Z] Storage probe succeeded (70ms, 4 containers).
[2026-04-04T12:31:10Z] Job host started.
[2026-04-04T12:31:11Z] The listener for function 'Functions.scheduled_cleanup' started.

3.9 Interpretation checklist¶

Check	Evidence source	Pass condition
Storage auth failure detected	`traces`	`AuthorizationPermissionMismatch` appears repeatedly
Listener startup failure detected	`traces`	Scheduled cleanup listener unable to start
Host unhealthy emitted	`traces`	`azure.functions.webjobs.storage: Unhealthy` appears
Invocation disruption observed	`requests`	Timer execution absent or failure-only
Recovery probe success detected	`traces`	`Storage probe succeeded (70ms, 4 containers)`
Invocation recovery observed	`requests`	Successful scheduled executions resume

3.10 Triage decision logic¶

flowchart TD
    A[No scheduled executions observed] --> B{AuthorizationPermissionMismatch present?}
    B -->|Yes| C{Listener unable to start present?}
    B -->|No| D[Investigate other causes: DNS, runtime, trigger config]
    C -->|Yes| E{Host unhealthy storage signal present?}
    C -->|No| F[Check trigger binding and schedule config]
    E -->|Yes| G[Classify as storage authorization failure]
    E -->|No| H[Collect more host lifecycle traces]
    G --> I[Restore RBAC or storage auth settings]
    I --> J{Probe success + invocation recovery?}
    J -->|Yes| K[Hypothesis confirmed]
    J -->|No| L["Escalate to network/platform differential diagnosis"]

3.11 Recover from the induced failure¶

Re-add required roles:

az role assignment create \
  --assignee-object-id "$APP_PRINCIPAL_ID" \
  --role "Storage Account Contributor" \
  --scope "$STORAGE_ID"

az role assignment create \
  --assignee-object-id "$APP_PRINCIPAL_ID" \
  --role "Storage Blob Data Owner" \
  --scope "$STORAGE_ID"

az role assignment create \
  --assignee-object-id "$APP_PRINCIPAL_ID" \
  --role "Storage Queue Data Contributor" \
  --scope "$STORAGE_ID"

az role assignment create \
  --assignee-object-id "$APP_PRINCIPAL_ID" \
  --role "Storage Table Data Contributor" \
  --scope "$STORAGE_ID"

Restart the app:

az functionapp restart \
  --resource-group "$RG" \
  --name "$APP_NAME"

Wait 2-5 minutes and re-run the incident queries.

3.12 Verify recovery¶

Recovery validation query:

let appName = "func-lab-storageaccess";
traces
|| where timestamp > ago(10m)
| where cloud_RoleName =~ appName
| where message has_any ("Storage probe succeeded", "listener", "Host started", "Job host started")
| project timestamp, severityLevel, message
| order by timestamp desc

Invocation recovery query:

let appName = "func-lab-storageaccess";
requests
|| where timestamp > ago(10m)
| where cloud_RoleName =~ appName
| where operation_Name =~ "Functions.scheduled_cleanup"
| summarize Invocations=count(), Failures=countif(success == false), LastSeen=max(timestamp)

Expected recovery indicators:

Storage probe success appears.
Listener startup failures stop.
Scheduled trigger invocations resume.

4) Experiment Log¶

This section documents artifact-centered observations and links each finding to hypothesis criteria.

4.1 Artifact inventory¶

Category	Artifact examples
Deployment and config	`deployment.json`, `appsettings-baseline.json`, `identity-baseline.json`
Role assignment evidence	`roles-before.txt`, `roles-after-break.txt`, `roles-after-recovery.txt`
KQL exports	`kql-traces-storage-failure.json`, `kql-listener-failure.json`, `kql-recovery.json`
Invocation evidence	`kql-requests-before.json`, `kql-requests-during.json`, `kql-requests-after.json`
Incident notes	`timeline.md`, `verdict.md`

4.2 Baseline evidence snapshot¶

4.2.1 Baseline role assignments¶

Role	Scope	Expected
Storage Account Contributor	Storage account scope	Present
Storage Blob Data Owner	Storage account scope	Present
	Storage Queue Data Contributor	Storage account scope
	Storage Table Data Contributor	Storage account scope

4.2.2 Baseline host traces¶

Representative pattern before break:

[2026-04-04T12:05:11Z] Host started (62ms)
[2026-04-04T12:05:11Z] Job host started
[2026-04-04T12:05:12Z] Timer listener started for Functions.scheduled_cleanup

4.2.3 Baseline invocation summary¶

FunctionName	Invocations	Failures	FailureRatePercent	P95Ms
Functions.scheduled_cleanup	10	0	0.00	245

4.3 Trigger event log¶

RBAC break operation timestamped:

2026-04-04T12:14:58Z  Removed role: Storage Account Contributor
2026-04-04T12:15:02Z  Removed role: Storage Blob Data Owner
2026-04-04T12:15:02Z  Removed role: Storage Queue Data Contributor
2026-04-04T12:15:12Z  Function app restart initiated

Role coverage in this experiment

This experiment removed three of the four required roles. Storage Table Data Contributor was not present in the original assignment and therefore was not removed. In production, ensure all four roles are verified.

4.4 During-incident KQL observations¶

4.4.1 Host storage failure traces¶

timestamp	severityLevel	message
2026-04-04T12:22:42Z	3	Process reporting unhealthy: azure.functions.webjobs.storage: Unhealthy
2026-04-04T12:22:42Z	3	Unable to access AzureWebJobsStorage. Status: 403 (AuthorizationPermissionMismatch)
2026-04-04T12:17:54Z	3	Storage operation failed: Status: 403 (AuthorizationPermissionMismatch) after 325ms while probing AzureWebJobsStorage.

4.4.2 Listener startup failure traces¶

timestamp	severityLevel	message
2026-04-04T12:17:42Z	3	The listener for function 'Functions.scheduled_cleanup' was unable to start.
2026-04-04T12:15:42Z	3	The listener for function 'Functions.scheduled_cleanup' was unable to start.

4.4.3 Request and exception correlation¶

timestamp	operation_Id	FunctionName	resultCode	duration	ExceptionType	ExceptionMessage
2026-04-04T12:17:54Z	xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx	Functions.scheduled_cleanup	500	8023	System.UnauthorizedAccessException	Storage operation failed: Status: 403 (AuthorizationPermissionMismatch)
2026-04-04T12:17:42Z	xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx	Functions.scheduled_cleanup	500	7988	Microsoft.Azure.WebJobs.Host.Listeners.FunctionListenerException	The listener for function 'Functions.scheduled_cleanup' was unable to start.

4.4.4 Dependency failure concentration¶

target	resultCode	Calls	Failed	FailureRatePercent	P95Ms
`stlabstorageaccess.blob.core.windows.net`	403	18	18	100.00	9800
`stlabstorageaccess.queue.core.windows.net`	403	18	18	100.00	8700

4.5 Recovery-phase observations¶

Role restoration and restart executed:

2026-04-04T12:28:14Z  Added role: Storage Account Contributor
2026-04-04T12:28:18Z  Added role: Storage Blob Data Owner
2026-04-04T12:28:18Z  Added role: Storage Queue Data Contributor
2026-04-04T12:28:31Z  Function app restart initiated

Role coverage in this experiment

This experiment restored three of the four required roles. Storage Table Data Contributor should also be verified and assigned in production scenarios for complete coverage of table-based operations.

Recovery traces:

[2026-04-04T12:31:08Z] Storage probe succeeded (70ms, 4 containers). Listener initialization resumed.
[2026-04-04T12:31:10Z] Job host started
[2026-04-04T12:31:11Z] The listener for function 'Functions.scheduled_cleanup' started.

Recovery invocation summary:

FunctionName	Invocations	Failures	FailureRatePercent	LastSeen
Functions.scheduled_cleanup	6	0	0.00	2026-04-04T12:39:00Z

4.6 Core finding¶

Key finding (validated)

Removing storage data-plane roles from the Function App identity produces a deterministic and reproducible failure cascade:

Storage probe returns 403 AuthorizationPermissionMismatch.
Functions.scheduled_cleanup listener fails to start.
Host reports storage unhealthy state.
Scheduled execution halts.

Restoring role assignments immediately reverses the pattern and resumes trigger execution.

4.7 Hypothesis verdict table¶

Criterion	Result	Evidence
Storage auth failure exists	Supported	Repeated `AuthorizationPermissionMismatch` traces
Listener fails on storage issue	Supported	Listener unable to start logs
Host unhealthy signal appears	Supported	`azure.functions.webjobs.storage: Unhealthy`
Trigger execution halts	Supported	Missing or failed scheduled invocations
Recovery after role restore	Supported	Probe success (`70ms`, `4 containers`) and resumed invocations

Final verdict: Hypothesis supported.

4.8 Practical implications for production operations¶

Include storage-role drift checks in deployment validation.
Monitor explicit host storage health messages, not only request failure rates.
Keep incident playbooks separated by error family (auth vs network vs runtime).
For FC1, treat worker startup windows as high-sensitivity points for RBAC regressions.
Capture recovery evidence explicitly to avoid false closure.

4.9 Reproducibility notes¶

Use the same storage account scope for role remove/add operations.
Allow telemetry propagation delay before concluding no-signal.
Keep all exported evidence sanitized.
Replace IDs with placeholders:
- xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
- <subscription-id>
- <tenant-id>
- <object-id>

Expected Evidence¶

This section defines the expected observation pattern before, during, and after the induced incident.

Before Trigger (Baseline)¶

Evidence Source	Expected State	What to Capture
Role assignments	Required roles present on storage scope	Role list export
Host traces	No repeated storage auth failures	Host lifecycle snippet
Timer execution	Scheduled function runs normally	Request summary table
Exceptions	No dominant unauthorized exception class	Exception trend query output

During Incident¶

Evidence Source	Expected State	Key Indicator
`traces`	Storage authorization failure repeats	`AuthorizationPermissionMismatch`
`traces`	Listener startup fails	`The listener for function 'Functions.scheduled_cleanup' was unable to start.`
`traces`	Host storage health degraded	`Process reporting unhealthy: azure.functions.webjobs.storage: Unhealthy`
`requests`	Trigger invocations stop or fail	Missing cadence / 500s
`exceptions`	Unauthorized/listener exceptions dominate	`System.UnauthorizedAccessException`, `FunctionListenerException`

After Recovery¶

Evidence Source	Expected State	Key Indicator
Role assignments	Required roles restored	Role list confirms required storage roles are present
`traces`	Storage probe returns success	`Storage probe succeeded (70ms, 4 containers)`
`traces`	Listener starts successfully	Listener start message present
`requests`	Timer invocations resume	Success count increases
Exceptions	Unauthorized error family subsides	Incident exceptions stop repeating

Evidence Timeline¶

graph LR
    A[Baseline healthy state] --> B[Role removal trigger]
    B --> C[Auth failure traces appear]
    C --> D[Listener startup failure]
    D --> E[Host unhealthy storage signal]
    E --> F[Invocation halt]
    F --> G[Role restoration]
    G --> H[Storage probe succeeded 70ms 4 containers]
    H --> I[Listener recovery]
    I --> J[Invocation recovery]

Evidence Chain: Why This Proves the Hypothesis¶

Falsification logic

Hypothesis is confirmed when the evidence chain appears in this order:

RBAC break event at storage scope.
Storage authorization failure (AuthorizationPermissionMismatch).
Listener startup failure for Functions.scheduled_cleanup.
Host unhealthy signal for azure.functions.webjobs.storage.
Trigger execution disruption.
Role restoration.
Storage probe success (70ms, 4 containers) and invocation recovery.

Hypothesis is falsified if this chain cannot be established and an alternative cause explains the data better.

Clean Up¶

az group delete \
  --name "$RG" \
  --yes \
  --no-wait

Functions Failing