Evidence Map for Storage Troubleshooting
This page maps common storage investigation questions to the best evidence source, the command to run, and the signal to interpret.
```mermaid
flowchart LR
    A[Investigation question] --> B[Evidence source]
    B --> C[Command or portal check]
    C --> D[Proof artifact]
    D --> E[Hypothesis validated or disproved]
```
Quick evidence matrix
| Question | Best Source | Command or Check | Signal to Interpret |
|---|---|---|---|
| Is DNS resolving the intended endpoint? | local DNS result | nslookup <account>.blob.core.windows.net | private IP for private path, public IP for public path |
| Is the account reachable on the expected port? | client connectivity test | Test-NetConnection -ComputerName <account>.file.core.windows.net -Port 445 | required port reachable from source network |
| Is the firewall blocking the request? | storage account network config | az storage account show --name $STORAGE_NAME --resource-group $RG --query "networkRuleSet" | source path allowed by rule set |
| Is the private endpoint healthy? | private endpoint connection state | az network private-endpoint-connection list --id <storage-resource-id> | connection approved and in expected state |
| Is auth failing because of RBAC scope? | role assignments | az role assignment list --scope <scope> | principal has correct data-plane role at correct scope |
| Is SAS invalid because of time or permissions? | SAS fields and system clock | inspect st, se, sp, sip, spr | valid time window and required permissions |
| Is the account being throttled? | Azure Monitor metrics | check Transactions, SuccessE2ELatency, SuccessServerLatency, Availability | throughput spike aligns with 429/503 or server latency growth |
| Is transfer slowness mostly client-side? | client throughput and RTT | compare local transfer test vs storage metrics | client latency/concurrency explains slow end-to-end path |
| Can deleted data be recovered? | account protection settings | check soft delete, versioning, backup, retention | feature was enabled before incident and retention window still open |
Evidence recipes by category
1) Access evidence
```bash
nslookup <account>.blob.core.windows.net
nslookup <account>.privatelink.blob.core.windows.net
az storage account show --name $STORAGE_NAME --resource-group $RG --query "{publicNetworkAccess:publicNetworkAccess,networkRuleSet:networkRuleSet}"
az network private-endpoint-connection list --id <storage-resource-id>
```
Look for mismatches between intended path and actual DNS answer. A private endpoint incident often starts as a DNS evidence problem, not a transport problem.
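One way to make the DNS comparison repeatable is to label each resolved address. The sketch below assumes IPv4 answers and treats the RFC 1918 ranges as "private"; `classify_ip` is a hypothetical helper name, not part of the Azure CLI.

```bash
#!/usr/bin/env bash
# classify_ip (hypothetical helper): label an IPv4 address as
# "private" (RFC 1918 ranges) or "public". Assumes a well-formed IPv4 string.
classify_ip() {
  local ip="$1"
  case "$ip" in
    10.*|192.168.*) echo "private" ;;
    172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) echo "private" ;;
    *) echo "public" ;;
  esac
}
```

Feed it the address returned by `nslookup` from the affected client. A "public" result on a path that is meant to use the private endpoint points to the DNS mismatch described above.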
2) Security evidence
```bash
az role assignment list --scope <scope>
az storage account show --name $STORAGE_NAME --resource-group $RG --query "{allowSharedKeyAccess:allowSharedKeyAccess,defaultToOAuthAuthentication:defaultToOAuthAuthentication}"
```
Also capture the sanitized error body or response code and record which auth path was used: Microsoft Entra ID (formerly Azure AD), SAS, or shared key.
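When the auth path is SAS, the `st`, `se`, and `sp` fields from the matrix can be pulled out of the token and checked against the clock. This is a minimal sketch: `sas_field` and `sas_time_valid` are hypothetical helper names, and the time check assumes all three timestamps use the same `YYYY-MM-DDThh:mm:ssZ` UTC form so that string comparison matches time order.

```bash
#!/usr/bin/env bash
# sas_field (hypothetical helper): print one query parameter from a SAS
# token or full URL. Only %3A (encoded colon) is decoded in the value.
sas_field() {
  local query="${1#*\?}"   # drop everything up to the first '?', if any
  local key="$2" pair val
  local -a pairs
  IFS='&' read -r -a pairs <<< "$query"
  for pair in "${pairs[@]}"; do
    if [ "${pair%%=*}" = "$key" ]; then
      val="${pair#*=}"
      val="${val//%3A/:}"
      val="${val//%3a/:}"
      printf '%s\n' "$val"
      return 0
    fi
  done
  return 1
}

# sas_time_valid (hypothetical helper): compare the current time with st/se.
# Assumes identical fixed-width UTC timestamp formats; st may be empty.
sas_time_valid() {
  local st="$1" se="$2" now="$3"
  if [ -n "$st" ] && [[ "$now" < "$st" ]]; then
    echo "not-yet-valid"; return 1
  fi
  if [[ ! "$now" < "$se" ]]; then
    echo "expired"; return 1
  fi
  echo "valid"
}
```

A typical call is `sas_time_valid "$(sas_field "$url" st)" "$(sas_field "$url" se)" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"`. Take the clock reading on the client that issued the failing request, because clock skew there is part of the evidence.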
3) Performance evidence
```bash
az monitor metrics list --resource <storage-resource-id> --metrics Transactions SuccessE2ELatency SuccessServerLatency Availability Ingress Egress --interval PT1M
```
Separate server-side pressure from client-side inefficiency. If storage server latency stays low while end-to-end latency is high, the bottleneck usually sits in the client, network distance, object shape, or concurrency pattern.
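The comparison above can be reduced to simple arithmetic on two averaged metric values. In this sketch, `latency_split` is a hypothetical helper, the inputs are whole milliseconds read from `SuccessE2ELatency` and `SuccessServerLatency`, and the "more than half" threshold is an assumption to tune per workload.

```bash
#!/usr/bin/env bash
# latency_split (hypothetical helper): estimate where request time is spent.
# $1 = average SuccessE2ELatency in ms, $2 = average SuccessServerLatency in ms.
latency_split() {
  local e2e_ms="$1" server_ms="$2"
  local outside_ms=$(( e2e_ms - server_ms ))
  # Assumed threshold: more than half of end-to-end time spent outside the service.
  if [ "$e2e_ms" -gt 0 ] && [ $(( outside_ms * 2 )) -gt "$e2e_ms" ]; then
    echo "client-or-network outside_ms=$outside_ms"
  else
    echo "server-side outside_ms=$outside_ms"
  fi
}
```

For example, `latency_split 400 20` prints `client-or-network outside_ms=380`, which suggests looking at the client, network distance, or concurrency before suspecting the service.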
4) Recovery evidence
```bash
az storage account blob-service-properties show --account-name $STORAGE_NAME --resource-group $RG
az backup vault backup-properties show --name <vault-name> --resource-group $RG
```
The critical question is historical: was the protection feature already enabled before the delete or overwrite event?
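The historical check can be written down as two comparisons. This is a sketch under stated assumptions: `recovery_possible` is a hypothetical helper, all times are Unix epoch seconds supplied by the investigator (for example from the activity log), and the retention window is assumed to start at the incident time.

```bash
#!/usr/bin/env bash
# recovery_possible (hypothetical helper): decide whether a protection
# feature could still cover an incident.
# $1 = epoch when the feature was enabled
# $2 = epoch of the delete or overwrite event
# $3 = retention period in days
# $4 = current epoch
recovery_possible() {
  local enabled_epoch="$1" incident_epoch="$2" retention_days="$3" now_epoch="$4"
  local window_end=$(( incident_epoch + retention_days * 86400 ))
  if [ "$enabled_epoch" -gt "$incident_epoch" ]; then
    echo "no: protection enabled after incident"
  elif [ "$now_epoch" -ge "$window_end" ]; then
    echo "no: retention window closed"
  else
    echo "yes: recoverable until epoch $window_end"
  fi
}
```

Record the inputs alongside the verdict so that the conclusion can be re-checked later against the raw timestamps.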
Evidence anti-patterns
- Using only the portal summary without recording raw commands and outputs.
- Treating a 403 as pure identity failure before validating DNS and endpoint path.
- Treating slow upload as throttling without checking object size mix and concurrency.
- Assuming recovery is possible without confirming retention and feature state.
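To avoid the first anti-pattern, each command can be run through a small wrapper that records the command line, a UTC timestamp, the raw output, and the exit code. The wrapper below is a minimal sketch; `capture_evidence` and the log layout are illustrative assumptions, not an established convention.

```bash
#!/usr/bin/env bash
# capture_evidence (hypothetical helper): run a command and append the
# command line, a UTC timestamp, combined output, and exit code to a log.
# $1 = log file path; remaining arguments = the command to run.
capture_evidence() {
  local logfile="$1"; shift
  local status
  {
    printf '### %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf '$'
    printf ' %q' "$@"      # shell-quoted so the command can be replayed
    printf '\n'
    "$@" 2>&1
    status=$?
    printf '### exit=%s\n' "$status"
  } >> "$logfile"
  return "$status"
}
```

Usage: `capture_evidence evidence.log nslookup <account>.blob.core.windows.net`. The log keeps raw output, so redact account keys and SAS signatures before sharing it.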