
Hands-on Labs

Bicep-based reproduction environments for each troubleshooting scenario. Deploy to your Azure subscription, trigger the symptom, observe signals, then clean up.

```mermaid
graph TD
    A[Hands-on Labs] --> B[Performance]
    A --> C[Startup or Availability]
    A --> D[Outbound or Network]
    A --> O[Identity or Deployment]
    B --> E[Memory Pressure]
    B --> F[Intermittent 5xx]
    B --> G[No Space Left on Device]
    B --> H[Slow Start / Cold Start]
    C --> I[Container HTTP Pings]
    C --> J[Deployment Ok Startup Failed]
    C --> K[Failed to Forward Request]
    C --> L[Slot Swap Config Drift]
    D --> M[SNAT Exhaustion]
    D --> N[DNS Resolution VNet]
    O --> P[CD Reconnect RBAC Conflict]
```

How Labs Work

Each lab includes:

  1. main.bicep — Infrastructure template (App Service Plan B1, App Service Python 3.11 Linux, Log Analytics, Diagnostic Settings)
  2. app/ — Flask application designed to reproduce a specific symptom
  3. trigger.sh — Script to trigger the symptom
  4. verify.sh — Script to query Log Analytics and confirm expected signals appeared
  5. Documentation page — Step-by-step walkthrough with KQL queries and expected observations
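Each lab's verify.sh queries Log Analytics for the expected signals. As a minimal sketch of that kind of check (not the labs' actual script), the hypothetical `count_http_5xx` function below counts HTTP 5xx responses in the AppServiceHTTPLogs table over a recent window, which is the signal the Intermittent 5xx lab should produce. It assumes the app's diagnostic settings route AppServiceHTTPLogs to the workspace and that you pass the workspace's customer ID (a GUID). Depending on your Azure CLI version, `az monitor log-analytics query` may be supplied by an extension.

```bash
#!/usr/bin/env bash
# Hypothetical verify-style helper (a sketch, not the labs' verify.sh).
# Assumes AppServiceHTTPLogs is enabled in the app's diagnostic settings and
# $1 is the Log Analytics workspace customer ID (GUID), not its resource ID.

# Print how many 5xx responses were logged in the last $2 (KQL timespan, default 15m).
count_http_5xx() {
  local workspace_id="$1" window="${2:-15m}"
  local kql="AppServiceHTTPLogs
| where TimeGenerated > ago(${window})
| where ScStatus >= 500
| summarize count()"
  # Rows come back as JSON objects; count() produces a column named count_.
  az monitor log-analytics query \
    --workspace "$workspace_id" \
    --analytics-query "$kql" \
    --query "[0].count_" -o tsv
}

# Usage (placeholders in angle brackets):
#   ws=$(az monitor log-analytics workspace show -g rg-lab-<name> -n <workspace-name> --query customerId -o tsv)
#   n=$(count_http_5xx "$ws" 15m)
#   if [ "${n:-0}" -gt 0 ]; then echo "PASS: $n 5xx responses"; else echo "FAIL: none yet"; fi
```

Returning a bare number rather than a table makes it easy to turn the check into a pass/fail exit code.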

Portal view: Diagnose and solve problems entry point

Screenshot: the Azure portal Diagnose and solve problems blade for app-test-20251107, Common Solutions tab selected (an AI-powered Diagnostics (preview) tab is also visible). It shows a Risk alerts Availability card with 2 Critical alerts, seven Troubleshooting categories, and a Popular troubleshooting tools list: Application Logs, App Down Workflow, Web App Down, Web App Slow, and Process Full List.

The Diagnose and solve problems blade is the first Portal stop when running these labs. The Common Solutions tab groups seven troubleshooting categories that broadly correspond to the lab categories below, and the Popular troubleshooting tools list at the bottom links to App Down Workflow, Web App Down, Web App Slow, and Process Full List: the same detectors most labs ask you to open after reproducing the failure mode. The Risk alerts Availability card (here showing 2 Critical alerts) is the kind of signal you should expect to fire when a lab triggers an outage. Treat this blade as the operational companion to the lab tables below: the tables tell you which scenario to trigger, and the blade tells you which Portal detector to consult afterward.

Available Labs

Performance

| Lab | Symptom | Related Playbook |
|---|---|---|
| Memory Pressure | Rising memory, worker timeouts, container restarts | Memory Pressure & Worker Degradation |
| Intermittent 5xx Under Load | Sporadic 5xx responses during traffic spikes | Intermittent 5xx Under Load |
| No Space Left on Device | Disk full errors from /home or /tmp exhaustion | No Space Left on Device |
| Slow Start / Cold Start | First request slow after deploy or idle | Slow Start / Cold Start vs Regression |
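Several Performance symptoms also show up directly in container console output. A rough sketch, assuming the app runs under Gunicorn (which logs `WORKER TIMEOUT` when it kills a stuck worker and `Booting worker` whenever it starts one) and that disk exhaustion surfaces as the standard `No space left on device` error. `scan_perf_signals` is a hypothetical helper that tallies these strings in a saved log file; the patterns are assumptions, not an exhaustive list.

```bash
#!/usr/bin/env bash
# Hypothetical helper (a sketch): count Performance-lab signatures in saved
# container console output. Assumes the app runs under Gunicorn.
# Capture logs first (placeholders; stop the stream with Ctrl+C):
#   az webapp log tail -g rg-lab-<name> -n <app-name> > console.log

scan_perf_signals() {
  local file="$1" timeouts boots enospc
  # Gunicorn logs "WORKER TIMEOUT" when it kills a worker that stopped responding.
  timeouts=$(grep -c "WORKER TIMEOUT" "$file")
  # "Booting worker" appears at every worker start; more than the configured
  # worker count suggests restarts.
  boots=$(grep -c "Booting worker" "$file")
  # ENOSPC (errno 28) is reported as "No space left on device".
  enospc=$(grep -ci "no space left on device" "$file")
  printf 'worker_timeouts=%s\nworker_boots=%s\nno_space=%s\n' "$timeouts" "$boots" "$enospc"
}

# Usage:
#   scan_perf_signals console.log
```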

Startup / Availability

| Lab | Symptom | Related Playbook |
|---|---|---|
| Container Didn't Respond to HTTP Pings | Container fails to start; no HTTP response on the expected port | Container Didn't Respond to HTTP Pings |
| Deployment Succeeded but Startup Failed | Deployment reports success but the app is down (wrong startup command) | Deployment Succeeded but Startup Failed |
| Failed to Forward Request | Platform proxy can't reach the app (wrong bind address) | Failed to Forward Request |
| Slot Swap Config Drift | Swap succeeds but production restarts or config breaks | Slot Swap Config Drift / Restart Race |
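The Failed to Forward Request row attributes the failure to a wrong bind address. A minimal sketch, assuming the app is started by Gunicorn with `--bind` or `-b`: it extracts the bind address from the configured startup command and flags loopback addresses, which are reachable only from inside the container and not from the platform proxy. `bind_address_of` and `is_loopback_bind` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (a sketch): find the address a Gunicorn startup
# command binds to, and flag loopback binds the platform proxy cannot reach.

# Print the HOST:PORT given to --bind / -b, or nothing if no bind flag is found.
bind_address_of() {
  local re='(--bind[= ]|-b )([^ ]+)'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"
  fi
}

# Return 0 if the bind host is loopback (reachable only inside the container).
is_loopback_bind() {
  local host="${1%:*}"
  [[ $host == 127.* || $host == localhost || $host == "[::1]" ]]
}

# Usage (placeholders):
#   cmd=$(az webapp config show -g rg-lab-<name> -n <app-name> --query appCommandLine -o tsv)
#   addr=$(bind_address_of "$cmd")
#   if [ -n "$addr" ] && is_loopback_bind "$addr"; then echo "Bound to $addr; bind 0.0.0.0 instead"; fi
```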

Outbound / Network

| Lab | Symptom | Related Playbook |
|---|---|---|
| SNAT Exhaustion | Outbound connection failures from SNAT port exhaustion | SNAT or Application Issue? |
| DNS Resolution (VNet) | DNS resolution failure for private endpoints in VNet-integrated apps | DNS Resolution (VNet) |
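For the DNS Resolution (VNet) lab, the core question is whether a private endpoint's hostname resolves to a private address from inside the app. A minimal sketch, assuming you run it from the app's SSH session (for example via `az webapp ssh`) where `getent` is available: `is_private_ipv4` checks the RFC 1918 ranges, and `check_private_resolution` is a hypothetical wrapper. A public answer often points to a private DNS zone that is not linked to the integrated VNet, but treat that as a hint rather than a diagnosis.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the DNS Resolution (VNet) lab (a sketch).
# Run inside the app container (e.g. via `az webapp ssh`); assumes getent exists.

# Return 0 if $1 is an IPv4 address in an RFC 1918 private range.
is_private_ipv4() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  [[ $a =~ ^[0-9]+$ && $b =~ ^[0-9]+$ && $c =~ ^[0-9]+$ && $d =~ ^[0-9]+$ ]] || return 1
  # 10#... forces base 10 so octets with leading zeros are not read as octal.
  (( 10#$a == 10 )) && return 0
  (( 10#$a == 172 && 10#$b >= 16 && 10#$b <= 31 )) && return 0
  (( 10#$a == 192 && 10#$b == 168 )) && return 0
  return 1
}

# Resolve $1 and report whether the first IPv4 answer is private.
check_private_resolution() {
  local host="$1" ip
  ip=$(getent ahostsv4 "$host" | awk 'NR == 1 { print $1 }')
  if [[ -z $ip ]]; then
    echo "FAIL: $host did not resolve"; return 2
  fi
  if is_private_ipv4 "$ip"; then
    echo "OK: $host -> $ip (private)"
  else
    echo "WARN: $host -> $ip (public; private DNS zone link may be missing)"; return 1
  fi
}

# Usage (hostname is a placeholder):
#   check_private_resolution <resource-fqdn>
```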

Identity & Deployment

| Lab | Symptom | Related Playbook |
|---|---|---|
| CD Reconnect RBAC Conflict | RoleAssignmentExists error when reconnecting Deployment Center for container CD that pulls from ACR with a managed identity | CD RBAC Role Assignment Conflict |
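RoleAssignmentExists is returned when an assignment with the same principal, role, and scope already exists. A minimal sketch for checking that before a reconnect, assuming the app uses a system-assigned identity and the role in question is AcrPull on the registry; `acr_pull_assignment_count` is a hypothetical helper and the resource names are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper (a sketch): count AcrPull assignments that a principal
# already holds on a registry. Assumes a system-assigned managed identity.

acr_pull_assignment_count() {
  local principal_id="$1" acr_id="$2"
  az role assignment list \
    --assignee "$principal_id" \
    --scope "$acr_id" \
    --role AcrPull \
    --query "length(@)" -o tsv
}

# Usage (placeholders):
#   principal=$(az webapp identity show -g rg-lab-<name> -n <app-name> --query principalId -o tsv)
#   acr_id=$(az acr show -n <acr-name> --query id -o tsv)
#   if [ "$(acr_pull_assignment_count "$principal" "$acr_id")" -gt 0 ]; then
#     echo "AcrPull already assigned; re-creating it may fail with RoleAssignmentExists"
#   fi
```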

Prerequisites

All labs require:

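  • Azure subscription with Contributor access (the CD Reconnect RBAC Conflict lab likely also needs permission to create role assignments, such as Owner or User Access Administrator, because Contributor cannot assign roles)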
  • Azure CLI installed and logged in (az login)
  • Bash shell (Linux, macOS, or WSL)
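The prerequisites can be checked up front. A sketch of a hypothetical `preflight` function: it relies on `az account show` failing when no account is logged in, and it only warns about Bicep because `az bicep install` can add it later.

```bash
#!/usr/bin/env bash
# Hypothetical prerequisite check (a sketch, not part of the labs).

preflight() {
  if ! command -v az >/dev/null 2>&1; then
    echo "FAIL: Azure CLI (az) not found on PATH"; return 1
  fi
  # az account show exits non-zero when no account is logged in.
  if ! az account show --query name -o tsv >/dev/null 2>&1; then
    echo "FAIL: not logged in; run az login"; return 1
  fi
  # Deploying main.bicep needs the Bicep CLI, which az can install.
  if ! az bicep version >/dev/null 2>&1; then
    echo "WARN: Bicep CLI not found; try az bicep install"
  fi
  echo "OK: prerequisites look present"
}
```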

General Workflow

```bash
# 1. Create resource group
az group create --name rg-lab-<name> --location koreacentral

# 2. Deploy infrastructure
az deployment group create \
  --resource-group rg-lab-<name> \
  --template-file labs/<name>/main.bicep \
  --parameters baseName=lab<short>

# 3. Deploy app code (zip deploy or local git)
# 4. Run trigger script
# 5. Wait 2-5 minutes for logs to appear
# 6. Run verify script or query Log Analytics manually

# 7. Clean up
az group delete --name rg-lab-<name> --yes --no-wait
```
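Step 5 uses a fixed wait, but log ingestion latency varies. A sketch of a hypothetical `wait_for_signal` helper that retries any check command until it succeeds or a timeout expires. The usage line assumes verify.sh lives in labs/<name>/ and exits non-zero until the expected signals appear, which may not hold for every lab.

```bash
#!/usr/bin/env bash
# Hypothetical helper (a sketch): poll a check command instead of sleeping a
# fixed 2-5 minutes. The check is any command that exits 0 once the signal is visible.

wait_for_signal() {
  local timeout_s="$1" interval_s="$2"; shift 2
  local start=$SECONDS
  until "$@"; do
    if (( SECONDS - start >= timeout_s )); then
      echo "timed out after ${timeout_s}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval_s"
  done
  return 0
}

# Usage: wait up to 10 minutes, checking every 30 seconds (path assumed):
#   wait_for_signal 600 30 bash labs/<name>/verify.sh
```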

Cost

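Each lab deploys a B1 App Service Plan, which is billed for as long as it exists, plus a Log Analytics workspace that bills for ingested data. Delete the resource group after completing the lab to avoid ongoing charges.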
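Because `--no-wait` returns before deletion finishes, and forgotten groups keep billing, it can help to sweep for leftovers. A sketch, assuming every lab group follows the rg-lab-<name> convention from the General Workflow; `list_lab_groups` and `delete_lab_groups` are hypothetical helpers, and deletion only happens when `--confirm` is passed.

```bash
#!/usr/bin/env bash
# Hypothetical cleanup helpers (a sketch). Assumes lab groups are named rg-lab-*.

list_lab_groups() {
  az group list --query "[?starts_with(name, 'rg-lab-')].name" -o tsv
}

# Dry run by default; pass --confirm to actually start the deletions.
delete_lab_groups() {
  local mode="${1:-}" rg
  local -a groups
  mapfile -t groups < <(list_lab_groups)
  for rg in "${groups[@]}"; do
    if [[ $mode == --confirm ]]; then
      echo "deleting $rg"
      az group delete --name "$rg" --yes --no-wait
    else
      echo "would delete $rg (re-run with --confirm)"
    fi
  done
}
```

To confirm that an asynchronous delete has finished, `az group exists --name rg-lab-<name>` prints false once the group is gone.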
