Azure PaaS Troubleshooting Labs
Reproducible experiments for Azure App Service, Azure Functions, and Azure Container Apps.
By Yeongseon Choe
This site documents hypothesis-driven experiments that reproduce failure modes, performance edge cases, and platform boundary ambiguities in Azure PaaS services. Each experiment records what was observed, what can be concluded, and what remains unproven.
The target audience is Azure support engineers, escalation engineers, and platform operators who need to distinguish between platform-side and application-side issues under real-world conditions.
This site is the troubleshooting companion to the practical guide series, not a replacement. The guides cover broad reference material; these labs cover deep, narrow investigation.
Experiment Status Overview
| Service | Experiment | Status | Last Updated |
|---|---|---|---|
| App Service | Filesystem Persistence | Published | 2026-04 |
| App Service | Health Check Eviction | Published | 2026-04 |
| App Service | SNAT Exhaustion | Published | 2026-04 |
| App Service | Memory Pressure | Published | 2025-07 |
| Container Apps | Scale-to-Zero 503 | Published | 2026-04 |
| Container Apps | Target Port Detection | Published | 2026-04 |
| Container Apps | OOM Visibility Gap | Published | 2026-04 |
| App Service | Custom DNS Resolution | Planned | — |
| Functions | Cold Start | Draft | — |
| Container Apps | Startup Probes | Draft | — |
7 Experiments Published
Seven experiments across App Service and Container Apps have been completed with real Azure data. Each includes full evidence chains, raw data, and reproducible procedures.
Getting Started
New to this project? Start here:
- Read the methodology — Experiment Framework explains the standardized structure every experiment follows.
- Understand evidence levels — Evidence Levels defines how findings are tagged with calibrated confidence.
- Browse published experiments — Start with any experiment that matches your area of interest:
  - App Service: Filesystem Persistence, Health Check Eviction, SNAT Exhaustion, Memory Pressure
  - Container Apps: OOM Visibility Gap, Scale-to-Zero 503, Target Port Detection
- Check the glossary — Glossary defines key Azure and troubleshooting terms used throughout this site.
Quick Links
- App Service Labs — Filesystem persistence, health check eviction, SNAT exhaustion, memory pressure (4 published)
- Container Apps Labs — Scale-to-zero, target port detection, OOM visibility gap (3 published)
- Functions Labs — Cold start, storage edge cases, dependency visibility (none published yet; Cold Start is in draft)
- Cross-cutting — managed identity (MI) RBAC propagation, private endpoint (PE) DNS negative caching
- Methodology — Experiment framework, evidence model, interpretation guidelines
- Glossary — Key terms and definitions
Site Map
Methodology
- Experiment Framework — standardized structure for all experiments
- Statistical Methods — repeated-run methodology for performance experiments
- Evidence Levels — tagging system for calibrated confidence
- Platform vs App Boundary — framework for boundary analysis
- Interpretation Guidelines — how to read and communicate results
App Service Labs
- Filesystem Persistence — /home vs writable layer data survival — Published
- Health Check Eviction — cascading outage from partial dependency failure — Published
- SNAT Exhaustion — connection failures without CPU/memory pressure — Published
- Memory Pressure — plan-level degradation, swap thrashing, kernel page reclaim — Published
- Custom DNS Resolution — private name resolution drift after VNet changes
- procfs Interpretation — /proc reliability and limits in Linux containers
- Slow Requests — frontend timeout vs. worker-side delay vs. dependency latency
- Zip Deploy vs Container — deployment method behavioral differences
Container Apps Labs
- Scale-to-Zero 503 — first-request failure modes after idle scale-down — Published
- Target Port Detection — auto-detection failures causing 502 — Published
- OOM Visibility Gap — observability gaps for OOM kills — Published
- Ingress SNI / Host Header — SNI and host header routing behavior
- Private Endpoint FQDN vs IP — FQDN vs. direct IP access differences
- Startup Probes — probe interaction and failure patterns — Draft
Functions Labs
- Flex Router Queueing — hidden latency between request arrival and invocation
- HTTP Concurrency Cliffs — per-instance degradation thresholds
- Telemetry Auth Blackhole — monitoring misconfiguration preventing startup
- Flex Consumption Storage — storage identity misconfiguration edge cases
- Cold Start — dependency initialization and cold start duration breakdown — Draft
- Dependency Visibility — outbound dependency observability limits
Patterns
- Symptom to Hypothesis — common symptoms and investigation starting points
- False Positives — signals that suggest problems that don't exist
- Metric Misreads — commonly misinterpreted Azure metrics
Background
See About for the full motivation, goals, and positioning of this project.