Container Didn't Respond to HTTP Pings (Azure App Service Linux)¶

1. Summary¶

Symptom¶

The container starts but App Service reports "Container didn't respond to HTTP pings on port: , failing site start." The app never becomes available, or it starts successfully locally but fails on App Service.

Why this scenario is confusing¶

Multiple unrelated issues produce the same error message. Port binding, startup time, health endpoint behavior, and base image issues can all manifest as "didn't respond to HTTP pings." The error message itself does not distinguish between these causes.

Troubleshooting decision flow¶

graph TD
    A[Symptom: Container didn't respond to HTTP pings] --> B{What to check first?}
    B --> C[Expected port != actual listen port]
    B --> D[Bound to 127.0.0.1/localhost]
    B --> E[Long init before first HTTP response]
    B --> F[Startup tracebacks or restart loop]
    C --> H1[H1: Port mismatch]
    D --> H2[H2: Binding to localhost]
    E --> H3[H3: Startup time exceeded]
    F --> H4[H4: Application crash during startup]

Limitations¶

Windows-specific container behavior is out of scope
This playbook covers custom containers and built-in Linux images; it does not cover App Service on Kubernetes
Detailed framework-specific startup optimization is referenced but not exhaustively documented

Quick Conclusion¶

Treat this error as a startup reachability workflow, not a single failure mode. First align Linux port and bind behavior (PORT, WEBSITES_PORT, 0.0.0.0), then correlate platform timeout events with console startup logs to separate timeout from crash conditions. Linux startup behavior is more nuanced than a direct WEBSITES_PORT-equals-probe-target model. See Container HTTP Pings Lab for experimental evidence on Linux port behavior. Once the immediate issue is mitigated, harden startup commands and initialization behavior so every deployment responds to HTTP pings within the configured startup window.

2. Common Misreadings¶

"The container crashed" (it may be running but listening on the wrong port)
"The app is too slow to start" (it may be fast but binding to 127.0.0.1 instead of 0.0.0.0)
"Health check is failing" (startup pings and configured health checks are different mechanisms)
"The Dockerfile is wrong" (it may work locally but fail on App Service due to environment differences)
"The platform is broken" (almost always an application or configuration issue)

3. Competing Hypotheses¶

H1: Port or listener contract mismatch — the container listens on a different port or address than the Linux startup path actually reaches (PORT, WEBSITES_PORT, bind address, or startup command drift)
H2: Binding to localhost — container binds to 127.0.0.1 instead of 0.0.0.0, so the platform health probe cannot reach it
H3: Startup time exceeded — container takes longer than the startup time limit (WEBSITES_CONTAINER_START_TIME_LIMIT, default 230 seconds) to begin responding to HTTP
H4: Application crash during startup — container process exits or enters an error loop before it can respond to any HTTP request
H5: Missing or incorrect startup command — CMD/ENTRYPOINT issue where the web server never actually starts

4. What to Check First¶

Metrics¶

Linux - Number of Running Containers detector (is the container actually running?)
Restart count (is the platform killing and restarting the container repeatedly?)

Portal view: Web App Down detector (the first signal for HTTP ping failures)¶

This is the detector to open first when the symptom is "Container didn't respond to HTTP pings". The two KPIs distinguish the failure surface: if App Availability is low but Platform Availability is 100%, the container itself is the problem - container is running but never reaches probe-readiness, which is exactly this playbook's scope. If Platform Availability is also low, the issue is upstream of the container and this playbook doesn't apply. The left-rail siblings - especially Container Issues and Web App Restarted - are the next detectors to open: Container Issues surfaces ping failure messages and exit codes directly, and Web App Restarted exposes the restart-loop pattern that H4 (application crash during startup) produces.

Logs¶

AppServiceConsoleLogs: stdout/stderr from the container — look for listen port, binding address, startup errors, crash traces
AppServicePlatformLogs: platform-level events — "Container started", "Container failed to start", "Stopping site because it failed to start"

Platform Signals¶

WEBSITES_PORT app setting value (custom-container port configuration input).
PORT environment variable (runtime-injected into the container and usually the value the app should bind to).
WEBSITES_CONTAINER_START_TIME_LIMIT value
Startup command configuration in Azure Portal
Docker log stream (Log stream blade in Portal)

Portal view: Log stream (live container stdout/stderr)¶

For HTTP ping failures, Log stream is the fastest way to see whether your application process even reaches a bind line. With the Runtime radio shown selected (the alternative is Platform), watch in real-time during a restart - you should see Gunicorn/Uvicorn/Node/Java startup banners followed by a Listening at: http://0.0.0.0:<port> line. If those lines never appear before the container terminates, you're in H4 (application crash) or H5 (missing startup command) territory. If they appear with 127.0.0.1, you're in H2 (localhost bind). If they appear but the port doesn't match WEBSITES_PORT, you're in H1 (port mismatch). The OpenTelemetry exporter lines visible here are normal background telemetry noise - filter past them to find the actual app startup output.

5. Evidence to Collect¶

Required Evidence¶

Container console output during startup (AppServiceConsoleLogs)
Platform events during startup (AppServicePlatformLogs)
App settings: WEBSITES_PORT, WEBSITES_CONTAINER_START_TIME_LIMIT (and confirm app code reads PORT for listen target)
Dockerfile: EXPOSE directive, CMD/ENTRYPOINT
Whether the app starts successfully locally with docker run -p 8000:8000 <image>

Useful Context¶

Recent image changes
Recent app setting changes
Custom startup command configuration
Base image changes
Multi-stage build configuration

CLI Investigation Commands¶

# Check overall app state
az webapp show --resource-group <resource-group> --name <app-name> --query "{state:state,enabled:enabled,defaultHostName:defaultHostName}" --output table

# Verify startup-critical settings
az webapp config appsettings list --resource-group <resource-group> --name <app-name> --query "[?name=='WEBSITES_PORT' || name=='WEBSITES_CONTAINER_START_TIME_LIMIT' || name=='WEBSITE_WARMUP_PATH' || name=='WEBSITE_WARMUP_STATUSES'].{name:name,value:value}" --output table

# Inspect startup command/runtime
az webapp config show --resource-group <resource-group> --name <app-name> --query "{linuxFxVersion:linuxFxVersion,appCommandLine:appCommandLine,http20Enabled:http20Enabled}" --output table

# Stream live logs while reproducing ping failures
az webapp log tail --resource-group <resource-group> --name <app-name>

Example Output:

State    Enabled    DefaultHostName
-------  ---------  -------------------------------------------
Running  True       <app-name>.azurewebsites.net

Name                                   Value
-------------------------------------  ------
WEBSITES_PORT                          8000
WEBSITES_CONTAINER_START_TIME_LIMIT    230
WEBSITE_WARMUP_PATH                    /diag/env
WEBSITE_WARMUP_STATUSES                200

LinuxFxVersion    AppCommandLine                               Http20Enabled
----------------  -------------------------------------------  -------------
PYTHON|3.11       gunicorn --bind 0.0.0.0:8000 src.app:app    True

How to Read This

If settings and bind are aligned but startup pings still fail intermittently, investigate startup duration variance, heavy initialization, and recycle timing rather than port/address first.

Sample Log Patterns¶

AppServiceConsoleLogs (container process did start)¶

[AppServiceConsoleLogs]
2026-04-04T11:14:04Z  Informational  WARNING: Could not find package directory /home/site/wwwroot/__oryx_packages__.
2026-04-04T11:14:08Z  Error          [2026-04-04 11:14:08 +0000] [1891] [INFO] Starting gunicorn 24.1.1
2026-04-04T11:14:08Z  Error          [2026-04-04 11:14:08 +0000] [1891] [INFO] Listening at: http://0.0.0.0:8000 (1891)
2026-04-04T11:14:08Z  Error          [2026-04-04 11:14:08 +0000] [1892] [INFO] Booting worker with pid: 1892

AppServiceHTTPLogs (app became reachable when healthy)¶

[AppServiceHTTPLogs]
2026-04-04T11:21:55Z  GET  /           200  20
2026-04-04T11:23:03Z  GET  /diag/env   200  6
2026-04-04T11:23:03Z  GET  /diag/stats 200  23

AppServicePlatformLogs (lifecycle stop sequence)¶

[AppServicePlatformLogs]
2026-04-04T11:14:31Z  Informational  State: Stopping, Action: StoppingSiteContainers
2026-04-04T11:14:36Z  Informational  Container is terminated. Total time elapsed: 5469 ms.
2026-04-04T11:14:36Z  Informational  Site: <app-name> stopped.

How to Read This

This pattern shows the process can start and respond (200 responses exist), so this incident class is usually intermittent startup reachability/configuration drift, not a permanent app boot failure.

KQL Queries with Example Output¶

Query 1: Confirm the app process bound to an externally reachable address¶

// Look for bind/listen signatures from the app process
AppServiceConsoleLogs
| where TimeGenerated between (datetime(2026-04-04 11:14:00) .. datetime(2026-04-04 11:14:10))
| where ResultDescription has_any ("Starting gunicorn", "Listening at", "Booting worker", "127.0.0.1", "0.0.0.0")
| project TimeGenerated, Level, ResultDescription
| order by TimeGenerated asc

Example Output:

TimeGenerated	Level	ResultDescription
2026-04-04 11:14:08	Error	[2026-04-04 11:14:08 +0000] [1891] [INFO] Starting gunicorn 24.1.1
2026-04-04 11:14:08	Error	[2026-04-04 11:14:08 +0000] [1891] [INFO] Listening at: http://0.0.0.0:8000 (1891)
2026-04-04 11:14:08	Error	[2026-04-04 11:14:08 +0000] [1892] [INFO] Booting worker with pid: 1892

How to Read This

0.0.0.0:8000 means bind-address itself is correct in this sample. If the same query shows 127.0.0.1, prioritize H2 immediately.

Query 2: Validate endpoint responses and latency once startup succeeds¶

// Validate post-start endpoint health and latency
AppServiceHTTPLogs
| where TimeGenerated between (datetime(2026-04-04 11:21:50) .. datetime(2026-04-04 11:23:10))
| project TimeGenerated, CsMethod, CsUriStem, ScStatus, TimeTaken
| order by TimeGenerated asc

Example Output:

TimeGenerated	CsMethod	CsUriStem	ScStatus	TimeTaken
2026-04-04 11:21:55	GET	/	200	20
2026-04-04 11:23:03	GET	/diag/env	200	6
2026-04-04 11:23:03	GET	/diag/stats	200	23

How to Read This

Low-latency 200 responses indicate successful readiness after startup. If incident windows show no rows or repeated 503, compare to startup timeout signatures from platform logs.

Query 3: Platform stop timeline around startup investigation¶

// Correlate stop actions and container termination timing
AppServicePlatformLogs
| where TimeGenerated between (datetime(2026-04-04 11:14:30) .. datetime(2026-04-04 11:14:40))
| where Message has_any ("StoppingSiteContainers", "Container is terminated", "Site:")
| project TimeGenerated, Level, Message
| order by TimeGenerated asc

Example Output:

TimeGenerated	Level	Message
2026-04-04 11:14:31	Informational	State: Stopping, Action: StoppingSiteContainers
2026-04-04 11:14:36	Informational	Container is terminated. Total time elapsed: 5469 ms.
2026-04-04 11:14:36	Informational	Site: stopped.

How to Read This

Use this timeline to separate operator/platform restarts from startup probe failures. A short terminate window usually indicates a controlled stop, not a long startup timeout.

Incident timeline template (copy into case notes)¶

Timestamp (UTC)	Source	Signal	Interpretation
2026-04-04 11:14:08	AppServiceConsoleLogs	`Listening at: http://0.0.0.0:8000`	Process reached listener state
2026-04-04 11:14:31	AppServicePlatformLogs	`StoppingSiteContainers`	Platform/operator initiated stop
2026-04-04 11:14:36	AppServicePlatformLogs	`Container is terminated. Total time elapsed: 5469 ms.`	Controlled stop completed
2026-04-04 11:21:55	AppServiceHTTPLogs	`GET / 200`	App reachable post-restart
2026-04-04 11:23:03	AppServiceHTTPLogs	`GET /diag/env 200`	Diagnostics endpoint healthy

How to Read This

Build the timeline first, then classify by hypothesis. Misclassification usually happens when teams inspect only one data source (for example, just console logs without platform lifecycle context).

Fast Verification Query (Post-fix)¶

// Verify that startup window now returns successful responses
AppServiceHTTPLogs
| where TimeGenerated > ago(15m)
| where CsUriStem in ("/", "/diag/env", "/diag/stats")
| summarize Requests=count(), Errors=countif(ScStatus >= 500), P95=percentile(TimeTaken, 95) by CsUriStem
| order by CsUriStem asc

Example Output:

CsUriStem	Requests	P95
/	42	38
/diag/env	42	16
/diag/stats	42	44

How to Read This

This confirms remediation only when Errors stays at zero during fresh restarts and P95 remains below your startup SLO.

6. Validation and Disproof by Hypothesis¶

H1: Port mismatch¶

Signals that support

AppServiceConsoleLogs show a concrete listen line such as Listening on 8000 while WEBSITES_PORT=8080.
Platform logs repeatedly show startup ping failure on one port while app logs indicate another port.
App works locally only when mapped to a non-8080 container port (for example -p 8080:8000).

Signals that weaken

Console and app settings both consistently show the same port.
Manual request to the expected in-container port succeeds during SSH investigation.

What to verify

In App Service configuration, record WEBSITES_PORT and PORT.
In AppServiceConsoleLogs, identify the actual listen port emitted by the server startup line.
Compare expected vs actual port; if different, align to a single value and restart.
Re-check AppServicePlatformLogs for successful transition after the change.

H2: Binding to localhost¶

Signals that support

Console logs show bind target 127.0.0.1:<port> or localhost:<port>.
Framework defaults are in use (for example Flask dev server) without explicit host override.
Process is running but startup pings still fail, indicating network reachability problem rather than process absence.

Signals that weaken

Logs explicitly show 0.0.0.0:<port> binding.
A known-good production server command (for example Gunicorn bound to 0.0.0.0) is in effect.

What to verify

Inspect startup logs for exact bind address.
Inspect startup command (Dockerfile CMD/ENTRYPOINT or Portal startup command) for host argument.
Ensure production command binds to 0.0.0.0.
Redeploy/restart and confirm startup ping success in platform logs.

H3: Startup time exceeded¶

Signals that support

Console logs show long-running initialization (dependency download, migrations, model load) before server starts listening.
AppServicePlatformLogs show Container started followed by timeout/failing site start near or beyond the default 230 seconds.
Increasing WEBSITES_CONTAINER_START_TIME_LIMIT improves startup success rate.

Signals that weaken

App reaches a listen state quickly (<30 seconds) before the failure occurs.
Failure happens immediately with explicit crash output rather than timeout behavior.

What to verify

Measure elapsed time between first platform start event and failing site start event.
Locate server-ready timestamp in AppServiceConsoleLogs.
If startup duration exceeds current limit, temporarily set WEBSITES_CONTAINER_START_TIME_LIMIT to 300-600 (up to 1800 max).
Confirm whether timeout errors disappear; if yes, optimize startup path and keep a justified limit.

H4: Application crash during startup¶

Signals that support

AppServiceConsoleLogs show traceback/exception, segmentation fault, missing module, or fatal config error.
AppServicePlatformLogs show repeated start/exit cycles.
Exit code appears in logs or detector output and indicates non-zero termination.

Signals that weaken

No crash signatures appear and process remains alive.
Failure pattern matches timeout without process termination.

What to verify

Capture full startup stderr/stdout from AppServiceConsoleLogs.
Correlate each platform restart event with preceding exception traces.
Validate image dependency completeness and runtime configuration required at startup.
Fix startup exception, redeploy, and confirm stable single-start behavior.

H5: Missing or incorrect startup command¶

Signals that support

Console logs are empty, minimal, or show a process that is not an HTTP server.
Startup command in Portal overrides Dockerfile CMD with invalid or no-op command.
CMD/ENTRYPOINT references missing script path or wrong working directory.

Signals that weaken

Command clearly launches a web server binary with correct arguments.
Logs show web server startup sequence and port bind attempt.

What to verify

Inspect Dockerfile CMD and ENTRYPOINT exactly as built into deployed image.
Check Azure Portal startup command override and remove/fix conflicting values.
Validate command path inside container (via SSH/Kudu) and confirm executable exists.
Restart app and confirm logs show expected web server process listening.

Normal vs Abnormal Comparison¶

Signal	Normal	Container Didn't Respond to HTTP Pings
Console bind line	`Listening at: http://0.0.0.0:<port>` appears quickly	Missing bind line, wrong bind (`127.0.0.1`), or very delayed bind
HTTP logs during startup window	Early `200` on warm-up or `/`	No responses before timeout or repeated startup-phase failures
Platform logs	`Site started`/steady run	Timeout/stop cycle or repeated startup retries
TimeTaken pattern	Low to moderate, variable by route	Near timeout limit if probe waits then fails
Operational impact	Availability restored after deploy/restart	App remains unavailable until port/bind/startup issue fixed

Extended Triage Matrix¶

Observation in Incident Window	Most Likely Hypothesis	Next Command / Query	Immediate Action
`Listening at: http://127.0.0.1:<port>`	H2 (localhost bind)	`AppServiceConsoleLogs` bind query	Change bind host to `0.0.0.0` and restart
`WEBSITES_PORT=8080`, logs show `:8000`	H1 (port mismatch)	`az webapp config appsettings list ...`	Align port in app and setting
No listen line before timeout	H3 or H5	Console + platform startup sequence	Increase timeout temporarily and verify command path
Traceback before listener	H4	Console error query	Fix startup exception and redeploy
Healthy bind + healthy settings + intermittent failures	H3 (startup variance)	Latency trend around restarts	Reduce initialization work on startup

Normal vs Abnormal Endpoint Examples¶

Endpoint	Healthy Startup Example	Startup Ping Failure Example
`/`	`200`, TimeTaken 20-150 ms	Missing row or `503` near startup timeout
`/diag/env`	`200`, TimeTaken <20 ms	Missing row or delayed first success
`/diag/stats`	`200`, TimeTaken <50 ms	Missing row during startup/restart loop
Warm-up path (`WEBSITE_WARMUP_PATH`)	Responds with approved status (`200` or `202`)	Returns non-approved status or not served in time

7. Likely Root Cause Patterns¶

Pattern A: Framework defaults to localhost binding (Flask dev server, Django runserver)
Pattern B: Missing WEBSITES_PORT setting when app doesn't listen on 8080
Pattern C: Heavy initialization (ML model loading, database migrations) exceeding startup timeout
Pattern D: Missing dependency in container image (works locally with cached layers, fails on fresh pull)
Pattern E: Startup command references a file path that doesn't exist in the container

8. Immediate Mitigations¶

Set WEBSITES_PORT to match your app's actual listen port (diagnostic, production-safe)
Change app binding from 127.0.0.1 to 0.0.0.0 (diagnostic, production-safe)
Increase WEBSITES_CONTAINER_START_TIME_LIMIT to 300-600 (temporary, production-safe)
SSH into the container via Kudu to verify the process is running and listening (diagnostic)
Check docker run locally with the EXACT same environment variables (diagnostic)

9. Prevention¶

Always bind to 0.0.0.0 in production configurations
Use the PORT environment variable for dynamic port binding, and set WEBSITES_PORT to match
Optimize startup time (lazy-load heavy dependencies, run migrations separately)
Add a lightweight health endpoint that responds immediately even during warm-up
Test with docker run -e PORT=8080 -e WEBSITES_PORT=8080 -p 8080:8080 locally before deploying

Investigation Notes¶

App Service Linux probes the container on the configured port via HTTP. If no response within the startup time limit, the platform stops the site.
The default expected port is 8080 for built-in images. Custom containers should set WEBSITES_PORT explicitly.
WEBSITES_PORT is the app setting you configure; PORT is the environment variable the platform injects into the container. They serve different roles: set WEBSITES_PORT to tell the platform, and read PORT in your code for the listen target.
WEBSITES_CONTAINER_START_TIME_LIMIT default is 230 seconds. Maximum is 1800 seconds.
The startup probe is NOT the same as a configured Health Check. The startup probe is a simple HTTP ping on the root port. Health Check uses a configured path and has different retry logic.
For more advanced warm-up control, WEBSITE_WARMUP_PATH lets you specify a custom path for startup probes, and WEBSITE_WARMUP_STATUSES lets you define which HTTP status codes count as successful warm-up responses (e.g., 200,202).
For Python apps: Gunicorn should use --bind 0.0.0.0:$PORT. Flask dev server defaults to 127.0.0.1 which WILL fail.
For Node.js apps: Express should use app.listen(process.env.PORT || 8080, '0.0.0.0').
SSH into the running container: Azure Portal → App Service → Development Tools → SSH (or use Kudu at https://.scm.azurewebsites.net)

Container Didn't Respond to HTTP Pings (Azure App Service Linux)¶

1. Summary¶

Symptom¶

Why this scenario is confusing¶

Troubleshooting decision flow¶

Limitations¶

Quick Conclusion¶

2. Common Misreadings¶

3. Competing Hypotheses¶

4. What to Check First¶

Metrics¶

Portal view: Web App Down detector (the first signal for HTTP ping failures)¶

Logs¶

Platform Signals¶

Portal view: Log stream (live container stdout/stderr)¶

5. Evidence to Collect¶

Required Evidence¶

Useful Context¶

CLI Investigation Commands¶

Sample Log Patterns¶

AppServiceConsoleLogs (container process did start)¶

AppServiceHTTPLogs (app became reachable when healthy)¶

AppServicePlatformLogs (lifecycle stop sequence)¶

KQL Queries with Example Output¶

Query 1: Confirm the app process bound to an externally reachable address¶

Query 2: Validate endpoint responses and latency once startup succeeds¶

Query 3: Platform stop timeline around startup investigation¶

Incident timeline template (copy into case notes)¶

Fast Verification Query (Post-fix)¶

6. Validation and Disproof by Hypothesis¶

H1: Port mismatch¶

H2: Binding to localhost¶

H3: Startup time exceeded¶

H4: Application crash during startup¶

H5: Missing or incorrect startup command¶

Normal vs Abnormal Comparison¶

Extended Triage Matrix¶

Normal vs Abnormal Endpoint Examples¶

7. Likely Root Cause Patterns¶

8. Immediate Mitigations¶

9. Prevention¶

Investigation Notes¶

See Also¶

Related Queries¶

Related Checklists¶

Related Labs¶

Sources¶