Probe Tuning

Probe tuning is about reducing false positives without hiding real failures.

Prerequisites

  • Access to system logs, console logs, and revision state
  • A known-good health endpoint or TCP listener
  • A rollback plan for any production probe change

Set the resource group and app name used in the commands for this guide:

export RG="rg-aca-prod"
export APP_NAME="app-python-api-prod"

When to Use

  • When probes fail during rollout or warm-up
  • When restart loops suggest liveness misconfiguration
  • When downstream dependencies make readiness checks too strict

Procedure

  1. Check system logs for probe failures and restart reasons.
  2. Confirm the endpoint, port, and scheme match the real application listener.
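  3. Adjust one timing parameter at a time (for example the initial delay, period, timeout, or failure threshold) so that any change in restart behavior can be attributed to it.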
  4. Deploy a new revision and compare restart behavior.
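
The steps above can be sketched with the Azure CLI. This is a minimal sketch, assuming the `containerapp` extension is installed and the session is logged in. The function names and the `filter_probe_events` keyword list are illustrative, not part of any tool, and the query assumes the probes of interest are on the first container in the template.

```shell
# Assumed defaults; override by exporting RG and APP_NAME beforehand.
RG="${RG:-rg-aca-prod}"
APP_NAME="${APP_NAME:-app-python-api-prod}"

# Step 1: print recent system log entries (platform events for the app).
show_system_logs() {
  az containerapp logs show \
    --name "$APP_NAME" --resource-group "$RG" \
    --type system --tail 100
}

# Hypothetical helper: keep only lines that mention probes or restarts.
# The keyword list is an assumption; adjust it to the wording in your logs.
filter_probe_events() {
  grep -Ei 'probe|restart|unhealthy' || true
}

# Step 2: print the probes configured on the first container of the template.
show_probe_config() {
  az containerapp show \
    --name "$APP_NAME" --resource-group "$RG" \
    --query "properties.template.containers[0].probes" --output yaml
}

# Step 4: list revisions so the new one can be compared with the previous one.
list_revisions() {
  az containerapp revision list \
    --name "$APP_NAME" --resource-group "$RG" --output table
}
```

A typical sequence would be `show_system_logs | filter_probe_events`, then `show_probe_config` to compare the configured path, port, and scheme with the real listener.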

Recommended conservative starting posture for many web apps:

  • keep startup more tolerant than liveness
  • keep readiness focused on traffic readiness, not every downstream dependency
  • keep liveness narrow enough to catch hangs, not slow warm-up
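
One way to express this posture is a probe fragment like the one written by the sketch below. Everything in it is an assumption for illustration: the file name, the `/healthz` and `/ready` paths, port 8000, and all timing values. Check the allowed ranges for each field in the platform documentation before applying anything similar.

```shell
# Write an illustrative probes fragment; it would sit under a container
# definition in the app template. All values are placeholders.
PROBE_FILE="${PROBE_FILE:-probes-example.yaml}"

cat > "$PROBE_FILE" <<'EOF'
probes:
  # Startup: most tolerant, covers slow warm-up (about 100s here).
  - type: Startup
    httpGet:
      path: /healthz
      port: 8000
    periodSeconds: 10
    failureThreshold: 10
    timeoutSeconds: 3
  # Readiness: answers "can this replica take traffic", not "is every dependency up".
  - type: Readiness
    httpGet:
      path: /ready
      port: 8000
    periodSeconds: 10
    failureThreshold: 3
    timeoutSeconds: 3
  # Liveness: narrow, meant to catch hangs after startup has succeeded.
  - type: Liveness
    httpGet:
      path: /healthz
      port: 8000
    periodSeconds: 10
    failureThreshold: 3
    timeoutSeconds: 3
EOF

echo "Wrote $PROBE_FILE"
```

The startup probe tolerates roughly three times as many failed checks as the liveness probe, which keeps slow warm-up from being mistaken for a hang.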

Microsoft Learn recommends tuning probe settings around real startup behavior rather than treating one timing profile as universal. If a revision needs longer to become healthy, adjust the probe settings to match the app's startup characteristics. In multiple revision mode, wait for readiness probes to succeed before shifting traffic.
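
To compare probe settings against measured startup time, a rough tolerance window can be estimated as initial delay plus period times failure threshold. The sketch below uses that approximation, which ignores per-attempt timeouts. The `probe_window` helper and the 70-second observed startup are hypothetical.

```shell
# probe_window INITIAL_DELAY PERIOD FAILURE_THRESHOLD
# Approximate seconds before the probe is treated as failed.
probe_window() {
  echo $(( $1 + $2 * $3 ))
}

startup_window=$(probe_window 5 10 10)    # 5 + 10 * 10 = 105
liveness_window=$(probe_window 0 10 3)    # 0 + 10 * 3  = 30
observed_startup=70                       # hypothetical measurement, in seconds

if [ "$observed_startup" -lt "$startup_window" ]; then
  echo "startup probe covers observed warm-up (${observed_startup}s < ${startup_window}s)"
else
  echo "startup probe too tight: widen the period or failure threshold"
fi
```

With these numbers the startup window (105s) exceeds the observed warm-up, while the liveness window stays at 30s, so a hang is still detected quickly once the app is up.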

flowchart TD
    A[Probe failures observed] --> B[Inspect system logs]
    B --> C[Check endpoint and startup timing]
    C --> D[Adjust probe thresholds]
    D --> E[Deploy new revision]
    E --> F[Compare readiness and restart behavior]

Verification

  • Confirm restart counts decrease after tuning.
  • Confirm the app becomes ready within the expected activation window.
  • Confirm liveness still detects genuine stuck or failed processes.
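
Restart counts can be compared per revision with a query like the sketch below. It assumes the replica listing exposes `restartCount` and `ready` for each container and that the app has a single container per replica. The `replica_restarts` name and the revision argument are illustrative.

```shell
# Assumed defaults; override by exporting RG and APP_NAME beforehand.
RG="${RG:-rg-aca-prod}"
APP_NAME="${APP_NAME:-app-python-api-prod}"

# replica_restarts REVISION_NAME
# Prints one row per replica with the first container's restart count and readiness.
replica_restarts() {
  az containerapp replica list \
    --name "$APP_NAME" --resource-group "$RG" --revision "$1" \
    --query "[].{replica:name, restarts:properties.containers[0].restartCount, ready:properties.containers[0].ready}" \
    --output table
}
```

Running `replica_restarts` for the previous revision and for the tuned revision gives a side-by-side view of whether restarts dropped.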

Rollback / Troubleshooting

  • If tuning makes detection too slow, roll back to the previous revision.
  • If readiness depends on unavailable downstream services, separate dependency checks from basic process readiness.
  • If failures persist, inspect image startup, secret loading, and dependency latency before widening thresholds further.
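
A rollback to the previous revision might look like the following sketch. It assumes the app runs in multiple revision mode with ingress enabled and that the known-good revision still exists. The `rollback_to` name is hypothetical, and the revision name must be supplied by the operator.

```shell
# Assumed defaults; override by exporting RG and APP_NAME beforehand.
RG="${RG:-rg-aca-prod}"
APP_NAME="${APP_NAME:-app-python-api-prod}"

# rollback_to KNOWN_GOOD_REVISION
# Reactivates the given revision, then routes all traffic to it.
rollback_to() {
  az containerapp revision activate \
    --name "$APP_NAME" --resource-group "$RG" --revision "$1" &&
  az containerapp ingress traffic set \
    --name "$APP_NAME" --resource-group "$RG" \
    --revision-weight "$1=100"
}
```

Routing traffic only after activation succeeds avoids sending requests to a revision that has no running replicas.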
