Cold Start and Scale-to-Zero Lab¶
Measure the latency impact of scale-to-zero, then compare it with a configuration that keeps a replica always ready.
Lab Metadata¶
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Estimated Duration | 30-40 minutes |
| Tier | Consumption or Dedicated |
| Failure Mode | First request after idle is slow because the app scaled to zero and must cold start a new replica |
| Skills Practiced | Scale configuration analysis, latency measurement, KQL correlation, revision comparison |
Architecture¶
flowchart TD
A[Client sends request after idle period] --> B{minReplicas setting}
B -->|0| C[Revision scaled to zero]
C --> D[Platform creates new replica]
D --> E[First request sees cold start latency]
B -->|1| F[Replica remains running]
F --> G[First request is served immediately]
E --> H[Measure latency in client output and logs]
G --> H
1) Question¶
Does Azure Container Apps introduce measurable first-request latency after an app scales to zero, and does keeping one always-ready replica remove that latency penalty?
2) Setup¶
This lab uses the existing baseline infrastructure pattern from ./labs/scale-rule-mismatch/infra/main.bicep, then changes scale settings to reproduce cold start behavior.
Run the following commands from the repository root so the relative Bicep template path resolves correctly.
export RESOURCE_GROUP="rg-aca-lab-coldstart"
export LOCATION="koreacentral"
export DEPLOYMENT_NAME="lab-cold-start"
az extension add --name containerapp --upgrade
az login
az group create \
--name "$RESOURCE_GROUP" \
--location "$LOCATION"
az deployment group create \
--name "$DEPLOYMENT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--template-file "./labs/scale-rule-mismatch/infra/main.bicep" \
--parameters baseName="coldstartlab"
Capture outputs for later steps:
export APP_NAME="$(az deployment group show \
--name "$DEPLOYMENT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query "properties.outputs.containerAppName.value" \
--output tsv)"
export ENVIRONMENT_NAME="$(az deployment group show \
--name "$DEPLOYMENT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query "properties.outputs.environmentName.value" \
--output tsv)"
export APP_URL="$(az deployment group show \
--name "$DEPLOYMENT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query "properties.outputs.containerAppUrl.value" \
--output tsv)"
export LOG_ANALYTICS_WORKSPACE_NAME="$(az deployment group show \
--name "$DEPLOYMENT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query "properties.outputs.logAnalyticsWorkspaceName.value" \
--output tsv)"
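An empty deployment output would make later commands fail in confusing ways, so it can help to check the captured variables before continuing. The following is a minimal sketch; `require_vars` is a hypothetical helper introduced here for illustration and is not part of the Azure CLI.

```shell
# Sketch: fail fast when a captured deployment output is empty.
# require_vars is a hypothetical helper, not an Azure CLI command.
# The function body runs in a subshell so its working variables do not leak.
require_vars() (
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Exit status is 0 only when every named variable is non-empty.
  [ "$missing" -eq 0 ]
)

# Example usage after the exports above:
# require_vars APP_NAME ENVIRONMENT_NAME APP_URL LOG_ANALYTICS_WORKSPACE_NAME || exit 1
```

If any name is reported as missing, re-run the corresponding `az deployment group show` query and confirm the template actually defines that output.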
Expected starting state: one revision exists and the app responds successfully over HTTPS.
| Command | Why it is used |
|---|---|
| az extension add --name containerapp --upgrade | Ensures the Azure Container Apps CLI extension is installed and current. |
| az login | Authenticates the Azure CLI session before creating resources. |
| az group create --name "$RESOURCE_GROUP" --location "$LOCATION" | Creates an isolated resource group for the lab. |
| az deployment group create --name "$DEPLOYMENT_NAME" --resource-group "$RESOURCE_GROUP" --template-file "./labs/scale-rule-mismatch/infra/main.bicep" --parameters baseName="coldstartlab" | Deploys the baseline Azure Container Apps environment, registry, workspace, and app. |
| az deployment group show ... --query "properties.outputs..." | Extracts deployment outputs needed for later steps. |
3) Hypothesis¶
IF minReplicas=0, THEN the first request after the app has been idle long enough to scale to zero will show higher latency than steady-state requests; IF minReplicas=1 is configured to keep one replica always ready, THEN the cold start latency spike is eliminated for the same test path.
| Variable | Control State | Experimental State |
|---|---|---|
| Minimum replicas | 1 | 0 |
| Idle behavior | One replica remains available | Revision can scale to zero |
| First request latency after idle | Near warm-request latency | Noticeably higher than warm-request latency |
| Plan support | Consumption or Dedicated | Consumption or Dedicated |
4) Prediction¶
- [Measured] The first request after scale-to-zero will have the highest observed latency in the test run.
- [Observed] az containerapp replica list will return zero running replicas before the cold-start request in the experimental state.
- [Measured] Warm requests immediately after the first request will be faster than the first request.
- [Observed] After changing to minReplicas=1, the app will keep one running replica and the latency spike will no longer appear in the same idle-window test.
5) Experiment¶
Configure two states against the same app revision pattern:
- Experimental state: allow scale-to-zero with minReplicas=0.
- Wait for idle scale-in and measure the first request.
- Send several follow-up warm requests and compare latency.
- Control state: update to minReplicas=1.
- Repeat the same idle-and-request sequence.
- Correlate request timing with replica state and log timestamps.
6) Execution¶
Configure the scale-to-zero state¶
az containerapp update \
--name "$APP_NAME" \
--resource-group "$RESOURCE_GROUP" \
--min-replicas 0 \
--max-replicas 3 \
--scale-rule-name "http-rule" \
--scale-rule-type "http" \
--scale-rule-http-concurrency 10
Confirm the active scale settings¶
az containerapp show \
--name "$APP_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query "properties.template.scale" \
--output json
Wait for the revision to scale in¶
while true; do
az containerapp replica list \
--name "$APP_NAME" \
--resource-group "$RESOURCE_GROUP" \
--output table
sleep 15
done
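As an alternative to watching the table output and interrupting the loop by hand, the poll can exit on its own once the replica count reaches zero. This is a minimal sketch: `wait_for_zero_replicas` is a hypothetical helper, the 900-second default timeout is an arbitrary assumption, and it assumes `az containerapp replica list` returns an empty list when the revision has scaled to zero.

```shell
# Sketch: poll until the app reports zero replicas, or give up after a timeout.
# wait_for_zero_replicas is a hypothetical helper; the defaults are assumptions.
# Usage: wait_for_zero_replicas [timeout_seconds] [interval_seconds]
wait_for_zero_replicas() (
  timeout_seconds="${1:-900}"
  interval_seconds="${2:-15}"
  elapsed=0
  while [ "$elapsed" -le "$timeout_seconds" ]; do
    # The JMESPath expression length(@) counts the replicas in the returned list.
    count="$(az containerapp replica list \
      --name "$APP_NAME" \
      --resource-group "$RESOURCE_GROUP" \
      --query "length(@)" \
      --output tsv)"
    echo "elapsed=${elapsed}s replicas=${count}"
    if [ "$count" = "0" ]; then
      exit 0
    fi
    sleep "$interval_seconds"
    elapsed=$((elapsed + interval_seconds))
  done
  echo "timed out after ${timeout_seconds}s" >&2
  exit 1
)

# Example usage:
# wait_for_zero_replicas 900 15 && echo "scaled to zero, ready for the cold request"
```

Polling the management API does not send HTTP traffic to the app, so the check itself should not keep a replica alive.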
When no replicas remain, stop the loop with Ctrl+C and measure the first request:
curl \
--silent \
--show-error \
--output /dev/null \
--write-out "first_request_after_idle total=%{time_total}s connect=%{time_connect}s starttransfer=%{time_starttransfer}s\n" \
"$APP_URL"
Collect warm-request samples immediately after the cold request:
for attempt in 1 2 3 4 5; do
curl \
--silent \
--show-error \
--output /dev/null \
--write-out "warm_request_${attempt} total=%{time_total}s connect=%{time_connect}s starttransfer=%{time_starttransfer}s\n" \
"$APP_URL"
done
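Because the same curl invocation is repeated for the cold, warm, and control measurements, it can be wrapped once so every sample uses identical options and an identical output format. The sketch below assumes a hypothetical helper named `measure_request`; the curl options and `--write-out` variables are the ones already used in this lab.

```shell
# Sketch: one wrapper for every latency sample in this lab.
# measure_request is a hypothetical helper name.
# Usage: measure_request <label> <url>
measure_request() (
  label="$1"
  url="$2"
  curl \
    --silent \
    --show-error \
    --output /dev/null \
    --write-out "${label} total=%{time_total}s connect=%{time_connect}s starttransfer=%{time_starttransfer}s\n" \
    "$url"
)

# Example usage:
# measure_request "first_request_after_idle" "$APP_URL"
# for attempt in 1 2 3 4 5; do measure_request "warm_request_${attempt}" "$APP_URL"; done
```

Appending `| tee -a latency.log` to each call (the file name is arbitrary) keeps the raw samples for later comparison.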
Apply the always-ready control state¶
az containerapp update \
--name "$APP_NAME" \
--resource-group "$RESOURCE_GROUP" \
--min-replicas 1 \
--max-replicas 3 \
--scale-rule-name "http-rule" \
--scale-rule-type "http" \
--scale-rule-http-concurrency 10
Wait at least the same idle interval, then repeat the request measurements:
az containerapp replica list \
--name "$APP_NAME" \
--resource-group "$RESOURCE_GROUP" \
--output table
curl \
--silent \
--show-error \
--output /dev/null \
--write-out "first_request_with_min1 total=%{time_total}s connect=%{time_connect}s starttransfer=%{time_starttransfer}s\n" \
"$APP_URL"
| Command | Why it is used |
|---|---|
| az containerapp update --name "$APP_NAME" --resource-group "$RESOURCE_GROUP" --min-replicas 0 ... | Reproduces scale-to-zero behavior for the experimental state. |
| az containerapp show --query "properties.template.scale" | Verifies the revision uses the intended minimum replica setting. |
| az containerapp replica list --output table in a loop | Confirms whether the revision has fully scaled in before the cold request. |
| curl --write-out ... "$APP_URL" | Measures first-request and warm-request latency from the client perspective. |
| az containerapp update --name "$APP_NAME" --resource-group "$RESOURCE_GROUP" --min-replicas 1 ... | Switches the app into the always-ready control state. |
7) Observation¶
Record raw platform signals and request output.
Replica state before the first test¶
az containerapp replica list \
--name "$APP_NAME" \
--resource-group "$RESOURCE_GROUP" \
--output table
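Expected experimental-state pattern: the command returns no running replicas before the cold request.
Revision and system log signals¶
az containerapp logs show \
--name "$APP_NAME" \
--resource-group "$RESOURCE_GROUP" \
--type system
Expected log pattern: replica creation or start events that align with the first cold request.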
Interpretation: [Observed] the platform creates a replica only when traffic arrives after idle scale-to-zero.
| Command | Why it is used |
|---|---|
| az containerapp replica list --name "$APP_NAME" --resource-group "$RESOURCE_GROUP" --output table | Captures the raw replica state before each measurement. |
| az containerapp logs show --name "$APP_NAME" --resource-group "$RESOURCE_GROUP" --type system | Captures scale, revision, and replica lifecycle events around the cold request. |
8) Measurement¶
Capture the latency delta between cold and warm requests.
| Measurement | Experimental State (minReplicas=0) | Control State (minReplicas=1) |
|---|---|---|
| First request total latency | Higher | Lower |
| Warm request total latency | Lower than first request | Similar to first request |
| Running replicas before request | 0 | 1 |
| Cold start gap | Present | Absent or negligible |
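To fill in the table with numbers rather than qualitative labels, the curl output from one test run can be reduced to a cold value, a warm mean, and their difference. The following is a rough sketch that assumes the line format produced by the `--write-out` strings in this lab (a label followed by `total=<seconds>s`); `summarize_latency` is a hypothetical helper, and it expects the samples from a single run on standard input.

```shell
# Sketch: summarize one run of curl samples read from standard input.
# summarize_latency is a hypothetical helper. It expects lines such as:
#   first_request_after_idle total=2.500000s connect=... starttransfer=...
#   warm_request_1 total=0.100000s connect=... starttransfer=...
summarize_latency() {
  awk '
    {
      # Find the total=<seconds>s field and strip the prefix and unit.
      total = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^total=/) {
          total = $i
          sub(/^total=/, "", total)
          sub(/s$/, "", total)
        }
      }
      if (total == "") next
      if ($1 ~ /^first_request/) { cold = total + 0; have_cold = 1 }
      else if ($1 ~ /^warm_request/) { warm_sum += total; warm_count++ }
    }
    END {
      if (!have_cold || warm_count == 0) { print "insufficient samples"; exit 1 }
      warm_mean = warm_sum / warm_count
      printf "cold=%.3fs warm_mean=%.3fs delta=%.3fs\n", cold, warm_mean, cold - warm_mean
    }
  '
}

# Example usage, assuming the samples were saved to a file of your choosing:
# summarize_latency < latency.log
```

Run it separately for the minReplicas=0 and minReplicas=1 samples, since both first-request labels begin with `first_request` and a mixed input would keep only the last one.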
Use these KQL queries to quantify the event.
KQL: system events around scale-from-zero¶
let AppName = "my-container-app";
ContainerAppSystemLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(2h)
| where Reason_s has_any ("Replica", "Revision") or Log_s has_any ("scale", "starting", "Started")
| project TimeGenerated, RevisionName_s, ReplicaName_s, Reason_s, Log_s
| order by TimeGenerated asc
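The query above hardcodes "my-container-app", which has to match the deployed app name before it returns anything. One way to avoid editing the query by hand is to generate it from `$APP_NAME`. This is a sketch only: `system_events_query` is a hypothetical helper, and the commented usage assumes the `az monitor log-analytics query` command is available in your CLI installation and that the workspace ID it expects is the workspace's customerId GUID.

```shell
# Sketch: print the system-events KQL with the app name substituted.
# system_events_query is a hypothetical helper name.
system_events_query() {
  cat <<EOF
let AppName = "$1";
ContainerAppSystemLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(2h)
| where Reason_s has_any ("Replica", "Revision") or Log_s has_any ("scale", "starting", "Started")
| project TimeGenerated, RevisionName_s, ReplicaName_s, Reason_s, Log_s
| order by TimeGenerated asc
EOF
}

# Example usage (assumes the commands below exist in your CLI version):
# WORKSPACE_ID="$(az monitor log-analytics workspace show \
#   --resource-group "$RESOURCE_GROUP" \
#   --workspace-name "$LOG_ANALYTICS_WORKSPACE_NAME" \
#   --query customerId --output tsv)"
# az monitor log-analytics query \
#   --workspace "$WORKSPACE_ID" \
#   --analytics-query "$(system_events_query "$APP_NAME")" \
#   --output table
```

Log Analytics ingestion is not instantaneous, so events for a request made moments ago may take a few minutes to appear.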
KQL: estimate startup duration from console logs¶
let AppName = "my-container-app";
let Startup =
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(2h)
| where Log_s has_any ("Starting", "Booting", "Initializing", "Launching")
| summarize arg_min(TimeGenerated, *) by RevisionName_s, ContainerGroupName_s;
let Ready =
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == AppName
| where TimeGenerated > ago(2h)
| where Log_s has_any ("Listening on", "Ready to accept connections", "Application started")
| summarize arg_min(TimeGenerated, *) by RevisionName_s, ContainerGroupName_s;
Startup
| join kind=inner Ready on RevisionName_s, ContainerGroupName_s
| extend startupDurationSeconds = datetime_diff('second', TimeGenerated1, TimeGenerated)
| project RevisionName_s, ContainerGroupName_s, startupAt=TimeGenerated, readyAt=TimeGenerated1, startupDurationSeconds
| order by startupDurationSeconds desc
KQL: compare first-hit versus steady-state request latency in Application Insights (optional)¶
Use this query only if Application Insights is already enabled for the app. The baseline lab deployment in this guide provisions Log Analytics, not Application Insights.
requests
| where timestamp > ago(2h)
| where cloud_RoleName == "my-container-app"
| extend UrlPath = tostring(parse_url(url).Path)
| summarize RequestCount=count(), P50=percentile(duration, 50), P95=percentile(duration, 95), P99=percentile(duration, 99) by UrlPath, bin(timestamp, 5m)
| order by timestamp asc
9) Analysis¶
- [Correlated] Higher first-request latency is meaningful only when it lines up with zero replicas before the request and replica-start events immediately afterward.
- [Measured] If warm requests are consistently fast while only the first request is slow, the problem is cold start latency rather than sustained performance degradation.
- [Inferred] When minReplicas=1 removes the spike without changing code or image, scale-to-zero behavior is the dominant variable.
- [Strongly Suggested] If latency remains high even with minReplicas=1, then startup code path, external dependency initialization, or image pull time likely contributes more than scale-to-zero alone.
10) Conclusion¶
The hypothesis is confirmed when the scale-to-zero state shows a slower first request after idle and the always-ready state does not. In that outcome, the latency penalty is tied to creating a new replica on demand rather than to ordinary request handling.
11) Falsification¶
The hypothesis is falsified if either of the following occurs:
- The first request after idle is not materially slower when minReplicas=0.
- The first request remains equally slow after changing to minReplicas=1 and verifying one replica stays running.
That result would mean the latency issue is not primarily caused by scale-to-zero and should shift investigation toward application startup, dependency warm-up, or network path delays.
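The confirmation and falsification conditions can be written down as an explicit decision rule so that the verdict does not depend on eyeballing the numbers. The sketch below is one possible reading, not a definitive test: `evaluate_hypothesis` is a hypothetical helper, and the ratio threshold that stands in for "materially slower" (default 2x the warm mean) is an assumption you should set for your own workload.

```shell
# Sketch: turn three measured latencies into a verdict.
# evaluate_hypothesis is a hypothetical helper; the threshold ratio is an assumption.
# Usage: evaluate_hypothesis <cold_first_s> <warm_mean_s> <control_first_s> [ratio]
evaluate_hypothesis() {
  awk -v cold="$1" -v warm="$2" -v control="$3" -v ratio="${4:-2}" 'BEGIN {
    # A first request counts as slow when it is at least ratio times the warm mean.
    cold_is_slow = (cold >= warm * ratio)
    control_is_slow = (control >= warm * ratio)
    if (!cold_is_slow)
      print "falsified: no material cold-start penalty with minReplicas=0"
    else if (control_is_slow)
      print "falsified: first request still slow with minReplicas=1"
    else
      print "confirmed: penalty present with minReplicas=0 and absent with minReplicas=1"
  }'
}

# Example usage with illustrative inputs (seconds):
# evaluate_hypothesis 2.5 0.2 0.25
```

A single run is a small sample, so repeating the idle-window test a few times before accepting either verdict is advisable.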
12) Evidence¶
| Evidence Source | Expected State |
|---|---|
| az containerapp show --query "properties.template.scale" | Shows minReplicas as 0 in the experimental phase and 1 in the control phase |
| az containerapp replica list --output table before cold request | No running replicas in the experimental phase |
| curl --write-out ... "$APP_URL" first request after idle | Highest total latency sample in the experimental phase |
| Follow-up curl requests | Lower warm-request latency than the first request |
| ContainerAppSystemLogs_CL KQL query | Replica creation or start events align with the first cold request |
| ContainerAppConsoleLogs_CL KQL query | Startup-to-ready interval is visible for cold-started replicas |
13) Solution¶
Use one of these mitigations based on the latency objective:
- Set minReplicas=1 when user-facing latency is more important than maximum scale-to-zero cost savings.
- Keep startup paths lightweight so new replicas become ready faster.
- Pre-initialize expensive dependencies during deployment validation instead of on the first live request.
- Measure changes with the same idle-window test before and after each tuning step.
14) Prevention¶
- Document whether the app is allowed to scale to zero as a deliberate design choice.
- Add a latency SLO check for first request after idle, not only warm steady-state traffic.
- Validate replica minimums during release reviews for user-facing HTTP apps.
- Keep the image size, dependency graph, and startup initialization path small enough to reduce cold start time.
- For environments where user-facing latency must stay predictable, prefer an always-ready configuration over pure cost optimization.
15) Takeaway¶
Scale-to-zero is working as designed when the first request after idle is slower. The operational question is not whether cold start exists, but whether the measured latency is acceptable for the workload.
16) Support Takeaway¶
When a customer reports “the first request is slow but everything after that is fast,” immediately compare minReplicas, replica count before the request, and startup timing in logs. If the app is allowed to scale to zero, reproduce the idle-window test before investigating deeper application issues.
Clean Up¶
| Command | Why it is used |
|---|---|
| az group delete --name "$RESOURCE_GROUP" --yes --no-wait | Removes all lab resources after evidence collection is complete. |