Cold Start¶
This guide explains cold start behavior in Azure Functions and practical mitigation techniques per hosting plan. Cold start is the extra startup latency incurred when a request or event arrives and no warm instance is available to serve it.
Platform Guide
For scaling architecture and plan comparison, see Scaling.
Language Guide
For Python deployment specifics, see the Python Tutorial.
Prerequisites¶
Before tuning cold start in production, prepare these inputs and permissions:
- Azure CLI 2.60.0 or later with the functionapp, monitor, and resource commands.
- A Function App running on a Consumption, Flex Consumption, Premium, or Dedicated plan.
- Application Insights enabled for the Function App.
- Contributor or higher RBAC on the Function App and plan resources.
- A baseline latency objective (for example, p95 < 1.5s for HTTP endpoints).

Use long-form CLI flags and keep sensitive values masked in runbooks.
When to Use¶
Invest in cold start mitigation when at least one condition is true:
- Interactive HTTP APIs show first-hit latency that violates your SLO.
- Burst traffic patterns trigger repeated scale-out and startup penalties.
- Background trigger latency affects queue drain targets or downstream SLAs.
- Deployment slots swap correctly, but first requests still spike.
- Cost or platform constraints prevent immediate migration to a larger plan tier.
Procedure¶
1) Understand startup phases¶
What causes cold start¶
Cold start usually combines several phases:
1. Worker instance allocation.
2. Functions host startup.
3. Language worker startup.
4. Application initialization (dependency loading, configuration reads, SDK client construction).

Non-HTTP triggers can also experience startup latency when the scale controller activates new workers.
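Phase 4 (application initialization) is the part you control directly. A minimal sketch of instrumenting it, assuming nothing about your runtime beyond the standard library: time the module-import section and wrap individual setup steps so their durations can be logged and attributed.

```python
# Sketch: attribute cold-start time to the app-init phase by timing
# module imports and individual setup steps. The labels and structure
# here are illustrative, not an official instrumentation API.
import time

_import_started = time.perf_counter()

# ... your heavy imports would go here ...

IMPORT_SECONDS = time.perf_counter() - _import_started


def timed(label, fn, *args, **kwargs):
    """Run one init step and return (result, elapsed_seconds) for logging."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    # In a real app you would log `label` and `elapsed` to Application Insights.
    return result, elapsed
```

Logging these durations per phase makes it clear whether cold start is dominated by imports, configuration reads, or client construction before you spend money on warm capacity.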
Cold start phase timeline¶
sequenceDiagram
autonumber
participant C as Client/Trigger
participant S as Scale Controller
participant W as Worker Allocation
participant H as Host Startup
participant L as Language Worker
participant A as App Init
C->>S: New request/event
S->>W: Allocate worker instance
W->>H: Start Functions host
H->>L: Start language runtime
L->>A: Load dependencies and config
A-->>C: First request response
2) Select plan-specific mitigation¶
Plan-specific mitigation options¶
| Hosting plan | Mitigation controls |
|---|---|
| Consumption | Optimize startup path, reduce package size, avoid heavy initialization |
| Flex Consumption | Configure always-ready instances for selected functions or app behavior |
| Premium | Use always-ready and pre-warmed instances |
| Dedicated (App Service Plan) | Enable Always On to keep workers active |
Consumption¶
Use Consumption when cost efficiency is primary and latency tolerance is moderate. Apply startup optimization first; for example, enable run-from-package deployment so the app starts from an immutable artifact:
az functionapp config appsettings set \
--resource-group <resource-group> \
--name <function-app-name> \
--settings WEBSITE_RUN_FROM_PACKAGE=1
Flex Consumption (FC1)¶
Flex Consumption supports always-ready instances to reduce cold-start frequency. Use this when low-latency response is required without moving to Premium. Check always-ready settings:
az functionapp scale config show \
--resource-group <resource-group> \
--name <function-app-name> \
--query "alwaysReady" \
--output json
az functionapp scale config always-ready set \
--resource-group <resource-group> \
--name <function-app-name> \
--settings http=2
Premium (EP)¶
Premium supports always-ready instances and pre-warmed capacity for scale-out events. This is the strongest built-in mitigation for latency-sensitive serverless workloads. Premium always-ready and pre-warmed settings are configured in the Azure portal or with ARM/Bicep templates (CLI support varies by API version):
# Premium always-ready/pre-warmed configuration:
# Use Azure portal (Function App > Scale out)
# or ARM/Bicep properties for the plan.
az resource show \
--resource-group <resource-group> \
--name <function-app-name>/web \
--resource-type Microsoft.Web/sites/config \
--api-version 2023-12-01 \
--query "properties.{preWarmedInstanceCount:preWarmedInstanceCount,minimumElasticInstanceCount:minimumElasticInstanceCount}" \
--output json
az resource update \
--resource-group <resource-group> \
--name <function-app-name>/web \
--resource-type Microsoft.Web/sites/config \
--api-version 2023-12-01 \
--set properties.minimumElasticInstanceCount=2 \
--set properties.preWarmedInstanceCount=1
Dedicated plan¶
Enable Always On so the app remains loaded:
az functionapp config set \
--resource-group <resource-group> \
--name <function-app-name> \
--always-on true
az functionapp config show \
--resource-group <resource-group> \
--name <function-app-name> \
--query "alwaysOn" \
--output tsv
3) Optimize startup path¶
Startup optimization techniques¶
Apply these regardless of hosting plan:
- Keep the deployment artifact small.
- Remove unused dependencies.
- Delay expensive initialization until first use where safe.
- Reuse SDK clients and HTTP connections.
- Avoid synchronous network calls in module initialization.
Dependency and package optimization¶
Operationally, dependency volume and native package loading often dominate startup time. Recommended practices:
- Audit dependencies quarterly and remove unused packages.
- Prefer lighter libraries for common operations.
- Use run-from-package deployment for immutable and consistent startup files.
- Keep extension bundle and worker runtime versions current and supported.
Warmup and traffic shaping¶
For plans without strong always-ready guarantees, use conservative warmup patterns:
- Scheduled health pings to keep the app active where appropriate.
- Gradual traffic ramp during deployments.
- Pre-deployment synthetic checks before traffic cutover.
Cost tradeoff
Warmup strategies can increase baseline cost. Tune against your latency SLO and budget.
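A minimal sketch of the scheduled-ping pattern, with the probe injected as a callable so the retry logic stands alone. In production the probe would hit a lightweight health endpoint (a hypothetical /api/health route); the attempt counts and delays here are illustrative defaults.

```python
# Sketch of a conservative warmup ping with exponential backoff. The probe
# is injected as a callable; in production it would issue an HTTP GET to a
# hypothetical health endpoint and return True on a 2xx response.
import time


def warm_up(probe, attempts=3, base_delay=0.5):
    """Call probe() until it returns True or attempts are exhausted.

    Returns True if the app answered. The delay between attempts grows
    exponentially (base_delay, 2x, 4x, ...), which avoids hammering an
    app that is mid cold start.
    """
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(base_delay * (2 ** attempt))
    return False
```

Run this from a scheduler (a timer trigger, a deployment pipeline step, or an external monitor) rather than in request handlers, and keep the ping interval just under your plan's idle timeout to avoid paying for pings that add no warmth.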
4) Apply decision flow¶
Operational decision flow¶
- If latency SLO is moderate, optimize startup path first.
- If p95 remains unstable, enable plan-native warm capacity.
- If strict low-latency is mandatory, use Premium or Dedicated with warm strategy.
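The bullets above can be condensed into a small decision helper; the plan names and returned action strings below are illustrative labels for this guide's flow, not an official taxonomy.

```python
# Sketch of the operational decision flow as a lookup: given the current
# SLO status, optimization state, and plan, suggest the next mitigation
# step. Action strings mirror the flowchart below and are illustrative.
def next_action(p95_within_slo, startup_optimized, plan):
    """Suggest the next cold-start mitigation step."""
    if p95_within_slo:
        return "monitor"
    if not startup_optimized:
        return "optimize startup path"
    return {
        "consumption": "move to Flex always-ready or Premium",
        "flex": "increase always-ready instances",
        "premium": "tune minimum elastic and pre-warmed counts",
        "dedicated": "enable Always On",
    }.get(plan, "escalate architecture review")
```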
flowchart TD
A[Start: latency issue identified] --> B{Is p95 within SLO?}
B -->|Yes| C[Keep current setup and monitor weekly]
B -->|No| D{Startup path optimized?}
D -->|No| E["Reduce package/init costs"]
E --> F[Observe 24-72 hours]
F --> G{Improved?}
G -->|Yes| C
G -->|No| H{Hosting plan}
D -->|Yes| H
H -->|Consumption| I[Move to Flex always-ready or Premium]
H -->|Flex| J[Increase always-ready instances]
H -->|Premium| K[Tune minimum elastic + pre-warmed]
H -->|Dedicated| L[Enable Always On]
I --> M[Re-measure p95 and p99]
J --> M
K --> M
L --> M
M --> N{SLO met at acceptable cost?}
N -->|Yes| O[Adopt baseline]
N -->|No| P[Escalate architecture review]
5) Compare expected outcomes¶
The following cold-start duration ranges are operational estimates from field experience and are not official Microsoft benchmark values.
| Hosting plan | Optimization level | Typical cold start duration (HTTP) | Expected operational outcome |
|---|---|---|---|
| Consumption | None | 2s-15s | Lowest cost, highest startup variability |
| Consumption | Startup optimized | 1.5s-8s | Better median latency, cold starts still frequent |
| Flex Consumption | Startup optimized + always-ready | 0.5s-3s | Lower cold-start frequency with moderate baseline cost |
| Premium | Always-ready + pre-warmed tuned | 0.2s-1.5s | Strong latency consistency under burst and scale-out |
| Dedicated | Always On enabled | 0.2s-1.2s | Near-warm behavior for most traffic patterns |
Verification¶
Measure latency trend¶
Track cold-start symptoms with KQL and metrics:
requests
| where timestamp > ago(24h)
| summarize p95_duration=percentile(duration, 95), p99_duration=percentile(duration, 99) by bin(timestamp, 5m)
| render timechart
Measure inferred cold start duration¶
Use this query to estimate cold-start duration when request gaps indicate worker inactivity:
let inactivityThresholdMinutes = 10;
requests
| where timestamp > ago(24h)
| where success == true
| project timestamp, operation_Name, cloud_RoleInstance, duration
| order by cloud_RoleInstance asc, timestamp asc
| serialize
| extend prevTs = prev(timestamp), prevInstance = prev(cloud_RoleInstance)
| extend gapMinutes = iif(cloud_RoleInstance == prevInstance, datetime_diff('minute', timestamp, prevTs), 0)
| where gapMinutes >= inactivityThresholdMinutes
| summarize coldStartCount=count(), p50ColdStart=percentile(duration, 50), p95ColdStart=percentile(duration, 95), maxColdStart=max(duration) by operation_Name
| order by p95ColdStart desc
View startup metrics from Azure Monitor¶
az monitor metrics list \
--resource "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Web/sites/<function-app-name>" \
--metric "AverageResponseTime" "Requests" "FunctionExecutionUnits" \
--interval PT5M \
--aggregation Average \
--start-time 2026-01-15T00:00:00Z \
--end-time 2026-01-15T02:00:00Z \
--output table
Name TimeGrain Average
---------------------- ----------- -------
AverageResponseTime 00:05:00 0.31
Requests 00:05:00 52
FunctionExecutionUnits 00:05:00 1.47
Rollback / Troubleshooting¶
If mitigation does not improve latency, run this sequence:
1. Verify settings are persisted on the target resource.
2. Re-check app initialization for blocking network calls and heavy package loads.
3. Roll back warm-capacity changes if cost increased without measurable benefit.
4. Compare before/after KQL evidence across a 24-hour window.
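Step 4's before/after comparison can be sketched with the standard library alone. The inputs are two samples of request durations in milliseconds (for example, exported from the KQL queries in the Verification section); the percentile method and threshold are illustrative choices.

```python
# Sketch of the before/after comparison in step 4: compute p95 for two
# latency samples (milliseconds) and report whether the change helped.
import statistics


def p95(samples_ms):
    """95th percentile: quantiles with n=20 yields 19 cut points; index 18 is 95%."""
    return statistics.quantiles(samples_ms, n=20)[18]


def improved(before_ms, after_ms, min_gain_ms=0.0):
    """True if the after-sample p95 beats the before-sample p95 by min_gain_ms."""
    return p95(before_ms) - p95(after_ms) >= min_gain_ms
```

Setting `min_gain_ms` above zero guards against declaring victory over noise; pick it relative to your SLO margin rather than the raw p95 values.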
Common checks and rollback commands, in the order shown below:
- Dedicated: if Always On is not effective, confirm the plan type, then disable Always On to roll back.
- Flex Consumption: reset the always-ready instance count to zero.
- Premium: restore the minimum elastic and pre-warmed instance counts to their defaults.
az functionapp config set \
--resource-group <resource-group> \
--name <function-app-name> \
--always-on false
az resource update \
--resource-group <resource-group> \
--name <function-app-name> \
--resource-type Microsoft.Web/sites \
--api-version 2023-12-01 \
--set properties.functionAppConfig.scaleAndConcurrency.alwaysReady.instanceCount=0
az resource update \
--resource-group <resource-group> \
--name <function-app-name>/web \
--resource-type Microsoft.Web/sites/config \
--api-version 2023-12-01 \
--set properties.minimumElasticInstanceCount=1 \
--set properties.preWarmedInstanceCount=0