Cold Start and Dependency Initialization
Status: Draft - Awaiting Execution
This experiment design is complete, but the measurements in Results are SIMULATED and based on documented Azure Functions behavior plus reasonable engineering assumptions. No live customer or lab measurements are claimed on this page.
Execution Blocked
Attempted execution on 2026-04-07 was blocked by Azure subscription policy.
Root Cause: Organization policy enforces Microsoft.Storage/storageAccounts/allowSharedKeyAccess = false on all storage accounts. This policy cannot be overridden at the resource level.
Error: "Shared key access is not permitted because storage SAS and Account Key are disabled by storage policy."
Attempts Made:
- Created new storage account with `--allow-shared-key-access true` → Policy overrode setting to `false`
- Tried updating existing storage account → Same result
- Attempted Function App creation with managed identity → Failed (storage file share creation requires shared key access)
Resolution Required: Either a policy exemption or access to a subscription without this policy restriction. Flex Consumption with managed identity for storage may work in future Azure CLI versions.
1. Question
What is the relative contribution of host startup, package restoration, framework initialization, and application code execution to total cold start duration on Azure Functions for Python 3.11 and Node.js 20 on Consumption and Flex Consumption?
2. Why this matters
Customers often report that the first request after idle is slow, but the remediation depends on which startup phase dominates. If most delay comes from package restore or dependency loading, code-path optimization alone will not materially improve cold start. If the dominant phase is application initialization, support should focus on import graphs, lazy loading, and startup side effects instead of platform scaling behavior.
3. Customer symptom
- "The first HTTP request after a few minutes of inactivity takes 6-15 seconds."
- "Warm requests are fast, but cold requests are inconsistent across deployments."
- "Python gets much slower when we add packages, even though the function body is tiny."
4. Hypothesis
Cold start on Azure Functions is additive across four phases:
- Host startup contributes a mostly fixed platform cost.
- Package restore / package acquisition contributes the largest variable cost when deployment artifacts or dependency trees are large.
- Framework initialization adds a smaller language/runtime cost.
- Application code initialization scales with import side effects, global object construction, and startup work.
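Stated as a formula, the additive hypothesis might be written as below. The symbols are shorthand introduced here for illustration, not notation from the Azure Functions documentation, and the residual term is an assumption that covers time not attributed to any of the four phases.

```latex
T_{\text{cold}} \approx T_{\text{host}} + T_{\text{pkg}} + T_{\text{fw}} + T_{\text{app}} + \varepsilon
```

Under this reading, a phase "dominates" a profile when its term is the largest share of the total.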
Expected outcome:
- Minimal dependency / fast init apps are dominated by host startup.
- Heavy dependency apps are dominated by package restore or dependency load.
- Slow init apps are dominated by application code even when the function body itself is trivial.
- Flex Consumption should show lower and more stable cold start than classic Consumption because Microsoft documents reduced cold starts and optional always-ready instances for Flex Consumption.
5. Environment
| Parameter | Value |
|---|---|
| Service | Azure Functions |
| Plans | Consumption, Flex Consumption |
| Region | koreacentral |
| Runtime | Python 3.11, Node.js 20 |
| Trigger | HTTP trigger |
| OS | Linux |
| Date designed | 2026-04-02 |
| Always ready | 0 for baseline comparison |
| Deployment shape | Same function logic, varying dependency and init profile |
6. Variables
Controlled
- Plan type: Consumption vs. Flex Consumption
- Runtime: Python 3.11 vs. Node.js 20
- Dependency count: minimal (~2), moderate (~10), heavy (30+)
- Init complexity: fast (~100 ms) vs. slow (~2 s)
- Region: koreacentral
- Trigger type: HTTP
- Test cadence: idle window long enough to force scale-to-zero behavior before next request
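The controlled variables can be enumerated as a test matrix. The following is a minimal sketch; the constant names and the `build_test_matrix` helper are hypothetical, and the dictionary keys are chosen only to line up with the custom dimensions used in Query 4.

```python
from itertools import product

PLANS = ("Consumption", "Flex Consumption")
RUNTIMES = ("Python 3.11", "Node.js 20")
DEPENDENCY_PROFILES = ("minimal", "moderate", "heavy")
INIT_PROFILES = ("fast", "slow")


def build_test_matrix():
    """Return one dict per plan/runtime/dependency/init combination."""
    return [
        {
            "planType": plan,
            "runtime": runtime,
            "dependencyProfile": deps,
            "initProfile": init,
        }
        for plan, runtime, deps, init in product(
            PLANS, RUNTIMES, DEPENDENCY_PROFILES, INIT_PROFILES
        )
    ]


if __name__ == "__main__":
    # 2 plans x 2 runtimes x 3 dependency levels x 2 init levels
    print(len(build_test_matrix()))
```

Region and trigger type are held constant, so they are left out of the matrix.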
Observed
- Total cold start duration to first successful HTTP response
- Time from first platform trace to worker ready
- Time spent restoring/acquiring package content
- Time spent in language/framework import or bootstrap
- Time spent in application-level initialization before first handler execution
- Variance across repeated cold starts per configuration
7. Instrumentation
Telemetry sources
- Application Insights requests
- Application Insights traces
- Azure Functions runtime logs in Log Analytics / Application Insights
- Custom trace markers emitted from app startup and first invocation
Trace markers to add
Use consistent custom dimensions on all startup traces:
| Marker | Meaning | Example customDimensions |
|---|---|---|
| `coldstart.test.begin` | First line emitted when worker process begins app bootstrap | `phase=app-bootstrap` |
| `coldstart.imports.begin` | Before heavy imports / module loads | `phase=framework-init` |
| `coldstart.imports.end` | After imports complete | `phase=framework-init` |
| `coldstart.appinit.begin` | Before app-level singleton construction | `phase=app-init` |
| `coldstart.appinit.end` | After app-level initialization | `phase=app-init` |
| `coldstart.handler.begin` | First request enters function handler | `phase=handler` |
| `coldstart.handler.end` | First request completed | `phase=handler` |
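One way to emit these markers from a Python app is sketched below using only the standard `logging` module. The `emit_marker` and `handle_request` names are hypothetical, the `json` import and the `time.sleep` call are stand-ins for real heavy imports and startup work, and whether the `extra` fields surface as `customDimensions` depends on the telemetry exporter in use, which this sketch does not configure.

```python
import logging
import time

logger = logging.getLogger("coldstart")


def emit_marker(name: str, phase: str) -> None:
    """Log one startup marker; the marker name is the message text."""
    # `extra` attaches attributes to the LogRecord. Mapping them to
    # customDimensions is exporter-specific (assumption, not shown here).
    # Assumes the host or caller enables log level INFO or lower.
    logger.info(name, extra={"phase": phase})


# Module-level code runs once per worker process, i.e. during a cold start.
emit_marker("coldstart.test.begin", "app-bootstrap")

emit_marker("coldstart.imports.begin", "framework-init")
import json  # stand-in for the heavy imports of a real profile
emit_marker("coldstart.imports.end", "framework-init")

emit_marker("coldstart.appinit.begin", "app-init")
time.sleep(0.1)  # stand-in for singleton construction (fast-init profile)
emit_marker("coldstart.appinit.end", "app-init")


def handle_request() -> str:
    """Hypothetical handler body; kept trivial so startup dominates."""
    emit_marker("coldstart.handler.begin", "handler")
    body = json.dumps({"status": "ok"})
    emit_marker("coldstart.handler.end", "handler")
    return body
```

Keeping the marker name as the exact message text lets the analysis queries match on `message == "coldstart.imports.begin"` without parsing.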
Derived phase model
- Host startup = first platform startup trace -> first worker/user trace
- Package restore = deployment package acquisition / package mount / dependency acquisition window inferred from platform traces before user code becomes visible
- Framework init = `coldstart.imports.begin` -> `coldstart.imports.end`
- App code init = `coldstart.appinit.begin` -> `coldstart.appinit.end`
- Handler execution = `coldstart.handler.begin` -> `coldstart.handler.end`
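The marker-based part of the phase model can be sketched as a small function over marker timestamps. The `PHASES` mapping and `phase_durations_ms` helper are hypothetical names; host startup and package restore are omitted because they are inferred from platform traces rather than custom markers. Markers emitted during module load may not carry the request's `operation_Id`, so collecting them may need a time-window match rather than an exact correlation.

```python
from datetime import datetime, timedelta

PHASES = {
    "frameworkInitMs": ("coldstart.imports.begin", "coldstart.imports.end"),
    "appInitMs": ("coldstart.appinit.begin", "coldstart.appinit.end"),
    "handlerMs": ("coldstart.handler.begin", "coldstart.handler.end"),
}


def phase_durations_ms(markers: dict) -> dict:
    """Map each phase to end minus begin in whole milliseconds.

    `markers` maps marker name -> datetime. A phase with a missing
    marker is reported as None instead of a misleading zero.
    """
    result = {}
    for phase, (begin, end) in PHASES.items():
        if begin in markers and end in markers:
            result[phase] = (markers[end] - markers[begin]) // timedelta(milliseconds=1)
        else:
            result[phase] = None
    return result


if __name__ == "__main__":
    t0 = datetime(2026, 4, 7, 9, 0, 0)
    example = {
        "coldstart.imports.begin": t0,
        "coldstart.imports.end": t0 + timedelta(milliseconds=320),
        "coldstart.appinit.begin": t0 + timedelta(milliseconds=320),
        "coldstart.appinit.end": t0 + timedelta(milliseconds=440),
    }
    print(phase_durations_ms(example))
```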
KQL queries for analysis
Query 1: Find candidate cold requests
requests
| where cloud_RoleName =~ "<function-app-name>"
| where name startswith "GET /api/coldstart"
| order by timestamp asc
| serialize
| extend previousRequest = prev(timestamp)
| extend idleGapMinutes = datetime_diff('minute', timestamp, previousRequest)
| extend isColdCandidate = iff(isnull(previousRequest) or idleGapMinutes >= 10, true, false)
| project timestamp, operation_Id, duration, resultCode, success, idleGapMinutes, isColdCandidate
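The same idle-gap rule can be applied offline to exported request timestamps. This is a minimal sketch with a hypothetical `flag_cold_candidates` helper; it compares elapsed time against the threshold, so results may differ slightly from the KQL `datetime_diff` value near minute boundaries.

```python
from datetime import datetime, timedelta


def flag_cold_candidates(timestamps, min_idle=timedelta(minutes=10)):
    """Return (timestamp, idle_gap, is_cold_candidate) tuples in time order.

    The first request has no predecessor and is always a candidate,
    mirroring the isnull(previousRequest) branch of the query.
    """
    flagged = []
    previous = None
    for ts in sorted(timestamps):
        gap = None if previous is None else ts - previous
        is_cold = gap is None or gap >= min_idle
        flagged.append((ts, gap, is_cold))
        previous = ts
    return flagged


if __name__ == "__main__":
    start = datetime(2026, 4, 7, 9, 0, 0)
    for row in flag_cold_candidates(
        [start, start + timedelta(minutes=2), start + timedelta(minutes=15)]
    ):
        print(row)
```

An idle gap only makes a request a candidate; confirming a real cold start still requires host or worker startup traces near the request.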
Query 2: Reconstruct startup markers for one cold invocation
traces
| where cloud_RoleName =~ "<function-app-name>"
| where operation_Id == "<operation-id>"
| where message startswith "coldstart." or message has "Host started" or message has "Worker process started"
| project timestamp, message, severityLevel, customDimensions
| order by timestamp asc
Query 3: Estimate phase durations from custom markers
let targetOperation = "<operation-id>";
let markerTimes = traces
| where cloud_RoleName =~ "<function-app-name>"
| where operation_Id == targetOperation
| summarize
importsBegin = minif(timestamp, message == "coldstart.imports.begin"),
importsEnd = minif(timestamp, message == "coldstart.imports.end"),
appInitBegin = minif(timestamp, message == "coldstart.appinit.begin"),
appInitEnd = minif(timestamp, message == "coldstart.appinit.end"),
handlerBegin = minif(timestamp, message == "coldstart.handler.begin"),
handlerEnd = minif(timestamp, message == "coldstart.handler.end");
markerTimes
| extend frameworkInitMs = datetime_diff('millisecond', importsEnd, importsBegin)
| extend appInitMs = datetime_diff('millisecond', appInitEnd, appInitBegin)
| extend handlerMs = datetime_diff('millisecond', handlerEnd, handlerBegin)
Query 4: Compare cold starts by profile
requests
| where name startswith "GET /api/coldstart"
| extend dependencyProfile = tostring(customDimensions["dependencyProfile"])
| extend initProfile = tostring(customDimensions["initProfile"])
| extend planType = tostring(customDimensions["planType"])
| extend runtime = tostring(customDimensions["runtime"])
| summarize
coldCount = count(),
avgMs = avg(duration),
p50Ms = percentile(duration, 50),
p95Ms = percentile(duration, 95),
maxMs = max(duration)
by planType, runtime, dependencyProfile, initProfile
| order by runtime asc, planType asc, dependencyProfile asc, initProfile asc
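For exported results, a comparable per-profile summary can be computed locally. The sketch below assumes each row is a dict with the same profile keys as the query plus a hypothetical `durationMs` field; it uses `statistics.quantiles`, whose interpolation will not match the KQL `percentile` estimate exactly.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize_by_profile(rows):
    """Group rows by profile and return count, median, and p95 of durationMs."""
    groups = defaultdict(list)
    for row in rows:
        key = (
            row["planType"],
            row["runtime"],
            row["dependencyProfile"],
            row["initProfile"],
        )
        groups[key].append(row["durationMs"])

    summary = {}
    for key, durations in sorted(groups.items()):
        if len(durations) > 1:
            # n=20 yields 19 cut points; the last one is the 95th percentile.
            p95 = quantiles(durations, n=20, method="inclusive")[-1]
        else:
            p95 = durations[0]  # quantiles() needs at least two data points
        summary[key] = {
            "coldCount": len(durations),
            "p50Ms": median(durations),
            "p95Ms": p95,
        }
    return summary
```

With only 10 cold starts per profile, the p95 is effectively set by the slowest one or two runs, so it is best read alongside the median rather than alone.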
8. Procedure
- Create four Function Apps per runtime/plan combination so that each app maps to one startup profile family.
- Keep the HTTP handler body constant and lightweight so startup dominates total latency.
- Build six test profiles per runtime:
  - minimal dependencies + fast init
  - minimal dependencies + slow init
  - moderate dependencies + fast init
  - moderate dependencies + slow init
  - heavy dependencies + fast init
  - heavy dependencies + slow init
- Configure Application Insights and ensure sampling does not drop request or trace telemetry for the test window.
- Add the custom trace markers listed in Instrumentation.
- Deploy all apps to `koreacentral` on the same day with the same Functions runtime major version.
- For Flex Consumption baseline, set always-ready instance count to `0` so the test still measures cold start.
- After deployment, send one warm-up request only to validate health, then wait long enough for the app to scale to zero.
- Trigger a single HTTP request to `/api/coldstart` and capture the resulting `operation_Id`.
- Repeat the idle -> single request cycle at least 10 times per profile.
- Run the KQL queries to reconstruct phase durations per cold request.
- Aggregate median and p95 durations by runtime, plan, dependency profile, and init profile.
- Separately repeat one Flex Consumption profile with always-ready enabled to validate whether startup phases collapse as expected.
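The idle -> single request cycle in the steps above could be driven by a small client-side loop. This is a sketch under stated assumptions: `run_cold_cycles` and `http_get_status` are hypothetical helpers, the 900-second idle default is one choice within a 10 to 15 minute window, and the measured value is client-observed latency only. The `operation_Id` still has to be looked up in Application Insights afterwards.

```python
import time
import urllib.request


def http_get_status(url: str) -> int:
    """Send one GET and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.status


def run_cold_cycles(url, cycles=10, idle_seconds=900, send=http_get_status,
                    sleep=time.sleep, clock=time.perf_counter):
    """Idle, send one request, record client-side latency; repeat `cycles` times.

    `send`, `sleep`, and `clock` are injectable so the loop can be
    dry-run without a network or real waiting.
    """
    samples = []
    for _ in range(cycles):
        sleep(idle_seconds)  # give the app time to scale to zero
        started = clock()
        status = send(url)
        samples.append(
            {"status": status, "durationMs": (clock() - started) * 1000}
        )
    return samples
```

A dry run with stubbed `send` and `sleep` is a cheap way to confirm the loop and result shape before committing to several hours of real idle time.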
Managed identity workaround for storage policy
The original setup path for this experiment assumed shared-key based storage access. In this environment, org policy enforces Microsoft.Storage/storageAccounts/allowSharedKeyAccess = false, so shared-key access is blocked.
Workaround: configure the Function App to use managed identity for storage access.
# Assign Storage Blob Data Owner (the role Microsoft recommends for host storage) to the Function App's managed identity
az role assignment create \
--assignee "$FUNCTION_APP_PRINCIPAL_ID" \
--role "Storage Blob Data Owner" \
--scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RG/providers/Microsoft.Storage/storageAccounts/$STORAGE_NAME"
# Update function app to use identity-based connection
az functionapp config appsettings set \
--resource-group "$RG" \
--name "$APP_NAME" \
--settings "AzureWebJobsStorage__accountName=$STORAGE_NAME"
9. Expected signal
If the hypothesis is correct, the data should show:
- Host startup clustering in the 500 ms to 2 s range across most profiles.
- Package-related time near 0-1 s for minimal dependencies but several seconds for heavy dependency sets.
- Framework initialization usually below 1 s, with Python slightly more sensitive to import-heavy packages.
- App code initialization increasing by roughly the injected startup delay for the slow-init profile.
- Flex Consumption cold starts lower than Consumption for the same profile, especially when package/setup cost is not dominant.
sequenceDiagram
participant Client
participant FrontEnd as Azure Functions Front End
participant Host as Functions Host
participant Worker as Language Worker
participant App as User App Startup
participant Handler as Function Handler
Client->>FrontEnd: First HTTP request after idle
FrontEnd->>Host: Allocate instance / start host
Host->>Host: Host startup
Host->>Worker: Start language worker
Worker->>Worker: Package acquisition / dependency load
Worker->>App: Framework import / bootstrap
App->>App: Application initialization
App->>Handler: Invoke first request
Handler-->>Client: First HTTP response
10. Results
SIMULATED RESULTS
The tables in this section are realistic placeholders for planning and support-readiness purposes. They are not lab measurements.
Simulated median cold-start breakdown by profile
| Runtime | Plan | Dependencies | Init profile | Host startup (ms) | Package restore (ms) | Framework init (ms) | App init (ms) | First handler (ms) | Total (ms) |
|---|---|---|---|---|---|---|---|---|---|
| Node.js 20 | Consumption | Minimal (2) | Fast | 900 | 150 | 180 | 110 | 70 | 1410 |
| Node.js 20 | Consumption | Minimal (2) | Slow | 920 | 170 | 190 | 2080 | 75 | 3435 |
| Node.js 20 | Consumption | Moderate (10) | Fast | 980 | 1300 | 260 | 120 | 75 | 2735 |
| Node.js 20 | Consumption | Moderate (10) | Slow | 1000 | 1350 | 270 | 2090 | 80 | 4790 |
| Node.js 20 | Consumption | Heavy (30+) | Fast | 1100 | 4200 | 420 | 130 | 85 | 5935 |
| Node.js 20 | Consumption | Heavy (30+) | Slow | 1120 | 4350 | 430 | 2110 | 90 | 8100 |
| Node.js 20 | Flex Consumption | Minimal (2) | Fast | 620 | 110 | 160 | 105 | 65 | 1060 |
| Node.js 20 | Flex Consumption | Minimal (2) | Slow | 650 | 120 | 170 | 2070 | 70 | 3080 |
| Node.js 20 | Flex Consumption | Moderate (10) | Fast | 700 | 900 | 220 | 115 | 70 | 2005 |
| Node.js 20 | Flex Consumption | Moderate (10) | Slow | 730 | 950 | 220 | 2085 | 75 | 4060 |
| Node.js 20 | Flex Consumption | Heavy (30+) | Fast | 820 | 2900 | 360 | 125 | 80 | 4285 |
| Node.js 20 | Flex Consumption | Heavy (30+) | Slow | 850 | 3050 | 370 | 2100 | 85 | 6455 |
| Python 3.11 | Consumption | Minimal (2) | Fast | 980 | 180 | 320 | 120 | 75 | 1675 |
| Python 3.11 | Consumption | Minimal (2) | Slow | 1000 | 190 | 330 | 2090 | 80 | 3690 |
| Python 3.11 | Consumption | Moderate (10) | Fast | 1080 | 1700 | 520 | 130 | 85 | 3515 |
| Python 3.11 | Consumption | Moderate (10) | Slow | 1100 | 1750 | 530 | 2100 | 90 | 5570 |
| Python 3.11 | Consumption | Heavy (30+) | Fast | 1250 | 6100 | 900 | 150 | 95 | 8495 |
| Python 3.11 | Consumption | Heavy (30+) | Slow | 1280 | 6400 | 920 | 2120 | 100 | 10820 |
| Python 3.11 | Flex Consumption | Minimal (2) | Fast | 700 | 130 | 280 | 115 | 70 | 1295 |
| Python 3.11 | Flex Consumption | Minimal (2) | Slow | 720 | 140 | 290 | 2080 | 75 | 3305 |
| Python 3.11 | Flex Consumption | Moderate (10) | Fast | 790 | 1200 | 430 | 125 | 80 | 2625 |
| Python 3.11 | Flex Consumption | Moderate (10) | Slow | 810 | 1250 | 440 | 2095 | 85 | 4680 |
| Python 3.11 | Flex Consumption | Heavy (30+) | Fast | 900 | 3900 | 700 | 145 | 90 | 5735 |
| Python 3.11 | Flex Consumption | Heavy (30+) | Slow | 930 | 4100 | 720 | 2120 | 95 | 7965 |
Simulated p95 total cold-start duration
| Runtime | Plan | Minimal / Fast | Moderate / Fast | Heavy / Fast | Heavy / Slow |
|---|---|---|---|---|---|
| Node.js 20 | Consumption | 2200 ms | 4100 ms | 8600 ms | 11000 ms |
| Node.js 20 | Flex Consumption | 1700 ms | 3100 ms | 6100 ms | 8400 ms |
| Python 3.11 | Consumption | 2600 ms | 5200 ms | 11800 ms | 14500 ms |
| Python 3.11 | Flex Consumption | 1900 ms | 3900 ms | 8100 ms | 10300 ms |
Simulated relative contribution for selected profiles
| Profile | Host % | Package % | Framework % | App init % | Handler % | Dominant phase |
|---|---|---|---|---|---|---|
| Node.js Consumption, Minimal / Fast | 63.8 | 10.6 | 12.8 | 7.8 | 5.0 | Host startup |
| Node.js Consumption, Heavy / Fast | 18.5 | 70.8 | 7.1 | 2.2 | 1.4 | Package restore |
| Python Consumption, Heavy / Fast | 14.7 | 71.8 | 10.6 | 1.8 | 1.1 | Package restore |
| Python Flex, Minimal / Slow | 21.8 | 4.2 | 8.8 | 62.9 | 2.3 | App init |
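The percentages in this table follow directly from the per-phase medians: each phase is divided by the row total. A minimal sketch of that arithmetic, using a hypothetical `relative_contribution` helper and the simulated Node.js 20 / Consumption / Minimal / Fast row as input:

```python
def relative_contribution(phases_ms: dict) -> tuple:
    """Return ({phase: percent rounded to 0.1}, name of the dominant phase)."""
    total = sum(phases_ms.values())
    if total <= 0:
        raise ValueError("total duration must be positive")
    shares = {name: round(100 * ms / total, 1) for name, ms in phases_ms.items()}
    dominant = max(phases_ms, key=phases_ms.get)
    return shares, dominant


if __name__ == "__main__":
    # Simulated medians (ms) from the breakdown table, not measurements.
    row = {"host": 900, "package": 150, "framework": 180, "appInit": 110, "handler": 70}
    shares, dominant = relative_contribution(row)
    print(shares, dominant)
```

Because each share is rounded independently, a row's percentages may not sum to exactly 100.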
11. Interpretation
- Simulated (design target): The largest swing factor is package/dependency weight, not first-request handler time.
- Correlated: Higher dependency count correlates with larger cold-start totals in both runtimes.
- Correlated: Slow application initialization adds almost linearly to total cold start, regardless of hosting plan.
- Inferred: Python is likely to show more sensitivity than Node.js when dependency graphs are import-heavy.
- Inferred: Flex Consumption reduces the platform baseline, but it does not eliminate startup work performed by packages or user code.
Support interpretation should therefore separate three different questions:
- Is the delay mostly platform baseline?
- Is the delay mostly artifact/dependency size?
- Is the delay mostly user-controlled startup logic?
Without this separation, teams may incorrectly escalate a dependency-heavy app as a platform cold-start regression.
12. What this proves
If live execution matches the simulated pattern, this experiment would support the following conclusions:
- Cold start is not a single opaque delay; it can be decomposed into platform, dependency, framework, and app-init phases.
- Heavy dependency sets can dominate total cold start more than the function handler itself.
- Artificially slow startup code is clearly distinguishable from platform host startup when trace markers are added.
- Flex Consumption improves baseline cold-start behavior, but customer code and dependency footprint still materially matter.
13. What this does NOT prove
- It does not prove exact cold-start numbers for all regions, runtimes, or trigger types.
- It does not prove that every long cold start is a platform issue.
- It does not prove package restoration is always remote download time; some of that window may be package mount, extraction, import, or filesystem access.
- It does not prove behavior for Premium, Dedicated, Durable Functions, or non-HTTP triggers.
- It does not replace customer-specific telemetry from an actual production incident.
14. Support takeaway
When a customer reports cold start:
- Ask which hosting plan they use and whether Flex always-ready is enabled.
- Ask whether the app recently added large dependencies or heavy startup logic.
- Compare first-request duration with warm-request duration.
- Collect request + trace correlation for one known cold invocation.
- Identify whether delay is before user code, during imports, or inside app initialization.
Decision heuristic:
- Delay before user traces appear -> investigate host/platform/package acquisition.
- Delay between import markers -> investigate dependency tree and framework startup.
- Delay between app-init markers -> investigate customer startup logic and global object creation.
- Delay only inside handler -> cold start might be secondary; investigate downstream dependency latency.
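The decision heuristic can be expressed as a small triage function. This sketch simplifies the heuristic to "pick the largest window"; the `triage` name and its four parameters are hypothetical, and the pre-user window is assumed to combine host startup and package acquisition because both occur before user traces appear.

```python
def triage(pre_user_ms, imports_ms, app_init_ms, handler_ms):
    """Return the investigation area for whichever window is largest."""
    windows = {
        "host/platform/package acquisition": pre_user_ms,
        "dependency tree and framework startup": imports_ms,
        "startup logic and global object creation": app_init_ms,
        "downstream dependency latency": handler_ms,
    }
    return max(windows, key=windows.get)


if __name__ == "__main__":
    # Simulated Python 3.11 / Consumption / Heavy / Fast medians (ms).
    print(triage(pre_user_ms=1250 + 6100, imports_ms=900, app_init_ms=150, handler_ms=95))
```

When two windows are close in size, a single label hides useful detail, so the raw durations should be kept next to the triage result.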
15. Reproduction notes
- Cold-start timing is sensitive to idle duration; use a conservative idle gap such as 10-15 minutes.
- Repeat enough times to avoid overfitting to one noisy platform allocation event.
- Keep deployment artifacts stable during the test window; redeployment can change package acquisition characteristics.
- For Flex Consumption, document whether always-ready is `0` or nonzero, because that changes interpretation.
- Disable or reduce telemetry sampling for the experiment window so startup traces are not lost.
- Microsoft documents startup timeout considerations for app initialization on Flex Consumption. If host startup exceeds 30 seconds there, treat the run as a separate failure mode, not just a slow cold start.