Cold Start and Dependency Initialization

Status: Draft - Awaiting Execution

This experiment design is complete, but the measurements in Results are SIMULATED and based on documented Azure Functions behavior plus reasonable engineering assumptions. No live customer or lab measurements are claimed on this page.

Execution Blocked

Attempted execution on 2026-04-07 was blocked by Azure subscription policy.

Root Cause: Organization policy enforces Microsoft.Storage/storageAccounts/allowSharedKeyAccess = false on all storage accounts. This policy cannot be overridden at the resource level.

Error: "Shared key access is not permitted because storage SAS and Account Key are disabled by storage policy."

Attempts Made:

Created new storage account with --allow-shared-key-access true → Policy overrode setting to false
Tried updating existing storage account → Same result
Attempted Function App creation with managed identity → Failed (storage file share creation requires shared key access)

Resolution Required: Either a policy exemption or access to a subscription without this policy restriction. Flex Consumption with managed identity for storage may work in future Azure CLI versions.

1. Question

What is the relative contribution of host startup, package restoration, framework initialization, and application code execution to total cold start duration on Azure Functions for Python 3.11 and Node.js 20 on Consumption and Flex Consumption?

2. Why this matters

Customers often report that the first request after idle is slow, but the remediation depends on which startup phase dominates. If most delay comes from package restore or dependency loading, code-path optimization alone will not materially improve cold start. If the dominant phase is application initialization, support should focus on import graphs, lazy loading, and startup side effects instead of platform scaling behavior.

3. Customer symptom

"The first HTTP request after a few minutes of inactivity takes 6-15 seconds."
"Warm requests are fast, but cold requests are inconsistent across deployments."
"Python gets much slower when we add packages, even though the function body is tiny."

4. Hypothesis

Cold start on Azure Functions is additive across four phases:

Host startup contributes a mostly fixed platform cost.
Package restore / package acquisition contributes the largest variable cost when deployment artifacts or dependency trees are large.
Framework initialization adds a smaller language/runtime cost.
Application code initialization scales with import side effects, global object construction, and startup work.

Expected outcome:

Minimal dependency / fast init apps are dominated by host startup.
Heavy dependency apps are dominated by package restore or dependency load.
Slow init apps are dominated by application code even when the function body itself is trivial.
Flex Consumption should show lower and more stable cold start than classic Consumption because Microsoft documents reduced cold starts and optional always-ready instances for Flex Consumption.

5. Environment

Parameter	Value
Service	Azure Functions
Plans	Consumption, Flex Consumption
Region	koreacentral
Runtime	Python 3.11, Node.js 20
Trigger	HTTP trigger
OS	Linux
Date designed	2026-04-02
Always ready	0 for baseline comparison
Deployment shape	Same function logic, varying dependency and init profile

6. Variables

Controlled

Plan type: Consumption vs. Flex Consumption
Runtime: Python 3.11 vs. Node.js 20
Dependency count: minimal (~2), moderate (~10), heavy (30+)
Init complexity: fast (~100 ms) vs. slow (~2 s)
Region: koreacentral
Trigger type: HTTP
Test cadence: idle window long enough to force scale-to-zero behavior before next request
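
One way the fast and slow init profiles could be injected is a small startup helper whose delay is selected per deployment. This is a minimal sketch, not part of the experiment as written: the `INIT_PROFILE` app setting and the `run_app_init` helper are hypothetical names, and the sleep stands in for real singleton construction.

```python
import os
import time

# Delays mirror the two init profiles in the Controlled list (~100 ms and ~2 s).
INIT_DELAYS_S = {"fast": 0.1, "slow": 2.0}


def run_app_init(profile=None, sleep=time.sleep):
    """Simulate app-level initialization for the selected profile.

    `sleep` is injectable so the delay can be skipped in unit tests.
    """
    # INIT_PROFILE is a hypothetical app setting, not a platform-defined one.
    profile = profile or os.environ.get("INIT_PROFILE", "fast")
    delay_s = INIT_DELAYS_S[profile]
    sleep(delay_s)  # stands in for singleton construction, config loading, etc.
    return delay_s
```

Calling `run_app_init()` once at module scope, between the app-init markers, would keep the injected delay out of the handler body.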

Observed

Total cold start duration to first successful HTTP response
Time from first platform trace to worker ready
Time spent restoring/acquiring package content
Time spent in language/framework import or bootstrap
Time spent in application-level initialization before first handler execution
Variance across repeated cold starts per configuration

7. Instrumentation

Telemetry sources

Application Insights requests
Application Insights traces
Azure Functions runtime logs in Log Analytics / Application Insights
Custom trace markers emitted from app startup and first invocation

Trace markers to add

Use consistent custom dimensions on all startup traces:

Marker	Meaning	Example customDimensions
`coldstart.test.begin`	First line emitted when worker process begins app bootstrap	`phase=app-bootstrap`
`coldstart.imports.begin`	Before heavy imports / module loads	`phase=framework-init`
`coldstart.imports.end`	After imports complete	`phase=framework-init`
`coldstart.appinit.begin`	Before app-level singleton construction	`phase=app-init`
`coldstart.appinit.end`	After app-level initialization	`phase=app-init`
`coldstart.handler.begin`	First request enters function handler	`phase=handler`
`coldstart.handler.end`	First request completed	`phase=handler`
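
For the Python apps, the markers in the table could be emitted with the standard `logging` module. The sketch below is illustrative only: `emit_marker` is a hypothetical helper, and it assumes the configured telemetry exporter copies the `custom_dimensions` entry from `extra` into Application Insights `customDimensions`, which should be verified for the exporter in use.

```python
import logging

logger = logging.getLogger("coldstart")
logger.setLevel(logging.INFO)


def emit_marker(name, phase, **dims):
    """Log one startup marker and return the dimensions attached to it.

    The log message is the bare marker name so queries can match it exactly.
    """
    dimensions = {"phase": phase, **dims}
    # Assumption: the exporter maps `custom_dimensions` to customDimensions.
    logger.info(name, extra={"custom_dimensions": dimensions})
    return dimensions


# Module scope runs once per worker start, so these calls bracket the cold path.
emit_marker("coldstart.test.begin", "app-bootstrap")
emit_marker("coldstart.imports.begin", "framework-init")
import json  # placeholder for the heavy dependency imports under test
emit_marker("coldstart.imports.end", "framework-init")
```

Keeping the message equal to the marker name, with everything else in dimensions, is what lets the analysis queries compare `message` with `==`.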

Derived phase model

Host startup = first platform startup trace -> first worker/user trace
Package restore = deployment package acquisition / package mount / dependency acquisition window inferred from platform traces before user code becomes visible
Framework init = coldstart.imports.begin -> coldstart.imports.end
App code init = coldstart.appinit.begin -> coldstart.appinit.end
Handler execution = coldstart.handler.begin -> coldstart.handler.end
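
The three marker-based phases in this model can be derived from exported trace timestamps with a few lines of Python. This is a sketch under assumptions: `phase_durations_ms` is a hypothetical helper, the timestamps are made-up example values, and host startup and package restore are left out because they depend on platform traces rather than custom markers.

```python
from datetime import datetime, timedelta

# (phase name, begin marker, end marker), following the derived phase model.
PHASES = [
    ("framework_init", "coldstart.imports.begin", "coldstart.imports.end"),
    ("app_init", "coldstart.appinit.begin", "coldstart.appinit.end"),
    ("handler", "coldstart.handler.begin", "coldstart.handler.end"),
]


def phase_durations_ms(marker_times):
    """Return {phase: milliseconds} for every phase whose two markers exist."""
    durations = {}
    for phase, begin, end in PHASES:
        if begin in marker_times and end in marker_times:
            delta = marker_times[end] - marker_times[begin]
            durations[phase] = round(delta.total_seconds() * 1000)
    return durations


# Hypothetical timestamps for one cold invocation (handler markers missing).
t0 = datetime(2026, 4, 7, 12, 0, 0)
example = {
    "coldstart.imports.begin": t0,
    "coldstart.imports.end": t0 + timedelta(milliseconds=320),
    "coldstart.appinit.begin": t0 + timedelta(milliseconds=320),
    "coldstart.appinit.end": t0 + timedelta(milliseconds=2400),
}
print(phase_durations_ms(example))
```

Phases with a missing marker are omitted rather than reported as zero, so a lost trace is not mistaken for a fast phase.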

KQL queries for analysis

Query 1: Find candidate cold requests

requests
| where cloud_RoleName =~ "<function-app-name>"
| where name startswith "GET /api/coldstart"
| order by timestamp asc
| serialize
| extend previousRequest = prev(timestamp)
| extend idleGapMinutes = datetime_diff('minute', timestamp, previousRequest)
| extend isColdCandidate = iff(isnull(previousRequest) or idleGapMinutes >= 10, true, false)
| project timestamp, operation_Id, duration, resultCode, success, idleGapMinutes, isColdCandidate

Query 2: Reconstruct startup markers for one cold invocation

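// Host traces and module-level startup markers are emitted outside the invocation context,
// so they may not carry the request's operation_Id. If expected markers are missing,
// replace the operation_Id filter with a time window around the cold request.
traces
| where cloud_RoleName =~ "<function-app-name>"
| where operation_Id == "<operation-id>"
| where message startswith "coldstart." or message has "Host started" or message has "Worker process started"
| project timestamp, message, severityLevel, customDimensions
| order by timestamp asc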

Query 3: Estimate phase durations from custom markers

let targetOperation = "<operation-id>";
let markerTimes = traces
| where cloud_RoleName =~ "<function-app-name>"
| where operation_Id == targetOperation
| summarize
    importsBegin = minif(timestamp, message == "coldstart.imports.begin"),
    importsEnd = minif(timestamp, message == "coldstart.imports.end"),
    appInitBegin = minif(timestamp, message == "coldstart.appinit.begin"),
    appInitEnd = minif(timestamp, message == "coldstart.appinit.end"),
    handlerBegin = minif(timestamp, message == "coldstart.handler.begin"),
    handlerEnd = minif(timestamp, message == "coldstart.handler.end");
markerTimes
| extend frameworkInitMs = datetime_diff('millisecond', importsEnd, importsBegin)
| extend appInitMs = datetime_diff('millisecond', appInitEnd, appInitBegin)
| extend handlerMs = datetime_diff('millisecond', handlerEnd, handlerBegin)

Query 4: Compare cold starts by profile

requests
| where name startswith "GET /api/coldstart"
| extend dependencyProfile = tostring(customDimensions["dependencyProfile"])
| extend initProfile = tostring(customDimensions["initProfile"])
| extend planType = tostring(customDimensions["planType"])
| extend runtime = tostring(customDimensions["runtime"])
| summarize
    coldCount = count(),
    avgMs = avg(duration),
    p50Ms = percentile(duration, 50),
    p95Ms = percentile(duration, 95),
    maxMs = max(duration)
    by planType, runtime, dependencyProfile, initProfile
| order by runtime asc, planType asc, dependencyProfile asc, initProfile asc
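
The aggregates in Query 4 can be cross-checked locally on exported request durations. The sketch below uses the nearest-rank percentile definition, which is an assumption on my part: KQL's `percentile()` is an estimate, so the two may differ slightly on small samples. The sample durations are hypothetical, not measurements.

```python
import math


def nearest_rank_percentile(values, pct):
    """Smallest sample with at least pct% of the samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(durations_ms):
    """Mirror the Query 4 columns for one profile's cold-start durations."""
    return {
        "coldCount": len(durations_ms),
        "avgMs": sum(durations_ms) / len(durations_ms),
        "p50Ms": nearest_rank_percentile(durations_ms, 50),
        "p95Ms": nearest_rank_percentile(durations_ms, 95),
        "maxMs": max(durations_ms),
    }


# Hypothetical durations (ms) for ten cold starts of a single profile.
sample = [1410, 1500, 1380, 1600, 1450, 1700, 1420, 1390, 2200, 1480]
print(summarize(sample))
```

With only ten samples per profile, the nearest-rank p95 is simply the slowest run, so p95 values from small runs are better read as "worst observed" than as a stable tail estimate.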

8. Procedure

Create four Function Apps per runtime/plan combination so that each app maps to one startup profile family.
Keep the HTTP handler body constant and lightweight so startup dominates total latency.
Build six test profiles per runtime:
- minimal dependencies + fast init
- minimal dependencies + slow init
- moderate dependencies + fast init
- moderate dependencies + slow init
- heavy dependencies + fast init
- heavy dependencies + slow init
Configure Application Insights and ensure sampling does not drop request or trace telemetry for the test window.
Add the custom trace markers listed in Instrumentation.
Deploy all apps to koreacentral on the same day with the same Functions runtime major version.
For Flex Consumption baseline, set always-ready instance count to 0 so the test still measures cold start.
After deployment, send one warm-up request only to validate health, then wait long enough for the app to scale to zero.
Trigger a single HTTP request to /api/coldstart and capture the resulting operation_Id.
Repeat the idle -> single request cycle at least 10 times per profile.
Run the KQL queries to reconstruct phase durations per cold request.
Aggregate median and p95 durations by runtime, plan, dependency profile, and init profile.
Separately repeat one Flex Consumption profile with always-ready enabled to validate whether startup phases collapse as expected.
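
The full set of configurations implied by the procedure can be enumerated programmatically, which helps confirm that no profile is skipped. This is a sketch with illustrative label strings; the dictionary keys are chosen to match the `customDimensions` names read by Query 4.

```python
from itertools import product

# Label strings are illustrative; only the counts come from the procedure.
RUNTIMES = ["python3.11", "node20"]
PLANS = ["consumption", "flex"]
DEPENDENCY_PROFILES = ["minimal", "moderate", "heavy"]
INIT_PROFILES = ["fast", "slow"]


def build_matrix():
    """Return one dict per runtime/plan/dependency/init combination."""
    return [
        {
            "runtime": runtime,
            "planType": plan,
            "dependencyProfile": deps,
            "initProfile": init,
        }
        for runtime, plan, deps, init in product(
            RUNTIMES, PLANS, DEPENDENCY_PROFILES, INIT_PROFILES
        )
    ]


matrix = build_matrix()
print(len(matrix), "configurations,", len(matrix) * 10, "cold starts at 10 repetitions each")
```

At 10 repetitions per configuration and a 10-15 minute idle gap, the run count makes it clear that the cycles need to run in parallel across apps rather than serially.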

Managed identity workaround for storage policy

The original setup path for this experiment assumed shared-key based storage access. In this environment, org policy enforces Microsoft.Storage/storageAccounts/allowSharedKeyAccess = false, so shared-key access is blocked.

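Proposed workaround (not yet validated; the earlier managed-identity attempt failed at storage file share creation): configure the Function App to use managed identity for storage access.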

# Assign a blob data role to the Function App's managed identity.
# Note: Microsoft's identity-based connection guidance for AzureWebJobsStorage lists
# Storage Blob Data Owner; confirm whether Contributor is sufficient for host storage.
az role assignment create \
  --assignee "$FUNCTION_APP_PRINCIPAL_ID" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RG/providers/Microsoft.Storage/storageAccounts/$STORAGE_NAME"

# Update the function app to use an identity-based connection
# (an account name instead of a connection string).
az functionapp config appsettings set \
  --resource-group "$RG" \
  --name "$APP_NAME" \
  --settings "AzureWebJobsStorage__accountName=$STORAGE_NAME"

9. Expected signal

If the hypothesis is correct, the data should show:

Host startup clustering in the 500 ms to 2 s range across most profiles.
Package-related time near 0-1 s for minimal dependencies but several seconds for heavy dependency sets.
Framework initialization usually below 1 s, with Python slightly more sensitive to import-heavy packages.
App code initialization increasing by roughly the injected startup delay for the slow-init profile.
Flex Consumption cold starts lower than Consumption for the same profile, especially when package/setup cost is not dominant.

sequenceDiagram
    participant Client
    participant FrontEnd as Azure Functions Front End
    participant Host as Functions Host
    participant Worker as Language Worker
    participant App as User App Startup
    participant Handler as Function Handler

    Client->>FrontEnd: First HTTP request after idle
    FrontEnd->>Host: Allocate instance / start host
    Host->>Host: Host startup
    Host->>Worker: Start language worker
    Worker->>Worker: Package acquisition / dependency load
    Worker->>App: Framework import / bootstrap
    App->>App: Application initialization
    App->>Handler: Invoke first request
    Handler-->>Client: First HTTP response

10. Results

SIMULATED RESULTS

The tables in this section are realistic placeholders for planning and support-readiness purposes. They are not lab measurements.

Simulated median cold-start breakdown by profile

Runtime	Plan	Dependencies	Init profile	Host startup (ms)	Package restore (ms)	Framework init (ms)	App init (ms)	First handler (ms)	Total (ms)
Node.js 20	Consumption	Minimal (2)	Fast	900	150	180	110	70	1410
Node.js 20	Consumption	Minimal (2)	Slow	920	170	190	2080	75	3435
Node.js 20	Consumption	Moderate (10)	Fast	980	1300	260	120	75	2735
Node.js 20	Consumption	Moderate (10)	Slow	1000	1350	270	2090	80	4790
Node.js 20	Consumption	Heavy (30+)	Fast	1100	4200	420	130	85	5935
Node.js 20	Consumption	Heavy (30+)	Slow	1120	4350	430	2110	90	8100
Node.js 20	Flex Consumption	Minimal (2)	Fast	620	110	160	105	65	1060
Node.js 20	Flex Consumption	Minimal (2)	Slow	650	120	170	2070	70	3080
Node.js 20	Flex Consumption	Moderate (10)	Fast	700	900	220	115	70	2005
Node.js 20	Flex Consumption	Moderate (10)	Slow	730	950	220	2085	75	4060
Node.js 20	Flex Consumption	Heavy (30+)	Fast	820	2900	360	125	80	4285
Node.js 20	Flex Consumption	Heavy (30+)	Slow	850	3050	370	2100	85	6455
Python 3.11	Consumption	Minimal (2)	Fast	980	180	320	120	75	1675
Python 3.11	Consumption	Minimal (2)	Slow	1000	190	330	2090	80	3690
Python 3.11	Consumption	Moderate (10)	Fast	1080	1700	520	130	85	3515
Python 3.11	Consumption	Moderate (10)	Slow	1100	1750	530	2100	90	5570
Python 3.11	Consumption	Heavy (30+)	Fast	1250	6100	900	150	95	8495
Python 3.11	Consumption	Heavy (30+)	Slow	1280	6400	920	2120	100	10820
Python 3.11	Flex Consumption	Minimal (2)	Fast	700	130	280	115	70	1295
Python 3.11	Flex Consumption	Minimal (2)	Slow	720	140	290	2080	75	3305
Python 3.11	Flex Consumption	Moderate (10)	Fast	790	1200	430	125	80	2625
Python 3.11	Flex Consumption	Moderate (10)	Slow	810	1250	440	2095	85	4680
Python 3.11	Flex Consumption	Heavy (30+)	Fast	900	3900	700	145	90	5735
Python 3.11	Flex Consumption	Heavy (30+)	Slow	930	4100	720	2120	95	7965

Simulated p95 total cold-start duration

Runtime	Plan	Minimal / Fast	Moderate / Fast	Heavy / Fast	Heavy / Slow
Node.js 20	Consumption	2200 ms	4100 ms	8600 ms	11000 ms
Node.js 20	Flex Consumption	1700 ms	3100 ms	6100 ms	8400 ms
Python 3.11	Consumption	2600 ms	5200 ms	11800 ms	14500 ms
Python 3.11	Flex Consumption	1900 ms	3900 ms	8100 ms	10300 ms

Simulated relative contribution for selected profiles

Profile	Host %	Package %	Framework %	App init %	Handler %	Dominant phase
Node.js Consumption, Minimal / Fast	63.8	10.6	12.8	7.8	5.0	Host startup
Node.js Consumption, Heavy / Fast	18.5	70.8	7.1	2.2	1.4	Package restore
Python Consumption, Heavy / Fast	14.7	71.8	10.6	1.8	1.1	Package restore
Python Flex, Minimal / Slow	21.8	4.2	8.8	62.9	2.3	App init
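
The percentages in this table follow directly from the simulated median breakdown: each phase's share is its duration divided by the row total. A short sketch of that arithmetic, using the simulated Node.js 20 / Consumption / Minimal / Fast row as input (`relative_contribution` is a hypothetical helper name):

```python
PHASE_LABELS = ["Host startup", "Package restore", "Framework init", "App init", "Handler"]


def relative_contribution(phase_ms):
    """Return (percent share per phase, label of the largest phase)."""
    total = sum(phase_ms)
    shares = [round(100 * ms / total, 1) for ms in phase_ms]
    dominant = PHASE_LABELS[phase_ms.index(max(phase_ms))]
    return shares, dominant


# Simulated medians (ms) for Node.js 20 / Consumption / Minimal / Fast.
shares, dominant = relative_contribution([900, 150, 180, 110, 70])
print(shares, dominant)  # -> [63.8, 10.6, 12.8, 7.8, 5.0] Host startup
```

The same function applied to live phase medians would fill in this table once the experiment can be executed.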

11. Interpretation

Measured (simulated design target): The largest swing factor is package/dependency weight, not first-request handler time.
Correlated: Higher dependency count correlates with larger cold-start totals in both runtimes.
Correlated: Slow application initialization adds roughly its own duration to total cold start (about 2.0-2.3 s in the simulated tables) on both hosting plans.
Inferred: Python is likely to show more sensitivity than Node.js when dependency graphs are import-heavy.
Inferred: Flex Consumption reduces the platform baseline, but it does not eliminate startup work performed by packages or user code.

Support interpretation should therefore separate three different questions:

Is the delay mostly platform baseline?
Is the delay mostly artifact/dependency size?
Is the delay mostly user-controlled startup logic?

Without this separation, teams may incorrectly escalate a dependency-heavy app as a platform cold-start regression.

12. What this proves

If live execution matches the simulated pattern, this experiment would support the following conclusions:

Cold start is not a single opaque delay; it can be decomposed into platform, dependency, framework, and app-init phases.
Heavy dependency sets can dominate total cold start more than the function handler itself.
Artificially slow startup code is clearly distinguishable from platform host startup when trace markers are added.
Flex Consumption improves baseline cold-start behavior, but customer code and dependency footprint still materially matter.

13. What this does NOT prove

It does not prove exact cold-start numbers for all regions, runtimes, or trigger types.
It does not prove that every long cold start is a platform issue.
It does not prove package restoration is always remote download time; some of that window may be package mount, extraction, import, or filesystem access.
It does not prove behavior for Premium, Dedicated, Durable Functions, or non-HTTP triggers.
It does not replace customer-specific telemetry from an actual production incident.

14. Support takeaway

When a customer reports cold start:

Ask which hosting plan they use and whether Flex always-ready is enabled.
Ask whether the app recently added large dependencies or heavy startup logic.
Compare first-request duration with warm-request duration.
Collect request + trace correlation for one known cold invocation.
Identify whether delay is before user code, during imports, or inside app initialization.

Decision heuristic:

Delay before user traces appear -> investigate host/platform/package acquisition.
Delay between import markers -> investigate dependency tree and framework startup.
Delay between app-init markers -> investigate customer startup logic and global object creation.
Delay only inside handler -> cold start might be secondary; investigate downstream dependency latency.
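
The heuristic above can be expressed as a small lookup over the four delay windows. This sketch makes one simplifying assumption that the heuristic itself does not state: it picks the single largest window, whereas a real case may involve more than one significant window. The window names and the `triage` helper are hypothetical.

```python
# Next steps are quoted from the decision heuristic; window names are illustrative.
NEXT_STEP = {
    "pre_user_code": "investigate host/platform/package acquisition",
    "imports": "investigate dependency tree and framework startup",
    "app_init": "investigate customer startup logic and global object creation",
    "handler": "cold start might be secondary; investigate downstream dependency latency",
}


def triage(window_ms):
    """Return (window, next step) for the largest of the four delay windows."""
    window = max(NEXT_STEP, key=lambda name: window_ms.get(name, 0))
    return window, NEXT_STEP[window]


# Simulated Python 3.11 / Consumption / Heavy / Fast medians:
# host startup (1250) + package restore (6100) both precede user code.
print(triage({"pre_user_code": 7350, "imports": 900, "app_init": 150, "handler": 95}))
```

Host startup and package restore are merged into one window here because, from the customer's traces alone, both appear only as time before the first user marker.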

15. Reproduction notes

Cold-start timing is sensitive to idle duration; use a conservative idle gap such as 10-15 minutes.
Repeat enough times to avoid overfitting to one noisy platform allocation event.
Keep deployment artifacts stable during the test window; redeployment can change package acquisition characteristics.
For Flex Consumption, document whether always-ready is 0 or nonzero, because that changes interpretation.
Disable or reduce telemetry sampling for the experiment window so startup traces are not lost.
If host startup exceeds 30 seconds on Flex Consumption, consult Microsoft's documented startup timeout for app initialization and treat such runs as a separate failure mode, not just a slow cold start.
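
The idle-then-single-request cycle described in these notes could be driven by a small script. This is a minimal sketch, not the experiment's actual driver: `timed_get` and `run_cold_cycles` are hypothetical helpers, the 15-minute default is taken from the conservative end of the idle range above, and the client cannot itself confirm that the app scaled to zero, so each run still needs to be checked against telemetry.

```python
import time
import urllib.request


def timed_get(url, timeout_s=60):
    """Send one GET and return (HTTP status, elapsed milliseconds).

    urlopen raises HTTPError for 4xx/5xx responses, so failures surface
    as exceptions instead of being recorded as slow successes.
    """
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        response.read()
        status = response.status
    return status, round((time.monotonic() - start) * 1000)


def run_cold_cycles(url, cycles=10, idle_s=15 * 60, send=timed_get, sleep=time.sleep):
    """Alternate an idle wait with exactly one request, `cycles` times.

    `send` and `sleep` are injectable so the loop can be tested offline.
    """
    results = []
    for _ in range(cycles):
        sleep(idle_s)  # idle first so every request follows a full gap
        results.append(send(url))
    return results
```

The elapsed time measured here includes network latency from the client, so it is a sanity check on the `requests` table duration rather than a replacement for it.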