Production Guide
This guide focuses on production hardening for azure-functions-langgraph deployments on Azure Functions.
Authentication & Authorization
Default auth behavior
LangGraphApp defaults to anonymous HTTP access (auth_level=func.AuthLevel.ANONYMOUS).
In production environments, this default is intentionally noisy.
When AZURE_FUNCTIONS_ENVIRONMENT is set and auth_level remains anonymous,
the app logs a warning at startup (app.py, __post_init__, around lines 98-104).
⚠️ ANONYMOUS is convenient for local development but too permissive for internet-facing production APIs.
Set a production-safe app-level auth level
Use FUNCTION (recommended baseline) or ADMIN (only for tightly controlled internal surfaces).
import azure.functions as func
from azure_functions_langgraph import LangGraphApp
app = LangGraphApp(auth_level=func.AuthLevel.FUNCTION)
Per-graph auth override
You can override auth for specific graphs via register():
import azure.functions as func
from azure_functions_langgraph import LangGraphApp
# internal_graph and public_graph are compiled LangGraph graphs defined elsewhere
app = LangGraphApp(auth_level=func.AuthLevel.FUNCTION)
# Production default for private endpoints
app.register(graph=internal_graph, name="internal", auth_level=func.AuthLevel.FUNCTION)
# Explicitly public endpoint (only when intended)
app.register(graph=public_graph, name="public", auth_level=func.AuthLevel.ANONYMOUS)
Use per-graph overrides to keep a strict default while exposing narrowly scoped public routes.
⚠️ Scope: Per-graph auth_level overrides apply only to native routes (/api/graphs/{name}/invoke, /api/graphs/{name}/stream).
When platform_compat=True is enabled, all platform-compatible routes (/api/threads/..., /api/runs/...) use the app-level auth_level regardless of per-graph overrides.
API Management integration pattern
For production, a common pattern is:
- Set Function routes to FUNCTION auth.
- Place Azure API Management (APIM) in front of the Function App.
- Require client auth at APIM (JWT/OAuth2/subscription key/mTLS).
- Keep Function keys private between APIM and Function App.
- Apply APIM rate limits, IP filtering, and request validation policies.
This creates a layered security model: edge auth and governance in APIM, key-based gate at the Function layer.
Observability
Logging integration (recommended operator instrumentation)
This package does not emit structured log fields automatically. The following are recommended practices for production observability when building your Function App:
- Emit structured fields (graph_name, thread_id, assistant_id, run_id, status_code, duration_ms).
- Log explicit lifecycle markers: request received, graph started, graph completed/failed.
- Include error categories (validation_error, execution_error, storage_error) to simplify alerting.
azure-functions-logging provides structured logging helpers that pair well with this package.
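A minimal sketch of these recommendations using only the standard library logging module. The JsonFormatter class, the "langgraph.ops" logger name, the error_category field and the log_lifecycle helper are illustrative names, not part of this package or of azure-functions-logging.

```python
import json
import logging
from typing import Any, Dict

# Structured fields recommended above, plus an error category for alerting.
STRUCTURED_FIELDS = (
    "graph_name", "thread_id", "assistant_id", "run_id",
    "status_code", "duration_ms", "error_category",
)


class JsonFormatter(logging.Formatter):
    """Render a log record as one JSON object, copying any structured fields it carries."""

    def format(self, record: logging.LogRecord) -> str:
        body: Dict[str, Any] = {"level": record.levelname, "message": record.getMessage()}
        for field in STRUCTURED_FIELDS:
            if hasattr(record, field):
                body[field] = getattr(record, field)
        return json.dumps(body)


# Illustrative logger name; records propagate to whatever handlers the host configures.
logger = logging.getLogger("langgraph.ops")


def log_lifecycle(marker: str, **fields: Any) -> None:
    """Log a lifecycle marker (e.g. "graph started"); fields become record attributes via `extra`."""
    logger.info(marker, extra=fields)
```

Attach the formatter to a handler you control with handler.setFormatter(JsonFormatter()), then call, for example, log_lifecycle("graph completed", graph_name="my_agent", thread_id=thread_id, status_code=200, duration_ms=elapsed_ms). Keys passed through `extra` must not clash with built-in LogRecord attributes such as `message` or `args`; the logging module raises KeyError if they do.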
Application Insights correlation
Azure Functions automatically wires requests/dependencies into Application Insights when enabled in the Function App. Use this to correlate:
- incoming HTTP request
- graph invocation log records
- downstream dependency calls (LLM/API/storage)
- final success/error outcome
This enables end-to-end traceability for each run.
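Application Insights follows the W3C Trace Context standard, in which the trace-id of the `traceparent` header becomes the operation ID. If you want your own log records to carry that ID, you can parse it from the incoming header. This is a sketch under two assumptions: the caller (for example APIM or an instrumented client) actually sends a `traceparent` header, and only the four-field version-00 layout needs to be handled. The TraceParent and parse_traceparent names are illustrative.

```python
import re
from typing import NamedTuple, Optional

# W3C Trace Context traceparent: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    version: str
    trace_id: str  # Application Insights uses this value as the operation ID
    parent_id: str
    flags: str


def parse_traceparent(header: Optional[str]) -> Optional[TraceParent]:
    """Parse a version-00-style traceparent header; return None if absent or invalid."""
    if not header:
        return None
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    parsed = TraceParent(*match.groups())
    # The spec forbids version ff and all-zero trace or parent IDs.
    if parsed.version == "ff" or set(parsed.trace_id) == {"0"} or set(parsed.parent_id) == {"0"}:
        return None
    return parsed
```

In an HTTP-triggered function you would call parse_traceparent(req.headers.get("traceparent")) and, when it returns a value, add its trace_id to the structured fields of your log records.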
Health endpoint
GET /health (exposed as GET /api/health with the default Functions route prefix) returns a liveness and configuration response.
It confirms the app is running and lists registered graphs with their checkpointer status.
⚠️ This is a liveness/configuration endpoint, not a dependency-readiness check. It does not probe Blob Storage, Table Storage, or downstream LLM availability. For deep health checks, implement a custom endpoint or use Azure Monitor availability tests.
The /health endpoint inherits the app-level auth_level, not per-graph overrides.
If the app uses FUNCTION auth, /health also requires a function key — even if individual graphs are ANONYMOUS.
Example response:
{
"status": "ok",
"graphs": [
{
"name": "my_agent",
"description": "Customer support agent",
"has_checkpointer": true
}
]
}
Use this endpoint for liveness checks and deployment validation.
Monitor health check success rates, response-time percentiles (P50/P95/P99), and HTTP error rates by endpoint and graph name.
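For deployment validation, a small script can fetch the health response and check it against your expectations. This is a sketch: it assumes every registered graph should have a checkpointer (switch that off with require_checkpointer=False), and the function names are illustrative. When the app uses FUNCTION auth, the key is sent in the `x-functions-key` header, which Azure Functions accepts alongside the `code` query parameter.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def check_health_payload(payload: Dict[str, Any], require_checkpointer: bool = True) -> List[str]:
    """Return problems found in a /api/health response body (an empty list means healthy)."""
    problems: List[str] = []
    if payload.get("status") != "ok":
        problems.append(f"unexpected status: {payload.get('status')!r}")
    graphs = payload.get("graphs", [])
    if not graphs:
        problems.append("no graphs registered")
    for graph in graphs:
        if require_checkpointer and not graph.get("has_checkpointer"):
            problems.append(f"graph {graph.get('name')!r} has no checkpointer")
    return problems


def fetch_health(base_url: str, function_key: Optional[str] = None, timeout_s: float = 10.0) -> Dict[str, Any]:
    """GET {base_url}/api/health, sending the function key when the app requires one."""
    request = urllib.request.Request(f"{base_url.rstrip('/')}/api/health")
    if function_key:
        request.add_header("x-functions-key", function_key)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

A post-deploy step could then run check_health_payload(fetch_health("https://<your-app>.azurewebsites.net", key)) and fail the pipeline if the returned list is not empty. As noted above, a passing result only proves liveness and configuration, not storage or LLM reachability.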
Timeouts & Cancellation
Azure Functions timeout limits
Execution timeout is governed by Azure Functions plan and host.json:
| Plan | Default timeout | Maximum timeout |
|---|---|---|
| Consumption | 5 minutes | 10 minutes |
| Flex Consumption | 30 minutes | Unlimited (configurable) |
| Premium | 30 minutes | Unlimited (configurable) |
| Dedicated (App Service) | 30 minutes | Unlimited |
Runtime behavior in this package
Graph execution is synchronous from the HTTP handler's perspective.
graph.invoke() and graph.stream() run until completion (or failure).
- No package-level timeout wrapper is applied around graph calls.
- No built-in cancellation endpoint is provided for long-running graph runs.
⚠️ If a graph exceeds platform timeout, the request fails at the Functions host boundary.
⚠️ HTTP response ceiling: Azure Functions enforces a hard 230-second limit on HTTP response time regardless of functionTimeout.
Graph invocations that exceed 230 seconds will fail with a gateway timeout even if functionTimeout allows longer execution.
For workloads approaching this limit, consider async patterns (queue trigger + status polling) instead of synchronous HTTP.
Configure timeout explicitly
Set functionTimeout in host.json to a value aligned with your plan and workload SLO.
{
"version": "2.0",
"functionTimeout": "00:10:00",
"logging": {
"applicationInsights": {
"samplingSettings": {
"isEnabled": true,
"excludedTypes": "Request"
}
}
}
}
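To catch mismatches before deployment, a small check can compare the functionTimeout in host.json with the plan limits in the table above. This is a sketch: the plan keys are made up for illustration, only the plain hh:mm:ss form and the unbounded -1 value are parsed (not day-prefixed timespans), and the 230-second note applies only to HTTP-triggered functions.

```python
from typing import Any, Dict, List, Optional

# Maximum functionTimeout per plan in seconds, from the table above (None = no fixed maximum).
PLAN_MAX_TIMEOUT_S: Dict[str, Optional[int]] = {
    "consumption": 10 * 60,
    "flex_consumption": None,
    "premium": None,
    "dedicated": None,
}
HTTP_RESPONSE_CEILING_S = 230


def parse_function_timeout(value: str) -> Optional[int]:
    """Parse an "hh:mm:ss" functionTimeout into seconds; "-1" (unbounded) returns None."""
    if value.strip() == "-1":
        return None
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def review_host_json(host_json: Dict[str, Any], plan: str) -> List[str]:
    """Return human-readable findings about functionTimeout for the given plan key."""
    raw = host_json.get("functionTimeout")
    if raw is None:
        return ["functionTimeout is not set; the plan default applies"]
    findings: List[str] = []
    timeout_s = parse_function_timeout(raw)
    plan_max = PLAN_MAX_TIMEOUT_S[plan]
    if plan_max is not None and (timeout_s is None or timeout_s > plan_max):
        findings.append(f"functionTimeout {raw} exceeds the {plan} maximum of {plan_max}s")
    if timeout_s is None or timeout_s > HTTP_RESPONSE_CEILING_S:
        findings.append("HTTP-triggered graph runs still fail after 230s regardless of functionTimeout")
    return findings
```

Loading the file with json.load and passing it to review_host_json for the sample above on Consumption yields a single finding: 10 minutes is within the plan maximum, but synchronous HTTP runs are still bounded by the 230-second response ceiling.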
Production guidance:
- Keep graph completion comfortably under timeout limits.
- Cap upstream LLM/tool call timeouts so they fail fast.
- Break long workflows into resumable steps via checkpointers.
- Route very long orchestration to Durable Functions patterns when needed.
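Capping upstream calls is best done with each client's own timeout setting, since most HTTP and LLM SDKs accept one. Where a dependency has no such option, a fallback is to stop waiting at a deadline inside the graph node that calls it. This sketch uses concurrent.futures; the pool size of 4 and the helper name are arbitrary assumptions, and Python cannot kill the worker thread, so the abandoned call keeps running in the background until it returns on its own.

```python
import concurrent.futures
from typing import Any, Callable

# Azure Functions' HTTP response ceiling; in-request work must finish well below it.
HTTP_RESPONSE_CEILING_S = 230.0

# One shared pool per worker process rather than one per request.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4, thread_name_prefix="deadline")


def call_with_deadline(fn: Callable[[], Any], timeout_s: float) -> Any:
    """Return fn() if it finishes within timeout_s, otherwise raise TimeoutError.

    The caller stops waiting at the deadline, but fn is not interrupted.
    """
    if not 0 < timeout_s < HTTP_RESPONSE_CEILING_S:
        raise ValueError(f"timeout_s must be between 0 and {HTTP_RESPONSE_CEILING_S}")
    future = _POOL.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        if future.done():
            raise  # fn itself raised a TimeoutError; propagate it unchanged
        raise TimeoutError(f"call exceeded the {timeout_s}s deadline") from None
```

Inside a node this might look like call_with_deadline(lambda: search_tool(query), timeout_s=20), where search_tool stands for a hypothetical slow tool. Raising early lets the graph fail with a clear error instead of being cut off at the Functions host boundary.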
Request & Input Limits
LangGraphApp enforces the following defaults to protect against oversized or deeply nested payloads:
| Limit | Default | Config parameter |
|---|---|---|
| Request body size | 1 MiB | max_request_body_bytes |
| Stream response size | 1 MiB | max_stream_response_bytes |
| Input JSON depth | 32 levels | max_input_depth |
| Input JSON nodes | 10,000 | max_input_nodes |
Override these in the LangGraphApp constructor if your workload requires larger payloads:
from azure_functions_langgraph import LangGraphApp
app = LangGraphApp(
max_request_body_bytes=2 * 1024 * 1024, # 2 MiB
max_stream_response_bytes=4 * 1024 * 1024, # 4 MiB
)
Requests exceeding these limits are rejected before graph execution begins.
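Clients that build large inputs can check them before sending. This is a client-side approximation, not the package's own validation: it assumes depth starts at 1 for the root value and that every object, array and scalar counts as one node, and the package may count differently or apply the depth and node limits only to part of the body. json.dumps output also differs slightly in size from a compact body, so treat the result as an early warning.

```python
import json
from typing import Any, List, Tuple

# Defaults from the table above; change them if you override the limits in LangGraphApp.
MAX_REQUEST_BODY_BYTES = 1024 * 1024
MAX_INPUT_DEPTH = 32
MAX_INPUT_NODES = 10_000


def json_shape(value: Any) -> Tuple[int, int]:
    """Return (max_depth, node_count), counting every dict, list and scalar as one node."""
    max_depth, nodes = 0, 0
    stack: List[Tuple[Any, int]] = [(value, 1)]
    while stack:  # iterative walk, so deeply nested input cannot hit Python's recursion limit
        item, depth = stack.pop()
        nodes += 1
        max_depth = max(max_depth, depth)
        if isinstance(item, dict):
            stack.extend((child, depth + 1) for child in item.values())
        elif isinstance(item, list):
            stack.extend((child, depth + 1) for child in item)
    return max_depth, nodes


def preflight(payload: Any) -> List[str]:
    """Return reasons the payload would likely be rejected (an empty list means it fits)."""
    problems: List[str] = []
    size = len(json.dumps(payload).encode("utf-8"))
    if size > MAX_REQUEST_BODY_BYTES:
        problems.append(f"body is {size} bytes (limit {MAX_REQUEST_BODY_BYTES})")
    depth, nodes = json_shape(payload)
    if depth > MAX_INPUT_DEPTH:
        problems.append(f"depth {depth} exceeds {MAX_INPUT_DEPTH}")
    if nodes > MAX_INPUT_NODES:
        problems.append(f"{nodes} nodes exceed {MAX_INPUT_NODES}")
    return problems
```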
Streaming Behavior
Current behavior (critical)
Endpoints with stream in the path and Content-Type: text/event-stream return SSE-formatted payloads,
but delivery is buffered in Azure Functions Python today.
This affects native and platform routes such as:
- POST /api/graphs/{name}/stream
- POST /api/threads/{thread_id}/runs/stream
- POST /api/runs/stream
⚠️ Clients receive the complete SSE body at once, not incremental chunks.
⚠️ Buffered SSE response limit: Stream responses are capped at 1 MiB (max_stream_response_bytes=1_048_576 by default).
If the accumulated SSE payload exceeds this limit, an event: error is injected into the SSE body rather than an HTTP 413/500.
Adjust max_stream_response_bytes in LangGraphApp if your graph produces large streaming output.
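Because the SSE body arrives in one piece, a client can parse it after the response completes and check for an injected `event: error` instead of relying on the HTTP status. This is a minimal sketch of the basic event-stream framing rules (blank lines separate events, `event:` sets the type, `data:` lines are joined with newlines, lines starting with `:` are comments). Apart from the `error` event type described above, the event names and payloads in any real response are assumptions here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SseEvent:
    event: str
    data: str


def parse_sse(body: str) -> List[SseEvent]:
    """Split a fully buffered text/event-stream body into events."""
    events: List[SseEvent] = []
    event_type = "message"  # SSE default when no event: field is given
    data_lines: List[str] = []
    for line in body.splitlines() + [""]:  # trailing "" flushes a final unterminated event
        if line == "":
            if data_lines:
                events.append(SseEvent(event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
    return events


def first_error(events: List[SseEvent]) -> Optional[SseEvent]:
    """Return the first event of type "error", if any."""
    return next((e for e in events if e.event == "error"), None)
```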
Why this happens
This is an Azure Functions Python worker limitation for HTTP streaming/chunked transfer. The package currently collects stream events and returns a single response body.
Operational recommendation
For long-running production runs, prefer the wait routes over the corresponding /stream routes, so clients do not expect incremental delivery that the buffered response cannot provide:
- POST /api/threads/{thread_id}/runs/wait
- POST /api/runs/wait
⚠️ Note: Thread and run routes (/api/threads/..., /api/runs/...) are only available when platform_compat=True is set in LangGraphApp.
Concurrency & Scale
Thread-assistant binding and TOCTOU race
Platform routes bind a thread to its first assistant and reject assistant switches later.
There is a documented time-of-check to time-of-use (TOCTOU) window between the read and the update in platform/routes.py.
⚠️ Operator impact: In multi-instance deployments, concurrent requests for the same thread can race between read and write. Without external serialization (e.g., queue-based workers), the second writer may silently overwrite the first.
See DESIGN.md (thread-assistant binding design decision and concurrency notes) for detailed constraints and trade-offs.
Blob checkpointer single-writer assumption
AzureBlobCheckpointSaver is designed around single-writer-per-thread semantics.
The implementation documents concurrent-writer conflict resolution as a non-goal.
⚠️ Operator impact: Concurrent writes to the same thread/checkpoint namespace from multiple instances can produce inconsistent checkpoint state. If your deployment runs multiple Function App instances, ensure each thread's writes are serialized (see Recommended production pattern below).
Azure Table thread store scale envelope
AzureTableThreadStore uses a single partition key (PartitionKey="thread") with client-side metadata filtering.
This is a design-envelope approximation, not an enforced limit, and generally works well up to roughly 100K threads (~500 entities/sec throughput).
Beyond that envelope, consider a sharded or higher-scale backend such as Cosmos DB.
Concurrency controls
Only multitask_strategy="reject" is supported.
Concurrent run submissions for the same thread are rejected with HTTP 409 — no queuing or interruption is implemented.
Operator impact: If your workload has bursts of concurrent requests targeting the same thread, implement client-side retry with backoff, or use the queue-based worker pattern below.
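A sketch of client-side retry for the 409 case, using exponential backoff with full jitter: before retry n (counting from 0), wait a random time between 0 and min(max_delay_s, base_delay_s * 2**n). The `send` callable is a hypothetical wrapper around your HTTP POST (for example to /api/threads/{thread_id}/runs/wait) that returns the status code and body. Only 409 is retried, because other failures may not be safe to repeat.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, response body)


def submit_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it returns a non-409 status or attempts run out.

    The last response is returned as-is, so callers still see a final 409.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 409 or attempt == max_attempts - 1:
            return status, body
        # Full jitter spreads retries from competing clients across the window.
        sleep(random.uniform(0, min(max_delay_s, base_delay_s * 2 ** attempt)))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists so the policy can be tested without real waiting; in production the default time.sleep is used.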
Recommended production pattern
For multi-instance writes, serialize thread mutations through a queue-based worker pattern:
- HTTP layer validates and enqueues run requests by thread_id.
- Workers process one message per thread key at a time.
- Storage writes remain effectively single-writer per thread.
- Completion status is written back for polling/webhook delivery.
This removes most race windows and aligns with current storage assumptions.
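The "one message per thread key at a time" rule can be sketched inside a single worker process with a per-thread_id asyncio lock. This is an in-process illustration only, with hypothetical names: it does not coordinate multiple Function App instances. Across instances you still need a broker that hands each key to one consumer at a time, for example Azure Service Bus sessions keyed by thread_id.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, DefaultDict, List, TypeVar

T = TypeVar("T")


class PerThreadSerializer:
    """Allow at most one in-flight job per thread_id inside a single process.

    Jobs for different thread_ids still run concurrently. Locks are never
    evicted here, so a long-lived worker would need to prune idle entries.
    """

    def __init__(self) -> None:
        self._locks: DefaultDict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def run(self, thread_id: str, job: Callable[[], Awaitable[T]]) -> T:
        async with self._locks[thread_id]:
            return await job()


async def demo() -> List[str]:
    """Submit two runs for the same thread concurrently and record their ordering."""
    events: List[str] = []
    serializer = PerThreadSerializer()

    def make_job(name: str) -> Callable[[], Awaitable[str]]:
        async def job() -> str:
            events.append(f"start {name}")
            await asyncio.sleep(0.01)  # stands in for graph execution and checkpoint writes
            events.append(f"end {name}")
            return name
        return job

    await asyncio.gather(
        serializer.run("thread-1", make_job("a")),
        serializer.run("thread-1", make_job("b")),
    )
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['start a', 'end a', 'start b', 'end b']
```

Run b only starts after run a finishes, which is the single-writer-per-thread ordering the Blob checkpointer and the thread-assistant binding assume.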
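Storage Configuration
Azure Blob checkpointer
Install the optional Blob Storage dependency; the example below imports azure.storage.blob, which is provided by the azure-storage-blob SDK.
Read the connection string from the environment and construct a container client: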
import os
from azure.storage.blob import BlobServiceClient
from azure_functions_langgraph.checkpointers.azure_blob import AzureBlobCheckpointSaver
conn = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
service = BlobServiceClient.from_connection_string(conn)
container = service.get_container_client("langgraph-checkpoints")
checkpointer = AzureBlobCheckpointSaver(container_client=container)
Use langgraph-checkpoints as the default production container name unless you have environment-specific naming requirements.
If you already manage Azure clients elsewhere, pass a prepared container client directly.
Azure Table thread store
Install the optional Azure Table Storage dependency.
Configure the store with a connection string and table name:
import os
from azure_functions_langgraph.stores.azure_table import AzureTableThreadStore
conn = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
thread_store = AzureTableThreadStore.from_connection_string(
connection_string=conn,
table_name="langgraphthreads",
)
Use langgraphthreads as the default production table name unless you need a custom naming scheme.
Connection string security
Do not hardcode storage secrets in source code or deployment artifacts.
Use one of these patterns:
- App Settings with Key Vault references
- Managed identity with Azure SDK identity-based auth
- Secret rotation policy with zero-downtime rollout
⚠️ Treat AZURE_STORAGE_CONNECTION_STRING as a high-value credential.
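For the managed identity option, the Blob checkpointer can receive a container client built with a token credential instead of a connection string, since AzureBlobCheckpointSaver takes a prepared container client. This sketch assumes azure-identity is installed and that the Function App's managed identity holds a data-plane role such as Storage Blob Data Contributor on the account. The AZURE_STORAGE_ACCOUNT_NAME setting and the helper names are hypothetical, and the endpoint format shown is for the Azure public cloud only.

```python
import os

# Hypothetical app setting holding only the storage account name (not a secret).
ACCOUNT_NAME_SETTING = "AZURE_STORAGE_ACCOUNT_NAME"


def blob_account_url(account_name: str) -> str:
    """Public-cloud Blob endpoint for a storage account."""
    return f"https://{account_name}.blob.core.windows.net"


def make_checkpointer(container_name: str = "langgraph-checkpoints"):
    """Build the Blob checkpointer with managed identity instead of a connection string."""
    # Imported lazily so this module loads even where the optional SDKs are absent.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient
    from azure_functions_langgraph.checkpointers.azure_blob import AzureBlobCheckpointSaver

    service = BlobServiceClient(
        account_url=blob_account_url(os.environ[ACCOUNT_NAME_SETTING]),
        credential=DefaultAzureCredential(),  # resolves to the managed identity when running in Azure
    )
    container = service.get_container_client(container_name)
    return AzureBlobCheckpointSaver(container_client=container)
```

With this setup no storage secret needs to live in App Settings for the checkpointer. The thread store example above still uses a connection string, because only from_connection_string is documented here.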
See Also
- DESIGN.md — Key design decisions and constraints
- COMPATIBILITY.md — SDK version compatibility policy
- azure-functions-logging — Structured logging
- azure-functions-doctor — Pre-deploy diagnostics
- azure-functions-openapi — API documentation
- azure-functions-validation — Request/response validation