Skip to content

Production Guide

This guide focuses on production hardening for azure-functions-langgraph deployments on Azure Functions.

Authentication & Authorization

Default auth behavior

LangGraphApp defaults to anonymous HTTP access:

# src/azure_functions_langgraph/app.py
auth_level: func.AuthLevel = func.AuthLevel.ANONYMOUS

In production environments, this default is intentionally noisy. When AZURE_FUNCTIONS_ENVIRONMENT is set and auth_level remains anonymous, the app logs a warning at startup (app.py, __post_init__, around lines 98-104).

⚠️ ANONYMOUS is convenient for local development but too permissive for internet-facing production APIs.

Set a production-safe app-level auth level

Use FUNCTION (recommended baseline) or ADMIN (only for tightly controlled internal surfaces).

import azure.functions as func

from azure_functions_langgraph import LangGraphApp

app = LangGraphApp(auth_level=func.AuthLevel.FUNCTION)

Per-graph auth override

You can override auth for specific graphs via register():

import azure.functions as func

app = LangGraphApp(auth_level=func.AuthLevel.FUNCTION)

# Production default for private endpoints
app.register(graph=internal_graph, name="internal", auth_level=func.AuthLevel.FUNCTION)

# Explicitly public endpoint (only when intended)
app.register(graph=public_graph, name="public", auth_level=func.AuthLevel.ANONYMOUS)

Use per-graph overrides to keep a strict default while exposing narrowly scoped public routes.

⚠️ Scope: Per-graph auth_level overrides apply only to native routes (/api/graphs/{name}/invoke, /api/graphs/{name}/stream). When platform_compat=True is enabled, all platform-compatible routes (/api/threads/..., /api/runs/...) use the app-level auth_level regardless of per-graph overrides.

API Management integration pattern

For production, a common pattern is:

  1. Set Function routes to FUNCTION auth.
  2. Place Azure API Management (APIM) in front of the Function App.
  3. Require client auth at APIM (JWT/OAuth2/subscription key/mTLS).
  4. Keep Function keys private between APIM and Function App.
  5. Apply APIM rate limits, IP filtering, and request validation policies.

This creates a layered security model: edge auth and governance in APIM, key-based gate at the Function layer.

Observability

This package does not emit structured log fields automatically. The following are recommended practices for production observability when building your Function App:

  • Emit structured fields (graph_name, thread_id, assistant_id, run_id, status_code, duration_ms).
  • Log explicit lifecycle markers: request received, graph started, graph completed/failed.
  • Include error categories (validation_error, execution_error, storage_error) to simplify alerting.

azure-functions-logging provides structured logging helpers that pair well with this package.

Application Insights correlation

Azure Functions automatically wires requests/dependencies into Application Insights when enabled in the Function App. Use this to correlate:

  • incoming HTTP request
  • graph invocation log records
  • downstream dependency calls (LLM/API/storage)
  • final success/error outcome

This enables end-to-end traceability for each run.

Health endpoint

GET /health (exposed as GET /api/health with the default Functions route prefix) returns a liveness and configuration response. It confirms the app is running and lists registered graphs with their checkpointer status.

⚠️ This is a liveness/configuration endpoint, not a dependency-readiness check. It does not probe Blob Storage, Table Storage, or downstream LLM availability. For deep health checks, implement a custom endpoint or use Azure Monitor availability tests.

The /health endpoint inherits the app-level auth_level, not per-graph overrides. If the app uses FUNCTION auth, /health also requires a function key — even if individual graphs are ANONYMOUS.

The response includes a list of graphs and whether each has a checkpointer.

{
  "status": "ok",
  "graphs": [
    {
      "name": "my_agent",
      "description": "Customer support agent",
      "has_checkpointer": true
    }
  ]
}

Use this endpoint for liveness checks and deployment validation.

Monitor health check success rates, response-time percentiles (P50/P95/P99), and HTTP error rates by endpoint and graph name.

Timeouts & Cancellation

Azure Functions timeout limits

Execution timeout is governed by Azure Functions plan and host.json:

Plan Default timeout Maximum timeout
Consumption 5 minutes 10 minutes
Flex Consumption 30 minutes Unlimited (configurable)
Premium 30 minutes Unlimited (configurable)
Dedicated (App Service) 30 minutes Unlimited

Runtime behavior in this package

Graph execution is synchronous from the HTTP handler perspective. graph.invoke() and graph.stream() run until completion (or failure).

  • No package-level timeout wrapper is applied around graph calls.
  • No built-in cancellation endpoint is provided for long-running graph runs.

⚠️ If a graph exceeds platform timeout, the request fails at the Functions host boundary.

⚠️ HTTP response ceiling: Azure Functions enforces a hard 230-second limit on HTTP response time regardless of functionTimeout. Graph invocations that exceed 230 seconds will fail with a gateway timeout even if functionTimeout allows longer execution. For workloads approaching this limit, consider async patterns (queue trigger + status polling) instead of synchronous HTTP.

Configure timeout explicitly

Set functionTimeout in host.json to a value aligned with your plan and workload SLO.

{
  "version": "2.0",
  "functionTimeout": "00:10:00",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  }
}

Production guidance:

  1. Keep graph completion comfortably under timeout limits.
  2. Cap upstream LLM/tool call timeouts so they fail fast.
  3. Break long workflows into resumable steps via checkpointers.
  4. Route very long orchestration to Durable Functions patterns when needed.

Request & Input Limits

LangGraphApp enforces the following defaults to protect against oversized or deeply nested payloads:

Limit Default Config parameter
Request body size 1 MiB max_request_body_bytes
Stream response size 1 MiB max_stream_response_bytes
Input JSON depth 32 levels max_input_depth
Input JSON nodes 10,000 max_input_nodes

Override these in LangGraphApp constructor if your workload requires larger payloads:

app = LangGraphApp(
    max_request_body_bytes=2 * 1024 * 1024,  # 2 MiB
    max_stream_response_bytes=4 * 1024 * 1024,  # 4 MiB
)

Requests exceeding these limits are rejected before graph execution begins.

Streaming Behavior

Current behavior (critical)

Endpoints with stream in the path and Content-Type: text/event-stream return SSE-formatted payloads, but delivery is buffered in Azure Functions Python today.

This affects native and platform routes such as:

  • POST /api/graphs/{name}/stream
  • POST /api/threads/{thread_id}/runs/stream
  • POST /api/runs/stream

⚠️ Clients receive the complete SSE body at once, not incremental chunks.

⚠️ Buffered SSE response limit: Stream responses are capped at 1 MiB (max_stream_response_bytes=1_048_576 by default). If the accumulated SSE payload exceeds this limit, an event: error is injected into the SSE body rather than an HTTP 413/500. Adjust max_stream_response_bytes in LangGraphApp if your graph produces large streaming output.

Why this happens

This is an Azure Functions Python worker limitation for HTTP streaming/chunked transfer. The package currently collects stream events and returns a single response body.

Operational recommendation

For long-running production runs, prefer:

  • POST /api/threads/{thread_id}/runs/wait
  • POST /api/runs/wait

over the corresponding /stream routes to avoid UX and latency expectation mismatch.

⚠️ Note: Thread and run routes (/api/threads/..., /api/runs/...) are only available when platform_compat=True is set in LangGraphApp.

Concurrency & Scale

Thread-assistant binding and TOCTOU race

Platform routes bind a thread to its first assistant and reject assistant switches later. There is an explicit TOCTOU window between read and update in platform/routes.py.

⚠️ Operator impact: In multi-instance deployments, concurrent requests for the same thread can race between read and write. Without external serialization (e.g., queue-based workers), the second writer may silently overwrite the first.

See DESIGN.md (thread-assistant binding design decision and concurrency notes) for detailed constraints and trade-offs.

Blob checkpointer single-writer assumption

AzureBlobCheckpointSaver is designed around single-writer-per-thread semantics. The implementation documents concurrent-writer conflict resolution as a non-goal.

⚠️ Operator impact: Concurrent writes to the same thread/checkpoint namespace from multiple instances can produce inconsistent checkpoint state. If your deployment runs multiple Function App instances, ensure each thread's writes are serialized (see Recommended production pattern below).

Azure Table thread store scale envelope

AzureTableThreadStore uses a single partition key (PartitionKey="thread") with client-side metadata filtering. This is a design-envelope approximation, not an enforced limit, and generally works well up to roughly 100K threads (~500 entities/sec throughput).

Beyond that envelope, consider a sharded or higher-scale backend such as Cosmos DB.

Concurrency controls

Only multitask_strategy="reject" is supported. Concurrent run submissions for the same thread are rejected with HTTP 409 — no queuing or interruption is implemented.

Operator impact: If your workload has bursts of concurrent requests targeting the same thread, implement client-side retry with backoff, or use the queue-based worker pattern below.

For multi-instance writes, serialize thread mutations through a queue-based worker pattern:

  1. HTTP layer validates and enqueues run requests by thread_id.
  2. Workers process one message per thread key at a time.
  3. Storage writes remain effectively single-writer per thread.
  4. Completion status is written back for polling/webhook delivery.

This removes most race windows and aligns with current storage assumptions.

Storage Configuration

Azure Blob checkpointer

Install optional dependency:

pip install azure-functions-langgraph[azure-blob]

Configure with connection string from environment and construct a container client:

import os

from azure.storage.blob import BlobServiceClient
from azure_functions_langgraph.checkpointers.azure_blob import AzureBlobCheckpointSaver

conn = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
service = BlobServiceClient.from_connection_string(conn)
container = service.get_container_client("langgraph-checkpoints")

checkpointer = AzureBlobCheckpointSaver(container_client=container)

Use langgraph-checkpoints as the default production container name unless you have environment-specific naming requirements.

If you already manage Azure clients elsewhere, pass a prepared container client directly.

Azure Table thread store

Install optional dependency:

pip install azure-functions-langgraph[azure-table]

Configure with connection string and table name:

import os

from azure_functions_langgraph.stores.azure_table import AzureTableThreadStore

conn = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
thread_store = AzureTableThreadStore.from_connection_string(
    connection_string=conn,
    table_name="langgraphthreads",
)

Use langgraphthreads as the default production table name unless you need a custom naming scheme.

Connection string security

Do not hardcode storage secrets in source code or deployment artifacts.

Use one of these patterns:

  • App Settings with Key Vault references
  • Managed identity with Azure SDK identity-based auth
  • Secret rotation policy with zero-downtime rollout

⚠️ Treat AZURE_STORAGE_CONNECTION_STRING as a high-value credential.

Sources

See Also