LangGraph RAG Agent

Trigger: HTTP | State: stateful (agent) | Guarantee: request-response | Difficulty: advanced | Showcase: LangGraph + Knowledge

Overview

This recipe combines azure-functions-langgraph-python with Azure AI Search to expose a Retrieval-Augmented Generation (RAG) agent on Azure Functions.

The agent keeps per-thread conversation state, routes each turn through a LangGraph workflow, and decides whether to search a knowledge base or answer directly. The example uses a simple keyword-based routing policy so the integration stays easy to understand.

When to Use

You want a serverless LangGraph agent with a built-in knowledge search tool.
You need a single HTTP endpoint for multi-turn conversations.
You want typed request validation, OpenAPI metadata, and structured logs in the same recipe.

When NOT to Use

You only need a stateless FAQ endpoint with no conversation memory.
You need durable, hours-long orchestration, which is better handled by Durable Functions.
You need production-grade retrieval ranking, persistent checkpoints, and LLM-based tool selection out of the box.

Architecture

flowchart LR
    A[Client Query] --> B[POST /api/chat]
    B --> C[LangGraph Agent]
    C --> D{Route decision}
    D -->|Need grounding| E[Tool: Knowledge Search]
    D -->|Can answer directly| F[Tool: Direct Response]
    E --> G[Answer]
    F --> G

Prerequisites

Python 3.10+
Azure Functions Core Tools v4
langgraph, azure-functions-langgraph-python
azure-functions-validation-python, azure-functions-openapi-python, and azure-functions-logging-python

Project Structure

examples/ai-and-agents/langgraph_rag_agent/
|- function_app.py
|- host.json
|- local.settings.json.example
|- pyproject.toml
`- README.md

Implementation

The example defines a small LangGraph with three logical steps:

Read the latest user turn and decide whether retrieval is needed.
Call the knowledge tool when the question looks domain-specific.
Otherwise generate a direct response and append it to thread history.

def router_node(state: AgentState) -> dict[str, str]:
    # Route on the latest turn only; earlier history does not affect the decision.
    message = state["messages"][-1]["content"].lower()
    # Plain substring match, so "docs" also matches "docstring". Deliberately simple.
    keywords = ("search", "docs", "policy", "manual", "knowledge")
    route = "knowledge_search" if any(word in message for word in keywords) else "direct_response"
    return {"route": route}


def knowledge_search_node(state: AgentState) -> dict[str, object]:
    query = state["messages"][-1]["content"]
    # top_k falls back to 3 when the request did not set it.
    citations = search_knowledge(query=query, top_k=state.get("top_k", 3))
    answer = build_rag_answer(query, citations)
    return {
        "route": "knowledge_search",
        "answer": answer,
        "citations": citations,
        # Build a new list instead of mutating the existing history in place.
        "messages": state["messages"] + [{"role": "assistant", "content": answer}],
    }
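The excerpt above leaves the state type, the retrieval helpers, and the direct-response step undefined. The sketch below fills them in with dependency-free stand-ins so the nodes can be exercised without LangGraph or a search service. The fields of `AgentState` are inferred from the node code; `_CORPUS`, the word-overlap scoring in `search_knowledge`, the wording produced by `build_rag_answer`, the canned reply in `direct_response_node`, and the `run_turn` dispatcher are all assumptions made for illustration, not the example's actual implementation.

```python
from typing import TypedDict


class AgentState(TypedDict, total=False):
    # Fields inferred from the node code; the real definition may differ.
    messages: list[dict[str, str]]
    route: str
    answer: str
    citations: list[dict[str, str]]
    top_k: int


# Hypothetical in-memory corpus standing in for a real knowledge base.
_CORPUS = [
    {"title": "Onboarding Runbook",
     "snippet": "Reset the password from the Helpdesk portal before reissuing MFA.",
     "source": "mock://onboarding-runbook"},
    {"title": "Travel Policy",
     "snippet": "Book flights through the approved travel portal.",
     "source": "mock://travel-policy"},
]


def _tokens(text: str) -> set[str]:
    """Lowercase words longer than three characters, punctuation stripped."""
    words = (w.strip(".,?!:;").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


def search_knowledge(query: str, top_k: int = 3) -> list[dict[str, str]]:
    """Rank documents by word overlap with the query (assumed scoring)."""
    wanted = _tokens(query)
    scored = [(len(wanted & _tokens(f"{d['title']} {d['snippet']}")), d) for d in _CORPUS]
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def build_rag_answer(query: str, citations: list[dict[str, str]]) -> str:
    if not citations:
        return "I searched the knowledge base but found nothing relevant."
    titles = ", ".join(c["title"] for c in citations)
    return f"I searched the knowledge base and found {len(citations)} passage(s) in: {titles}."


def router_node(state: AgentState) -> dict[str, str]:
    # Same keyword policy as in the excerpt above.
    message = state["messages"][-1]["content"].lower()
    keywords = ("search", "docs", "policy", "manual", "knowledge")
    route = "knowledge_search" if any(word in message for word in keywords) else "direct_response"
    return {"route": route}


def knowledge_search_node(state: AgentState) -> dict[str, object]:
    query = state["messages"][-1]["content"]
    citations = search_knowledge(query=query, top_k=state.get("top_k", 3))
    answer = build_rag_answer(query, citations)
    return {"route": "knowledge_search", "answer": answer, "citations": citations,
            "messages": state["messages"] + [{"role": "assistant", "content": answer}]}


def direct_response_node(state: AgentState) -> dict[str, object]:
    # Canned reply; a real node would call a model here.
    answer = "I can answer that without searching the knowledge base."
    return {"route": "direct_response", "answer": answer, "citations": [],
            "messages": state["messages"] + [{"role": "assistant", "content": answer}]}


_NODES = {"knowledge_search": knowledge_search_node, "direct_response": direct_response_node}


def run_turn(state: AgentState) -> dict[str, object]:
    """Plain-Python stand-in for the graph's conditional edge: route, then run one node."""
    routed = {**state, **router_node(state)}
    return {**routed, **_NODES[routed["route"]](routed)}
```

Calling `run_turn({"messages": [{"role": "user", "content": "..."}]})` runs one turn end to end, which makes the routing policy easy to unit-test before the nodes are registered on a compiled graph.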

The HTTP handler remains a normal Azure Functions route, so it can use the usual decorator stack:

@app.route(route="chat", methods=["POST"])
@with_context
@openapi(summary="Chat with LangGraph RAG agent", request_body=ChatRequest, response={200: ChatResponse}, tags=["ai"])
@validate_http(body=ChatRequest, response_model=ChatResponse)
def chat(req: func.HttpRequest, body: ChatRequest) -> func.HttpResponse:
    ...

This keeps the integration matrix explicit:

LangGraph for graph definition and routing
Knowledge for retrieval augmentation
Validation for typed request and response contracts
OpenAPI for discoverable API metadata
Logging for per-thread observability

Behavior

The state diagram below traces a single turn from the incoming request to the persisted thread.

stateDiagram-v2
    [*] --> ReceiveTurn
    ReceiveTurn --> DecideRoute
    DecideRoute --> KnowledgeSearch: retrieval needed
    DecideRoute --> DirectResponse: answer directly
    KnowledgeSearch --> ComposeAnswer
    DirectResponse --> ComposeAnswer
    ComposeAnswer --> PersistThread
    PersistThread --> [*]

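The sample keeps conversation state in memory, keyed by thread_id, so history is lost when the worker process restarts and is not shared across scaled-out instances. For production, switch to a persistent checkpointer or external thread store.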
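One way to keep a later move to persistent storage cheap is to hide thread history behind a small interface. The sketch below is an assumption about how such a store could look, not the sample's actual code: `ThreadStore`, `InMemoryThreadStore`, `load`, and `append` are hypothetical names. A persistent implementation would expose the same two methods backed by a database or cache.

```python
import threading
from typing import Protocol

Message = dict[str, str]


class ThreadStore(Protocol):
    """Hypothetical interface that both in-memory and persistent stores could satisfy."""

    def load(self, thread_id: str) -> list[Message]: ...
    def append(self, thread_id: str, *messages: Message) -> int: ...


class InMemoryThreadStore:
    """Process-local store: fine for local runs, unsuitable for production."""

    def __init__(self) -> None:
        self._threads: dict[str, list[Message]] = {}
        self._lock = threading.Lock()  # guards concurrent requests in one worker

    def load(self, thread_id: str) -> list[Message]:
        with self._lock:
            # Return a copy so callers cannot mutate stored history.
            return list(self._threads.get(thread_id, []))

    def append(self, thread_id: str, *messages: Message) -> int:
        with self._lock:
            history = self._threads.setdefault(thread_id, [])
            history.extend(messages)
            return len(history)  # usable as a history length in a response


store = InMemoryThreadStore()
store.append("support-42", {"role": "user", "content": "Hi"})
history_length = store.append("support-42", {"role": "assistant", "content": "Hello"})
```

Threads are isolated by key, so an unknown `thread_id` simply loads as an empty history.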

Request Format

{
  "message": "Search the onboarding runbook for password reset steps.",
  "thread_id": "support-42",
  "top_k": 3
}
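The field names in the sketch below come from the request body above; everything else is assumed. The recipe's `ChatRequest` is the model passed to `validate_http`, and its definition is not shown here, so this dependency-free stand-in (`ChatRequestSketch`, `parse_chat_request`) only illustrates the kind of constraints such a model might encode. Treating `thread_id` as required is an assumption, the default `top_k` of 3 mirrors the `state.get("top_k", 3)` fallback in the knowledge node, and the upper bound of 10 is an arbitrary illustrative limit.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatRequestSketch:
    message: str
    thread_id: str
    top_k: int = 3

    def __post_init__(self) -> None:
        if not isinstance(self.message, str) or not self.message.strip():
            raise ValueError("message must be a non-empty string")
        if not isinstance(self.thread_id, str) or not self.thread_id.strip():
            raise ValueError("thread_id must be a non-empty string")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.top_k, bool) or not isinstance(self.top_k, int):
            raise ValueError("top_k must be an integer")
        if not 1 <= self.top_k <= 10:
            raise ValueError("top_k must be between 1 and 10")


def parse_chat_request(raw: str) -> ChatRequestSketch:
    """Parse a JSON body and raise ValueError on any contract violation."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    unknown = set(data) - {"message", "thread_id", "top_k"}
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    try:
        return ChatRequestSketch(**data)
    except TypeError as exc:  # raised when a required field is missing
        raise ValueError(str(exc)) from exc
```

Funnelling every failure into `ValueError` lets a handler map all of them to a single 4xx response.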

Run Locally

cd examples/ai-and-agents/langgraph_rag_agent
pip install -e ".[dev]"
cp local.settings.json.example local.settings.json
func start

Expected Output

Functions:

    chat: [POST] http://localhost:7071/api/chat

Example call:

curl -X POST http://localhost:7071/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Search the onboarding runbook for password reset steps.",
    "thread_id": "support-42",
    "top_k": 2
  }'

Example response:

{
  "thread_id": "support-42",
  "route": "knowledge_search",
  "answer": "I searched the knowledge base and found two relevant passages about password reset steps.",
  "citations": [
    {
      "title": "Onboarding Runbook",
      "snippet": "Reset the password from the Helpdesk portal before reissuing MFA.",
      "source": "mock://onboarding-runbook"
    }
  ],
  "history_length": 2
}
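For scripted or multi-turn testing, the same call can be made from Python using only the standard library. This is a sketch: `build_chat_request` and `send_turn` are hypothetical helper names, the 30-second timeout is an arbitrary choice, and the URL assumes the local host and port shown above.

```python
import json
import urllib.request

# Local Functions host address, as printed by `func start`.
CHAT_URL = "http://localhost:7071/api/chat"


def build_chat_request(
    message: str, thread_id: str, top_k: int = 3, url: str = CHAT_URL
) -> urllib.request.Request:
    """Build the POST request without sending it, which keeps it easy to inspect."""
    payload = json.dumps(
        {"message": message, "thread_id": thread_id, "top_k": top_k}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_turn(message: str, thread_id: str, top_k: int = 3, timeout: float = 30.0) -> dict:
    """Send one turn and return the decoded JSON response body."""
    request = build_chat_request(message, thread_id, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Reusing the same `thread_id` across calls to `send_turn` continues one conversation, so `history_length` in the responses should grow from turn to turn.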

Production Considerations

State: replace in-memory thread storage with a persistent store or LangGraph checkpointer.
Retrieval quality: use embeddings, filtering, and citation formatting that match your corpus.
Security: protect /api/chat with function-level keys (AuthLevel.FUNCTION) or stronger auth before deployment.
Observability: log thread_id, route, knowledge hit count, and latency per turn.
Timeouts: keep retrieval bounded and stream long-running model calls when needed.
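The observability and timeout points can be combined in a thin wrapper around the retrieval call. The following is a sketch under assumptions: `bounded_search`, `log_turn`, the 2-second default budget, the pool size, and the JSON log shape are illustrative choices, not part of the recipe or of azure-functions-logging-python.

```python
import json
import logging
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable

logger = logging.getLogger("rag_agent")
# Module-level pool reused across calls; the size is an arbitrary choice.
_executor = ThreadPoolExecutor(max_workers=4)


def bounded_search(search: Callable[[str], list], query: str, timeout_s: float = 2.0) -> list:
    """Run `search(query)` but stop waiting after `timeout_s` and return no hits."""
    future = _executor.submit(search, query)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only stops a task that has not started; a running one
        # finishes in the background and its result is discarded.
        future.cancel()
        return []


def log_turn(thread_id: str, route: str, hits: int, started: float) -> dict:
    """Emit one JSON log line per turn and return the record for reuse."""
    record = {
        "thread_id": thread_id,
        "route": route,
        "knowledge_hits": hits,
        "latency_ms": round((time.perf_counter() - started) * 1000, 1),
    }
    logger.info(json.dumps(record))
    return record


started = time.perf_counter()
hits = bounded_search(lambda q: [{"title": "Onboarding Runbook"}], "password reset")
record = log_turn("support-42", "knowledge_search", len(hits), started)
```

Returning an empty hit list on timeout lets the agent degrade to an ungrounded answer instead of failing the whole request; whether that trade-off is acceptable depends on how much the answers rely on citations.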