Streaming AI Response¶
Trigger: HTTP | State: stateless | Guarantee: buffered SSE | Difficulty: intermediate | Showcase: SSE from Azure OpenAI
Overview¶
This recipe exposes an HTTP endpoint that returns a buffered Server-Sent Events (SSE) response produced from Azure OpenAI streaming chat completions.
It keeps the route in the normal cookbook HTTP shape with
@with_context, @openapi, and @validate_http, and uses
azure-functions-logging-python so each streamed request can be correlated in logs.
This is useful when clients need SSE-formatted output from Azure OpenAI.
When to Use¶
- You want SSE-formatted output from Azure OpenAI chat completions.
- Your client can parse SSE-formatted text responses.
- You want a simple streaming wrapper over Azure OpenAI without a full agent stack.
When NOT to Use¶
- You only need a single JSON response after generation completes.
- Your client platform cannot consume SSE streams.
- You need durable orchestration or background execution instead of a live stream.
Architecture¶
flowchart LR
A[Client] --> B[HTTP trigger\nPOST /api/stream]
B --> C[@with_context + @openapi + @validate_http]
C --> D[Azure OpenAI streaming completion]
D --> E[SSE events]
E --> A
Behavior¶
The sequence below shows the runtime interaction between components.
sequenceDiagram
participant Client
participant Function as Azure Function
participant AOAI as Azure OpenAI
Client->>Function: POST /api/stream { message }
Function->>AOAI: chat.completions.create(stream=True)
loop each chunk
AOAI-->>Function: delta token
Function-->>Client: data: {"delta":"..."}
end
Function-->>Client: event: done
Prerequisites¶
- Python 3.10+
- Azure Functions Core Tools v4
openaiSDK- Azure OpenAI resource with a chat deployment
Project Structure¶
examples/ai-and-agents/streaming_ai_response/
|- function_app.py
|- host.json
|- local.settings.json.example
|- pyproject.toml
`- README.md
Implementation¶
The example project is examples/ai-and-agents/streaming_ai_response/.
function_app.py configures azure-functions-logging-python, validates the request
body, and converts Azure OpenAI streaming chunks into SSE frames with the
text/event-stream content type.
The route still uses the toolkit decorators, even though the payload format is a stream rather than a JSON response model:
@app.route(route="stream", methods=["POST"])
@with_context
@openapi(summary="Stream Azure OpenAI response", request_body=StreamRequest, tags=["ai"])
@validate_http(body=StreamRequest)
def stream_chat(req: func.HttpRequest, body: StreamRequest) -> func.HttpResponse:
...
Inside the handler, the function reads Azure OpenAI streaming events via the
openai SDK and writes SSE-formatted output:
for event in client.chat.completions.create(..., stream=True):
delta = event.choices[0].delta.content
if delta:
frames.append(f"data: {json.dumps({'delta': delta})}\n\n")
azure-functions-logging-python is useful here because streaming failures often happen
mid-response and need request-level correlation data in logs.
Run Locally¶
cd examples/ai-and-agents/streaming_ai_response
pip install -e ".[dev]"
cp local.settings.json.example local.settings.json
func start
Expected Output¶
Example request:
curl -N -X POST http://localhost:7071/api/stream \
-H "Content-Type: application/json" \
-d '{"message": "Explain Azure Functions scaling in three short points."}'
Example SSE output:
data: {"delta": "Azure Functions "}
data: {"delta": "scales automatically "}
event: done
data: {"status": "completed"}
Production Considerations¶
- Note that this example buffers the full response before returning it. For true token-by-token streaming, verify your hosting plan supports HTTP streaming.
- Add timeouts and client disconnect handling around long model responses.
- Record request IDs, deployment names, and stream completion status with
azure-functions-logging-python. - Consider chunked responses for very large outputs if your hosting plan supports HTTP streaming.