Architecture¶
1. Top-Level Structure¶
azure-functions-db is composed of four top-level modules:

- core
  - DbConfig, EngineProvider
  - shared types
  - shared errors
  - serializers
- trigger
  - polling orchestrator, source adapter, state management, Azure Functions integration
  - consumes core to handle pseudo-trigger execution
- binding
  - DbReader (input binding), DbWriter (output binding)
  - per-invocation session/connection lifecycle management
- decorator
  - DbBindings class providing Azure Functions-style decorators
  - trigger wraps PollTrigger, input injects query results, output injects DbOut for explicit writes
  - inject_reader / inject_writer provide imperative DbReader / DbWriter injection
  - consumes both trigger and binding modules
1.1 Core Layer¶
Owns shared DB configuration, engine, types, errors, and serialization responsibilities.
- DbConfig: shared configuration representation including DB URL, schema, table, etc.
- EngineProvider: lazy singleton engine/pool management keyed by config
- shared types: Row, RowDict, common payload types
- shared errors: base exceptions inherited by both trigger and binding layers
- serializers: DB value normalization and row dict conversion
1.2 Trigger Layer¶
The existing trigger-related layers are preserved; DB connectivity, types, errors, and serialization use core.
1.3 Binding Layer¶
The binding layer provides an imperative API called directly from Azure Functions function bodies.
- DbReader: single-row lookup by PK, raw SQL queries
- DbWriter: insert/upsert/update/delete and batch writes
- per-invocation session management: sessions are opened and closed per function invocation by default
1.4 Dependency Rules¶
- trigger -> core dependency only
- binding -> core dependency only
- decorator -> trigger, binding, core (thin orchestration layer)
- Cross-imports between trigger and binding are forbidden
1.5 Request Flow and Runtime Relationship¶
azure-functions-db decorators integrate at two lifecycle points within the Azure Functions runtime:
- Import time — decorators validate configuration. output also creates a module-scoped DbOut instance.
- Per invocation — the decorator wrapper performs DB I/O around the handler call. The exact behavior differs by decorator type.
The four decorator types have distinct invocation flows:
| Decorator | Pre-handler | Handler receives | Post-handler |
|---|---|---|---|
| input | Creates DbReader, reads data, closes reader | Query result (data) | — |
| output | — | DbOut (module-scoped) | DbWriter created only when handler calls .set() |
| inject_reader | Creates DbReader | DbReader instance | Closes reader |
| inject_writer | Creates DbWriter | DbWriter instance | Closes writer |
Binding Flow (input / inject_reader / inject_writer)¶
```mermaid
sequenceDiagram
    participant Client as HTTP Client
    participant Host as Azure Functions Host
    participant Worker as Python Worker
    participant Dec as Decorator Wrapper
    participant DB as DbReader / DbWriter
    participant Handler as Function Handler
    rect rgb(240, 248, 255)
        note over Worker,Dec: Import Time (startup)
        Worker->>Worker: import function modules
        Worker->>Dec: @db.input() / @db.inject_reader() / @db.inject_writer() validates config
    end
    rect rgb(255, 248, 240)
        note over Client,DB: Per Invocation
        Client->>Host: HTTP Request
        Host->>Worker: invoke function
        Worker->>Dec: decorator wrapper
        Dec->>DB: create per-invocation DbReader or DbWriter
        alt @db.input()
            DB->>DB: execute query / pk lookup
            DB-->>Dec: query result
            Dec->>Handler: call handler(req, data=result)
        else @db.inject_reader() / @db.inject_writer()
            Dec->>Handler: call handler(req, reader=DbReader) or handler(req, writer=DbWriter)
            Handler->>DB: handler performs reads / writes directly
        end
        Handler-->>Dec: return result
        Dec->>DB: close reader / writer
        Dec-->>Worker: HttpResponse
        Worker-->>Host: response
        Host-->>Client: HTTP Response
    end
```
Output Flow (@db.output)¶
output differs from the other decorators: DbOut is created once at decoration time (module-scoped), and a DbWriter is only created when the handler calls .set().
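The lazy-creation behavior described above can be sketched in a few lines. This is a simplified stand-in, not the library's internals: `writer_factory` and `FakeWriter` are hypothetical names introduced only for illustration.

```python
class DbOutSketch:
    """Sketch of a module-scoped output slot: no DbWriter exists
    until the handler actually calls .set()."""

    def __init__(self, writer_factory):
        # Hypothetical factory injected for illustration; the real DbOut
        # presumably derives the writer from the decorator's config.
        self._writer_factory = writer_factory

    def set(self, data):
        writer = self._writer_factory()  # writer created lazily, per .set() call
        try:
            writer.upsert(data)
        finally:
            writer.close()               # closed immediately after the write


class FakeWriter:
    """In-memory stand-in for DbWriter, used only for this sketch."""
    rows = []

    def upsert(self, data):
        FakeWriter.rows.append(data)

    def close(self):
        pass


out = DbOutSketch(FakeWriter)
out.set({"id": 1, "name": "alice"})
print(FakeWriter.rows)  # [{'id': 1, 'name': 'alice'}]
```

A handler that returns without calling `.set()` therefore never opens a database connection for output.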
```mermaid
sequenceDiagram
    participant Client as HTTP Client
    participant Host as Azure Functions Host
    participant Worker as Python Worker
    participant Dec as Decorator Wrapper
    participant Out as DbOut (module-scoped)
    participant Handler as Function Handler
    participant DB as DbWriter
    rect rgb(240, 248, 255)
        note over Worker,Out: Import Time (startup)
        Worker->>Worker: import function modules
        Worker->>Dec: @db.output() validates config
        Dec->>Out: create DbOut instance (once)
    end
    rect rgb(255, 248, 240)
        note over Client,DB: Per Invocation
        Client->>Host: HTTP Request
        Host->>Worker: invoke function
        Worker->>Dec: decorator wrapper
        Dec->>Handler: call handler(req, out=DbOut)
        Handler->>Out: out.set(data)
        Out->>DB: create DbWriter, insert/upsert, close
        Handler-->>Dec: return result
        Dec-->>Worker: HttpResponse
        Worker-->>Host: response
        Host-->>Client: HTTP Response
    end
```
Trigger Flow (poll-based change detection)¶
```mermaid
sequenceDiagram
    participant Timer as Azure Timer Trigger
    participant Host as Azure Functions Host
    participant Worker as Python Worker
    participant PT as PollTrigger
    participant PR as PollRunner
    participant SS as StateStore (Blob)
    participant SA as SourceAdapter
    participant DB as Database
    participant Handler as User Handler
    Timer->>Host: scheduled tick
    Host->>Worker: invoke trigger function
    Worker->>PT: PollTrigger.run(timer, handler)
    PT->>PR: construct PollRunner
    PR->>PR: runner.tick()
    PR->>SS: acquire lease (ETag CAS)
    PR->>SS: load checkpoint
    PR->>SA: source.fetch(cursor, batch_size)
    SA->>DB: SELECT ... WHERE cursor > checkpoint
    DB-->>SA: changed rows
    SA-->>PR: raw records
    PR->>PR: normalize(raw records) → events
    PR->>Handler: invoke(events)
    Handler-->>PR: success
    PR->>SS: commit new checkpoint (ETag CAS)
    PR-->>Worker: done
    Worker-->>Host: completed
```
When an EngineProvider is passed to a decorator, it supplies a shared connection pool across invocations at module scope. When omitted, each DbReader or DbWriter creates and owns its own engine. DbReader and DbWriter instances from input, inject_reader, and inject_writer are always per-invocation and closed automatically by the decorator wrapper.
Async Support¶
| Decorator | Async Handler | Mechanism |
|---|---|---|
| trigger | ❌ Not supported | PollTrigger.run is synchronous; async handlers are rejected at decoration time |
| input | ✅ Supported | DB I/O runs via asyncio.to_thread() |
| output | ✅ Supported | DB write runs via asyncio.to_thread() through _AsyncDbOutProxy |
| inject_reader | ✅ Supported | Returns _AsyncDbReaderProxy with async methods |
| inject_writer | ✅ Supported | Returns _AsyncDbWriterProxy with async methods |
When an async handler is used with input, output, inject_reader, or inject_writer, all blocking database operations are automatically executed in a worker thread via asyncio.to_thread(), keeping the event loop free.
The trigger decorator explicitly rejects async handlers with a ConfigurationError because PollTrigger.run() is synchronous and calling an async handler without await would silently produce an unawaited coroutine.
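The offloading pattern can be sketched generically. The names below (`blocking_read`, `to_async`) are illustrative stand-ins, not the library's proxy classes; only the `asyncio.to_thread()` mechanism is taken from the text above.

```python
import asyncio
import functools


def blocking_read(pk):
    """Stand-in for a blocking DbReader call (e.g. a synchronous SQL query)."""
    return {"id": pk, "name": "alice"}


def to_async(fn):
    """Wrap a blocking function so an async handler can await it
    without blocking the event loop."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        # asyncio.to_thread runs fn in the default thread-pool executor
        return await asyncio.to_thread(fn, *args, **kwargs)
    return wrapper


async def handler():
    read = to_async(blocking_read)
    row = await read(1)  # event loop stays free during the DB call
    return row


result = asyncio.run(handler())
print(result)  # {'id': 1, 'name': 'alice'}
```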
Input Binding Modes¶
The input decorator operates in two mutually exclusive modes. Exactly one of pk or query must be provided.
| Mode | Parameter | Returns | Use When |
|---|---|---|---|
| Row lookup | pk= | dict \| None | You need a single row by primary key |
| SQL query | query= | list[dict] | You need multiple rows or custom SQL |
Both pk and params accept either a static dict or a callable that resolves values from other handler parameters at invocation time.
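The static-or-callable contract can be sketched as a small resolver. The helper name `resolve` and the lambda's parameters are assumptions for illustration; they are not the library's API.

```python
def resolve(value, handler_kwargs):
    """Return a dict as-is, or call a callable with the handler's
    keyword arguments to produce one at invocation time
    (hypothetical helper, for illustration only)."""
    if callable(value):
        return value(**handler_kwargs)
    return value


static_pk = {"id": 1}
dynamic_pk = lambda req: {"id": req["route_id"]}  # resolved per invocation

a = resolve(static_pk, {})
b = resolve(dynamic_pk, {"req": {"route_id": 7}})
print(a, b)  # {'id': 1} {'id': 7}
```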
Decorator Composition¶
DbBindings decorators can be combined on a single handler. The following rules are enforced at decoration time:
Valid Combinations¶
| Combination | Use Case |
|---|---|
| trigger + output | Process DB changes and write results |
| trigger + inject_writer | Process DB changes with imperative writes |
| input + output | Read data and write results |
| input + inject_writer | Read data with imperative writes |
| inject_reader + inject_writer | Full imperative control |
| inject_reader + output | Imperative read + auto-write |
Invalid Combinations¶
| Combination | Reason |
|---|---|
| input + inject_reader | Redundant — both read data; use one |
| output + inject_writer | Redundant — both write data; use one |
| Any decorator applied twice | Not meaningful |
Ordering¶
Azure Functions decorators (e.g., @app.schedule) must be outermost. DbBindings decorators should be closest to the function definition.
Concurrency and Thread Safety¶
Azure Functions reuses workers and may invoke functions concurrently. This section documents the lifecycle and thread-safety guarantees of each component.
Component Lifecycle¶
| Component | Scope | Lifecycle | Thread-Safe |
|---|---|---|---|
| EngineProvider | Module-level (shared) | Long-lived, reused across invocations | ✅ Yes — threading.Lock guards cache |
| SqlAlchemySource | Module-level (shared) | Long-lived, reused across invocations | ✅ Yes — read-only after construction |
| BlobCheckpointStore | Module-level (shared) | Long-lived, reused across invocations | ✅ Yes — ETag-based CAS for all mutations |
| DbReader | Per-invocation | Created and closed within a single function call | N/A — not shared |
| DbWriter | Per-invocation | Created and closed within a single function call | N/A — not shared |
EngineProvider¶
EngineProvider caches SQLAlchemy Engine instances keyed by connection configuration. The internal cache is protected by a threading.Lock, making concurrent get_engine() calls safe. SQLAlchemy's Engine itself manages its own connection pool with thread-safe checkout/return.
Module-level usage pattern (recommended):
```python
engine_provider = EngineProvider()  # shared across all invocations

@db.input("user", url="%DB_URL%", table="users", pk={"id": 1},
          engine_provider=engine_provider)
def handler(user): ...
```
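The lock-guarded caching behavior can be sketched with a generic provider. This is a simplified stand-in that caches strings instead of SQLAlchemy engines; the class and method names are made up for the sketch.

```python
import threading


class CachingProvider:
    """Sketch of lazy, lock-guarded caching keyed by configuration.
    (Illustrative stand-in; the real EngineProvider caches SQLAlchemy
    Engine instances keyed by connection config.)"""

    def __init__(self, factory):
        self._factory = factory
        self._cache = {}
        self._lock = threading.Lock()

    def get(self, key):
        # Fast path: dict reads are safe without the lock under the GIL.
        if key in self._cache:
            return self._cache[key]
        with self._lock:
            # Re-check inside the lock so the factory runs exactly once
            # even if two threads raced past the fast path.
            if key not in self._cache:
                self._cache[key] = self._factory(key)
            return self._cache[key]


calls = []
provider = CachingProvider(lambda url: calls.append(url) or f"engine:{url}")
a = provider.get("postgresql://db1")
b = provider.get("postgresql://db1")
print(a is b, len(calls))  # True 1  (second get() hits the cache)
```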
DbReader / DbWriter¶
These are created per invocation by the decorator layer (or manually by the user). Each instance opens its own SQLAlchemy session and closes it when close() is called. Decorators manage this lifecycle automatically — users never need to call close() when using input, output, inject_reader, or inject_writer.
BlobCheckpointStore¶
Uses Azure Blob Storage ETag-based CAS (Compare-And-Swap) for all state mutations. Concurrent timer triggers racing to acquire the lease are safely resolved — only one wins, others skip the tick. This is the mechanism that prevents duplicate processing during scale-out.
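The CAS discipline can be illustrated with an in-memory stand-in for the state blob. Real code would use the Azure Blob SDK's conditional writes (ETag match conditions); this sketch only models the race-resolution logic, and all names are invented.

```python
import itertools


class InMemoryBlob:
    """In-memory stand-in for an ETag-versioned blob."""

    def __init__(self, state):
        self._etags = itertools.count(1)
        self.etag = next(self._etags)
        self.state = state

    def read(self):
        return dict(self.state), self.etag

    def write_if_match(self, new_state, etag):
        # Conditional write: succeeds only if nobody wrote since our read.
        if etag != self.etag:
            return False  # lost the race; caller should skip this tick
        self.state = new_state
        self.etag = next(self._etags)
        return True


blob = InMemoryBlob({"checkpoint": 10, "lease_owner": None})

# Two pollers read the same blob version...
state_a, etag_a = blob.read()
state_b, etag_b = blob.read()

# ...poller A's conditional write wins,
ok_a = blob.write_if_match({**state_a, "lease_owner": "A"}, etag_a)
# ...so poller B's CAS fails and it skips the tick.
ok_b = blob.write_if_match({**state_b, "lease_owner": "B"}, etag_b)
print(ok_a, ok_b)  # True False
```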
Reflected Metadata¶
SQLAlchemy caches reflected table metadata on the Engine's MetaData object. If a schema migration occurs while the Function App is running, the cached metadata may become stale. Mitigation options:
- Restart the Function App after schema migrations
- Use explicit Table definitions instead of reflection (future: see #38)
2. Execution Flow¶
```text
Azure Timer Trigger
  -> PollRunner.start()
    -> StateStore.load()
    -> StateStore.acquire_lease()   # CAS with ETag
    -> SourceAdapter.fetch_changes()
    -> core normalizer (raw records → RowChange)
    -> Handler.invoke(events)
    -> StateStore.commit()          # checkpoint + lease in one CAS write
    -> StateStore.heartbeat()       # if handler is long-running
```
3. Core Components¶
3.1 PollRunner¶
Overall lifecycle coordinator.
Responsibilities:
- per-tick execution
- overlap prevention
- batch-level commit
- handler exception surfacing
- retry-friendly exception classification
3.2 SourceAdapter¶
Abstraction for reading changes from each database.
Key methods:
- validate()
- open() / close()
- fetch_changes(checkpoint, limit) → FetchResult (raw records + next_checkpoint)
- capability() → AdapterCapability
The adapter returns raw records; core normalizes them into RowChange.
3.3 StateStore¶
Manages checkpoint and lease as a single state blob.
Contains:
- current watermark + tiebreaker PK
- last successful batch metadata
- lease owner / fencing_token / expires_at / heartbeat_at
- source fingerprint
All state changes are performed via ETag-based CAS (conditional write), making lease acquisition, heartbeat, and checkpoint commit atomic.
3.4 EventNormalizer (internal to core)¶
Normalizes raw records returned by the adapter into the common RowChange event.
This logic belongs to core; adapters only return raw records.
3.5 HandlerInvoker¶
Invokes the user handler in two modes:
- simple mode: handler(events)
- rich mode: handler(events, context)
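One plausible way to select the mode is by inspecting the handler's signature. This is a sketch of the idea under that assumption, not necessarily how HandlerInvoker actually decides.

```python
import inspect


def invoke(handler, events, context):
    """Call handler(events) or handler(events, context) depending on how
    many parameters the handler declares (illustrative sketch)."""
    params = inspect.signature(handler).parameters
    if len(params) >= 2:
        return handler(events, context)  # rich mode
    return handler(events)               # simple mode


def simple(events):
    return len(events)


def rich(events, context):
    return (len(events), context["tick_id"])


r1 = invoke(simple, [1, 2, 3], {"tick_id": 7})
r2 = invoke(rich, [1, 2, 3], {"tick_id": 7})
print(r1, r2)  # 3 (3, 7)
```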
4. MVP Strategy: Cursor Polling¶
Default algorithm:
```text
checkpoint = (cursor_value, tiebreaker_pk)
```

```sql
SELECT ...
FROM table
WHERE (cursor > :cursor_value)
   OR (cursor = :cursor_value AND pk > :tiebreaker_pk)
ORDER BY cursor ASC, pk ASC
LIMIT :batch_size
```
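The keyset predicate above can be exercised in pure Python against an in-memory table. The rows and column layout are made up for illustration; the filtering, ordering, and checkpoint-advance logic mirror the SQL.

```python
# Each row is (cursor, pk, payload); cursor could be an updated_at value.
rows = [
    (100, "a", "r1"), (100, "b", "r2"), (101, "a", "r3"),
    (102, "c", "r4"), (103, "a", "r5"),
]


def fetch(checkpoint, batch_size):
    cursor_value, tiebreaker_pk = checkpoint
    # WHERE (cursor > :cursor_value)
    #    OR (cursor = :cursor_value AND pk > :tiebreaker_pk)
    changed = [r for r in rows
               if r[0] > cursor_value
               or (r[0] == cursor_value and r[1] > tiebreaker_pk)]
    # ORDER BY cursor, pk LIMIT :batch_size
    batch = sorted(changed)[:batch_size]
    # The next checkpoint is the last row of the batch (if any).
    next_checkpoint = (batch[-1][0], batch[-1][1]) if batch else checkpoint
    return batch, next_checkpoint


batch, cp = fetch((100, "a"), 2)
print(batch)  # [(100, 'b', 'r2'), (101, 'a', 'r3')]
print(cp)     # (101, 'a')
```

Note that the tiebreaker PK lets the poller resume mid-batch even when several rows share the same cursor value.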
Advantages of This Strategy¶
- High DB portability
- Integrates well with SQLAlchemy
- Reproducible locally and in production
- Easy to debug
Limitations¶
- Cannot detect hard deletes (alternative strategies required)
- Sensitive to updated_at precision and clock skew
- Difficult to provide row-level before/after diffs
- Not a true event stream
5. Alternative Strategies¶
5.1 OutboxStrategy¶
The service writes to an outbox table; the library only consumes the outbox.
5.2 HashDiffStrategy¶
Periodically compares snapshots/hashes to infer changes. High cost; reserved for special cases.
5.3 NativeChangeStreamStrategy¶
Used when the database provides a native change stream/CDC.
Examples:
- PostgreSQL logical decoding
- MySQL binlog-based
- MongoDB change streams
- SQL Server change tracking / CDC
6. Lease Design¶
Even with a timer trigger, there is a risk of overlap during scale-out or restart timing. Therefore, the library manages a lease embedded within a single state blob.
Lease fields in the state blob:
- owner_id
- fencing_token
- acquired_at
- expires_at
- heartbeat_at

Principles:
- Other instances skip if a valid lease exists
- A heartbeat is required for long-running handlers
- Writers with a lower fencing token are forbidden from committing
- Lease acquisition, renewal, and commit are all performed atomically via ETag CAS
- No separate lease blob — prevents TOCTOU races
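The fencing rule can be sketched as a guard on commit. The class and token layout below are illustrative, not the library's StateStore API: a zombie writer holding an old token is rejected after a newer holder has taken over.

```python
class FencedStore:
    """Sketch: a store that rejects commits carrying a stale fencing token."""

    def __init__(self):
        self.fencing_token = 0
        self.checkpoint = None

    def acquire_lease(self):
        # Each successful acquisition bumps the fencing token.
        self.fencing_token += 1
        return self.fencing_token

    def commit(self, checkpoint, token):
        if token < self.fencing_token:
            return False  # stale (lower) token: forbidden from committing
        self.checkpoint = checkpoint
        return True


store = FencedStore()
old_token = store.acquire_lease()  # poller A, token 1
new_token = store.acquire_lease()  # poller B takes over, token 2

ok_new = store.commit((101, "a"), new_token)  # current holder commits
ok_old = store.commit((999, "z"), old_token)  # zombie writer is rejected
print(ok_new, ok_old, store.checkpoint)  # True False (101, 'a')
```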
7. Commit Model¶
Default is checkpoint advance after batch success.
```text
fetch batch -> invoke handler -> success -> commit new checkpoint
                              -> failure -> no checkpoint advance
```
This model allows duplicates but minimizes data loss.
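The at-least-once behavior can be demonstrated with a toy tick loop; every name here is illustrative. A failed tick leaves the checkpoint untouched, so the same batch is re-delivered on the next tick.

```python
def tick(fetch, handler, store):
    """One poll tick: advance the checkpoint only after the handler
    succeeds. On failure the batch repeats next tick (at-least-once)."""
    batch, next_cp = fetch(store["checkpoint"])
    if not batch:
        return
    try:
        handler(batch)
    except Exception:
        return                     # no checkpoint advance
    store["checkpoint"] = next_cp  # commit only after success


events = [10, 20, 30]
store = {"checkpoint": 0}


def fetch(cp):
    # Fetch at most one event past the checkpoint (toy batch size of 1).
    pending = [e for e in events if e > cp][:1]
    return pending, (pending[-1] if pending else cp)


seen = []
flaky = {"fail_first": True}


def handler(batch):
    if flaky.pop("fail_first", False):
        raise RuntimeError("transient failure")
    seen.extend(batch)


tick(fetch, handler, store)  # fails: checkpoint stays at 0
tick(fetch, handler, store)  # same batch [10] re-delivered, succeeds
tick(fetch, handler, store)  # processes [20]
print(seen, store["checkpoint"])  # [10, 20] 20
```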
8. Partitioning¶
MVP is based on a single logical stream. Partitioning will be introduced in subsequent versions.
Partition key candidates:
- schema/table
- tenant_id
- modulo hash(pk)
- native partition id (change stream-based)
9. Deployment Topology¶
Simple Configuration¶
- Azure Function App
- Azure Storage account
- Target database
- Application Insights
Extended Configuration¶
- All of the above +
- Service Bus/Event Hub relay
- Dedicated checkpoint store
- Multiple pollers per app
10. Architecture Principles¶
- Keep execution simple; keep semantics explicit
- Broad DB reach; conservative guarantees
- Handlers behave like pure functions; orchestration is the framework's job
- Operational concerns built-in by default; abstractions kept thin