Architecture

1. Top-Level Structure

azure-functions-db is composed of four top-level modules:

  1. core
    • DbConfig
    • EngineProvider
    • shared types
    • shared errors
    • serializers
  2. trigger
    • polling orchestrator, source adapter, state management, Azure Functions integration
    • consumes core to handle pseudo trigger execution
  3. binding
    • DbReader (input binding)
    • DbWriter (output binding)
    • per-invocation session/connection lifecycle management
  4. decorator
    • DbBindings class providing Azure Functions-style decorators
    • trigger wraps PollTrigger, input injects query results, output injects DbOut for explicit writes
    • inject_reader / inject_writer provide imperative DbReader / DbWriter injection
    • consumes both trigger and binding modules

1.1 Core Layer

Owns shared DB configuration, engine, types, errors, and serialization responsibilities.

  • DbConfig: shared configuration representation including DB URL, schema, table, etc.
  • EngineProvider: lazy singleton engine/pool management keyed by config
  • shared types: Row, RowDict, common payload types
  • shared errors: base exceptions inherited by both trigger and binding layers
  • serializers: DB value normalization and row dict conversion

1.2 Trigger Layer

The existing trigger-related components are preserved; DB connectivity, types, errors, and serialization are delegated to core.

1.3 Binding Layer

The binding layer provides an imperative API called directly from Azure Functions function bodies.

  • DbReader: single-row lookup by PK, raw SQL queries
  • DbWriter: insert/upsert/update/delete and batch writes
  • per-invocation session management: sessions are opened and closed per function invocation by default
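The per-invocation lifecycle above can be illustrated with minimal in-memory stand-ins. The method names (get, insert, close) are assumptions inferred from the responsibilities listed, not the library's exact API; a real reader would open a SQLAlchemy session on construction.

```python
# Illustrative stand-ins for the binding layer; names are assumptions.
class DbReader:
    def __init__(self, rows):            # a real reader opens a session here
        self._rows, self._closed = rows, False

    def get(self, pk):                   # single-row lookup by primary key
        return next((r for r in self._rows if r["id"] == pk), None)

    def close(self):                     # decorators call this after the handler
        self._closed = True

class DbWriter:
    def __init__(self):
        self.written, self._closed = [], False

    def insert(self, row):               # batch variants would accept lists
        self.written.append(row)

    def close(self):
        self._closed = True

# The per-invocation pattern: open, use, close. This is exactly what the
# decorator wrapper automates for input / inject_reader / inject_writer.
reader = DbReader([{"id": 1, "name": "ada"}])
row = reader.get(1)
reader.close()
```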

1.4 Dependency Rules

  • trigger -> core dependency only
  • binding -> core dependency only
  • decorator -> trigger, binding, core (thin orchestration layer)
  • Cross-import between trigger and binding is forbidden

1.5 Request Flow and Runtime Relationship

azure-functions-db decorators integrate at two lifecycle points within the Azure Functions runtime:

  1. Import time — decorators validate configuration. output also creates a module-scoped DbOut instance.
  2. Per invocation — the decorator wrapper performs DB I/O around the handler call. The exact behavior differs by decorator type.

The four decorator types have distinct invocation flows:

| Decorator     | Pre-handler                                 | Handler receives      | Post-handler                                    |
|---------------|---------------------------------------------|-----------------------|-------------------------------------------------|
| input         | Creates DbReader, reads data, closes reader | Query result (data)   | —                                               |
| output        | —                                           | DbOut (module-scoped) | DbWriter created only when handler calls .set() |
| inject_reader | Creates DbReader                            | DbReader instance     | Closes reader                                   |
| inject_writer | Creates DbWriter                            | DbWriter instance     | Closes writer                                   |

Binding Flow (input / inject_reader / inject_writer)

sequenceDiagram
    participant Client as HTTP Client
    participant Host as Azure Functions Host
    participant Worker as Python Worker
    participant Dec as Decorator Wrapper
    participant DB as DbReader / DbWriter
    participant Handler as Function Handler

    rect rgb(240, 248, 255)
    note over Worker,Dec: Import Time (startup)
    Worker->>Worker: import function modules
    Worker->>Dec: @db.input() / @db.inject_reader() / @db.inject_writer() validates config
    end

    rect rgb(255, 248, 240)
    note over Client,DB: Per Invocation
    Client->>Host: HTTP Request
    Host->>Worker: invoke function
    Worker->>Dec: decorator wrapper
    Dec->>DB: create per-invocation DbReader or DbWriter
    alt @db.input()
        DB->>DB: execute query / pk lookup
        DB-->>Dec: query result
        Dec->>Handler: call handler(req, data=result)
    else @db.inject_reader() / @db.inject_writer()
        Dec->>Handler: call handler(req, reader=DbReader) or handler(req, writer=DbWriter)
        Handler->>DB: handler performs reads / writes directly
    end
    Handler-->>Dec: return result
    Dec->>DB: close reader / writer
    Dec-->>Worker: HttpResponse
    Worker-->>Host: response
    Host-->>Client: HTTP Response
    end

Output Flow (@db.output)

output differs from the other decorators: DbOut is created once at decoration time (module-scoped), and a DbWriter is only created when the handler calls .set().

sequenceDiagram
    participant Client as HTTP Client
    participant Host as Azure Functions Host
    participant Worker as Python Worker
    participant Dec as Decorator Wrapper
    participant Out as DbOut (module-scoped)
    participant Handler as Function Handler
    participant DB as DbWriter

    rect rgb(240, 248, 255)
    note over Worker,Out: Import Time (startup)
    Worker->>Worker: import function modules
    Worker->>Dec: @db.output() validates config
    Dec->>Out: create DbOut instance (once)
    end

    rect rgb(255, 248, 240)
    note over Client,DB: Per Invocation
    Client->>Host: HTTP Request
    Host->>Worker: invoke function
    Worker->>Dec: decorator wrapper
    Dec->>Handler: call handler(req, out=DbOut)
    Handler->>Out: out.set(data)
    Out->>DB: create DbWriter, insert/upsert, close
    Handler-->>Dec: return result
    Dec-->>Worker: HttpResponse
    Worker-->>Host: response
    Host-->>Client: HTTP Response
    end

Trigger Flow (poll-based change detection)

sequenceDiagram
    participant Timer as Azure Timer Trigger
    participant Host as Azure Functions Host
    participant Worker as Python Worker
    participant PT as PollTrigger
    participant PR as PollRunner
    participant SS as StateStore (Blob)
    participant SA as SourceAdapter
    participant DB as Database
    participant Handler as User Handler

    Timer->>Host: scheduled tick
    Host->>Worker: invoke trigger function
    Worker->>PT: PollTrigger.run(timer, handler)
    PT->>PR: construct PollRunner
    PR->>PR: runner.tick()
    PR->>SS: acquire lease (ETag CAS)
    PR->>SS: load checkpoint
    PR->>SA: source.fetch(cursor, batch_size)
    SA->>DB: SELECT ... WHERE cursor > checkpoint
    DB-->>SA: changed rows
    SA-->>PR: raw records
    PR->>PR: normalize(raw records) → events
    PR->>Handler: invoke(events)
    Handler-->>PR: success
    PR->>SS: commit new checkpoint (ETag CAS)
    PR-->>Worker: done
    Worker-->>Host: completed

When an EngineProvider is passed to a decorator, it supplies a shared connection pool across invocations at module scope. When omitted, each DbReader or DbWriter creates and owns its own engine. DbReader and DbWriter instances from input, inject_reader, and inject_writer are always per-invocation and closed automatically by the decorator wrapper.

Async Support

| Decorator     | Async Handler    | Mechanism                                                                      |
|---------------|------------------|--------------------------------------------------------------------------------|
| trigger       | ❌ Not supported | PollTrigger.run is synchronous; async handlers are rejected at decoration time |
| input         | ✅ Supported     | DB I/O runs via asyncio.to_thread()                                            |
| output        | ✅ Supported     | DB write runs via asyncio.to_thread() through _AsyncDbOutProxy                 |
| inject_reader | ✅ Supported     | Returns _AsyncDbReaderProxy with async methods                                 |
| inject_writer | ✅ Supported     | Returns _AsyncDbWriterProxy with async methods                                 |

When an async handler is used with input, output, inject_reader, or inject_writer, all blocking database operations are automatically executed in a worker thread via asyncio.to_thread(), keeping the event loop free.

The trigger decorator explicitly rejects async handlers with a ConfigurationError because PollTrigger.run() is synchronous and calling an async handler without await would silently produce an unawaited coroutine.
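The decoration-time guard described above can be sketched with inspect.iscoroutinefunction. The decorator below is an illustrative stand-in, not the library's trigger implementation; the point is that the rejection happens at import time, before any tick runs.

```python
import inspect

class ConfigurationError(Exception):
    """Raised when a decorator is applied to an unsupported handler."""

# Illustrative sketch of the decoration-time guard: a synchronous trigger must
# not accept a coroutine function, or the un-awaited coroutine it returns
# would be silently discarded.
def trigger(func):
    if inspect.iscoroutinefunction(func):
        raise ConfigurationError(
            "trigger does not support async handlers; PollTrigger.run is synchronous"
        )
    return func

@trigger
def sync_handler(events):
    return len(events)

try:
    @trigger
    async def async_handler(events): ...
except ConfigurationError as exc:
    rejected = str(exc)
```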

Input Binding Modes

The input decorator operates in two mutually exclusive modes. Exactly one of pk or query must be provided.

| Mode       | Parameter | Returns      | Use When                             |
|------------|-----------|--------------|--------------------------------------|
| Row lookup | pk=       | dict \| None | You need a single row by primary key |
| SQL query  | query=    | list[dict]   | You need multiple rows or custom SQL |

Both pk and params accept either a static dict or a callable that resolves values from other handler parameters at invocation time.
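The static-or-callable rule can be sketched as follows; resolve_pk and the req shape are illustrative assumptions, not the library's internals, and the same resolution would apply to params.

```python
# Hypothetical sketch: accept either a static dict or a callable that is
# resolved from other handler parameters at invocation time.
def resolve_pk(pk, **handler_params):
    return pk(**handler_params) if callable(pk) else pk

static = resolve_pk({"id": 1})                               # static dict
dynamic = resolve_pk(lambda req: {"id": req["route"]["id"]},
                     req={"route": {"id": 42}})              # resolved per request
```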

Decorator Composition

DbBindings decorators can be combined on a single handler. The following rules are enforced at decoration time:

Valid Combinations

| Combination                  | Use Case                                |
|------------------------------|-----------------------------------------|
| trigger + output             | Process DB changes and write results    |
| trigger + inject_writer      | Process DB changes with imperative writes |
| input + output               | Read data and write results             |
| input + inject_writer        | Read data with imperative writes        |
| inject_reader + inject_writer | Full imperative control                |
| inject_reader + output       | Imperative read + auto-write            |

Invalid Combinations

| Combination                 | Reason                               |
|-----------------------------|--------------------------------------|
| input + inject_reader       | Redundant — both read data, use one  |
| output + inject_writer      | Redundant — both write data, use one |
| Any decorator applied twice | Not meaningful                       |

Ordering

Azure Functions decorators (e.g., @app.schedule) must be outermost. DbBindings decorators should be closest to the function definition.
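The ordering rule follows from how Python applies decorators bottom-up: the decorator closest to the function wraps first, so the outermost one receives the already-bound handler. The two decorators below are stand-ins for @app.schedule and @db.input, used only to make the wrapping order observable.

```python
# Stand-in decorators recording invocation order; both are illustrative.
order = []

def app_schedule(func):                 # stands in for @app.schedule(...)
    def wrapper(*args, **kwargs):
        order.append("schedule")
        return func(*args, **kwargs)
    return wrapper

def db_input(func):                     # stands in for @db.input(...)
    def wrapper(*args, **kwargs):
        order.append("db.input")        # DB I/O would happen right here,
        return func(*args, **kwargs)    # immediately around the handler
    return wrapper

@app_schedule        # outermost: runs first at invocation
@db_input            # innermost: closest to the function definition
def handler():
    order.append("handler")

handler()
# order is now ["schedule", "db.input", "handler"]
```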

Concurrency and Thread Safety

Azure Functions reuses workers and may invoke functions concurrently. This section documents the lifecycle and thread-safety guarantees of each component.

Component Lifecycle

| Component           | Scope                 | Lifecycle                                        | Thread-Safe                               |
|---------------------|-----------------------|--------------------------------------------------|-------------------------------------------|
| EngineProvider      | Module-level (shared) | Long-lived, reused across invocations            | ✅ Yes — threading.Lock guards cache      |
| SqlAlchemySource    | Module-level (shared) | Long-lived, reused across invocations            | ✅ Yes — read-only after construction     |
| BlobCheckpointStore | Module-level (shared) | Long-lived, reused across invocations            | ✅ Yes — ETag-based CAS for all mutations |
| DbReader            | Per-invocation        | Created and closed within a single function call | N/A — not shared                          |
| DbWriter            | Per-invocation        | Created and closed within a single function call | N/A — not shared                          |

EngineProvider

EngineProvider caches SQLAlchemy Engine instances keyed by connection configuration. The internal cache is protected by a threading.Lock, making concurrent get_engine() calls safe. SQLAlchemy's Engine itself manages its own connection pool with thread-safe checkout/return.

Module-level usage pattern (recommended):

engine_provider = EngineProvider()  # shared across all invocations

@db.input("user", url="%DB_URL%", table="users", pk={"id": 1},
          engine_provider=engine_provider)
def handler(user): ...
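The lock-guarded cache pattern described above can be sketched as follows. This stand-in caches plain objects rather than SQLAlchemy Engines, and the factory parameter is an assumption for illustration; the thread-safety argument is the same.

```python
import threading

# Minimal sketch of a lock-guarded engine cache; illustrative, not the
# library's actual EngineProvider.
class EngineProvider:
    def __init__(self, factory):
        self._factory = factory
        self._cache = {}
        self._lock = threading.Lock()

    def get_engine(self, key):
        with self._lock:                  # serializes concurrent lookups
            if key not in self._cache:
                self._cache[key] = self._factory(key)
            return self._cache[key]

provider = EngineProvider(factory=lambda key: object())
e1 = provider.get_engine("postgresql://localhost/app")
e2 = provider.get_engine("postgresql://localhost/app")
# e1 is e2: the same engine is reused across invocations
```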

DbReader / DbWriter

These are created per invocation by the decorator layer (or manually by the user). Each instance opens its own SQLAlchemy session and closes it when close() is called. Decorators manage this lifecycle automatically — users never need to call close() when using input, output, inject_reader, or inject_writer.

BlobCheckpointStore

Uses Azure Blob Storage ETag-based CAS (Compare-And-Swap) for all state mutations. Concurrent timer triggers racing to acquire the lease are safely resolved — only one wins, others skip the tick. This is the mechanism that prevents duplicate processing during scale-out.
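The race resolution can be simulated in memory: a write succeeds only if the caller presents the ETag it last read, so of two pollers racing for the lease exactly one wins. The class and method names below are illustrative, not the azure-storage-blob API.

```python
# In-memory simulation of ETag compare-and-swap; illustrative names.
class CasStore:
    def __init__(self):
        self._state, self._etag = {}, 0

    def load(self):
        return dict(self._state), self._etag

    def commit(self, new_state, if_match):
        if if_match != self._etag:        # another writer got there first
            return False
        self._state, self._etag = dict(new_state), self._etag + 1
        return True

store = CasStore()
state_a, etag_a = store.load()
state_b, etag_b = store.load()            # second poller reads the same state

won_a = store.commit({"lease_owner": "a"}, if_match=etag_a)   # succeeds
won_b = store.commit({"lease_owner": "b"}, if_match=etag_b)   # stale ETag, rejected
```

The loser simply skips its tick and retries on the next schedule, which is what prevents duplicate processing during scale-out.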

Reflected Metadata

SQLAlchemy caches reflected table metadata on the Engine's MetaData object. If a schema migration occurs while the Function App is running, the cached metadata may become stale. Mitigation options:

  • Restart the Function App after schema migrations
  • Use explicit Table definitions instead of reflection (future: see #38)

2. Execution Flow

Azure Timer Trigger
    -> PollRunner.start()
        -> StateStore.load()
        -> StateStore.acquire_lease()    # CAS with ETag
        -> SourceAdapter.fetch_changes()
        -> core normalizer(raw records → RowChange)
        -> Handler.invoke(events)
        -> StateStore.commit()           # checkpoint + lease in one CAS write
        -> StateStore.heartbeat()        # if handler is long-running

3. Core Components

3.1 PollRunner

Overall lifecycle coordinator.

Responsibilities:

  • per-tick execution
  • overlap prevention
  • batch-level commit
  • handler exception surfacing
  • retry-friendly exception classification

3.2 SourceAdapter

Abstraction for reading changes from each database.

Key methods:

  • validate()
  • open() / close()
  • fetch_changes(checkpoint, limit) → FetchResult (raw records + next_checkpoint)
  • capability() → AdapterCapability

The adapter returns raw records; core normalizes them into RowChange.

3.3 StateStore

Manages checkpoint and lease as a single state blob.

Contains:

  • current watermark + tiebreaker PK
  • last successful batch metadata
  • lease owner / fencing_token / expires_at / heartbeat_at
  • source fingerprint

All state changes are performed via ETag-based CAS (conditional write), making lease acquisition, heartbeat, and checkpoint commit atomic.

3.4 EventNormalizer (internal to core)

Normalizes raw records returned by the adapter into the common RowChange event. This logic belongs to core; adapters only return raw records.

3.5 HandlerInvoker

Invokes the user handler in two modes:

  • simple mode: handler(events)
  • rich mode: handler(events, context)
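The two modes can be dispatched by inspecting the handler's signature; the invoke function below is an illustrative sketch, not the library's HandlerInvoker implementation.

```python
import inspect

# Sketch of signature-based dispatch between simple and rich invocation modes.
def invoke(handler, events, context):
    params = inspect.signature(handler).parameters
    if len(params) >= 2:                  # rich mode: handler(events, context)
        return handler(events, context)
    return handler(events)                # simple mode: handler(events)

simple = invoke(lambda events: len(events), [1, 2, 3], context={"tick": 1})
rich = invoke(lambda events, ctx: (len(events), ctx["tick"]), [1, 2, 3],
              context={"tick": 1})
```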

4. MVP Strategy: Cursor Polling

Default algorithm:

checkpoint = (cursor_value, tiebreaker_pk)

SELECT ...
FROM table
WHERE (cursor > :cursor_value)
   OR (cursor = :cursor_value AND pk > :tiebreaker_pk)
ORDER BY cursor ASC, pk ASC
LIMIT :batch_size
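The SQL predicate above is keyset pagination over the ordered pair (cursor, pk): the checkpoint tuple marks the last row already processed, and only rows strictly greater in that ordering are fetched. Python tuple comparison expresses the same predicate directly, which makes the algorithm easy to verify in isolation:

```python
# Rows as (cursor, pk) records; data is illustrative.
rows = [
    {"cursor": 10, "pk": 1}, {"cursor": 10, "pk": 2},
    {"cursor": 11, "pk": 1}, {"cursor": 12, "pk": 5},
]

def fetch_batch(rows, checkpoint, batch_size):
    # (cursor, pk) > checkpoint is exactly the OR-of-two-clauses predicate
    # in the SQL above, via lexicographic tuple comparison.
    pending = sorted(
        (r for r in rows if (r["cursor"], r["pk"]) > checkpoint),
        key=lambda r: (r["cursor"], r["pk"]),
    )
    return pending[:batch_size]

batch = fetch_batch(rows, checkpoint=(10, 1), batch_size=2)
# batch == [{"cursor": 10, "pk": 2}, {"cursor": 11, "pk": 1}]
next_checkpoint = (batch[-1]["cursor"], batch[-1]["pk"])   # (11, 1)
```

The tiebreaker PK is what makes progress safe when many rows share one cursor value: without it, a batch boundary inside a group of equal cursors would skip or repeat rows.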

Advantages of This Strategy

  • High DB portability
  • Integrates well with SQLAlchemy
  • Reproducible locally and in production
  • Easy to debug

Limitations

  • Cannot detect hard deletes (alternative strategies required)
  • Sensitive to updated_at precision and clock skew
  • Difficult to provide row-level before/after diffs
  • Not a true event stream

5. Alternative Strategies

5.1 OutboxStrategy

The service writes to an outbox table; the library only consumes the outbox.

5.2 HashDiffStrategy

Periodically compares snapshots/hashes to infer changes. High cost; reserved for special cases.

5.3 NativeChangeStreamStrategy

Used when the database provides a native change stream/CDC.

Examples:

  • PostgreSQL logical decoding
  • MySQL binlog-based
  • MongoDB change stream
  • SQL Server change tracking / CDC

6. Lease Design

Even with a timer trigger, there is a risk of overlap during scale-out or restart timing. Therefore, the library manages a lease embedded within a single state blob.

Lease fields in the state blob:

  • owner_id
  • fencing_token
  • acquired_at
  • expires_at
  • heartbeat_at

Principles:

  • Other instances skip if a valid lease exists
  • A heartbeat is required for long-running handlers
  • Writers with a lower fencing token are forbidden from committing
  • Lease acquisition, renewal, and commit are all performed atomically via ETag CAS
  • No separate lease blob — prevents TOCTOU races
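The fencing-token rule can be sketched in isolation: a commit carrying a token lower than the highest token the store has seen is rejected, so a stale lease holder cannot overwrite newer state. The class below is illustrative, not the library's StateStore.

```python
# Illustrative fencing-token check: tokens only increase, stale writers lose.
class FencedStore:
    def __init__(self):
        self.highest_token = 0
        self.state = None

    def commit(self, state, fencing_token):
        if fencing_token < self.highest_token:   # stale writer: reject
            return False
        self.highest_token = fencing_token
        self.state = state
        return True

store = FencedStore()
ok_new = store.commit("checkpoint-1", fencing_token=2)    # accepted
ok_stale = store.commit("checkpoint-0", fencing_token=1)  # rejected
```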

7. Commit Model

Default is checkpoint advance after batch success.

fetch batch -> invoke handler -> success -> commit new checkpoint
                               -> failure -> no checkpoint advance

This model allows duplicates but minimizes data loss.
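The at-least-once consequence can be simulated in a few lines: if the handler produces side effects and then fails before the checkpoint commits, the next tick redelivers the same batch. All names below are illustrative.

```python
# Simulation of commit-after-success: checkpoint advances only when the
# handler succeeds, so a batch that fails after its side effects is redelivered.
def tick(batches, checkpoint, handler):
    batch = batches.get(checkpoint)
    if batch is None:
        return checkpoint
    try:
        handler(batch)
    except Exception:
        return checkpoint          # failure: no checkpoint advance
    return checkpoint + 1          # success: commit new checkpoint

batches = {0: ["a", "b"], 1: ["c"]}
seen = []
calls = {"n": 0}

def flaky_handler(batch):
    seen.extend(batch)             # side effect happens first...
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("crash after side effect, before commit")

cp = tick(batches, 0, flaky_handler)   # fails: cp stays 0, "a","b" already seen
cp = tick(batches, cp, flaky_handler)  # batch 0 redelivered: duplicates
cp = tick(batches, cp, flaky_handler)  # batch 1 processed normally
# seen == ["a", "b", "a", "b", "c"]: duplicates, but nothing lost
```

Handlers should therefore be idempotent; the framework guarantees no loss, not exactly-once delivery.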

8. Partitioning

MVP is based on a single logical stream. Partitioning will be introduced in subsequent versions.

Partition key candidates:

  • schema/table
  • tenant_id
  • modulo hash(pk)
  • native partition id (change stream-based)

9. Deployment Topology

Simple Configuration

  • Azure Function App
  • Azure Storage account
  • Target database
  • Application Insights

Extended Configuration

  • All of the above +
  • Service Bus/Event Hub relay
  • Dedicated checkpoint store
  • Multiple pollers per app

10. Architecture Principles

  • Keep execution simple; keep semantics explicit
  • Broad DB reach; conservative guarantees
  • Handlers behave like pure functions; orchestration is the framework's job
  • Operational concerns built-in by default; abstractions kept thin