Architecture

1. Top-Level Structure

azure-functions-db is composed of four top-level modules:

  1. core
    • DbConfig
    • EngineProvider
    • shared types
    • shared errors
    • serializers
  2. trigger
    • polling orchestrator, source adapter, state management, Azure Functions integration
    • consumes core to handle pseudo trigger execution
  3. binding
    • DbReader (input binding)
    • DbWriter (output binding)
    • per-invocation session/connection lifecycle management
  4. decorator
    • DbBindings class providing Azure Functions-style decorators
    • trigger wraps PollTrigger, input injects query results, output injects DbOut for explicit writes
    • inject_reader / inject_writer provide imperative DbReader / DbWriter injection
    • consumes both trigger and binding modules

1.1 Core Layer

Owns shared DB configuration, engine, types, errors, and serialization responsibilities.

  • DbConfig: shared configuration representation including DB URL, schema, table, etc.
  • EngineProvider: lazy singleton engine/pool management keyed by config
  • shared types: Row, RowDict, common payload types
  • shared errors: base exceptions inherited by both trigger and binding layers
  • serializers: DB value normalization and row dict conversion

1.2 Trigger Layer

The existing trigger-related components are preserved; DB connectivity, types, errors, and serialization are delegated to core.

1.3 Binding Layer

The binding layer provides an imperative API called directly from Azure Functions function bodies.

  • DbReader: single-row lookup by PK, raw SQL queries
  • DbWriter: insert/upsert/update/delete and batch writes
  • per-invocation session management: sessions are opened and closed per function invocation by default
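The per-invocation lifecycle above can be illustrated with minimal in-memory stand-ins. The method names (get, insert, close) are assumptions inferred from the responsibilities listed, not the library's exact API; a real reader would open a SQLAlchemy session on construction.

```python
# Illustrative stand-ins for the binding layer; names are assumptions.
class DbReader:
    def __init__(self, rows):            # a real reader opens a session here
        self._rows, self._closed = rows, False

    def get(self, pk):                   # single-row lookup by primary key
        return next((r for r in self._rows if r["id"] == pk), None)

    def close(self):                     # decorators call this after the handler
        self._closed = True

class DbWriter:
    def __init__(self):
        self.written, self._closed = [], False

    def insert(self, row):               # batch variants would accept lists
        self.written.append(row)

    def close(self):
        self._closed = True

# The per-invocation pattern: open, use, close. This is exactly what the
# decorator wrapper automates for input / inject_reader / inject_writer.
reader = DbReader([{"id": 1, "name": "ada"}])
row = reader.get(1)
reader.close()
```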

1.4 Dependency Rules

  • trigger -> core dependency only
  • binding -> core dependency only
  • decorator -> trigger, binding, core (thin orchestration layer)
  • Cross-import between trigger and binding is forbidden

1.5 Request Flow and Runtime Relationship

azure-functions-db decorators integrate at two lifecycle points within the Azure Functions runtime:

  1. Import time — decorators validate configuration. output also creates a module-scoped DbOut instance.
  2. Per invocation — the decorator wrapper performs DB I/O around the handler call. The exact behavior differs by decorator type.

The four decorator types have distinct invocation flows:

| Decorator     | Pre-handler                                 | Handler receives      | Post-handler                                    |
|---------------|---------------------------------------------|-----------------------|-------------------------------------------------|
| input         | Creates DbReader, reads data, closes reader | Query result (data)   | —                                               |
| output        | —                                           | DbOut (module-scoped) | DbWriter created only when handler calls .set() |
| inject_reader | Creates DbReader                            | DbReader instance     | Closes reader                                   |
| inject_writer | Creates DbWriter                            | DbWriter instance     | Closes writer                                   |

Binding Flow (input / inject_reader / inject_writer)

sequenceDiagram
    participant Client as HTTP Client
    participant Host as Azure Functions Host
    participant Worker as Python Worker
    participant Dec as Decorator Wrapper
    participant DB as DbReader / DbWriter
    participant Handler as Function Handler

    rect rgb(240, 248, 255)
    note over Worker,Dec: Import Time (startup)
    Worker->>Worker: import function modules
    Worker->>Dec: @db.input() / @db.inject_reader() / @db.inject_writer() validates config
    end

    rect rgb(255, 248, 240)
    note over Client,DB: Per Invocation
    Client->>Host: HTTP Request
    Host->>Worker: invoke function
    Worker->>Dec: decorator wrapper
    Dec->>DB: create per-invocation DbReader or DbWriter
    alt @db.input()
        DB->>DB: execute query / pk lookup
        DB-->>Dec: query result
        Dec->>Handler: call handler(req, data=result)
    else @db.inject_reader() / @db.inject_writer()
        Dec->>Handler: call handler(req, reader=DbReader) or handler(req, writer=DbWriter)
        Handler->>DB: handler performs reads / writes directly
    end
    Handler-->>Dec: return result
    Dec->>DB: close reader / writer
    Dec-->>Worker: HttpResponse
    Worker-->>Host: response
    Host-->>Client: HTTP Response
    end

Output Flow (@db.output)

output differs from the other decorators: DbOut is created once at decoration time (module-scoped), and a DbWriter is only created when the handler calls .set().

sequenceDiagram
    participant Client as HTTP Client
    participant Host as Azure Functions Host
    participant Worker as Python Worker
    participant Dec as Decorator Wrapper
    participant Out as DbOut (module-scoped)
    participant Handler as Function Handler
    participant DB as DbWriter

    rect rgb(240, 248, 255)
    note over Worker,Out: Import Time (startup)
    Worker->>Worker: import function modules
    Worker->>Dec: @db.output() validates config
    Dec->>Out: create DbOut instance (once)
    end

    rect rgb(255, 248, 240)
    note over Client,DB: Per Invocation
    Client->>Host: HTTP Request
    Host->>Worker: invoke function
    Worker->>Dec: decorator wrapper
    Dec->>Handler: call handler(req, out=DbOut)
    Handler->>Out: out.set(data)
    Out->>DB: create DbWriter, insert/upsert, close
    Handler-->>Dec: return result
    Dec-->>Worker: HttpResponse
    Worker-->>Host: response
    Host-->>Client: HTTP Response
    end

Trigger Flow (poll-based change detection)

sequenceDiagram
    participant Timer as Azure Timer Trigger
    participant Host as Azure Functions Host
    participant Worker as Python Worker
    participant PT as PollTrigger
    participant PR as PollRunner
    participant SS as StateStore (Blob)
    participant SA as SourceAdapter
    participant DB as Database
    participant Handler as User Handler

    Timer->>Host: scheduled tick
    Host->>Worker: invoke trigger function
    Worker->>PT: PollTrigger.run(timer, handler)
    PT->>PR: construct PollRunner
    PR->>PR: runner.tick()
    PR->>SS: acquire lease (ETag CAS)
    PR->>SS: load checkpoint
    PR->>SA: source.fetch(cursor, batch_size)
    SA->>DB: SELECT ... WHERE cursor > checkpoint
    DB-->>SA: changed rows
    SA-->>PR: raw records
    PR->>PR: normalize(raw records) → events
    PR->>Handler: invoke(events)
    Handler-->>PR: success
    PR->>SS: commit new checkpoint (ETag CAS)
    PR-->>Worker: done
    Worker-->>Host: completed

When an EngineProvider is passed to a decorator, it supplies a shared connection pool across invocations at module scope. When omitted, each DbReader or DbWriter creates and owns its own engine. DbReader and DbWriter instances from input, inject_reader, and inject_writer are always per-invocation and closed automatically by the decorator wrapper.

Async Support

| Decorator     | Async Handler    | Mechanism                                                                      |
|---------------|------------------|--------------------------------------------------------------------------------|
| trigger       | ❌ Not supported | PollTrigger.run is synchronous; async handlers are rejected at decoration time |
| input         | ✅ Supported     | DB I/O runs via asyncio.to_thread()                                            |
| output        | ✅ Supported     | DB write runs via asyncio.to_thread() through _AsyncDbOutProxy                 |
| inject_reader | ✅ Supported     | Returns _AsyncDbReaderProxy with async methods                                 |
| inject_writer | ✅ Supported     | Returns _AsyncDbWriterProxy with async methods                                 |

When an async handler is used with input, output, inject_reader, or inject_writer, all blocking database operations are automatically executed in a worker thread via asyncio.to_thread(), keeping the event loop free.

The trigger decorator explicitly rejects async handlers with a ConfigurationError because PollTrigger.run() is synchronous and calling an async handler without await would silently produce an unawaited coroutine.
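The decoration-time guard described above can be sketched with inspect.iscoroutinefunction. The decorator below is an illustrative stand-in, not the library's trigger implementation; the point is that the rejection happens at import time, before any tick runs.

```python
import inspect

class ConfigurationError(Exception):
    """Raised when a decorator is applied to an unsupported handler."""

# Illustrative sketch of the decoration-time guard: a synchronous trigger must
# not accept a coroutine function, or the un-awaited coroutine it returns
# would be silently discarded.
def trigger(func):
    if inspect.iscoroutinefunction(func):
        raise ConfigurationError(
            "trigger does not support async handlers; PollTrigger.run is synchronous"
        )
    return func

@trigger
def sync_handler(events):
    return len(events)

try:
    @trigger
    async def async_handler(events): ...
except ConfigurationError as exc:
    rejected = str(exc)
```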

Input Binding Modes

The input decorator operates in two mutually exclusive modes. Exactly one of pk or query must be provided.

| Mode       | Parameter | Returns      | Use When                             |
|------------|-----------|--------------|--------------------------------------|
| Row lookup | pk=       | dict \| None | You need a single row by primary key |
| SQL query  | query=    | list[dict]   | You need multiple rows or custom SQL |

Both pk and params accept either a static dict or a callable that resolves values from other handler parameters at invocation time.
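The static-or-callable rule can be sketched as follows; resolve_pk and the req shape are illustrative assumptions, not the library's internals, and the same resolution would apply to params.

```python
# Hypothetical sketch: accept either a static dict or a callable that is
# resolved from other handler parameters at invocation time.
def resolve_pk(pk, **handler_params):
    return pk(**handler_params) if callable(pk) else pk

static = resolve_pk({"id": 1})                               # static dict
dynamic = resolve_pk(lambda req: {"id": req["route"]["id"]},
                     req={"route": {"id": 42}})              # resolved per request
```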

Decorator Composition

DbBindings decorators can be combined on a single handler. The following rules are enforced at decoration time:

Valid Combinations

| Combination                  | Use Case                                |
|------------------------------|-----------------------------------------|
| trigger + output             | Process DB changes and write results    |
| trigger + inject_writer      | Process DB changes with imperative writes |
| input + output               | Read data and write results             |
| input + inject_writer        | Read data with imperative writes        |
| inject_reader + inject_writer | Full imperative control                |
| inject_reader + output       | Imperative read + auto-write            |

Invalid Combinations

| Combination                 | Reason                               |
|-----------------------------|--------------------------------------|
| input + inject_reader       | Redundant — both read data, use one  |
| output + inject_writer      | Redundant — both write data, use one |
| Any decorator applied twice | Not meaningful                       |

Ordering

Azure Functions decorators (e.g., @app.schedule) must be outermost. DbBindings decorators should be closest to the function definition.
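The ordering rule follows from how Python applies decorators bottom-up: the decorator closest to the function wraps first, so the outermost one receives the already-bound handler. The two decorators below are stand-ins for @app.schedule and @db.input, used only to make the wrapping order observable.

```python
# Stand-in decorators recording invocation order; both are illustrative.
order = []

def app_schedule(func):                 # stands in for @app.schedule(...)
    def wrapper(*args, **kwargs):
        order.append("schedule")
        return func(*args, **kwargs)
    return wrapper

def db_input(func):                     # stands in for @db.input(...)
    def wrapper(*args, **kwargs):
        order.append("db.input")        # DB I/O would happen right here,
        return func(*args, **kwargs)    # immediately around the handler
    return wrapper

@app_schedule        # outermost: runs first at invocation
@db_input            # innermost: closest to the function definition
def handler():
    order.append("handler")

handler()
# order is now ["schedule", "db.input", "handler"]
```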

Concurrency and Thread Safety

Azure Functions reuses workers and may invoke functions concurrently. This section documents the lifecycle and thread-safety guarantees of each component.

Component Lifecycle

| Component           | Scope                 | Lifecycle                                        | Thread-Safe                               |
|---------------------|-----------------------|--------------------------------------------------|-------------------------------------------|
| EngineProvider      | Module-level (shared) | Long-lived, reused across invocations            | ✅ Yes — threading.Lock guards cache      |
| SqlAlchemySource    | Module-level (shared) | Long-lived, reused across invocations            | ✅ Yes — read-only after construction     |
| BlobCheckpointStore | Module-level (shared) | Long-lived, reused across invocations            | ✅ Yes — ETag-based CAS for all mutations |
| DbReader            | Per-invocation        | Created and closed within a single function call | N/A — not shared                          |
| DbWriter            | Per-invocation        | Created and closed within a single function call | N/A — not shared                          |

EngineProvider

EngineProvider caches SQLAlchemy Engine instances keyed by connection configuration. The internal cache is protected by a threading.Lock, making concurrent get_engine() calls safe. SQLAlchemy's Engine itself manages its own connection pool with thread-safe checkout/return.

Module-level usage pattern (recommended):

engine_provider = EngineProvider()  # shared across all invocations

@db.input("user", url="%DB_URL%", table="users", pk={"id": 1},
          engine_provider=engine_provider)
def handler(user): ...
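The lock-guarded cache pattern described above can be sketched as follows. This stand-in caches plain objects rather than SQLAlchemy Engines, and the factory parameter is an assumption for illustration; the thread-safety argument is the same.

```python
import threading

# Minimal sketch of a lock-guarded engine cache; illustrative, not the
# library's actual EngineProvider.
class EngineProvider:
    def __init__(self, factory):
        self._factory = factory
        self._cache = {}
        self._lock = threading.Lock()

    def get_engine(self, key):
        with self._lock:                  # serializes concurrent lookups
            if key not in self._cache:
                self._cache[key] = self._factory(key)
            return self._cache[key]

provider = EngineProvider(factory=lambda key: object())
e1 = provider.get_engine("postgresql://localhost/app")
e2 = provider.get_engine("postgresql://localhost/app")
# e1 is e2: the same engine is reused across invocations
```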

DbReader / DbWriter

These are created per invocation by the decorator layer (or manually by the user). Each instance opens its own SQLAlchemy session and closes it when close() is called. Decorators manage this lifecycle automatically — users never need to call close() when using input, output, inject_reader, or inject_writer.

BlobCheckpointStore

Uses Azure Blob Storage ETag-based CAS (Compare-And-Swap) for all state mutations. Concurrent timer triggers racing to acquire the lease are safely resolved — only one wins, others skip the tick. This is the mechanism that prevents duplicate processing during scale-out.
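The race resolution can be simulated in memory: a write succeeds only if the caller presents the ETag it last read, so of two pollers racing for the lease exactly one wins. The class and method names below are illustrative, not the azure-storage-blob API.

```python
# In-memory simulation of ETag compare-and-swap; illustrative names.
class CasStore:
    def __init__(self):
        self._state, self._etag = {}, 0

    def load(self):
        return dict(self._state), self._etag

    def commit(self, new_state, if_match):
        if if_match != self._etag:        # another writer got there first
            return False
        self._state, self._etag = dict(new_state), self._etag + 1
        return True

store = CasStore()
state_a, etag_a = store.load()
state_b, etag_b = store.load()            # second poller reads the same state

won_a = store.commit({"lease_owner": "a"}, if_match=etag_a)   # succeeds
won_b = store.commit({"lease_owner": "b"}, if_match=etag_b)   # stale ETag, rejected
```

The loser simply skips its tick and retries on the next schedule, which is what prevents duplicate processing during scale-out.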

Reflected Metadata

SQLAlchemy caches reflected table metadata on the Engine's MetaData object. If a schema migration occurs while the Function App is running, the cached metadata may become stale. Mitigation options:

  • Restart the Function App after schema migrations
  • Use explicit Table definitions instead of reflection (future: see #38)

2. Execution Flow

Azure Timer Trigger
    -> PollRunner.start()
        -> StateStore.load()
        -> StateStore.acquire_lease()    # CAS with ETag
        -> SourceAdapter.fetch_changes()
        -> core normalizer(raw records → RowChange)
        -> Handler.invoke(events)
        -> StateStore.commit()           # checkpoint + lease in one CAS write
        -> StateStore.heartbeat()        # if handler is long-running

3. Core Components

3.1 PollRunner

Overall lifecycle coordinator.

Responsibilities:

  • per-tick execution
  • overlap prevention
  • batch-level commit
  • handler exception surfacing
  • retry-friendly exception classification

3.2 SourceAdapter

Abstraction for reading changes from each database.

Key methods:

  • validate()
  • open() / close()
  • fetch_changes(checkpoint, limit) → FetchResult (raw records + next_checkpoint)
  • capability() → AdapterCapability

The adapter returns raw records; core normalizes them into RowChange.

3.3 StateStore

Manages checkpoint and lease as a single state blob.

Contains:

  • current watermark + tiebreaker PK
  • last successful batch metadata
  • lease owner / fencing_token / expires_at / heartbeat_at
  • source fingerprint

All state changes are performed via ETag-based CAS (conditional write), making lease acquisition, heartbeat, and checkpoint commit atomic.

3.4 EventNormalizer (internal to core)

Normalizes raw records returned by the adapter into the common RowChange event. This logic belongs to core; adapters only return raw records.

3.5 HandlerInvoker

Invokes the user handler in two modes:

  • simple mode: handler(events)
  • rich mode: handler(events, context)
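The two modes can be dispatched by inspecting the handler's signature; the invoke function below is an illustrative sketch, not the library's HandlerInvoker implementation.

```python
import inspect

# Sketch of signature-based dispatch between simple and rich invocation modes.
def invoke(handler, events, context):
    params = inspect.signature(handler).parameters
    if len(params) >= 2:                  # rich mode: handler(events, context)
        return handler(events, context)
    return handler(events)                # simple mode: handler(events)

simple = invoke(lambda events: len(events), [1, 2, 3], context={"tick": 1})
rich = invoke(lambda events, ctx: (len(events), ctx["tick"]), [1, 2, 3],
              context={"tick": 1})
```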

4. MVP Strategy: Cursor Polling

Default algorithm:

checkpoint = (cursor_value, tiebreaker_pk)

SELECT ...
FROM table
WHERE (cursor > :cursor_value)
   OR (cursor = :cursor_value AND pk > :tiebreaker_pk)
ORDER BY cursor ASC, pk ASC
LIMIT :batch_size
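The SQL predicate above is keyset pagination over the ordered pair (cursor, pk): the checkpoint tuple marks the last row already processed, and only rows strictly greater in that ordering are fetched. Python tuple comparison expresses the same predicate directly, which makes the algorithm easy to verify in isolation:

```python
# Rows as (cursor, pk) records; data is illustrative.
rows = [
    {"cursor": 10, "pk": 1}, {"cursor": 10, "pk": 2},
    {"cursor": 11, "pk": 1}, {"cursor": 12, "pk": 5},
]

def fetch_batch(rows, checkpoint, batch_size):
    # (cursor, pk) > checkpoint is exactly the OR-of-two-clauses predicate
    # in the SQL above, via lexicographic tuple comparison.
    pending = sorted(
        (r for r in rows if (r["cursor"], r["pk"]) > checkpoint),
        key=lambda r: (r["cursor"], r["pk"]),
    )
    return pending[:batch_size]

batch = fetch_batch(rows, checkpoint=(10, 1), batch_size=2)
# batch == [{"cursor": 10, "pk": 2}, {"cursor": 11, "pk": 1}]
next_checkpoint = (batch[-1]["cursor"], batch[-1]["pk"])   # (11, 1)
```

The tiebreaker PK is what makes progress safe when many rows share one cursor value: without it, a batch boundary inside a group of equal cursors would skip or repeat rows.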

Advantages of This Strategy

  • High DB portability
  • Integrates well with SQLAlchemy
  • Reproducible locally and in production
  • Easy to debug

Limitations

  • Cannot detect hard deletes (alternative strategies required)
  • Sensitive to updated_at precision and clock skew
  • Difficult to provide row-level before/after diffs
  • Not a true event stream

5. Alternative Strategies

5.1 OutboxStrategy

The service writes to an outbox table; the library only consumes the outbox.

5.2 HashDiffStrategy

Periodically compares snapshots/hashes to infer changes. High cost; reserved for special cases.

5.3 NativeChangeStreamStrategy

Used when the database provides a native change stream/CDC.

Examples:

  • PostgreSQL logical decoding
  • MySQL binlog-based
  • MongoDB change stream
  • SQL Server change tracking / CDC

6. Lease Design

Even with a timer trigger, there is a risk of overlap during scale-out or restart timing. Therefore, the library manages a lease embedded within a single state blob.

Lease fields in the state blob:

  • owner_id
  • fencing_token
  • acquired_at
  • expires_at
  • heartbeat_at

Principles:

  • Other instances skip if a valid lease exists
  • A heartbeat is required for long-running handlers
  • Writers with a lower fencing token are forbidden from committing
  • Lease acquisition, renewal, and commit are all performed atomically via ETag CAS
  • No separate lease blob — prevents TOCTOU races
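The fencing-token rule can be sketched in isolation: a commit carrying a token lower than the highest token the store has seen is rejected, so a stale lease holder cannot overwrite newer state. The class below is illustrative, not the library's StateStore.

```python
# Illustrative fencing-token check: tokens only increase, stale writers lose.
class FencedStore:
    def __init__(self):
        self.highest_token = 0
        self.state = None

    def commit(self, state, fencing_token):
        if fencing_token < self.highest_token:   # stale writer: reject
            return False
        self.highest_token = fencing_token
        self.state = state
        return True

store = FencedStore()
ok_new = store.commit("checkpoint-1", fencing_token=2)    # accepted
ok_stale = store.commit("checkpoint-0", fencing_token=1)  # rejected
```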

7. Commit Model

Default is checkpoint advance after batch success.

fetch batch -> invoke handler -> success -> commit new checkpoint
                               -> failure -> no checkpoint advance

This model allows duplicates but minimizes data loss.
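The at-least-once consequence can be simulated in a few lines: if the handler produces side effects and then fails before the checkpoint commits, the next tick redelivers the same batch. All names below are illustrative.

```python
# Simulation of commit-after-success: checkpoint advances only when the
# handler succeeds, so a batch that fails after its side effects is redelivered.
def tick(batches, checkpoint, handler):
    batch = batches.get(checkpoint)
    if batch is None:
        return checkpoint
    try:
        handler(batch)
    except Exception:
        return checkpoint          # failure: no checkpoint advance
    return checkpoint + 1          # success: commit new checkpoint

batches = {0: ["a", "b"], 1: ["c"]}
seen = []
calls = {"n": 0}

def flaky_handler(batch):
    seen.extend(batch)             # side effect happens first...
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("crash after side effect, before commit")

cp = tick(batches, 0, flaky_handler)   # fails: cp stays 0, "a","b" already seen
cp = tick(batches, cp, flaky_handler)  # batch 0 redelivered: duplicates
cp = tick(batches, cp, flaky_handler)  # batch 1 processed normally
# seen == ["a", "b", "a", "b", "c"]: duplicates, but nothing lost
```

Handlers should therefore be idempotent; the framework guarantees no loss, not exactly-once delivery.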

8. Partitioning

MVP is based on a single logical stream. Partitioning will be introduced in subsequent versions.

Partition key candidates:

  • schema/table
  • tenant_id
  • modulo hash(pk)
  • native partition id (change stream-based)

9. Deployment Topology

Simple Configuration

  • Azure Function App
  • Azure Storage account
  • Target database
  • Application Insights

Extended Configuration

  • All of the above +
  • Service Bus/Event Hub relay
  • Dedicated checkpoint store
  • Multiple pollers per app

10. Architecture Principles

  • Keep execution simple; keep semantics explicit
  • Broad DB reach; conservative guarantees
  • Handlers behave like pure functions; orchestration is the framework's job
  • Operational concerns built-in by default; abstractions kept thin