Core Concepts¶
This document explains the fundamental abstractions in BenchForge. Understanding these concepts is essential for writing effective scenarios and interpreting results.
Scenario¶
A scenario is a YAML file that fully describes a benchmark experiment. It contains:
| Section | Purpose |
|---|---|
| `name` | Human-readable identifier |
| `description` | Optional description of what is being measured |
| `setup` | SQL queries executed before each iteration |
| `teardown` | SQL queries executed after each iteration |
| `steps` | The workload: queries to execute during measurement |
| `load` | Concurrency, duration, and warmup configuration |
| `experiment` | Multi-iteration settings (iterations, seed, pause) |
| `targets` | Database access stacks to benchmark |
A scenario is the unit of reproducibility. Given the same scenario file and database state, BenchForge produces comparable results across runs.
Step¶
A step is a single query within a scenario. Each step has:
- `name`: Identifier for the step (appears in results and charts)
- `query`: SQL query string using `%(param)s` placeholder syntax
- `params`: Optional parameter generators (e.g., `random_int(1, 1000)`)
During execution, all steps are run in round-robin order by each worker thread. Latency is measured per-step, not per-query-type.
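The round-robin behaviour can be sketched as follows. This is a minimal illustration, not BenchForge's actual runner: `run_round_robin`, the `execute` callable, and the fixed `total_ops` count are assumptions made here to keep the example deterministic (the real runner is driven by a duration, as described under Load Profile).

```python
import time
from collections import defaultdict
from itertools import cycle


def run_round_robin(steps, execute, total_ops, clock=time.perf_counter):
    """Run `steps` in round-robin order and record latency per step name.

    Hypothetical helper: `execute` stands in for a worker's execute() call.
    """
    latencies = defaultdict(list)
    step_cycle = cycle(steps)  # step 1, step 2, ..., step N, step 1, ...
    for _ in range(total_ops):
        step = next(step_cycle)
        start = clock()
        execute(step)
        # Latency is keyed by step name, so two steps that happen to run
        # the same kind of query are still reported separately.
        latencies[step["name"]].append(clock() - start)
    return dict(latencies)


steps = [
    {"name": "get_user", "query": "SELECT * FROM users WHERE id = %(id)s"},
    {"name": "count_users", "query": "SELECT count(*) FROM users"},
]
executed = []  # records the order in which steps were run
result = run_round_robin(steps, lambda s: executed.append(s["name"]), total_ops=5)
```

With five operations and two steps, `executed` alternates between the two step names, and `result` holds three samples for the first step and two for the second.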
Parameter Generators¶
BenchForge supports inline parameter generation in the scenario DSL:
| Generator | Example | Description |
|---|---|---|
| `random_int(low, high)` | `random_int(1, 1000)` | Uniform random integer in [low, high] |
| `random_choice(a, b, c)` | `random_choice('read', 'write')` | Random selection from list |
Parameters are resolved fresh on each execution. When a seed is provided, parameter generation is deterministic and reproducible.
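The effect of seeding can be illustrated with Python's standard `random` module. The functions below only mirror the generator names from the table; their signatures (taking an explicit `rng`) and the `resolve_params` helper are assumptions for this sketch, not BenchForge's internals.

```python
import random


def random_int(rng, low, high):
    """Uniform integer in [low, high], both ends inclusive."""
    return rng.randint(low, high)


def random_choice(rng, *options):
    """Pick one of the given options uniformly at random."""
    return rng.choice(options)


def resolve_params(generators, rng):
    """Resolve every parameter afresh; called once per step execution."""
    return {name: generate(rng) for name, generate in generators.items()}


# Hypothetical parameter spec for a step with %(id)s and %(kind)s placeholders.
generators = {
    "id": lambda rng: random_int(rng, 1, 1000),
    "kind": lambda rng: random_choice(rng, "read", "write"),
}

# Two RNGs built from the same seed yield the same parameter sequence.
rng_a = random.Random(42)
rng_b = random.Random(42)
run_a = [resolve_params(generators, rng_a) for _ in range(3)]
run_b = [resolve_params(generators, rng_b) for _ in range(3)]
```

Each call to `resolve_params` draws new values, while `run_a` and `run_b` are identical because both RNGs started from the same seed.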
Target¶
A target defines a specific database access stack to benchmark:

    targets:
      - name: psycopg-raw          # Display name
        stack_id: python+psycopg   # Registry key for worker lookup
        language: python           # Language identifier
        driver: psycopg            # Driver name
        dsn: "postgresql://..."    # Connection string
        worker_config: {}          # Optional driver-specific config

Multiple targets in a single scenario are benchmarked under identical conditions (same workload, same setup/teardown, same load profile), enabling fair comparison.
Worker¶
A worker is the execution engine for a target. Workers implement a strict lifecycle protocol:
| Method | Purpose |
|---|---|
| `setup()` | Store configuration (DSN, worker config). No connections yet. |
| `open()` | Establish database connection. Thread-local; never shared. |
| `warmup()` | Run queries without measurement to warm JIT, caches, etc. |
| `execute()` | Execute a single step. Runner measures latency externally. |
| `execute_raw()` | Execute raw SQL for setup/teardown (no parameter binding). |
| `introspect()` | Return server metadata (version, config) for reproducibility. |
| `close()` | Release connection and resources. |
Each concurrent thread gets its own Worker instance. Connection sharing across threads is forbidden.
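To make the lifecycle concrete, here is a sketch of a stand-in worker that records each call instead of talking to a database. The method names come from the table above; the argument lists, the `RecordingWorker` class, the `run_once` driver, and the position of `introspect()` in the call order are all assumptions for illustration.

```python
class RecordingWorker:
    """Stand-in worker: records lifecycle calls, never opens a real connection."""

    def __init__(self):
        self.calls = []
        self.connected = False

    def setup(self, dsn, worker_config=None):
        # Store configuration only; no connection is made at this point.
        self.dsn = dsn
        self.worker_config = worker_config or {}
        self.calls.append("setup")

    def open(self):
        self.connected = True  # a real worker would connect here
        self.calls.append("open")

    def introspect(self):
        self.calls.append("introspect")
        return {"server_version": "unknown"}  # placeholder metadata

    def execute_raw(self, sql):
        if not self.connected:
            raise RuntimeError("execute_raw() called before open()")
        self.calls.append("execute_raw")

    def warmup(self, steps):
        self.calls.append("warmup")  # results would be discarded

    def execute(self, step, params=None):
        self.calls.append("execute")  # the runner times this call externally

    def close(self):
        self.connected = False
        self.calls.append("close")


def run_once(worker, steps, setup_sql=(), teardown_sql=()):
    """Drive one worker through the lifecycle (assumed call order)."""
    worker.setup("postgresql://...", {})
    worker.open()
    try:
        metadata = worker.introspect()
        for sql in setup_sql:
            worker.execute_raw(sql)
        worker.warmup(steps)
        for step in steps:
            worker.execute(step)
        for sql in teardown_sql:
            worker.execute_raw(sql)
    finally:
        worker.close()  # always release the connection
    return metadata


worker = RecordingWorker()
metadata = run_once(
    worker,
    steps=[{"name": "get_user"}],
    setup_sql=["CREATE TABLE IF NOT EXISTS users (...)"],
    teardown_sql=["TRUNCATE TABLE users"],
)
```

In a multi-threaded run, each thread would construct and drive its own worker instance in this way, so no connection is ever shared.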
Built-in Workers¶
| Stack ID | Worker | Description |
|---|---|---|
| `python+psycopg` | `PsycopgWorker` | Raw psycopg3, one connection per thread, autocommit |
| `python+sqlalchemy` | `SQLAlchemyWorker` | SQLAlchemy Core with shared engine, `text()` queries |
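Since a target's `stack_id` is described as a registry key, the lookup might work roughly like the sketch below. The registry dictionary, the `register` decorator, and `worker_for` are hypothetical names, and the two classes are empty placeholders that only borrow the names of the built-in workers.

```python
WORKER_REGISTRY = {}  # hypothetical mapping: stack_id -> worker class


def register(stack_id):
    """Class decorator that files a worker class under its stack ID."""
    def decorator(cls):
        WORKER_REGISTRY[stack_id] = cls
        return cls
    return decorator


@register("python+psycopg")
class PsycopgWorker:
    """Placeholder standing in for the real psycopg-based worker."""


@register("python+sqlalchemy")
class SQLAlchemyWorker:
    """Placeholder standing in for the real SQLAlchemy-based worker."""


def worker_for(target):
    """Return the worker class for a target dict, or fail with a clear error."""
    stack_id = target["stack_id"]
    try:
        return WORKER_REGISTRY[stack_id]
    except KeyError:
        known = ", ".join(sorted(WORKER_REGISTRY))
        raise ValueError(f"unknown stack_id {stack_id!r}; known: {known}") from None


target = {"name": "psycopg-raw", "stack_id": "python+psycopg"}
worker_class = worker_for(target)
```

Keeping the lookup keyed on `stack_id` rather than on `name` lets several targets with different display names share one worker implementation.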
Iteration¶
An iteration is a single complete execution of all steps against all targets. Multi-iteration experiments run the full workload multiple times to measure statistical variance.
Each iteration:

1. Executes setup queries (if defined)
2. Runs the warmup phase (excluded from measurement)
3. Measures the workload for the configured duration
4. Executes teardown queries (if defined)

Between iterations, BenchForge pauses for pause_between seconds (default 5.0)
to allow OS and database caches to stabilize.
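The iteration sequence and the pause between iterations can be sketched as a simple loop. All phases are stubbed out here, the function names are hypothetical, and skipping the pause after the final iteration is an assumption.

```python
import time


def run_iteration(index, events):
    """One iteration: setup, warmup, measurement, teardown (all stubbed)."""
    events.append(f"setup[{index}]")
    events.append(f"warmup[{index}]")    # results are discarded
    events.append(f"measure[{index}]")   # only this phase is recorded
    events.append(f"teardown[{index}]")


def run_iterations(count, pause_between=5.0, sleep=time.sleep):
    """Run `count` iterations, pausing between them (not after the last)."""
    events = []
    for index in range(count):
        run_iteration(index, events)
        if index < count - 1:
            sleep(pause_between)
            events.append(f"pause[{pause_between}]")
    return events


# Inject a fake sleep so the example finishes instantly.
pauses = []
events = run_iterations(2, pause_between=5.0, sleep=pauses.append)
```

Passing `sleep` as a parameter is only a convenience for this example; it lets the pause be observed without actually waiting.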
Experiment¶
An experiment is the complete multi-iteration run:

    experiment:
      iterations: 5        # Number of complete runs
      seed: 42             # Base seed for reproducibility
      pause_between: 2.0   # Seconds between iterations

- `iterations`: How many times to repeat the full workload. More iterations produce tighter confidence intervals.
- `seed`: Base seed for deterministic parameter generation. Iteration `i` uses seed `seed + i`, and each thread gets its own derived RNG.
- `pause_between`: Quiet time between iterations to reduce carry-over effects.
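The seed derivation can be sketched as below. The `seed + i` rule comes from the description above; how the per-thread RNG is derived is not specified, so combining the iteration seed and thread index into a string seed is purely an assumption of this example.

```python
import random


def iteration_seed(base_seed, iteration):
    """Seed for iteration i, following the `seed + i` rule."""
    return base_seed + iteration


def thread_rng(base_seed, iteration, thread_index):
    """Independent RNG per thread (derivation scheme assumed for illustration)."""
    derived = f"{iteration_seed(base_seed, iteration)}:{thread_index}"
    return random.Random(derived)


# Re-creating the RNGs with the same inputs reproduces the same draws.
first = [thread_rng(42, 0, t).randint(1, 1000) for t in range(4)]
again = [thread_rng(42, 0, t).randint(1, 1000) for t in range(4)]
```

Giving each thread its own RNG keeps parameter streams reproducible regardless of how the operating system schedules the threads.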
Result Schema¶
BenchForge uses a versioned JSON result schema (currently v2). Key models:
| Model | Description |
|---|---|
| `RunResult` | Top-level model; contains everything from a single benchmark session |
| `IterationResult` | Results for one iteration (targets, duration, seed) |
| `TargetResult` | Results for one target in one iteration (steps, overall latency, errors) |
| `StepResult` | Per-step metrics (ops, throughput, latency summary, time-series, ECDF samples) |
| `AggregateTargetResult` | Cross-iteration statistics (mean, stdev, CV, bootstrap CI) |
| `CompareResult` | Comparison between two runs (latency/throughput ratios) |
Results are saved as JSON and can be loaded for reporting, comparison, or further analysis with external tools.
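For readers unfamiliar with the cross-iteration statistics named above, the following sketch computes mean, sample standard deviation, coefficient of variation, and a percentile bootstrap confidence interval. The exact estimators BenchForge uses are not stated here, so the percentile method, the resample count, and the `aggregate` function are assumptions; the throughput numbers are made up for illustration.

```python
import random
import statistics


def aggregate(values, resamples=2000, confidence=0.95, seed=0):
    """Summarise per-iteration values across an experiment."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)   # sample standard deviation
    cv = stdev / mean                  # coefficient of variation

    # Percentile bootstrap: resample with replacement, collect the means,
    # then read the interval off the sorted resampled means.
    rng = random.Random(seed)
    boot_means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    low = boot_means[int(alpha * resamples)]
    high = boot_means[int((1.0 - alpha) * resamples) - 1]
    return {"mean": mean, "stdev": stdev, "cv": cv, "ci": (low, high)}


# Illustrative throughput (ops/s) from five iterations; not real results.
throughput = [1000.0, 1040.0, 980.0, 1010.0, 995.0]
stats = aggregate(throughput)
```

A low CV indicates that iterations agree closely; a wide bootstrap interval suggests that more iterations are needed before comparing targets.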
Load Profile¶
The load profile controls how BenchForge drives the workload:

    load:
      concurrency: 4    # Number of concurrent worker threads
      duration: 10      # Measurement duration in seconds
      warmup:
        duration: 3     # Warmup duration in seconds (excluded from measurement)

BenchForge uses a closed-loop model: each thread executes queries back-to-back as fast as possible (no artificial think time). This measures maximum throughput under the given concurrency level.
Threads are synchronized at start using a `threading.Barrier`, so no thread begins measurement until every thread is ready and all of them start at approximately the same instant.
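A closed-loop driver with a barrier-synchronized start can be sketched with the standard library alone. `closed_loop` and `do_work` are hypothetical names; `do_work` stands in for executing one step against the database.

```python
import threading
import time


def closed_loop(concurrency, duration, do_work):
    """Run `do_work` back-to-back on each thread for `duration` seconds."""
    barrier = threading.Barrier(concurrency)
    counts = [0] * concurrency  # one slot per thread, so no locking is needed

    def worker(index):
        barrier.wait()  # block until every thread has reached this point
        deadline = time.perf_counter() + duration
        while time.perf_counter() < deadline:
            do_work()               # no think time between operations
            counts[index] += 1

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(concurrency)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counts


counts = closed_loop(concurrency=4, duration=0.2, do_work=lambda: None)
throughput = sum(counts) / 0.2  # operations per second across all threads
```

Because each thread issues its next operation only after the previous one returns, throughput is bounded by latency times concurrency, which is the defining property of a closed-loop load model.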
Setup and Teardown¶
Setup queries run before each iteration to prepare the database state. Teardown queries run after each iteration to clean up.

    setup:
      queries:
        - "CREATE TABLE IF NOT EXISTS users (...)"
        - "INSERT INTO users (...) SELECT ..."
    teardown:
      queries:
        - "TRUNCATE TABLE users"

This per-iteration isolation ensures each iteration starts from an identical database state, which is critical for reproducible results.
Setup failures are fail-fast (abort the run). Teardown failures are logged but do not abort.
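The two failure policies can be contrasted in a short sketch. `SetupError`, `run_setup`, `run_teardown`, and the logger name are hypothetical, and continuing with the remaining teardown queries after one fails is an assumption.

```python
import logging

logger = logging.getLogger("benchforge.sketch")  # hypothetical logger name


class SetupError(RuntimeError):
    """Raised when a setup query fails; the run should stop."""


def run_setup(execute_raw, queries):
    """Fail fast: the first failing setup query aborts the run."""
    for sql in queries:
        try:
            execute_raw(sql)
        except Exception as exc:
            raise SetupError(f"setup query failed: {sql!r}") from exc


def run_teardown(execute_raw, queries):
    """Best effort: log failures, keep going, and report what failed."""
    failures = []
    for sql in queries:
        try:
            execute_raw(sql)
        except Exception as exc:
            logger.warning("teardown query failed: %r (%s)", sql, exc)
            failures.append(sql)
    return failures


executed = []


def fake_execute_raw(sql):
    """Stand-in for a worker's execute_raw(); fails on the marker query."""
    if sql == "BROKEN":
        raise RuntimeError("simulated failure")
    executed.append(sql)


teardown_failures = run_teardown(fake_execute_raw, ["BROKEN", "TRUNCATE TABLE users"])
```

The asymmetry is deliberate in spirit: a failed setup invalidates the measurement that would follow, whereas a failed teardown only affects cleanup after the data has already been collected.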