Seed Management¶
DBWarden provides built-in seed data management for populating databases with initial or reference data. Seeds complement migrations by handling data that belongs in version control.
Overview¶
There are two ways to define seeds, listed in order of preference:
- Code seeds (recommended): define seeds inline alongside your SQLAlchemy models using the
Seedbase class or@seed_datadecorator. No separate files, no manual versioning. - File seeds: traditional
.sqlor.pyfiles in aseeds/directory, useful for complex multi-statement SQL.
Both are tracked in the _dbwarden_seeds table and applied via dbwarden seed apply.
Code Seeds (Recommended)¶
Code seeds live alongside your models in your model_paths directories. They are the recommended way to define seed data because they stay in sync with your schema, support IDE autocompletion, and do not require manual version management.
Seed Base Class¶
Inherit from Seed and set a model + rows:
from dbwarden.seed import Seed
class CountrySeed(Seed):
__seed_database__ = "primary"
__seed_description__ = "initial countries"
__seed_on_conflict__ = "update"
__seed_conflict_columns__ = ["code"]
model = Country
rows = [
Country(code="UY", name="Uruguay"),
Country(code="AR", name="Argentina"),
]
Key advantages over the old decorator approach:
- Full IDE autocompletion:
rowsuses model instances directly, so your editor knows the column names and types - No
versionparameter: versions are auto-assigned (C0001,C0002, ...) based on deterministic class ordering - No manual import of
SeedRow: thoughSeedRowis still available if you prefer dict-like rows
Seed Class Reference¶
| Attribute | Default | Description |
|---|---|---|
__seed_database__ |
"default" |
Routes the seed to the named database handle configured in database_config(...). |
__seed_description__ |
"" |
Human-readable label shown in dbwarden seed list output. |
__seed_on_conflict__ |
"ignore" |
What to do when a row with matching columns exists: "ignore" (skip silently), "update" (overwrite), or "error" (raise). |
__seed_conflict_columns__ |
None |
List of column names used for conflict detection. Required when __seed_on_conflict__ is "update". |
Model Instances in Rows¶
Because rows accepts model instances, you get full autocompletion from your Mapped annotations:
from dbwarden.seed import Seed
class RepoSeed(Seed):
__seed_database__ = "clickhouse"
__seed_description__ = "Tracked Repos"
model = Repo
rows = [
Repo(name="dbwarden", owner="anomalyco", is_org=True, default_branch="main"),
Repo(name="vigil", owner="anomalyco", is_org=True, default_branch="master"),
]
Your editor will suggest name, owner, is_org, default_branch etc. as you type.
SQLAlchemy 2.0's
DeclarativeBasedoes not accept positional arguments in the constructor. Always use keyword arguments when instantiating models inrows:Repo(name="dbwarden", ...)instead ofRepo("dbwarden", ...).
SeedRow (Alternative)¶
If you prefer dict-like rows, SeedRow still works:
from dbwarden.seed import Seed, SeedRow
class CountrySeed(Seed):
__seed_database__ = "primary"
__seed_description__ = "initial countries"
__seed_on_conflict__ = "update"
__seed_conflict_columns__ = ["code"]
model = Country
rows = [
SeedRow(code="UY", name="Uruguay"),
SeedRow(code="AR", name="Argentina"),
]
on_conflict Behavior¶
| Value | Behavior |
|---|---|
"ignore" (default) |
Skips existing rows silently |
"update" |
Updates existing rows with new values |
"error" |
Raises an error on conflict |
Logic-Based Seeds¶
Define a generate(session) static/class method for programmatic data:
class PermissionSeed(Seed):
__seed_database__ = "primary"
__seed_description__ = "load permissions"
__seed_on_conflict__ = "ignore"
model = Permission
@staticmethod
def generate(session):
for resource in ["users", "orders"]:
for action in ["read", "write", "delete"]:
session.add(Permission(name=f"{resource}:{action}"))
@seed_data Decorator (Deprecated)¶
The old decorator still works but is deprecated in favour of the Seed base class:
from dbwarden.seed import seed_data, SeedRow
@seed_data(
database="primary",
description="initial countries",
on_conflict="update",
conflict_columns=["code"],
)
class CountrySeed:
model = Country
rows = [SeedRow(code="UY", name="Uruguay")]
Note that version is no longer required; it is auto-assigned.
Discovery and Ordering¶
Code seeds are discovered through the same model_paths scan as models. They use auto-assigned versions in the C namespace (C0001, C0002, ...) and are sorted deterministically by module and class name. Pending detection compares the class qualified name against the _dbwarden_seeds tracking table.
File Seeds (Traditional)¶
File seeds live in a seeds/ directory and are useful for complex multi-statement SQL or when you need to hand-craft seed files.
Directory Structure¶
Each file follows the naming convention:
Creating File Seeds¶
Creates a file like seeds/V0001__seed_initial_users.sql:
Python File Seeds¶
Creates seeds/V0001__generate_sample_data.py with a seed(connection, session) function.
The seed() function receives both a raw SQLAlchemy Connection and an ORM Session bound to the same transaction:
# Using raw connection
def seed(connection, session):
for i in range(100):
connection.execute(
"INSERT INTO users (name) VALUES (:name)",
{"name": f"user_{i}"},
)
# Using ORM session
def seed(connection, session):
for i in range(100):
session.add(User(name=f"user_{i}"))
session.flush()
Applying Seeds¶
Apply all pending seeds (file + code seeds are both discovered):
Apply a specific version:
Apply to all databases:
Dry Run¶
Preview what would be applied without executing:
Auto-Apply After Migrations¶
Configure seeds to be applied automatically after each dbwarden migrate:
database_config(
database_name="primary",
default=True,
database_type="sqlite",
database_url_sync="sqlite:///./app.db",
model_paths=["models"],
auto_apply_seeds=True,
)
Or apply seeds once after a migration without changing config:
Listing Seeds¶
Output:
Seeds for database 'primary':
V0001 seed_initial_users applied 2025-06-01 10:00:00
C0001 initial countries pending (code seed)
List across all databases:
Pruning Orphaned Records¶
Remove tracking records for seed files that no longer exist on disk:
Rolling Back Seeds¶
Rollback removes the applied tracking record, allowing the seed to be re-applied. It does not reverse data changes.
# Rollback the most recent seed
$ dbwarden seed rollback --database primary
# Rollback a specific number
$ dbwarden seed rollback --database primary --count 2
# Rollback to a specific version
$ dbwarden seed rollback --database primary --to-version 0002
Seed Tracking Table¶
DBWarden tracks applied seeds in _dbwarden_seeds (configurable via seed_table):
| Column | Description |
|---|---|
version |
4-digit seed version (V0001) or code seed ID (C0001) |
description |
Human-readable description |
filename |
File path or code seed identifier |
seed_type |
sql, python, or code |
checksum |
SHA-256 hash of file/class source |
applied_at |
Timestamp of application |
The tracking table is created automatically on first seed apply. Each version can only be applied once until rolled back.
Checksum Drift¶
When a seed file has been modified since it was last applied, DBWarden emits a warning:
This helps detect accidental changes to already-applied seeds.
Exporting Seeds for Production¶
Code seeds require your full application environment to execute. For Dockerized deployments where you don't want to copy the application code into a container just to seed data, use dbwarden seed export to produce stateless ROC (runs-on-change) SQL files.
seeds/ROC__clickhouse__code_seeds.sql containing INSERT ... ON CONFLICT statements rendered in the target database dialect. In production, apply with:
Because the file is ROC, updating the code seed and re-exporting produces a new content checksum, which triggers re-application. The ON CONFLICT DO UPDATE clause handles updating existing rows; no need to delete and recreate.
Non-handled problems:
- Rows removed from a code seed are not automatically deleted in the target database
- Logic seeds that depend on other logic seeds' output are not supported (preceding row-based seeds are pre-loaded, but logic-to-logic ordering is not)
- Non-deterministic
generate()methods (e.g. usingdatetime.now()) produce a new checksum every export, causing re-apply on every deploy: acceptable for idempotent upserts, wasteful for pure inserts. Use deterministicgenerate()where possible
Dialect requirement: Exporting requires the same dialect packages as connecting to that database. For ClickHouse, install clickhouse-sqlalchemy. Missing packages produce a clear error at export time.
Seeds and Migrations¶
Seeds are independent from migrations. You can:
- Apply migrations without seeds
- Apply seeds without migrations
- Mix both in your workflow
The dbwarden status command and the FastAPI GET /status endpoint report both pending migrations and pending seeds.
Seeds in FastAPI¶
The DBWardenRouter includes seed status in its GET /status response:
{
"databases": {
"primary": {
"status": "ok",
"connected": true,
"pending_migrations": 0,
"applied_migrations": 5,
"pending_seeds": 2,
"applied_seeds": 1,
"lock_active": false,
"error": null
}
}
}
See FastAPI Reference for details.
See also: Cookbook: Seeds