ClickHouse¶
DBWarden treats ClickHouse as a first-class backend: every natively supported feature is reverse-engineered, diffed, and emitted as correct DDL.
First-Class Features¶
"First-class" means the round-trip is verified: reverse-engineer a live database with generate-models, feed the output back into make-migrations, and get zero diff.
# Step 1: reverse-engineer your live ClickHouse database
$ dbwarden generate-models -d analytics
# Step 2: feed the generated models back in, zero diff
$ dbwarden make-migrations -d analytics
# -> "No new migrations to generate" (output is empty; your models match the DB exactly)
The following ClickHouse features are fully supported in this round-trip:
| Category | Features |
|---|---|
| Engine Spec | MergeTree, ReplicatedMergeTree, SummingMergeTree, AggregatingMergeTree, CollapsingMergeTree, VersionedCollapsingMergeTree, ReplacingMergeTree, Distributed via ChEngineSpec(name, args, zookeeper_path, replica_name, settings) |
| Ordering | ORDER BY (col1, col2) via ch_order_by (string or list) |
| Primary Key | PRIMARY KEY (col1) via ch_primary_key (must be prefix of order by) |
| Partitioning | PARTITION BY toYYYYMM(col) via ch_partition_by |
| Sampling | SAMPLE BY intHash64(col) via ch_sample_by |
| TTL | Table and column TTL via ch_ttl as list of expressions |
| Settings | SETTINGS index_granularity=8192 via ch_settings dict |
| Materialized Views | CREATE MATERIALIZED VIEW ... TO target AS SELECT ... via ch_select_statement, ch_to_table |
| Projections | PROJECTION name (SELECT ...) via ProjectionSpec list |
| Dictionaries | CREATE DICTIONARY ... SOURCE(...) LIFETIME(...) LAYOUT(...) via ch_dict_* fields |
| Skip Indexes | ALTER TABLE ... ADD INDEX ... TYPE bloom_filter GRANULARITY N via ChIndexSpec entries in ch_indexes |
| Column Codecs | CODEC(ZSTD(3)) via ch.field(codec=...) on CHColumnMeta |
| LowCardinality / Nullable | Type wrappers via ch.field(low_cardinality=..., nullable=...) on CHColumnMeta |
| Column Defaults | DEFAULT expr, MATERIALIZED expr, ALIAS expr via column Meta |
| Type Normalization | VARCHAR -> String, INTEGER -> Int32, BIGINT -> Int64, FLOAT(53) -> Float64, NUMERIC(p,s) -> Decimal(p,s), BOOLEAN -> Bool, ARRAY(Integer) -> Array(Int32), Enum -> Enum8/Enum16, UUID -> UUID, JSON -> JSON, DATETIME -> DateTime / DateTime64 |
| Auto-detect | generate-models auto-enables ClickHouse engine metadata when database_type="clickhouse" (no --clickhouse-engines flag needed) |
| Snapshot | Full system.tables / system.columns extraction with CH metadata in ch_options and ch_column |
Declaring Metadata¶
ClickHouse metadata is declared in a class Meta inner class on the model. This is the only supported surface. Pass options via mapped_column(info=...) raises DBWardenConfigError.
Table-Level Meta¶
Inherit from CHTableMeta on your class Meta:
from datetime import date, datetime
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from dbwarden.databases.clickhouse import CHTableMeta, ChEngineSpec, ProjectionSpec
class Base(DeclarativeBase):
pass
class Event(Base):
__tablename__ = "events"
id: Mapped[int] = mapped_column(Int64, primary_key=True)
event_date: Mapped[date] = mapped_column(Date)
payload: Mapped[str] = mapped_column(String)
class Meta(CHTableMeta):
ch_engine = ChEngineSpec("MergeTree")
ch_order_by = ["event_date", "id"]
ch_primary_key = "event_date"
ch_partition_by = "toYYYYMM(event_date)"
ch_sample_by = "intHash64(id)"
ch_ttl = ["event_date + toIntervalYear(1)"]
ch_settings = {"index_granularity": "8192"}
comment = "Core event store"
CHTableMeta inherits from TableMeta, which provides common attributes shared across all backends:
| Attribute | Type | SQL |
|---|---|---|
comment |
str |
COMMENT ON TABLE t IS '...' |
indexes |
list[dict] |
Common cross-database indexes |
checks |
list[dict] |
ALTER TABLE t ADD CONSTRAINT ... CHECK (...) |
uniques |
list[dict] |
ALTER TABLE t ADD CONSTRAINT ... UNIQUE (...) |
ClickHouse-specific CHTableMeta attributes:
| Attribute | Type | SQL |
|---|---|---|
ch_engine |
ChEngineSpec |
ENGINE = MergeTree() (or other engine) |
ch_order_by |
str or list[str] |
ORDER BY (col1, col2) |
ch_primary_key |
str or list[str] |
PRIMARY KEY (col1) (must be prefix of order by) |
ch_partition_by |
str |
PARTITION BY toYYYYMM(col) |
ch_sample_by |
str |
SAMPLE BY intHash64(col) |
ch_ttl |
list[str] |
TTL expr1, expr2 |
ch_settings |
dict[str, str] |
SETTINGS key=value (emitted last) |
ch_object_type |
str |
"table", "materialized_view", or "dictionary" (auto-detected) |
ch_select_statement |
str |
AS SELECT ... for materialized views |
ch_to_table |
str |
TO target_table for materialized views |
ch_dictionary |
bool |
Whether this model is a dictionary |
ch_dict_layout |
str |
Dictionary layout (e.g., "FLAT()") |
ch_dict_source |
str |
Dictionary source (e.g., "CLICKHOUSE(TABLE 'src')") |
ch_dict_lifetime |
int or str |
Dictionary cache lifetime |
ch_dict_primary_key |
str or list[str] |
Dictionary primary key |
ch_indexes |
list[ChIndexSpec] |
Skip indexes via ALTER TABLE ... ADD INDEX ... |
ch_projections |
list[ProjectionSpec] |
Named projections |
ch_zookeeper_path |
str |
ZooKeeper path for replicated engines |
ch_replica_name |
str |
Replica name for replicated engines |
The ChTableSpec dataclass (from dbwarden.databases.clickhouse or dbwarden) mirrors these attributes for programmatic access:
from dbwarden.databases import ChTableSpec
spec = ChTableSpec(
engine="MergeTree",
order_by=["event_date", "id"],
partition_by="toYYYYMM(event_date)",
ttl="event_date + toIntervalYear(1)",
settings={"index_granularity": "8192"},
)
Engine Spec¶
Use ChEngineSpec to define the table engine:
from dbwarden.databases.clickhouse import ChEngineSpec
# Simple engine
ch_engine = ChEngineSpec("MergeTree")
# Engine with positional arguments
ch_engine = ChEngineSpec("ReplacingMergeTree", args=("version_column",))
# Replicated engine with ZooKeeper path and replica name
ch_engine = ChEngineSpec("ReplicatedMergeTree",
zookeeper_path="/clickhouse/tables/shard1/events",
replica_name="{replica}")
# Distributed engine with settings
ch_engine = ChEngineSpec("Distributed",
args=("cluster", "db", "events", "rand()"),
settings={"insert_distributed_sync": "1"})
The ChEngineSpec constructor fields:
| Field | Type | Description |
|---|---|---|
name |
str |
Engine name (e.g., "MergeTree", "ReplicatedMergeTree") |
args |
tuple[str, ...] |
Positional engine arguments |
zookeeper_path |
str or None |
ZooKeeper path (injected as first engine arg) |
replica_name |
str or None |
Replica name (injected as second engine arg) |
settings |
dict[str, str] or None |
SETTINGS key=value pairs |
Projections¶
Use ProjectionSpec in ch_projections to attach named projections:
from dbwarden.databases.clickhouse import ProjectionSpec
class Meta(CHTableMeta):
ch_order_by = ["author", "created_at"]
ch_projections = [
ProjectionSpec("by_author", "SELECT * ORDER BY author"),
ProjectionSpec("daily_stats",
"SELECT toDate(created_at) AS day, count() GROUP BY day"),
]
Skip Indexes¶
Use ChIndexSpec in ch_indexes:
from dbwarden.databases.clickhouse import ChIndexSpec
class Meta(CHTableMeta):
ch_indexes = [
ChIndexSpec("ix_payload", ["payload"],
type="bloom_filter", granularity=1),
ChIndexSpec("ix_url", ["url"],
type="minmax", granularity=3),
]
The ChIndexSpec constructor fields:
| Field | Type | Description |
|---|---|---|
name |
str |
Index name |
columns |
list[str] |
Column names for the index |
type |
str |
Index type (bloom_filter, minmax, set, etc.) |
granularity |
int |
Index granularity (default: 1) |
expr |
str or None |
Optional expression override for the index definition |
Generated SQL:
ALTER TABLE events ADD INDEX ix_payload (payload) TYPE bloom_filter GRANULARITY 1
ALTER TABLE events ADD INDEX ix_url (url) TYPE minmax GRANULARITY 3
Column-Level Meta¶
Use CHColumnMeta inner classes for per-column metadata. The inner class must be named after the column. Use ch = ch.field(...) to set column-level options:
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from dbwarden.databases.clickhouse import CHTableMeta, CHColumnMeta, ChEngineSpec, ch
class Base(DeclarativeBase):
pass
class Event(Base):
__tablename__ = "events"
id: Mapped[int] = mapped_column(Int64, primary_key=True)
payload: Mapped[str] = mapped_column(String)
event_time: Mapped[datetime] = mapped_column(DateTime)
tags: Mapped[list[str]] = mapped_column(ARRAY(String))
class Meta(CHTableMeta):
ch_engine = ChEngineSpec("MergeTree")
ch_order_by = "event_time"
class payload(CHColumnMeta):
ch = ch.field(codec="ZSTD(3)", nullable=False)
class tags(CHColumnMeta):
ch = ch.field(low_cardinality=True)
class event_time(CHColumnMeta):
ch = ch.field(default_expression="now()")
CHColumnMeta includes common column attributes shared across all backends:
| Attribute | Type | SQL |
|---|---|---|
comment |
str |
COMMENT ON COLUMN t.c IS '...' |
public |
bool |
Controls field visibility in schemap auto-schema |
ch |
ChFieldSpec |
ClickHouse-specific column options (see table below) |
ClickHouse-specific ChFieldSpec fields (set via ch.field(...)):
| Keyword | Type | SQL |
|---|---|---|
codec |
str |
CODEC(ZSTD(3)) |
default_expression |
str |
DEFAULT expr |
materialized |
str |
MATERIALIZED expr |
alias |
str |
ALIAS expr |
ttl |
str |
Column-level TTL expression |
low_cardinality |
bool |
Wrap type in LowCardinality(...) |
nullable |
bool |
Wrap type in Nullable(...) |
Materialized Views¶
Materialized views use ch_select_statement instead of ch_engine for the target:
class EventRollup(Base):
__tablename__ = "event_rollup_mv"
event_date: Mapped[date] = mapped_column(Date)
total: Mapped[int] = mapped_column(Int64)
class Meta(CHTableMeta):
ch_object_type = "materialized_view"
ch_select_statement = (
"SELECT toDate(event_time) AS event_date, count() AS total "
"FROM events GROUP BY event_date"
)
ch_to_table = "mv_target"
When ch_to_table is set, the generated SQL omits the ENGINE clause (ClickHouse rejects ENGINE with TO clause):
CREATE MATERIALIZED VIEW IF NOT EXISTS event_rollup_mv TO dbwarden.mv_target (
event_date Date NOT NULL,
total Int64 NOT NULL
)
AS SELECT toDate(event_time) AS event_date, count() AS total FROM events GROUP BY event_date
Dictionaries¶
ClickHouse dictionaries use ch_dictionary = True and related ch_dict_* fields:
class CountryCode(Base):
__tablename__ = "country_codes"
code: Mapped[str] = mapped_column(String)
name: Mapped[str] = mapped_column(String)
class Meta(CHTableMeta):
ch_dictionary = True
ch_dict_layout = "FLAT()"
ch_dict_source = "CLICKHOUSE(HOST 'localhost' TABLE 'countries')"
ch_dict_lifetime = "MIN 0 MAX 3600"
ch_dict_primary_key = "code"
Required fields when ch_dictionary = True:
| Field | Description | Example |
|---|---|---|
ch_dict_layout |
Dictionary layout | "FLAT()", "COMPLEX_KEY_HASHED()" |
ch_dict_source |
Source configuration | "CLICKHOUSE(HOST '...' TABLE '...')" |
ch_dict_lifetime |
Cache lifetime | "MIN 0 MAX 3600" or 3600 |
Optional field:
| Field | Description | Default |
|---|---|---|
ch_dict_primary_key |
Primary key expression | First column |
Generated SQL:
CREATE DICTIONARY IF NOT EXISTS country_codes (
code Int64,
name String
)
PRIMARY KEY code
SOURCE(CLICKHOUSE(HOST 'localhost' TABLE 'countries'))
LIFETIME(MIN 0 MAX 3600)
LAYOUT(FLAT())
Column types render as CH-native types (Int64, String) rather than SQLAlchemy types (BIGINT, VARCHAR).
DDL Behavior¶
ADD COLUMN / DROP COLUMN¶
Standard ALTER TABLE ... ADD COLUMN and ALTER TABLE ... DROP COLUMN work for ClickHouse. Column types render as CH-native type names (e.g., ALTER TABLE events ADD COLUMN value Float64 NOT NULL).
ALTER COLUMN DEFAULT¶
ClickHouse supports ALTER TABLE ... MODIFY COLUMN ... DEFAULT ... (and DROP DEFAULT) which maps to the standard SET DEFAULT / DROP DEFAULT pattern.
ALTER COLUMN TYPE¶
ClickHouse supports in-place ALTER TABLE ... MODIFY COLUMN ... type for compatible type changes (e.g., Int32 to Int64). Incompatible type changes (e.g., String to Int64) require table recreation.
ALTER TABLE MODIFY SETTING¶
Not all SETTINGS are dynamic. Some settings require table recreation. Review the ClickHouse documentation for your version to confirm which settings are MODIFY SETTING compatible.
DDL Transactional Behavior¶
ClickHouse executes DDL statements individually. There is no transactional DDL; partial failure during a multi-statement migration can leave the schema in an inconsistent state. Each statement is applied atomically, but there is no rollback across statements.
Statement Ordering¶
The standard statement ordering applies to ClickHouse. Operations that emit comments (table rename, safe type change) produce zero-effect placeholders. The ordering ensures the upgrade script remains structurally consistent even when backends skip operations.
Previously Manual Operations Now Auto-Generated¶
These operations were previously comment-only placeholders. DBWarden now auto-generates SQL for them:
Table Rename¶
ClickHouse supports RENAME TABLE ... TO .... DBWarden emits real SQL:
Nullable / LowCardinality Type Changes¶
DBWarden computes the target type by stripping Nullable/LowCardinality wrappers from the base type and re-wrapping with the new flags:
Projections¶
DBWarden diffs the projection lists by name and emits native DROP/ADD PROJECTION:
ALTER TABLE events DROP PROJECTION by_date;
ALTER TABLE events ADD PROJECTION by_date SELECT event_date, sum(amount) GROUP BY event_date;
What Emits Comments Only¶
These operations are not supported by ClickHouse DDL. DBWarden emits comment placeholders or points to available flags:
Safe Type Change¶
--safe-type-change is not supported for ClickHouse. The multi-step temp-column strategy emits a comment.
Foreign Keys¶
ClickHouse does not enforce foreign key constraints, and DBWarden prohibits
ForeignKey() on models configured for a ClickHouse-backed database. Using
ForeignKey() raises DBWardenConfigError at model discovery time:
DBWardenConfigError: Column 'sync.repo' uses ForeignKey constraint
referencing 'repos(name)', but ClickHouse does not support foreign key
constraints. Remove ForeignKey() and use a plain mapped_column instead;
the relationship is logical only. If this model is shared across multiple
databases, move it to its own module and configure separate model_paths
per database.
Instead, use a plain mapped_column for columns that reference other tables;
the relationship is logical only and should be enforced at the application layer.
If you need to share model code between a ClickHouse database and another
backend, keep the shared models in a module that is only included in the
non-ClickHouse database's model_paths. Each database_config() entry has
independent model_paths, so a model file with ForeignKey() will never be
discovered for a database that shouldn't see it.
Indexes (Standard)¶
Standard SQL indexes are not supported. Only ClickHouse skip indexes declared via ch_indexes and ChIndexSpec generate real DDL. See Skip Indexes above.
Snapshot Format¶
The snapshot JSON captures all ClickHouse-specific metadata. Key sections:
Column Extras¶
{
"name": "payload",
"type": "String",
"ch_column": {
"ch_codec": "ZSTD(3)",
"ch_default_expression": null,
"ch_materialized": null,
"ch_alias": null,
"ch_ttl": null,
"ch_low_cardinality": false,
"ch_nullable": false,
"ch_type": "String"
}
}
Table Extras¶
{
"ch_options": {
"ch_engine_raw": {"name": "MergeTree", "args": [], "zookeeper_path": null, "replica_name": null, "settings": null},
"ch_engine": ["MergeTree"],
"ch_order_by": ["event_date", "id"],
"ch_primary_key": ["event_date"],
"ch_partition_by": "toYYYYMM(event_date)",
"ch_sample_by": null,
"ch_ttl": ["event_date + toIntervalYear(1)"],
"ch_settings": {"index_granularity": "8192"},
"ch_object_type": "table",
"ch_projections": [{"name": "by_date", "query": "SELECT event_date, sum(amount) GROUP BY event_date"}],
"ch_zookeeper_path": null,
"ch_replica_name": null
}
}
Reverse Engineering¶
generate-models queries system.tables, system.columns, and system.data_skipping_indices to reverse-engineer all ClickHouse metadata. The emitted model uses class Meta with CHTableMeta, CHColumnMeta, ChEngineSpec, and ProjectionSpec.
Auto-detection is the default: when database_type="clickhouse", engine metadata is included automatically. The --clickhouse-engines flag is no longer required.
Generated output for a table with engine, ordering, partitioning, codec, and projections:
from datetime import date
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from dbwarden.databases.clickhouse import CHTableMeta, CHColumnMeta, ChEngineSpec, ProjectionSpec, ch
class Base(DeclarativeBase):
pass
class Event(Base):
__tablename__ = "events"
id: Mapped[int] = mapped_column(Int64, primary_key=True)
event_date: Mapped[date] = mapped_column(Date)
payload: Mapped[str] = mapped_column(String)
class Meta(CHTableMeta):
ch_engine = ChEngineSpec("MergeTree")
ch_order_by = ["event_date", "id"]
ch_partition_by = "toYYYYMM(event_date)"
ch_ttl = ["event_date + toIntervalYear(1)"]
ch_settings = {"index_granularity": "8192"}
ch_projections = [
ProjectionSpec("by_date", "SELECT event_date, sum(amount) GROUP BY event_date"),
]
class payload(CHColumnMeta):
ch = ch.field(codec="ZSTD(3)")
Safety Classification¶
DBWarden classifies migration changes using the Safety enum:
from dbwarden.engine.safety import Safety
assert Safety.SAFE == "SAFE"
assert Safety.INFO == "INFO"
assert Safety.WARN == "WARN"
assert Safety.CRITICAL == "CRITICAL"
The following ClickHouse-specific safety classifiers are available for custom analysis:
from dbwarden.engine.safety import (
classify_ch_column_change,
classify_ch_options_change,
classify_ch_safety,
)
| Change Type | Severity | Flag Required |
|---|---|---|
| Add column | INFO |
None |
| Drop column | WARNING |
--force |
| Change column type (ch_type) | CRITICAL |
--force |
| Change column codec | WARN |
None |
| Change column default | WARN |
None |
| Change column TTL | WARN |
None |
| Change LowCardinality / Nullable | CRITICAL |
--force |
| Change engine | CRITICAL |
--force |
| Change order by | CRITICAL |
--force |
| Change partition by | WARN |
--force |
| Change sample by | INFO |
None |
| Change TTL | INFO |
None |
| Change settings | WARN |
--force |
| Change object type | CRITICAL |
--force |
| Change dictionary layout / source / lifetime | WARN |
--force |
| Add projection | INFO |
None |
| Drop projection | WARNING |
--force |
| Add skip index | INFO |
None |
| Drop skip index | WARNING |
--force |
| Add materialized view | INFO |
None |
| Drop materialized view | CRITICAL |
--force |
Verification Workflow¶
Because ClickHouse diverges from standard SQL DDL, always review auto-generated migrations before applying:
Use dbwarden make-migrations --plan -d analytics to preview the ops without writing files.
See models.md for detailed ClickHouse model examples and Modeling Guide for a complete walkthrough.