Skip to content

Commit 05d3ab1

Browse files
authored
docs(clickhouse): require max+1 numbering and idempotent DDL (#3633)
## Summary Codify two rules for ClickHouse migration authors that came out of the 029/030 ordering incident on the TRI-9367 test cloud deploy: 1. **Number files to `max(existing) + 1`, never slot in below the latest.** Goose runs in strict mode in the cloud deploy pipeline and refuses to apply a missing version below the current version — slotting a file in below an already-applied number blocks the next deploy. 2. **DDL must be idempotent** (`ADD COLUMN IF NOT EXISTS`, `DROP COLUMN IF EXISTS`, `CREATE TABLE IF NOT EXISTS`, etc.) so a retry or out-of-order apply (`goose up --allow-missing` for local recovery, manual fixups) is a no-op rather than an error. ## Where the rules live - `internal-packages/clickhouse/CLAUDE.md` — full rules + example for migration authors (and AI agents writing migrations). - `.claude/REVIEW.md` — added a 🔴 finding under "What makes a 🔴 Important finding" so PR reviewers flag either fault as blocking. The existing migration files are left untouched; the idempotency requirement applies going forward. ## Test plan - [ ] Next ClickHouse migration PR uses `IF NOT EXISTS` / `IF EXISTS` forms - [ ] No new migration files numbered below an already-applied version on test/prod
1 parent 032b5a1 commit 05d3ab1

2 files changed

Lines changed: 32 additions & 3 deletions

File tree

.claude/REVIEW.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@ Reserve 🔴 for things that would page someone or block a rollback. In this cod
1111
- A Redis data shape used by both versions changes in place. New shapes need a new key namespace.
1212
- A migration is not backward-compatible with the prior image.
1313
- **Schema / migration safety.** Prisma migrations must be backward-compatible with the prior deploy. Adding NOT NULL without a default, dropping a column an old image still reads, renaming a column — all 🔴.
14+
- **ClickHouse migration ordering + idempotency.** Goose runs in strict mode in the deploy pipeline and refuses to apply a missing version below the current version — slotting a new file in below the latest already-applied version blocks the deploy. New ClickHouse migration files MUST use the next available number (`max(files in internal-packages/clickhouse/schema/) + 1`); if main has added migrations while you've been on a branch, renumber yours. DDL must also be idempotent (`ADD COLUMN IF NOT EXISTS`, `DROP COLUMN IF EXISTS`, `CREATE TABLE IF NOT EXISTS`, `ADD INDEX IF NOT EXISTS`) so a partial / `--allow-missing` apply elsewhere doesn't fail on retry. Either fault is 🔴 — both break test/prod deploys. Rules live in `internal-packages/clickhouse/CLAUDE.md`.
1415
- **Queue / concurrency correctness.** RunQueue, MarQS (V1, legacy), redis-worker — any change to enqueue / dequeue / locking semantics. Re-derive the invariant on paper before flagging or accepting.
1516
- **Missing index on a hot table.** New Prisma queries against `TaskRun`, `TaskRunExecutionSnapshot`, `JobRun`, `Project`, etc. must use an existing index. Check `internal-packages/database/prisma/schema.prisma` for the relevant `@@index` lines — don't guess and don't propose `EXPLAIN`.
1617
- **Recovery-path queries.** Any `TaskRun.findFirst` / `findMany` added to a schedule, run-recovery, or restart loop. Recovery fan-outs (Redis crash, restart storms) turn "rare indexed query" into a DB incident. 🔴 even if indexed.

internal-packages/clickhouse/CLAUDE.md

Lines changed: 31 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4,18 +4,46 @@
44

55
## Migrations
66

7-
Goose-format SQL migrations live in `schema/`. Create new numbered files:
7+
Goose-format SQL migrations live in `schema/`. Two rules below are load-bearing — both can block a deploy.
8+
9+
### Rule 1: number to `max + 1`, never slot in
10+
11+
Goose runs in strict mode in the deploy pipeline. If a migration file numbered *below* the version currently recorded in `goose_db_version` ever shows up, goose refuses to apply it and the deploy fails:
12+
13+
```
14+
goose run: error: found 1 missing migrations before current version 30:
15+
version 29: 029_add_task_kind_to_task_runs_v2.sql
16+
```
17+
18+
When adding a migration:
19+
20+
1. Look at `schema/` and take the largest existing number, call it `N`.
21+
2. Name your file `0(N+1)_descriptive_name.sql`.
22+
3. If you've been on a branch while main added migrations, **rebase and renumber** before opening the PR — a file numbered below the new max will block the next deploy after your PR merges.
23+
24+
### Rule 2: DDL must be idempotent
25+
26+
Migrations can be applied out of order in some environments (`goose up --allow-missing` for local recovery, manual fixups, etc.) and may be retried. Always use idempotent forms so a re-apply is a no-op:
827

928
```sql
1029
-- +goose Up
1130
ALTER TABLE trigger_dev.your_table
12-
ADD COLUMN new_column String DEFAULT '';
31+
ADD COLUMN IF NOT EXISTS new_column String DEFAULT '';
1332

1433
-- +goose Down
1534
ALTER TABLE trigger_dev.your_table
16-
DROP COLUMN new_column;
35+
DROP COLUMN IF EXISTS new_column;
1736
```
1837

38+
Equivalent forms for other DDL:
39+
40+
- `CREATE TABLE IF NOT EXISTS …`
41+
- `DROP TABLE IF EXISTS …`
42+
- `ADD INDEX IF NOT EXISTS …` / `DROP INDEX IF EXISTS …`
43+
- `CREATE MATERIALIZED VIEW IF NOT EXISTS …` / `DROP VIEW IF EXISTS …`
44+
45+
ClickHouse supports `IF [NOT] EXISTS` on all of the above. Older migrations in this directory predate the rule and are not idempotent — leave them as-is unless you're explicitly hardening one.
46+
1947
## Naming Conventions
2048

2149
- `raw_` prefix for input tables (where data lands first)

0 commit comments

Comments
 (0)