Add dbtool fold: emit self-contained baseline from registered migrations by christianparpart · Pull Request #481 · LASTRADA-Software/Lightweight

christianparpart · 2026-04-30T05:36:35Z

Adds a new offline dbtool fold subcommand that walks all registered migrations and emits a single self-contained baseline — either a .cpp migration plugin or a .sql script — that reproduces the post-migration schema and schema_migrations rows from an empty database. Useful for collapsing a long migration history into a fast-to-apply starting point, or for shipping a snapshot baseline alongside a release.

The command is purely offline: it never opens a DB connection, never queries a live schema. It loads plugins, walks each migration's Up() plan in timestamp order, folds the cumulative effect into a per-table view + chronological data steps, and emits via the existing ToSql() formatter path so each dialect's CREATE TABLE / INSERT codegen stays the single source of truth.

Changes

New MigrationManager::FoldRegisteredMigrations(formatter, upToInclusive) primitive — pure plan-walk, returns PlanFoldingResult (per-table state, creation order, indexes, chronological data steps, in-range releases). Used by the new module and available to any future caller.
New Lightweight::MigrationFold library module under src/Lightweight/MigrationFold/:
- Folder — thin facade plus ResolveUpTo() which accepts an empty string (latest registered release), a numeric timestamp, or a release version string.
- SqlEmitter — emits a flat dialect-specific .sql script, including a CREATE TABLE schema_migrations and a stamping INSERT per folded timestamp so a freshly-loaded DB looks identical to a real apply-all run.
- CppEmitter — emits a .cpp baseline plugin wrapped in LIGHTWEIGHT_SQL_MIGRATION, with optional --emit-cmake and --max-lines-per-file for splitting very large baselines across multiple files.
Shared CodeGen/SplitFileWriter helper that bin-packs blocks within a per-file line budget; used by CppEmitter and intentionally factored out for reuse.
dbtool fold --output FILE [--up-to X] [--dialect D] [--emit-cmake] [--plugin-name N] [--max-lines-per-file N] — output format is picked from the file extension. .sql requires --dialect (sqlite, postgres, mssql, mysql); .cpp is dialect-agnostic. Dispatched before SetupConnectionString since fold never touches a DB; uses a connection-less GetMigrationManagerOffline variant.
Unit tests: 10 fold cases (create + altercolumn, drop-table cleanup, chronological ordering, --up-to truncation, RawSql passthrough, column rename FK propagation, release-range filtering, ResolveUpTo parsing) + 4 SplitFileWriter cases + 2 emitter round-trip cases. Green against sqlite3, mssql2022, and postgres.
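
As an illustration of the extension-based dispatch described for `dbtool fold`, the rule could be sketched roughly as follows. `OutputFormat` and `ResolveOutputFormat` are hypothetical names, not taken from the PR; only the rule itself (`.sql` needs one of the four listed dialects, `.cpp` does not) comes from the description above.

```cpp
#include <algorithm>
#include <array>
#include <filesystem>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

enum class OutputFormat { Cpp, Sql };

// Hypothetical helper: picks the emitter from the output file extension and
// validates the --dialect requirement for .sql output.
OutputFormat ResolveOutputFormat(std::filesystem::path const& output,
                                 std::optional<std::string> const& dialect)
{
    constexpr std::array<std::string_view, 4> kDialects { "sqlite", "postgres", "mssql", "mysql" };

    auto const ext = output.extension().string();
    if (ext == ".cpp")
        return OutputFormat::Cpp; // dialect-agnostic, --dialect is ignored here

    if (ext == ".sql")
    {
        if (!dialect)
            throw std::invalid_argument("--dialect is required for .sql output");
        if (std::find(kDialects.begin(), kDialects.end(), std::string_view { *dialect }) == kDialects.end())
            throw std::invalid_argument("unknown dialect: " + *dialect);
        return OutputFormat::Sql;
    }

    throw std::invalid_argument("unsupported output extension: " + ext);
}
```

Because this decision needs nothing but the command-line arguments, it can run before any connection string is set up, which fits the note that fold is dispatched before `SetupConnectionString`.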

Yaraslaut

Thanks, I left a few small comments, mostly nitpicks.

Yaraslaut · 2026-04-30T19:33:30Z

+    if (!options.formatter)
+        throw std::runtime_error("EmitSqlBaseline: formatter is required");
+
+    std::ofstream out(options.outputPath);


Can we build the output in a std::string and flush it to the file only once, when everything is done? I am not the biggest fan of << and streams. This would also remove the need for WriteSchemaMigrationsSeed and the other functions to take the output stream as their first argument.

Yaraslaut · 2026-04-30T19:37:16Z

+    std::filesystem::path outputPath;
+    /// Threshold for splitting the body across multiple `.cpp` files. Zero
+    /// disables splitting and emits a single file.
+    std::size_t maxLinesPerFile = 5000;


Hm, I guess I am missing something: our migrations are declarative, and if we are folding migrations, then we have only one point of declaration. How can we split this into multiple files?

Because that file may become quite big (imagine 500+ tables, many of them with a lot of columns), the single migration TU can be too big for one .cpp file to be compiled on your machine. In my case, it got clang-tidy OOM-killed on my 64GB laptop :)
The approach is to split the single migration across multiple functions that are invoked from the single folded migration.
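
For illustration, packing generated blocks into files under a line budget could look roughly like the sketch below. It assumes a greedy, order-preserving strategy, which may not match what `SplitFileWriter` actually does; `CodeBlock` and `PackBlocks` are hypothetical names. Each resulting chunk would correspond to one function (in its own .cpp file) that the single folded migration calls.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical unit of generated code, e.g. one table definition.
struct CodeBlock
{
    std::string text;
    std::size_t lineCount;
};

// Greedy, order-preserving packing (an assumption, see lead-in):
// - a budget of zero disables splitting and yields a single chunk,
// - a block larger than the budget gets a chunk of its own,
// - blocks are never split or reordered.
std::vector<std::vector<CodeBlock>> PackBlocks(std::vector<CodeBlock> const& blocks,
                                               std::size_t maxLinesPerFile)
{
    std::vector<std::vector<CodeBlock>> chunks;
    if (blocks.empty())
        return chunks;
    if (maxLinesPerFile == 0)
    {
        chunks.push_back(blocks);
        return chunks;
    }

    std::size_t used = 0; // lines already placed in the current chunk
    for (auto const& block: blocks)
    {
        bool const overflows = used != 0 && used + block.lineCount > maxLinesPerFile;
        if (chunks.empty() || overflows)
        {
            chunks.emplace_back();
            used = 0;
        }
        chunks.back().push_back(block);
        used += block.lineCount;
    }
    return chunks;
}
```

Keeping the original order matters here because table creation order and chronological data steps must be preserved across the generated functions.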

Yaraslaut · 2026-04-30T19:40:00Z

+                using T = std::decay_t<decltype(t)>;
+                if constexpr (std::is_same_v<T, Bigint>)
+                    return std::string(kPrefix) + "Bigint {}";
+                else if constexpr (std::is_same_v<T, Bool>)
+                    return std::string(kPrefix) + "Bool {}";
+                else if constexpr (std::is_same_v<T, Date>)
+                    return std::string(kPrefix) + "Date {}";
+                else if constexpr (std::is_same_v<T, DateTime>)
+                    return std::string(kPrefix) + "DateTime {}";
+                else if constexpr (std::is_same_v<T, Guid>)
+                    return std::string(kPrefix) + "Guid {}";
+                else if constexpr (std::is_same_v<T, Integer>)
+                    return std::string(kPrefix) + "Integer {}";
+                else if constexpr (std::is_same_v<T, Real>)
+                    return std::format("{}Real {{ {} }}", kPrefix, t.precision);
+                else if constexpr (std::is_same_v<T, Smallint>)
+                    return std::string(kPrefix) + "Smallint {}";
+                else if constexpr (std::is_same_v<T, Tinyint>)
+                    return std::string(kPrefix) + "Tinyint {}";
+                else if constexpr (std::is_same_v<T, Time>)
+                    return std::string(kPrefix) + "Time {}";
+                else if constexpr (std::is_same_v<T, Timestamp>)
+                    return std::string(kPrefix) + "Timestamp {}";
+                else if constexpr (std::is_same_v<T, Char>)
+                    return std::format("{}Char {{ {} }}", kPrefix, t.size);
+                else if constexpr (std::is_same_v<T, NChar>)
+                    return std::format("{}NChar {{ {} }}", kPrefix, t.size);
+                else if constexpr (std::is_same_v<T, Varchar>)
+                    return std::format("{}Varchar {{ {} }}", kPrefix, t.size);


I think it is easier here to use an overloaded visitor rather than one lambda.

Yaraslaut · 2026-04-30T19:41:14Z

+    // timestamp slot. (For the LUP plugin the typical baseline body is well under
+    // any reasonable threshold, so this rarely fires.)
+    auto const body = BuildSingleFileBody(fold);
+    std::ofstream out(options.outputPath);


Here as well, I think it is cleaner to build the strings first and then flush everything at once.

dbtool fold --output FILE emits a self-contained baseline (.cpp plugin or .sql script) that reproduces the post-migration state from an empty DB. .sql output requires --dialect (sqlite, postgres, mssql, mysql); .cpp output is dialect-agnostic. Runs without any DB connection - loads plugins, walks migrations in memory, writes a file.

Built on a new pure plan-walk primitive MigrationManager::FoldRegisteredMigrations(formatter, upToInclusive) that folds every registered migration into a per-table view of the final shape plus a chronological list of data steps, indexes, and releases.

The fold module (src/Lightweight/MigrationFold/{Folder,CppEmitter,SqlEmitter}.{hpp,cpp}) emits via the existing ToSql() formatter path so each dialect's CREATE TABLE / CREATE INDEX / INSERT codegen stays the single source of truth. The .cpp emitter wraps the body in LIGHTWEIGHT_SQL_MIGRATION; the .sql emitter additionally emits CREATE TABLE schema_migrations and a stamping INSERT for every folded timestamp so the post-fold DB looks identical to a real apply-all run.

Also pulls in CodeGen/SplitFileWriter shared codegen helper used by the .cpp emitter to bin-pack large baselines across multiple files.

Tests: fold unit tests cover create/altercolumn/drop-table cleanup, data-step chronological order, --up-to truncation, RawSql passthrough, column rename FK propagation, release-range filtering, ResolveUpTo parsing. SqlEmitter/CppEmitter round-trip tests verify the emitted artifacts match the expected shape. SplitFileWriter tests cover bin-packing, single-chunk, zero-budget, and oversize-block boundaries.

All [Fold] and [SplitFileWriter] tests pass against sqlite3, mssql2022, and postgres. Full SqlMigration suite (44 cases / 210 assertions) green on all three.

Signed-off-by: Christian Parpart <christian@parpart.family>

christianparpart requested a review from a team as a code owner April 30, 2026 05:36

github-actions Bot added CLI command line interface tools tests Core API labels Apr 30, 2026

christianparpart force-pushed the feature/dbtool-fold branch 2 times, most recently from 1c305bc to d8bf4e2 Compare April 30, 2026 08:57

github-actions Bot added Query Builder Data Binder SQL Data Binder support Query Formatter SQL dialect implementations labels Apr 30, 2026

christianparpart force-pushed the feature/dbtool-fold branch 4 times, most recently from 1bf3c43 to 35bcc7e Compare April 30, 2026 10:50

github-actions Bot removed Query Builder Data Binder SQL Data Binder support Query Formatter SQL dialect implementations labels Apr 30, 2026

christianparpart force-pushed the feature/dbtool-fold branch 2 times, most recently from 35cbced to 83bd900 Compare April 30, 2026 17:58

Yaraslaut approved these changes Apr 30, 2026

View reviewed changes

christianparpart force-pushed the feature/dbtool-fold branch from 83bd900 to 7a78773 Compare April 30, 2026 21:01

christianparpart wants to merge 1 commit into master from feature/dbtool-fold
