Add dbtool fold: emit self-contained baseline from registered migrations #481
christianparpart wants to merge 1 commit into master
Yaraslaut left a comment
Thanks, I left a few small comments, mostly nitpicks.
    if (!options.formatter)
        throw std::runtime_error("EmitSqlBaseline: formatter is required");

    std::ofstream out(options.outputPath);
Can we build a std::string and flush it to the file only once, when everything is done? I am not the biggest fan of << and streams. This would also remove the need for WriteSchemaMigrationsSeed and the other functions to take their first argument.
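A minimal sketch of what this suggestion could look like. BuildSchemaMigrationsSeed and WriteFileAtOnce are hypothetical names, and the `version` column is an assumption made only for illustration; none of them is taken from the PR.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical replacement for a helper that used to take the output stream
// as its first argument: it returns the SQL text instead.
// The column name `version` is an assumption for illustration.
std::string BuildSchemaMigrationsSeed(std::vector<std::uint64_t> const& timestamps)
{
    std::string sql;
    for (auto const timestamp: timestamps)
        sql += "INSERT INTO schema_migrations (version) VALUES (" + std::to_string(timestamp) + ");\n";
    return sql;
}

// Writes the fully assembled text in a single call and reports failures.
void WriteFileAtOnce(std::filesystem::path const& path, std::string const& content)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        throw std::runtime_error("cannot open " + path.string());
    out.write(content.data(), static_cast<std::streamsize>(content.size()));
    out.flush();
    if (!out)
        throw std::runtime_error("cannot write " + path.string());
}
```

With this shape, each section builder is a pure function that can be unit-tested without touching the filesystem, and the file is only created once the whole body has been assembled.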
    std::filesystem::path outputPath;
    /// Threshold for splitting the body across multiple `.cpp` files. Zero
    /// disables splitting and emits a single file.
    std::size_t maxLinesPerFile = 5000;
Hm, I guess I am missing something: our migrations are declarative, and if we are folding migrations, then we have only one point of declaration. How can we split this into multiple files?
Because that file may become quite big (imagine 500+ tables, many of them with a lot of columns), the single migration TU can be too large for one .cpp file to be compiled on your machine. In my case, it got clang-tidy OOM-killed on my 64GB laptop :)
The approach is then to split the single migration across multiple functions that are invoked from the single folded migration.
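Roughly, the split could be pictured as below. This is only a sketch: Plan, BaselinePart1, BaselinePart2 and FoldedBaselineUp are hypothetical stand-ins, not the Lightweight API or the emitter's actual output.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the migration plan builder (NOT the Lightweight API).
struct Plan
{
    std::vector<std::string> createdTables;
    void CreateTable(std::string name) { createdTables.push_back(std::move(name)); }
};

// In generated output, each part would be defined in its own .cpp file,
// so no single translation unit has to hold the whole baseline.
void BaselinePart1(Plan& plan)
{
    plan.CreateTable("customers");
    plan.CreateTable("orders");
}

void BaselinePart2(Plan& plan)
{
    plan.CreateTable("invoices");
}

// The single folded migration remains the only point of declaration;
// its body merely forwards to the parts, in order.
void FoldedBaselineUp(Plan& plan)
{
    BaselinePart1(plan);
    BaselinePart2(plan);
}
```

The migration is still declared exactly once; only its body is distributed, which keeps each translation unit small enough for the compiler and for clang-tidy.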
    using T = std::decay_t<decltype(t)>;
    if constexpr (std::is_same_v<T, Bigint>)
        return std::string(kPrefix) + "Bigint {}";
    else if constexpr (std::is_same_v<T, Bool>)
        return std::string(kPrefix) + "Bool {}";
    else if constexpr (std::is_same_v<T, Date>)
        return std::string(kPrefix) + "Date {}";
    else if constexpr (std::is_same_v<T, DateTime>)
        return std::string(kPrefix) + "DateTime {}";
    else if constexpr (std::is_same_v<T, Guid>)
        return std::string(kPrefix) + "Guid {}";
    else if constexpr (std::is_same_v<T, Integer>)
        return std::string(kPrefix) + "Integer {}";
    else if constexpr (std::is_same_v<T, Real>)
        return std::format("{}Real {{ {} }}", kPrefix, t.precision);
    else if constexpr (std::is_same_v<T, Smallint>)
        return std::string(kPrefix) + "Smallint {}";
    else if constexpr (std::is_same_v<T, Tinyint>)
        return std::string(kPrefix) + "Tinyint {}";
    else if constexpr (std::is_same_v<T, Time>)
        return std::string(kPrefix) + "Time {}";
    else if constexpr (std::is_same_v<T, Timestamp>)
        return std::string(kPrefix) + "Timestamp {}";
    else if constexpr (std::is_same_v<T, Char>)
        return std::format("{}Char {{ {} }}", kPrefix, t.size);
    else if constexpr (std::is_same_v<T, NChar>)
        return std::format("{}NChar {{ {} }}", kPrefix, t.size);
    else if constexpr (std::is_same_v<T, Varchar>)
        return std::format("{}Varchar {{ {} }}", kPrefix, t.size);
I think it is easier here to use the overloaded helper with the visitor rather than one lambda.
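For illustration, a reduced sketch of the overloaded idiom applied to this kind of dispatch. It assumes the column type is a std::variant visited with std::visit; the four alternatives, the ColumnType alias, EmitColumnType and the prefix string are stand-ins chosen for the example, not the PR's definitions.

```cpp
#include <cstddef>
#include <string>
#include <variant>

// Reduced, hypothetical stand-ins for the column type alternatives.
struct Bigint {};
struct Bool {};
struct Real { std::size_t precision; };
struct Varchar { std::size_t size; };
using ColumnType = std::variant<Bigint, Bool, Real, Varchar>;

// The usual "overloaded" helper: inherits the call operators of all lambdas.
template <typename... Ts>
struct overloaded: Ts...
{
    using Ts::operator()...;
};
template <typename... Ts>
overloaded(Ts...) -> overloaded<Ts...>;

// One small lambda per alternative instead of a single if-constexpr chain.
std::string EmitColumnType(ColumnType const& type)
{
    std::string const prefix = "Types::"; // assumed value, stands in for kPrefix
    return std::visit(
        overloaded {
            [&](Bigint const&) { return prefix + "Bigint {}"; },
            [&](Bool const&) { return prefix + "Bool {}"; },
            [&](Real const& t) { return prefix + "Real { " + std::to_string(t.precision) + " }"; },
            [&](Varchar const& t) { return prefix + "Varchar { " + std::to_string(t.size) + " }"; } },
        type);
}
```

A side benefit of this form is that a newly added alternative without a matching lambda fails to compile, whereas an if-constexpr chain without a final static_assert can silently fall through.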
    // timestamp slot. (For the LUP plugin the typical baseline body is well under
    // any reasonable threshold, so this rarely fires.)
    auto const body = BuildSingleFileBody(fold);
    std::ofstream out(options.outputPath);
Here as well, I think it is cleaner to build the strings first and then flush everything at once.
dbtool fold --output FILE emits a self-contained baseline (.cpp plugin
or .sql script) that reproduces the post-migration state from an empty
DB. .sql output requires --dialect (sqlite, postgres, mssql, mysql);
.cpp output is dialect-agnostic. Runs without any DB connection - loads
plugins, walks migrations in memory, writes a file.
Built on a new pure plan-walk primitive
MigrationManager::FoldRegisteredMigrations(formatter, upToInclusive)
that folds every registered migration into a per-table view of the
final shape plus a chronological list of data steps, indexes, and
releases.
The fold module (src/Lightweight/MigrationFold/{Folder,CppEmitter,
SqlEmitter}.{hpp,cpp}) emits via the existing ToSql() formatter path so
each dialect's CREATE TABLE / CREATE INDEX / INSERT codegen stays the
single source of truth. The .cpp emitter wraps the body in
LIGHTWEIGHT_SQL_MIGRATION; the .sql emitter additionally emits CREATE
TABLE schema_migrations and a stamping INSERT for every folded
timestamp so the post-fold DB looks identical to a real apply-all run.
Also pulls in the CodeGen/SplitFileWriter shared codegen helper, used by
the .cpp emitter to bin-pack large baselines across multiple files.
Tests: fold unit tests cover create/altercolumn/drop-table cleanup,
data-step chronological order, --up-to truncation, RawSql passthrough,
column rename FK propagation, release-range filtering, ResolveUpTo
parsing. SqlEmitter/CppEmitter round-trip tests verify the emitted
artifacts match the expected shape. SplitFileWriter tests cover bin-
packing, single-chunk, zero-budget, and oversize-block boundaries.
All [Fold] and [SplitFileWriter] tests pass against sqlite3,
mssql2022, and postgres. Full SqlMigration suite (44 cases / 210
assertions) green on all three.
Signed-off-by: Christian Parpart <christian@parpart.family>
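As a rough illustration of the bin-packing mentioned in the commit message, here is a greedy, order-preserving sketch. Block and PackBlocks are hypothetical names, and the treatment of a zero budget and of oversize blocks is an assumption drawn from the test categories listed above, not from SplitFileWriter's source.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical unit of generated code that must not be split further.
struct Block
{
    std::string text;
    std::size_t lineCount;
};

// Greedy, order-preserving packing. Assumed behaviour:
//  - a budget of 0 disables splitting, so everything lands in one chunk;
//  - a block larger than the budget is never split, it gets a chunk of its own.
std::vector<std::vector<Block>> PackBlocks(std::vector<Block> const& blocks, std::size_t maxLinesPerFile)
{
    std::vector<std::vector<Block>> chunks;
    std::size_t used = 0; // lines already placed in the current chunk
    for (auto const& block: blocks)
    {
        bool const overflows = maxLinesPerFile != 0 && used + block.lineCount > maxLinesPerFile;
        if (chunks.empty() || overflows)
        {
            chunks.emplace_back();
            used = 0;
        }
        chunks.back().push_back(block);
        used += block.lineCount;
    }
    return chunks;
}
```

Keeping the original block order matters for a migration baseline, because later blocks may refer to tables created by earlier ones. That is why this sketch opens a new chunk rather than back-filling an earlier one.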
Adds a new offline dbtool fold subcommand that walks all registered migrations and emits a single self-contained baseline (either a .cpp migration plugin or a .sql script) that reproduces the post-migration schema and schema_migrations rows from an empty database. Useful for collapsing a long migration history into a fast-to-apply starting point, or for shipping a snapshot baseline alongside a release.

The command is purely offline: it never opens a DB connection and never queries a live schema. It loads plugins, walks each migration's Up() plan in timestamp order, folds the cumulative effect into a per-table view plus chronological data steps, and emits via the existing ToSql() formatter path so each dialect's CREATE TABLE / INSERT codegen stays the single source of truth.

Changes

- MigrationManager::FoldRegisteredMigrations(formatter, upToInclusive) primitive: pure plan-walk, returns PlanFoldingResult (per-table state, creation order, indexes, chronological data steps, in-range releases). Used by the new module and available to any future caller.
- Lightweight::MigrationFold library module under src/Lightweight/MigrationFold/:
  - Folder: thin facade plus ResolveUpTo(), which accepts an empty string (latest registered release), a numeric timestamp, or a release version string.
  - SqlEmitter: emits a flat dialect-specific .sql script, including a CREATE TABLE schema_migrations and a stamping INSERT per folded timestamp so a freshly-loaded DB looks identical to a real apply-all run.
  - CppEmitter: emits a .cpp baseline plugin wrapped in LIGHTWEIGHT_SQL_MIGRATION, with optional --emit-cmake and --max-lines-per-file for splitting very large baselines across multiple files.
- CodeGen/SplitFileWriter helper that bin-packs blocks within a per-file line budget; used by CppEmitter and intentionally factored out for reuse.
- dbtool fold --output FILE [--up-to X] [--dialect D] [--emit-cmake] [--plugin-name N] [--max-lines-per-file N]: output format is picked from the file extension. .sql requires --dialect (sqlite, postgres, mssql, mysql); .cpp is dialect-agnostic. Dispatched before SetupConnectionString since fold never touches a DB; uses a connection-less GetMigrationManagerOffline variant.
- … (…, --up-to truncation, RawSql passthrough, column rename FK propagation, release-range filtering, ResolveUpTo parsing) + 4 SplitFileWriter cases + 2 emitter round-trip cases. Green against sqlite3, mssql2022, and postgres.
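
The extension-based format selection described for dbtool fold could be sketched as follows. OutputFormat and SelectOutputFormat are hypothetical names; the exact error handling, and whether the real tool compares extensions case-sensitively, are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <filesystem>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

enum class OutputFormat { Cpp, Sql };

// Hypothetical helper: picks the emitter from the --output file extension.
// .cpp is dialect-agnostic; .sql needs one of the listed dialects.
OutputFormat SelectOutputFormat(std::filesystem::path const& output,
                                std::optional<std::string> const& dialect)
{
    static constexpr std::array<std::string_view, 4> knownDialects { "sqlite", "postgres", "mssql", "mysql" };

    auto const extension = output.extension().string();
    if (extension == ".cpp")
        return OutputFormat::Cpp; // any --dialect value is irrelevant here
    if (extension == ".sql")
    {
        if (!dialect)
            throw std::invalid_argument(".sql output requires --dialect");
        if (std::find(knownDialects.begin(), knownDialects.end(), *dialect) == knownDialects.end())
            throw std::invalid_argument("unknown dialect: " + *dialect);
        return OutputFormat::Sql;
    }
    throw std::invalid_argument("unsupported output extension: " + extension);
}
```

Validating the dialect before any plugin is loaded lets the command fail fast on a bad invocation, which fits a tool that never opens a DB connection.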