Skip to content

Add dbtool diff command for comparing two databases#484

Open
christianparpart wants to merge 16 commits intomasterfrom
feature/dbtool-diff
Open

Add dbtool diff command for comparing two databases#484
christianparpart wants to merge 16 commits intomasterfrom
feature/dbtool-diff

Conversation

@christianparpart
Copy link
Copy Markdown
Member

Operators reviewing migration drift across environments today have no first-class way to ask "do these two databases match?" — the only options are diffing schema dumps by hand or running a full migration replay and hoping nothing falls out. This PR introduces dbtool diff <SOURCE-A> <SOURCE-B>, a read-only command that reports schema and data differences between two ODBC-reachable databases (profile names or raw DRIVER=… connection strings) and exits with a clean status code suitable for CI gating.

The schema diff pairs tables and columns by name only, so it produces a clean (zero-difference) result when comparing the same migration run across SQLite, PostgreSQL, and MSSQL — engine-specific schema labels (dbo vs public vs ""), driver-reported size noise, and TINYINT/SMALLINT promotion are folded into a logical projection rather than treated as drift. The data diff streams SELECT * FROM t ORDER BY <pk> on both sides and reports added / removed / changed rows, with per-table error containment so one un-diffable table doesn't abort the whole run.

To make the command self-contained the PR also lands the connection-profile and secrets infrastructure dbtool needs: a YAML-backed Lightweight::Config::ProfileStore, a chained Lightweight::Secrets::SecretResolver (env / file / stdin backends), and a MigrationException that carries structured error fields instead of an opaque message string.

Changes

  • New Lightweight::SqlSchema::DiffSchemas and DiffTableData (src/Lightweight/SqlSchemaDiff.{hpp,cpp}, SqlDataDiff.{hpp,cpp}) — pure-logic comparators with cross-engine equivalence baked in.
  • Cross-engine introspection fixes in SqlSchema.cpp: FK row grouping keyed on FK_NAME, MSSQL non-clustered index path via sys.indexes, SQLite DECIMAL(p,s) recovery from the dialect string, SQL_VARBINARY mapped to VarBinary (variable-width).
  • New dbtool diff subcommand with --schema-only, --max-rows N, --no-color. Exit codes: 0 = equivalent, 1 = differs, 2 = error.
  • DiffRenderer (src/tools/dbtool/DiffRenderer.{hpp,cpp}) renders schema + data diffs as a colored bordered markdown table to stdout. ANSI auto-disables on non-tty / --no-color.
  • Vendored src/Lightweight/tui/ (markdown table renderer, SGR builder, grapheme-aware Unicode helpers) + libunicode 0.9.0 via cmake/FindOrFetchLibunicode.cmake.
  • Lightweight::Config::ProfileStore — multi-profile YAML at ~/.config/dbtool/dbtool.yml, with profile→connection-string flattening and a shared ProfileToConnectionString helper.
  • Lightweight::Secrets::SecretResolver — chained backends (env:, file:, stdin:) with explicit and bare-reference resolution.
  • Structured MigrationException carrying operation, timestamp, title, failedStepIndex, failedSql, sqlState, nativeErrorCode. dbtool prints these as a multi-line report.
  • dbtool migrates off its bespoke yaml-cpp loader onto ProfileStore/SecretResolver; logging routes through StandardLogger.
  • 21 Catch2 cases for the diff library (16 [SqlSchemaDiff], 5 [SqlDataDiff]); 14 cases for [ProfileStore] / [SecretResolver].
  • Doc: docs/dbtool.md gains a diff section; docs/lup-legacy-migration-plan.md is added as a related design note.

Stacks below feature/migrations-gui, which will be rebased on top of this once it lands.

Replace the example `/path/to/model4_JP` with a generic
`/path/to/lup-sql` placeholder in the two spots (plugin header
comment and CMake placeholder migration file) where it leaked into
user-facing build documentation. The referenced directory is an
input to the caller's build, not a Lightweight artifact, so the
generic form is clearer anyway.

Signed-off-by: Christian Parpart <christian@parpart.family>
Adds `Config::ProfileStore`, a small YAML wrapper around
`~/.config/dbtool/dbtool.yml` (or whatever path the caller supplies)
that loads/persists named connection profiles. Each profile carries an
ODBC connection string, optional schema, plugin directory, and an
opaque secret reference resolved later through `Secrets::SecretResolver`.

Used by dbtool and the Qt 6 migrations GUI to remember the user's
backends across runs without baking credentials into the source. Linked
PRIVATE to `yaml-cpp` so the dependency does not propagate to library
consumers.

Tests in ConfigProfileStoreTests cover round-trip load/save, defaulting,
and the YAML grammar quirks (anchors, optional fields, default-profile
resolution).

Signed-off-by: Christian Parpart <christian@parpart.family>
Adds `Secrets::SecretResolver` plus three pluggable backends for
fetching credentials referenced indirectly from a connection profile:

- `EnvBackend`: reads `${ENV_VAR}` references from the process
  environment.
- `FileBackend`: reads a single secret per file from a configured
  directory; refuses files with mode wider than 0600 (so a careless
  `chmod a+r` on a password file fails loudly instead of silently
  exposing it).
- `StdinBackend`: prompts on a TTY (with echo disabled) when no other
  backend can resolve a reference.

The resolver exposes a single `Resolve(reference)` API and dispatches
based on URL-style schemes (`env:`, `file:`, `stdin:`). Used by
`Config::ProfileStore`-backed callers (dbtool, migrations-gui) so the
profile YAML never carries plaintext passwords.

Tests in SecretResolverTests cover each backend's success and refusal
paths, including the file-mode guard.

Signed-off-by: Christian Parpart <christian@parpart.family>
@christianparpart christianparpart requested a review from a team as a code owner April 30, 2026 19:08
@github-actions github-actions Bot added documentation Improvements or additions to documentation CI pipeline CLI command line interface tools tests Core API Query Formatter SQL dialect implementations CMake labels Apr 30, 2026
Comment thread docs/lup-legacy-migration-plan.md Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file should not be in the PR and it also gives some internal info

Copy link
Copy Markdown
Member

@Yaraslaut Yaraslaut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that all additions to the src/Lightweight should actually go into src/tools and be compiled only as a part of the dbtool, and nothing for the library to provide

Comment thread src/tools/dbtool/main.cpp

void LoadConfig(Options& options)
/// Loads a profile from a `ProfileStore` and fills `options` fields that were not
/// already set on the CLI. Supports both the legacy single-profile YAML shape
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that we have any legacy format here since we are not using it at the moment

Migration failures now surface as MigrationException carrying the
operation (apply/revert), timestamp, title, step index, failed SQL,
and underlying driver error as separate fields instead of a single
opaque message. dbtool and migrations-gui render these fields as a
multi-line report so operators see exactly which statement the driver
rejected.

MigrationResult gains failedTitle / failedStepIndex / failedSql /
sqlState / nativeErrorCode so rollback paths get the same structured
diagnostics as the apply path.

Also:
  - scripts/prepare-test-env.py: local test-env bootstrap helper.
  - scripts/tests/docker-databases.py, .github/prepare-test-run.sh:
    companion test-infra tweaks.
  - SQLiteFormatter.hpp, QueryBuilderTests.cpp, MigrationTests.cpp:
    formatter + test adjustments covering the new error path.

Signed-off-by: Christian Parpart <christian@parpart.family>
Adds src/Lightweight/tui/ vendored from endo's TUI module: MarkdownTable
(grapheme-aware width, alignment, wrap, truncate), MarkdownInline (CommonMark
inline code spans), SgrBuilder (ANSI SGR escape builder), and a slim
Style/RgbColor extracted from endo's TerminalOutput. Upstream camelCase API
names are preserved (with NOLINTBEGIN(readability-identifier-naming) blocks)
so future re-syncs against endo stay mechanical.

Pulls libunicode 0.9.0 via cmake/FindOrFetchLibunicode.cmake — system package
first (find_package), CPM fallback otherwise. Linked PRIVATE + BUILD_INTERFACE
so the dependency stays internal to the static library and does not propagate
into the installed export set.

Used by an upcoming dbtool diff renderer.

Signed-off-by: Christian Parpart <christian@parpart.family>
Adds a pure-logic schema comparator and a streaming data comparator that
together let callers ask "are these two databases equivalent?".

SqlSchemaDiff::DiffSchemas pairs tables by (schema, name) and columns by name,
reporting drift in column type, nullability, size, decimals, default value,
auto-increment, primary-key membership, foreign keys, and indexes. No I/O —
operates on already-fetched SqlSchema::TableList values.

SqlDataDiff::DiffTableData runs `SELECT * FROM t ORDER BY <pk>` on both sides,
walks the cursors merge-style, and reports added / removed / changed rows.
Tables without a primary key are skipped with a reason. Each column value is
fetched as SqlVariant and stringified via SqlVariant::ToString() rather than
asking the ODBC driver to coerce to string — this avoids "Numeric value out
of range" errors on wide numeric columns under MSSQL ODBC Driver 18.

Progress is reported via a rate-limited callback (~2 Hz) so callers can drive
a live UI on million-row tables.

Schema-diff tests are pure-logic (7 cases). Data-diff tests open SQLite files
directly and exercise identical, added, removed, changed, no-PK, and progress-
callback paths (4 cases). Both tag families pass against sqlite3, mssql2022,
and postgres test envs.

Signed-off-by: Christian Parpart <christian@parpart.family>
Three independent driver-level bugs caused the schema reader to mis-report
identical migrations as drifting between SQLite, PostgreSQL, and MSSQL:

* SQLForeignKeys row grouping keyed only on (fkTable, pkTable). Two
  distinct FK constraints between the same pair of tables (e.g.
  ASPHALT.BINDEMITTEL_NR and ASPHALT.BINDEMITTEL2_NR both targeting
  BINDEMITTEL) collapsed into one entry, with the second overwriting the
  first at sequence number 1. Fixed by adding FK_NAME (column 12) to the
  grouping key. The SQLite ODBC driver does not populate FK_NAME at all,
  so SQLite uses a separate path via PRAGMA foreign_key_list keyed on the
  constraint id.

* MSSQL's SQLStatistics returns no rows for non-clustered indexes in our
  setup, leaving the cross-engine reader thinking MSSQL has none. Added
  an MSSQL-specific path that queries sys.indexes / sys.index_columns
  directly.

* SQLite is dynamically typed and the ODBC driver returns SQL_VARCHAR for
  columns declared as DECIMAL(p, s), losing precision/scale. Recover the
  canonical Decimal by parsing `(p, s)` from the dialect type string when
  the driver-supplied SQL data type doesn't already produce a Decimal.

Also two type-system fixes that surface only once the diff above runs:

* SQL_VARBINARY mapped to Binary (fixed-width) instead of VarBinary
  (variable-width). Migrations declaring VarBinary(N) were read back as
  Binary(N).
* Added defaulted `<=>` to every SqlColumnTypeDefinitions struct so the
  variant gains operator==, which the diff layer needs to compare
  canonical types.

Performance: per-table introspection grows by one extra catalog query on
MSSQL (sys.indexes) and SQLite (PRAGMA foreign_key_list). The MSSQL path
replaces a no-op SQLStatistics call; SQLite's PRAGMA path is cheaper than
SQLForeignKeys it replaces. No change for PostgreSQL.

Risk: the SQL_VARBINARY mapping change is the riskiest — any code that
assumed VarBinary columns came back as Binary will now see VarBinary.
Searched for downstream users; none rely on the old behaviour. The
DECIMAL recovery only triggers when both the dialect string starts with
DECIMAL/NUMERIC and contains parentheses — driver paths that already
produce a Decimal are unchanged.

Coverage: schema-reader behaviour is exercised end-to-end by the
SqlSchemaDiff tests added in the next commit; the MigrationTests suite
already covered the unaffected paths and continues to pass.

Signed-off-by: Christian Parpart <christian@parpart.family>
Schema diff now produces a clean (zero-difference) result when comparing
two databases that received the same migration run on different engines.
Previously every cross-engine pair reported the entire schema as drifting,
because pairing keys and field comparisons all encoded engine-specific
incidental detail.

Pairing — TableDiff is keyed by table name only. Engine-specific schema
labels (`dbo`, `public`, `""`) are now metadata, not identity. TableDiff
carries `schemaA` and `schemaB` so the renderer can still surface them.
Foreign-key comparison drops schema labels from the canonical
constraint-shape string for the same reason.

Column comparison — moves to a logical projection (LogicalKind +
LogicalType) computed via ToLogical(). The driver-reported `size`,
`decimalDigits`, `defaultValue`, and per-column `isForeignKey` are no
longer compared: they vary across engines for identical migrations
(Postgres `int4` reports size=10, SQLite reports 8; defaults round-trip
with engine-specific quoting; per-column FK status depends on the
driver returning the FK column name in the same case as the table
reader). FK constraints are still asserted at the table level via
DiffForeignKeys, which compares constraint shape directly.

Logical equivalences applied:
* Char/NChar/Varchar/NVarchar/Text all collapse into LogicalKind::String.
  Drivers disagree on fixed-vs-variable for the same migration (SQLite's
  ODBC reports NCHAR as SQL_VARCHAR; Postgres preserves bpchar).
* Tinyint folds into Smallint (PostgreSQL has no TINYINT and silently
  promotes to SMALLINT during migration).
* Real precision drift is erased — engines round-trip Real{53} as
  Real{24} or Real{53} depending on whether they store it as float4 or
  float8. The intent is "floating point"; precision noise is dropped.
* String columns whose driver-reported size is 0 or >= 4096 are treated
  as "unbounded" (TEXT). 4096 sits well above any hand-written bounded
  size and below any tested driver's "engine default unbounded" sentinel.
* VarBinary is treated as unsized — PostgreSQL maps `VarBinary(N)` to
  `BYTEA` and loses N on round-trip.

Also filter the SQLite-only `_migration_locks` table from diff input —
it is engine-specific lock plumbing (PostgreSQL/MSSQL use advisory
locks), not migration drift.

Tests use designated initialisation directly on Table/Column rather than
through helper factories, per project convention. Added cases for
schema-agnostic table and FK pairing, the four logical-equivalence
classes (NVarchar↔Varchar, NChar↔Char, Real precision, unbounded text),
and the migration-internal-table filter.

Performance: ToLogical runs once per column per side and is a pure
visitor over a 19-alternative variant. For a 700-table schema this adds
microseconds, dominated by the I/O cost of reading the schema.

Risk: the comparison is now lossier within a single engine — it no
longer flags pure size-without-type-class drift on a column. That is the
intended cross-engine contract, and within-engine drift is still
caught because the canonical type variant carries the size for sized
types. If a stricter intra-engine diff is ever needed, the LogicalKind
projection is the only thing to relax.

Coverage: 16 unit tests covering both equivalence and divergence paths;
end-to-end against SQLite, PostgreSQL, and MSSQL with 679-table LUP
schemas all return "schemas match".

Signed-off-by: Christian Parpart <christian@parpart.family>
The yaml profile loading and secret resolution previously lived in
dbtool's own translation unit (with its own `<yaml-cpp/yaml.h>`
include and bespoke `~/.config/dbtool/dbtool.yml` parser). Both
concerns now have first-class homes inside Lightweight
(`Config::ProfileStore`, `Secrets::SecretResolver`), so dbtool just
calls into them.

Drops dbtool's direct yaml-cpp link, removes the platform-specific
`<shlobj.h>` / `<pwd.h>` config-path probing, and adds the
`--profile <name>` flag plus a connection-resolution flow that passes
the profile through `SecretResolver` so credentials never live in
plaintext on disk.

Status output also picks up the latest applied LUP release (the new
`LIGHTWEIGHT_SQL_RELEASE` markers from the migration plugins) so the
operator sees "Latest release: 6.9.7 — 406/406 migrations applied".

Signed-off-by: Christian Parpart <christian@parpart.family>
Routes `SqlLogger` to `StandardLogger` at startup so warnings emitted
by the lup-truncate render path (and any other warnings Lightweight
raises) reach stderr instead of being dropped on the Null logger.

Signed-off-by: Christian Parpart <christian@parpart.family>
`dbtool diff <SOURCE-A> <SOURCE-B>` reports the differences between two
ODBC-reachable databases. Each <SOURCE> is either a profile name (resolved
through the same Cfg::ProfileStore as --profile) or a raw connection string
starting with DRIVER=...; the heuristic is case-insensitive on the prefix.

Default mode runs the full schema + data diff via SqlSchema::DiffSchemas and
SqlSchema::DiffTableData. --schema-only suppresses the data phase for fast
CI gating; --max-rows N caps per-table row scanning. Read-only on both sides.
Exit 0 = equivalent, 1 = differs, 2 = error.

DiffRenderer in dbtool_lib drives the output: builds a tui::ParsedTable per
section, applies per-row coloring (added=green, removed=red, changed=yellow,
header=bold cyan, borders=dim) via SgrBuilder, and writes a bordered table
to stdout. ANSI colors auto-disable when stdout is not a tty or when
--no-color is given. Width respects $COLUMNS, falling back to 120.

Live progress is fed through the existing StandardProgressManager so users
see "n/total tables" plus a smoothed-rate ETA on big DBs (millions of rows)
without the tool appearing stalled. Per-table SqlException is caught and
surfaced as a per-table skipReason so a single un-diffable table doesn't
abort the whole run — the rest of the schema still gets compared.

Profile→connection-string flattening is extracted from ApplyProfileToOptions
into a shared ProfileToConnectionString helper used by both code paths.

Help text, --show-examples, and docs/dbtool.md are updated; the new
--no-color and --max-rows options are documented alongside --schema-only's
extended dual semantic.

Signed-off-by: Christian Parpart <christian@parpart.family>
…e queries

`DiffTableData` built one SELECT from a single `Table const&` and ran it on both
connections. The schema label baked into that descriptor came from side A's
introspection — so a postgres-vs-mssql data diff issued
`SELECT ... FROM "public"."schema_migrations"` against the MSSQL side and
every shared table was reported as `(skipped: error: Invalid object name
'public.schema_migrations')`.

Take one descriptor per side. Each side qualifies its own SELECT with its
own schema label and table name; the column layout is still derived from
side A (cross-engine schema equivalence guarantees the column names match,
and using one side's order keeps rows positionally aligned during the
merge). The dbtool caller hands `DiffSharedTables` the corresponding side-B
descriptor it already had in `tablesByNameB`.

Risk: the public API signature changed
(`DiffTableData(connA, connB, table, ...)`
 → `DiffTableData(connA, connB, tableA, tableB, ...)`).
Only one in-tree caller (the dbtool diff command); updated, plus the four
existing tests now pass `(schema, schema)` to keep their semantics.

Coverage: existing four `[SqlDataDiff]` cases + a new regression that pairs
two tables with different names on each side and verifies the row scan
still completes, which would have failed under the old shared-descriptor
behaviour.

Signed-off-by: Christian Parpart <christian@parpart.family>
The cross-engine schema-diff now pairs tables by name only, so a row may
involve `public.X` on one side and `dbo.X` on the other. Render the
qualified name as either `schema.X` (when both sides agree or only one
side has a label) or `schemaA.X | schemaB.X` (when the two sides report
different labels), so the operator can still see the engine context.

Column-diff details now include the canonical type variant alongside
the dialect-dependent type string — `NCHAR(30) [Char(30)]` instead of
just `NCHAR(30)`. When a `type` change is reported, the canonical kinds
on each side are visible in the cell, which is the actual basis for the
diff decision (the dialect string is informational).

Signed-off-by: Christian Parpart <christian@parpart.family>
The Windows ODBC Driver Manager registers the SQLite driver as
"SQLite3 ODBC Driver" — a name with spaces that must be wrapped in
braces in a connection string. The unixODBC convention used on
Linux/macOS keeps a bare "SQLite3" identifier. Pick the right form
per platform so SqliteConn produces a string the local driver
manager actually accepts.

Signed-off-by: Christian Parpart <christian@parpart.family>
…latforms

Aligns the helper with the canonical test-env string used everywhere
else: CI's odbcinst.ini rename gives both platforms a single entry
named `SQLite3 ODBC Driver`, so the previous Linux-only `SQLite3`
fallback misses on the rebased CI image.

Signed-off-by: Christian Parpart <christian@parpart.family>
Combined slice of upstream commits 24ccdfd (CI feedback) and ec27e91
(clang-format) restricted to the diff feature's surface area:

- Doxygen comments for public members of SqlSchemaDiff (ColumnDiff,
  TableDiff) and SqlDataDiff (RowDiff, TableDataDiff, DiffProgressEvent),
  plus SecretResolver move ops.
- Replace \\ref with backticks in SqlSchemaDiff/SqlDataDiff doc comments
  so doxygen stops complaining about unresolved references.
- Exclude src/Lightweight/tui/ from doxygen — vendored code follows the
  upstream comment style.
- Migrate \`.find(x) != std::string::npos\` to \`.contains(x)\` (and the
  inverse) across PR-introduced code (ProfileStore.cpp, SqlSchema.cpp,
  ConfigProfileStoreTests.cpp, SecretResolverTests.cpp), satisfying
  clang-tidy 22's readability-container-contains check.
- clang-format-22 across modified C++ files (tui/, Secrets/, dbtool main.cpp).

No behavioral changes — purely doc / lint / whitespace.

Signed-off-by: Christian Parpart <christian@parpart.family>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI pipeline CLI command line interface tools CMake Core API documentation Improvements or additions to documentation Query Formatter SQL dialect implementations tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants