
fix(migration): preserve blob file references during LMDB-to-RocksDB migration #468

Open
kriszyp wants to merge 4 commits into main from fix/migrate-preserve-blob-references

Conversation

kriszyp (Member) commented May 4, 2026

Summary

  • copyDbToRocks was calling targetDbi.put() without encodeBlobsWithFilePath context
  • Without it, the msgpack pack() extension falls through to reading blob files and embedding their content inline in RocksDB records
  • This caused storage bloat and orphaned filesystem blob files post-migration
  • Fix wraps the primary-record put with encodeBlobsWithFilePath, which causes saveBlob() to short-circuit on the existing fileId and encode [storageIndex, fileId] as a reference — no file copy required

Why this works

Blob files live on the filesystem at {basePath}/blobs/{databaseName}/, separate from both LMDB and RocksDB. LMDB records hold only (storageIndex, fileId) references. The encodeBlobsWithFilePath context makes the msgpack pack() function call saveBlob(), which early-returns when storageInfo.fileId is already set, and then packs the original reference tuple — preserving the existing blob file with zero copying.

The contrast with copyDb (LMDB→LMDB) is instructive: that function uses binary-mode copying so references survive verbatim. copyDbToRocks decodes and re-encodes, which is why the explicit context is needed.

Attention

  • HAS_BLOBS metadata flag: Gemini review confirmed this is correctly handled — blobsWereEncoded is set by encodeBlobsWithFilePath and RecordEncoder uses it to set the flag. However, in this migration targetDbi is a raw RocksDatabase (not RecordEncoder-wrapped), so whether the HAS_BLOBS flag is written into the record header for migrated records is worth verifying.
  • No unit test: No test infrastructure currently exists for copyDbToRocks; existing migration tests only cover LMDB compaction. An integration test that writes blob records to LMDB and verifies RocksDB records contain file references (not inline content) would be the appropriate coverage.
  • Audit log: Not migrated (per the existing comment), so no blob fix needed there.

🤖 Generated with Claude Code

…migration

Without encodeBlobsWithFilePath context, the msgpack blob pack() function
falls through to reading the blob file and embedding its content inline in
the RocksDB record. This caused storage bloat and orphaned filesystem blob
files after migration.

Wrapping targetDbi.put() with encodeBlobsWithFilePath() ensures the pack()
function calls saveBlob(), which short-circuits when the blob already has a
fileId (as all migrated blobs do), and encodes the original [storageIndex,
fileId] reference instead of the file content.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kriszyp requested a review from a team as a code owner May 4, 2026 23:18
Comment thread on bin/copyDb.ts
claude bot (Contributor) commented May 4, 2026

Reviewed; no blockers found.

…ords

Without a RecordEncoder on the target DBI, primary records were written as
plain msgpack with no metadata header, so HAS_BLOBS was never set.
cleanupOrphans and dropTable both gate blob scans on HAS_BLOBS, meaning
migrated records with blobs would have their filesystem blob files deleted
as orphans.

Fix: assign a fresh RecordEncoder as targetDbi.encoder for primary stores,
and call setNextEncoding(version, 0) before each put. RecordEncoder.encode()
then enters the metadata path, serializes the value via superEncode (which
triggers encodeBlobsWithFilePath/blobsWereEncoded), folds in HAS_BLOBS, and
writes the binary header (8-byte timestamp + 4-byte metadata + msgpack value)
that downstream readers expect.

setNextEncoding is safe: the migration runs synchronously before any external
requests are accepted, so no concurrent code can race the module-level
timestampNextEncoding / metadataInNextEncoding variables.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kriszyp (Member, Author) commented May 5, 2026

Follow-up commit: HAS_BLOBS metadata flag

Addressed the blocker. Without a RecordEncoder on the target DBI, records were written as plain msgpack with no binary header, so HAS_BLOBS was never set — meaning cleanupOrphans and dropTable would treat migrated blob files as orphans and delete them.

Fix: assign a fresh RecordEncoder as targetDbi.encoder for primary stores, and call setNextEncoding(version, 0) before each put. RecordEncoder.encode() then:

  1. Enters the metadata path (because timestampNextEncoding > 0 and metadataInNextEncoding >= 0)
  2. Calls superEncode which serializes the value — this triggers the blob pack() extension, setting blobsWereEncoded = true
  3. Folds in HAS_BLOBS and writes [8-byte timestamp | 4-byte metadata flags | msgpack value]
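The header layout in step 3 can be sketched as follows. The `HAS_BLOBS` bit value and the big-endian double encoding of the timestamp are assumptions for illustration, not the codebase's actual wire format:

```typescript
// Layout sketch only: 12-byte header followed by the msgpack-packed value.
const HAS_BLOBS = 0x1; // assumed bit value

function encodeRecord(timestampMs: number, flags: number, packedValue: Buffer): Buffer {
  const header = Buffer.alloc(12);
  header.writeDoubleBE(timestampMs, 0); // 8-byte timestamp
  header.writeUInt32BE(flags, 8);       // 4-byte metadata flags
  return Buffer.concat([header, packedValue]);
}

function hasBlobs(record: Buffer): boolean {
  return (record.readUInt32BE(8) & HAS_BLOBS) !== 0;
}

const record = encodeRecord(Date.now(), HAS_BLOBS, Buffer.from([0x91, 0x01]));
console.log(hasBlobs(record), record.length); // true 14
```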

Thread safety: the migration runs synchronously before the server accepts any requests, so the module-level timestampNextEncoding/metadataInNextEncoding variables can't be raced.

Note: other LMDB metadata flags (HAS_EXPIRATION, HAS_NODE_ID, HAS_RESIDENCY_ID) are not preserved — those values aren't exposed from the LMDB getRange path used here. This is a pre-existing limitation of the migration, not introduced by this change.

🤖 Generated with Claude Code

…in migration

setNextEncoding now accepts expiresAt, nodeId, and residencyId and sets the
corresponding module-level variables so RecordEncoder.encode() writes them
into the RocksDB record header alongside HAS_BLOBS.

In the migration loop, both code paths for source metadata are handled:
- lastMetadata (set by RecordEncoder.decode for unpatched sourceDbi stores)
- entry fields (set by handleLocalTimeForGets for patched stores)
The first non-undefined value wins via nullish coalescing.

Without this, TTL expirations, node IDs, and residency IDs from LMDB records
would be silently dropped during migration to RocksDB.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kriszyp (Member, Author) commented May 5, 2026

Follow-up: preserve expiresAt, nodeId, residencyId during migration

Yes, expiration dates and node IDs were not being copied — confirmed. setNextEncoding was hardcoded to metadata=0 with no expiry/node/residency data.

Fix: expanded setNextEncoding to accept those fields and set the corresponding module-level variables (expiresAtNextEncoding, nodeIdAtNextEncoding, residencyIdAtNextEncoding), which RecordEncoder.encode() writes into the binary header.

The migration loop now reads source metadata via a dual-path approach:

  • entry.expiresAt / .nodeId / .residencyId — populated by handleLocalTimeForGets on patched stores
  • lastMetadata?.expiresAt / .nodeId / .residencyId — set by RecordEncoder.decode directly for unpatched stores (which is what the migration's sourceDbi likely is)

Nullish coalescing (entryExpiresAt ?? sourceMeta?.expiresAt ?? -1) means whichever path has data wins, and the default is the correct "not present" sentinel for each field.
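The coalescing chain can be illustrated with a self-contained sketch; field names follow the description above, but the shapes are simplified stand-ins:

```typescript
// Illustrative only: entry and lastMetadata shapes are simplified.
interface SourceMeta {
  expiresAt?: number;
  nodeId?: number;
  residencyId?: number;
}

// First non-undefined value wins; -1 stands in for the "not present" sentinel.
function resolveExpiresAt(entry: SourceMeta, lastMetadata?: SourceMeta): number {
  return entry.expiresAt ?? lastMetadata?.expiresAt ?? -1;
}

console.log(resolveExpiresAt({ expiresAt: 1000 }, { expiresAt: 2000 })); // 1000 (entry wins)
console.log(resolveExpiresAt({}, { expiresAt: 2000 })); // 2000 (fallback path)
console.log(resolveExpiresAt({})); // -1 (sentinel)
```

Note that `??` (unlike `||`) only falls through on `null`/`undefined`, so a legitimate `0` value on the entry would still win.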

Note: additionalAuditRefs (for CRDT audit chains) is not yet preserved — that would need a separate setter and is a more complex case.

🤖 Generated with Claude Code

kriszyp added the patch label May 5, 2026
1. TypeError: encoder getter-only on RocksDatabase — patch existing encoder
   object's encode/saveStructures/getStructures methods and set isRocksDB/rootStore
   instead of replacing the getter-only encoder property.

2. Full replication copy after migration — preserve REMOTE_NODE_IDS binary
   data from the LMDB audit store into the new RocksDB root store so
   getIdOfRemoteNode returns consistent node IDs and the existing sequence
   tracking entries (already migrated in dbisDB) still match.

3. Analytics ENOENT errors after migration — resetDatabases() was gated on
   table.primaryStore.path === path to clean up database entries, which is
   unreliable for LMDB sub-databases; simplified to always remove the database
   entry when an LMDB store is cleaned up, so getDatabases() re-creates it
   with the new RocksDB-backed tables.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kriszyp (Member, Author) commented May 6, 2026

Follow-up: Three runtime fixes from actual migration testing (AI-generated)

Additional commit fixes three errors discovered while running migrateOnStart against a live system.

Fix 1 — TypeError: Cannot set property encoder of #<RocksDatabase> which has only a getter

encoder is a getter-only property on RocksDatabase, so targetDbi.encoder = new RecordEncoder(...) throws at runtime. Fixed by patching the existing encoder object instead of replacing it: sets isRocksDB, rootStore, and copies encode/saveStructures/getStructures from a temporary RecordEncoder instance. The closure semantics are safe — superEncode.call(this, ...) delegates to Encoder.prototype.encode with existingEncoder as this.
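A minimal reproduction of the getter-only trap and the patch-in-place workaround, using an illustrative stand-in class rather than the real RocksDatabase/RecordEncoder:

```typescript
// Stand-in for a class that exposes its encoder via a getter-only property.
class RocksDatabaseLike {
  #encoder: { encode(value: unknown): string } = {
    encode: (value) => JSON.stringify(value),
  };
  // Getter only: `db.encoder = new RecordEncoder(...)` throws in strict mode
  // ("Cannot set property encoder ... which has only a getter").
  get encoder() {
    return this.#encoder;
  }
}

const db = new RocksDatabaseLike();

// Instead of replacing the property, patch the object the getter returns;
// the getter keeps handing out the same (now-patched) object.
const existingEncoder = db.encoder as { encode(value: unknown): string };
existingEncoder.encode = (value) => `hdr|${JSON.stringify(value)}`;

console.log(db.encoder.encode({ a: 1 })); // hdr|{"a":1}
```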

Fix 2 — Full table copy on replication reconnect after migration

Root cause: the new empty RocksDB audit store has no REMOTE_NODE_IDS entry, so getIdOfRemoteNode assigns fresh node IDs. The sequence tracking entries already migrated in dbisDB were keyed with the old IDs → dbisDB.get([Symbol.for('seq'), newId]) returns null → startTime = 1 → full copy from leader.

Fix: at the end of copyDbToRocks, read REMOTE_NODE_IDS binary data from the LMDB audit store and write it into the new targetRootStore via putSync(key, asBinary(bytes)) — the same pattern used by RocksTransactionLogStore.putSync for symbol keys.
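A toy model of the failure mode and the fix, with Maps standing in for the stores; `getIdOfRemoteNode` and the key layout here are illustrative, not the real API:

```typescript
// Maps stand in for the LMDB audit store, the new RocksDB store, and dbisDB.
const REMOTE_NODE_IDS = Symbol.for('remote-node-ids');
type Store = Map<string | symbol, unknown>;

function getIdOfRemoteNode(store: Store, nodeName: string): number {
  let ids = store.get(REMOTE_NODE_IDS) as Map<string, number> | undefined;
  if (!ids) store.set(REMOTE_NODE_IDS, (ids = new Map()));
  let id = ids.get(nodeName);
  if (id === undefined) ids.set(nodeName, (id = ids.size + 1)); // fresh ID assigned
  return id;
}

// Sequence tracking already migrated in dbisDB, keyed with the OLD node ID (7).
const seqKey = (id: number) => `seq:${id}`;
const dbisDB = new Map<string, number>([[seqKey(7), 123456]]);

// Without the REMOTE_NODE_IDS copy: a fresh ID is assigned, the seq lookup
// misses, and replication falls back to a full copy from the leader.
const emptyRocksStore: Store = new Map();
const freshId = getIdOfRemoteNode(emptyRocksStore, 'leader');
console.log(freshId, dbisDB.get(seqKey(freshId))); // 1 undefined

// With the copy: the old mapping is carried over and the seq entry matches.
const migratedStore: Store = new Map();
migratedStore.set(REMOTE_NODE_IDS, new Map([['leader', 7]]));
const stableId = getIdOfRemoteNode(migratedStore, 'leader');
console.log(stableId, dbisDB.get(seqKey(stableId))); // 7 123456
```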

Fix 3 — Analytics ENOENT: no such file or directory, stat '...HPSTORE.mdb'

resetDatabases() was conditionally deleting the database entry only if table.primaryStore.path === lmdbEnvPath — but LMDB sub-databases (from openDB) may not expose the same path as the environment root, making the check unreliable. Simplified to always remove databases[store.databaseName] when an LMDB store is cleaned up. This is safe because compactOnStart resets needsDeletion via readMetaDb() before the cleanup loop, so compact-on-start tables are unaffected. After deletion, the next table() or database() call re-creates the entry using the now-open RocksDB store.

Areas for reviewers to focus on

  • The encoder this-context correctness when encode is called on the patched existingEncoder object (line ~408 in copyDb.ts)
  • Whether targetRootStore.putSync(symbolKey, asBinary(bytes)) is the right API call — mirrors existing RocksTransactionLogStore usage exactly
  • The resetDatabases() simplification: is there any code path where an LMDB store with needsDeletion=true should NOT have its database removed?
