
fix(migration): preserve blob file references during LMDB-to-RocksDB migration #468

Open
kriszyp wants to merge 4 commits into main from fix/migrate-preserve-blob-references

Conversation

kriszyp (Member) commented May 4, 2026

Summary

  • copyDbToRocks was calling targetDbi.put() without encodeBlobsWithFilePath context
  • Without it, the msgpack pack() extension falls through to reading blob files and embedding their content inline in RocksDB records
  • This caused storage bloat and orphaned filesystem blob files post-migration
  • Fix wraps the primary-record put with encodeBlobsWithFilePath, which causes saveBlob() to short-circuit on the existing fileId and encode [storageIndex, fileId] as a reference — no file copy required

Why this works

Blob files live on the filesystem at {basePath}/blobs/{databaseName}/, separate from both LMDB and RocksDB. LMDB records hold only (storageIndex, fileId) references. The encodeBlobsWithFilePath context makes the msgpack pack() function call saveBlob(), which early-returns when storageInfo.fileId is already set, and then packs the original reference tuple — preserving the existing blob file with zero copying.

The contrast with copyDb (LMDB→LMDB) is instructive: that function uses binary-mode copying so references survive verbatim. copyDbToRocks decodes and re-encodes, which is why the explicit context is needed.

Attention

  • HAS_BLOBS metadata flag: Gemini review confirmed this is correctly handled — blobsWereEncoded is set by encodeBlobsWithFilePath and RecordEncoder uses it to set the flag. However, in this migration targetDbi is a raw RocksDatabase (not RecordEncoder-wrapped), so whether the HAS_BLOBS flag is written into the record header for migrated records is worth verifying.
  • No unit test: No test infrastructure currently exists for copyDbToRocks; existing migration tests only cover LMDB compaction. An integration test that writes blob records to LMDB and verifies RocksDB records contain file references (not inline content) would be the appropriate coverage.
  • Audit log: Not migrated (per the existing comment), so no blob fix needed there.

🤖 Generated with Claude Code

…migration

Without encodeBlobsWithFilePath context, the msgpack blob pack() function
falls through to reading the blob file and embedding its content inline in
the RocksDB record. This caused storage bloat and orphaned filesystem blob
files after migration.

Wrapping targetDbi.put() with encodeBlobsWithFilePath() ensures the pack()
function calls saveBlob(), which short-circuits when the blob already has a
fileId (as all migrated blobs do), and encodes the original [storageIndex,
fileId] reference instead of the file content.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kriszyp requested a review from a team as a code owner May 4, 2026 23:18
Comment thread on bin/copyDb.ts
claude bot (Contributor) commented May 4, 2026

Reviewed; no blockers found.

…ords

Without a RecordEncoder on the target DBI, primary records were written as
plain msgpack with no metadata header, so HAS_BLOBS was never set.
cleanupOrphans and dropTable both gate blob scans on HAS_BLOBS, meaning
migrated records with blobs would have their filesystem blob files deleted
as orphans.

Fix: assign a fresh RecordEncoder as targetDbi.encoder for primary stores,
and call setNextEncoding(version, 0) before each put. RecordEncoder.encode()
then enters the metadata path, serializes the value via superEncode (which
triggers encodeBlobsWithFilePath/blobsWereEncoded), folds in HAS_BLOBS, and
writes the binary header (8-byte timestamp + 4-byte metadata + msgpack value)
that downstream readers expect.

setNextEncoding is safe: the migration runs synchronously before any external
requests are accepted, so no concurrent code can race the module-level
timestampNextEncoding / metadataInNextEncoding variables.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kriszyp (Member, Author) commented May 5, 2026

Follow-up commit: HAS_BLOBS metadata flag

Addressed the blocker. Without a RecordEncoder on the target DBI, records were written as plain msgpack with no binary header, so HAS_BLOBS was never set — meaning cleanupOrphans and dropTable would treat migrated blob files as orphans and delete them.

Fix: assign a fresh RecordEncoder as targetDbi.encoder for primary stores, and call setNextEncoding(version, 0) before each put. RecordEncoder.encode() then:

  1. Enters the metadata path (because timestampNextEncoding > 0 and metadataInNextEncoding >= 0)
  2. Calls superEncode which serializes the value — this triggers the blob pack() extension, setting blobsWereEncoded = true
  3. Folds in HAS_BLOBS and writes [8-byte timestamp | 4-byte metadata flags | msgpack value]
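The header layout in step 3 can be sketched as follows. The `HAS_BLOBS` bit value and the big-endian double encoding of the timestamp are assumptions for illustration, not the codebase's actual wire format:

```typescript
// Layout sketch only: 12-byte header followed by the msgpack-packed value.
const HAS_BLOBS = 0x1; // assumed bit value

function encodeRecord(timestampMs: number, flags: number, packedValue: Buffer): Buffer {
  const header = Buffer.alloc(12);
  header.writeDoubleBE(timestampMs, 0); // 8-byte timestamp
  header.writeUInt32BE(flags, 8);       // 4-byte metadata flags
  return Buffer.concat([header, packedValue]);
}

function hasBlobs(record: Buffer): boolean {
  return (record.readUInt32BE(8) & HAS_BLOBS) !== 0;
}

const record = encodeRecord(Date.now(), HAS_BLOBS, Buffer.from([0x91, 0x01]));
console.log(hasBlobs(record), record.length); // true 14
```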

Thread safety: the migration runs synchronously before the server accepts any requests, so the module-level timestampNextEncoding/metadataInNextEncoding variables can't be raced.

Note: other LMDB metadata flags (HAS_EXPIRATION, HAS_NODE_ID, HAS_RESIDENCY_ID) are not preserved — those values aren't exposed from the LMDB getRange path used here. This is a pre-existing limitation of the migration, not introduced by this change.

🤖 Generated with Claude Code

…in migration

setNextEncoding now accepts expiresAt, nodeId, and residencyId and sets the
corresponding module-level variables so RecordEncoder.encode() writes them
into the RocksDB record header alongside HAS_BLOBS.

In the migration loop, both code paths for source metadata are handled:
- lastMetadata (set by RecordEncoder.decode for unpatched sourceDbi stores)
- entry fields (set by handleLocalTimeForGets for patched stores)
The first non-undefined value wins via nullish coalescing.

Without this, TTL expirations, node IDs, and residency IDs from LMDB records
would be silently dropped during migration to RocksDB.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kriszyp (Member, Author) commented May 5, 2026

Follow-up: preserve expiresAt, nodeId, residencyId during migration

Yes, expiration dates and node IDs were not being copied — confirmed. setNextEncoding was hardcoded to metadata=0 with no expiry/node/residency data.

Fix: expanded setNextEncoding to accept those fields and set the corresponding module-level variables (expiresAtNextEncoding, nodeIdAtNextEncoding, residencyIdAtNextEncoding), which RecordEncoder.encode() writes into the binary header.

The migration loop now reads source metadata via a dual-path approach:

  • entry.expiresAt / .nodeId / .residencyId — populated by handleLocalTimeForGets on patched stores
  • lastMetadata?.expiresAt / .nodeId / .residencyId — set by RecordEncoder.decode directly for unpatched stores (which is what the migration's sourceDbi likely is)

Nullish coalescing (entryExpiresAt ?? sourceMeta?.expiresAt ?? -1) means whichever path has data wins, and the default is the correct "not present" sentinel for each field.
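The coalescing chain can be illustrated with a self-contained sketch; field names follow the description above, but the shapes are simplified stand-ins:

```typescript
// Illustrative only: entry and lastMetadata shapes are simplified.
interface SourceMeta {
  expiresAt?: number;
  nodeId?: number;
  residencyId?: number;
}

// First non-undefined value wins; -1 stands in for the "not present" sentinel.
function resolveExpiresAt(entry: SourceMeta, lastMetadata?: SourceMeta): number {
  return entry.expiresAt ?? lastMetadata?.expiresAt ?? -1;
}

console.log(resolveExpiresAt({ expiresAt: 1000 }, { expiresAt: 2000 })); // 1000 (entry wins)
console.log(resolveExpiresAt({}, { expiresAt: 2000 })); // 2000 (fallback path)
console.log(resolveExpiresAt({})); // -1 (sentinel)
```

Note that `??` (unlike `||`) only falls through on `null`/`undefined`, so a legitimate `0` value on the entry would still win.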

Note: additionalAuditRefs (for CRDT audit chains) is not yet preserved — that would need a separate setter and is a more complex case.

🤖 Generated with Claude Code

kriszyp added the patch label May 5, 2026
1. TypeError: encoder getter-only on RocksDatabase — patch existing encoder
   object's encode/saveStructures/getStructures methods and set isRocksDB/rootStore
   instead of replacing the getter-only encoder property.

2. Full replication copy after migration — preserve REMOTE_NODE_IDS binary
   data from the LMDB audit store into the new RocksDB root store so
   getIdOfRemoteNode returns consistent node IDs and the existing sequence
   tracking entries (already migrated in dbisDB) still match.

3. Analytics ENOENT errors after migration — resetDatabases() was gated on
   table.primaryStore.path === path to clean up database entries, which is
   unreliable for LMDB sub-databases; simplified to always remove the database
   entry when an LMDB store is cleaned up, so getDatabases() re-creates it
   with the new RocksDB-backed tables.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kriszyp (Member, Author) commented May 6, 2026

Follow-up: Three runtime fixes from actual migration testing (AI-generated)

Additional commit fixes three errors discovered while running migrateOnStart against a live system.

Fix 1 — TypeError: Cannot set property encoder of #<RocksDatabase> which has only a getter

encoder is a getter-only property on RocksDatabase, so targetDbi.encoder = new RecordEncoder(...) throws at runtime. Fixed by patching the existing encoder object instead of replacing it: sets isRocksDB, rootStore, and copies encode/saveStructures/getStructures from a temporary RecordEncoder instance. The closure semantics are safe — superEncode.call(this, ...) delegates to Encoder.prototype.encode with existingEncoder as this.
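A minimal reproduction of the getter-only trap and the patch-in-place workaround, using an illustrative stand-in class rather than the real RocksDatabase/RecordEncoder:

```typescript
// Stand-in for a class that exposes its encoder via a getter-only property.
class RocksDatabaseLike {
  #encoder: { encode(value: unknown): string } = {
    encode: (value) => JSON.stringify(value),
  };
  // Getter only: `db.encoder = new RecordEncoder(...)` throws in strict mode
  // ("Cannot set property encoder ... which has only a getter").
  get encoder() {
    return this.#encoder;
  }
}

const db = new RocksDatabaseLike();

// Instead of replacing the property, patch the object the getter returns;
// the getter keeps handing out the same (now-patched) object.
const existingEncoder = db.encoder as { encode(value: unknown): string };
existingEncoder.encode = (value) => `hdr|${JSON.stringify(value)}`;

console.log(db.encoder.encode({ a: 1 })); // hdr|{"a":1}
```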

Fix 2 — Full table copy on replication reconnect after migration

Root cause: the new empty RocksDB audit store has no REMOTE_NODE_IDS entry, so getIdOfRemoteNode assigns fresh node IDs. The sequence tracking entries already migrated in dbisDB were keyed with the old IDs → dbisDB.get([Symbol.for('seq'), newId]) returns null → startTime = 1 → full copy from leader.

Fix: at the end of copyDbToRocks, read REMOTE_NODE_IDS binary data from the LMDB audit store and write it into the new targetRootStore via putSync(key, asBinary(bytes)) — the same pattern used by RocksTransactionLogStore.putSync for symbol keys.
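A toy model of the failure mode and the fix, with Maps standing in for the stores; `getIdOfRemoteNode` and the key layout here are illustrative, not the real API:

```typescript
// Maps stand in for the LMDB audit store, the new RocksDB store, and dbisDB.
const REMOTE_NODE_IDS = Symbol.for('remote-node-ids');
type Store = Map<string | symbol, unknown>;

function getIdOfRemoteNode(store: Store, nodeName: string): number {
  let ids = store.get(REMOTE_NODE_IDS) as Map<string, number> | undefined;
  if (!ids) store.set(REMOTE_NODE_IDS, (ids = new Map()));
  let id = ids.get(nodeName);
  if (id === undefined) ids.set(nodeName, (id = ids.size + 1)); // fresh ID assigned
  return id;
}

// Sequence tracking already migrated in dbisDB, keyed with the OLD node ID (7).
const seqKey = (id: number) => `seq:${id}`;
const dbisDB = new Map<string, number>([[seqKey(7), 123456]]);

// Without the REMOTE_NODE_IDS copy: a fresh ID is assigned, the seq lookup
// misses, and replication falls back to a full copy from the leader.
const emptyRocksStore: Store = new Map();
const freshId = getIdOfRemoteNode(emptyRocksStore, 'leader');
console.log(freshId, dbisDB.get(seqKey(freshId))); // 1 undefined

// With the copy: the old mapping is carried over and the seq entry matches.
const migratedStore: Store = new Map();
migratedStore.set(REMOTE_NODE_IDS, new Map([['leader', 7]]));
const stableId = getIdOfRemoteNode(migratedStore, 'leader');
console.log(stableId, dbisDB.get(seqKey(stableId))); // 7 123456
```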

Fix 3 — Analytics ENOENT: no such file or directory, stat '...HPSTORE.mdb'

resetDatabases() was conditionally deleting the database entry only if table.primaryStore.path === lmdbEnvPath — but LMDB sub-databases (from openDB) may not expose the same path as the environment root, making the check unreliable. Simplified to always remove databases[store.databaseName] when an LMDB store is cleaned up. This is safe because compactOnStart resets needsDeletion via readMetaDb() before the cleanup loop, so compact-on-start tables are unaffected. After deletion, the next table() or database() call re-creates the entry using the now-open RocksDB store.

Areas for reviewers to focus on

  • The encoder this-context correctness when encode is called on the patched existingEncoder object (line ~408 in copyDb.ts)
  • Whether targetRootStore.putSync(symbolKey, asBinary(bytes)) is the right API call — mirrors existing RocksTransactionLogStore usage exactly
  • The resetDatabases() simplification: is there any code path where an LMDB store with needsDeletion=true should NOT have its database removed?
