fix(migration): preserve blob file references during LMDB-to-RocksDB migration #468
Conversation
…migration

Without `encodeBlobsWithFilePath` context, the msgpack blob `pack()` function falls through to reading the blob file and embedding its content inline in the RocksDB record. This caused storage bloat and orphaned filesystem blob files after migration.

Wrapping `targetDbi.put()` with `encodeBlobsWithFilePath()` ensures the `pack()` function calls `saveBlob()`, which short-circuits when the blob already has a `fileId` (as all migrated blobs do), and encodes the original `[storageIndex, fileId]` reference instead of the file content.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
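A minimal sketch of the wrapped put, assuming `encodeBlobsWithFilePath` is a callback-style context helper and that the migration iterates entries with an lmdb-js-style `getRange()`; neither detail is confirmed by the commit message:

```js
for (const { key, value } of sourceDbi.getRange()) {
  encodeBlobsWithFilePath(() => {
    // Inside this context the msgpack pack() extension calls saveBlob(),
    // which short-circuits on the existing fileId and re-encodes the
    // original [storageIndex, fileId] reference instead of inlining the
    // blob file's content.
    targetDbi.put(key, value);
  });
}
```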
Reviewed; no blockers found.
…ords

Without a `RecordEncoder` on the target DBI, primary records were written as plain msgpack with no metadata header, so `HAS_BLOBS` was never set. `cleanupOrphans` and `dropTable` both gate blob scans on `HAS_BLOBS`, meaning migrated records with blobs would have their filesystem blob files deleted as orphans.

Fix: assign a fresh `RecordEncoder` as `targetDbi.encoder` for primary stores, and call `setNextEncoding(version, 0)` before each put. `RecordEncoder.encode()` then enters the metadata path, serializes the value via `superEncode` (which triggers `encodeBlobsWithFilePath`/`blobsWereEncoded`), folds in `HAS_BLOBS`, and writes the binary header (8-byte timestamp + 4-byte metadata + msgpack value) that downstream readers expect.

`setNextEncoding` is safe: the migration runs synchronously before any external requests are accepted, so no concurrent code can race the module-level `timestampNextEncoding`/`metadataInNextEncoding` variables.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
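A sketch of this fix, assuming `RecordEncoder` takes the store as a constructor argument and that source entries expose a `version` field; both are assumptions beyond the commit message:

```js
targetDbi.encoder = new RecordEncoder(targetDbi);

for (const entry of sourceDbi.getRange()) {
  // Stage metadata for the next encode; RecordEncoder.encode() then writes
  // the 8-byte timestamp + 4-byte metadata header (HAS_BLOBS folded in when
  // blobsWereEncoded is set) ahead of the msgpack value.
  setNextEncoding(entry.version, 0);
  targetDbi.put(entry.key, entry.value);
}
```

(A later commit in this PR found that `encoder` is getter-only on `RocksDatabase`, so direct assignment throws; see the runtime fixes below.)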
Follow-up commit: HAS_BLOBS metadata flag

Addressed the blocker. Without a `RecordEncoder` on the target DBI, migrated records carried no metadata header, so `HAS_BLOBS` was never set and `cleanupOrphans`/`dropTable` would delete their blob files as orphans. Fix: assign a fresh `RecordEncoder` as `targetDbi.encoder` and call `setNextEncoding(version, 0)` before each put.

Thread safety: the migration runs synchronously before the server accepts any requests, so the module-level `timestampNextEncoding`/`metadataInNextEncoding` variables cannot be raced. Note: other LMDB metadata flags (`expiresAt`, `nodeId`, `residencyId`) are not yet preserved; the follow-up commit below addresses them.

🤖 Generated with Claude Code
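To make the header layout concrete, here is a minimal reader sketch. The field widths follow the commit message ("8-byte timestamp + 4-byte metadata + msgpack value"), but the offsets, endianness, and `HAS_BLOBS` bit value are assumptions, not the project's real constants:

```js
const { unpack } = require('msgpackr');

const HAS_BLOBS = 0x1; // hypothetical flag value; the real constant lives in the codebase

function readMigratedRecord(buffer) {
  const view = new DataView(buffer.buffer, buffer.byteOffset, buffer.byteLength);
  const timestamp = view.getBigUint64(0); // 8-byte timestamp
  const metadata = view.getUint32(8);     // 4-byte metadata flags
  return {
    timestamp,
    hasBlobs: Boolean(metadata & HAS_BLOBS), // gate used by cleanupOrphans/dropTable
    value: unpack(buffer.subarray(12)),      // msgpack-encoded value
  };
}
```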
…in migration

`setNextEncoding` now accepts `expiresAt`, `nodeId`, and `residencyId` and sets the corresponding module-level variables so `RecordEncoder.encode()` writes them into the RocksDB record header alongside `HAS_BLOBS`.

In the migration loop, both code paths for source metadata are handled:

- `lastMetadata` (set by `RecordEncoder.decode` for unpatched `sourceDbi` stores)
- entry fields (set by `handleLocalTimeForGets` for patched stores)

The first non-undefined value wins via nullish coalescing. Without this, TTL expirations, node IDs, and residency IDs from LMDB records would be silently dropped during migration to RocksDB.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
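A sketch of the dual-path read in the migration loop. `lastMetadata`, `handleLocalTimeForGets`, and the expanded `setNextEncoding` come from the commit message; the entry field names and the `encoder.lastMetadata` location are assumptions:

```js
for (const entry of sourceDbi.getRange()) {
  const value = entry.value;                    // decoding may populate lastMetadata
  const meta = sourceDbi.encoder?.lastMetadata; // path 1: RecordEncoder.decode (unpatched stores)
  setNextEncoding(
    entry.version,
    0,
    meta?.expiresAt ?? entry.expiresAt,         // path 2: handleLocalTimeForGets entry fields
    meta?.nodeId ?? entry.nodeId,
    meta?.residencyId ?? entry.residencyId,
  );
  targetDbi.put(entry.key, value);
}
```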
Follow-up: preserve expiresAt, nodeId, residencyId during migration

Yes, expiration dates and node IDs were not being copied — confirmed. Fix: expanded `setNextEncoding` to also accept `expiresAt`, `nodeId`, and `residencyId`. The migration loop now reads source metadata via a dual-path approach:

- `lastMetadata`, populated by `RecordEncoder.decode` for unpatched `sourceDbi` stores
- entry fields, populated by `handleLocalTimeForGets` for patched stores

Nullish coalescing (`??`) picks the first non-undefined value.

🤖 Generated with Claude Code
1. `TypeError`: `encoder` getter-only on `RocksDatabase` — patch the existing encoder object's `encode`/`saveStructures`/`getStructures` methods and set `isRocksDB`/`rootStore` instead of replacing the getter-only `encoder` property.
2. Full replication copy after migration — preserve `REMOTE_NODE_IDS` binary data from the LMDB audit store into the new RocksDB root store so `getIdOfRemoteNode` returns consistent node IDs and the existing sequence-tracking entries (already migrated in `dbisDB`) still match.
3. Analytics `ENOENT` errors after migration — `resetDatabases()` was gated on `table.primaryStore.path === path` to clean up database entries, which is unreliable for LMDB sub-databases; simplified to always remove the database entry when an LMDB store is cleaned up, so `getDatabases()` re-creates it with the new RocksDB-backed tables.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Follow-up: Three runtime fixes from actual migration testing (AI-generated)

Additional commit fixes three errors discovered while running the actual migration: Fix 1 — the `TypeError` from assigning to the getter-only `encoder` property on `RocksDatabase`; Fix 2 — preserving `REMOTE_NODE_IDS` replication data; Fix 3 — the analytics `ENOENT` errors from `resetDatabases()`. A sketch of Fix 1 follows.
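Fix 1 sketch: `RocksDatabase` exposes `encoder` through a getter only, so direct assignment throws a `TypeError`. Per the commit message, the existing encoder object's methods are patched instead; borrowing the methods from a `RecordEncoder` instance via `bind()` is an assumption about how the actual patch works:

```js
const recordEncoder = new RecordEncoder(targetDbi);

// Mutate the object behind the getter rather than replacing the property.
Object.assign(targetDbi.encoder, {
  encode: recordEncoder.encode.bind(recordEncoder),
  saveStructures: recordEncoder.saveStructures.bind(recordEncoder),
  getStructures: recordEncoder.getStructures.bind(recordEncoder),
  isRocksDB: true,
  rootStore: targetDbi.rootStore,
});
```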
Summary
- `copyDbToRocks` was calling `targetDbi.put()` without `encodeBlobsWithFilePath` context
- The msgpack `pack()` extension falls through to reading blob files and embedding their content inline in RocksDB records
- The fix wraps `put` with `encodeBlobsWithFilePath`, which causes `saveBlob()` to short-circuit on the existing `fileId` and encode `[storageIndex, fileId]` as a reference — no file copy required
Why this works

Blob files live on the filesystem at `{basePath}/blobs/{databaseName}/`, separate from both LMDB and RocksDB. LMDB records hold only `(storageIndex, fileId)` references. The `encodeBlobsWithFilePath` context makes the msgpack `pack()` function call `saveBlob()`, which early-returns when `storageInfo.fileId` is already set, and then packs the original reference tuple — preserving the existing blob file with zero copying.
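The short-circuit described above might look like this; the `storageInfo` shape and return contract are assumptions drawn from this description, not the project's actual implementation:

```js
function saveBlob(blob, storageInfo) {
  if (storageInfo.fileId !== undefined) {
    // Migrated blob: the file already exists under
    // {basePath}/blobs/{databaseName}/, so keep it and pack the reference.
    return [storageInfo.storageIndex, storageInfo.fileId];
  }
  // New blob: write the file and allocate a fileId (elided here).
}
```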
The contrast with `copyDb` (LMDB→LMDB) is instructive: that function uses binary-mode copying, so references survive verbatim. `copyDbToRocks` decodes and re-encodes, which is why the explicit context is needed.
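For contrast, a binary-mode copy in the style attributed to `copyDb` might look like the following. `getKeys()` and `getBinary()` are real lmdb-js methods, but whether `copyDb` uses exactly this pattern is an assumption:

```js
// Raw bytes move untouched, so encoded blob references survive verbatim;
// there is no decode/re-encode step that could inline blob content.
// (Assumes the target store was opened with binary encoding.)
for (const key of sourceDbi.getKeys()) {
  targetLmdb.put(key, sourceDbi.getBinary(key));
}
```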
Attention

- `HAS_BLOBS` metadata flag: Gemini review confirmed this is correctly handled — `blobsWereEncoded` is set by `encodeBlobsWithFilePath` and `RecordEncoder` uses it to set the flag. However, in this migration `targetDbi` is a raw `RocksDatabase` (not RecordEncoder-wrapped), so whether the `HAS_BLOBS` flag is written into the record header for migrated records is worth verifying.
- No tests cover `copyDbToRocks`; existing migration tests only cover LMDB compaction. An integration test that writes blob records to LMDB and verifies RocksDB records contain file references (not inline content) would be the appropriate coverage (see the sketch below).

🤖 Generated with Claude Code
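An integration-test sketch for that coverage gap. The test-runner wiring is real Node (`node:test`), but `openLmdbStore`, `openRocksStore`, `createBlob`, and the exact `copyDbToRocks` signature are hypothetical scaffolding, not the project's real API:

```js
const assert = require('node:assert');
const { test } = require('node:test');

test('copyDbToRocks keeps blob file references', async () => {
  const payload = Buffer.alloc(1024 * 1024, 0xab);

  const source = await openLmdbStore('/tmp/mig-src');           // hypothetical helper
  await source.put('doc1', { attachment: createBlob(payload) }); // hypothetical blob factory

  const target = await openRocksStore('/tmp/mig-dst');          // hypothetical helper
  await copyDbToRocks(source, target);

  // A migrated record should hold a [storageIndex, fileId] tuple,
  // not the inlined megabyte of blob content.
  const raw = target.getBinary('doc1');
  assert.ok(raw.length < payload.length, 'blob content was inlined');
});
```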