Skip to content

Record caching#410

Draft
kriszyp wants to merge 7 commits intomainfrom
record-caching
Draft

Record caching#410
kriszyp wants to merge 7 commits intomainfrom
record-caching

Conversation

@kriszyp
Copy link
Copy Markdown
Member

@kriszyp kriszyp commented Apr 26, 2026

Introduces CachingRocksDatabase, a RocksDatabase subclass that layers a
per-instance WeakLRUCache on top of the process-wide VerificationTable.
Cache freshness is checked via entry.version (Harper's localTime timestamp)
rather than a txnId, so reads hitting a fresh VT slot skip disk entirely.

Primary RocksDB stores now use CachingRocksDatabase instead of RocksDatabase.

DatabaseTransaction creates transactions with coordinatedRetry: true, so
IsBusy conflicts are signalled as RETRY_NOW_VALUE (immediate retry, no
backoff) rather than an ERR_BUSY rejection. The old quadratic-backoff retry
handler is removed.

kriszyp and others added 2 commits April 25, 2026 20:56
Introduces CachingRocksDatabase, a RocksDatabase subclass that layers a
per-instance WeakLRUCache on top of the process-wide VerificationTable.
Cache freshness is checked via entry.version (Harper's localTime timestamp)
rather than a txnId, so reads hitting a fresh VT slot skip disk entirely.

Primary RocksDB stores now use CachingRocksDatabase instead of RocksDatabase.

DatabaseTransaction creates transactions with coordinatedRetry: true, so
IsBusy conflicts are signalled as RETRY_NOW_VALUE (immediate retry, no
backoff) rather than an ERR_BUSY rejection. The old quadratic-backoff retry
handler is removed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tests cover:
- primaryStore is a CachingRocksDatabase (and RocksDatabase) instance
- Basic read/write correctness
- VT slot population after two reads enabling the fast-path
- Third read returns value without DB access (VT fast-path)
- Write clears VT slot so stale version cannot verify
- Remove clears cache; subsequent read returns undefined
- Transaction-scoped reads bypass cache and return correct value
- Concurrent writes complete without error via coordinatedRetry
- Write-then-read returns updated value, not stale cache

Tests are skipped when HARPER_STORAGE_ENGINE=lmdb.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread resources/CachingRocksDatabase.ts Outdated

getSync(id: any, options?: any): any {
if (options?.transaction) {
return super.getSync(id, options);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Untested branch (blocker): The if (options?.transaction) bypass is never exercised by the test suite. The test labelled "Read with a transaction bypasses the cache" passes context = {}, which has no transaction property — options?.transaction is falsy and the cache path is taken, not the bypass. Both legs of this production-vs-fallback split need coverage. An actual RocksTransaction (or a Harper DatabaseTransaction-backed context) needs to be passed to verify the bypass reaches super.getSync without touching the cache.

if (commitResult === RETRY_NOW_VALUE) {
this.retries++;
harperLogger.debug?.('coordinated retry', transaction.id, this.retries);
return this.commit({ transaction });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No application-level retry bound: this.retries++ is incremented on each RETRY_NOW_VALUE but never checked against a limit. The old ERR_BUSY handler had MAX_RETRIES = 40 (now dead code at line 23) that threw ServerError after 40 attempts. If @harperfast/rocksdb-js guarantees it will eventually stop returning RETRY_NOW_VALUE under any contention scenario, please document that guarantee. Otherwise, reinstate an application-level cap so sustained conflicts don't produce unbounded recursion.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 26, 2026

Review: Record caching (PR #410)

1. weak-lru-cache missing from dependencies.md — BLOCKER

File: package.json

What: A new runtime dependency (weak-lru-cache) is added but has no entry in dependencies.md.

Why it matters: The repo policy (documented in dependencies.md itself) requires every new runtime dep to be reviewed and documented there — size, security track record, memory cost, eventual-removal plan, etc. This is enforced in code review.

Suggested fix: Add a ## weak-lru-cache section to dependencies.md answering the standard questions. (The author, Kris Zyp, is the original author of weak-lru-cache, so the section can be brief, but it still needs to exist so the policy holds for future reviewers.)


2. Transaction-bypass branch untested — BLOCKER

File: resources/CachingRocksDatabase.ts:22–23 (also get, line 44)

What: Both getSync and get have an if (options?.transaction) branch that bypasses the cache and delegates directly to super. The test labelled "Read with a transaction bypasses the cache" passes context = {} — an empty object with no transaction property — so options?.transaction is falsy and the non-bypass (cache) path is taken. The bypass branch itself is never exercised.

Why it matters: Per the testing rules, a PR-introduced production-vs-fallback split requires both legs to have coverage. A read executed inside an active transaction is a real production path; silently falling through to cached data when a transaction is present would be a correctness bug.

Suggested fix: Add a test that creates an actual transaction (e.g. via DatabaseTransaction or a direct RocksTransaction) and passes it in the options, then asserts the result is correct and that no WeakLRUCache entry was written for that key.


3. MAX_RETRIES is dead code; coordinated-retry loop has no application-level bound — finding

File: resources/DatabaseTransaction.ts:23 / 269

What: const MAX_RETRIES = 40 is unreferenced — its only consumer was the removed ERR_BUSY handler. The new RETRY_NOW_VALUE path increments this.retries on each retry but never checks it against any limit.

Why it matters: The old code would throw a ServerError after 40 attempts, preventing unbounded recursion under sustained contention. If the coordinatedRetry: true mechanism in @harperfast/rocksdb-js internally bounds the number of RETRY_NOW_VALUE returns (i.e. it eventually throws an error rather than looping forever), the application-level guard isn't needed — but that guarantee should be stated explicitly. If there's no such guarantee, the removal is a regression in resilience.

Suggested fix: Either (a) remove MAX_RETRIES and add a comment explaining that the C++ layer bounds retries and will throw after N attempts, or (b) keep an application-level cap and throw ServerError if this.retries exceeds it.


Surfaces verified

  • CachingRocksDatabase: cache invalidation on putSync/removeSync ✓; put/remove patched to putSync/removeSync in openRocksDatabase ✓; verificationTable: true forced in constructor ✓; FRESH_VERSION_FLAG fast-path logic ✓; dupSort stores (RocksIndexStore) correctly bypass caching ✓
  • DatabaseTransaction: coordinatedRetry: true wiring ✓; success-path retry loop ✓; old ERR_BUSY/backoff removal ✓
  • databases.ts: CachingRocksDatabase replaces RocksDatabase only for non-dupSort primary stores ✓
  • dependencies.md: weak-lru-cache absent ✗ (finding 1)
  • Test file: happy-path CRUD, VT population, write-clears-cache, concurrent-writes ✓; transaction-bypass branch ✗ (finding 2)

kriszyp and others added 4 commits April 25, 2026 22:19
Benchmarks six scenarios with 2,000 records (~80-byte values each):

  1. getSync cold          — first read, cache miss (both go to DB)
  2. getSync soft VT miss  — 2nd read: cache warm, expectedVersion passed,
                             DB read happens but slot is populated + FRESH returned
  3. getSync VT fast-path  — 3rd+ read: VT slot matches, no DB access
  4. getSync hot key       — repeated single-key reads
  5. get (async) VT hit    — async path with warm cache
  6. putSync               — write throughput (cache invalidation cost)

Each scenario reports ops/sec for plain RocksDatabase and CachingRocksDatabase
with a speedup ratio, printed as a formatted table.

Adds `npm run bench` script. Also fixes `test:unit:resources` to exclude
*.bench.js files so they don't run as part of the normal test suite.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…integration

Renames CachingRocksDatabase to PrimaryRocksDatabase and moves all RocksDB-specific
entry handling (previously done by handleLocalTimeForGets instance patches) into real
class methods. Caching now operates at the getEntry level using full Entry objects with
`version`, using verifyVersion/populateVersion for VT-based freshness checks rather
than expectedVersion. handleLocalTimeForGets delegates to initStore() for
PrimaryRocksDatabase instances via the isPrimaryRocksDatabase marker. All 10 new tests
and 1720 existing tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rsion) call

The rocksdb-js store.getSync/get natively accept expectedVersion: on a VT hit they
return FRESH_VERSION_FLAG without touching disk; on a VT miss they read from the DB
and auto-populate the VT slot. Replacing the two-step verifyVersion+populateVersion
pattern with a single native call removes one round-trip per read on the warm cache
path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nchmark

- PrimaryRocksDatabase.getEntry now uses populateVersion option for cold
  reads (single native call, no separate populateVersion()) and merges
  options when passing expectedVersion so transaction snapshot is preserved
- handleLocalTimeForGets: remove all isRocksDB branches — RocksDB stores
  now use the isPrimaryRocksDatabase early-return path exclusively; the
  remaining code is LMDB-only, so isRocksDB is always false there
- Benchmark: use RecordEncoder + recordUpdater with Date.now()-range
  versions so values carry encoded version bytes and caching works correctly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
if (commitResult === RETRY_NOW_VALUE) {
this.retries++;
harperLogger.debug?.('coordinated retry', transaction.id, this.retries);
return this.commit({ transaction });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No retry cap on the coordinatedRetry path.

MAX_RETRIES = 40 is still defined at the top of the file but is now unreachable dead code — nothing checks it. If RocksDB keeps returning RETRY_NOW_VALUE (sustained write contention, edge-case in the native layer, etc.) this recurses without bound until stack exhaustion.

Suggested change
return this.commit({ transaction });
if (commitResult === RETRY_NOW_VALUE) {
this.retries++;
if (this.retries > MAX_RETRIES) {
throw new ServerError(
`After ${MAX_RETRIES} retries, unable to commit transaction, transaction is in conflict with ongoing writes`
);
}
harperLogger.debug?.('coordinated retry', transaction.id, this.retries);
return this.commit({ transaction });
}

Comment thread package.json
@@ -196,6 +197,7 @@
"jsonata": "1.8.7",
"jsonwebtoken": "9.0.3",
"lmdb": "3.5.4",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weak-lru-cache is not documented in dependencies.md.

Per this repo's own policy, every new runtime dependency requires an entry in dependencies.md answering the standard checklist (size, security track record, overlap, memory cost, removal plan, etc.). Please add the entry before merging.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 26, 2026

Review: Record caching (PR #410)

2 blockers found.


1. No retry cap on the coordinatedRetry path

File: resources/DatabaseTransaction.ts:266-270
What: The old ERR_BUSY handler enforced MAX_RETRIES = 40 before throwing a ServerError. The new RETRY_NOW_VALUE path increments this.retries and recurses unconditionally — MAX_RETRIES is now dead code (defined but never read).
Why it matters: If RocksDB keeps returning RETRY_NOW_VALUE under sustained write contention or any edge case in the native layer, commit recurses without bound until stack exhaustion or process hang. The protection existed before this PR and was silently dropped.
Suggested fix: Add if (this.retries > MAX_RETRIES) throw new ServerError(...) inside the RETRY_NOW_VALUE branch (inline suggestion posted).


2. weak-lru-cache not documented in dependencies.md

File: package.json:199
What: weak-lru-cache: ^1.2.2 is added as a runtime dependency. dependencies.md has no entry for it.
Why it matters: The repo's own policy (documented at the top of dependencies.md) requires every new runtime dependency to answer the standard checklist: size + transitive deps, security track record, memory cost, overlap with existing packages, removal plan, etc. That checklist exists to keep the dependency surface intentional and reviewable.
Suggested fix: Add a ## weak-lru-cache section to dependencies.md before merging.


What I traced (for spot-check)

  • PrimaryRocksDatabasegetEntry cache path correct: FRESH_VERSION_FLAG returns cached entry, cold reads seed the VT slot via populateVersion, putSync/removeSync both invalidate the cache. databases.ts assigns db.put = db.putSync so the async surface also invalidates. ✓
  • handleLocalTimeForGets early-return — all callers of handleLocalTimeForGets on RocksDB stores go through openRocksDatabase which now returns a PrimaryRocksDatabase; the isPrimaryRocksDatabase guard correctly short-circuits to initStore. LMDB stores fall through to the unchanged path. ✓
  • encoder.isRocksDB for LMDB — previously set to false explicitly; now never set for LMDB stores. undefined is falsy so all if (this.isRocksDB) branches in RecordEncoder behave identically. ✓
  • coordinatedRetry testcaching-rocks-database.test.js has a concurrent-write test that exercises the retry path; it skips under LMDB. Primary (RocksDB) path is covered. ✓
  • getRange not caching — intentional; range reads bypass the per-key cache. No correctness issue. ✓

attributes: [
{ name: 'id', isPrimaryKey: true },
{ name: 'embedding', indexed: { type: 'HNSW' }, type: 'Array' },
],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocker — missing worker script

vectorIndex-thread.js does not exist anywhere in this repository (confirmed with git log and directory listing). Every Worker instantiation here will throw ENOENT immediately, causing the entire before() hook to fail and the test suite to error out without running.

Please add the missing worker script (it should live at unitTests/resources/vectorIndex-thread.js) that receives the { type: 'insert', start, count, dims } message and posts back a result/error reply.

Comment thread package.json
"jsonata": "1.8.7",
"jsonwebtoken": "9.0.3",
"lmdb": "3.5.4",
"weak-lru-cache": "^1.2.2",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocker — runtime dependency not documented in dependencies.md

weak-lru-cache is a new runtime dependency but has no entry in dependencies.md. Per repo policy, every runtime dependency addition must answer the questions listed there (size, security track record, overlap with existing packages, memory cost, removal plan, etc.) so the full engineering team can review the decision alongside the code.

Please add an entry for weak-lru-cache to dependencies.md before merging.

import { setTimeout as delay } from 'node:timers/promises';
import { Transaction as RocksTransaction, type Store as RocksStore } from '@harperfast/rocksdb-js';
import { Transaction as RocksTransaction, type Store as RocksStore, constants } from '@harperfast/rocksdb-js';
const RETRY_NOW_VALUE = constants.RETRY_NOW_VALUE;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit — MAX_RETRIES (line 23) is now dead code

The constant MAX_RETRIES = 40 is still defined but no longer used — the ERR_BUSY backoff guard that consumed it was removed when switching to coordinatedRetry. Consider removing it, or reinstate it as a cap inside the new RETRY_NOW_VALUE branch to preserve the JS-side safety net if the native layer ever loops unexpectedly.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 27, 2026

Review: Record caching (PR #410)

What I traced

Surface Verified
PrimaryRocksDatabase — cache hit/miss paths, getEntry/getSync/get/getRange/putSync/removeSync
DatabaseTransactioncoordinatedRetry wiring, RETRY_NOW_VALUE branch, old ERR_BUSY removal
RecordEncoder.handleLocalTimeForGets — early-return for isPrimaryRocksDatabase, LMDB path equivalence
databases.tsopenRocksDatabase swap from RocksDatabase.open to PrimaryRocksDatabase
Cache-write safety: put/remove redirection to putSync/removeSync in databases.ts:129-130
LMDB encoder behavioural parity after removing store.encoder.isRocksDB = false assignment ✓ (all uses are truthy-checks; undefinedfalse)
dependencies.md for new runtime dep ✗ — see finding 1
Worker script referenced by new HNSW concurrent test ✗ — see finding 2

1. weak-lru-cache not documented in dependencies.mdblocker

File: package.json
What: weak-lru-cache is added as a runtime dependency with no entry in dependencies.md.
Why it matters: Per repo policy (dependencies.md intro), every new runtime dependency must answer the documented checklist (size, security history, transitive deps, overlap, memory cost, removal plan) so the whole engineering team can review it alongside the diff. The inline comment on package.json:200 has more detail.
Suggested fix: Add the entry to dependencies.md before merging.


2. Missing vectorIndex-thread.js worker script — blocker

File: unitTests/resources/vectorIndex.test.js:313
What: The new HNSW concurrent-PUT test (describe('HNSW concurrent PUT race condition…')) creates Worker instances pointing at __dirname + '/vectorIndex-thread.js', but that file does not exist in the repository.
Why it matters: The before() hook will throw ENOENT the moment workers are spawned, causing the entire test suite to fail before any assertion runs. The new test that is supposed to validate the coordinatedRetry fix for issue #386 effectively does not run.
Suggested fix: Add unitTests/resources/vectorIndex-thread.js implementing the { type: 'insert', start, count, dims } protocol the test expects.


Minor notes (not blockers)

  • MAX_RETRIES is dead code. const MAX_RETRIES = 40 in DatabaseTransaction.ts:23 is no longer referenced — the backoff guard that used it was removed. Remove it, or consider wiring it back as an upper bound inside the new RETRY_NOW_VALUE recursion so there is a JS-side safety net if the native layer ever loops unexpectedly (inline comment on DatabaseTransaction.ts has more detail).

  • this.retries is incremented but never checked in the new path. The old logic raised ServerError after 40 retries; now the count grows without consequence. Probably fine if coordinatedRetry is designed to converge, but worth a comment in the code explaining that invariant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant