
Wire the live synapse graph into retrieval (budget-neutral)#142

Merged
dfrostar merged 4 commits into main from claude/analyze-openclaw-tokens-ypkup on May 20, 2026

Conversation

@dfrostar
Owner

@dfrostar dfrostar commented May 20, 2026

Summary

The Hebbian synapse store was written to on every query (_reinforce_from_query) but never read during retrieval — selection ranked on vector similarity plus a separate static learned_patterns.json, and synaptic recall only reached the UserPromptSubmit hook (never the MCP/programmatic query() path). The advertised "brain that learns your codebase" had no effect on what the agent actually received.

This branch closes that gap, then proves it helps:

  • b6aeb06 — L3 reranking. Seed spreading activation from the top L3 hits and boost any other result the synapse graph co-activates, so learned association — not just vector similarity — shapes ranking. Reaches every retrieval consumer, not just the hook.
  • 6d4a105 — L2/L3 selection, budget-neutral. Surfaces context vector search missed without spending extra tokens:
    • L3: swap the weakest vector hits for the strongest absent co-activated neighbors (displacement, result count fixed) — not append.
    • L2: a co-activated community can win a slot by outscoring a vector one, but cannot grow how many communities load past what vector search alone surfaced.
    • Adds GraphEmbedder.get_nodes_by_ids.
  • 20d4976 — quality proof. A synapse-recall A/B phase in the self-benchmark (below).

All behind the existing NEURALMIND_SYNAPSE_INJECT kill switch and a no-op on a cold graph, so cold-start output is byte-identical to a build without a synapse store.
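The L3 displacement step described above can be sketched roughly as follows. This is a minimal illustration, not the code in neuralmind/context_selector.py: the function name `displace_weakest`, the `max_swap` parameter, the `fetch` callable, and the `{"id", "score"}` result shape are all assumptions made for the example.

```python
from __future__ import annotations

from typing import Callable


def displace_weakest(
    results: list[dict],
    energy: dict[str, float],
    fetch: Callable[[list[str]], list[dict]],
    max_swap: int = 2,  # hypothetical cap, not taken from the PR
) -> list[dict]:
    """Swap the weakest vector hits for the strongest absent neighbors.

    The result count never changes and at least one vector hit is kept.
    """
    if not results or not energy:
        return results  # cold graph: no-op, output unchanged

    present = {r["id"] for r in results}
    # Strongest co-activated nodes that vector search did not return.
    candidates = sorted(
        ((nid, e) for nid, e in energy.items() if nid not in present),
        key=lambda pair: pair[1],
        reverse=True,
    )
    num_swap = min(max_swap, len(candidates), len(results) - 1)
    if num_swap <= 0:
        return results

    fetched = fetch([nid for nid, _ in candidates[:num_swap]])[:num_swap]
    if not fetched:
        return results

    # Keep the best-scoring vector hits and fill the freed slots.
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return ranked[: len(results) - len(fetched)] + fetched
```

Because the freed slots are filled one for one, the number of results handed to the agent is the same with or without recall, which is the sense in which the selection is budget-neutral.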

Does it actually help? (Phase 3 — synapse A/B)

Same warm graph, same query set, only NEURALMIND_SYNAPSE_INJECT differs:

| metric | recall off | recall on | Δ |
|---|---|---|---|
| top-k hit rate | 72% | 83% | +12 pts |
| reduction ratio | 6.1× | 6.1× | ~0 (budget-neutral) |

Associative recall surfaces co-edited modules (e.g. users/crud.py on an auth query) that a purely textual search ranks lower — and does it at no token cost, because recalled nodes displace the weakest hits rather than adding to them.
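To make the mechanism concrete, here is a toy sketch of Hebbian reinforcement followed by spreading activation. It is an illustration only: `ToySynapseGraph`, its learning rate, hop count, and decay are hypothetical and do not describe NeuralMind's actual synapse store. The path `auth/login.py` is also invented for the example; `users/crud.py` is the module named above.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


class ToySynapseGraph:
    """Hypothetical in-memory stand-in for a Hebbian synapse store."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.learning_rate = learning_rate
        self.weights: dict[str, dict[str, float]] = defaultdict(dict)

    def reinforce(self, co_active: list[str]) -> None:
        """Strengthen the edge between every pair of co-active nodes."""
        for a, b in combinations(sorted(set(co_active)), 2):
            w = min(self.weights[a].get(b, 0.0) + self.learning_rate, 1.0)
            self.weights[a][b] = self.weights[b][a] = w

    def spread(self, seeds: list[str], hops: int = 2, decay: float = 0.5) -> dict[str, float]:
        """Spread activation outward from the seeds, decaying on each hop."""
        energy = {s: 1.0 for s in seeds}
        frontier = dict(energy)
        for _ in range(hops):
            next_frontier: dict[str, float] = {}
            for node, e in frontier.items():
                for neighbor, w in self.weights.get(node, {}).items():
                    passed = e * w * decay
                    if passed > energy.get(neighbor, 0.0):
                        energy[neighbor] = next_frontier[neighbor] = passed
            frontier = next_frontier
        return energy


graph = ToySynapseGraph()
for _ in range(3):
    graph.reinforce(["auth/login.py", "users/crud.py"])  # edited together
energy = graph.spread(["auth/login.py"])
```

After three co-editing sessions, seeding from the auth file gives `users/crud.py` non-zero energy, while a module that was never co-edited receives none.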

Why budget-neutral

An earlier additive draft (append recalled nodes) improved recall but dropped reduction from 6.0× to 4.8× — against the headline metric the product is sold on. Displacement keeps the budget fixed: same tokens, better picks.

| build | reduction | avg context |
|---|---|---|
| baseline (pre-change) | 6.0× | 783 tok |
| additive draft (rejected) | 4.8× | 979 tok |
| this branch (cold) | 6.0× | 783 tok |
| this branch (warm) | 5.9× | 804 tok |

Test plan

  • tests/test_context_selector.py — 13 synapse tests: L3 reorder/displacement, L2 community displacement + count cap, safety no-ops (no recall / cold graph / kill switch / recall raising).
  • tests/test_benchmark_regression.py — 5 gates, incl. recall must never lower hit rate (catches displacement dropping a relevant hit) and reduction stays budget-neutral.
  • Full suite green (only the firewall-blocked ONNX S3 integration test skips).
  • bash scripts/demo.sh holds at ~6× cold and warm.
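The two synapse-specific gates named in the list above could be expressed along these lines. This is a sketch under assumptions: the function name and the `max_reduction_drop` tolerance are hypothetical, and the real checks in tests/test_benchmark_regression.py may be structured differently.

```python
def check_synapse_gates(
    hit_off: float,
    hit_on: float,
    reduction_off: float,
    reduction_on: float,
    max_reduction_drop: float = 0.25,  # assumed tolerance, not from the PR
) -> list:
    """Return a list of gate failures; an empty list means both gates pass."""
    failures = []
    # Gate 1: recall must never lower the hit rate.
    if hit_on < hit_off:
        failures.append(f"recall lowered hit rate: {hit_off} -> {hit_on}")
    # Gate 2: reduction must stay (approximately) budget-neutral.
    if reduction_off - reduction_on > max_reduction_drop:
        failures.append(f"reduction not budget-neutral: {reduction_off} -> {reduction_on}")
    return failures


# Phase 3 figures from the self-benchmark report in this PR.
phase3_failures = check_synapse_gates(71.7, 83.3, 5.9, 5.8)
```

With that assumed tolerance, the rejected additive draft (6.0× down to 4.8×) would trip the second gate, while the merged numbers pass both.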

Note: this wires the synapse store in alongside the existing learned_patterns.json reranker. Collapsing that now-partly-redundant dual source is a deliberate, riskier follow-up — not in this PR.

https://claude.ai/code/session_01DRbKLVDX9PNyNdXwuNqTDp

claude added 2 commits May 20, 2026 14:24
L3 search ranked results via vector similarity plus a separate static
learned_patterns.json reranker; the Hebbian synapse store was written to
on every query but never consulted during retrieval, and synaptic recall
only reached the UserPromptSubmit hook (never the MCP/programmatic query
path).

Seed spreading activation from the top L3 hits and boost any other result
the synapse graph co-activates, so learned association shapes ranking for
every retrieval consumer. Behind the existing NEURALMIND_SYNAPSE_INJECT
switch and a no-op on a cold graph, so cold-start output is byte-identical.

https://claude.ai/code/session_01DRbKLVDX9PNyNdXwuNqTDp

PR1 wired the synapse graph into L3 reranking. This extends it so learned
co-activation also surfaces context the agent missed — without spending
extra tokens:

- L3: swap the weakest vector hits for the strongest absent neighbors
  (displacement, result count fixed) instead of appending them.
- L2: a co-activated community can win a slot by outscoring a vector one,
  but cannot grow how many communities load past what vector search alone
  surfaced.

Adds GraphEmbedder.get_nodes_by_ids to fetch recalled neighbors. All
behind NEURALMIND_SYNAPSE_INJECT and a no-op on a cold graph. Fixture demo
holds at ~6x reduction warm and cold (was 4.8x with an additive draft).
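The L2 rule above (a recalled community may win a slot but may not add one) can be illustrated with a short sketch. The function name, the `boost_weight` value, and the way vector scores and synapse energy are combined are assumptions for illustration, not the selector's actual scoring.

```python
from __future__ import annotations


def select_communities(
    vector_scores: dict[str, float],
    synapse_energy: dict[str, float],
    boost_weight: float = 0.5,  # assumed weight, not from the PR
) -> list[str]:
    """Pick communities; recall may reorder or replace, never add slots."""
    budget = len(vector_scores)  # slots that vector search alone surfaced
    combined = dict(vector_scores)
    for cid, e in synapse_energy.items():
        combined[cid] = combined.get(cid, 0.0) + boost_weight * e
    ranked = sorted(combined, key=lambda cid: combined[cid], reverse=True)
    return ranked[:budget]


chosen = select_communities(
    {"community_1": 0.8, "community_2": 0.2},
    {"community_7": 0.9},
)
```

Here `community_7` outscores `community_2` and takes its slot, but the selection still contains exactly two communities.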

https://claude.ai/code/session_01DRbKLVDX9PNyNdXwuNqTDp
Copilot AI review requested due to automatic review settings May 20, 2026 15:26
github-actions Bot added the enhancement (New feature or request) and question (Further information is requested) labels May 20, 2026
@github-actions
Contributor

github-actions Bot commented May 20, 2026

NeuralMind self-benchmark

Status: PASS — floor , measured 5.9×.

Phase 1 — Reduction on committed fixture

  • Average reduction: 5.9×
  • Top-k retrieval hit rate: 71.7%
  • Naive baseline: 47,360 tokens (all fixture files concatenated)
  • NeuralMind total: 8,185 tokens across 10 queries
  • Estimated monthly savings @ 100 queries/day on Claude 3.5 Sonnet: ~$35.26
| # | Query | Shape | Naive | NeuralMind | Ratio | Hit |
|---|---|---|---|---|---|---|
| 1 | auth-flow | cross-file | 4,736 | 815 | 5.8× | 33.3% |
| 2 | api-endpoints | focused | 4,736 | 809 | 5.9× | 100.0% |
| 3 | billing-flow | cross-file | 4,736 | 846 | 5.6× | 33.3% |
| 4 | user-storage | cross-file | 4,736 | 672 | 7.0× | 50.0% |
| 5 | jwt-verify | focused | 4,736 | 681 | 7.0× | 100.0% |
| 6 | stripe-webhook | focused | 4,736 | 838 | 5.7× | 100.0% |
| 7 | create-user | cross-file | 4,736 | 822 | 5.8× | 50.0% |
| 8 | refund | focused | 4,736 | 827 | 5.7× | 100.0% |
| 9 | db-choice | identity | 4,736 | 899 | 5.3× | 100.0% |
| 10 | invoice-send | cross-file | 4,736 | 976 | 4.9× | 50.0% |
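
The Phase 1 headline figures can be cross-checked from the per-query rows. The sketch below assumes the average reduction is the mean of the per-query ratios and that a month is 30 days; the report states neither, but both assumptions reproduce the reported 5.9× and ~$35.26 at the printed precision.

```python
NAIVE_PER_QUERY = 4_736
neuralmind_tokens = [815, 809, 846, 672, 681, 838, 822, 827, 899, 976]

# Average reduction taken as the mean of per-query ratios (assumption).
ratios = [NAIVE_PER_QUERY / t for t in neuralmind_tokens]
mean_ratio = sum(ratios) / len(ratios)

naive_total = NAIVE_PER_QUERY * len(neuralmind_tokens)
neuralmind_total = sum(neuralmind_tokens)

# Savings estimate at the report's stated rate and price.
QUERIES_PER_DAY = 100
DAYS_PER_MONTH = 30  # assumption: not stated in the report
PRICE_PER_MTOK = 3.0  # Claude 3.5 Sonnet input price quoted in the report
saved_per_query = (naive_total - neuralmind_total) / len(neuralmind_tokens)
monthly_savings = (
    saved_per_query * QUERIES_PER_DAY * DAYS_PER_MONTH * PRICE_PER_MTOK / 1_000_000
)

print(f"mean ratio {mean_ratio:.1f}x")
print(f"totals {naive_total:,} vs {neuralmind_total:,} tokens")
print(f"estimated savings ${monthly_savings:.2f}/month")
```

Dividing the two totals instead gives roughly 5.8×, so the mean of ratios is the reading that matches the reported average.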

Phase 2 — Learning uplift

  • Memory events logged: 20
  • Learned patterns: 20
  • Reduction ratio after neuralmind learn: 5.8× (Δ -0.07× vs. cold)
  • Top-k hit rate after learning: 75.0% (Δ +3.3 points vs. cold)

Note: uplift numbers on a 500-line fixture are intentionally modest — the point is to
verify the learning mechanism persists and applies. On real production repos the lift
is larger; this test only catches regressions in persistence.

Phase 3 — Synapse recall A/B (same warm graph, recall off vs on)

  • Synapse edges after seeding co-editing sessions: 2793
  • Top-k hit rate: 71.7% off → 83.3% on (Δ +11.7 points)
  • Reduction ratio: 5.9× off → 5.8× on (Δ -0.06× — budget-neutral by design)

This isolates the Hebbian synapse layer from the learned_patterns reranker in
Phase 2. The hit-rate delta shows associative recall surfacing co-edited modules a
purely textual search ranks lower; the near-zero reduction delta confirms it does so
without spending extra tokens (recalled nodes displace the weakest hits, not add to them).

Assumptions

  • Baseline: every .py file in tests/fixtures/sample_project/ concatenated.
  • Tokenizer: tiktoken GPT-4o encoding (per-model breakdown in multi_model.json if generated).
  • Pricing: Claude 3.5 Sonnet input @ $3.0/MTok.
  • Regression floor: — well below NeuralMind's typical 40–70× on real repos.

Per-model token reduction

| Model | Tokenizer | Naive | NeuralMind | Ratio | Source |
|---|---|---|---|---|---|
| GPT-4o / GPT-4o-mini | tiktoken o200k_base | 4,739 | 927 | 5.1× | measured |
| GPT-4 / GPT-3.5-turbo | tiktoken cl100k_base | 4,710 | 918 | 5.1× | measured |
| Claude 3.5 Sonnet | estimated: GPT-4o × 1.08 (install anthropic for an exact count) | 5,118 | 1,001 | 5.1× | estimated |
| Llama 3 (70B) | estimated: GPT-4o × 1.22 (Llama tokenizer requires model weights; estimate based on published vocab ratios) | 5,781 | 1,130 | 5.1× | estimated |

Rows marked measured use the provider's real tokenizer. Rows marked
estimated apply a published vocab-size correction to the GPT-4o count —
honest approximations, not hardcoded claims.
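
The estimated rows appear to follow from the measured GPT-4o row by a single multiplication. A small check, assuming the products are truncated to whole tokens (an assumption; truncation is what reproduces 5,781 and 1,130 for Llama 3):

```python
GPT4O_NAIVE = 4_739
GPT4O_NEURALMIND = 927

# Correction factors as printed in the table.
CORRECTIONS = {"Claude 3.5 Sonnet": 1.08, "Llama 3 (70B)": 1.22}

estimates = {}
for model, factor in CORRECTIONS.items():
    naive = int(GPT4O_NAIVE * factor)  # truncation is an assumption
    compact = int(GPT4O_NEURALMIND * factor)
    estimates[model] = (naive, compact, naive / compact)

for model, (naive, compact, ratio) in estimates.items():
    print(f"{model}: {naive:,} -> {compact:,} ({ratio:.1f}x)")
```

Since both columns are scaled by the same factor, the ratio for an estimated row stays at the GPT-4o value apart from rounding.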


Automated by .github/workflows/ci-benchmark.yml — regenerate locally with python -m tests.benchmark.run.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d4a1056e0


Comment thread neuralmind/context_selector.py Outdated
if num_swap <= 0:
    return results
energy_by_id = dict(candidates[:num_swap])
fetched = self.embedder.get_nodes_by_ids(list(energy_by_id))

P1: Guard synapse pull-in for backends without id lookup

This call assumes every embedding backend implements get_nodes_by_ids, but only GraphEmbedder gained that method in this change. NeuralMind can run with InMemoryEmbeddingBackend (via backend switching/config), and once the synapse graph is warm enough to recall an ID not already in results, this path raises AttributeError and breaks query() instead of degrading gracefully. Add a backend capability check/fallback here (or extend the backend interface and all implementations) before attempting pull-in.

Owner Author


Fixed in eb22466. Pull-in now guards get_nodes_by_ids with a callable(getattr(...)) check and degrades to boost-only when the embedder doesn't implement it, instead of raising. Added a regression test (test_pull_in_degrades_without_id_lookup).
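
The capability check described in this reply follows a common duck-typing pattern. A minimal sketch, assuming a hypothetical helper `fetch_recalled` and two stand-in embedder classes (none of these names are from the codebase; only `get_nodes_by_ids` is):

```python
from __future__ import annotations


def fetch_recalled(embedder: object, ids: list[str]) -> list[dict] | None:
    """Fetch recalled nodes, or None if the embedder lacks id lookup."""
    get_nodes_by_ids = getattr(embedder, "get_nodes_by_ids", None)
    if not callable(get_nodes_by_ids):
        return None  # caller falls back to boost-only results
    return get_nodes_by_ids(ids)


class IdLookupEmbedder:
    """Hypothetical embedder that supports id lookup."""

    def __init__(self, nodes: dict[str, dict]) -> None:
        self._nodes = nodes

    def get_nodes_by_ids(self, ids: list[str]) -> list[dict]:
        return [self._nodes[i] for i in ids if i in self._nodes]


class PlainEmbedder:
    """Hypothetical embedder with no get_nodes_by_ids method."""


with_lookup = fetch_recalled(IdLookupEmbedder({"n1": {"id": "n1"}}), ["n1", "n2"])
without_lookup = fetch_recalled(PlainEmbedder(), ["n1"])
```

Using `getattr` with a default avoids the `AttributeError` that the review flagged, at the cost of silently skipping pull-in on backends that lack the method.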


Generated by Claude Code


Copilot AI left a comment


Pull request overview

This PR wires the live Hebbian synapse graph into retrieval so learned co-activation can influence both L3 reranking and (budget-neutral) L2/L3 selection, rather than only being written during reinforcement.

Changes:

  • Add synapse-driven boosting + displacement to L3 search results (with kill switch / cold-graph no-ops).
  • Allow L2 community selection to be influenced by synapse-recalled community_<id> pseudo-nodes without increasing the community budget.
  • Add GraphEmbedder.get_nodes_by_ids() so L3 can pull in recalled neighbors not returned by vector search.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| neuralmind/context_selector.py | Implements synapse recall integration for L2 community scoring and L3 reranking/displacement; adds output labels for synapse boosts and recalled nodes. |
| neuralmind/core.py | Wires ContextSelector.synapse_recall to a synapse-store spreading activation method during build. |
| neuralmind/embedder.py | Adds get_nodes_by_ids() to fetch recalled nodes by id from the vector store for L3 displacement. |
| tests/test_context_selector.py | Adds targeted tests covering L3 boost/reorder/displacement and L2 community displacement/budget caps, including cold-graph and kill-switch behavior. |


Comment on lines +468 to +475
for r in results:
    nid = r.get("id")
    if nid in seed_set or nid not in energy:
        continue
    boost = self.SYNAPSE_BOOST_WEIGHT * energy[nid]
    r["score"] = r.get("score", 0.0) + boost
    r["_synapse_boost"] = boost
    boosted = True
Owner Author


Fixed in eb22466. _apply_synapse_boost now operates on shallow copies of the result dicts, so it never mutates the objects _fetch_search caches — the boost is idempotent and the cached vector scores stay clean. Added a regression test (test_boost_does_not_mutate_cached_results) asserting a repeated call is identical and the cached dict keeps its original score.
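
The copy-before-boost fix can be sketched as below. This is an approximation of the idea rather than the code in eb22466: the standalone function, its signature, and the `weight` default are assumptions (the source uses a class constant, `SYNAPSE_BOOST_WEIGHT`, whose value is not shown here).

```python
from __future__ import annotations


def apply_synapse_boost(
    results: list[dict],
    energy: dict[str, float],
    seed_ids: set[str],
    weight: float = 0.3,  # hypothetical value
) -> list[dict]:
    """Return boosted copies of the results; the inputs are never mutated."""
    boosted = []
    for original in results:
        r = dict(original)  # shallow copy keeps the cached dict clean
        nid = r.get("id")
        if nid not in seed_ids and nid in energy:
            boost = weight * energy[nid]
            r["score"] = r.get("score", 0.0) + boost
            r["_synapse_boost"] = boost
        boosted.append(r)
    boosted.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return boosted


cached = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.5}]
first = apply_synapse_boost(cached, {"b": 1.0}, {"a"})
second = apply_synapse_boost(cached, {"b": 1.0}, {"a"})
```

Because each call starts from the untouched cached scores, repeated calls give identical output and the boost cannot compound.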


Generated by Claude Code

Phase 2 measures the learned_patterns reranker; nothing isolated the
Hebbian synapse layer's effect on retrieval quality. Add Phase 3: reinforce
realistic co-editing sessions, then measure the same query set with synapse
recall off vs on (same warm graph, only NEURALMIND_SYNAPSE_INJECT differs).

On the fixture, recall lifts top-k hit rate 72% -> 83% (+12 points) while
the reduction ratio holds at 6.1x -> 6.1x — associative recall surfaces
co-edited modules a textual search ranks lower, at no token cost.

Two regression gates: recall must never lower hit rate (catches budget-
neutral displacement dropping a relevant hit) and reduction must stay
budget-neutral.

https://claude.ai/code/session_01DRbKLVDX9PNyNdXwuNqTDp

Address two review findings on the synapse retrieval path:

- The boost incremented score on result dicts that _fetch_search caches and
  reuses, so repeated calls compounded the boost and corrupted cached vector
  scores. Operate on shallow copies — the boost is now idempotent and leaves
  the cache clean.
- Pull-in called get_nodes_by_ids unconditionally, which only GraphEmbedder
  implements; an embedder without it would raise AttributeError mid-query.
  Guard with a capability check and degrade to boost-only.

https://claude.ai/code/session_01DRbKLVDX9PNyNdXwuNqTDp

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment on lines +496 to +506
        if nid not in present
        and not nid.startswith("community_")
        and e >= self.SYNAPSE_PULL_IN_MIN_ENERGY
    ),
    key=lambda x: x[1],
    reverse=True,
)[: self.SYNAPSE_PULL_IN_MAX]
if not candidates:
    return results

# Keep at least one vector hit; only displace as many as we can fetch.
Owner Author


Already addressed in eb22466 (this review was generated against the earlier commit). The pull-in now does exactly your option (2): get_nodes_by_ids = getattr(self.embedder, "get_nodes_by_ids", None) and returns boost-only if it isn't callable, so a backend without id lookup degrades gracefully instead of raising. Covered by test_pull_in_degrades_without_id_lookup.


Generated by Claude Code

@dfrostar dfrostar merged commit 98cfc51 into main May 20, 2026
14 checks passed

Labels

enhancement (New feature or request), question (Further information is requested)


3 participants