Skip to content

feat: memory retrieval quality sprint — metadata strip, trivial augmentation, extraction window, preamble#2

Open
100yenadmin wants to merge 4 commits into
mainfrom
feat/memory-retrieval-quality-sprint
Open

feat: memory retrieval quality sprint — metadata strip, trivial augmentation, extraction window, preamble#2
100yenadmin wants to merge 4 commits into
mainfrom
feat/memory-retrieval-quality-sprint

Conversation

@100yenadmin
Copy link
Copy Markdown
Member

@100yenadmin 100yenadmin commented Apr 2, 2026

Memory Retrieval Quality Sprint

Phase 1: Retrieval Hygiene (#1807)

  • stripInboundMetadata() strips sender blocks, System (untrusted) lines, thread history, timestamps, relevant-memories from event.prompt before retrieval decisions
  • TRIVIAL_PATTERNS expanded with 30+ acknowledgement terms
  • Retrieval gate restructured: questions always retrieve, trivial queries augment from cached last assistant turn (slice-200 + word boundary), short non-question skips
  • minRelevanceScore raised 0.25→0.30

Phase 2: Extraction Window (#1808)

  • extractMessages() now walks backwards collecting last 5 user/assistant messages
  • Avoids scanning full session history (can be 70MB+)
  • Guarantees 5 real messages regardless of tool call density

Phase 3: Memory Preamble (#1809)

  • MEMORY_PREAMBLE constant (~955 chars) inserted inside <relevant-memories> tags
  • Instructs agent to evaluate memories by category, ask clarifying questions, flag contradictions
  • Counts against maxChars budget. Guard prevents preamble-only injection
  • Auto-stripped on re-capture by extractMessages() regex

Also includes

  • isEnvInterpolation() for API key warning dedup
  • .gitignore: re-track src/ and tsconfig.json (were incorrectly ignored)
  • Schema description updated for 0.30 default

Audit trail

  • 35/35 trace tests passing
  • 3 adversarial audit rounds (2x GPT-5.4, 1x Sonnet)
  • Spec: docs/projects/cortex/spec-memory-retrieval-quality.md

Closes #1807, closes #1808, closes #1809.


Open with Devin

…ntation, extraction window, preamble

Phase 1: Strip inbound metadata (sender blocks, System (untrusted), thread history,
timestamps, relevant-memories) before retrieval decisions. Expand TRIVIAL_PATTERNS
with 30+ acknowledgement terms. Restructure retrieval gate: questions always retrieve,
trivial queries augment from cached last assistant turn, short non-question skips.
Raise minRelevanceScore 0.25→0.30.

Phase 2: Extraction window 10→5 real user/assistant messages via backwards walk.
Avoids scanning full session history (can be 70MB+). Guarantees 5 real messages
regardless of tool call density.

Phase 3: MEMORY_PREAMBLE constant (~955 chars) inserted inside <relevant-memories>
tags. Instructs agent to evaluate memories by category, ask clarifying questions,
flag contradictions. Counts against maxChars budget. Guard prevents preamble-only
injection. Auto-stripped on re-capture by extractMessages() regex.

Also includes pre-sprint fixes:
- isEnvInterpolation() for API key warning dedup
- .gitignore: re-track src/ and tsconfig.json (were incorrectly ignored)
- minRelevanceScore schema description updated to 0.30

Closes #1807, closes #1808, closes #1809.
Refs: spec-memory-retrieval-quality.md
Copilot AI review requested due to automatic review settings April 2, 2026 09:54
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the Cortex memory plugin’s retrieval hygiene and performance by stripping inbound metadata before retrieval decisions, tightening retrieval heuristics (including trivial-query augmentation), limiting the capture extraction window, and adding a structured memory preamble to injected context.

Changes:

  • Added stripInboundMetadata() and reworked the retrieval gate (questions always retrieve; trivial prompts optionally augment with cached last assistant turn; short non-questions can skip).
  • Optimized capture-side message extraction by walking backwards to collect the last 5 user/assistant messages (avoids scanning large histories).
  • Added a MEMORY_PREAMBLE inside <relevant-memories> and deduped hardcoded API-key warnings; updated defaults and .gitignore.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/index.ts Implements metadata stripping, retrieval gating changes, extraction window optimization, memory preamble injection, API-key warning dedup, and config/schema updates.
.gitignore Stops ignoring src/ and tsconfig.json so they are tracked again.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/index.ts
// Determine retrieval strategy
const hasQuestion = cleanedPrompt.includes("?");
const isTrivial = TRIVIAL_PATTERNS.test(cleanedPrompt.trim());
const isRelevant = cleanedPrompt && isMemoryRelevant(cleanedPrompt);
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isRelevant is assigned using cleanedPrompt && isMemoryRelevant(cleanedPrompt), which yields "" | boolean rather than a boolean. While it works in the current else if (isRelevant) check, it’s easy to trip over later (e.g., if passed somewhere that expects boolean). Consider making this explicitly boolean (e.g., const isRelevant = cleanedPrompt.length > 0 && isMemoryRelevant(cleanedPrompt)).

Suggested change
const isRelevant = cleanedPrompt && isMemoryRelevant(cleanedPrompt);
const isRelevant = cleanedPrompt.length > 0 && isMemoryRelevant(cleanedPrompt);

Copilot uses AI. Check for mistakes.
Comment thread src/index.ts Outdated
- If a memory might change your approach, ask a clarifying question before proceeding.
- If memories contradict the current request, flag the contradiction.
- Never expose raw item_ids, scores, or metadata to the user.
- To look up more context on any memory, use cortex_search with the item_id.`;
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preamble says to “use cortex_search with the item_id”, but formatMemoryContext() only includes the first 8 characters of item.item_id (e.g. [abcd1234]). That truncated ID is unlikely to be sufficient to search/retrieve the specific memory, so this guidance is not actionable. Either include the full item_id in the injected context (still within <relevant-memories>), or adjust the preamble to recommend searching by content/date/category instead of item_id.

Suggested change
- To look up more context on any memory, use cortex_search with the item_id.`;
- To look up more context on any memory, use cortex_search with a semantic query based on its content, date, salience, or category; do not rely on item_id.`;

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 6 potential issues.

Open in Devin Review

Comment thread src/index.ts Outdated
Comment thread src/index.ts
Comment on lines +1491 to +1505
} else if (isTrivial) {
// Trivial patterns: augment with last assistant turn if available
const sessionKey = ctx.sessionKey ?? "";
const lastAssistant = sessionKey ? lastAssistantTurnCache.get(sessionKey) : undefined;
if (lastAssistant) {
// Take last ~200 chars, trim to nearest word boundary on left edge
let tail = lastAssistant.slice(-200).trim();
const spaceIdx = tail.indexOf(" ");
if (spaceIdx > 0 && spaceIdx < 40) tail = tail.slice(spaceIdx + 1);
const lastTwo = tail;
retrievalQuery = lastTwo + " " + cleanedPrompt;
doRetrieve = true;
maxMemories = 3; // Cap injection for trivial queries
}
// If no cached assistant turn, skip retrieval (no signal)
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: lastAssistantTurnCache only populated when autoCapture=true

The lastAssistantTurnCache (line 1459) is read in the before_agent_start handler (line 1494, inside the autoRecall block) but only populated in the agent_end handler (line 1618-1628), which is registered only when cfg.autoCapture is true (src/index.ts:1606). If a user configures autoRecall=true but autoCapture=false, the trivial-query augmentation feature silently degrades — trivial prompts will never find a cached assistant turn, so retrieval is skipped. This isn't a bug (the system degrades gracefully), but the coupling between these two independent config flags is implicit and could surprise operators.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread src/index.ts
Comment on lines +987 to +995
// Walk backwards from end to collect last 5 user/assistant messages
// Avoids filtering the entire session history (can be 30K+ messages / 70MB+)
const tail: Record<string, unknown>[] = [];
for (let i = rawMessages.length - 1; i >= 0 && tail.length < 5; i--) {
const msg = rawMessages[i];
if (!msg || typeof msg !== "object") continue;
const m = msg as Record<string, unknown>;
if (m.role !== "user" && m.role !== "assistant") continue;
if (m.role === "user" || m.role === "assistant") tail.unshift(m);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: extractMessages reduced from 10 to 5 messages affects capture context

The extractMessages function was changed from rawMessages.slice(-10) (last 10 messages of any type, filtered for user/assistant) to walking backwards for the last 5 user/assistant messages specifically. This is an intentional optimization for large sessions (comment references 30K+ messages / 70MB+), but it reduces the conversation context sent to the Cortex remember API. In conversations with many interleaved tool calls, the new code skips further back to find user/assistant messages (good), but the reduced count means less context for memory extraction. For most conversations this is fine, but for complex multi-step workflows the 5-message window might miss important earlier context that the 10-message window would have captured.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread src/index.ts
Comment on lines +1070 to 1072
if (cfg.apiKey && !isEnvInterpolation(rawConfig?.apiKey) && !_apiKeyWarningEmitted) {
_apiKeyWarningEmitted = true;
api.logger.warn("cortex: API key appears to be hardcoded in config. Consider using environment variable: apiKey: '${CORTEX_API_KEY}'");
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: API key warning fix correctly addresses pre-existing bug

The old code at lines 1017-1018 (old) checked !cfg.apiKey.startsWith("${") on the resolved value. Since resolveEnv replaces ${CORTEX_API_KEY} with the actual env var value (e.g., sk-abc123), the old check would incorrectly warn even when the API key was properly configured via environment variable. The new code (src/index.ts:1070) correctly checks the raw config value using isEnvInterpolation(rawConfig?.apiKey), which tests whether the original config string was a ${...} interpolation before resolution. This is a correct fix. The old code also had a duplicate warning (lines 1017 and 1044 in old code); the new code consolidates into a single check with _apiKeyWarningEmitted deduplication.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread src/index.ts
Comment on lines 952 to +954
const lines: string[] = ["<relevant-memories>"];
let charCount = 0;
lines.push(MEMORY_PREAMBLE);
let charCount = MEMORY_PREAMBLE.length;
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: MEMORY_PREAMBLE consumes ~1211 of 8000 char budget

The MEMORY_PREAMBLE is 1,211 characters long, consuming roughly 15% of the default maxInjectionChars budget (8000). The charCount in formatMemoryContext at src/index.ts:958 is initialized to MEMORY_PREAMBLE.length, and the overhead of the <relevant-memories> tag (~20 chars), newlines between elements, the optional footer line, and the closing tag (~21 chars) are not included in charCount. This means the actual injected text slightly exceeds maxInjectionChars by roughly 60-100 characters. With the default 8000 budget this is negligible (<2%), but users who set a tight maxInjectionChars value close to the preamble length would get no memories injected without understanding why.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread src/index.ts

const MEMORY_KEYWORDS = /\b(remember|forgot|recall|last time|previously|before|earlier|you said|you told|we discussed|we decided|my preference|my name|who am i|what do i|do you know|history|past|memory|memorize|don'?t forget)\b/i;
const TRIVIAL_PATTERNS = /^(hi|hello|hey|thanks|ok|yes|no|sure|bye|good morning|good night|👍|😊)[\s!?.]*$/i;
const TRIVIAL_PATTERNS = /^(hi|hello|hey|thanks|thank you|ok|okay|yes|no|sure|bye|good morning|good night|go ahead|sounds good|do it|got it|lets do it|let's do it|lets go|let's go|cool|nice|great|agreed|done|will do|on it|yep|nope|nah|alright|perfect|roger|absolutely|definitely|for sure|ship it|merge it|deploy it|👍|😊|👌|🙏|✅)[\s!?.]*$/i;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: Expanded TRIVIAL_PATTERNS includes actionable commands like 'deploy it' and 'ship it'

The TRIVIAL_PATTERNS regex at src/index.ts:812 was expanded to include actionable phrases like "do it", "ship it", "merge it", "deploy it". These patterns are used in both isMemoryRelevant (skips retrieval) and the new isTrivial branch (attempts augmentation with last assistant turn). In the old code, these short prompts without memory keywords would also skip retrieval (via the prompt.length < 40 check in isMemoryRelevant). The new code is marginally better: it at least attempts to augment with the last assistant turn for context. The key observation is that these actionable commands are typically confirmations of something the assistant proposed, so the assistant turn context is the most relevant signal for retrieval.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

- Fixed install path (plugins/ not extensions/)
- Updated config table with all schema options
- Rewrote How It Works for new three-gate retrieval pipeline
- Added Local Cache section (FTS5, hybrid search, fallback)
- Updated benchmarks (LoCoMo, AMB, AMA-Bench, MemoryBench)
- Updated self-hosting to point at backend repo
- Created CHANGELOG.md with v1.0.0 through v1.1.0
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

Open in Devin Review

Comment thread src/index.ts
Comment on lines 1552 to 1558
// Server down — fall back to local cache if available
if (memoryCache) {
try {
const fallbackResults = memoryCache.search(event.prompt!, 20);
const fallbackResults = memoryCache.search(retrievalQuery, 20);
if (fallbackResults.length) {
memoryItems = fallbackResults;
api.logger.info(`cortex: using local cache fallback (${fallbackResults.length} results)`);
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Server HTTP error (non-200) silently skips local cache fallback

When client.retrieve() gets a non-200 HTTP response (e.g. 500), the post() method at src/index.ts:178 returns null — the Promise resolves to null rather than rejecting. In the Promise.allSettled result at src/index.ts:1542, this means retrieveResult.status === "fulfilled" with value === null, so retrieveResult.value?.items?.length is falsy and the items branch is skipped. The else if (retrieveResult.status === "rejected") branch at src/index.ts:1550 also doesn't trigger because the status is "fulfilled". The net effect: a server 500 error silently produces zero memories with no fallback to local cache and no warning logged. Only true network errors (fetch rejection, abort signal) trigger the fallback path. This is a pre-existing issue not introduced by this PR, but the new three-gate retrieval logic routes more queries through this path (e.g. trivial-augmented queries), making it slightly more impactful.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread src/index.ts
Comment on lines +1622 to +1627
lastAssistantTurnCache.set(cacheKey, lastAssistantMsg.content.slice(0, 2000));
// Limit cache entries to prevent memory leak
if (lastAssistantTurnCache.size > 100) {
const firstKey = lastAssistantTurnCache.keys().next().value;
if (firstKey) lastAssistantTurnCache.delete(firstKey);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: lastAssistantTurnCache eviction is FIFO not LRU — oldest inserted session evicted even if recently active

The eviction logic at src/index.ts:1624-1627 deletes the first key from the Map when size exceeds 100. JavaScript Maps iterate in insertion order, and Map.set() on an existing key updates the value but does NOT change its insertion order. So if session A was inserted first and is still very active (being .set() on every turn), it remains in its original insertion position and would be evicted first once the cache reaches 101 entries, despite being the most recently updated. For true LRU behavior, the code would need to delete then set on each update to move the key to the end. With 100 slots this is unlikely to cause visible problems, but it's worth noting if the cache size is ever reduced.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Preamble v2 (1,150 chars):
- Branded as 'Cortex memory system'
- Anti-poisoning: 'verify against observed behavior' replaces 'trust as stable'
- Memory-vs-memory contradiction: 'prefer the more recent one or ask'
- LTM/LCM hierarchy: 'current conversation takes priority'
- Cross-session framing: 'signposts' not 'primary source'
- Score interpretation: '>70% strong match, 30-50% tangential'
- Category guidance split: identity separate from preferences/decisions
- Stale goals: 'past deadlines may mean completed or abandoned'

Injection format changes:
- Score percentage added: [78%] as first field
- Item ID moved to end in braces: {49f39413}
- Footer added: 'N of M memories shown — use cortex_search for more'
- Score, date, salience/category, content, {id} ordering

All 35 trace tests pass. Strip regex handles new format cleanly.
Copilot AI review requested due to automatic review settings April 2, 2026 10:36
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

Open in Devin Review

Comment thread src/index.ts
Comment on lines +989 to +990
if (items.length > injectedCount) {
lines.push(`[${injectedCount} of ${items.length} memories shown — use cortex_search for more]`);
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Footer memory count includes score-filtered items, producing misleading agent guidance

The footer in formatMemoryContext compares items.length (the full unfiltered input) against injectedCount (memories that passed score filter + char budget). When some items are below minRelevanceScore, the footer still shows them in the total — e.g. "3 of 10 memories shown — use cortex_search for more" even when all 3 above-threshold memories were injected. This misleads the agent into thinking there are 7 more useful memories to retrieve, potentially wasting a tool call on cortex_search only to find below-threshold results. The comparison should use relevant.length (the score-filtered array at src/index.ts:951-953) instead of items.length.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread src/index.ts
Comment on lines +1495 to +1505
const hasQuestion = cleanedPrompt.includes("?");
const isTrivial = TRIVIAL_PATTERNS.test(cleanedPrompt.trim());
const isRelevant = cleanedPrompt && isMemoryRelevant(cleanedPrompt);

let retrievalQuery = cleanedPrompt;
let maxMemories = cfg.maxInjectedMemories; // default (8)
let doRetrieve = false;

if (hasQuestion) {
// Questions always retrieve
doRetrieve = true;
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: Question mark gate is very broad — URLs and code in prompts trigger full retrieval

The hasQuestion gate at src/index.ts:1495 uses cleanedPrompt.includes('?') which matches any ? character, including those inside URLs (e.g. https://example.com?key=val), code snippets, or escaped characters. Since hasQuestion takes priority over the trivial and relevance gates, prompts containing URLs with query strings would always trigger full memory retrieval with the maximum memory count. This is unlikely to cause harm (extra retrieval is just a small latency cost), but it's worth noting as a heuristic gap.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/index.ts:953

  • formatMemoryContext() filters injected items using (item.score ?? 1.0) >= minScore, which assumes scores are normalized where higher is better. Local-cache fallback results use FTS rank / BM25-style values (lower-is-better and often not in 0–1), so the current filter can drop all fallback items and effectively disable injection when the server is down. Consider normalizing local-cache scores to a 0–1 relevance score before they reach this function, or making the score filter conditional on the score semantics/source (e.g., skip minScore filtering for local_cache* sources or invert/normalize those scores).
  // Filter by relevance score and cap count
  const relevant = items
    .filter(item => (item.score ?? 1.0) >= minScore)
    .slice(0, maxCount);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/index.ts
Comment on lines 956 to 960
const lines: string[] = ["<relevant-memories>"];
let charCount = 0;
lines.push(MEMORY_PREAMBLE);
let charCount = MEMORY_PREAMBLE.length;
let injectedCount = 0;

Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The maxChars budget accounting starts at MEMORY_PREAMBLE.length and then adds line.length, but it does not include the <relevant-memories> header line or newline separators between lines. With the new ~1KB preamble this can overshoot the intended maxInjectionChars budget by a non-trivial amount. Consider including the header + newline characters in charCount (or compute the final string length before returning).

Copilot uses AI. Check for mistakes.
Comment thread README.md
Comment on lines +77 to +79
| `cortexUrl` | `string` | `http://localhost:8000` | Cortex API base URL. |
| `apiKey` | `string` | `""` | API key. Optional for local setups, typically required for hosted deployments. Environment interpolation like `${CORTEX_API_KEY}` is supported. |
| `ownerId` | `string` | `default` | Memory namespace. Isolates memories per user or agent. |
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the configuration table, the default value for apiKey is shown as \"\" (escaped quotes). Inside backticks this renders literally and is confusing; it should be "" without backslashes (or just leave the cell empty and describe the default in the text).

Copilot uses AI. Check for mistakes.
Comment thread README.md
Comment on lines +86 to +88
| `minRelevanceScore` | `number` | `0.30` | Minimum relevance score required for a memory to be injected. |
| `retrievalMode` | `string` | `fast` | Retrieval mode: `auto`, `fast`, or `thorough`. |
| `recencyFilterMinutes` | `number` | `15` | Filters out very recent memories to reduce same-session echo. Set to `0` to disable. |
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README lists retrievalMode default as fast, but the plugin's configSchema.jsonSchema description still says "default: auto" (see src/index.ts around the schema definitions). Please align the schema description and README (either update the schema description to fast or change the actual default).

Copilot uses AI. Check for mistakes.
Single-word question follow-ups (why?, how?, what?, when?, where?, who?,
huh?, really?, seriously?) now use the cached last-assistant-turn augmented
path instead of naked full retrieval. This gives Voyage actual semantic
content to match against instead of a single word with near-zero embedding
information.

Behavior:
- BARE_QUESTION_FOLLOWUP regex checked BEFORE hasQuestion gate
- Augments with last 200 chars of cached assistant turn
- Caps results at 3 (same as trivial path)
- Falls through to full retrieval if no cached turn available

Tested: Q2 retest showed bare 'why?' returning irrelevant 49% matches
on naked retrieval. Augmented path would have used prior answer context.
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

Open in Devin Review

Comment thread src/index.ts
Comment on lines +993 to 996
// Footer: show retrieval count if more were available than injected
if (items.length > injectedCount) {
lines.push(`[${injectedCount} of ${items.length} memories shown — use cortex_search for more]`);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: Footer count compares raw items.length with post-filter injectedCount

In formatMemoryContext (src/index.ts:994), the footer compares items.length (the total items passed in, before minScore filtering and maxCount cap) with injectedCount (items that actually passed score filter, count cap, AND char budget). This means the footer could say "3 of 10 memories shown" where 7 of those 10 were below minScore and would never have been shown. The message "use cortex_search for more" is still valid (the agent can search for more), but the count could be misleading about how many additional relevant memories exist. Consider comparing against relevant.length instead of items.length for a more accurate count.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread src/index.ts
Comment on lines +1508 to +1520
if (BARE_QUESTION_FOLLOWUP.test(cleanedPrompt.trim())) {
// Bare question-word follow-ups: augment with cached assistant turn
const sessionKey = ctx.sessionKey ?? "";
const lastAssistant = sessionKey ? lastAssistantTurnCache.get(sessionKey) : undefined;
if (lastAssistant) {
let tail = lastAssistant.slice(-200).trim();
const spaceIdx = tail.indexOf(" ");
if (spaceIdx > 0 && spaceIdx < 40) tail = tail.slice(spaceIdx + 1);
retrievalQuery = tail + " " + cleanedPrompt;
doRetrieve = true;
maxMemories = 3;
}
// If no cached assistant turn, skip retrieval (no signal)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: Three-gate retrieval: bare question follow-ups skip retrieval without cached context

The BARE_QUESTION_FOLLOWUP pattern (src/index.ts:1508) catches inputs like "why?", "what?", "how?" — which contain ? and would otherwise trigger the hasQuestion branch for full retrieval. Because BARE_QUESTION_FOLLOWUP is checked first, these inputs only retrieve if there's a cached assistant turn from the previous conversation turn. On the first turn of a session (no cache), bare questions silently skip retrieval. This matches the old behavior (the old isMemoryRelevant returned false for short prompts without memory keywords), but could be surprising for users who expect "what?" to trigger memory lookup. The design is intentional per the CHANGELOG — bare question words have near-zero embedding value when searched alone.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants