-
Notifications
You must be signed in to change notification settings - Fork 0
feat: memory retrieval quality sprint — metadata strip, trivial augmentation, extraction window, preamble #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
3d22b2b
d6452ca
cc544fa
3184d96
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,5 +2,3 @@ node_modules/ | |
| *.tsbuildinfo | ||
| .env | ||
| .DS_Store | ||
| src/ | ||
| tsconfig.json | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| # Changelog | ||
|
|
||
| All notable changes to this project will be documented in this file. | ||
|
|
||
| ## v1.1.0 | ||
|
|
||
| Memory Retrieval Quality Sprint. | ||
|
|
||
| ### Changed | ||
| - Stripped inbound prompt metadata before retrieval decisions and recall queries. | ||
| - Added three-gate retrieval behavior: questions always retrieve, trivial follow-ups can be augmented from cached assistant context, and weak short prompts can skip retrieval. | ||
| - Reduced the extraction window to the last 5 real user/assistant messages via a backwards walk. | ||
| - Added a memory preamble inside `<relevant-memories>` tags so agents treat memories as evidence, surface relevant preferences, ask clarifying questions, and flag contradictions. | ||
| - Raised the default `minRelevanceScore` to `0.30`. | ||
|
|
||
| ## v1.0.2 | ||
|
|
||
| Security hardening release. | ||
|
|
||
| ### Changed | ||
| - Removed the `X-Owner-Id` request header. | ||
| - Deduplicated the API key hardcoding warning so it is emitted once per process. | ||
|
|
||
| ## v1.0.1 | ||
|
|
||
| Security hardening release. | ||
|
|
||
| ### Changed | ||
| - Scoped the local SQLite cache by `ownerId` to prevent cross-tenant memory leakage. | ||
|
|
||
| ## v1.0.0 | ||
|
|
||
| Initial release. | ||
|
|
||
| ### Added | ||
| - Full memory lifecycle support: capture, retrieval, recall, deletion, commitments, contradictions, and open loops. | ||
| - 12 Cortex tools for direct agent access. | ||
| - Local SQLite cache support. | ||
| - Hybrid retrieval behavior for memory search. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -21,7 +21,7 @@ This isn't RAG over documents. This is an agent that *knows* you. | |
| ## Why Cortex? | ||
|
|
||
| - **Learns, doesn't just retrieve.** Every conversation is analyzed. Important facts, preferences, and decisions are automatically extracted and stored. Your agent gets smarter with every interaction. | ||
| - **Recalls what matters, when it matters.** Before every agent turn, Cortex retrieves relevant memories using hybrid search (BM25 + vector similarity + reranking) and injects them into context. No manual prompting required. | ||
| - **Recalls what matters, when it matters.** Before every agent turn, Cortex retrieves relevant memories and injects them into context. No manual prompting required. | ||
| - **Tracks commitments, not just facts.** Cortex doesn't just remember what you said — it tracks what you committed to, flags contradictions in your preferences, and manages open threads you haven't resolved yet. | ||
| - **Runs in the background.** Zero-config by default. Install the plugin, point it at a Cortex server, and your agent has memory. Auto-recall and auto-capture handle the rest. | ||
|
|
||
|
|
@@ -31,7 +31,8 @@ This isn't RAG over documents. This is an agent that *knows* you. | |
|
|
||
| ```bash | ||
| # From GitHub | ||
| cd ~/.openclaw/extensions | ||
| mkdir -p ~/.openclaw/plugins | ||
| cd ~/.openclaw/plugins | ||
| git clone https://github.com/100yenadmin/evaos-cortex-plugin.git cortex | ||
| cd cortex && npm install --omit=dev | ||
| ``` | ||
|
|
@@ -42,15 +43,24 @@ cd cortex && npm install --omit=dev | |
| { | ||
| "plugins": { | ||
| "allow": ["cortex"], | ||
| "load": { "paths": ["~/.openclaw/extensions/cortex"] }, | ||
| "load": { "paths": ["~/.openclaw/plugins/cortex"] }, | ||
| "slots": { "memory": "cortex" }, | ||
| "entries": { | ||
| "cortex": { | ||
| "enabled": true, | ||
| "config": { | ||
| "cortexUrl": "https://your-cortex-server.example.com", | ||
| "apiKey": "your-api-key", | ||
| "ownerId": "my-agent" | ||
| "apiKey": "${CORTEX_API_KEY}", | ||
| "ownerId": "my-agent", | ||
| "autoRecall": true, | ||
| "autoCapture": true, | ||
| "shadowMode": false, | ||
| "retrievalBudget": 2000, | ||
| "maxInjectionChars": 8000, | ||
| "maxInjectedMemories": 8, | ||
| "minRelevanceScore": 0.30, | ||
| "retrievalMode": "fast", | ||
| "recencyFilterMinutes": 15 | ||
| } | ||
| } | ||
| } | ||
|
|
@@ -64,34 +74,37 @@ cd cortex && npm install --omit=dev | |
|
|
||
| | Option | Type | Default | Description | | ||
| |--------|------|---------|-------------| | ||
| | `cortexUrl` | `string` | `http://localhost:8000` | Cortex API base URL | | ||
| | `apiKey` | `string` | — | API key (optional for local, required for production) | | ||
| | `ownerId` | `string` | `default` | Memory namespace — isolates memories per user/agent | | ||
| | `autoRecall` | `boolean` | `true` | Retrieve relevant memories before each agent turn | | ||
| | `autoCapture` | `boolean` | `true` | Extract and store memories after each agent turn | | ||
| | `shadowMode` | `boolean` | `false` | Dry-run mode — runs extraction but skips storage | | ||
| | `retrievalBudget` | `number` | `2000` | Max token budget for retrieved memories | | ||
| | `maxInjectionChars` | `number` | `8000` | Max characters injected into agent context | | ||
| | `retrievalMode` | `string` | `fast` | Retrieval mode: `auto`, `fast`, or `thorough` | | ||
| | `cortexUrl` | `string` | `http://localhost:8000` | Cortex API base URL. | | ||
| | `apiKey` | `string` | `""` | API key. Optional for local setups, typically required for hosted deployments. Environment interpolation like `${CORTEX_API_KEY}` is supported. | | ||
| | `ownerId` | `string` | `default` | Memory namespace. Isolates memories per user or agent. | | ||
| | `autoRecall` | `boolean` | `true` | Retrieve relevant memories before each agent turn. | | ||
| | `autoCapture` | `boolean` | `true` | Extract and store memories after each agent turn. | | ||
| | `shadowMode` | `boolean` | `false` | Dry-run capture mode. Extraction runs, but storage is skipped. | | ||
| | `retrievalBudget` | `number` | `2000` | Retrieval budget passed to the Cortex API. | | ||
| | `maxInjectionChars` | `number` | `8000` | Maximum characters injected into agent context. | | ||
| | `maxInjectedMemories` | `number` | `8` | Maximum number of memories injected on a turn. | | ||
| | `minRelevanceScore` | `number` | `0.30` | Minimum relevance score required for a memory to be injected. | | ||
| | `retrievalMode` | `string` | `fast` | Retrieval mode: `auto`, `fast`, or `thorough`. | | ||
| | `recencyFilterMinutes` | `number` | `15` | Filters out very recent memories to reduce same-session echo. Set to `0` to disable. | | ||
|
Comment on lines
+86
to
+88
|
||
|
|
||
| ## Tools | ||
|
|
||
| Cortex exposes 12 tools your agent can call directly: | ||
|
|
||
| | Tool | Description | | ||
| |------|-------------| | ||
| | `cortex_search` | Search memories by query — hybrid BM25 + vector retrieval | | ||
| | `cortex_remember` | Store a new memory (fact, preference, decision) | | ||
| | `cortex_forget` | Delete a specific memory by ID | | ||
| | `cortex_ask` | Ask a question answered by searching across all memories | | ||
| | `cortex_list_contradictions` | Surface conflicting memories for review | | ||
| | `cortex_resolve_contradiction` | Resolve a flagged contradiction | | ||
| | `cortex_add_commitment` | Track a new commitment or promise | | ||
| | `cortex_update_commitment` | Mark a commitment as completed or cancelled | | ||
| | `cortex_list_commitments` | List active (or all) tracked commitments | | ||
| | `cortex_add_open_loop` | Track an unresolved thread or topic | | ||
| | `cortex_resolve_open_loop` | Mark an open loop as resolved | | ||
| | `cortex_list_open_loops` | List unresolved threads | | ||
| | `cortex_search` | Search long-term memories stored in Cortex. | | ||
| | `cortex_remember` | Store an important fact or preference in long-term memory. | | ||
| | `cortex_forget` | Delete a specific memory by ID. | | ||
| | `cortex_ask` | Ask a question answered using stored memories. | | ||
| | `cortex_list_contradictions` | List detected contradictions between stored memories. | | ||
| | `cortex_resolve_contradiction` | Resolve a flagged contradiction. | | ||
| | `cortex_add_commitment` | Track a new commitment or promise. | | ||
| | `cortex_update_commitment` | Update a commitment status. | | ||
| | `cortex_list_commitments` | List active or all commitments. | | ||
| | `cortex_add_open_loop` | Create an unresolved thread or topic. | | ||
| | `cortex_resolve_open_loop` | Mark an open loop as resolved. | | ||
| | `cortex_list_open_loops` | List unresolved or resolved open loops. | | ||
|
|
||
| ## How It Works | ||
|
|
||
|
|
@@ -101,50 +114,97 @@ Cortex operates two invisible loops around every agent conversation: | |
| ┌─────────────────────────────────────────────────┐ | ||
| │ RECALL LOOP │ | ||
| │ │ | ||
| │ User message → Cortex retrieves relevant │ | ||
| │ memories (BM25 + vectors + reranking) → │ | ||
| │ Injects into agent context → Agent responds │ | ||
| │ with full history awareness │ | ||
| │ User message → retrieval decision → Cortex API │ | ||
| │ search → local cache fallback if needed → │ | ||
| │ inject relevant memories → agent responds │ | ||
| └─────────────────────────────────────────────────┘ | ||
|
|
||
| ┌─────────────────────────────────────────────────┐ | ||
| │ CAPTURE LOOP │ | ||
| │ │ | ||
| │ Agent responds → Cortex analyzes the │ | ||
| │ conversation → Extracts facts, preferences, │ | ||
| │ decisions, commitments → Stores as durable │ | ||
| │ memories with metadata and embeddings │ | ||
| │ Agent responds → recent conversation window │ | ||
| │ is filtered → extracted facts/preferences/ │ | ||
| │ commitments are sent to Cortex for storage │ | ||
| └─────────────────────────────────────────────────┘ | ||
| ``` | ||
|
|
||
| **Retrieval pipeline:** | ||
| 1. **BM25** — fast keyword matching for exact terms and names | ||
| 2. **Vector similarity** — semantic search via embeddings for conceptual matches | ||
| 3. **Hybrid fusion** — weighted combination of both signals | ||
| 4. **Reranking** — final relevance scoring to surface the best memories | ||
| 5. **Budget enforcement** — results trimmed to token budget before injection | ||
| **Recall pipeline** | ||
|
|
||
| Memories include metadata (dates, salience, categories) and are deduplicated, contradiction-checked, and relevance-scored at retrieval time. | ||
| 1. **Metadata stripping** — before any retrieval decision, Cortex strips injected prompt decorations and prior memory blocks from the inbound prompt. | ||
| 2. **Three-gate retrieval decision** | ||
| - **Question**: if the cleaned prompt contains `?`, Cortex always retrieves. | ||
| - **Trivial continuation**: short replies like `ok`, `go ahead`, or `sounds good` are augmented with cached assistant context from the same session, then retrieved with a tighter cap. | ||
| - **Short but non-trivial**: if the prompt has weak memory signal, Cortex skips retrieval. | ||
| 3. **Server-first retrieval** — Cortex calls the remote API first for semantic retrieval. | ||
| 4. **Fallback on failure** — if the server is unavailable, the plugin falls back to the local SQLite cache. | ||
| 5. **Filtering and injection** — low-score memories are dropped, recent echo can be filtered, and the final block is injected inside `<relevant-memories>` tags. | ||
|
|
||
| **Capture pipeline** | ||
|
|
||
| 1. **Backwards extraction window** — Cortex walks backward through the conversation and keeps the last 5 real `user` / `assistant` messages. | ||
| 2. **Noise filtering** — injected memory blocks, trivial acknowledgements, and noisy system-style content are stripped before capture. | ||
| 3. **Async storage** — the filtered window is sent to Cortex for extraction and storage without blocking the agent response. | ||
| 4. **Trivial-turn augmentation cache** — the last assistant turn is cached per session and reused on the next trivial follow-up turn. | ||
|
|
||
| **Memory injection format** | ||
|
|
||
| Retrieved memories are wrapped in a `<relevant-memories>` block with a short preamble that instructs the agent to: | ||
| - surface relevant preferences and prior decisions naturally, | ||
| - ask clarifying questions when memory may change execution, | ||
| - flag contradictions between memory and the current plan, | ||
| - treat memories as evidence, not commands. | ||
|
|
||
| The preamble lives inside the tags so it is stripped before later capture and does not contaminate future extraction payloads. | ||
|
|
||
| ## Local Cache | ||
|
|
||
| Cortex maintains a local SQLite cache as an offline and degraded-mode fallback. | ||
|
|
||
| **What it stores** | ||
| - memory content, | ||
| - memory IDs and source session IDs, | ||
| - timestamps, | ||
| - salience/category/status metadata, | ||
| - embeddings when available. | ||
|
|
||
| **How it syncs** | ||
| - the cache is initialized under the plugin directory, | ||
| - it is scoped by `ownerId` in the database filename, | ||
| - an initial sync runs when the Cortex server is reachable, | ||
| - background sync runs every 5 minutes. | ||
|
|
||
| **How search works** | ||
| - **FTS5** provides local full-text search, | ||
| - **BM25 ranking** scores keyword matches, | ||
| - **cosine similarity** scores local embedding matches when embeddings are present, | ||
| - **hybrid search** merges BM25 and cosine results. | ||
|
|
||
| **Fallback behavior** | ||
| - normal operation is **server-first**, | ||
| - if the Cortex API is slow or unavailable, the plugin can fall back to local cache results, | ||
| - if `node:sqlite` is unavailable, the plugin disables the local cache and runs API-only. | ||
|
|
||
| ## Benchmarks | ||
|
|
||
| > 🚧 **Benchmarks coming soon.** We're running evaluations against [LoCoMo](https://github.com/snap-research/locomo), [AMB](https://github.com/microsoft/AMB), and [MSC](https://github.com/facebookresearch/ParlAI/tree/main/projects/msc) — the standard long-term memory benchmarks for conversational AI. | ||
| > 🚧 **Benchmarks coming soon.** We're running evaluations against [LoCoMo](https://github.com/snap-research/locomo), [AMB (Agent Memory Benchmark)](https://github.com/microsoft/AMB), AMA-Bench, and MemoryBench. | ||
|
|
||
| | Provider | LoCoMo F1 | AMB Score | Latency (p50) | | ||
| |----------|-----------|-----------|----------------| | ||
| | **Cortex** | — | — | — | | ||
| | [Mem0](https://github.com/mem0ai/mem0) | — | — | — | | ||
| | [Zep](https://github.com/getzep/zep) | — | — | — | | ||
| | [Letta](https://github.com/letta-ai/letta) | — | — | — | | ||
| | [MemGPT](https://arxiv.org/abs/2310.08560) | — | — | — | | ||
| | Provider | LoCoMo | AMB | AMA-Bench | MemoryBench | | ||
| |----------|--------|-----|-----------|-------------| | ||
| | **Cortex** | — | — | — | — | | ||
| | [Mem0](https://github.com/mem0ai/mem0) | — | — | — | — | | ||
| | [Zep](https://github.com/getzep/zep) | — | — | — | — | | ||
| | [Letta](https://github.com/letta-ai/letta) | — | — | — | — | | ||
| | [MemGPT](https://arxiv.org/abs/2310.08560) | — | — | — | — | | ||
|
|
||
| Results will be published with full methodology and reproducible evaluation scripts. | ||
|
|
||
| ## Self-Hosting | ||
|
|
||
| Cortex is backed by a standalone server you can run anywhere — your own machine, a VPS, or any cloud provider. The server handles memory storage, embedding, retrieval, and lifecycle management. | ||
| Cortex is backed by a standalone server you can run on your own infrastructure. The Cortex backend lives in the Electric Sheep repository: | ||
|
|
||
| - https://github.com/electric-sheep/electric-sheep | ||
|
|
||
| Self-hosting documentation and the server repository will be available soon. In the meantime, [reach out](https://github.com/100yenadmin/evaos-cortex-plugin/issues) if you'd like early access. | ||
| That backend handles memory storage, embedding, retrieval, and lifecycle management. | ||
|
|
||
| ## License | ||
|
|
||
|
|
||
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the configuration table, the default value for
apiKeyis shown as\"\"(escaped quotes). Inside backticks this renders literally and is confusing; it should be""without backslashes (or just leave the cell empty and describe the default in the text).