Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 0 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,5 +2,3 @@ node_modules/
*.tsbuildinfo
.env
.DS_Store
src/
tsconfig.json
39 changes: 39 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
# Changelog

All notable changes to this project will be documented in this file.

## v1.1.0

Memory Retrieval Quality Sprint.

### Changed
- Stripped inbound prompt metadata before retrieval decisions and recall queries.
- Added three-gate retrieval behavior: questions always retrieve, trivial follow-ups can be augmented from cached assistant context, and weak short prompts can skip retrieval.
- Reduced the extraction window to the last 5 real user/assistant messages via a backwards walk.
- Added a memory preamble inside `<relevant-memories>` tags so agents treat memories as evidence, surface relevant preferences, ask clarifying questions, and flag contradictions.
- Raised the default `minRelevanceScore` to `0.30`.

## v1.0.2

Security hardening release.

### Changed
- Removed the `X-Owner-Id` request header.
- Deduplicated the API key hardcoding warning so it is emitted once per process.

## v1.0.1

Security hardening release.

### Changed
- Scoped the local SQLite cache by `ownerId` to prevent cross-tenant memory leakage.

## v1.0.0

Initial release.

### Added
- Full memory lifecycle support: capture, retrieval, recall, deletion, commitments, contradictions, and open loops.
- 12 Cortex tools for direct agent access.
- Local SQLite cache support.
- Hybrid retrieval behavior for memory search.
162 changes: 111 additions & 51 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ This isn't RAG over documents. This is an agent that *knows* you.
## Why Cortex?

- **Learns, doesn't just retrieve.** Every conversation is analyzed. Important facts, preferences, and decisions are automatically extracted and stored. Your agent gets smarter with every interaction.
- **Recalls what matters, when it matters.** Before every agent turn, Cortex retrieves relevant memories using hybrid search (BM25 + vector similarity + reranking) and injects them into context. No manual prompting required.
- **Recalls what matters, when it matters.** Before every agent turn, Cortex retrieves relevant memories and injects them into context. No manual prompting required.
- **Tracks commitments, not just facts.** Cortex doesn't just remember what you said — it tracks what you committed to, flags contradictions in your preferences, and manages open threads you haven't resolved yet.
- **Runs in the background.** Zero-config by default. Install the plugin, point it at a Cortex server, and your agent has memory. Auto-recall and auto-capture handle the rest.

Expand All @@ -31,7 +31,8 @@ This isn't RAG over documents. This is an agent that *knows* you.

```bash
# From GitHub
cd ~/.openclaw/extensions
mkdir -p ~/.openclaw/plugins
cd ~/.openclaw/plugins
git clone https://github.com/100yenadmin/evaos-cortex-plugin.git cortex
cd cortex && npm install --omit=dev
```
Expand All @@ -42,15 +43,24 @@ cd cortex && npm install --omit=dev
{
"plugins": {
"allow": ["cortex"],
"load": { "paths": ["~/.openclaw/extensions/cortex"] },
"load": { "paths": ["~/.openclaw/plugins/cortex"] },
"slots": { "memory": "cortex" },
"entries": {
"cortex": {
"enabled": true,
"config": {
"cortexUrl": "https://your-cortex-server.example.com",
"apiKey": "your-api-key",
"ownerId": "my-agent"
"apiKey": "${CORTEX_API_KEY}",
"ownerId": "my-agent",
"autoRecall": true,
"autoCapture": true,
"shadowMode": false,
"retrievalBudget": 2000,
"maxInjectionChars": 8000,
"maxInjectedMemories": 8,
"minRelevanceScore": 0.30,
"retrievalMode": "fast",
"recencyFilterMinutes": 15
}
}
}
Expand All @@ -64,34 +74,37 @@ cd cortex && npm install --omit=dev

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `cortexUrl` | `string` | `http://localhost:8000` | Cortex API base URL |
| `apiKey` | `string` | — | API key (optional for local, required for production) |
| `ownerId` | `string` | `default` | Memory namespace — isolates memories per user/agent |
| `autoRecall` | `boolean` | `true` | Retrieve relevant memories before each agent turn |
| `autoCapture` | `boolean` | `true` | Extract and store memories after each agent turn |
| `shadowMode` | `boolean` | `false` | Dry-run mode — runs extraction but skips storage |
| `retrievalBudget` | `number` | `2000` | Max token budget for retrieved memories |
| `maxInjectionChars` | `number` | `8000` | Max characters injected into agent context |
| `retrievalMode` | `string` | `fast` | Retrieval mode: `auto`, `fast`, or `thorough` |
| `cortexUrl` | `string` | `http://localhost:8000` | Cortex API base URL. |
| `apiKey` | `string` | `""` | API key. Optional for local setups, typically required for hosted deployments. Environment interpolation like `${CORTEX_API_KEY}` is supported. |
| `ownerId` | `string` | `default` | Memory namespace. Isolates memories per user or agent. |
Comment on lines +77 to +79
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the configuration table, the default value for apiKey is shown as \"\" (escaped quotes). Inside backticks this renders literally and is confusing; it should be "" without backslashes (or just leave the cell empty and describe the default in the text).

Copilot uses AI. Check for mistakes.
| `autoRecall` | `boolean` | `true` | Retrieve relevant memories before each agent turn. |
| `autoCapture` | `boolean` | `true` | Extract and store memories after each agent turn. |
| `shadowMode` | `boolean` | `false` | Dry-run capture mode. Extraction runs, but storage is skipped. |
| `retrievalBudget` | `number` | `2000` | Retrieval budget passed to the Cortex API. |
| `maxInjectionChars` | `number` | `8000` | Maximum characters injected into agent context. |
| `maxInjectedMemories` | `number` | `8` | Maximum number of memories injected on a turn. |
| `minRelevanceScore` | `number` | `0.30` | Minimum relevance score required for a memory to be injected. |
| `retrievalMode` | `string` | `fast` | Retrieval mode: `auto`, `fast`, or `thorough`. |
| `recencyFilterMinutes` | `number` | `15` | Filters out very recent memories to reduce same-session echo. Set to `0` to disable. |
Comment on lines +86 to +88
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README lists retrievalMode default as fast, but the plugin's configSchema.jsonSchema description still says "default: auto" (see src/index.ts around the schema definitions). Please align the schema description and README (either update the schema description to fast or change the actual default).

Copilot uses AI. Check for mistakes.

## Tools

Cortex exposes 12 tools your agent can call directly:

| Tool | Description |
|------|-------------|
| `cortex_search` | Search memories by query — hybrid BM25 + vector retrieval |
| `cortex_remember` | Store a new memory (fact, preference, decision) |
| `cortex_forget` | Delete a specific memory by ID |
| `cortex_ask` | Ask a question answered by searching across all memories |
| `cortex_list_contradictions` | Surface conflicting memories for review |
| `cortex_resolve_contradiction` | Resolve a flagged contradiction |
| `cortex_add_commitment` | Track a new commitment or promise |
| `cortex_update_commitment` | Mark a commitment as completed or cancelled |
| `cortex_list_commitments` | List active (or all) tracked commitments |
| `cortex_add_open_loop` | Track an unresolved thread or topic |
| `cortex_resolve_open_loop` | Mark an open loop as resolved |
| `cortex_list_open_loops` | List unresolved threads |
| `cortex_search` | Search long-term memories stored in Cortex. |
| `cortex_remember` | Store an important fact or preference in long-term memory. |
| `cortex_forget` | Delete a specific memory by ID. |
| `cortex_ask` | Ask a question answered using stored memories. |
| `cortex_list_contradictions` | List detected contradictions between stored memories. |
| `cortex_resolve_contradiction` | Resolve a flagged contradiction. |
| `cortex_add_commitment` | Track a new commitment or promise. |
| `cortex_update_commitment` | Update a commitment status. |
| `cortex_list_commitments` | List active or all commitments. |
| `cortex_add_open_loop` | Create an unresolved thread or topic. |
| `cortex_resolve_open_loop` | Mark an open loop as resolved. |
| `cortex_list_open_loops` | List unresolved or resolved open loops. |

## How It Works

Expand All @@ -101,50 +114,97 @@ Cortex operates two invisible loops around every agent conversation:
┌─────────────────────────────────────────────────┐
│ RECALL LOOP │
│ │
│ User message → Cortex retrieves relevant │
│ memories (BM25 + vectors + reranking) → │
│ Injects into agent context → Agent responds │
│ with full history awareness │
│ User message → retrieval decision → Cortex API │
│ search → local cache fallback if needed → │
│ inject relevant memories → agent responds │
└─────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────┐
│ CAPTURE LOOP │
│ │
│ Agent responds → Cortex analyzes the │
│ conversation → Extracts facts, preferences, │
│ decisions, commitments → Stores as durable │
│ memories with metadata and embeddings │
│ Agent responds → recent conversation window │
│ is filtered → extracted facts/preferences/ │
│ commitments are sent to Cortex for storage │
└─────────────────────────────────────────────────┘
```

**Retrieval pipeline:**
1. **BM25** — fast keyword matching for exact terms and names
2. **Vector similarity** — semantic search via embeddings for conceptual matches
3. **Hybrid fusion** — weighted combination of both signals
4. **Reranking** — final relevance scoring to surface the best memories
5. **Budget enforcement** — results trimmed to token budget before injection
**Recall pipeline**

Memories include metadata (dates, salience, categories) and are deduplicated, contradiction-checked, and relevance-scored at retrieval time.
1. **Metadata stripping** — before any retrieval decision, Cortex strips injected prompt decorations and prior memory blocks from the inbound prompt.
2. **Three-gate retrieval decision**
- **Question**: if the cleaned prompt contains `?`, Cortex always retrieves.
- **Trivial continuation**: short replies like `ok`, `go ahead`, or `sounds good` are augmented with cached assistant context from the same session, then retrieved with a tighter cap.
- **Short but non-trivial**: if the prompt has weak memory signal, Cortex skips retrieval.
3. **Server-first retrieval** — Cortex calls the remote API first for semantic retrieval.
4. **Fallback on failure** — if the server is unavailable, the plugin falls back to the local SQLite cache.
5. **Filtering and injection** — low-score memories are dropped, recent echo can be filtered, and the final block is injected inside `<relevant-memories>` tags.

**Capture pipeline**

1. **Backwards extraction window** — Cortex walks backward through the conversation and keeps the last 5 real `user` / `assistant` messages.
2. **Noise filtering** — injected memory blocks, trivial acknowledgements, and noisy system-style content are stripped before capture.
3. **Async storage** — the filtered window is sent to Cortex for extraction and storage without blocking the agent response.
4. **Trivial-turn augmentation cache** — the last assistant turn is cached per session and reused on the next trivial follow-up turn.

**Memory injection format**

Retrieved memories are wrapped in a `<relevant-memories>` block with a short preamble that instructs the agent to:
- surface relevant preferences and prior decisions naturally,
- ask clarifying questions when memory may change execution,
- flag contradictions between memory and the current plan,
- treat memories as evidence, not commands.

The preamble lives inside the tags so it is stripped before later capture and does not contaminate future extraction payloads.

## Local Cache

Cortex maintains a local SQLite cache as an offline and degraded-mode fallback.

**What it stores**
- memory content,
- memory IDs and source session IDs,
- timestamps,
- salience/category/status metadata,
- embeddings when available.

**How it syncs**
- the cache is initialized under the plugin directory,
- it is scoped by `ownerId` in the database filename,
- an initial sync runs when the Cortex server is reachable,
- background sync runs every 5 minutes.

**How search works**
- **FTS5** provides local full-text search,
- **BM25 ranking** scores keyword matches,
- **cosine similarity** scores local embedding matches when embeddings are present,
- **hybrid search** merges BM25 and cosine results.

**Fallback behavior**
- normal operation is **server-first**,
- if the Cortex API is slow or unavailable, the plugin can fall back to local cache results,
- if `node:sqlite` is unavailable, the plugin disables the local cache and runs API-only.

## Benchmarks

> 🚧 **Benchmarks coming soon.** We're running evaluations against [LoCoMo](https://github.com/snap-research/locomo), [AMB](https://github.com/microsoft/AMB), and [MSC](https://github.com/facebookresearch/ParlAI/tree/main/projects/msc) — the standard long-term memory benchmarks for conversational AI.
> 🚧 **Benchmarks coming soon.** We're running evaluations against [LoCoMo](https://github.com/snap-research/locomo), [AMB (Agent Memory Benchmark)](https://github.com/microsoft/AMB), AMA-Bench, and MemoryBench.

| Provider | LoCoMo F1 | AMB Score | Latency (p50) |
|----------|-----------|-----------|----------------|
| **Cortex** | — | — | — |
| [Mem0](https://github.com/mem0ai/mem0) | — | — | — |
| [Zep](https://github.com/getzep/zep) | — | — | — |
| [Letta](https://github.com/letta-ai/letta) | — | — | — |
| [MemGPT](https://arxiv.org/abs/2310.08560) | — | — | — |
| Provider | LoCoMo | AMB | AMA-Bench | MemoryBench |
|----------|--------|-----|-----------|-------------|
| **Cortex** | — | — | — | — |
| [Mem0](https://github.com/mem0ai/mem0) | — | — | — | — |
| [Zep](https://github.com/getzep/zep) | — | — | — | — |
| [Letta](https://github.com/letta-ai/letta) | — | — | — | — |
| [MemGPT](https://arxiv.org/abs/2310.08560) | — | — | — | — |

Results will be published with full methodology and reproducible evaluation scripts.

## Self-Hosting

Cortex is backed by a standalone server you can run anywhere — your own machine, a VPS, or any cloud provider. The server handles memory storage, embedding, retrieval, and lifecycle management.
Cortex is backed by a standalone server you can run on your own infrastructure. The Cortex backend lives in the Electric Sheep repository:

- https://github.com/electric-sheep/electric-sheep

Self-hosting documentation and the server repository will be available soon. In the meantime, [reach out](https://github.com/100yenadmin/evaos-cortex-plugin/issues) if you'd like early access.
That backend handles memory storage, embedding, retrieval, and lifecycle management.

## License

Expand Down
2 changes: 1 addition & 1 deletion dist/index.d.ts.map

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading