
Commit 0d58152

docs: update all documentation for v0.2.1 adaptive noise filtering
- copilot-instructions.md: add pre-Haiku gates, rejection store, new config fields, new thresholds, gotchas #11 and #12
- CLAUDE.md: expand adaptive quality learning with rejection store details
- README.md: update knowledge capture section, add pipeline config, roadmap
- docs/INTERNALS.md: add pre-Haiku gate steps to section 8.2, new thresholds
- docs/INGEST_PIPELINE.md: add pre-LLM gates and adaptive learning to comparison
- website write-path: add 4 new pre-store filtering stages, adaptive learning
- website quality-loop: add adaptive noise filtering section
- website architecture: add content scoring, top-K, feedback loop design
- website configuration: add pipeline config (ingest_min_len, content_score_pre_gate)
- website intro: mention adaptive filtering in knowledge capture
1 parent 666332e commit 0d58152

13 files changed

Lines changed: 340 additions & 21 deletions

.github/copilot-instructions.md

Lines changed: 37 additions & 8 deletions
@@ -41,7 +41,8 @@ internal/
   atlas.go    Atlas-proper implementation — hybrid search, RRF, MMR re-ranking
   chunker/    Paragraph-boundary text splitting (~512 token chunks)
   redact/     Security scrubbing (AWS keys, API tokens, passwords, PII)
-  quality/    Adaptive learning — tracks retrieval hits, learning threshold
+  quality/    Adaptive learning — tracks retrieval hits, content scoring, noise prototypes
+  rejection/  Ring-buffer of rejected exchanges, adaptive noise learning
   steward/    Background quality maintenance (scoring, pruning, merging)
   ingest/     Source ingestion: crawl → chunk → batch-embed → store
   crawler/    BFS web crawler with SHA256 change detection
@@ -92,13 +93,24 @@ if hs, ok := rp.store.(store.HybridSearcher); ok {

 **Write path** (every response):
 1. Response buffered from SSE stream
-2. Text chunked at paragraph boundaries (~512 tokens)
-3. Noise filtered (< 20 chars, < 40% alphanumeric)
-4. Secrets redacted before embedding
-5. All chunks batch-embedded in single HTTP call to llama.cpp
-6. Each chunk dedup-checked against store (cosine ≥ 0.92 = skip)
-7. Similar-to-source chunks tagged as extensions (cosine ≥ 0.75)
-8. Stored. All of this runs async in a goroutine — zero latency to Claude.
+2. **Pre-Haiku gates** (before any LLM call):
+   a. `QuickFilter` — pure string heuristic rejects procedural exchanges
+   b. Length gate — responses < `ingest_min_len` (default 80 chars) skipped
+   c. Content score gate — raw text embedded & scored against noise prototypes; below `content_score_pre_gate` (default 0.35) → skipped
+3. LLM synthesis gate (`SynthesizeQA`) — Haiku distills or returns "SKIP"
+4. Text chunked at paragraph boundaries (~512 tokens)
+5. Noise filtered (< 20 chars, < 40% alphanumeric)
+6. Secrets redacted before embedding
+7. All chunks batch-embedded in single HTTP call to llama.cpp
+8. Each chunk dedup-checked against store (cosine ≥ 0.92 = skip)
+9. Similar-to-source chunks tagged as extensions (cosine ≥ 0.75)
+10. Stored. All of this runs async in a goroutine — zero latency to Claude.
+
+**Rejection store** (adaptive noise learning):
+- Exchanges rejected by QuickFilter or synthesizer are logged to a ring buffer (500 entries)
+- Every 25 rejections, assistant texts are re-embedded as noise prototypes
+- Hot-swapped into the ContentScorer — the system learns what noise looks like
+- Persisted as JSONL at `~/.memoryd/rejection_log.jsonl`

 **Steward** (hourly background sweep):
 1. Score memories: `log2(hit_count + 1) / log2(maxHits + 1) × 0.5^(timeSinceRetrieval / 7d)`
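
For illustration, the steward score on the context line above combines a usage term with a 7-day-half-life recency term. A minimal Go sketch of that formula, using assumed names (`memoryScore`, `sinceRetrieval`) rather than the actual steward API:

```go
package steward

import (
	"math"
	"time"
)

// memoryScore sketches the documented formula:
// log2(hit_count + 1) / log2(maxHits + 1) * 0.5^(timeSinceRetrieval / 7d)
func memoryScore(hitCount, maxHits int, sinceRetrieval time.Duration) float64 {
	if maxHits <= 0 {
		return 0 // no retrievals recorded yet; nothing to normalize against
	}
	usage := math.Log2(float64(hitCount)+1) / math.Log2(float64(maxHits)+1)
	recency := math.Pow(0.5, sinceRetrieval.Hours()/(7*24))
	return usage * recency
}
```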
@@ -174,6 +186,12 @@ type Embedder interface {
 | `RetrievalTopK` | 5 | config default | Memories per search |
 | `RetrievalMaxTokens` | 2048 | config default | Context budget for injection |
 | `QualityLearningThreshold` | 50 | quality/ | Retrievals before quality filtering activates |
+| `IngestMinLen` | 80 | config/PipelineConfig | Responses shorter than this skip Haiku entirely |
+| `ContentScorePreGate` | 0.35 | config/PipelineConfig | Pre-Haiku noise gate: below this → skip |
+| `noiseTopK` | 3 | quality/content.go | Top-K noise prototypes used in scoring (prevents dilution) |
+| `maxRejectionProtos` | 150 | quality/content.go | Max rejection texts used as noise prototypes |
+| `RebuildEvery` | 25 | rejection/store.go | Rejections between scorer rebuilds |
+| `DefaultMaxSize` | 500 | rejection/store.go | Ring buffer capacity |
 | `PruneThreshold` | 0.1 | steward config | Score below which memories get pruned |
 | `PruneGracePeriod` | 24h | steward config | Minimum age before pruning eligible |
 | `DecayHalfLife` | 90d | steward config | Unretrieved memory score half-life |
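
To make the ordering of the two new pre-Haiku thresholds concrete, here is a hedged sketch of the gate sequence. `quickFilter` and `preScore` are stand-ins for the real heuristic and ContentScorer, passed in as functions so the snippet stays self-contained; the real gates live in proxy/ and quality/content.go.

```go
package proxy

// shouldCallHaiku applies the three pre-Haiku gates in order: string
// heuristic, length gate (IngestMinLen), adaptive noise gate (ContentScorePreGate).
func shouldCallHaiku(
	userMsg, assistantMsg string,
	ingestMinLen int, // default 80
	preGate float64, // default 0.35
	quickFilter func(user, assistant string) bool,
	preScore func(text string) float64,
) bool {
	if quickFilter(userMsg, assistantMsg) {
		return false // procedural exchange; this rejection feeds the rejection store
	}
	if len(assistantMsg) < ingestMinLen {
		return false // too short to be worth an LLM call
	}
	if preScore(assistantMsg) < preGate {
		return false // reads like known noise; deliberately does not feed the store
	}
	return true // worth spending a Haiku call on
}
```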
@@ -204,6 +222,10 @@ steward:
   decay_half_days: 90
   merge_threshold: 0.88
   batch_size: 500
+
+pipeline:
+  ingest_min_len: 80            # responses < this skip Haiku entirely
+  content_score_pre_gate: 0.35  # pre-Haiku noise gate threshold
 ```

 ---
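
The new `pipeline:` block above plausibly deserializes into a small config struct; a sketch assuming yaml tags matching the example keys, not the verbatim `config` source:

```go
package config

// PipelineConfig backs the pipeline: block in config.yaml. Field names
// follow the thresholds table; the yaml tags are assumptions.
type PipelineConfig struct {
	IngestMinLen        int     `yaml:"ingest_min_len"`         // default 80
	ContentScorePreGate float64 `yaml:"content_score_pre_gate"` // default 0.35
}
```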
@@ -324,6 +346,10 @@ export ANTHROPIC_BASE_URL=http://127.0.0.1:7432 # point Claude Code at it

 10. **Graceful shutdown.** The daemon catches SIGINT/SIGTERM, stops the steward, stops the HTTP server, then cancels context. The order matters.

+11. **Content score pre-gate does NOT feed rejection store.** Exchanges filtered by the content score gate are NOT added to the rejection store — only QuickFilter and synthesizer rejections feed back. This prevents a positive feedback loop where the scorer would amplify its own noise signal.
+
+12. **Top-K noise scoring, not averaging.** The ContentScorer uses the top-3 most similar noise prototypes, not the average of all. When the rejection store grows to 150+ entries, averaging would converge to a constant, destroying discriminative power.
+
 ---

 ## Memory Data Model
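
Gotcha 12 above is the subtle one, so here is a minimal sketch of top-K noise scoring. `cosineSim` is defined inline and vectors are assumed pre-normalized; this is illustrative, not the `quality/content.go` source.

```go
package quality

import "sort"

// noiseScore averages similarity over only the k most similar prototypes
// (k = noiseTopK = 3); averaging all 150+ would flatten toward a constant.
func noiseScore(embedding []float32, protos [][]float32, k int) float64 {
	if len(protos) == 0 {
		return 0
	}
	sims := make([]float64, 0, len(protos))
	for _, p := range protos {
		sims = append(sims, cosineSim(embedding, p))
	}
	sort.Sort(sort.Reverse(sort.Float64Slice(sims))) // most similar first
	if len(sims) > k {
		sims = sims[:k]
	}
	sum := 0.0
	for _, s := range sims {
		sum += s
	}
	return sum / float64(len(sims))
}

func cosineSim(a, b []float32) float64 { // plain dot product; unit-length inputs assumed
	var dot float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
	}
	return dot
}
```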
@@ -370,3 +396,6 @@ Additional collections: `retrieval_events`, `sources`, `source_pages`
 - Write path changes go in `pipeline/write.go`
 - Context formatting in `pipeline/inject.go`
 - Keep write path async — never block the response to Claude
+- Pre-Haiku gate changes go in `proxy/anthropic.go` and `proxy/api.go`
+- Rejection store logic is in `rejection/store.go`
+- Content scoring prototypes and noise learning are in `quality/content.go`

CLAUDE.md

Lines changed: 2 additions & 0 deletions
@@ -58,3 +58,5 @@ When you store a new memory that's similar (but not identical) to an existing so
 ## Adaptive quality learning

 The system tracks which memories get retrieved and how often. While in "learning mode" (< 50 retrieval events), it keeps everything. Use `quality_stats` to check the current learning status. Over time, memories that are never retrieved will score lower, helping the system learn what's worth keeping.
+
+The system also learns what **noise** looks like. Exchanges rejected by the pre-filter or synthesizer are accumulated in a ring buffer. Every 25 rejections, the assistant texts are re-embedded as noise prototypes and hot-swapped into the content scorer. This means the system adapts to your team's specific noise patterns — the more it sees procedural chatter, the better it gets at filtering it before spending an LLM call.
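
A minimal sketch of that accumulate-then-rebuild loop, with assumed names (`Store.Add`, `rebuildScorer`) rather than the actual `rejection/store.go` API:

```go
package rejection

// Store sketches the rejection ring buffer: capacity 500 (DefaultMaxSize),
// oldest entry dropped when full, scorer rebuilt every 25 rejections (RebuildEvery).
type Store struct {
	entries []string // rejected assistant texts
	max     int      // 500
	count   int      // total rejections seen
}

func (s *Store) Add(text string, rebuildScorer func(texts []string)) {
	if len(s.entries) >= s.max {
		s.entries = s.entries[1:] // ring-buffer overwrite of the oldest entry
	}
	s.entries = append(s.entries, text)
	s.count++
	if s.count%25 == 0 {
		rebuildScorer(s.entries) // re-embed as noise prototypes, hot-swap into scorer
	}
}
```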

README.md

Lines changed: 8 additions & 2 deletions
@@ -70,7 +70,7 @@ See the **[Getting Started](https://memory-daemon.github.io/memoryd/getting-star

 ### [Knowledge Capture](https://memory-daemon.github.io/memoryd/how-it-works/write-path)

-Every AI response is captured asynchronously (zero latency impact), broken into meaningful pieces, scrubbed of secrets (API keys, tokens, passwords — 13 detection patterns), deduplicated, and stored in the shared database.
+Every AI response is captured asynchronously (zero latency impact), passed through a multi-stage quality filter (length gate, adaptive content scoring, LLM synthesis gate), scrubbed of secrets (API keys, tokens, passwords — 13 detection patterns), deduplicated, and stored in the shared database. The system learns what noise looks like from rejected exchanges, improving filtering accuracy over time.

 ### [Context Retrieval](https://memory-daemon.github.io/memoryd/how-it-works/read-path)

@@ -112,7 +112,8 @@ internal/
   mongo.go    MongoDB implementation
   atlas.go    Atlas hybrid search (vector + text + RRF + MMR)
   redact/     Secret scrubbing (13 patterns)
-  quality/    Usage tracking and quality scoring
+  quality/    Usage tracking, content scoring, adaptive noise learning
+  rejection/  Rejection store — ring buffer for adaptive noise prototype learning
   steward/    Background maintenance (score → prune → merge)
   ingest/     Source ingestion and change detection
   crawler/    Web crawler with change detection
@@ -152,6 +153,10 @@ steward:
   prune_threshold: 0.1
   merge_threshold: 0.88
   decay_half_days: 90
+
+pipeline:
+  ingest_min_len: 80            # Skip short responses before LLM call
+  content_score_pre_gate: 0.35  # Adaptive noise score threshold
 ```

 See the full **[Configuration Reference](https://memory-daemon.github.io/memoryd/configuration)**.
@@ -181,6 +186,7 @@ cd website && npm start
 - [x] Quality maintenance (scoring, pruning, merging)
 - [x] Atlas hybrid search (vector + text + RRF + MMR)
 - [x] Secret scrubbing (13 detection patterns)
+- [x] Adaptive noise filtering (pre-Haiku gates, rejection-based learning)
 - [x] Documentation site
 - [x] macOS menu bar app
 - [ ] Team-scoped knowledge (overlapping layers per team/BU)

docs/INGEST_PIPELINE.md

Lines changed: 2 additions & 0 deletions
@@ -647,6 +647,8 @@ By redacting before embedding, the vector captures the semantic meaning of the s
 | **Change detection** | SHA-256 per page/file | Not applicable (each response is new) |
 | **Redaction** | Yes — `redact.Clean()` per section | Yes — `redact.Clean()` per chunk |
 | **Noise filtering** | Drop sections < 30 chars | Drop chunks < 20 chars or < 40% alphanumeric |
+| **Pre-LLM gates** | None — ingested content is assumed worth embedding | 3-stage: QuickFilter → length gate (< 80 chars) → content score gate (< 0.35) |
+| **Adaptive learning** | None | Rejection store feeds noise prototypes back into content scorer |
 | **Embedding** | Batch per page (all sections in one call) | Single or batch per response |
 | **Execution** | Async goroutine, 30-min timeout | Async goroutine, fire-and-forget |

docs/INTERNALS.md

Lines changed: 10 additions & 4 deletions
@@ -880,10 +880,12 @@ When LLM synthesis is enabled, the proxy does more than raw capture:

 **Per-exchange (`ingest()`):**
 1. Extract the last user message from the request
-2. **Pre-filter:** `QuickFilter()` checks if both user and assistant messages are procedural → reject immediately
-3. **LLM quality gate:** `SynthesizeQA()` asks the model to distill or return `"SKIP"` → reject if no durable value
-4. **Store:** Distilled entry goes through `ProcessDirect()` (no chunking, already formatted)
-5. **Fallback:** If no synthesizer, store raw Q&A pair
+2. **Pre-filter:** `QuickFilter()` checks if both user and assistant messages are procedural → reject immediately (feeds rejection store)
+3. **Length gate:** Responses shorter than `ingest_min_len` (default 80 chars) are skipped — no LLM call
+4. **Content score gate:** Raw assistant text is embedded and scored against noise prototypes via `PreScore()`. Below `content_score_pre_gate` (default 0.35) → skipped. Does NOT feed rejection store (prevents positive feedback loop)
+5. **LLM quality gate:** `SynthesizeQA()` asks the model to distill or return `"SKIP"` → reject if no durable value (feeds rejection store)
+6. **Store:** Distilled entry goes through `ProcessDirect()` (no chunking, already formatted)
+7. **Fallback:** If no synthesizer, store raw Q&A pair

 **Session synthesis:**
 - Fired at 3 complete Q&A pairs, then every 5 pairs after
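
The session-synthesis cadence ("at 3 pairs, then every 5") reduces to a small predicate. A sketch with an assumed name, consistent with `sessionSynthesisInterval` = 5 in the thresholds table below, not the `proxy/anthropic.go` source:

```go
package proxy

// shouldSynthesizeSession fires at the 3rd complete Q&A pair, then every
// 5th pair after that: pairs 3, 8, 13, 18, ...
func shouldSynthesizeSession(completePairs int) bool {
	if completePairs < 3 {
		return false
	}
	return (completePairs-3)%5 == 0
}
```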
@@ -990,6 +992,10 @@ Created automatically on first run with sensible defaults. All pipeline threshol
 | `sessionSynthesisInterval` | 5 | proxy/anthropic.go | Pairs between subsequent summaries |
 | `RebuildEvery` (rejection) | 25 | rejection/store.go | Rejections between scorer rebuilds |
 | `DefaultMaxSize` (rejection) | 500 | rejection/store.go | Max entries in rejection ring buffer |
+| `IngestMinLen` | 80 | config/PipelineConfig | Responses shorter than this skip Haiku entirely |
+| `ContentScorePreGate` | 0.35 | config/PipelineConfig | Pre-Haiku noise gate: below this → skip |
+| `noiseTopK` | 3 | quality/content.go | Top-K noise prototypes used in scoring |
+| `maxRejectionProtos` | 150 | quality/content.go | Max rejection texts used as noise prototypes |

 ---

scripts/analyze_hf_dataset.py

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@ (new file)

```python
#!/usr/bin/env python3
"""Analyze the HF dataset for diversity and create a larger benchmark dataset."""
import json
import random
import collections
import sys


def analyze():
    rows = []
    with open("data/dataset-hf.jsonl") as f:
        for line in f:
            rows.append(json.loads(line))

    print(f"Total rows: {len(rows)}")

    # Check diversity by looking at user_prompt first 50 chars
    prefixes = collections.Counter()
    for r in rows:
        p = r.get("user_prompt", "")[:50]
        prefixes[p] += 1

    print(f"Unique prompt prefixes (50ch): {len(prefixes)}")
    print("Top 10:")
    for prefix, count in prefixes.most_common(10):
        print(f"  {count:>5}x {repr(prefix[:60])}")

    # Check response length distribution
    lens = [len(r.get("assistant_response", "")) for r in rows]
    lens.sort()
    print(f"\nResponse length: min={lens[0]}, p25={lens[len(lens)//4]}, "
          f"median={lens[len(lens)//2]}, p75={lens[3*len(lens)//4]}, max={lens[-1]}")

    # Check for content type diversity in random 1000
    random.seed(42)
    sample = random.sample(rows, 1000)
    short = sum(1 for r in sample if len(r.get("assistant_response", "")) < 80)
    medium = sum(1 for r in sample if 80 <= len(r.get("assistant_response", "")) < 500)
    long_resp = sum(1 for r in sample if len(r.get("assistant_response", "")) >= 500)
    print(f"\nRandom 1000 sample: short(<80)={short}, medium(80-500)={medium}, long(500+)={long_resp}")

    # Check for actual content patterns in the sample
    ack_patterns = ["Sure", "I'll", "Let me", "Here", "OK", "Done", "Got it", "Understood"]
    ack_count = 0
    code_count = 0
    for r in sample:
        resp = r.get("assistant_response", "")
        if any(resp.strip().startswith(p) for p in ack_patterns) and len(resp) < 200:
            ack_count += 1
        if "```" in resp or "func " in resp or "def " in resp or "class " in resp:
            code_count += 1

    print(f"Acknowledgments (short + starts with ack pattern): {ack_count}")
    print(f"Contains code blocks/definitions: {code_count}")

    # Show sample of short responses
    print("\nSample short responses:")
    short_samples = [r for r in sample if len(r.get("assistant_response", "")) < 100]
    random.shuffle(short_samples)
    for r in short_samples[:10]:
        resp = r.get("assistant_response", "").strip()[:100]
        print(f"  [{len(r['assistant_response']):>4}ch] {repr(resp)}")

    # Show some medium responses
    print("\nSample medium responses:")
    med_samples = [r for r in sample if 200 <= len(r.get("assistant_response", "")) < 600]
    random.shuffle(med_samples)
    for r in med_samples[:5]:
        resp = r.get("assistant_response", "").strip()[:120]
        print(f"  [{len(r['assistant_response']):>4}ch] {repr(resp)}")


if __name__ == "__main__":
    analyze()
```

scripts/clean_analysis.py

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@ (new file)

```python
#!/usr/bin/env python3
"""Clean cross-run analysis excluding errors."""
import json

dataset = []
with open("data/eval-large.jsonl") as f:
    for line in f:
        dataset.append(json.loads(line))

for run_idx, fname in enumerate(["benchmark-1k-r1.jsonl", "benchmark-1k-r2.jsonl", "benchmark-1k-r3.jsonl"], 1):
    results = []
    with open(fname) as fh:
        for line in fh:
            results.append(json.loads(line))

    valid = [r for r in results if r["stage"] != "error"]
    total_valid = len(valid)

    stages = {}
    for r in valid:
        stages[r["stage"]] = stages.get(r["stage"], 0) + 1

    pre_haiku = stages.get("pre_filter", 0) + stages.get("length_filter", 0) + stages.get("content_score_filter", 0)
    haiku_calls = stages.get("synthesizer_skip", 0) + stages.get("stored", 0)

    mixed_sub_total = 0
    mixed_sub_stored = 0
    for r in valid:
        idx = r["index"]
        lbl = dataset[idx].get("label", "")
        resp = dataset[idx].get("assistant_response", "")
        prompt = dataset[idx].get("user_prompt", "")
        is_mixed = "hyperswitch" not in resp.lower() and "hyperswitch" not in prompt.lower()
        if lbl == "substantive" and is_mixed:
            mixed_sub_total += 1
            if r["stage"] == "stored":
                mixed_sub_stored += 1

    print(f"Run {run_idx} (excl {len(results)-len(valid)} errors):")
    print(f"  Valid entries: {total_valid}")
    print(f"  Pre-Haiku: {pre_haiku} ({pre_haiku/total_valid*100:.0f}%)")
    print(f"    length_filter: {stages.get('length_filter', 0)}")
    print(f"    content_score: {stages.get('content_score_filter', 0)}")
    print(f"  Haiku calls: {haiku_calls} ({haiku_calls/total_valid*100:.0f}%)")
    print(f"    stored: {stages.get('stored', 0)}")
    print(f"    synth_skip: {stages.get('synthesizer_skip', 0)}")
    print(f"  Hand-crafted substantive recall: {mixed_sub_stored}/{mixed_sub_total} ({mixed_sub_stored/max(mixed_sub_total,1)*100:.0f}%)")
    print()
```

scripts/detailed_cross_run.py

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@ (new file)

```python
#!/usr/bin/env python3
"""Detailed cross-run analysis for the 1000-row benchmark."""
import json, sys, collections

dataset_path = "data/eval-large.jsonl"
run_files = ["benchmark-large1.jsonl", "benchmark-large2.jsonl", "benchmark-large3.jsonl"]

# Load ground-truth labels and origin
labels = {}
origins = {}  # "mixed" or "hf"
with open(dataset_path) as f:
    for i, line in enumerate(f):
        d = json.loads(line)
        labels[i] = d.get("label", "unknown")
        # Hand-crafted entries have shorter responses (< 2000 chars typically)
        # and specific labels, while HF responses can be very long.
        # We can identify by checking if user_prompt is from eval-mixed patterns.
        resp = d.get("assistant_response", "")
        # Simple heuristic: eval-mixed entries don't contain hyperswitch references
        if "hyperswitch" in resp.lower() or "hyperswitch" in d.get("user_prompt", "").lower():
            origins[i] = "hf"
        elif len(resp) < 3000 and labels[i] in ("noise", "low", "substantive"):
            origins[i] = "mixed"  # likely hand-crafted
        else:
            origins[i] = "hf"

for run_idx, run_file in enumerate(run_files, 1):
    results = []
    with open(run_file) as f:
        for line in f:
            results.append(json.loads(line))

    print(f"{'='*60}")
    print(f"RUN {run_idx}: {run_file}")
    print(f"{'='*60}")

    # Stage counts
    stages = collections.Counter(r["stage"] for r in results)
    print(f"\nStage distribution:")
    for stage in ["pre_filter", "length_filter", "content_score_filter", "synthesizer_skip", "stored", "error"]:
        print(f"  {stage:<24} {stages.get(stage, 0):>4}")

    # Count Haiku calls = total - pre_filter - length_filter - content_score_filter
    pre_haiku = stages.get("pre_filter", 0) + stages.get("length_filter", 0) + stages.get("content_score_filter", 0)
    haiku_calls = len(results) - pre_haiku - stages.get("error", 0)
    print(f"\n  Pre-Haiku filtered: {pre_haiku}")
    print(f"  Haiku calls: {haiku_calls}")

    # Substantive recall split by origin
    print(f"\nSubstantive recall by origin:")
    for origin in ["mixed", "hf"]:
        stored = 0
        total = 0
        for r in results:
            idx = r["index"]
            if labels.get(idx) == "substantive" and origins.get(idx) == origin:
                total += 1
                if r["stage"] == "stored":
                    stored += 1
        if total > 0:
            print(f"  {origin:>5}: {stored}/{total} stored ({stored/total*100:.0f}%)")

    # Filtered substantive by stage, split by origin
    print(f"\nFiltered substantive by stage+origin:")
    for origin in ["mixed", "hf"]:
        stage_counts = collections.Counter()
        for r in results:
            idx = r["index"]
            if labels.get(idx) == "substantive" and origins.get(idx) == origin and r["stage"] != "stored":
                stage_counts[r["stage"]] += 1
        if stage_counts:
            print(f"  {origin}: {dict(stage_counts)}")

    print()
```
