feat: add unified library index, pattern fingerprints, and chunk preview for smarter agent navigation #19

Merged
barkain merged 15 commits into main from claude/research-idea-improvements-4PJXx
Apr 5, 2026

@barkain barkain commented Apr 3, 2026

Introduces three new library-wide data structures that transform how agents
navigate knowledge:

  • library_index.json: unified cross-book/corpus concept index with aliases,
    related concepts, and pattern fingerprints — one read covers the entire library
  • pattern_index.json: reverse index from abstract structural patterns to concepts,
    enabling cross-domain associative recall ("this reminds me of...")
  • chunk_index.json: per-book chunk preview metadata (section, concepts, tokens,
    prev/next chains) so agents can assess chunks before reading them

Also adds concept relationship edges (related field) and pattern tags to the
concept extraction prompt, three new MCP tools (search_library, explore_patterns,
preview_chunks), updated agent/skill prompts for the new navigation flow, and
comprehensive tests for all new functionality.
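As a rough illustration of how the three indices might relate, here is a minimal sketch. Only the field names mentioned in the description (aliases, related, patterns, section, concepts, tokens, prev/next) and the `concept`/`source`/`chunks` keys of a pattern entry come from this PR; the `sources` layout, the sample concepts, and the helper `analogous_concepts` are assumptions made for the example.

```python
# Hypothetical index shapes; the real JSON layout may differ.
library_index = {
    "backpressure": {
        "aliases": ["flow control"],
        "related": ["bounded queue"],
        "patterns": ["feedback-loop"],
        "sources": {"book-a": ["ch03-s02-001"]},  # assumed layout
    },
}
pattern_index = {
    "feedback-loop": [
        {"concept": "backpressure", "source": "book-a", "chunks": ["ch03-s02-001"]},
        {"concept": "homeostasis", "source": "book-b", "chunks": ["ch07-s01-004"]},
    ],
}
chunk_index = {  # one such mapping per book
    "ch03-s02-001": {
        "section": "3.2 Flow control",
        "concepts": ["backpressure"],
        "tokens": 412,
        "prev": None,
        "next": "ch03-s02-002",
    },
}


def analogous_concepts(concept: str) -> list[str]:
    """Follow a concept's pattern tags to other concepts sharing a pattern."""
    found: list[str] = []
    for pattern in library_index.get(concept, {}).get("patterns", []):
        for entry in pattern_index.get(pattern, []):
            name = entry["concept"]
            if name != concept and name not in found:
                found.append(name)
    return found
```

With this sample data, `analogous_concepts("backpressure")` would surface `homeostasis` from a different book, which is the kind of cross-domain recall the pattern index is meant to enable.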

https://claude.ai/code/session_01EwryWxbML8dmiSrDsnU5mq

claude added 2 commits April 3, 2026 15:46
…iew for smarter agent navigation

Introduces three new library-wide data structures that transform how agents
navigate knowledge:

- library_index.json: unified cross-book/corpus concept index with aliases,
  related concepts, and pattern fingerprints — one read covers the entire library
- pattern_index.json: reverse index from abstract structural patterns to concepts,
  enabling cross-domain associative recall ("this reminds me of...")
- chunk_index.json: per-book chunk preview metadata (section, concepts, tokens,
  prev/next chains) so agents can assess chunks before reading them

Also adds concept relationship edges (related field) and pattern tags to the
concept extraction prompt, three new MCP tools (search_library, explore_patterns,
preview_chunks), updated agent/skill prompts for the new navigation flow, and
comprehensive tests for all new functionality.

https://claude.ai/code/session_01EwryWxbML8dmiSrDsnU5mq

Update documentation to reflect the new unified library index, pattern
fingerprints, and chunk preview system. Updated sections include the
navigation flow diagram, metadata layers, library structure, querying
description, and ingestion command docs.

https://claude.ai/code/session_01EwryWxbML8dmiSrDsnU5mq
@qodo-code-review

Review Summary by Qodo

Add unified library index, pattern fingerprints, and chunk preview for smarter agent navigation

✨ Enhancement

Walkthroughs

Description
• Introduces three unified library-wide data structures for intelligent cross-library navigation
  - library_index.json: unified concept index with aliases, related concepts, and pattern
  fingerprints across all books and corpora
  - pattern_index.json: reverse index enabling cross-domain discovery via abstract structural
  patterns
  - chunk_index.json: per-book chunk preview metadata (section, concepts, tokens, prev/next chains)
• Adds pattern fingerprints to concept extraction with seed vocabulary for consistency
• Implements three new MCP tools (search_library, explore_patterns, preview_chunks) for
  smarter navigation
• Updates agent and skill prompts to leverage unified indices for faster, more intelligent queries
• Adds comprehensive tests and documentation for all new navigation features
Diagram
flowchart LR
  Q["User Query"] --> LI["library_index.json<br/>ALL concepts, ALL sources"]
  LI --> M{"Match found?"}
  M -->|yes| CI["chunk_index.json<br/>preview metadata"]
  M -->|no| PI["pattern_index.json<br/>cross-domain patterns"]
  PI --> CI
  CI --> CH["chunks<br/>300-500 tokens"]
  CH --> A["Answer with citations"]
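The flow in the diagram could be sketched roughly as follows. This is not the server's implementation: `plan_chunks`, the `sources` key, the per-book `chunk_indexes` mapping, and the `max_tokens` budget are all assumptions chosen to mirror the diagram (library index first, pattern index as fallback, chunk preview metadata before any chunk is read).

```python
def plan_chunks(query, library_index, pattern_index, chunk_indexes, max_tokens=1500):
    """Return (source, chunk_id) pairs worth reading, within a token budget."""
    q = query.lower()
    candidates = []

    # Step 1: match the query against concept names and aliases.
    for name, entry in library_index.items():
        labels = [name, *entry.get("aliases", [])]
        if any(q in label.lower() for label in labels):
            for source, chunk_ids in entry.get("sources", {}).items():
                candidates.extend((source, cid) for cid in chunk_ids)

    # Step 2: no direct match, so fall back to pattern tags.
    if not candidates:
        for pattern, entries in pattern_index.items():
            if q in pattern.lower():
                for e in entries:
                    candidates.extend((e["source"], cid) for cid in e["chunks"])

    # Step 3: use preview metadata to stay inside the token budget.
    selected, used = [], 0
    for source, cid in candidates:
        meta = chunk_indexes.get(source, {}).get(cid)
        if meta is None or (source, cid) in selected:
            continue
        if used + meta["tokens"] > max_tokens:
            continue
        selected.append((source, cid))
        used += meta["tokens"]
    return selected
```

The point of the preview step is that chunk bodies are never opened here; only the small metadata records decide what gets read.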

File Changes

1. lib/models.py ✨ Enhancement +158/-0
   Add library-wide navigation data structures

2. lib/storage.py ✨ Enhancement +112/-14
   Implement I/O for unified indices and chunk metadata

3. lib/summariser.py ✨ Enhancement +50/-3
   Extract patterns and related concepts from content

4. preprocessing/books.py ✨ Enhancement +178/-0
   Build chunk index and update library indices on ingestion

5. preprocessing/corpus.py ✨ Enhancement +101/-1
   Update corpus ingestion to populate unified indices

6. server.py ✨ Enhancement +74/-0
   Add three new MCP tools for library navigation

7. tests/conftest.py 🧪 Tests +9/-2
   Add pattern and related fields to test fixtures

8. tests/test_server.py 🧪 Tests +110/-2
   Add tests for new search, pattern, and preview tools

9. tests/test_storage.py 🧪 Tests +106/-1
   Add tests for chunk, library, and pattern index I/O

10. tests/test_summariser.py 🧪 Tests +16/-0
    Add pattern and related concept parsing tests

11. README.md 📝 Documentation +50/-24
    Update navigation flow and document new indices

12. agents/library-researcher.md 📝 Documentation +33/-16
    Revise agent prompt for unified library search workflow

13. skills/agentlib-knowledge/SKILL.md 📝 Documentation +27/-28
    Update skill documentation for new navigation paths

14. commands/agentlib-ingest-book.md 📝 Documentation +5/-3
    Document new ingestion outputs and unified index updates

15. commands/agentlib-ingest-corpus.md 📝 Documentation +3/-2
    Document corpus ingestion updates to unified indices

16. pyproject.toml ⚙️ Configuration changes +1/-1
    Bump version to 1.8.0

qodo-code-review Bot commented Apr 3, 2026

Code Review by Qodo


Action required

1. Concept metadata dropped 🐞 Bug ≡ Correctness
Description
Book ingestion discards LLM-extracted patterns and cannot persist related, so concepts.json,
library_index.json, and pattern_index.json will miss the new navigation metadata and the new MCP
tools will return incomplete results.
Code

preprocessing/books.py[1]

write_concept_index(book_id, manifest.concept_index)
Evidence
extract_concepts() parses both patterns and related into ConceptMapping, but ingestion
converts those mappings into ConceptEntry objects without copying patterns and with no way to
store related because ConceptEntry has no related field. Downstream writers/updaters
(write_concept_index, _update_library_indices) explicitly look for these fields, so the
resulting indices will be empty/missing for book-derived patterns/related.

preprocessing/books.py[485-490]
lib/summariser.py[270-298]
lib/models.py[71-79]
lib/storage.py[322-366]
preprocessing/books.py[169-193]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Book ingestion currently drops the new concept navigation metadata: `ConceptMapping.patterns` is not copied into `ConceptEntry`, and `ConceptEntry` lacks a `related` field so `ConceptMapping.related` cannot be persisted. This breaks the PR’s core features because downstream outputs (concepts.json / library_index.json / pattern_index.json) are built from `Manifest.concept_index`.
### Issue Context
- `lib.summariser.extract_concepts()` now returns `ConceptMapping` objects with `patterns` and `related`.
- `preprocessing/books.py` converts these into `ConceptEntry` but only passes aliases.
- Storage/indexing code expects `patterns`/`related` to exist on entries.
### Fix Focus Areas
- preprocessing/books.py[485-490]
- lib/models.py[71-79]
### What to change
1. Add `related: list[str] = field(default_factory=list)` to `ConceptEntry` in `lib/models.py`.
2. In `preprocessing/books.py`, when constructing `ConceptEntry` from `ConceptMapping`, pass:
- `patterns=m.patterns`
- `related=m.related`
3. (Recommended) After merging/dedup in `extract_concepts`, filter `related` to only include concepts that exist in the final index to avoid dangling references from batching.
4. Ensure Manifest (de)serialization remains backward-compatible (defaults cover missing fields).
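The four steps above might look roughly like this. The dataclasses are simplified stand-ins, not the real `lib/models.py` definitions: the `name` and `chunks` fields and the helpers `build_concept_index` and `entry_from_dict` are assumptions introduced for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptMapping:  # stand-in for what extract_concepts() returns
    name: str
    chunks: list[str] = field(default_factory=list)
    aliases: list[str] = field(default_factory=list)
    patterns: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)


@dataclass
class ConceptEntry:  # stand-in for the persisted entry
    chunks: list[str] = field(default_factory=list)
    aliases: list[str] = field(default_factory=list)
    patterns: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)  # step 1: new field


def build_concept_index(mappings: list[ConceptMapping]) -> dict[str, ConceptEntry]:
    """Steps 2 and 3: copy patterns/related and drop dangling references."""
    known = {m.name for m in mappings}
    return {
        m.name: ConceptEntry(
            chunks=list(m.chunks),
            aliases=list(m.aliases),
            patterns=list(m.patterns),
            related=[r for r in m.related if r in known and r != m.name],
        )
        for m in mappings
    }


def entry_from_dict(data: dict) -> ConceptEntry:
    """Step 4: manifests written before this change lack the new keys."""
    keys = ("chunks", "aliases", "patterns", "related")
    return ConceptEntry(**{k: list(data.get(k, [])) for k in keys})
```

Filtering `related` against the final set of names is what guards against references to concepts that were merged away or never extracted in another batch.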


Remediation recommended

2. Chunk token counts lost 🐞 Bug ☼ Reliability
Description
When ingestion reuses existing chunks (all_chunks == []), chunk_index.json token counts remain 0
for all chunks, degrading preview_chunks token budgeting and any logic that depends on accurate
token counts.
Code

preprocessing/books.py[R548-573]

+    all_chunk_ids = list_chunks(book_id)
+    for cid in all_chunk_ids:
+        chunk_idx_entries[cid] = ChunkIndexEntry(
+            section=section_labels.get(cid, ""),
+            concepts=chunk_concepts.get(cid, []),
+            tokens=0,  # filled below if available
+        )
+
+    # Set prev/next chains per section group and token counts
+    sec_groups: dict[str, list[str]] = defaultdict(list)
+    for cid in all_chunk_ids:
+        sec_id = _section_id_from_chunk(cid)
+        sec_groups[sec_id].append(cid)
+    for sec_id, cids in sec_groups.items():
+        for i, cid in enumerate(cids):
+            entry = chunk_idx_entries[cid]
+            if i > 0:
+                entry.prev = cids[i - 1]
+            if i < len(cids) - 1:
+                entry.next = cids[i + 1]
+
+    # Get token counts from the chunks we just created
+    if all_chunks:
+        for chunk in all_chunks:
+            if chunk.chunk_id in chunk_idx_entries:
+                chunk_idx_entries[chunk.chunk_id].tokens = chunk.meta.token_count
Evidence
If existing chunks are found and force is false, ingestion skips chunking and sets all_chunks to
an empty list. Later, chunk_index entries are initialized with tokens=0 and token counts are only
filled inside if all_chunks:; therefore, reused chunks never get token counts. The chunker already
writes token_count into each chunk’s YAML frontmatter, but ingestion does not parse it when
all_chunks is empty.

preprocessing/books.py[292-300]
preprocessing/books.py[548-574]
lib/chunker.py[266-284]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`chunk_index.json` token counts remain `0` whenever ingestion skips chunking (because chunks already exist on disk). This makes `preview_chunks` misleading and undermines the “token budgeting” feature.
### Issue Context
- When `existing_chunks and not force`, `all_chunks` is set to `[]`.
- `chunk_index` initializes `tokens=0` for each chunk and only backfills tokens when `all_chunks` is non-empty.
- Chunk markdown files include `token_count` in their YAML frontmatter.
### Fix Focus Areas
- preprocessing/books.py[292-300]
- preprocessing/books.py[548-574]
- lib/chunker.py[266-284]
### What to change
Implement a fallback when `all_chunks` is empty:
1. For each chunk ID in `all_chunk_ids`, read the chunk file content and parse YAML frontmatter to extract `token_count`.
- If parsing is undesirable, recompute token count from body using `lib.chunker.count_tokens`.
2. Populate `chunk_idx_entries[cid].tokens` with the recovered value.
3. Keep current fast-path when `all_chunks` exists (no extra I/O).
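A minimal sketch of that fallback, using only the standard library. It assumes the frontmatter is delimited by `---` lines and contains a plain `token_count: <int>` line; `recover_token_count` and `backfill_tokens` are hypothetical helpers, and the whitespace split is only a crude placeholder for `lib.chunker.count_tokens`.

```python
import re

# Frontmatter block at the very start of the file, then the body.
_FRONTMATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*\n?(.*)\Z", re.DOTALL)
_TOKEN_LINE = re.compile(r"^token_count:\s*(\d+)\s*$", re.MULTILINE)


def recover_token_count(chunk_text: str) -> int:
    """Read token_count from frontmatter, else estimate from the body."""
    match = _FRONTMATTER.match(chunk_text)
    header, body = (match.group(1), match.group(2)) if match else ("", chunk_text)
    found = _TOKEN_LINE.search(header)
    if found:
        return int(found.group(1))
    # Placeholder estimate; a real tokenizer count would replace this.
    return len(body.split())


def backfill_tokens(entries: dict, read_chunk) -> None:
    """Fill in tokens for entries still at 0; read_chunk(cid) returns file text."""
    for cid, entry in entries.items():
        if entry["tokens"] == 0:
            entry["tokens"] = recover_token_count(read_chunk(cid))
```

Calling `backfill_tokens` only when `all_chunks` is empty would keep the existing fast path free of extra file reads.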


3. Unbounded pattern results 🐞 Bug ➹ Performance
Description
explore_patterns returns all entries for matching patterns without any cap, which can produce very
large MCP tool responses as the library grows and can degrade performance or exceed response limits.
Code

server.py[R193-206]

+@mcp.tool()
+def explore_patterns(pattern: str) -> str:
+    """Look up a pattern tag to find structurally similar concepts across the library. Use after finding a concept's patterns via search_library to discover cross-domain analogies."""
+    pat_index = storage.read_pattern_index()
+    pattern_lower = pattern.lower()
+    results: dict = {}
+
+    for pat_name, entries in pat_index.patterns.items():
+        if pattern_lower in pat_name.lower():
+            results[pat_name] = [
+                {"concept": e.concept, "source": e.source, "chunks": e.chunks}
+                for e in entries
+            ]
+
Evidence
Unlike search_library which caps results with MAX_SEARCH_RESULTS, explore_patterns serializes
the entire list of PatternEntry objects for each matched pattern and returns it directly. Since
the index is library-wide and accumulates across books/corpora, popular patterns can easily map to
many entries.

server.py[160-190]
server.py[193-213]
lib/models.py[162-176]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`explore_patterns()` can return an unbounded number of entries, risking huge JSON payloads and slow responses.
### Issue Context
`search_library()` caps to `MAX_SEARCH_RESULTS`, but `explore_patterns()` has no equivalent cap.
### Fix Focus Areas
- server.py[193-213]
### What to change
1. Add a cap constant (e.g., `MAX_PATTERN_RESULTS = 100`) and truncate returned entries per pattern and/or total.
2. Include a `truncated: true` indicator when truncation happens.
3. (Optional) Add pagination parameters (`limit`, `offset`) to the tool signature for better UX.
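One possible shape for the cap, applied to the already-matched patterns. `cap_pattern_results` is a hypothetical helper; the limit of 100 is the value suggested above, and capping across all patterns (rather than per pattern) is just one of the two options the review mentions.

```python
MAX_PATTERN_RESULTS = 100  # suggested default; tune to the response size limit


def cap_pattern_results(matches: dict, limit: int = MAX_PATTERN_RESULTS, offset: int = 0) -> dict:
    """Return at most `limit` entries in total, skipping the first `offset`."""
    results: dict = {}
    remaining = limit
    skipped = 0
    truncated = False
    for pat_name, entries in matches.items():
        for entry in entries:
            if skipped < offset:
                skipped += 1
                continue
            if remaining == 0:
                # At least one more entry exists beyond the cap.
                truncated = True
                break
            results.setdefault(pat_name, []).append(entry)
            remaining -= 1
        if truncated:
            break
    return {"results": results, "truncated": truncated}
```

A caller that sees `truncated` set could request the next page by passing `offset=offset + limit`.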


barkain and others added 13 commits April 4, 2026 17:07
…search tools, clean up dead code

- Fix ConceptEntry related field, chunk token counts, explore_patterns cap, compact manifest revert
- Remove hardcoded pattern seed vocabulary in favour of LLM-generated expansions
- Consolidate 6 navigation files into 3 (library_index with patterns, nav.json, manifest.json)
- Merge search_library + explore_patterns into a single tool
- Clean up dead code and unused helpers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Make nav.json preview a hard requirement in agent and skill prompts.
Reduce chunk budget from 5 to 3 to force selective reading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Haiku was unreliable at multi-step navigation and produced oversized
output that the main agent couldn't use. Switch to sonnet and add
2000-char output limit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Agent was hitting 15-turn limit mid-research and returning narration
instead of answers. Bump to 25 and add instruction to always
synthesize an answer rather than return mid-thought.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Sub-agent delegation caused doubled work (17-27K tokens discarded
when main agent rejected output), turn limit failures, and nav.json
size errors. MCP tools (search_library, preview_chunks, read_chunks)
handle navigation server-side in 3 calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The MCP tools (search_library, preview_chunks, read_chunks, etc.)
were defined in server.py but never wired into the plugin config.
Claude Code loaded skills/agents but never started the server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Claude Code may not have uv in its PATH when spawning subprocesses,
causing the MCP server to fail to start.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Agents were retrying search_library 6+ times on misses. Now: try
once with broad terms, fall back to open_book chapter browsing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Agent was guessing chunk ID format (ch11_chunk_001) instead of the
actual format (ch11-s01-001). Now open_book returns chunk_ids per
section so preview_chunks flows directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
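For illustration, the chunk ID format named in the commit above could be validated like this. The pattern is inferred from the single example `ch11-s01-001` and may not cover every ID the chunker produces; `parse_chunk_id` and `section_id` are hypothetical helpers, not functions from this repository.

```python
import re

# Assumed format: ch<chapter>-s<section>-<index>, e.g. ch11-s01-001.
CHUNK_ID = re.compile(r"^ch(\d+)-s(\d+)-(\d+)$")


def parse_chunk_id(chunk_id: str):
    """Split a chunk ID into numeric parts, or return None if it doesn't match."""
    match = CHUNK_ID.match(chunk_id)
    if match is None:
        return None
    chapter, section, index = (int(g) for g in match.groups())
    return {"chapter": chapter, "section": section, "index": index}


def section_id(chunk_id: str):
    """Return the section prefix (e.g. 'ch11-s01') for a well-formed ID."""
    return chunk_id.rsplit("-", 1)[0] if CHUNK_ID.match(chunk_id) else None
```

A guessed ID such as `ch11_chunk_001` fails the match, which is the kind of early rejection that avoids a wasted `preview_chunks` call.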
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@barkain barkain merged commit 11092a0 into main Apr 5, 2026
1 check passed