feat: add unified library index, pattern fingerprints, and chunk preview for smarter agent navigation by barkain · Pull Request #19 · barkain/agentlib

barkain · 2026-04-03T15:53:43Z

Introduces three new library-wide data structures that transform how agents
navigate knowledge:

- library_index.json: unified cross-book/corpus concept index with aliases,
  related concepts, and pattern fingerprints; one read covers the entire library
- pattern_index.json: reverse index from abstract structural patterns to concepts,
  enabling cross-domain associative recall ("this reminds me of...")
- chunk_index.json: per-book chunk preview metadata (section, concepts, tokens,
  prev/next chains) so agents can assess chunks before reading them

Also adds concept relationship edges (related field) and pattern tags to the
concept extraction prompt, three new MCP tools (search_library, explore_patterns,
preview_chunks), updated agent/skill prompts for the new navigation flow, and
comprehensive tests for all new functionality.
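
To illustrate the reverse index described above, here is a minimal sketch of inverting concept-to-pattern tags into a pattern-to-concept lookup. `ConceptRecord` and `build_pattern_index` are hypothetical names for illustration, not the PR's actual models; only the `concept`, `source`, and `chunks` keys mirror the fields that `explore_patterns` serializes in the review below.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ConceptRecord:
    """Hypothetical stand-in for one concept entry in the library index."""
    concept: str
    source: str  # book or corpus identifier
    chunks: list[str] = field(default_factory=list)
    patterns: list[str] = field(default_factory=list)


def build_pattern_index(records: list[ConceptRecord]) -> dict[str, list[dict]]:
    """Invert concept -> patterns into pattern -> concepts."""
    index: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        for pat in rec.patterns:
            # Normalise case so "Feedback-Loop" and "feedback-loop" share a key.
            index[pat.lower()].append(
                {"concept": rec.concept, "source": rec.source, "chunks": rec.chunks}
            )
    return dict(index)


records = [
    ConceptRecord("negative feedback loop", "control-theory", ["ch02-s01-001"], ["feedback-loop"]),
    ConceptRecord("homeostasis", "physiology", ["ch05-s03-002"], ["feedback-loop", "equilibrium"]),
]
pattern_index = build_pattern_index(records)
```

Looking up `pattern_index["feedback-loop"]` then surfaces concepts from two unrelated sources, which is the cross-domain recall the description refers to.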

https://claude.ai/code/session_01EwryWxbML8dmiSrDsnU5mq


Update documentation to reflect the new unified library index, pattern fingerprints, and chunk preview system. Updated sections include the navigation flow diagram, metadata layers, library structure, querying description, and ingestion command docs. https://claude.ai/code/session_01EwryWxbML8dmiSrDsnU5mq

qodo-code-review · 2026-04-03T15:55:38Z

Review Summary by Qodo

Add unified library index, pattern fingerprints, and chunk preview for smarter agent navigation

✨ Enhancement

Walkthroughs

Description

• Introduces three unified library-wide data structures for intelligent cross-library navigation
  - library_index.json: unified concept index with aliases, related concepts, and pattern
  fingerprints across all books and corpora
  - pattern_index.json: reverse index enabling cross-domain discovery via abstract structural
  patterns
  - chunk_index.json: per-book chunk preview metadata (section, concepts, tokens, prev/next chains)
• Adds pattern fingerprints to concept extraction with seed vocabulary for consistency
• Implements three new MCP tools (search_library, explore_patterns, preview_chunks) for
  smarter navigation
• Updates agent and skill prompts to leverage unified indices for faster, more intelligent queries
• Adds comprehensive tests and documentation for all new navigation features

Diagram

flowchart LR
  Q["User Query"] --> LI["library_index.json<br/>ALL concepts, ALL sources"]
  LI --> M{"Match found?"}
  M -->|yes| CI["chunk_index.json<br/>preview metadata"]
  M -->|no| PI["pattern_index.json<br/>cross-domain patterns"]
  PI --> CI
  CI --> CH["chunks<br/>300-500 tokens"]
  CH --> A["Answer with citations"]
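
The flow in the diagram can be sketched roughly as follows. The dictionary shapes and the function names `find_chunk_ids` and `preview` are assumptions made for illustration; the real indices and MCP tools may be structured differently.

```python
def find_chunk_ids(query: str, library_index: dict, pattern_index: dict) -> list[str]:
    """Try a direct concept/alias match first, then fall back to pattern tags."""
    q = query.lower()
    # Step 1: library index lookup (concept name or any alias).
    for name, entry in library_index.items():
        names = [name, *entry.get("aliases", [])]
        if any(q in n.lower() for n in names):
            return list(entry["chunks"])
    # Step 2: no match, so look for a structural pattern instead.
    hits: list[str] = []
    for pat, entries in pattern_index.items():
        if q in pat.lower():
            for e in entries:
                hits.extend(e["chunks"])
    return hits


def preview(chunk_ids: list[str], chunk_index: dict) -> list[dict]:
    """Return preview metadata so the caller can pick chunks before reading them."""
    return [{"id": cid, **chunk_index[cid]} for cid in chunk_ids if cid in chunk_index]


# Hypothetical sample data.
library_index = {"homeostasis": {"aliases": ["self-regulation"], "chunks": ["ch05-s03-002"]}}
pattern_index = {"feedback-loop": [{"concept": "homeostasis", "chunks": ["ch05-s03-002"]}]}
chunk_index = {"ch05-s03-002": {"section": "5.3", "concepts": ["homeostasis"], "tokens": 410}}

direct = preview(find_chunk_ids("self-regulation", library_index, pattern_index), chunk_index)
fallback = preview(find_chunk_ids("feedback", library_index, pattern_index), chunk_index)
```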

File Changes

1. lib/models.py ✨ Enhancement +158/-0

Add library-wide navigation data structures

lib/models.py

2. lib/storage.py ✨ Enhancement +112/-14

Implement I/O for unified indices and chunk metadata

lib/storage.py

3. lib/summariser.py ✨ Enhancement +50/-3

Extract patterns and related concepts from content

lib/summariser.py


4. preprocessing/books.py ✨ Enhancement +178/-0

Build chunk index and update library indices on ingestion

preprocessing/books.py

5. preprocessing/corpus.py ✨ Enhancement +101/-1

Update corpus ingestion to populate unified indices

preprocessing/corpus.py

6. server.py ✨ Enhancement +74/-0

Add three new MCP tools for library navigation

server.py

7. tests/conftest.py 🧪 Tests +9/-2

Add pattern and related fields to test fixtures

tests/conftest.py

8. tests/test_server.py 🧪 Tests +110/-2

Add tests for new search, pattern, and preview tools

tests/test_server.py

9. tests/test_storage.py 🧪 Tests +106/-1

Add tests for chunk, library, and pattern index I/O

tests/test_storage.py

10. tests/test_summariser.py 🧪 Tests +16/-0

Add pattern and related concept parsing tests

tests/test_summariser.py

11. README.md 📝 Documentation +50/-24

Update navigation flow and document new indices

README.md

12. agents/library-researcher.md 📝 Documentation +33/-16

Revise agent prompt for unified library search workflow

agents/library-researcher.md

13. skills/agentlib-knowledge/SKILL.md 📝 Documentation +27/-28

Update skill documentation for new navigation paths

skills/agentlib-knowledge/SKILL.md

14. commands/agentlib-ingest-book.md 📝 Documentation +5/-3

Document new ingestion outputs and unified index updates

commands/agentlib-ingest-book.md

15. commands/agentlib-ingest-corpus.md 📝 Documentation +3/-2

Document corpus ingestion updates to unified indices

commands/agentlib-ingest-corpus.md

16. pyproject.toml ⚙️ Configuration changes +1/-1

Bump version to 1.8.0

pyproject.toml

qodo-code-review · 2026-04-03T15:55:39Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX Issues (0)

1. ~~Concept metadata dropped~~ ☑ 🐞 Bug ≡ Correctness

Description

Book ingestion discards LLM-extracted patterns and cannot persist related, so concepts.json,
library_index.json, and pattern_index.json will miss the new navigation metadata and the new MCP
tools will return incomplete results.

Code

preprocessing/books.py[1]
write_concept_index(book_id, manifest.concept_index)

Evidence

extract_concepts() parses both patterns and related into ConceptMapping, but ingestion
converts those mappings into ConceptEntry objects without copying patterns and with no way to
store related because ConceptEntry has no related field. Downstream writers/updaters
(write_concept_index, _update_library_indices) explicitly look for these fields, so the
resulting indices will be empty/missing for book-derived patterns/related.

preprocessing/books.py[485-490]
lib/summariser.py[270-298]
lib/models.py[71-79]
lib/storage.py[322-366]
preprocessing/books.py[169-193]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Book ingestion currently drops the new concept navigation metadata: `ConceptMapping.patterns` is not copied into `ConceptEntry`, and `ConceptEntry` lacks a `related` field so `ConceptMapping.related` cannot be persisted. This breaks the PR’s core features because downstream outputs (concepts.json / library_index.json / pattern_index.json) are built from `Manifest.concept_index`.
### Issue Context
- `lib.summariser.extract_concepts()` now returns `ConceptMapping` objects with `patterns` and `related`.
- `preprocessing/books.py` converts these into `ConceptEntry` but only passes aliases.
- Storage/indexing code expects `patterns`/`related` to exist on entries.
### Fix Focus Areas
- preprocessing/books.py[485-490]
- lib/models.py[71-79]
### What to change
1. Add `related: list[str] = field(default_factory=list)` to `ConceptEntry` in `lib/models.py`.
2. In `preprocessing/books.py`, when constructing `ConceptEntry` from `ConceptMapping`, pass:
- `patterns=m.patterns`
- `related=m.related`
3. (Recommended) After merging/dedup in `extract_concepts`, filter `related` to only include concepts that exist in the final index to avoid dangling references from batching.
4. Ensure Manifest (de)serialization remains backward-compatible (defaults cover missing fields).
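
A minimal sketch of the change outlined above, assuming simplified dataclass shapes. The real `ConceptMapping` and `ConceptEntry` in `lib/models.py` and `lib/summariser.py` likely carry more fields, and `to_concept_index` is a hypothetical helper. The dangling-reference filter is applied here at conversion time for brevity, whereas the review suggests doing it inside `extract_concepts`.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptMapping:
    """Simplified stand-in for the summariser's output."""
    concept: str
    chunks: list[str] = field(default_factory=list)
    aliases: list[str] = field(default_factory=list)
    patterns: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)


@dataclass
class ConceptEntry:
    """Simplified stand-in for the persisted entry."""
    chunks: list[str] = field(default_factory=list)
    aliases: list[str] = field(default_factory=list)
    patterns: list[str] = field(default_factory=list)
    # New field; the default keeps older manifests without it loadable.
    related: list[str] = field(default_factory=list)


def to_concept_index(mappings: list[ConceptMapping]) -> dict[str, ConceptEntry]:
    """Copy patterns and related across, dropping references to unknown concepts."""
    known = {m.concept for m in mappings}
    return {
        m.concept: ConceptEntry(
            chunks=list(m.chunks),
            aliases=list(m.aliases),
            patterns=list(m.patterns),
            related=[r for r in m.related if r in known],
        )
        for m in mappings
    }


index = to_concept_index([
    ConceptMapping("homeostasis", patterns=["feedback-loop"], related=["thermostat", "ghost"]),
    ConceptMapping("thermostat"),
])
```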


2. ~~Chunk token counts lost~~ ☑ 🐞 Bug ☼ Reliability

Description

When ingestion reuses existing chunks (all_chunks == []), chunk_index.json token counts remain 0
for all chunks, degrading preview_chunks token budgeting and any logic that depends on accurate
token counts.

Code

preprocessing/books.py[R548-573]

+    all_chunk_ids = list_chunks(book_id)
+    for cid in all_chunk_ids:
+        chunk_idx_entries[cid] = ChunkIndexEntry(
+            section=section_labels.get(cid, ""),
+            concepts=chunk_concepts.get(cid, []),
+            tokens=0,  # filled below if available
+        )
+
+    # Set prev/next chains per section group and token counts
+    sec_groups: dict[str, list[str]] = defaultdict(list)
+    for cid in all_chunk_ids:
+        sec_id = _section_id_from_chunk(cid)
+        sec_groups[sec_id].append(cid)
+    for sec_id, cids in sec_groups.items():
+        for i, cid in enumerate(cids):
+            entry = chunk_idx_entries[cid]
+            if i > 0:
+                entry.prev = cids[i - 1]
+            if i < len(cids) - 1:
+                entry.next = cids[i + 1]
+
+    # Get token counts from the chunks we just created
+    if all_chunks:
+        for chunk in all_chunks:
+            if chunk.chunk_id in chunk_idx_entries:
+                chunk_idx_entries[chunk.chunk_id].tokens = chunk.meta.token_count

Evidence

If existing chunks are found and force is false, ingestion skips chunking and sets all_chunks to
an empty list. Later, chunk_index entries are initialized with tokens=0 and token counts are only
filled inside if all_chunks:; therefore, reused chunks never get token counts. The chunker already
writes token_count into each chunk’s YAML frontmatter, but ingestion does not parse it when
all_chunks is empty.

preprocessing/books.py[292-300]
preprocessing/books.py[548-574]
lib/chunker.py[266-284]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`chunk_index.json` token counts remain `0` whenever ingestion skips chunking (because chunks already exist on disk). This makes `preview_chunks` misleading and undermines the “token budgeting” feature.
### Issue Context
- When `existing_chunks and not force`, `all_chunks` is set to `[]`.
- `chunk_index` initializes `tokens=0` for each chunk and only backfills tokens when `all_chunks` is non-empty.
- Chunk markdown files include `token_count` in their YAML frontmatter.
### Fix Focus Areas
- preprocessing/books.py[292-300]
- preprocessing/books.py[548-574]
- lib/chunker.py[266-284]
### What to change
Implement a fallback when `all_chunks` is empty:
1. For each chunk ID in `all_chunk_ids`, read the chunk file content and parse YAML frontmatter to extract `token_count`.
- If parsing is undesirable, recompute token count from body using `lib.chunker.count_tokens`.
2. Populate `chunk_idx_entries[cid].tokens` with the recovered value.
3. Keep current fast-path when `all_chunks` exists (no extra I/O).
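
One way the fallback could look, as a sketch using only the standard library. It assumes the frontmatter is delimited by `---` lines and contains a plain `token_count: <int>` line; `token_count_from_chunk` is a hypothetical helper, and the `fallback` callable stands in for something like `lib.chunker.count_tokens`, whose exact signature is not shown in this PR.

```python
import re
from typing import Callable, Optional

# Leading block delimited by '---' lines at the very start of the file.
_FRONTMATTER = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*\n", re.DOTALL)
_TOKEN_COUNT = re.compile(r"^token_count:[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def token_count_from_chunk(text: str, fallback: Optional[Callable[[str], int]] = None) -> int:
    """Recover a chunk's token count from its frontmatter, else recompute from the body."""
    fm = _FRONTMATTER.match(text)
    if fm:
        tc = _TOKEN_COUNT.search(fm.group(1))
        if tc:
            return int(tc.group(1))
    if fallback is not None:
        body = text[fm.end():] if fm else text
        return fallback(body)
    return 0  # nothing recoverable; matches the current default


sample = "---\nchunk_id: ch11-s01-001\ntoken_count: 412\n---\nBody text here.\n"
recovered = token_count_from_chunk(sample)
```

The fast path stays untouched: this helper would only run for chunk IDs that have no in-memory chunk object.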


3. ~~Unbounded pattern results~~ ☑ 🐞 Bug ➹ Performance

Description

explore_patterns returns all entries for matching patterns without any cap, which can produce very
large MCP tool responses as the library grows and can degrade performance or exceed response limits.

Code

server.py[R193-206]

+@mcp.tool()
+def explore_patterns(pattern: str) -> str:
+    """Look up a pattern tag to find structurally similar concepts across the library. Use after finding a concept's patterns via search_library to discover cross-domain analogies."""
+    pat_index = storage.read_pattern_index()
+    pattern_lower = pattern.lower()
+    results: dict = {}
+
+    for pat_name, entries in pat_index.patterns.items():
+        if pattern_lower in pat_name.lower():
+            results[pat_name] = [
+                {"concept": e.concept, "source": e.source, "chunks": e.chunks}
+                for e in entries
+            ]
+

Evidence

Unlike search_library which caps results with MAX_SEARCH_RESULTS, explore_patterns serializes
the entire list of PatternEntry objects for each matched pattern and returns it directly. Since
the index is library-wide and accumulates across books/corpora, popular patterns can easily map to
many entries.

server.py[160-190]
server.py[193-213]
lib/models.py[162-176]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`explore_patterns()` can return an unbounded number of entries, risking huge JSON payloads and slow responses.
### Issue Context
`search_library()` caps to `MAX_SEARCH_RESULTS`, but `explore_patterns()` has no equivalent cap.
### Fix Focus Areas
- server.py[193-213]
### What to change
1. Add a cap constant (e.g., `MAX_PATTERN_RESULTS = 100`) and truncate returned entries per pattern and/or total.
2. Include a `truncated: true` indicator when truncation happens.
3. (Optional) Add pagination parameters (`limit`, `offset`) to the tool signature for better UX.
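
The cap described above might be sketched like this. `cap_pattern_results` is a hypothetical helper operating on already-matched entries, and the value 100 is simply the example from the review rather than a confirmed setting. This version caps per pattern; a total cap across patterns would need a running counter.

```python
MAX_PATTERN_RESULTS = 100  # example value from the review, not a confirmed setting


def cap_pattern_results(
    matches: dict[str, list[dict]],
    limit: int = MAX_PATTERN_RESULTS,
    offset: int = 0,
) -> dict:
    """Page each pattern's entries and flag whether anything was left out."""
    results: dict[str, list[dict]] = {}
    truncated = False
    for pat_name, entries in matches.items():
        results[pat_name] = entries[offset:offset + limit]
        if offset + limit < len(entries):
            truncated = True  # more entries exist beyond this page
    return {"patterns": results, "truncated": truncated}


matches = {
    "feedback-loop": [{"concept": f"c{i}"} for i in range(5)],
    "equilibrium": [{"concept": "x"}],
}
page = cap_pattern_results(matches, limit=2)
```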


…search tools, clean up dead code

- Fix ConceptEntry related field, chunk token counts, explore_patterns cap, compact manifest revert
- Remove hardcoded pattern seed vocabulary in favour of LLM-generated expansions
- Consolidate 6 navigation files into 3 (library_index with patterns, nav.json, manifest.json)
- Merge search_library + explore_patterns into a single tool
- Clean up dead code and unused helpers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


Agent was hitting 15-turn limit mid-research and returning narration instead of answers. Bump to 25 and add instruction to always synthesize an answer rather than return mid-thought. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Sub-agent delegation caused doubled work (17-27K tokens discarded when main agent rejected output), turn limit failures, and nav.json size errors. MCP tools (search_library, preview_chunks, read_chunks) handle navigation server-side in 3 calls. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The MCP tools (search_library, preview_chunks, read_chunks, etc.) were defined in server.py but never wired into the plugin config. Claude Code loaded skills/agents but never started the server. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


claude added 2 commits April 3, 2026 15:46

barkain and others added 13 commits April 4, 2026 17:07

fix: enforce mandatory chunk preview before reading

af4d7a3

Make nav.json preview a hard requirement in agent and skill prompts. Reduce chunk budget from 5 to 3 to force selective reading. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: upgrade library-researcher agent to sonnet, cap output size

c13c9cb

Haiku was unreliable at multi-step navigation and produced oversized output that the main agent couldn't use. Switch to sonnet and add 2000-char output limit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: use CLAUDE_PLUGIN_ROOT for MCP server paths, bump to v1.8.0

8816b16

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: use absolute path to uv in plugin.json for MCP server startup

ff75a61

Claude Code may not have uv in its PATH when spawning subprocesses, causing the MCP server to fail to start. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: use --project flag for uv to find venv when spawned by Claude Code

bfba980

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: simplify skill navigation — one search, then browse chapters

1e5cdbd

Agents were retrying search_library 6+ times on misses. Now: try once with broad terms, fall back to open_book chapter browsing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: include chunk IDs in open_book section output

d874cb1

Agent was guessing chunk ID format (ch11_chunk_001) instead of the actual format (ch11-s01-001). Now open_book returns chunk_ids per section so preview_chunks flows directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
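
For illustration, a small validator for the chunk ID shape mentioned above. The regular expression is inferred from the single example `ch11-s01-001` and is an assumption; real IDs may allow other prefixes or digit widths. `section_id_from_chunk` here is a hypothetical counterpart to the `_section_id_from_chunk` helper referenced in the diff, whose implementation is not shown.

```python
import re

# Assumed shape: chapter, section, sequence number, e.g. "ch11-s01-001".
CHUNK_ID = re.compile(r"^(ch\d+)-(s\d+)-(\d+)$")


def section_id_from_chunk(chunk_id: str) -> str:
    """Return the 'chNN-sNN' prefix, rejecting IDs that do not fit the assumed shape."""
    m = CHUNK_ID.match(chunk_id)
    if m is None:
        raise ValueError(f"unexpected chunk ID format: {chunk_id!r}")
    return f"{m.group(1)}-{m.group(2)}"


section = section_id_from_chunk("ch11-s01-001")
```

Failing loudly on a guessed ID such as `ch11_chunk_001` gives the caller a clear signal to fetch real IDs from `open_book` instead.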

fix: use portable /usr/bin/env for uv in MCP server config

dc1dba6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: update README for v1.8.0 navigation architecture

469f0d8

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

barkain merged commit 11092a0 into main Apr 5, 2026
1 check passed

barkain merged 15 commits into main from claude/research-idea-improvements-4PJXx
