Merged
11 changes: 9 additions & 2 deletions .claude-plugin/plugin.json
Original file line number Diff line number Diff line change
@@ -1,8 +1,15 @@
{
"name": "agentlib",
"version": "1.4.0",
"description": "Agentic Knowledge Navigation — ingest books/papers/databases into chunked metadata layers, then navigate them via a universal skill. No MCP server required.",
"version": "1.8.0",
"description": "Agentic Knowledge Navigation — ingest books and papers into a curated library, then navigate via MCP tools or file-based agent.",
"author": {
"name": "Nadav Barkai"
},
"mcpServers": {
"agentlib": {
"command": "/usr/bin/env",
"args": ["uv", "run", "--project", "${CLAUDE_PLUGIN_ROOT}", "python", "${CLAUDE_PLUGIN_ROOT}/server.py"],
"cwd": "${CLAUDE_PLUGIN_ROOT}"
}
}
}
81 changes: 48 additions & 33 deletions README.md
@@ -20,36 +20,47 @@ AgentLib changes this. Ingest the books, papers, and documents that matter for y
AgentLib has three parts:

1. **Ingestion pipelines** — preprocess books, scientific paper corpora, and databases into small, self-contained chunks with lightweight metadata at multiple layers.
2. **Universal navigation skill** (`agentlib-knowledge`) — teaches the agent to read cheap metadata first, then drill into specific chunks.
3. **Research agent** (`library-researcher`) — runs in an isolated context to keep the main conversation clean. All navigation and chunk reading happens in the agent's context; only a synthesized answer returns.
2. **MCP tools** — the plugin registers an MCP server with 6 tools: `browse_library`, `open_book`, `search_library`, `search_concepts`, `preview_chunks`, `read_chunks`. The agent calls these directly — no sub-agent needed.
3. **Universal navigation skill** (`agentlib-knowledge`) — teaches the agent to search cheap metadata first, then drill into specific chunks via `search_library` → `preview_chunks` → `read_chunks`.

No MCP server required. No tool calls. The agent reads preprocessed files directly from `~/.claude/plugins/agentlib/library/`.
The agent navigates via MCP tool calls against preprocessed files in `~/.claude/plugins/agentlib/library/`.

### How agents navigate the library

```mermaid
graph LR
Q["User question"] --> R["library-researcher<br/>(isolated context)"]
R --> NAV["NAVIGATION.md<br/>~50 tok per book"]
R --> CS["concepts.json (Ls)<br/>~200 tok"]
R --> CAT["catalog (L0)<br/>~50 tok per book"]
Q["User question"] --> SL["search_library<br/>concepts + patterns<br/>library_index.json"]
SL --> PC["preview_chunks<br/>chunk metadata<br/>nav.json"]
PC --> RC["read_chunks<br/>2-3 best chunks<br/>300-500 tok each"]
RC --> A["Answer with citations"]
```

CS --> M{"concept/alias<br/>match?"}
M -- hit --> CH["chunks (L2)<br/>300-500 tok each"]
M -- miss --> MAN["manifest (L1)<br/>~500 tok"]
MAN --> CH
**Fast path (concept hit):** `search_library` → `preview_chunks` → `read_chunks` — **3 tool calls, ~1.5k tokens**

NAV --> CH
CAT --> MAN
**Pattern path (cross-domain):** `search_library` (pattern tags) → `preview_chunks` → `read_chunks` — **3 tool calls, ~2.5k tokens**

CH --> A["Synthesized answer<br/>(returned to user)"]
```
**Recovery on miss:** related concepts → pattern traversal → `search_concepts` per book → Grep fallback

#### Unified library index

`library_index.json` is the single entry point for the entire library. One file, all books and corpora — queried via `search_library`. Each concept carries:

- **aliases** — abbreviations, acronyms, synonyms (searching "CDX" matches "CycloneDX")
- **related** — directly connected concepts in the same domain ("OAuth 2.0" → "JWT", "access tokens")
- **patterns** — abstract structural fingerprints for cross-domain discovery (see below)
- **sources** — which books/papers contain the concept and their chunk IDs
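A concept entry with these four fields can be sketched as a plain dict (a hypothetical shape — field names follow the list above, the exact schema is illustrative), along with the alias lookup that turns "CDX" into a hit:

```python
# Illustrative shape of one library_index.json concept entry.
# Field names mirror the README; the exact schema is an assumption.
concept = {
    "name": "CycloneDX",
    "aliases": ["CDX", "CycloneDX SBOM"],
    "related": ["Software Bill of Materials", "SPDX"],
    "patterns": ["artifact-attestation"],
    "sources": {"supply-chain-security": ["ch03-0041", "ch03-0042"]},
}

def matches(entry: dict, query: str) -> bool:
    """True if the query hits the concept name or any alias."""
    q = query.lower()
    return q == entry["name"].lower() or any(q == a.lower() for a in entry["aliases"])

# Searching "CDX" matches via the alias, not the primary name.
assert matches(concept, "CDX")
```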

**Ls hit (fast path):** NAVIGATION → concepts.json → chunks — **2-3 reads, ~1k tokens**
#### Pattern fingerprints — associative recall

**Ls miss (slow path):** NAVIGATION → catalog → manifest → chunks — **5-6 reads, ~5k tokens**
Every concept is tagged with 2-3 **pattern fingerprints**: abstract, domain-independent descriptors of its structural nature. These enable a "this reminds me of..." capability that keyword search can never provide.

The concept index includes **aliases** (abbreviations, acronyms, synonyms) generated by the LLM at ingestion time. Searching "CDX" matches the alias on "CycloneDX"; searching "SBOM" matches "Software Bill of Materials". This turns misses into hits without any runtime cost.
For example, "OAuth token rotation", "TLS certificate renewal", and "SSH key rotation" all share the pattern `credential-cycling`. An agent reading about token rotation can discover structurally analogous solutions in completely different books — without any keyword overlap.

Pattern tags are integrated directly into `library_index.json` and searchable via `search_library`. A seed vocabulary of ~40 common patterns ensures consistency across books; fuzzy matching merges near-duplicates.
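The cross-domain lookup amounts to an inverted index from pattern tag to concepts. A minimal sketch (concept names taken from the example above; the data layout is an assumption, not the actual index format):

```python
from collections import defaultdict

# Hypothetical concept records; only the pattern-relevant fields shown.
concepts = [
    {"name": "OAuth token rotation", "patterns": ["credential-cycling"]},
    {"name": "TLS certificate renewal", "patterns": ["credential-cycling"]},
    {"name": "SSH key rotation", "patterns": ["credential-cycling"]},
    {"name": "Exponential backoff", "patterns": ["retry-with-backoff"]},
]

# Inverted index: pattern tag -> concept names.
by_pattern = defaultdict(list)
for c in concepts:
    for p in c["patterns"]:
        by_pattern[p].append(c["name"])

# "This reminds me of..." — structural analogs with zero keyword overlap.
analogs = [n for n in by_pattern["credential-cycling"] if n != "OAuth token rotation"]
# analogs == ["TLS certificate renewal", "SSH key rotation"]
```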

#### Chunk preview via nav.json

Each book's `nav.json` lets agents see what's inside each chunk *before* reading it: section title, concepts covered, token count, and prev/next chains. Queried via `preview_chunks`, this eliminates blind reads — the agent picks the 2-3 best chunks from a set of candidates instead of reading 5 and hoping.
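Picking the best chunks from preview metadata is a simple ranking by concept overlap. A sketch under assumed field names (`id`, `section`, `concepts`, `tokens` — illustrative, not the exact nav.json schema):

```python
# Hypothetical nav.json chunk records, as returned by preview_chunks.
chunks = [
    {"id": "ch02-0007", "section": "Token rotation", "concepts": ["OAuth 2.0", "refresh tokens"], "tokens": 412},
    {"id": "ch02-0008", "section": "PKCE", "concepts": ["OAuth 2.0", "PKCE"], "tokens": 388},
    {"id": "ch05-0031", "section": "Logging", "concepts": ["audit logs"], "tokens": 455},
]

def best_chunks(chunks, query_concepts, k=3):
    """Rank chunks by concept overlap with the query; read only the top k."""
    scored = [(len(set(c["concepts"]) & set(query_concepts)), c["id"]) for c in chunks]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [cid for score, cid in scored[:k] if score > 0]

# Only chunks whose metadata matches are read — no blind reads.
best_chunks(chunks, ["refresh tokens", "OAuth 2.0"], k=2)
# → ['ch02-0007', 'ch02-0008']
```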

<p align="center">
<img src="assets/demo_proactive_query.png" alt="AgentLib proactive library query" width="800">
@@ -64,37 +75,41 @@ The concept index includes **aliases** (abbreviations, acronyms, synonyms) gener
</p>
</details>

### Three metadata layers
### Metadata layers

```
L0 "What exists?" → catalog/NAVIGATION.md: ~50 tokens per book (cheap)
L1 "What's inside?" → manifest: structure, summaries, concepts (moderate)
L2 "Give me the content" → small self-contained chunks, 300-500 tok (expensive)
Lx "What do I know?" → library_index.json: concepts, patterns, sources (search_library)
Ln "What's in a book?" → nav.json: structure + chunk metadata + concepts (preview_chunks)
L2 "Give me the content" → chunks: 300-500 tok each (read_chunks)
Lf "Full rebuild" → manifest.json: complete archive per book (offline)
```

Three files instead of six — `library_index.json` (1 file, entire library), `nav.json` (per book), and `manifest.json` (per book, full archive for rebuild).

Chunks are **content-aware**: tables and code fences are kept atomic (soft cap 500, hard cap 1,000 tokens). PDF tables are extracted via PyMuPDF and rendered as markdown pipe tables. Figures are extracted from PDFs with vision-based summarization and appear as placeholders in chunks.
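The soft/hard-cap interaction can be sketched as a greedy packer (an illustrative reconstruction, not the actual ingestion code — block representation and caps are assumptions):

```python
def chunk_blocks(blocks, soft_cap=500, hard_cap=1000):
    """Pack consecutive (text, tokens, atomic) blocks into ~soft_cap chunks.
    Atomic blocks (tables, code fences) are never split: one atomic block
    may carry a chunk past soft_cap, up to hard_cap."""
    chunks, current, size = [], [], 0
    for text, tokens, atomic in blocks:
        over_soft = size + tokens > soft_cap
        over_hard = size + tokens > hard_cap
        # Flush at the soft cap, unless an atomic block still fits
        # alongside the current chunk under the hard cap.
        if current and over_soft and (not atomic or over_hard):
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(text)
        size += tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```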

Plus a **concept index** shortcut (Ls) that jumps directly to relevant chunks when the agent already knows what it's looking for. Each concept carries LLM-generated aliases so the agent can find it by abbreviation, acronym, or alternative phrasing.
The concept index includes LLM-generated **aliases**, **related concepts**, and **pattern fingerprints** — turning keyword misses into graph traversals and enabling cross-domain discovery.

### Library structure

```
library/
├── NAVIGATION.md ← Start here — index of everything
├── library_index.json ← Lx: unified concept + pattern discovery
├── books/
│ ├── catalog.json ← L0
│ ├── catalog.json
│ └── {book-id}/
│ ├── manifest.compact.json ← L1
│ ├── concepts.json ← Ls
│ ├── nav.json ← Ln: structure + chunk metadata + concepts
│ ├── manifest.json ← Lf: full archive for rebuild
│ └── chunks/
│ └── {chunk-id}.md ← L2
└── corpus/
└── {corpus-id}/
├── corpus_catalog.json ← L0 (topic clusters)
├── concept_index.json ← Ls (cross-paper concepts)
├── clusters/{cluster-id}.json ← L0b (papers per cluster)
├── corpus_catalog.json
├── concept_index.json
├── clusters/{cluster-id}.json
└── papers/{paper-id}/
├── manifest.compact.json ← L1
├── nav.json ← Ln
├── manifest.json ← Lf
└── chunks/{chunk-id}.md ← L2
```

@@ -165,7 +180,7 @@ Simulated on realistic workloads (15-book library, 487-paper corpus, 80-table da
| Wrong reads/queries | 1 | 0 | 1 | 0 | 2 | 0 |
| **Token reduction** | | **82%** | | **55%** | | **55%** |

The core principle: *no heavy indexing, no vector databases — just smart, lightweight metadata and small content blobs.*
The core principle: *no vector databases — just smart, interconnected metadata structures. Concepts link to related concepts, abstract patterns connect ideas across domains, and chunk previews eliminate blind reads.*

## Install

@@ -211,7 +226,7 @@ Ingestion runs chapter summarization in parallel and batches concept extraction
**Explicit invocation** — prefix with `/agentlib-knowledge` when you want the library's answer, not Claude's training data:
> /agentlib-knowledge What defensive techniques protect against prompt injection?

The skill delegates to the `library-researcher` agent, which navigates `NAVIGATION.md` → concept indexes → specific chunks in an isolated context. Only the synthesized answer with citations returns to your conversation.
The skill uses MCP tools directly: `search_library` → `preview_chunks` → `read_chunks`. Only the synthesized answer with citations returns to your conversation. Pattern tags integrated into `search_library` enable cross-domain analogies automatically.

## LLM Providers

56 changes: 37 additions & 19 deletions agents/library-researcher.md
@@ -1,42 +1,60 @@
---
name: library-researcher
description: "Research questions using the preprocessed knowledge library. Use when answering questions about ingested books, scientific papers, or domain knowledge that may be in the library."
model: haiku
model: sonnet
tools: Read, Glob, Grep
maxTurns: 15
maxTurns: 25
---

You are a research assistant. Follow this sequence to answer questions.

**IMPORTANT:** Use ABSOLUTE paths only — never use `~/` (it won't resolve in your context). The library path will be provided in your prompt.

## Step 1: Read the index (1 read)
Read `{library}/NAVIGATION.md`. Identify which books or corpora are relevant.
## Step 1: Unified library search (1 read)
Read `{library}/library_index.json`. This contains ALL concepts across ALL books and corpora with:
- **aliases**: alternative names, abbreviations, acronyms
- **related**: directly connected concepts in the same domain
- **patterns**: abstract structural fingerprints (e.g. "credential-cycling", "retry-with-backoff")
- **sources**: which books/papers contain this concept and their chunk IDs

## Step 2: Find chunk IDs (1-2 reads)
If `library_index.json` doesn't exist, fall back to reading `{library}/NAVIGATION.md` and then per-book `nav.json`.

**Try concepts.json first** (fastest):
- Books: `{library}/books/{book-id}/concepts.json`
- Corpora: `{library}/corpus/{corpus-id}/concept_index.json`
## Step 2: Preview chunks — MANDATORY (1 read)
**NEVER read chunk files without previewing first.** This is the most important efficiency rule.

Each concept has `"chunks"` (list of chunk IDs) and optionally `"aliases"` (alternative names, abbreviations, acronyms). When scanning for your topic, check BOTH the concept name AND its aliases — your search term may match an alias rather than the primary name.
Read `{library}/books/{book-id}/nav.json` to assess candidates:
- The `chunks` section shows each chunk's **section**, **concepts**, **token count**, and **prev/next** links
- The `concepts` section maps concept names to their chunk IDs

If concepts.json has a match → note chunk IDs → go to Step 3.
Pick only the 2-3 most relevant chunks. Skip chunks whose section/concepts don't match your query. Reading unnecessary chunks wastes tokens.

**If no match in concepts**, use Grep on chunks directory:
```
Grep pattern: "your search term" path: "{library}/books/{book-id}/chunks/"
```
This finds which chunks contain relevant content. Note the filenames.
## Step 2b: Cross-domain insight (optional)
If the concept has **pattern** tags (e.g. "credential-cycling"), look up the pattern in `library_index.json`'s `patterns` section to discover structurally similar concepts in other domains. This enables "this reminds me of..." connections.

Only do this when the user's question could benefit from cross-domain analogies.

## Step 3: Read chunks (2-5 reads)
Read the specific chunk files identified in Step 2.
- If you need more context, follow **prev/next** links from nav.json
- Books: `{library}/books/{book-id}/chunks/{chunk-id}.md`
- Corpora: `{library}/corpus/{corpus-id}/papers/{paper-id}/chunks/{chunk-id}.md`

## Step 4: Return answer
Synthesize a clear answer citing source (book/paper title and chunk IDs).
Synthesize a clear answer citing sources (book/paper title and chunk IDs). Keep your response under 2000 characters and don't include raw chunk text.

If patterns revealed cross-domain analogies, mention them: "This follows the same structural pattern as [X] in [other book]."

## Recovery: concept miss
If library_index.json has no match:
1. Check **related** concepts — your term may be a sub-concept of something indexed
2. Check **pattern** tags in library_index.json — search by structural shape instead of name
3. Fall back to `{library}/books/{book-id}/nav.json` concepts section with alias matching
4. Last resort: Grep on chunks directory

## Rules
- ALWAYS use absolute paths, never `~/`
- Try concepts.json FIRST, use Grep only as fallback
- Do NOT read manifest.compact.json — it's too large
- Total: max 3 navigation reads + 5 content chunks
- Start with library_index.json (fastest: 1 file covers entire library)
- **NEVER skip the preview step — read nav.json BEFORE any chunk files**
- Total: max 4 navigation reads + 5 content chunks
- Cite the book/paper and chunk ID when answering
- **If you're running low on turns, STOP researching and synthesize an answer from what you have.** A partial answer with citations is better than no answer. Never return mid-thought narration.
8 changes: 5 additions & 3 deletions commands/agentlib-ingest-book.md
@@ -16,7 +16,9 @@ This will:
1. Parse the PDF/EPUB to extract chapter/section structure
2. Chunk the content into 300-500 token segments
3. Summarise each chapter using the configured LLM provider
4. Build a concept index for fast search
5. Write manifest and update the library catalog
4. Build a concept index with aliases, pattern fingerprints, and related concepts
5. Generate nav.json (per-book navigation: structure, chunk preview, concepts)
6. Update the unified library_index.json (concepts + patterns)
7. Write manifest and update the library catalog

After ingestion, the book is available in the library. The agent navigates it via the `/agentlib-knowledge` skill by reading catalog.json, manifest.compact.json, concepts.json, and chunks/*.md
After ingestion, the book is available in the library. The agent navigates it via the `/agentlib-knowledge` skill, starting with library_index.json for unified cross-library search.
5 changes: 3 additions & 2 deletions commands/agentlib-ingest-corpus.md
@@ -15,6 +15,7 @@ This will:
2. Parse and chunk each paper into 300-500 token segments
3. Summarise each paper's sections using the configured LLM provider
4. Cluster papers by topic
5. Build a cross-paper concept index
5. Build a cross-paper concept index with pattern fingerprints
6. Update the unified library_index.json (concepts + patterns)

After ingestion, use `/agentlib-knowledge` to query the corpus.
After ingestion, use `/agentlib-knowledge` to query the corpus. The agent can discover connections between corpus papers and ingested books through shared pattern fingerprints.
2 changes: 1 addition & 1 deletion commands/agentlib-library.md
@@ -11,4 +11,4 @@ If a book ID is provided (`$ARGUMENTS`), show the detailed structure of that boo

Read directly from the library:
- No args: Read ~/.claude/plugins/agentlib/library/books/catalog.json and display as a formatted table
- With book ID: Read ~/.claude/plugins/agentlib/library/books/{book-id}/manifest.compact.json and display the chapter structure
- With book ID: Read ~/.claude/plugins/agentlib/library/books/{book-id}/nav.json and display the chapter structure