Update wiki-compiler and obsidian-memory skill for graph.json #11

@verkligheten

Parent Epic

Part of #5 — Integrate Graphify for zero-cost code entity extraction

Task

Update agent instructions so the wiki-compiler leverages graph.json when available (skipping expensive grep-based discovery), and the obsidian-memory skill documents the folder auto-detection feature.

Files

  1. agent_notes/data/agents/wiki-compiler.md
  2. agent_notes/data/skills/obsidian-memory/SKILL.md

Changes to wiki-compiler.md

Add a new section after the existing "## Process" section:

### Pre-extracted graph (when available)

Before starting the Discover step, check if a graph file exists:

    ls raw/*-graph.json

If found, read the graph JSON to accelerate compilation:

1. **Skip Discover** — the graph already contains all code entities with their source locations.
   - `nodes[].label` = entity names (classes, functions, modules)
   - `nodes[].source_file` = which file to read
   - `nodes[].source_location` = line number (e.g., "L42")
   - `nodes[].type` = "class", "function", "module", "rationale"

2. **Use edges for relationships** — no need to grep for cross-references.
   - `edges[].relation` = "calls", "imports", "uses", "inherits", "contains"
   - `edges[].confidence` = "EXTRACTED" (deterministic from AST), "INFERRED", "AMBIGUOUS"

3. **Use communities for grouping** — Leiden algorithm clusters related entities.
   - `communities` = `{community_id: [node_ids]}`
   - `cohesion` = `{community_id: score}` (higher = tighter coupling)

4. **Use god_nodes for priority** — compile the most-connected entities first.
   - `god_nodes[].label` = entity name
   - `god_nodes[].degree` = number of connections

5. **Go directly to Read** — use `source_file` and `source_location` to read the actual code, then write the domain narrative.
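Steps 1, 4, and 5 above can be sketched in a few lines of Python. This is a minimal illustration, not part of the proposed prompt text: the helper names (`load_graph`, `parse_line`, `build_read_plan`) are hypothetical, and it assumes that `god_nodes[].label` matches a unique `nodes[].label` and that `source_location` always has the form `"L<number>"`.

```python
import json
from pathlib import Path


def load_graph(path: str) -> dict:
    """Load a saved graph file such as raw/<slug>-graph.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def parse_line(source_location: str) -> int:
    """Convert a location string such as "L42" into the integer 42."""
    return int(source_location.lstrip("L"))


def build_read_plan(graph: dict) -> list[tuple[str, str, int]]:
    """Return (label, source_file, line) tuples, most-connected entities first."""
    # Assumption: labels are unique, so they can serve as lookup keys.
    nodes_by_label = {node["label"]: node for node in graph.get("nodes", [])}
    ranked = sorted(
        graph.get("god_nodes", []),
        key=lambda god: god["degree"],
        reverse=True,
    )
    plan = []
    for god in ranked:
        node = nodes_by_label.get(god["label"])
        if node is None:
            continue  # god node without a matching entry in nodes: skip it
        plan.append(
            (node["label"], node["source_file"], parse_line(node["source_location"]))
        )
    return plan
```

Calling `build_read_plan(load_graph("raw/my-project-graph.json"))` would then give the order in which to read source files before writing narratives.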

This saves significant tokens: entity discovery is free (tree-sitter extracted), and you focus exclusively on semantic enrichment — writing the "why" narratives that AST parsing can't provide.
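For the relationship and grouping steps (2 and 3 in the list above), a rough sketch might filter edges by confidence and rank communities by cohesion. The function names are hypothetical, and the choice to trust only `"EXTRACTED"` edges by default is an assumption made for illustration; the issue does not prescribe a policy for `"INFERRED"` or `"AMBIGUOUS"` edges.

```python
def trusted_edges(graph: dict, allowed: tuple = ("EXTRACTED",)) -> list:
    """Keep only edges whose confidence label is in the allowed set."""
    return [
        edge for edge in graph.get("edges", [])
        if edge.get("confidence") in allowed
    ]


def communities_by_cohesion(graph: dict) -> list:
    """Return (community_id, cohesion, node_ids) tuples, tightest cluster first."""
    communities = graph.get("communities", {})
    cohesion = graph.get("cohesion", {})
    # Communities without a cohesion score are treated as 0.0 (an assumption).
    ranked = sorted(
        communities,
        key=lambda cid: cohesion.get(cid, 0.0),
        reverse=True,
    )
    return [(cid, cohesion.get(cid, 0.0), communities[cid]) for cid in ranked]
```

Ranking by cohesion is one possible ordering for concept pages; compiling in `god_nodes` order first and using communities only for grouping would be equally consistent with the instructions above.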

**Example workflow with graph.json:**

```bash
# Preview the first 100 lines of the graph
head -100 raw/my-project-graph.json

# Identify top entities from god_nodes
# Read their source files at the specified locations
# Write concept/entity pages with domain narratives
agent-notes memory add "UserService" "..." entity wiki-compiler
```

## Changes to obsidian-memory/SKILL.md

Update the ingest workflow section to document folder auto-detection:

### Folder ingestion with auto-extraction

When the first argument to `ingest` is a directory path, the CLI automatically:
1. Walks the directory (respects `.gitignore`; skips `__pycache__`, `.git`, and `node_modules`)
2. If graphifyy is installed: runs tree-sitter AST extraction (zero API cost)
3. Discovers code entities (classes, functions, modules) and their relationships
4. Saves extraction as `raw/<slug>-graph.json`
5. Creates entity stub pages for discovered classes and high-connectivity functions
6. Creates concept pages for detected code communities (Leiden algorithm)
7. Falls back to text-only ingestion if graphifyy is not installed

```bash
# Ingest a code project (Graphify auto-extracts if installed)
agent-notes memory ingest /path/to/project "Project summary"
agent-notes memory ingest ./src "Source code analysis"
agent-notes memory ingest ~/code/my-app

# Install graphifyy for zero-cost code extraction
# (quoted so that shells such as zsh do not expand the brackets)
pip install "agent-notes[graph]"
```

After folder ingestion, run `agent-notes memory lint` to see which stub pages need compilation. The wiki-compiler can then use the saved graph.json to skip discovery and focus on writing domain narratives.
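
The directory walk in step 1 of the ingestion list could look roughly like the sketch below. It is an illustration only and not the CLI's actual implementation: `iter_source_files` and `SKIP_DIRS` are hypothetical names, and `.gitignore` handling is deliberately left out because it needs a pattern matcher.

```python
import os
from pathlib import Path

# Directory names the ingest step is described as skipping.
SKIP_DIRS = {"__pycache__", ".git", "node_modules"}


def iter_source_files(root):
    """Yield every file under root, never descending into skipped directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering those folders.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            yield Path(dirpath) / name
```

Pruning in place rather than filtering the results afterwards means large folders such as `node_modules` are never traversed at all.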


## Rationale

These are prompt changes, not code changes. They teach the LLM wiki-compiler to:
1. **Check for graph.json first** — if it exists, skip the expensive grep + read discovery loop
2. **Use structural data** — entities, relationships, and communities are already known
3. **Focus on semantics** — the LLM's value is in writing "why" narratives, not in discovering "what" exists

This is where the bulk of the LLM cost savings comes from in practice: the wiki-compiler currently spends 60-70% of its tokens on discovery (grepping, reading files to find entities), and only 30-40% on actually writing the wiki page content.

## Dependencies

- #8 (graph.json must be saved during ingest for these instructions to work)
