Create code_graph.py extraction boundary module #7

## Parent Epic

Part of #5 — Integrate Graphify for zero-cost code entity extraction

## Task

Create `agent_notes/services/code_graph.py` — a boundary module that encapsulates all Graphify interaction. No Graphify types leak into the rest of the codebase; every function works with plain Python dicts and Path objects.

## Location

`agent_notes/services/code_graph.py` (new file, following the existing pattern of `wiki_backend.py`, `memory_backend.py`, and `credentials.py`)

## Functions

### 1. `graphify_available() -> bool`

```python
def graphify_available() -> bool:
    """Return True if the graphifyy package is importable."""
    try:
        import graphify.extract  # noqa: F401
        return True
    except ImportError:
        return False
```
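If actually importing `graphify.extract` turns out to be slow or to have side effects, the same check could be sketched with `importlib.util.find_spec`, which locates a module without executing it (for a dotted name the parent package is still imported). `module_available` is a hypothetical helper name used only for illustration; it is not part of this spec.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError: a parent package of a dotted name is missing.
        # ValueError: the module is loaded but has no usable __spec__.
        return False


# Stdlib modules are used here so the sketch runs without Graphify installed.
has_json = module_available("json.decoder")
has_missing = module_available("no_such_pkg_xyz.extract")
```

The try/except version above remains the simpler choice when a real import is acceptable.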

### 2. `extract_code_graph(folder_path, *, extensions=None, skip_dirs=None) -> dict`

Core extraction function. Runs tree-sitter parsing via Graphify's Python API.

**Parameters:**
- `folder_path: Path` — directory to scan
- `extensions: set[str] | None` — allowed code extensions (default: `_CODE_EXTENSIONS`)
- `skip_dirs: set[str] | None` — directories to skip (reuse `wiki_backend._SKIP_DIRS`)

**Returns:**
```python
{
    "nodes": [
        {"id": "auth_userservice", "label": "UserService", "source_file": "auth.py",
         "source_location": "L42", "type": "class"}
    ],
    "edges": [
        {"source": "auth_userservice", "target": "payments_gateway",
         "relation": "calls", "confidence": "EXTRACTED"}
    ],
    "communities": {0: ["auth_userservice", "auth_login"], 1: ["payments_gateway"]},
    "cohesion": {0: 0.85, 1: 0.72},
    "god_nodes": [{"label": "UserService", "degree": 12}],
    "stats": {"files_parsed": 5, "nodes": 23, "edges": 41, "communities": 3}
}
```
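One way to document this shape without leaking Graphify types is a set of `TypedDict` definitions, which are ordinary dicts at runtime. The sketch below is an assumption about how the module might annotate its return value; the class names (`GraphNode`, `GraphEdge`, `GodNode`, `GraphStats`, `CodeGraph`) are hypothetical, and Graphify's real `god_nodes()` output may carry more keys than shown.

```python
from typing import Dict, List, TypedDict


class GraphNode(TypedDict):
    id: str
    label: str
    source_file: str
    source_location: str
    type: str


class GraphEdge(TypedDict):
    source: str
    target: str
    relation: str
    confidence: str


class GodNode(TypedDict):
    label: str
    degree: int


class GraphStats(TypedDict):
    files_parsed: int
    nodes: int
    edges: int
    communities: int


class CodeGraph(TypedDict):
    nodes: List[GraphNode]
    edges: List[GraphEdge]
    communities: Dict[int, List[str]]  # community id -> member node ids
    cohesion: Dict[int, float]         # community id -> cohesion score
    god_nodes: List[GodNode]
    stats: GraphStats


# A tiny instance built from the sample values above.
example: CodeGraph = {
    "nodes": [{"id": "auth_userservice", "label": "UserService",
               "source_file": "auth.py", "source_location": "L42",
               "type": "class"}],
    "edges": [{"source": "auth_userservice", "target": "payments_gateway",
               "relation": "calls", "confidence": "EXTRACTED"}],
    "communities": {0: ["auth_userservice"]},
    "cohesion": {0: 0.85},
    "god_nodes": [{"label": "UserService", "degree": 12}],
    "stats": {"files_parsed": 1, "nodes": 1, "edges": 1, "communities": 1},
}
```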

**Implementation logic:**

```python
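from pathlib import Path


def extract_code_graph(folder_path: Path, *, extensions=None, skip_dirs=None) -> dict:
    """Extract a code graph from folder_path and return it as a plain dict."""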
    from graphify.extract import collect_files, extract
    from graphify.build import build_from_json
    from graphify.cluster import cluster, score_all
    from graphify.analyze import god_nodes

    # Step 1: Collect code files
    code_files = collect_files(folder_path)

    # Step 2: Filter by extension, falling back to _CODE_EXTENSIONS as documented
    allowed = extensions or _CODE_EXTENSIONS
    code_files = [f for f in code_files if f.suffix in allowed]

    # Step 3: Filter by skip_dirs if specified
    if skip_dirs:
        code_files = [f for f in code_files
                      if not any(d in f.parts for d in skip_dirs)]

    if not code_files:
        return _empty_graph()

    # Step 4: Extract AST (zero API cost)
    extraction = extract(code_files)
    if not extraction.get("nodes"):
        return _empty_graph()

    # Step 5: Build graph
    G = build_from_json(extraction)

    # Step 6: Community detection
    communities = cluster(G)
    cohesion = score_all(G, communities)
    gods = god_nodes(G)

    # Step 7: Convert to plain dict
    nodes = [
        {
            "id": n,
            "label": G.nodes[n].get("label", n),
            "source_file": G.nodes[n].get("source_file", ""),
            "source_location": G.nodes[n].get("source_location", ""),
            "type": G.nodes[n].get("file_type", "code"),
        }
        for n in G.nodes
    ]
    edges = [
        {
            "source": u,
            "target": v,
            "relation": d.get("relation", "related"),
            "confidence": d.get("confidence", "EXTRACTED"),
        }
        for u, v, d in G.edges(data=True)
    ]

    return {
        "nodes": nodes,
        "edges": edges,
        "communities": {k: list(v) for k, v in communities.items()},
        "cohesion": {k: v for k, v in cohesion.items()},
        "god_nodes": gods,
        "stats": {
            "files_parsed": len(code_files),
            "nodes": len(nodes),
            "edges": len(edges),
            "communities": len(communities),
        },
    }
```

### 3. `graph_to_wiki_terms(graph_data) -> dict`

Maps Graphify nodes and communities to wiki-compatible entity and concept names.

**Mapping rules:**

| Graphify node | Condition | Wiki type | Example |
|---|---|---|---|
| `class` | any degree | entity | "UserService" |
| `function` (top-level) | degree >= 3 | entity | "process_payment" |
| `function` (method) | always | skipped | stays inside class page |
| `module` / file | degree >= 2 | entity | "auth" |
| Leiden community | size >= 2 | concept | "Authentication System" |
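The table can be read as a small decision function. The sketch below is one possible encoding, not the final implementation: it assumes each node can be tagged with a kind of `"class"`, `"function"`, `"method"`, or `"module"` (how methods are told apart from top-level functions is not specified here), and that degree is computed from the plain `edges` list. `node_degrees`, `wiki_type_for_node`, and `wiki_type_for_community` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def node_degrees(edges: Iterable[dict]) -> Dict[str, int]:
    """Count how many edge endpoints touch each node id."""
    degree: Counter = Counter()
    for edge in edges:
        degree[edge["source"]] += 1
        degree[edge["target"]] += 1
    return dict(degree)


def wiki_type_for_node(kind: str, degree: int) -> Optional[str]:
    """Return "entity" if the node gets its own wiki page, else None."""
    if kind == "class":
        return "entity"                           # any degree
    if kind == "function":
        return "entity" if degree >= 3 else None  # top-level functions only
    if kind == "module":
        return "entity" if degree >= 2 else None
    return None                                   # methods and unknown kinds are skipped


def wiki_type_for_community(member_ids: List[str]) -> Optional[str]:
    """Communities with at least two members become concepts."""
    return "concept" if len(member_ids) >= 2 else None


sample_edges = [
    {"source": "auth_userservice", "target": "payments_gateway"},
    {"source": "auth_userservice", "target": "auth_login"},
]
sample_degrees = node_degrees(sample_edges)
```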

**Community naming algorithm:**
1. Collect `source_file` values from all community member nodes
2. Extract common path prefix (e.g., `auth/`, `payments/`)
3. If prefix gives a meaningful directory name → use it title-cased
4. Otherwise → use the highest-degree node's label + "Module" suffix
5. Deduplicate against existing concept names
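The five steps above could be sketched as follows. Several details are assumptions made for illustration and are not fixed by this issue: the list of "non-meaningful" directory names (`_GENERIC_DIRS`), the numeric suffix used for deduplication, and the helper names `common_dir` and `name_community`.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Set

# Assumption: directory names too generic to serve as a concept name.
_GENERIC_DIRS = {"src", "lib", "app", "/"}


def common_dir(files: List[str]) -> str:
    """Return the deepest directory name shared by all files, or ""."""
    parts_list = [PurePosixPath(f).parent.parts for f in files]
    common = []
    for group in zip(*parts_list):
        if len(set(group)) != 1:
            break
        common.append(group[0])
    return common[-1] if common else ""


def name_community(members: List[dict], degree: Dict[str, int],
                   existing: Set[str]) -> str:
    # Steps 1-2: collect source files and find their shared directory.
    files = [m["source_file"] for m in members if m.get("source_file")]
    directory = common_dir(files)
    if directory and directory.lower() not in _GENERIC_DIRS:
        # Step 3: title-case a meaningful directory name.
        name = directory.replace("_", " ").replace("-", " ").title()
    else:
        # Step 4: fall back to the highest-degree member's label.
        top = max(members, key=lambda m: degree.get(m["id"], 0))
        name = f"{top['label']} Module"
    # Step 5: deduplicate (numeric suffix is an assumed strategy).
    candidate, n = name, 2
    while candidate in existing:
        candidate = f"{name} {n}"
        n += 1
    return candidate


nested = [
    {"id": "a", "label": "UserService", "source_file": "auth/service.py"},
    {"id": "b", "label": "login", "source_file": "auth/views/login.py"},
]
flat = [
    {"id": "a", "label": "UserService", "source_file": "auth.py"},
    {"id": "c", "label": "Gateway", "source_file": "payments.py"},
]
```

With these inputs, `nested` shares the `auth/` directory and `flat` has no shared directory, so the second case falls back to the highest-degree label.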

**Returns:**
```python
{
    "entities": ["UserService", "PaymentGateway", "process_payment"],
    "concepts": ["Authentication", "Payment Processing"],
    "edges_by_entity": {
        "UserService": [
            {"target": "PaymentGateway", "relation": "calls"},
            {"target": "login", "relation": "contains"}
        ]
    }
}
```
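A minimal sketch of how `edges_by_entity` might be derived from the plain graph dict, assuming it is keyed by node label, that labels are unique, and that only edges whose source is a selected entity are kept. The function name `group_edges_by_entity` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_edges_by_entity(graph_data: dict,
                          entities: Iterable[str]) -> Dict[str, List[dict]]:
    """Group outgoing edges under the label of each selected entity."""
    label_of = {n["id"]: n["label"] for n in graph_data["nodes"]}
    wanted = set(entities)
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for edge in graph_data["edges"]:
        source = label_of.get(edge["source"])
        target = label_of.get(edge["target"])
        # Drop edges whose endpoints are unknown or whose source was filtered out.
        if source in wanted and target is not None:
            grouped[source].append({"target": target, "relation": edge["relation"]})
    return dict(grouped)


sample_graph = {
    "nodes": [
        {"id": "auth_userservice", "label": "UserService"},
        {"id": "payments_gateway", "label": "PaymentGateway"},
        {"id": "auth_login", "label": "login"},
    ],
    "edges": [
        {"source": "auth_userservice", "target": "payments_gateway", "relation": "calls"},
        {"source": "auth_userservice", "target": "auth_login", "relation": "contains"},
        {"source": "auth_login", "target": "payments_gateway", "relation": "calls"},
    ],
}
grouped_edges = group_edges_by_entity(sample_graph, ["UserService", "PaymentGateway"])
```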

**Implementation detail — filtering trivial nodes:**
- Skip nodes whose label starts with `_` (private/internal)
- Skip nodes whose label is `__init__`, `__main__`, `setup`
- Skip `"rationale"` type nodes (Graphify extracts `# NOTE:` comments as rationale nodes)
- Skip file-level module nodes that are just containers (only have `"contains"` edges out)
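These filters could be combined into a single predicate. The sketch below reads the container rule as "a module node with at least one outgoing edge, all of them `contains`", which is an interpretation rather than something this issue pins down; the `"module"` type value and the name `is_trivial` are likewise assumptions.

```python
from typing import List

_SKIP_LABELS = {"__init__", "__main__", "setup"}


def is_trivial(node: dict, out_relations: List[str]) -> bool:
    """Return True if the node should not become a wiki entity.

    out_relations holds the "relation" value of every edge leaving the node.
    """
    label = node.get("label", "")
    if label.startswith("_") or label in _SKIP_LABELS:
        return True  # private names and boilerplate entry points
    if node.get("type") == "rationale":
        return True  # comment-derived rationale nodes
    if (node.get("type") == "module" and out_relations
            and all(rel == "contains" for rel in out_relations)):
        return True  # pure container module
    return False


kept = [
    n["label"]
    for n, rels in [
        ({"label": "UserService", "type": "class"}, ["calls"]),
        ({"label": "_helper", "type": "function"}, []),
        ({"label": "setup", "type": "function"}, []),
        ({"label": "auth", "type": "module"}, ["contains", "contains"]),
        ({"label": "payments", "type": "module"}, ["contains", "imports"]),
    ]
    if not is_trivial(n, rels)
]
```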

### 4. `save_graph_json(wiki_root, slug, graph_data) -> Path`

```python
import json
from pathlib import Path


def save_graph_json(wiki_root: Path, slug: str, graph_data: dict) -> Path:
    """Write the graph to raw/<slug>-graph.json and return the path."""
    raw_dir = wiki_root / "raw"
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / f"{slug}-graph.json"
    # default=str stringifies any value json cannot serialize natively
    path.write_text(json.dumps(graph_data, indent=2, default=str), encoding="utf-8")
    return path
```

**Storage rationale**: `raw/` is the immutable source material directory. The graph is derived from source code — it belongs with source data. `.obsidianignore` already excludes `raw/` from Obsidian indexing.
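One detail worth keeping in mind when reading the file back: JSON object keys are always strings, so the integer community ids in `communities` and `cohesion` come back as `"0"`, `"1"`, and so on. The sketch below shows the round trip and a possible loader that restores integer keys; `load_graph_json` is a hypothetical helper, not part of this task.

```python
import json
import tempfile
from pathlib import Path


def load_graph_json(path: Path) -> dict:
    """Read a saved graph and convert community ids back to ints."""
    data = json.loads(path.read_text(encoding="utf-8"))
    for key in ("communities", "cohesion"):
        data[key] = {int(k): v for k, v in data.get(key, {}).items()}
    return data


original = {
    "communities": {0: ["auth_userservice", "auth_login"], 1: ["payments_gateway"]},
    "cohesion": {0: 0.85, 1: 0.72},
}

with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "demo-graph.json"
    saved.write_text(json.dumps(original, indent=2, default=str), encoding="utf-8")
    raw_keys = list(json.loads(saved.read_text(encoding="utf-8"))["communities"])
    restored = load_graph_json(saved)
```

Here `raw_keys` holds string keys straight from the file, while `restored` matches the original dict again.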

### 5. Helper: `_empty_graph() -> dict`

```python
def _empty_graph() -> dict:
    """Return a graph dict with the full key set and no content."""
    return {
        "nodes": [], "edges": [],
        "communities": {}, "cohesion": {},
        "god_nodes": [],
        "stats": {"files_parsed": 0, "nodes": 0, "edges": 0, "communities": 0},
    }
```

### 6. Constant: `_CODE_EXTENSIONS`

```python
_CODE_EXTENSIONS = {
    ".py", ".ts", ".js", ".tsx", ".jsx",
    ".go", ".rs", ".java", ".cpp", ".c", ".h",
    ".rb", ".swift", ".kt", ".cs", ".scala",
    ".php", ".lua", ".groovy", ".jl",
    ".f90", ".pas",
}
```

This matches Graphify's supported tree-sitter languages.

## Potential Issues

1. **Graphify's `collect_files()` vs our file walking**: `collect_files()` has its own filtering logic. We may get different file sets than `wiki_ingest_folder()`. Solution: use our own file list from the walk loop where possible, or at minimum filter `collect_files()` output with our `_SKIP_DIRS` and extensions.

2. **NetworkX graph iteration order**: `G.nodes` and `G.edges(data=True)` iteration order is insertion-order in Python 3.7+, but community assignment is non-deterministic (Leiden uses randomization). This is fine — we only need consistent node IDs, not consistent community assignment.

3. **Large repositories**: `extract()` on a 1000+ file repo could take 10-30 seconds (tree-sitter is fast but not instant). This is acceptable for a one-time ingest operation, but document that large repos may take a moment.

4. **`extract()` with `cache_root`**: The v7 API supports `extract(code_files, cache_root=Path("."))` for caching parsed results. We should pass a cache path to avoid re-parsing on `--update` runs. Use `wiki_root / "raw"` as cache root.

5. **Import safety**: All Graphify imports are lazy (inside function bodies), so `import agent_notes` never fails even when graphifyy isn't installed.
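For issue 1, a walk that prunes skipped directories in place is one way to produce our own file list. The sketch below is illustrative only: `walk_code_files` is a hypothetical name, and whether Graphify's `extract()` accepts such a list unchanged is assumed from the implementation logic above. Because pruning only applies to directories below the root, a root that itself lives under a directory named `build` is not filtered out, which a check against every component of an absolute path would do.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Set


def walk_code_files(root: Path, extensions: Set[str],
                    skip_dirs: Set[str]) -> List[Path]:
    """Collect code files under root, never descending into skip_dirs."""
    found: List[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place stops os.walk from entering them.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            candidate = Path(dirpath) / name
            if candidate.suffix.lower() in extensions:
                found.append(candidate)
    return found


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "build" / "project"  # root sits under a "build" dir
    for rel in ("auth/service.py", "node_modules/pkg/index.js", "README.md"):
        target = project / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("", encoding="utf-8")
    files = walk_code_files(project, {".py", ".js"}, {"build", "node_modules"})
    relative = [f.relative_to(project).as_posix() for f in files]
```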

## Dependencies

- #6 (optional dependency must be declared first)
