diff --git a/docs/codedocs/api-reference/embeddings.md b/docs/codedocs/api-reference/embeddings.md
new file mode 100644
index 0000000..b32425b
--- /dev/null
+++ b/docs/codedocs/api-reference/embeddings.md
@@ -0,0 +1,95 @@
+---
+title: "Embeddings"
+description: "Reference for the TF-IDF vectorizer and vector utilities exported by pmll_memory_mcp."
+---
+
+The embeddings module provides the long-term layer's local vectorization primitives.
+
+## Import Path
+
+```python
+from pmll_memory_mcp import TfIdfVectorizer, embed, cosine_similarity
+```
+
+Source file: `mcp/pmll_memory_mcp/embeddings.py`
+
+## `TfIdfVectorizer`
+
+Constructor:
+
+```python
+TfIdfVectorizer() -> None
+```
+
+### Property: `vocab_size`
+
+```python
+vocab_size: int
+```
+
+Current number of terms in the vocabulary.
+
+### `add_document`
+
+```python
+add_document(text: str) -> None
+```
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `text` | `str` | — | Document text to fold into corpus statistics. |
+
+### `vectorize`
+
+```python
+vectorize(text: str) -> list[float]
+```
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `text` | `str` | — | Text to convert into a normalized TF-IDF vector. |
+
+## Functions
+
+### `embed`
+
+```python
+embed(text: str) -> list[float]
+```
+
+Adds the text to the module-level vectorizer and returns its vector.
+
+### `cosine_similarity`
+
+```python
+cosine_similarity(a: list[float], b: list[float]) -> float
+```
+
+Returns a score between `0.0` and `1.0` for aligned non-negative vectors.
+
+## Behavior Notes
+
+- `TfIdfVectorizer` gives you an isolated corpus. That is the right choice when you need reproducible vector dimensions inside one test or workflow.
+- `embed()` uses the module-level singleton managed by `get_vectorizer()` in the source. That is convenient for the graph layer because every new node contributes to the shared vocabulary.
+- `cosine_similarity()` only compares the overlapping vector length. In practice that works because both vectors usually come from the same vectorizer instance.
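+
+The overlapping-length comparison described in the last note can be sketched as follows. This is a minimal illustration and not the package's code: `cosine_overlap` is a hypothetical name, and truncating both vectors to the shared length before normalizing is an assumption about how the overlap is handled.
+
+```python
+import math
+
+
+def cosine_overlap(a: list[float], b: list[float]) -> float:
+    """Cosine similarity over the shared prefix of two vectors (illustrative)."""
+    n = min(len(a), len(b))  # trailing dimensions that only one vector has are ignored
+    dot = sum(x * y for x, y in zip(a[:n], b[:n]))
+    norm_a = math.sqrt(sum(x * x for x in a[:n]))
+    norm_b = math.sqrt(sum(y * y for y in b[:n]))
+    if norm_a == 0.0 or norm_b == 0.0:
+        return 0.0  # an all-zero vector has no direction to compare
+    return dot / (norm_a * norm_b)
+
+
+older = [0.6, 0.8]       # built when the vocabulary had two terms
+newer = [0.6, 0.8, 0.5]  # built after a third term joined the vocabulary
+print(cosine_overlap(older, newer))
+```
+
+Under this assumption, the third dimension of `newer` never enters the score, which is why vectors from different vocabulary sizes can still look identical.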
+
+## Example
+
+```python
+from pmll_memory_mcp import TfIdfVectorizer, embed, cosine_similarity
+
+vectorizer = TfIdfVectorizer()
+vectorizer.add_document("authentication login user")
+vectorizer.add_document("authentication login password")
+
+a = vectorizer.vectorize("authentication login user")
+b = vectorizer.vectorize("authentication login password")
+print(cosine_similarity(a, b))
+print(embed("session cache and semantic search"))
+```
+
+## Notes
+
+- `embed()` uses a module-level shared vectorizer, while `TfIdfVectorizer()` gives you an isolated one.
+- Vector dimensions grow as the vocabulary grows.
+- The module also defines `tokenize()`, `get_vectorizer()`, and `reset_vectorizer()` in `mcp/pmll_memory_mcp/embeddings.py`; they are useful for testing and internals even though `__init__.py` does not re-export them.
diff --git a/docs/codedocs/api-reference/memory-graph.md b/docs/codedocs/api-reference/memory-graph.md
new file mode 100644
index 0000000..95931a0
--- /dev/null
+++ b/docs/codedocs/api-reference/memory-graph.md
@@ -0,0 +1,139 @@
+---
+title: "Memory Graph"
+description: "Reference for the long-term graph functions exported by pmll_memory_mcp."
+---
+
+The memory graph module is the long-term retrieval engine behind semantic search and traversal.
+
+## Import Path
+
+```python
+from pmll_memory_mcp import (
+    upsert_node,
+    create_relation,
+    search_graph,
+    prune_stale_links,
+    add_interlinked_context,
+    retrieve_with_traversal,
+    get_graph_stats,
+    clear_graph,
+)
+```
+
+Source file: `mcp/pmll_memory_mcp/memory_graph.py`
+
+## Functions
+
+### `upsert_node`
+
+```python
+upsert_node(
+    session_id: str,
+    node_type: NodeType,
+    label: str,
+    content: str,
+    metadata: dict[str, str] | None = None,
+) -> MemoryNode
+```
+
+Creates or updates a typed node.
+
+### `create_relation`
+
+```python
+create_relation(
+    session_id: str,
+    source_id: str,
+    target_id: str,
+    relation: RelationType,
+    weight: float | None = None,
+    metadata: dict[str, str] | None = None,
+) -> MemoryEdge | None
+```
+
+Creates a typed edge or updates the weight of an existing duplicate.
+
+### `search_graph`
+
+```python
+search_graph(
+    session_id: str,
+    query: str,
+    max_depth: int = 1,
+    top_k: int = 5,
+    edge_filter: list[RelationType] | None = None,
+) -> GraphSearchResult
+```
+
+Runs semantic search, then neighbor traversal.
+
+### `prune_stale_links`
+
+```python
+prune_stale_links(
+    session_id: str,
+    threshold: float | None = None,
+) -> dict[str, int]
+```
+
+Removes decayed edges and old orphan nodes.
+
+### `add_interlinked_context`
+
+```python
+add_interlinked_context(
+    session_id: str,
+    items: list[dict[str, Any]],
+    auto_link: bool = True,
+) -> dict[str, Any]
+```
+
+Bulk-adds nodes and optional similarity edges.
+
+### `retrieve_with_traversal`
+
+```python
+retrieve_with_traversal(
+    session_id: str,
+    start_node_id: str,
+    max_depth: int = 2,
+    edge_filter: list[RelationType] | None = None,
+) -> list[TraversalResult]
+```
+
+Walks outward from a starting node.
+
+### `get_graph_stats`
+
+```python
+get_graph_stats(session_id: str) -> dict[str, Any]
+```
+
+Returns node, edge, type, and relation counts.
+
+### `clear_graph`
+
+```python
+clear_graph(session_id: str) -> int
+```
+
+Clears the graph for the session and returns the removed object count.
+
+## Example
+
+```python
+from pmll_memory_mcp import (
+    upsert_node,
+    create_relation,
+    search_graph,
+    get_graph_stats,
+)
+
+sid = "api-ref-graph"
+service = upsert_node(sid, "concept", "service", "Processes requests")
+queue = upsert_node(sid, "concept", "queue", "Buffers jobs")
+create_relation(sid, service.id, queue.id, "depends_on")
+
+print(search_graph(sid, "job processing").direct[0].node.label)
+print(get_graph_stats(sid))
+```
diff --git a/docs/codedocs/api-reference/pmmemorystore.md b/docs/codedocs/api-reference/pmmemorystore.md
new file mode 100644
index 0000000..cffbe0a
--- /dev/null
+++ b/docs/codedocs/api-reference/pmmemorystore.md
@@ -0,0 +1,119 @@
+---
+title: "PMMemoryStore"
+description: "Reference for the short-term KV store class exported by pmll_memory_mcp."
+---
+
+`PMMemoryStore` is the short-term session cache exported from `pmll_memory_mcp` and implemented in `mcp/pmll_memory_mcp/kv_store.py`.
+
+## Import Path
+
+```python
+from pmll_memory_mcp import PMMemoryStore
+```
+
+Source file: `mcp/pmll_memory_mcp/kv_store.py`
+
+## Constructor
+
+```python
+PMMemoryStore(silo_size: int = 256) -> None
+```
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `silo_size` | `int` | `256` | Informational silo capacity carried on the instance. |
+
+## Public Methods
+
+### `peek`
+
+```python
+peek(key: str) -> tuple[bool, str | None, int | None]
+```
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `key` | `str` | — | Cache key to inspect. |
+
+Returns a tuple `(hit, value, index)`.
+
+Example:
+
+```python
+store = PMMemoryStore()
+print(store.peek("user:1"))
+```
+
+### `set`
+
+```python
+set(key: str, value: str) -> int
+```
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `key` | `str` | — | Cache key to store. |
+| `value` | `str` | — | Resolved string payload. |
+
+Returns the slot index used for the entry.
+
+Example:
+
+```python
+store = PMMemoryStore()
+slot = store.set("user:1", '{"name": "Ada"}')
+print(slot)
+```
+
+### `flush`
+
+```python
+flush() -> int
+```
+
+Returns the number of slots cleared.
+
+Example:
+
+```python
+store = PMMemoryStore()
+store.set("a", "1")
+store.set("b", "2")
+print(store.flush())
+```
+
+### `__len__`
+
+```python
+__len__() -> int
+```
+
+Returns the number of stored slots.
+
+### `__contains__`
+
+```python
+__contains__(key: object) -> bool
+```
+
+Returns `True` when the key exists in the store.
+
+## Common Combined Pattern
+
+```python
+from pmll_memory_mcp import PMMemoryStore
+
+store = PMMemoryStore()
+
+if not store.peek("docs")[0]:
+    store.set("docs", "cached docs payload")
+
+print(len(store), "docs" in store)
+```
+
+## Notes
+
+- Existing keys are updated in place and keep their original slot index.
+- The constructor does not enforce a hard limit on writes.
+- The server wrappers usually create instances indirectly through `get_store(session_id)` in the same source module, which is why application code should think in terms of session lifecycle rather than a single global cache.
+- Because `peek()` only reports resolved values, it pairs naturally with `peek_context()` when you also need to account for in-flight work.
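+
+The slot behavior in the first note (new keys append, existing keys keep their index) can be illustrated with a small stand-in. `SlotStore` is a hypothetical class written for this page and is not the real `PMMemoryStore`; its internal layout is an assumption.
+
+```python
+from __future__ import annotations
+
+
+class SlotStore:
+    """Illustrative stand-in, not the real PMMemoryStore."""
+
+    def __init__(self) -> None:
+        self._slots: list[tuple[str, str]] = []  # (key, value) in insertion order
+        self._index: dict[str, int] = {}         # key -> slot position
+
+    def set(self, key: str, value: str) -> int:
+        if key in self._index:
+            i = self._index[key]
+            self._slots[i] = (key, value)  # update in place, slot index unchanged
+            return i
+        self._slots.append((key, value))   # a new key takes the next slot
+        self._index[key] = len(self._slots) - 1
+        return self._index[key]
+
+    def peek(self, key: str) -> tuple[bool, str | None, int | None]:
+        i = self._index.get(key)
+        if i is None:
+            return (False, None, None)
+        return (True, self._slots[i][1], i)
+
+
+store = SlotStore()
+print(store.set("a", "1"), store.set("b", "2"), store.set("a", "3"))  # 0 1 0
+print(store.peek("a"))  # (True, '3', 0)
+```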
diff --git a/docs/codedocs/api-reference/qpromiseregistry.md b/docs/codedocs/api-reference/qpromiseregistry.md
new file mode 100644
index 0000000..1585db2
--- /dev/null
+++ b/docs/codedocs/api-reference/qpromiseregistry.md
@@ -0,0 +1,114 @@
+---
+title: "QPromiseRegistry"
+description: "Reference for the in-flight promise registry and the peek_context helper."
+---
+
+`QPromiseRegistry` and `peek_context` are the package exports that implement in-flight deduplication.
+
+## Import Paths
+
+```python
+from pmll_memory_mcp import QPromiseRegistry, peek_context
+```
+
+Source files:
+
+- `mcp/pmll_memory_mcp/q_promise_bridge.py`
+- `mcp/pmll_memory_mcp/peek.py`
+
+## `QPromiseRegistry`
+
+Constructor:
+
+```python
+QPromiseRegistry() -> None
+```
+
+### `register`
+
+```python
+register(promise_id: str) -> None
+```
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `promise_id` | `str` | — | Unique identifier for an in-flight operation. |
+
+### `resolve`
+
+```python
+resolve(promise_id: str, payload: str) -> bool
+```
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `promise_id` | `str` | — | Promise to resolve. |
+| `payload` | `str` | — | Final resolved value. |
+
+Returns `True` when the promise existed.
+
+### `peek_promise`
+
+```python
+peek_promise(promise_id: str) -> tuple[bool, str | None, str | None]
+```
+
+Returns `(found, status, payload)`.
+
+### `__len__`
+
+```python
+__len__() -> int
+```
+
+### `__contains__`
+
+```python
+__contains__(promise_id: object) -> bool
+```
+
+## `peek_context`
+
+```python
+peek_context(
+    key: str,
+    session_id: str,
+    store: PMMemoryStore,
+    promise_registry: QPromiseRegistry,
+) -> dict[str, Any]
+```
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `key` | `str` | — | Context key to inspect. |
+| `session_id` | `str` | — | Session identifier used by the caller. |
+| `store` | `PMMemoryStore` | — | Session-local KV store. |
+| `promise_registry` | `QPromiseRegistry` | — | Shared registry of in-flight work. |
+
+Return shapes:
+
+- `{"hit": True, "value": str, "index": int}`
+- `{"hit": True, "status": "pending", "promise_id": str}`
+- `{"hit": False}`
+
+The implementation order matters: `peek_context()` checks the store before the registry. If you have already committed a value with `store.set()`, the function returns the cached payload even when an older promise record still exists.
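+
+A minimal sketch of that lookup order, using a plain dict and set in place of the real store and registry. `peek_order` is a hypothetical helper, and the `index` field of the real return shape is left out for brevity.
+
+```python
+from __future__ import annotations
+
+from typing import Any
+
+
+def peek_order(key: str, cache: dict[str, str], pending: set[str]) -> dict[str, Any]:
+    """Hypothetical stand-in for the lookup order: cache first, then in-flight work."""
+    if key in cache:
+        # A committed value wins even if a promise record still exists.
+        return {"hit": True, "value": cache[key]}
+    if key in pending:
+        return {"hit": True, "status": "pending", "promise_id": key}
+    return {"hit": False}
+
+
+cache: dict[str, str] = {}
+pending = {"session-1:fetch"}
+print(peek_order("session-1:fetch", cache, pending))  # pending shape
+cache["session-1:fetch"] = "done"  # value committed, promise record still present
+print(peek_order("session-1:fetch", cache, pending))  # cached value wins
+```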
+
+## Example
+
+```python
+from pmll_memory_mcp import PMMemoryStore, QPromiseRegistry, peek_context
+
+store = PMMemoryStore()
+registry = QPromiseRegistry()
+registry.register("session-1:fetch")
+
+print(peek_context("session-1:fetch", "session-1", store, registry))
+registry.resolve("session-1:fetch", "done")
+store.set("session-1:fetch", "done")
+print(peek_context("session-1:fetch", "session-1", store, registry))
+```
+
+## Notes
+
+- The registry does not remove resolved promises automatically.
+- Promise IDs should be namespaced by the caller, usually with the session ID embedded into the key.
diff --git a/docs/codedocs/api-reference/server-tools.md b/docs/codedocs/api-reference/server-tools.md
new file mode 100644
index 0000000..576f93b
--- /dev/null
+++ b/docs/codedocs/api-reference/server-tools.md
@@ -0,0 +1,133 @@
+---
+title: "Server Tools"
+description: "Reference for the MCP-facing tool wrappers in the TypeScript and Python server entry points."
+---
+
+The server layer exposes the package through MCP tool wrappers. There are two relevant source files:
+
+- TypeScript server: `mcp/src/index.ts`
+- Python server: `mcp/pmll_memory_mcp/server.py`
+
+## Canonical MCP Surface
+
+The TypeScript server is the canonical tool surface because it includes all 15 tools advertised by the package manifest and README.
+
+```text
+init
+peek
+set
+resolve
+flush
+graphql
+upsert_memory_node
+create_relation
+search_memory_graph
+prune_stale_links
+add_interlinked_context
+retrieve_with_traversal
+resolve_context
+promote_to_long_term
+memory_status
+```
+
+## Python Wrapper Surface
+
+The Python server exposes direct functions with near-equivalent behavior:
+
+```python
+from pmll_memory_mcp.server import (
+    init,
+    peek,
+    set,
+    resolve,
+    flush,
+    upsert_memory_node,
+    create_memory_relation,
+    search_memory_graph,
+    prune_memory_links,
+    add_interlinked_memory,
+    retrieve_memory_traversal,
+    resolve_memory_context,
+    promote_memory_to_long_term,
+    memory_status,
+)
+```
+
+## Representative Signatures
+
+### Short-term tools
+
+```python
+init(session_id: str, silo_size: int = 256) -> dict[str, Any]
+peek(session_id: str, key: str) -> dict[str, Any]
+set(session_id: str, key: str, value: str) -> dict[str, Any]
+resolve(session_id: str, promise_id: str) -> dict[str, Any]
+flush(session_id: str) -> dict[str, Any]
+```
+
+### Long-term tools
+
+```python
+upsert_memory_node(
+    session_id: str,
+    type: str,
+    label: str,
+    content: str,
+    metadata: dict[str, str] | None = None,
+) -> dict[str, Any]
+
+create_memory_relation(
+    session_id: str,
+    source_id: str,
+    target_id: str,
+    relation: str,
+    weight: float = 1.0,
+) -> dict[str, Any]
+```
+
+### Solution-engine tools
+
+```python
+resolve_memory_context(session_id: str, key: str) -> dict[str, Any]
+promote_memory_to_long_term(
+    session_id: str,
+    key: str,
+    value: str,
+    node_type: str = "concept",
+) -> dict[str, Any]
+memory_status(session_id: str) -> dict[str, Any]
+```
+
+## GraphQL Helper
+
+The GraphQL tool is only implemented in TypeScript. Its underlying helper lives in `mcp/src/graphql.ts`:
+
+```typescript
+executeGraphQL(
+  endpoint: string,
+  operation: string,
+  variables: Record<string, unknown> = {},
+  headers: Record<string, string> = {},
+): Promise<GraphQLResponse>
+```
+
+It also exports `GRAPHQL_QUERY`, `GRAPHQL_MUTATION`, and `GRAPHQL_DEFAULT_VARIABLES`.
+
+## Example
+
+```python
+from pmll_memory_mcp.server import (
+    init,
+    set,
+    upsert_memory_node,
+    resolve_memory_context,
+    memory_status,
+)
+
+sid = "server-tools-demo"
+init(sid)
+set(sid, "recent", "fresh output")
+upsert_memory_node(sid, "note", "recent", "fresh output")
+print(resolve_memory_context(sid, "recent"))
+print(memory_status(sid))
+```
diff --git a/docs/codedocs/api-reference/solution-engine.md b/docs/codedocs/api-reference/solution-engine.md
new file mode 100644
index 0000000..7765ed5
--- /dev/null
+++ b/docs/codedocs/api-reference/solution-engine.md
@@ -0,0 +1,93 @@
+---
+title: "Solution Engine"
+description: "Reference for the hybrid context-resolution functions exported by pmll_memory_mcp."
+---
+
+The solution engine module provides the high-level API for combining short-term and long-term memory.
+
+## Import Path
+
+```python
+from pmll_memory_mcp import resolve_context, promote_to_long_term, get_memory_status
+```
+
+Source file: `mcp/pmll_memory_mcp/solution_engine.py`
+
+## Functions
+
+### `resolve_context`
+
+```python
+resolve_context(
+    session_id: str,
+    key: str,
+    store: PMMemoryStore,
+) -> dict[str, Any]
+```
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `session_id` | `str` | — | Session whose graph should be searched on cache miss. |
+| `key` | `str` | — | Lookup key or natural-language query. |
+| `store` | `PMMemoryStore` | — | Short-term store checked first. |
+
+Return shape:
+
+```python
+{"source": "short_term" | "long_term" | "miss", "value": str | None, "score": float}
+```
+
+### `promote_to_long_term`
+
+```python
+promote_to_long_term(
+    session_id: str,
+    key: str,
+    value: str,
+    node_type: NodeType = "concept",
+    metadata: dict[str, str] | None = None,
+) -> dict[str, Any]
+```
+
+Returns `{"promoted": True, "node_id": str}`.
+
+### `get_memory_status`
+
+```python
+get_memory_status(
+    session_id: str,
+    store: PMMemoryStore,
+) -> dict[str, Any]
+```
+
+Returns:
+
+```python
+{
+    "short_term": {"slots": int, "silo_size": int},
+    "long_term": {"nodes": int, "edges": int, "types": dict[str, int]},
+    "promotion_threshold": 3,
+}
+```
+
+## Example
+
+```python
+from pmll_memory_mcp import PMMemoryStore, promote_to_long_term, resolve_context, get_memory_status
+
+sid = "api-ref-solution"
+store = PMMemoryStore()
+store.set("recent", "fresh summary")
+promote_to_long_term(sid, "recent", "fresh summary", "note")
+
+print(resolve_context(sid, "recent", store))
+print(get_memory_status(sid, store))
+```
+
+## Notes
+
+- `resolve_context()` always prefers the short-term store over graph search.
+- `promotion_threshold` is informational in the current implementation.
+- `promote_to_long_term()` is a thin wrapper around `upsert_node()`, so repeated promotions of the same `(label, type)` pair update the existing node instead of creating duplicates.
+- `get_memory_status()` is safe to call for diagnostics because it does not mutate either memory layer.
+- In the tested workflow under `mcp/tests/test_solution_engine.py`, the most important invariant is short-term priority: if both the cache and graph contain a value for the same logical key, the cache wins.
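+
+The update-instead-of-duplicate behavior noted for repeated promotions can be pictured with a small keyed registry. This is a sketch under stated assumptions: `upsert` and `_nodes` are hypothetical names, and keying on `(label, type)` simply follows the note above rather than the actual `upsert_node()` internals.
+
+```python
+from __future__ import annotations
+
+import itertools
+
+# Hypothetical in-memory registry keyed by (label, node_type).
+_nodes: dict[tuple[str, str], dict[str, str]] = {}
+_ids = itertools.count(1)
+
+
+def upsert(label: str, node_type: str, content: str) -> dict[str, str]:
+    """Create the node on first sight, otherwise refresh its content in place."""
+    key = (label, node_type)
+    node = _nodes.get(key)
+    if node is None:
+        node = {"id": f"n{next(_ids)}", "label": label, "type": node_type, "content": content}
+        _nodes[key] = node
+    else:
+        node["content"] = content  # same id, updated payload
+    return node
+
+
+first = upsert("recent", "note", "fresh summary")
+second = upsert("recent", "note", "revised summary")
+print(first["id"] == second["id"], len(_nodes))  # True 1
+```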
diff --git a/docs/codedocs/architecture.md b/docs/codedocs/architecture.md
new file mode 100644
index 0000000..386f218
--- /dev/null
+++ b/docs/codedocs/architecture.md
@@ -0,0 +1,101 @@
+---
+title: "Architecture"
+description: "Understand how the pmll-memory-mcp package inside the PPM repository is structured internally."
+---
+
+The stable architecture in this repository lives under `mcp/`: a reusable Python package in `mcp/pmll_memory_mcp`, a matching TypeScript implementation in `mcp/src`, and thin server entry points that expose both layers as MCP tools.
+
+```mermaid
+graph TD
+  A[Agent or app] --> B[Python API<br/>mcp/pmll_memory_mcp/__init__.py]
+  A --> C[TypeScript MCP server<br/>mcp/src/index.ts]
+  A --> D[Python MCP server<br/>mcp/pmll_memory_mcp/server.py]
+  B --> E[KV store<br/>kv_store.py]
+  B --> F[Q-promise registry<br/>q_promise_bridge.py]
+  B --> G[Peek guard<br/>peek.py]
+  B --> H[Embeddings<br/>embeddings.py]
+  B --> I[Memory graph<br/>memory_graph.py]
+  B --> J[Solution engine<br/>solution_engine.py]
+  C --> E2[kv-store.ts]
+  C --> F2[q-promise-bridge.ts]
+  C --> G2[peek.ts]
+  C --> H2[embeddings.ts]
+  C --> I2[memory-graph.ts]
+  C --> J2[solution-engine.ts]
+  C --> K2[graphql.ts]
+```
+
+## Module Layout
+
+- `mcp/pmll_memory_mcp/__init__.py` is the Python package entry point. It re-exports the reusable classes and functions that matter to application code.
+- `mcp/src/index.ts` is the Node entry point. It is primarily a server bootstrapper that wires tools onto `McpServer`.
+- `mcp/pmll_memory_mcp/server.py` is the Python MCP wrapper. It mirrors most of the tool surface, but not perfectly.
+- `mcp/tests/` is important for understanding intent. The tests confirm session isolation, cache-first resolution, traversal behavior, and long-term promotion.
+
+## Data Flow
+
+The request lifecycle is intentionally layered:
+
+1. A caller starts with a session ID.
+2. Short-term lookup happens first through `PMMemoryStore.peek()` or the higher-level `peek_context()`.
+3. If there is an in-flight operation, `QPromiseRegistry.peek_promise()` reports a pending state instead of triggering duplicate work.
+4. On a miss, the caller performs the expensive work, then stores the result with `set()`.
+5. Important results can be promoted into the long-term graph, where `upsert_node()`, `create_relation()`, `search_graph()`, and `retrieve_with_traversal()` operate.
+6. `resolve_context()` in `solution_engine.py` always prefers short-term memory and falls back to graph search only when the cache misses.
+
+```mermaid
+sequenceDiagram
+  participant Caller
+  participant Store as PMMemoryStore
+  participant Promises as QPromiseRegistry
+  participant Graph as MemoryGraph
+
+  Caller->>Store: peek(key)
+  alt KV hit
+    Store-->>Caller: value
+  else KV miss
+    Caller->>Promises: peek_promise(key)
+    alt pending
+      Promises-->>Caller: pending
+    else full miss
+      Caller->>Caller: perform expensive work
+      Caller->>Store: set(key, value)
+      Caller->>Graph: upsert_node(...) or promote_to_long_term(...)
+      Graph-->>Caller: persistent node id
+    end
+  end
+```
+
+## Key Design Decisions
+
+### Session-scoped registries instead of global shared memory
+
+`kv_store.py` stores session state in `_session_stores`, and `memory_graph.py` does the same with `_graph_stores`. That isolates independent agent tasks without introducing external databases or cross-task contamination. It also makes the package trivial to test, because each test can clear module state and start fresh.
+
+### Simple in-memory data structures instead of infrastructure dependencies
+
+`embeddings.py` implements TF-IDF and cosine similarity directly rather than depending on a hosted embedding provider. That choice keeps the package runnable in CI, local shells, and air-gapped environments. The trade-off is that semantic quality depends on the local corpus and token overlap, not a pre-trained language model.
+
+### Short-term and long-term layers stay separate until the solution engine
+
+The repo avoids hiding both concerns inside one class. `kv_store.py` and `memory_graph.py` stay independently usable, while `solution_engine.py` performs the policy decision of "short-term first, graph second." That is a clean separation, and it is visible directly in the tests under `mcp/tests/test_solution_engine.py`.
+
+### Server wrappers stay thin
+
+The Python server in `mcp/pmll_memory_mcp/server.py` mostly marshals parameters and returns dicts. The TypeScript server in `mcp/src/index.ts` does the same with `zod` schemas and MCP response envelopes. The actual behavior stays in the lower-level modules, which is why those modules are the right place to read when you need to understand correctness.
+
+## Important Implementation Details
+
+- `PMMemoryStore.set()` is append-like for new keys and in-place for existing keys. Slot indexes stay stable after updates.
+- `peek_context()` in both languages checks the KV layer before the promise layer. That means resolved cache hits always win over pending work.
+- `memory_graph.py` uses a decayed edge score computed by `_decay_weight()`, then combines that with cosine similarity during traversal. That is why graph results can degrade over time even if node content never changes.
+- `add_interlinked_context()` links newly created nodes to each other and then to up to 200 existing nodes. The 200-node cap is a deliberate bound in the source to keep bulk operations from ballooning.
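+
+To make the decay note concrete, here is a sketch of how a time-decayed edge weight could feed a traversal score. Everything in it is an assumption for illustration: the exponential half-life curve, the one-day default, the multiplication with cosine similarity, and the names `decayed_weight` and `edge_score`. The actual formula lives in `_decay_weight()` and is not reproduced here.
+
+```python
+import math
+
+
+def decayed_weight(weight: float, age_seconds: float, half_life_seconds: float = 86_400.0) -> float:
+    """Exponential decay: the weight halves every half_life_seconds (assumed curve)."""
+    return weight * math.pow(0.5, age_seconds / half_life_seconds)
+
+
+def edge_score(weight: float, age_seconds: float, similarity: float) -> float:
+    """Assumed combination: decayed weight multiplied by cosine similarity."""
+    return decayed_weight(weight, age_seconds) * similarity
+
+
+# The same edge and the same similarity score lower as the edge ages.
+for days in (0, 1, 7):
+    print(days, edge_score(1.0, days * 86_400.0, 0.8))
+```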
+
+## Source-Level Differences Between Python and TypeScript
+
+One detail worth calling out: the TypeScript server is the canonical 15-tool MCP surface, because it includes the GraphQL tool in `mcp/src/index.ts` and `mcp/src/graphql.ts`. The Python server is close, but it omits GraphQL and renames several wrappers:
+
+- TypeScript: `create_relation`, `prune_stale_links`, `add_interlinked_context`, `retrieve_with_traversal`, `resolve_context`, `promote_to_long_term`
+- Python wrappers: `create_memory_relation`, `prune_memory_links`, `add_interlinked_memory`, `retrieve_memory_traversal`, `resolve_memory_context`, `promote_memory_to_long_term`
+
+That mismatch is visible in `mcp/src/index.ts` versus `mcp/pmll_memory_mcp/server.py`. If you are documenting or scripting MCP tools, prefer the TypeScript tool names. If you are importing functions in Python, use the reusable package exports from `pmll_memory_mcp`.
diff --git a/docs/codedocs/guides/cache-patterns.md b/docs/codedocs/guides/cache-patterns.md
new file mode 100644
index 0000000..6a771fb
--- /dev/null
+++ b/docs/codedocs/guides/cache-patterns.md
@@ -0,0 +1,99 @@
+---
+title: "Cache Patterns"
+description: "Implement the intended init → peek → set → flush short-term memory workflow."
+---
+
+This guide shows the core short-term pattern the repository is built around: initialize a session, check the cache before doing expensive work, populate the cache on a miss, and flush the session when the task is done.
+
+<Steps>
+<Step>
+
+### Initialize the session
+
+Using the Python tool wrappers keeps the example runnable as plain application code.
+
+```python
+from pmll_memory_mcp.server import init
+
+session = init("agent-task-1", silo_size=32)
+print(session)
+```
+
+</Step>
+<Step>
+
+### Check the cache before the expensive call
+
+```python
+from pmll_memory_mcp.server import peek
+
+result = peek("agent-task-1", "https://example.com/pricing")
+print(result)
+```
+
+</Step>
+<Step>
+
+### Run the expensive work only on a miss and persist the result
+
+```python
+from pmll_memory_mcp.server import peek, set
+
+key = "https://example.com/pricing"
+result = peek("agent-task-1", key)
+
+if not result["hit"]:
+    page_html = "<html>pricing page payload</html>"
+    store_result = set("agent-task-1", key, page_html)
+    print(store_result)
+
+print(peek("agent-task-1", key))
+```
+
+</Step>
+<Step>
+
+### Flush the session at task completion
+
+```python
+from pmll_memory_mcp.server import flush
+
+print(flush("agent-task-1"))
+```
+
+</Step>
+</Steps>
+
+## Complete Runnable Example
+
+```python
+# Note: importing `set` shadows Python's built-in set() within this module.
+from pmll_memory_mcp.server import init, peek, set, flush
+
+session_id = "guide-cache"
+init(session_id, silo_size=16)
+
+key = "url:https://example.com/docs"
+first = peek(session_id, key)
+if not first["hit"]:
+    payload = "rendered docs html"
+    set(session_id, key, payload)
+
+second = peek(session_id, key)
+print(first)
+print(second)
+print(flush(session_id))
+```
+
+Expected output:
+
+```text
+{'hit': False}
+{'hit': True, 'value': 'rendered docs html', 'index': 0}
+{'status': 'flushed', 'cleared_count': 1}
+```
+
+## Why This Pattern Works
+
+The implementation in `mcp/pmll_memory_mcp/peek.py` is intentionally narrow. It does not try to be a full cache framework. Instead, it gives you a safe gate in front of repeated work. That simplicity is why it is cheap enough to call before every expensive operation.
+
+<Callout type="warn">Do not skip `flush()` in long-lived processes unless you also add explicit session cleanup. The source uses unbounded in-memory registries, so abandoned session IDs will hold onto memory until you clear them.</Callout>
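+
+One way to make the flush step hard to forget is to wrap the work in `try`/`finally`. The sketch below uses a plain dict as a stand-in for the session registry, so `_sessions`, `get_or_compute`, and this local `flush` are hypothetical helpers and not the package's functions.
+
+```python
+from __future__ import annotations
+
+from typing import Callable
+
+# Hypothetical stand-in for per-session caches; not the package's registry.
+_sessions: dict[str, dict[str, str]] = {}
+
+
+def get_or_compute(session_id: str, key: str, compute: Callable[[], str]) -> str:
+    """Return the cached value, running compute() only on a miss."""
+    cache = _sessions.setdefault(session_id, {})
+    if key not in cache:
+        cache[key] = compute()
+    return cache[key]
+
+
+def flush(session_id: str) -> int:
+    """Drop the whole session and report how many entries it held."""
+    return len(_sessions.pop(session_id, {}))
+
+
+calls = 0
+
+
+def fetch() -> str:
+    global calls
+    calls += 1  # count how often the expensive work really runs
+    return "rendered docs html"
+
+
+try:
+    get_or_compute("guide-cache", "url:docs", fetch)
+    get_or_compute("guide-cache", "url:docs", fetch)  # served from the cache
+finally:
+    cleared = flush("guide-cache")  # runs even if the work above raises
+
+print(calls, cleared)  # 1 1
+```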
diff --git a/docs/codedocs/guides/install-and-run.md b/docs/codedocs/guides/install-and-run.md
new file mode 100644
index 0000000..a4af252
--- /dev/null
+++ b/docs/codedocs/guides/install-and-run.md
@@ -0,0 +1,130 @@
+---
+title: "Install and Run"
+description: "Install the package, start the MCP server, and verify the reusable Python API."
+---
+
+This guide covers the practical setup path that matches the source repository: install the published package, run the MCP server over stdio, and verify the underlying Python API with a short local script.
+
+<Steps>
+<Step>
+
+### Install the package
+
+Choose the runtime you need.
+
+<Tabs items={["npm", "pnpm", "yarn", "bun", "pip"]}>
+<Tab value="npm">
+
+```bash
+npm install pmll-memory-mcp
+```
+
+</Tab>
+<Tab value="pnpm">
+
+```bash
+pnpm add pmll-memory-mcp
+```
+
+</Tab>
+<Tab value="yarn">
+
+```bash
+yarn add pmll-memory-mcp
+```
+
+</Tab>
+<Tab value="bun">
+
+```bash
+bun add pmll-memory-mcp
+```
+
+</Tab>
+<Tab value="pip">
+
+```bash
+pip install pmll-memory-mcp
+```
+
+</Tab>
+</Tabs>
+
+</Step>
+<Step>
+
+### Start the MCP server
+
+For the Node implementation documented in `mcp/src/index.ts`, run:
+
+```bash
+npx pmll-memory-mcp
+```
+
+For the Python implementation documented in `mcp/pmll_memory_mcp/server.py`, run:
+
+```bash
+python -m pmll_memory_mcp.server
+```
+
+Both commands use stdio transport.
+
+</Step>
+<Step>
+
+### Register it with an MCP client
+
+Use the npm server as the canonical config because it matches the 15-tool TypeScript surface:
+
+```json
+{
+  "mcpServers": {
+    "pmll-memory-mcp": {
+      "command": "npx",
+      "args": ["pmll-memory-mcp"]
+    }
+  }
+}
+```
+
+</Step>
+<Step>
+
+### Verify the reusable Python API
+
+This script exercises the same logic without an MCP transport:
+
+```python
+from pmll_memory_mcp import (
+    PMMemoryStore,
+    QPromiseRegistry,
+    peek_context,
+    upsert_node,
+    resolve_context,
+)
+
+session_id = "verify-install"
+store = PMMemoryStore()
+registry = QPromiseRegistry()
+
+print(peek_context("search:docs", session_id, store, registry))
+store.set("search:docs", "cached docs result")
+upsert_node(session_id, "note", "docs", "cached docs result")
+print(resolve_context(session_id, "docs", store))
+```
+
+Expected output:
+
+```text
+{'hit': False}
+{'source': 'long_term', 'value': 'cached docs result', 'score': 1.0}
+```
+
+</Step>
+</Steps>
+
+## Why This Setup Order Matters
+
+The package is designed so the reusable modules remain usable without the server, but the intended operational path is still "server first" for agents. That is why the TypeScript manifest exposes a `bin`, while the Python package also exports plain classes and functions from `__init__.py`.
+
+<Callout type="info">If you only need the library surface, stop after the `pip install` step and import from `pmll_memory_mcp`. If you need MCP tools, use the Node server unless you specifically want the Python wrapper names.</Callout>
diff --git a/docs/codedocs/guides/semantic-memory-workflows.md b/docs/codedocs/guides/semantic-memory-workflows.md
new file mode 100644
index 0000000..1aabd08
--- /dev/null
+++ b/docs/codedocs/guides/semantic-memory-workflows.md
@@ -0,0 +1,109 @@
+---
+title: "Semantic Memory Workflows"
+description: "Build a hybrid workflow that writes to the graph, links nodes, searches semantically, and resolves context."
+---
+
+This guide covers a realistic workflow for the long-term layer: ingest related knowledge, connect it, search it semantically, and then use the solution engine as the stable read API.
+
+<Steps>
+<Step>
+
+### Seed a session graph
+
+```python
+from pmll_memory_mcp import upsert_node, create_relation
+
+session_id = "workflow"
+
+api = upsert_node(session_id, "file", "api.py", "Defines the public HTTP client")
+auth = upsert_node(session_id, "symbol", "authenticate", "Creates and refreshes tokens")
+docs = upsert_node(session_id, "note", "auth docs", "Login, refresh, and token storage guidance")
+
+create_relation(session_id, api.id, auth.id, "contains")
+create_relation(session_id, auth.id, docs.id, "references")
+```
+
+</Step>
+<Step>
+
+### Add a batch of related context
+
+```python
+from pmll_memory_mcp import add_interlinked_context
+
+add_interlinked_context(
+    session_id,
+    [
+        {"type": "concept", "label": "token refresh", "content": "Rotates expired access tokens"},
+        {"type": "concept", "label": "session cache", "content": "Stores fresh auth responses for one task"},
+    ],
+    auto_link=True,
+)
+```
+
+</Step>
+<Step>
+
+### Search semantically and inspect neighbors
+
+```python
+from pmll_memory_mcp import search_graph, retrieve_with_traversal
+
+search = search_graph(session_id, "how do tokens refresh", max_depth=1, top_k=3)
+top = search.direct[0]
+
+print(top.node.label, top.relevance_score)
+for item in retrieve_with_traversal(session_id, top.node.id, max_depth=2):
+    print(item.depth, item.node.label, item.path_relations)
+```
+
+</Step>
+<Step>
+
+### Promote and resolve through the solution engine
+
+```python
+from pmll_memory_mcp import PMMemoryStore, promote_to_long_term, resolve_context
+
+store = PMMemoryStore()
+store.set("auth-response", "fresh login payload")
+
+promote_to_long_term(session_id, "auth-response", "fresh login payload", "note")
+print(resolve_context(session_id, "token refresh", store))
+```
+
+</Step>
+</Steps>
+
+## Complete Runnable Example
+
+```python
+from pmll_memory_mcp import (
+    PMMemoryStore,
+    upsert_node,
+    create_relation,
+    search_graph,
+    promote_to_long_term,
+    resolve_context,
+)
+
+session_id = "semantic-guide"
+store = PMMemoryStore()
+
+node_a = upsert_node(session_id, "concept", "billing", "Tracks invoices and usage")
+node_b = upsert_node(session_id, "concept", "receipts", "Stores payment receipts")
+create_relation(session_id, node_a.id, node_b.id, "relates_to")
+
+store.set("billing-cache", "latest billing summary")
+promote_to_long_term(session_id, "billing-cache", "latest billing summary", "note")
+
+result = search_graph(session_id, "payment usage")
+print(result.direct[0].node.label)
+print(resolve_context(session_id, "billing-cache", store))
+```
+
+## When To Use This Pattern
+
+Use this workflow when you need memory that outlives a single task and stays queryable by meaning, not just by exact key. The source design makes it especially useful for codebase notes, tool outputs, and cross-step summaries.
+
+<Callout type="warn">Search quality depends heavily on the descriptive quality of `label` and `content`. If you only store opaque IDs or one-word fragments, TF-IDF similarity will be weak and auto-linking will be noisy.</Callout>
diff --git a/docs/codedocs/index.md b/docs/codedocs/index.md
new file mode 100644
index 0000000..541e460
--- /dev/null
+++ b/docs/codedocs/index.md
@@ -0,0 +1,128 @@
+---
+title: "Getting Started"
+description: "Start using the stable memory-layer package inside the PPM repository: pmll-memory-mcp."
+---
+
+`pmll-memory-mcp` is the reusable memory subsystem inside `/drqedwards/ppm`, combining a session-scoped KV cache, Q-promise deduplication, and a semantic memory graph for MCP agents.
+
+## The Problem
+
+- Agent workflows repeat the same expensive tool calls because there is no shared short-term memory between steps.
+- Pure key-value caches are fast, but they cannot answer fuzzy or semantic lookups once a session ends.
+- Long-term memory systems often depend on external embedding APIs, which makes local and air-gapped setups harder to run.
+- Tool wrappers and memory logic usually get coupled together, which makes it difficult to test the core algorithms without running a full server.
+
+## The Solution
+
+Inside this repository, the stable package is `mcp/pmll_memory_mcp` for Python, with a matching TypeScript implementation in `mcp/src`. The design splits memory into short-term and long-term layers: `PMMemoryStore` handles deterministic session-local cache slots, `QPromiseRegistry` tracks in-flight work, and the memory graph adds semantic retrieval and traversal on top.
+
+```python
+from pmll_memory_mcp import (
+    PMMemoryStore,
+    QPromiseRegistry,
+    peek_context,
+    promote_to_long_term,
+    resolve_context,
+)
+
+store = PMMemoryStore()
+promises = QPromiseRegistry()
+session_id = "demo-session"
+
+print(peek_context("docs:index", session_id, store, promises))
+store.set("docs:index", "cached home page payload")
+print(resolve_context(session_id, "docs:index", store))
+print(promote_to_long_term(session_id, "docs:index", "cached home page payload"))
+```
+
+## Installation
+
+<Tabs items={["npm", "pnpm", "yarn", "bun"]}>
+<Tab value="npm">
+
+```bash
+npm install pmll-memory-mcp
+```
+
+</Tab>
+<Tab value="pnpm">
+
+```bash
+pnpm add pmll-memory-mcp
+```
+
+</Tab>
+<Tab value="yarn">
+
+```bash
+yarn add pmll-memory-mcp
+```
+
+</Tab>
+<Tab value="bun">
+
+```bash
+bun add pmll-memory-mcp
+```
+
+</Tab>
+</Tabs>
+
+Python package:
+
+```bash
+pip install pmll-memory-mcp
+```
+
+The source manifests declare the supported runtimes: Python `>=3.11` in `mcp/pyproject.toml`, and Node.js `>=18` for the TypeScript server in `mcp/package.json`.
+
+## Quick Start
+
+The minimum working example below uses the Python package surface exported from `mcp/pmll_memory_mcp/__init__.py`.
+
+```python
+from pmll_memory_mcp import (
+    PMMemoryStore,
+    QPromiseRegistry,
+    peek_context,
+    upsert_node,
+    resolve_context,
+)
+
+session_id = "quickstart"
+store = PMMemoryStore()
+promises = QPromiseRegistry()
+
+print(peek_context("auth-flow", session_id, store, promises))
+
+store.set("auth-flow", "cached login sequence")
+print(peek_context("auth-flow", session_id, store, promises))
+
+upsert_node(session_id, "concept", "authentication", "cached login sequence")
+print(resolve_context(session_id, "authentication", store))
+```
+
+Expected output:
+
+```text
+{'hit': False}
+{'hit': True, 'value': 'cached login sequence', 'index': 0}
+{'source': 'long_term', 'value': 'cached login sequence', 'score': 1.0}
+```
+
+## Key Features
+
+- Session-isolated KV storage via `PMMemoryStore` in `mcp/pmll_memory_mcp/kv_store.py`
+- In-flight deduplication via `QPromiseRegistry` and `peek_context`
+- Dependency-free TF-IDF embeddings in `mcp/pmll_memory_mcp/embeddings.py`
+- Long-term graph search, traversal, and edge decay in `mcp/pmll_memory_mcp/memory_graph.py`
+- A solution engine that resolves from short-term first, then semantic long-term memory
+- Two server implementations: TypeScript in `mcp/src/index.ts` and Python in `mcp/pmll_memory_mcp/server.py`
+
+<Callout type="info">The PPM repository also contains C, CUDA, and older experimental MCP code. These docs focus on the package-quality memory server under <code>mcp/</code>, because that is the part with coherent manifests, tests, and reusable exports.</Callout>
+
+<Cards>
+  <Card title="Architecture" href="/docs/architecture">See how the Python package, TypeScript server, and memory layers fit together.</Card>
+  <Card title="Core Concepts" href="/docs/kv-silo">Understand the short-term cache, promise registry, graph, and solution engine.</Card>
+  <Card title="API Reference" href="/docs/api-reference/pmmemorystore">Review classes, functions, tool wrappers, signatures, and source locations.</Card>
+</Cards>
diff --git a/docs/codedocs/kv-silo.md b/docs/codedocs/kv-silo.md
new file mode 100644
index 0000000..f6138db
--- /dev/null
+++ b/docs/codedocs/kv-silo.md
@@ -0,0 +1,82 @@
+---
+title: "KV Silo"
+description: "Learn how short-term memory works through PMMemoryStore and the session-scoped silo model."
+---
+
+The KV silo is the fast path in this package. `PMMemoryStore` in `mcp/pmll_memory_mcp/kv_store.py` stores resolved values by key and assigns each new key a stable slot index, mirroring the `memory_silo_t` idea referenced throughout the code comments.
+
+## What It Is
+
+`PMMemoryStore` is a per-session cache with three core operations:
+
+- `peek(key)` checks whether a value is already resolved
+- `set(key, value)` stores or updates a slot
+- `flush()` clears the whole session
+
+The point is not sophisticated cache eviction. The point is deterministic, cheap memory for one agent task.
+
+## How It Relates To Other Concepts
+
+- `peek_context()` builds on top of the store and adds the Q-promise pending check.
+- `resolve_context()` uses the store as layer one before searching the semantic graph.
+- The server wrappers call `get_store(session_id)` so every MCP session gets its own silo.
+
+## How It Works Internally
+
+In `mcp/pmll_memory_mcp/kv_store.py`, the internal `_KVSlot` dataclass stores `index`, `key`, `value`, and `resolved`. `PMMemoryStore.set()` checks whether the key already exists. If it does, the existing slot is updated in place and the original index is preserved. If it does not, the new slot gets `len(self._slots)` as its index, so insertion order becomes slot order.
+
+The registry below the class is just as important as the class itself. `_session_stores` maps each `session_id` to a store instance, and `get_store()` lazily creates the store the first time a session touches memory. That pattern is why tests such as `mcp/tests/test_server.py` can prove that two sessions do not see each other's values.
+
+```mermaid
+flowchart TD
+  A[session_id + key] --> B["get_store(session_id)"]
+  B --> C{Store exists?}
+  C -->|No| D[Create PMMemoryStore]
+  C -->|Yes| E[Reuse existing store]
+  D --> F[peek or set]
+  E --> F[peek or set]
+  F --> G{Existing key?}
+  G -->|Yes| H[Update value, keep index]
+  G -->|No| I[Allocate next index]
+```
+
+## Basic Usage
+
+```python
+from pmll_memory_mcp import PMMemoryStore
+
+store = PMMemoryStore(silo_size=4)
+print(store.peek("page:/docs"))
+
+slot = store.set("page:/docs", "<html>cached</html>")
+print(slot)
+print(store.peek("page:/docs"))
+```
+
+## Advanced Usage
+
+This pattern mirrors how the server uses the store for isolated sessions.
+
+```python
+from pmll_memory_mcp.kv_store import get_store, drop_store
+
+store_a = get_store("session-a", silo_size=8)
+store_b = get_store("session-b", silo_size=8)
+
+store_a.set("shared-key", "value-for-a")
+
+print(store_a.peek("shared-key"))
+print(store_b.peek("shared-key"))
+print(drop_store("session-a"))
+```
+
+<Callout type="warn">`silo_size` is stored on the class, but the current implementation does not enforce a hard capacity. If you need bounded memory, you must add your own eviction policy instead of assuming the constructor argument will cap writes.</Callout>
+
+<Accordions>
+<Accordion title="Why stable slot indexes are useful">
+The store preserves the original slot index when a key is updated. That matters because the code is modeling the older PMLL silo semantics rather than a generic dictionary cache. When you use the returned index in logs or diagnostics, the identity of the slot stays stable even if the payload changes. The trade-off is that the index is an implementation detail, not a durable external identifier, so you should not persist it across process restarts.
+</Accordion>
+<Accordion title="Why the store is simple instead of feature-rich">
+The module does not implement TTL, LRU, or size-based eviction. That simplicity keeps the hot path small and predictable, and it matches how the server expects callers to use `flush()` at task completion. The downside is that long-running sessions can accumulate values indefinitely. If your application keeps one session open for a long time, add explicit lifecycle management around `flush()` or wrap the store with your own policy.
+</Accordion>
+</Accordions>
diff --git a/docs/codedocs/mcp-tools.md b/docs/codedocs/mcp-tools.md
new file mode 100644
index 0000000..e70bc1d
--- /dev/null
+++ b/docs/codedocs/mcp-tools.md
@@ -0,0 +1,105 @@
+---
+title: "MCP Tools"
+description: "Understand the server-facing tool layer and the differences between the TypeScript and Python wrappers."
+---
+
+The MCP tool layer is how this package is meant to be used by agents. The core logic lives in reusable modules, but `mcp/src/index.ts` and `mcp/pmll_memory_mcp/server.py` convert that logic into callable MCP tools.
+
+## What It Is
+
+There are two server implementations in the repository:
+
+- TypeScript server: `mcp/src/index.ts`
+- Python server: `mcp/pmll_memory_mcp/server.py`
+
+The TypeScript server is the more complete surface. It wires `McpServer`, validates inputs with `zod`, and includes the GraphQL bridge from `mcp/src/graphql.ts`. The Python server uses `FastMCP` and exposes most of the same capabilities as plain tool functions.
+
+## Why It Exists
+
+Most callers do not want to instantiate the low-level objects manually. They want an MCP server that:
+
+- initializes a session
+- checks memory before expensive work
+- persists important results
+- exposes semantic search and traversal tools
+
+The server layer packages those decisions into a tool API.
+
+## How It Relates To Other Concepts
+
+- The short-term tools wrap `PMMemoryStore`, `QPromiseRegistry`, and `peek_context()`.
+- The long-term tools wrap `memory_graph.py` or `memory-graph.ts`.
+- The solution-engine tools wrap `resolve_context()`, `promote_to_long_term()`, and `get_memory_status()`.
+
+## How It Works Internally
+
+The Node entry point builds a single `server` instance, registers each tool with a schema, and then starts stdio transport in `main()`. The Python entry point uses decorators from `FastMCP` and then calls `mcp.run()` in its own `main()`.
+
+The tool handlers themselves are intentionally thin. For example, the TypeScript `peek` tool simply calls `getStore(session_id)` and then `peekContext(key, session_id, store, _promiseRegistry)`. The Python `peek()` wrapper does the same with the package imports. That is why debugging behavior almost always sends you back into the lower-level modules, not the wrappers.
+
+```mermaid
+graph TD
+  A[Agent] --> B[init]
+  B --> C[peek]
+  C -->|hit| D[Use cached or pending result]
+  C -->|miss| E[Call expensive tool]
+  E --> F[set]
+  F --> G[promote_to_long_term]
+  G --> H[resolve_context or memory_status]
+```
+
+## Basic Usage
+
+Start the Node server over stdio:
+
+```bash
+npx pmll-memory-mcp
+```
+
+Or call the Python wrappers directly:
+
+```python
+from pmll_memory_mcp.server import init, peek, flush, set as set_value  # alias avoids shadowing the built-in set
+
+init("session-1", silo_size=256)
+print(peek("session-1", "https://example.com"))
+set_value("session-1", "https://example.com", "<html>cached</html>")
+print(peek("session-1", "https://example.com"))
+print(flush("session-1"))
+```
+
+## Advanced Usage
+
+The TypeScript server adds a GraphQL tool that can cache network results inside the short-term store.
+
+```typescript
+import { executeGraphQL, GRAPHQL_QUERY } from "./graphql.js";
+
+const result = await executeGraphQL(
+  "https://example.com/graphql",
+  GRAPHQL_QUERY,
+  { first: 10, offset: 0 },
+  { Authorization: "Bearer token" },
+);
+
+console.log(result.data);
+```
+
+## Tool Surface Differences
+
+The two wrappers are close, but not identical:
+
+- The TypeScript server exposes `graphql`; the Python server does not.
+- The TypeScript server uses the short names from the core modules: `create_relation`, `prune_stale_links`, `add_interlinked_context`, `retrieve_with_traversal`, `resolve_context`.
+- The Python server renames several wrappers to `create_memory_relation`, `prune_memory_links`, `add_interlinked_memory`, `retrieve_memory_traversal`, and `resolve_memory_context`.
+
+<Callout type="warn">If you are publishing an MCP configuration or teaching an agent workflow, use the TypeScript tool names from <code>mcp/src/index.ts</code>. The Python wrapper names are useful for direct imports and tests, but they are not the canonical cross-language contract.</Callout>
+
+<Accordions>
+<Accordion title="Why the wrappers stay thin">
+The server modules are mostly transport adapters. That keeps business logic in normal package modules where it is easy to test and reuse without a running MCP transport. It also means there is little risk that the Python and TypeScript wrappers drift in algorithmic behavior. The downside is that the wrappers can still drift in naming and coverage, which is exactly what happened with the GraphQL tool and several Python wrapper names.
+</Accordion>
+<Accordion title="Trade-off of exposing both Python and TypeScript servers">
+Shipping both implementations increases reach: Python users get direct imports and FastMCP integration, while Node users get the canonical MCP package published to npm. It also serves as a reference implementation pair, which helps when validating behavior. The cost is documentation and maintenance overhead because the public tool names are not perfectly aligned. Any production integration should standardize on one wrapper instead of mixing both surfaces casually.
+</Accordion>
+</Accordions>
diff --git a/docs/codedocs/q-promises.md b/docs/codedocs/q-promises.md
new file mode 100644
index 0000000..5878144
--- /dev/null
+++ b/docs/codedocs/q-promises.md
@@ -0,0 +1,94 @@
+---
+title: "Q-Promises"
+description: "Understand how in-flight work is deduplicated with QPromiseRegistry and peek_context."
+---
+
+The Q-promise layer exists to answer a different question than the KV silo. The store tells you whether a value is already resolved. `QPromiseRegistry` tells you whether the same work is already in flight, so you can avoid launching a duplicate request.
+
+## What It Is
+
+`QPromiseRegistry` in `mcp/pmll_memory_mcp/q_promise_bridge.py` is a lightweight registry of promise IDs with two states:
+
+- `pending`
+- `resolved`
+
+`peek_context()` in `mcp/pmll_memory_mcp/peek.py` combines the registry with `PMMemoryStore` and returns one of three shapes:
+
+- a KV hit
+- a pending promise hit
+- a full miss
+
+## Why It Exists
+
+Agent systems do not only repeat completed work. They also repeat work that has already started but has not finished yet. Without the pending check, two concurrent subtasks can both miss the cache and both trigger the same expensive action.
+
+## How It Relates To Other Concepts
+
+- It sits directly between the KV silo and the external tool call.
+- The TypeScript and Python servers both expose `resolve` wrappers that read from the shared promise registry.
+- The semantic graph does not participate in the pending state. It is only a long-term retrieval layer after work has been completed and stored.
+
+## How It Works Internally
+
+`QPromiseRegistry` stores `_QPromise` entries in a dict keyed by `promise_id`. `register()` inserts a new pending record. `resolve()` flips that record to `resolved` and attaches a payload. `peek_promise()` reads the record without removing it, which is why repeated status checks are safe.
+
+`peek_context()` is where the pieces come together. In `peek.py`, the function checks the store first with `store.peek(key)`. Only when the store misses does it query `promise_registry.peek_promise(key)`. That order matters: a resolved cache entry should beat a stale or still-pending promise record.
+
+```mermaid
+sequenceDiagram
+  participant Caller
+  participant Store as PMMemoryStore
+  participant Registry as QPromiseRegistry
+
+  Caller->>Store: peek(key)
+  alt cached
+    Store-->>Caller: {hit: true, value, index}
+  else not cached
+    Caller->>Registry: peek_promise(key)
+    alt pending
+      Registry-->>Caller: {hit: true, status: "pending"}
+    else unknown
+      Registry-->>Caller: {hit: false}
+    end
+  end
+```
+
+## Basic Usage
+
+```python
+from pmll_memory_mcp import PMMemoryStore, QPromiseRegistry, peek_context
+
+store = PMMemoryStore()
+registry = QPromiseRegistry()
+registry.register("page:pricing")
+
+print(peek_context("page:pricing", "demo", store, registry))
+```
+
+## Advanced Usage
+
+Use a namespaced promise ID so independent sessions do not trample each other.
+
+```python
+from pmll_memory_mcp import QPromiseRegistry
+
+registry = QPromiseRegistry()
+promise_id = "session-42:search:billing"
+
+registry.register(promise_id)
+print(registry.peek_promise(promise_id))
+
+registry.resolve(promise_id, "resolved payload")
+print(registry.peek_promise(promise_id))
+```
+
+<Callout type="warn">The registry does not namespace promise IDs for you. The source comments in both languages explicitly leave that responsibility to the caller. If you use bare keys like <code>"search"</code> in a multi-session system, you will create false pending hits across unrelated work.</Callout>
+
+<Accordions>
+<Accordion title="Why this is a registry instead of real async futures">
+The package models the older Q-promise semantics from the C layer rather than exposing Python `asyncio.Future` or JavaScript `Promise` objects directly. That keeps the boundary serializable and easy to expose over MCP tools, where payloads are plain values and statuses. The trade-off is that there is no built-in waiting primitive or callback chain in this Python wrapper. If you need push-based completion, you must layer it on top of the registry yourself.
+</Accordion>
+<Accordion title="Why peek_context checks the cache before the promise registry">
+Checking the resolved cache first makes the hot path deterministic and cheap. If a value has already been committed to the silo, there is no reason to care whether some earlier promise record still exists. That ordering also keeps behavior intuitive for callers who only want the best available answer. The trade-off is that promise cleanup becomes your responsibility if you want the registry itself to stay perfectly tidy.
+</Accordion>
+</Accordions>
diff --git a/docs/codedocs/semantic-memory-graph.md b/docs/codedocs/semantic-memory-graph.md
new file mode 100644
index 0000000..f332a40
--- /dev/null
+++ b/docs/codedocs/semantic-memory-graph.md
@@ -0,0 +1,98 @@
+---
+title: "Semantic Memory Graph"
+description: "See how the long-term memory layer stores typed nodes, weighted edges, and TF-IDF embeddings."
+---
+
+The semantic memory graph is the long-term layer in `pmll-memory-mcp`. It is implemented in `mcp/pmll_memory_mcp/memory_graph.py` and uses `mcp/pmll_memory_mcp/embeddings.py` to build lightweight TF-IDF vectors without any external service.
+
+## What It Is
+
+The graph stores four node types:
+
+- `concept`
+- `file`
+- `symbol`
+- `note`
+
+Edges connect those nodes with typed relationships such as `depends_on`, `references`, and `similar_to`. Searches start with cosine similarity on node embeddings, then expand outward through neighboring edges.
+
+## Why It Exists
+
+Short-term memory is fast, but it only matches exact keys and disappears when the session ends. The graph gives the package a way to keep semantically related knowledge around and retrieve it later even when the query is approximate.
+
+## How It Relates To Other Concepts
+
+- `promote_to_long_term()` writes into the graph from the solution engine.
+- `resolve_context()` reads from the graph only after the KV store misses.
+- `add_interlinked_context()` can bulk-ingest related concepts and automatically add `similar_to` edges when cosine similarity exceeds `0.72`.
+
+## How It Works Internally
+
+`embeddings.py` tokenizes input locally, adds documents to a module-level `TfIdfVectorizer`, and computes normalized vectors. `memory_graph.py` then stores the vector on each `MemoryNode`. When you call `search_graph()`, the function computes a query vector, scores every node with cosine similarity, sorts the hits, and then calls `_traverse_neighbors()` to explore connected nodes.
+
+Traversal scores are not pure similarity scores. The code blends similarity with decayed edge weight:
+
+```python
+relevance = similarity * 0.6 + (edge_decay / max(edge.weight, 0.01)) * 0.4
+```
+
+That detail matters because old edges fade over time even if the linked nodes are still semantically close. It lets the graph prefer fresher connections during traversal.
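To make the blend concrete, here is a small worked sketch. The source line above does not say how `edge_decay` is computed, so the exponential half-life model, the `decayed_weight` helper, and the one-day default are all assumptions made for illustration; only the `0.6` / `0.4` split and the `max(weight, 0.01)` guard come from the formula quoted above.

```python
def decayed_weight(weight: float, age_seconds: float, half_life_seconds: float) -> float:
    """Assumed decay model: the weight halves every half_life_seconds."""
    if half_life_seconds <= 0:
        raise ValueError("half_life_seconds must be positive")
    return weight * 0.5 ** (age_seconds / half_life_seconds)


def blended_relevance(
    similarity: float,
    weight: float,
    age_seconds: float,
    half_life_seconds: float = 86_400.0,  # hypothetical default: one day
) -> float:
    edge_decay = decayed_weight(weight, age_seconds, half_life_seconds)
    # Same shape as the quoted formula: 60% similarity, 40% remaining edge strength.
    return similarity * 0.6 + (edge_decay / max(weight, 0.01)) * 0.4


fresh = blended_relevance(0.5, 1.0, age_seconds=0.0)
day_old = blended_relevance(0.5, 1.0, age_seconds=86_400.0)
print(round(fresh, 3), round(day_old, 3))
```

Under these assumptions, two neighbors with identical similarity are ranked by edge age alone, which matches the stated goal of preferring fresher connections.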
+
+```mermaid
+flowchart TD
+  A[Text input] --> B[tokenize]
+  B --> C[TfIdfVectorizer.add_document]
+  C --> D[embed]
+  D --> E[upsert_node]
+  E --> F[search_graph]
+  F --> G[direct semantic hits]
+  G --> H[neighbor traversal]
+  H --> I[decay-weighted ranked results]
+```
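The tokenize, vectorize, and score steps in the diagram can be approximated with the standard library alone. This is a sketch, not the code in `embeddings.py`: `SketchVectorizer`, `tokenize`, and `cosine` are hypothetical names, the regular-expression tokenizer and the smoothed IDF formula are assumptions, and the real module may weight terms differently.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


class SketchVectorizer:
    def __init__(self) -> None:
        self.vocab: dict[str, int] = {}
        self.doc_freq: Counter[str] = Counter()
        self.doc_count = 0

    def add_document(self, text: str) -> None:
        self.doc_count += 1
        # dict.fromkeys de-duplicates while keeping first-seen order.
        for term in dict.fromkeys(tokenize(text)):
            self.vocab.setdefault(term, len(self.vocab))
            self.doc_freq[term] += 1

    def vectorize(self, text: str) -> list[float]:
        counts = Counter(t for t in tokenize(text) if t in self.vocab)
        vec = [0.0] * len(self.vocab)
        for term, tf in counts.items():
            # Smoothed inverse document frequency (an assumption).
            idf = math.log((1 + self.doc_count) / (1 + self.doc_freq[term])) + 1.0
            vec[self.vocab[term]] = tf * idf
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Compare only the overlapping prefix, since vectors can differ in length.
    n = min(len(a), len(b))
    dot = sum(x * y for x, y in zip(a[:n], b[:n]))
    norm_a = math.sqrt(sum(x * x for x in a[:n]))
    norm_b = math.sqrt(sum(y * y for y in b[:n]))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Note that the vector length equals the vocabulary size at the moment `vectorize` is called, so vectors produced before and after new documents arrive have different dimensions.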
+
+## Basic Usage
+
+```python
+from pmll_memory_mcp import upsert_node, create_relation, search_graph
+
+session_id = "graph-demo"
+api = upsert_node(session_id, "concept", "API client", "Wraps HTTP requests")
+auth = upsert_node(session_id, "concept", "Authentication", "Handles login tokens")
+create_relation(session_id, api.id, auth.id, "depends_on")
+
+result = search_graph(session_id, "login client")
+print(result.direct[0].node.label)
+print(result.total_nodes, result.total_edges)
+```
+
+## Advanced Usage
+
+```python
+from pmll_memory_mcp import add_interlinked_context, retrieve_with_traversal
+
+session_id = "graph-advanced"
+bulk = add_interlinked_context(
+    session_id,
+    [
+        {"type": "file", "label": "server.py", "content": "Registers MCP tools"},
+        {"type": "symbol", "label": "resolve_context", "content": "Falls back from cache to graph"},
+        {"type": "note", "label": "deployment", "content": "Run over stdio transport"},
+    ],
+    auto_link=True,
+)
+
+start_id = bulk["nodes"][0].id
+for item in retrieve_with_traversal(session_id, start_id, max_depth=2):
+    print(item.depth, item.node.label, item.relevance_score)
+```
+
+<Callout type="warn">`embed()` updates the module-level vectorizer every time you add a document. That means embedding dimensions evolve as the corpus grows. Do not assume vectors generated early in a process are directly comparable to vectors exported from a different process with a different corpus state.</Callout>
+
+<Accordions>
+<Accordion title="Why TF-IDF was chosen instead of external embeddings">
+The package is explicitly designed to run without Ollama, OpenAI, or another embedding service, as the comments in `embeddings.py` say. That makes installs easy and keeps the server usable in CI and offline environments. The trade-off is retrieval quality: TF-IDF is strong when your query shares vocabulary with stored content, but it is weaker on paraphrases than a neural embedding model. If you need better semantic recall, this is the first subsystem to swap out.
+</Accordion>
+<Accordion title="Trade-off of auto-linking with a fixed similarity threshold">
+`add_interlinked_context()` uses a hard-coded `SIMILARITY_THRESHOLD` of `0.72`. That is easy to reason about and cheap to compute, and it helps the graph become useful with almost no manual edge authoring. The downside is that false positives and false negatives are both possible, especially when documents are short or highly repetitive. In practice, bulk ingestion works best when `content` fields contain meaningful descriptive text instead of bare names.
+</Accordion>
+</Accordions>
diff --git a/docs/codedocs/solution-engine.md b/docs/codedocs/solution-engine.md
new file mode 100644
index 0000000..ac0cf75
--- /dev/null
+++ b/docs/codedocs/solution-engine.md
@@ -0,0 +1,93 @@
+---
+title: "Solution Engine"
+description: "Learn how the package resolves context across short-term and long-term memory layers."
+---
+
+The solution engine in `mcp/pmll_memory_mcp/solution_engine.py` is the policy layer of the package. It decides when to trust the fast session cache, when to fall back to the semantic graph, and how to report the combined state back to callers.
+
+## What It Is
+
+The module exposes three package-level functions:
+
+- `resolve_context()`
+- `promote_to_long_term()`
+- `get_memory_status()`
+
+Those functions sit above the raw data structures and encode the intended runtime workflow.
+
+## Why It Exists
+
+Without this layer, callers would need to orchestrate every lookup manually:
+
+1. query the KV store
+2. query the graph
+3. decide which result wins
+4. decide what should be promoted
+5. compute a combined health view
+
+The solution engine centralizes that policy so both the reusable library and the MCP servers can behave consistently.
+
+## How It Relates To Other Concepts
+
+- It depends on `PMMemoryStore` for short-term lookup.
+- It depends on `search_graph()` and `upsert_node()` for long-term work.
+- The server wrappers expose it through the TypeScript `resolve_context`, `promote_to_long_term`, and `memory_status` tools, and the Python wrappers with slightly different names.
+
+## How It Works Internally
+
+`resolve_context()` is intentionally simple: it checks `store.peek(key)` first, and only if that misses does it call `search_graph(session_id, key, max_depth=1, top_k=1)`. If there is a graph hit, it returns the top direct hit's `content` and normalizes the score back to a `0.0-1.0` range.
+
+`promote_to_long_term()` does not currently inspect access counts even though the module defines `PROMOTION_THRESHOLD = 3`. The function always calls `upsert_node()` and returns a promoted node ID immediately. That is an important source-level detail: the threshold is part of status reporting, not enforced promotion logic.
+
+`get_memory_status()` asks the graph for stats and combines them with `len(store)` and `store.silo_size`. It is purely observational, which makes it safe to call for dashboards or debug output.
+
+```mermaid
+flowchart TD
+  A[resolve_context key] --> B{KV hit?}
+  B -->|Yes| C[Return short_term score 1.0]
+  B -->|No| D[search_graph top_k=1]
+  D --> E{Graph hit?}
+  E -->|Yes| F[Return long_term with normalized score]
+  E -->|No| G[Return miss]
+```
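The decision flow above fits in one short function. This is a hedged sketch rather than the module's implementation: `sketch_resolve` is a hypothetical name, a dict stands in for the KV store, the `search` callable stands in for `search_graph(...)`, and the shape of the miss result is an assumption. The `source`, `value`, and `score` keys follow the output shown in Getting Started.

```python
from typing import Callable, Optional

# Stand-in for a graph search: returns (content, raw_score) or None.
SearchFn = Callable[[str], Optional[tuple[str, float]]]


def sketch_resolve(key: str, cache: dict[str, str], search: SearchFn) -> dict:
    # Layer 1: an exact short-term hit is authoritative.
    if key in cache:
        return {"source": "short_term", "value": cache[key], "score": 1.0}
    # Layer 2: fall back to the semantic search only after a miss.
    hit = search(key)
    if hit is not None:
        content, raw_score = hit
        # Clamp into the 0.0-1.0 range before reporting.
        score = max(0.0, min(1.0, raw_score))
        return {"source": "long_term", "value": content, "score": score}
    # Assumed miss shape; the real function may report misses differently.
    return {"source": "miss", "value": None, "score": 0.0}


def fake_search(query: str) -> Optional[tuple[str, float]]:
    # Toy long-term layer with a single entry.
    return ("fresh login flow", 1.3) if "auth" in query else None


print(sketch_resolve("authentication", {}, fake_search))
```

Keeping the search behind a callable makes the policy testable without a populated graph, which is the same separation the thin server wrappers depend on.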
+
+## Basic Usage
+
+```python
+from pmll_memory_mcp import PMMemoryStore, resolve_context
+
+store = PMMemoryStore()
+store.set("pricing", "cached pricing page")
+
+print(resolve_context("demo", "pricing", store))
+```
+
+## Advanced Usage
+
+```python
+from pmll_memory_mcp import (
+    PMMemoryStore,
+    promote_to_long_term,
+    get_memory_status,
+    resolve_context,
+)
+
+session_id = "hybrid"
+store = PMMemoryStore()
+store.set("auth", "fresh login flow")
+
+promote_to_long_term(session_id, "authentication", "fresh login flow", "concept")
+print(resolve_context(session_id, "authentication", store))
+print(get_memory_status(session_id, store))
+```
+
+<Callout type="warn">`PROMOTION_THRESHOLD` is reported by `get_memory_status()`, but the current implementation does not automatically promote when access count reaches that value. If you need threshold-based promotion, you must build that trigger in your application or extend `promote_to_long_term()` yourself.</Callout>
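One way to build that trigger in application code is a small access counter that fires a callback when a key reaches the threshold. The sketch below is illustrative only: `PromotionTracker` is a hypothetical class, the default of `3` simply mirrors the `PROMOTION_THRESHOLD` value mentioned above, and the `promote` callback is where a real integration would call `promote_to_long_term()`.

```python
from collections import Counter
from typing import Callable


class PromotionTracker:
    """Hypothetical helper: promote a key once it has been read `threshold` times."""

    def __init__(self, promote: Callable[[str, str], None], threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self._promote = promote
        self._threshold = threshold
        self._counts: Counter[str] = Counter()

    def record_access(self, key: str, value: str) -> bool:
        """Count one access; return True only on the access that triggers promotion."""
        self._counts[key] += 1
        if self._counts[key] == self._threshold:
            self._promote(key, value)
            return True
        return False


promoted: list[tuple[str, str]] = []
tracker = PromotionTracker(lambda key, value: promoted.append((key, value)))

for _ in range(4):
    tracker.record_access("auth", "fresh login flow")

print(promoted)
```

Promotion fires exactly once per key, on the access that reaches the threshold, so repeated reads afterwards do not write to the long-term layer again.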
+
+<Accordions>
+<Accordion title="Why short-term results always win">
+The source code makes short-term memory authoritative because it represents the most recent task-local state. That is the right default for agent flows where a fresh tool result should override older semantic knowledge. The trade-off is that stale session data can mask a better long-term answer if you forget to refresh or flush the store. When debugging odd results, inspect the cache before blaming graph search quality.
+</Accordion>
+<Accordion title="Why promotion is explicit instead of automatic">
+Explicit promotion keeps the core functions predictable and easy to test. The library never silently writes to the long-term graph as a side effect of a read, which avoids surprise graph growth. The downside is operational discipline: if you do not call `promote_to_long_term()` at the right moments, the graph will stay sparse and `resolve_context()` will miss after the session cache disappears. Most production integrations should define a small set of promotion rules tied to successful expensive operations.
+</Accordion>
+</Accordions>
diff --git a/docs/codedocs/types.md b/docs/codedocs/types.md
new file mode 100644
index 0000000..92c0ac2
--- /dev/null
+++ b/docs/codedocs/types.md
@@ -0,0 +1,142 @@
+---
+title: "Types"
+description: "TypeScript source-level types and interfaces used by the Node implementation in mcp/src."
+---
+
+The repository exports several useful TypeScript types from deep source modules under `mcp/src`. These are source-level exports used by the Node implementation; the package does not declare a top-level `exports` map for them, so treat these as internal-but-documented shapes rather than a guaranteed public runtime contract.
+
+## `peek.ts`
+
+Source file: `mcp/src/peek.ts`
+
+```ts
+export interface PeekHitResult {
+  hit: true;
+  value: string;
+  index: number;
+}
+
+export interface PeekPendingResult {
+  hit: true;
+  status: "pending";
+  promise_id: string;
+}
+
+export interface PeekMissResult {
+  hit: false;
+}
+
+export type PeekContextResult =
+  | PeekHitResult
+  | PeekPendingResult
+  | PeekMissResult;
+```
+
+These types model the three legal outcomes of the cache guard.
+
+## `memory-graph.ts`
+
+Source file: `mcp/src/memory-graph.ts`
+
+```ts
+export type NodeType = "concept" | "file" | "symbol" | "note";
+
+export type RelationType =
+  | "relates_to"
+  | "depends_on"
+  | "implements"
+  | "references"
+  | "similar_to"
+  | "contains";
+```
+
+`NodeType` constrains what kind of memory node can be created, while `RelationType` constrains the legal edge kinds used by traversal and search.
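+
+Because these unions exist only at compile time, strings that arrive from tool input need a runtime check before they can be treated as a `NodeType`. The sketch below shows one way to do that; `NODE_TYPES` and `isNodeType` are hypothetical names and are not exported by `mcp/src/memory-graph.ts`.
+
+```ts
+// Union repeated from the block above so this sketch compiles on its own.
+type NodeType = "concept" | "file" | "symbol" | "note";
+
+// Hypothetical runtime list that mirrors the union members.
+const NODE_TYPES: readonly NodeType[] = ["concept", "file", "symbol", "note"];
+
+// Hypothetical type guard: narrows an arbitrary string to NodeType.
+function isNodeType(value: string): value is NodeType {
+  return (NODE_TYPES as readonly string[]).indexOf(value) !== -1;
+}
+```
+
+The same pattern would apply to `RelationType` with its own list of members.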
+
+```ts
+export interface MemoryNode {
+  id: string;
+  type: NodeType;
+  label: string;
+  content: string;
+  embedding: number[];
+  createdAt: number;
+  lastAccessed: number;
+  accessCount: number;
+  metadata: Record<string, string>;
+}
+```
+
+`MemoryNode` is the stored semantic unit. The `embedding` field comes from `embed()`, and the access timestamps are updated during search and traversal.
+
+```ts
+export interface MemoryEdge {
+  id: string;
+  source: string;
+  target: string;
+  relation: RelationType;
+  weight: number;
+  createdAt: number;
+  metadata: Record<string, string>;
+}
+```
+
+`MemoryEdge` represents both manually created relations and auto-linked similarity edges.
+
+```ts
+export interface TraversalResult {
+  node: MemoryNode;
+  depth: number;
+  pathRelations: string[];
+  relevanceScore: number;
+}
+
+export interface GraphSearchResult {
+  direct: TraversalResult[];
+  neighbors: TraversalResult[];
+  totalNodes: number;
+  totalEdges: number;
+}
+```
+
+`TraversalResult` is used for both direct hits and neighbor exploration. `GraphSearchResult` groups those ranked collections into one response.
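+
+As an illustration of consuming this shape, the sketch below merges both collections and keeps the highest-scoring labels. It assumes that a larger `relevanceScore` means a better match, which this page does not state. `topLabels` and the trimmed-down `RankedNode` and `SearchResultLike` interfaces are hypothetical and declare only the fields the example reads.
+
+```ts
+// Structural subsets of TraversalResult and GraphSearchResult (fields used here only).
+interface RankedNode {
+  node: { id: string; label: string };
+  depth: number;
+  relevanceScore: number;
+}
+interface SearchResultLike {
+  direct: RankedNode[];
+  neighbors: RankedNode[];
+}
+
+// Hypothetical helper: labels of the `limit` best results across both lists.
+// Assumes higher relevanceScore is better.
+function topLabels(result: SearchResultLike, limit: number): string[] {
+  return result.direct
+    .concat(result.neighbors) // concat copies, so the input arrays are not reordered
+    .sort((a, b) => b.relevanceScore - a.relevanceScore)
+    .slice(0, limit)
+    .map((r) => r.node.label);
+}
+```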
+
+## `kv-store.ts`
+
+Source file: `mcp/src/kv-store.ts`
+
+```ts
+export type PeekResult = [boolean, string | null, number | null];
+```
+
+This is the low-level tuple returned by `PMMemoryStore.peek()` before `peekContext()` converts it into a tagged object union.
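+
+The conversion could look roughly like the sketch below. It assumes the tuple order is `[hit, value, index]`, which matches the element types but is not spelled out on this page, and it covers only the hit and miss cases. `toTagged` and `HitOrMiss` are hypothetical names, not the actual `peekContext()` implementation.
+
+```ts
+// Tuple repeated from the block above so this sketch compiles on its own.
+type PeekResult = [boolean, string | null, number | null];
+
+// Hypothetical two-variant result used only by this example.
+type HitOrMiss =
+  | { hit: true; value: string; index: number }
+  | { hit: false };
+
+// Hypothetical converter. Assumes tuple order [hit, value, index].
+function toTagged([hit, value, index]: PeekResult): HitOrMiss {
+  // Compare against null explicitly so index 0 and "" still count as hits.
+  if (hit && value !== null && index !== null) {
+    return { hit: true, value, index };
+  }
+  return { hit: false };
+}
+```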
+
+## `q-promise-bridge.ts`
+
+Source file: `mcp/src/q-promise-bridge.ts`
+
+```ts
+export type PeekPromiseResult = [boolean, string | null, string | null];
+```
+
+This tuple is the low-level status format for promise inspection.
+
+## `graphql.ts`
+
+Source file: `mcp/src/graphql.ts`
+
+```ts
+export interface GraphQLResponse {
+  data?: Record<string, unknown> | null;
+  errors?: Array<{ message: string; locations?: unknown; path?: unknown }>;
+}
+```
+
+This is the parsed JSON shape returned by `executeGraphQL()`.
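+
+Both fields are optional, so callers have to check `errors` and `data` before using the payload. A minimal sketch follows; `unwrapData` is a hypothetical helper, and treating any reported error as a failure is a simplification chosen for this example, since a GraphQL server may return partial `data` alongside `errors`.
+
+```ts
+// Interface repeated from the block above so this sketch compiles on its own.
+interface GraphQLResponse {
+  data?: Record<string, unknown> | null;
+  errors?: Array<{ message: string; locations?: unknown; path?: unknown }>;
+}
+
+// Hypothetical helper: returns `data` or throws with the joined error messages.
+function unwrapData(response: GraphQLResponse): Record<string, unknown> {
+  if (response.errors && response.errors.length > 0) {
+    throw new Error(response.errors.map((e) => e.message).join("; "));
+  }
+  // `== null` covers both a missing field and an explicit null.
+  if (response.data == null) {
+    throw new Error("GraphQL response contained no data");
+  }
+  return response.data;
+}
+```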
+
+## When These Types Matter
+
+- Use the `peek.ts` types when you are extending the Node guard logic.
+- Use `NodeType`, `RelationType`, and the graph interfaces when you are modifying search, traversal, or auto-linking.
+- Use `GraphQLResponse` when you are integrating the TypeScript GraphQL tool.
+
+<Callout type="info">If you need a stable import contract for application code today, prefer the Python package exports from <code>pmll_memory_mcp</code>. The TypeScript types are best read as implementation documentation for the Node server in this repository.</Callout>