brianmeyer · brianmeyer · May 17, 2026 · May 17, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,7 @@ All notable changes to RecallForge will be documented in this file.
 
 - Added staged background reindex promotion so document, video, audio, and conversation replacements stay hidden until their parent/child memory batches are complete.
 - Added index-version-aware query caching for repeated text/media embeddings and generated expansion branches.
+- Added MCP progress notifications for long-running search, ingest, batch, memory write, and FTS rebuild tool calls when clients provide a progress token.
 - Added deterministic memory graph enrichment with entity/relation side tables and new `memory_graph_entities` / `memory_graph_related` MCP tools.
 - Replaced the tiny UAT video clips with compact episodic-memory fixtures, richer transcript sidecars, related artifact metadata, and regression coverage for the video corpus.
 - Added `memory_add_conversation` so conversation threads ingest as canonical parent memories with turn-level child memories and standard memory rollups.

diff --git a/README.md b/README.md
@@ -146,7 +146,7 @@ Run over HTTP/SSE:
 recallforge serve --http --host 127.0.0.1 --port 7433 --mode embed
 ```
 
-RecallForge now exposes **26 MCP tools** across search, ingest, memory graph navigation, collection admin, and runtime config. HTTP/SSE mode also exposes `/health`, `/sse`, and `/messages/`.
+RecallForge now exposes **26 MCP tools** across search, ingest, memory graph navigation, collection admin, and runtime config. HTTP/SSE mode also exposes `/health`, `/sse`, and `/messages/`. When a client includes a `progressToken` in a request's `_meta` field, long-running tools emit MCP `notifications/progress` messages, so compatible clients can show live progress for ingest, search, batch, memory writes, and FTS rebuilds.
 
 See [docs/mcp-tools.md](docs/mcp-tools.md) for the full tool reference.
 

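Opting in to progress is per request. A progress-enabled call is an ordinary MCP `tools/call` request whose `params._meta` carries a client-chosen `progressToken`. The sketch below builds that JSON-RPC payload using only the standard library. `make_tool_call` is a hypothetical helper, and the `rebuild_fts` name with empty `arguments` is a placeholder, because the tool's real argument schema is not shown in this diff.

```python
import json
import uuid
from typing import Any, Dict, Optional


def make_tool_call(
    request_id: int,
    tool_name: str,
    arguments: Dict[str, Any],
    progress_token: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request (hypothetical helper)."""
    params: Dict[str, Any] = {"name": tool_name, "arguments": arguments}
    if progress_token is not None:
        # Opt in to progress: the server echoes this token in each
        # notifications/progress message it sends for this request.
        params["_meta"] = {"progressToken": progress_token}
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/call", "params": params}


# Placeholder tool name and arguments; see docs/mcp-tools.md for real schemas.
token = f"rebuild-{uuid.uuid4().hex[:8]}"
request = make_tool_call(1, "rebuild_fts", {}, progress_token=token)
print(json.dumps(request, indent=2))
```

Omitting `progress_token` produces the same request without `_meta`. In that case the client simply receives the final tool response with no intermediate notifications.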
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
@@ -216,13 +216,15 @@ Tools: 26 MCP tools across search, ingest, memory, memory graph, collection admi
 Transport: stdio (default) or HTTP/SSE (`/health`, `/sse`, `/messages/`)
 Startup: backend.warm_up() for predictable latency
 Signals: SIGTERM/SIGINT graceful shutdown
+Progress: a request's `_meta.progressToken` enables `notifications/progress` during long-running tool calls
 ```
 
 Key runtime details:
 
 - Blocking tool work is routed through a bounded async semaphore to avoid overloading local model/runtime resources
 - HTTP mode requires the optional `server` extra (`starlette` + `uvicorn`)
 - Runtime-safe config changes (`mode`, `collection`, `rerank_top_k`, `caption_media`, model IDs) are exposed through `get_config` / `set_config`
+- Progress notifications are best-effort and leave the final response JSON unchanged. `search_batch` reports each completed query before returning the final merged results; `batch` reports each completed operation.
 
 ## Storage Layout
 

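Two of the runtime details above interact. Tool work runs on worker threads behind the bounded semaphore, but MCP notifications are sent from the event loop. The sketch below shows one way to connect them, assuming an asyncio server. `run_blocking_with_progress`, `Report`, and `send_progress` are hypothetical names, not RecallForge's actual functions. Updates from the worker thread are handed to the loop with `asyncio.run_coroutine_threadsafe`. Any failed send is logged and dropped, so the final response is unaffected.

```python
import asyncio
import concurrent.futures
import logging
from typing import Awaitable, Callable, List, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")

# Synchronous reporter handed to blocking code: (progress, total, message).
Report = Callable[[float, Optional[float], str], None]
# Async sender that would emit notifications/progress (hypothetical).
SendProgress = Callable[[float, Optional[float], str], Awaitable[None]]


async def run_blocking_with_progress(
    semaphore: asyncio.Semaphore,
    work: Callable[[Report], T],
    send_progress: SendProgress,
) -> T:
    """Run blocking `work` on a thread, forwarding its progress to the event loop."""
    loop = asyncio.get_running_loop()
    scheduled: List[concurrent.futures.Future] = []

    def report(progress: float, total: Optional[float], message: str) -> None:
        # Runs on the worker thread: schedule the async send on the loop.
        coro = send_progress(progress, total, message)
        scheduled.append(asyncio.run_coroutine_threadsafe(coro, loop))

    try:
        async with semaphore:  # bound concurrent blocking work
            return await asyncio.to_thread(work, report)
    finally:
        # Flush queued notifications before the final response goes out.
        # Failures are logged only: progress never changes the result.
        outcomes = await asyncio.gather(
            *(asyncio.wrap_future(f) for f in scheduled), return_exceptions=True
        )
        for outcome in outcomes:
            if isinstance(outcome, Exception):
                logger.debug("progress notification failed: %s", outcome)
```

Draining the scheduled sends in `finally` keeps progress ordered ahead of the final result, even when `work` raises.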
diff --git a/docs/mcp-tools.md b/docs/mcp-tools.md
@@ -21,6 +21,17 @@ HTTP mode also exposes:
 - `/sse`
 - `/messages/`
 
+## Progress Notifications
+
+RecallForge supports MCP progress notifications for long-running tool calls. When a client includes `_meta.progressToken` in a request, the server sends `notifications/progress` events over compatible transports, each carrying a numeric progress value, an optional total, and a human-readable status message.
+
+Progress is best-effort and does not change the final tool response shape. It currently covers:
+
+- search and explain phases
+- vector and full-text search phases
+- `search_batch` per-query completion updates before the final merged result
+- `ingest`, individual index/memory writes, `batch`, and `rebuild_fts`
+
 Example MCP client config (Claude Desktop):
 
 ```json

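When several calls are in flight, their progress notifications arrive interleaved, so clients usually match each one to its pending request by token. The sketch below is a hypothetical, transport-agnostic handler. `ProgressTracker` is not part of RecallForge or any MCP SDK. It ignores tokens it is not tracking and drops non-increasing values, because the MCP specification requires `progress` to increase with each notification. It treats `total` and `message` as optional. The example message text is illustrative, not RecallForge's actual wording.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union

Token = Union[str, int]  # MCP progress tokens may be strings or integers


@dataclass
class ProgressTracker:
    """Hypothetical client-side view of progress for in-flight tool calls."""

    last_progress: Dict[Token, float] = field(default_factory=dict)
    lines: List[str] = field(default_factory=list)

    def expect(self, token: Token) -> None:
        # Register the token before sending the request that carries it.
        self.last_progress[token] = float("-inf")

    def finish(self, token: Token) -> None:
        # Stop tracking once the final tool response arrives.
        self.last_progress.pop(token, None)

    def handle(self, notification: Dict[str, Any]) -> Optional[str]:
        """Return a display line for a new progress update, else None."""
        if notification.get("method") != "notifications/progress":
            return None
        params = notification.get("params", {})
        token = params.get("progressToken")
        progress = params.get("progress")
        if token not in self.last_progress or progress is None:
            return None  # unknown request or malformed update
        if progress <= self.last_progress[token]:
            return None  # progress must increase; drop stale or duplicate updates
        self.last_progress[token] = progress
        total = params.get("total")
        line = f"[{token}] {progress:g}" + (f"/{total:g}" if total is not None else "")
        if params.get("message"):
            line += f" {params['message']}"
        self.lines.append(line)
        return line


tracker = ProgressTracker()
tracker.expect("t1")
update = {
    "jsonrpc": "2.0",
    "method": "notifications/progress",
    "params": {"progressToken": "t1", "progress": 1, "total": 3, "message": "query 1 of 3 done"},
}
print(tracker.handle(update))  # [t1] 1/3 query 1 of 3 done
```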
diff --git a/docs/research/recallforge-memory-mcp-roadmap.md b/docs/research/recallforge-memory-mcp-roadmap.md
@@ -135,16 +135,18 @@ Goal:
 - Prove RecallForge as a memory MCP, not just a benchmark pipeline.
 
 Current Linear fit:
-- `REC-160`
-- `REC-153`
 - `REC-33`
+
+Shipped Linear work:
+- `REC-153`
 - `REC-61`
 
 What this phase delivers:
 - memory-level evaluation
 - explanation quality checks
 - latency and RSS budget enforcement
 - real episodic corpora coverage
+- MCP progress notifications for long-running search, ingest, batch, and rebuild workflows
 - alpha and beta validation with real workflows
 
 Why this comes last:
@@ -156,7 +158,7 @@ Why this comes last:
 - Keep `Retrieval and Ranking` for cheap broad retrieval work like `REC-169`, `REC-148`, `REC-72`, `REC-71`, `REC-146`
 - Add a milestone such as `Memory Policy and Enrichment` for `REC-84`, `REC-83`, `REC-75`, `REC-76`, `REC-78`
 - Keep `Research Queue` for gated expensive-stage work like `REC-130`, `REC-115`, `REC-147`, `REC-168`
-- Keep `Benchmark Integrity` and `Launch and Distribution` for `REC-160`, `REC-153`, `REC-33`, `REC-61`
+- Keep `Benchmark Integrity` and `Launch and Distribution` for `REC-33` and any future public validation work
 
 ## Architecture Principle
 

diff --git a/src/recallforge/search.py b/src/recallforge/search.py
@@ -23,7 +23,7 @@
 import time
 from dataclasses import dataclass, field, replace
 from hashlib import sha256
-from typing import List, Dict, Any, Optional, Union
+from typing import Any, Callable, Dict, List, Optional, Union
 
 from .backends.base import ModelBackend
 from .cache import EmbeddingCache
@@ -1769,6 +1769,7 @@ def search_batch(
     profile: Optional[str] = None,
     max_workers: int = 4,
     rrf_k: int = 60,
+    progress_callback: Optional[Callable[[int, int, int], None]] = None,
 ) -> List[BatchSearchResult]:
     """
     Run multiple search queries in parallel and merge results using RRF.
@@ -1789,6 +1790,8 @@ def search_batch(
         profile: Optional profile namespace filter
         max_workers: Maximum parallel threads
         rrf_k: RRF fusion constant
+        progress_callback: Optional callback invoked after each query branch
+            finishes, with arguments (completed_count, total_count, branch_result_count)
 
     Returns:
         List of BatchSearchResult objects, sorted by best merged score
@@ -1845,6 +1848,7 @@ def run_single_query(q: BatchQuery) -> List[tuple]:
 
     # Run all queries in parallel
     all_results: List[List[tuple]] = [[] for _ in batch_queries]
+    completed_queries = 0
     with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
         future_to_idx = {
             executor.submit(run_single_query, q): i
@@ -1857,6 +1861,12 @@ def run_single_query(q: BatchQuery) -> List[tuple]:
             except Exception as e:
                 logger.error("Batch query %d failed: %s", idx, e)
                 all_results[idx] = []
+            completed_queries += 1
+            if progress_callback is not None:
+                try:
+                    progress_callback(completed_queries, len(batch_queries), len(all_results[idx]))
+                except Exception as exc:
+                    logger.debug("search_batch progress callback failed: %s", exc)
 
     # Merge results using RRF with best-score-wins
     merged: Dict[str, Dict[str, Any]] = {}
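`search_batch` reports progress in query units, so a tool wrapper has to translate its `(completed_count, total_count, branch_result_count)` callback into progress updates. The sketch below shows one hypothetical adapter, `batch_progress_adapter`, together with `run_queries`, a trimmed stand-in for the completion loop in the hunk above so that the example runs without RecallForge. It assumes the loop iterates with `concurrent.futures.as_completed`, which the hunk does not show, and it omits the RRF merge. The message wording is illustrative.

```python
import concurrent.futures
import logging
from typing import Callable, List, Optional, Sequence

logger = logging.getLogger(__name__)

BatchProgress = Callable[[int, int, int], None]  # search_batch's callback shape
Report = Callable[[float, Optional[float], str], None]  # hypothetical tool-side reporter


def batch_progress_adapter(report: Report) -> BatchProgress:
    """Turn (completed, total, branch_result_count) into a progress update."""

    def on_query_done(completed: int, total: int, branch_results: int) -> None:
        report(completed, total, f"Completed query {completed}/{total} ({branch_results} results)")

    return on_query_done


def run_queries(
    queries: Sequence[str],
    run_one: Callable[[str], List[str]],
    progress_callback: Optional[BatchProgress] = None,
    max_workers: int = 4,
) -> List[List[str]]:
    """Simplified stand-in for search_batch's completion loop (no RRF merge)."""
    results: List[List[str]] = [[] for _ in queries]
    completed = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {executor.submit(run_one, q): i for i, q in enumerate(queries)}
        for future in concurrent.futures.as_completed(future_to_idx):
            idx = future_to_idx[future]
            try:
                results[idx] = future.result()
            except Exception as exc:
                logger.error("Batch query %d failed: %s", idx, exc)
                results[idx] = []  # a failed branch still counts as completed
            completed += 1  # only this (calling) thread mutates the counter
            if progress_callback is not None:
                try:
                    progress_callback(completed, len(queries), len(results[idx]))
                except Exception as exc:  # progress must never break the search
                    logger.debug("progress callback failed: %s", exc)
    return results


def fake_search(query: str) -> List[str]:
    # Hypothetical branch: one hit per character, and "bad" always fails.
    if query == "bad":
        raise ValueError("index unavailable")
    return [f"{query}-hit-{i}" for i in range(len(query))]


updates: List[tuple] = []
results = run_queries(
    ["ab", "bad", "xyz"],
    fake_search,
    batch_progress_adapter(lambda p, t, m: updates.append((p, t, m))),
)
```

In this sketch, the callback runs on the calling thread rather than inside the workers, so the counter needs no lock. Completion order varies from run to run, but `completed` always climbs 1, 2, 3, and a failed branch still advances progress with a result count of 0.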