brianmeyer · May 17, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,7 +4,7 @@ All notable changes to RecallForge will be documented in this file.
 
 ## [Unreleased]
 
-*Nothing yet.*
+- Replaced the tiny UAT video clips with compact episodic-memory fixtures, and added richer transcript sidecars, related-artifact metadata, and regression coverage for the video corpus.
 
 ## [0.2.1] — 2026-05-17
 

diff --git a/README.md b/README.md
@@ -62,7 +62,7 @@ The reranker delivers **+20.7% R@1 over RRF fusion** and pushes R@10 to 97.8%. E
 
 *Benchmark categories: text_only (30 queries), image_only (30 queries), long_query (12 queries), typo_query (20 queries). See `benchmarks/results/pipeline_ablation_modality_results.json` for full breakdown.*
 
-For release validation, use `benchmarks/cross_modal_ablation.py`. It checkpoints JSON output as it runs, so long MLX benchmark sessions still leave behind a partial artifact if interrupted. To turn that artifact into a ranked fix list, run `benchmarks/cross_modal_diagnostics.py`; the current report is in [docs/research/cross-modal-diagnostics.md](docs/research/cross-modal-diagnostics.md).
+For release validation, use `benchmarks/cross_modal_ablation.py`. It checkpoints JSON output as it runs, so long MLX benchmark sessions still leave behind a partial artifact if interrupted. The UAT video corpus now uses compact episodic fixtures with searchable transcript sidecars and related artifact metadata, so video queries exercise meeting, screen-recording, walkthrough, field, and recipe-style memories. To turn a benchmark artifact into a ranked fix list, run `benchmarks/cross_modal_diagnostics.py`; the current report is in [docs/research/cross-modal-diagnostics.md](docs/research/cross-modal-diagnostics.md).
 
 ### Latency & resource usage
 

diff --git a/benchmarks/cross_modal_ablation.py b/benchmarks/cross_modal_ablation.py
@@ -1001,37 +1001,37 @@ def _media_query_variants(
 TEXT_TO_VIDEO = [
     # EASY (6 queries)
     GroundTruth(
-        query="architecture walkthrough building tour presentation",
+        query="office walkthrough connecting floor plan and system architecture",
         relevant_paths=["videos/architecture_walkthrough.mp4"],
         category="text_to_video",
         difficulty="easy",
     ),
     GroundTruth(
-        query="coding demonstration software development tutorial",
+        query="screen recording debugging RecallForge video search test",
         relevant_paths=["videos/coding_demo.mp4"],
         category="text_to_video",
         difficulty="easy",
     ),
     GroundTruth(
-        query="cooking tutorial recipe demonstration video",
+        query="family dinner pasta recipe video with handwritten substitutions",
         relevant_paths=["videos/cooking_tutorial.mp4"],
         category="text_to_video",
         difficulty="easy",
     ),
     GroundTruth(
-        query="nature timelapse video forest mountains",
+        query="weekend trail scouting video forest mountain coast",
         relevant_paths=["videos/nature_timelapse.mp4"],
         category="text_to_video",
         difficulty="easy",
     ),
     GroundTruth(
-        query="whiteboard session brainstorming meeting recording",
+        query="product planning whiteboard meeting memory rollups",
         relevant_paths=["videos/whiteboard_session.mp4"],
         category="text_to_video",
         difficulty="easy",
     ),
     GroundTruth(
-        query="video content with transcript about buildings",
+        query="walkthrough transcript about floor plan architecture deck",
         relevant_paths=["videos/architecture_walkthrough.mp4", "videos/architecture_walkthrough.transcript.json"],
         category="text_to_video",
         difficulty="easy",
@@ -1040,33 +1040,33 @@ def _media_query_variants(
 
     # MEDIUM (6 queries)
     GroundTruth(
-        query="programming and software engineering video content",
+        query="developer screen recording and meeting notes about search pipeline",
         relevant_paths=["videos/coding_demo.mp4", "videos/whiteboard_session.mp4"],
         category="text_to_video",
         difficulty="medium",
         graded_relevance={"videos/coding_demo.mp4": 2, "videos/whiteboard_session.mp4": 1},
     ),
     GroundTruth(
-        query="food preparation and culinary instruction videos",
+        query="recipe memory with pasta sauce timing and grocery planning",
         relevant_paths=["videos/cooking_tutorial.mp4"],
         category="text_to_video",
         difficulty="medium",
     ),
     GroundTruth(
-        query="natural environment scenery video footage",
+        query="outdoor field clip with route planning and park notes",
         relevant_paths=["videos/nature_timelapse.mp4"],
         category="text_to_video",
         difficulty="medium",
     ),
     GroundTruth(
-        query="meeting recordings with transcripts for review",
+        query="meeting recordings with transcript action items for review",
         relevant_paths=["videos/whiteboard_session.mp4", "videos/whiteboard_session.transcript.json"],
         category="text_to_video",
         difficulty="medium",
         graded_relevance={"videos/whiteboard_session.mp4": 2, "videos/whiteboard_session.transcript.json": 2},
     ),
     GroundTruth(
-        query="educational video content with searchable transcripts",
+        query="searchable transcript memories from kitchen and developer videos",
         relevant_paths=["videos/cooking_tutorial.mp4", "videos/cooking_tutorial.transcript.json",
                         "videos/coding_demo.mp4", "videos/coding_demo.transcript.json"],
         category="text_to_video",
@@ -1075,7 +1075,7 @@ def _media_query_variants(
                          "videos/coding_demo.mp4": 2, "videos/coding_demo.transcript.json": 2},
     ),
     GroundTruth(
-        query="visual documentation of outdoor spaces",
+        query="visual documentation of outdoor spaces and walkthrough locations",
         relevant_paths=["videos/nature_timelapse.mp4", "videos/architecture_walkthrough.mp4"],
         category="text_to_video",
         difficulty="medium",
@@ -1084,7 +1084,7 @@ def _media_query_variants(
 
     # HARD (3 queries)
     GroundTruth(
-        query="multimedia content for learning and development",
+        query="episodic videos with procedural learning and follow-up actions",
         relevant_paths=["videos/cooking_tutorial.mp4", "videos/coding_demo.mp4", "videos/whiteboard_session.mp4",
                         "videos/cooking_tutorial.transcript.json", "videos/coding_demo.transcript.json", "videos/whiteboard_session.transcript.json"],
         category="text_to_video",
@@ -1093,7 +1093,7 @@ def _media_query_variants(
                          "videos/cooking_tutorial.transcript.json": 2, "videos/coding_demo.transcript.json": 2, "videos/whiteboard_session.transcript.json": 2},
     ),
     GroundTruth(
-        query="archived recordings with searchable text content",
+        query="archived recordings with field notes and architecture narration",
         relevant_paths=["videos/architecture_walkthrough.mp4", "videos/architecture_walkthrough.transcript.json",
                         "videos/nature_timelapse.mp4", "videos/nature_timelapse.transcript.json"],
         category="text_to_video",
@@ -1102,7 +1102,7 @@ def _media_query_variants(
                          "videos/nature_timelapse.mp4": 2, "videos/nature_timelapse.transcript.json": 2},
     ),
     GroundTruth(
-        query="comprehensive video library with transcripts",
+        query="comprehensive episodic video library with transcripts",
         relevant_paths=["videos/cooking_tutorial.mp4", "videos/coding_demo.mp4", "videos/whiteboard_session.mp4", "videos/architecture_walkthrough.mp4", "videos/nature_timelapse.mp4"],
         category="text_to_video",
         difficulty="hard",
@@ -1434,7 +1434,7 @@ def _media_query_variants(
         query="related text",
         query_type="video",
         video_query_path="videos/coding_demo.mp4",
-        relevant_paths=["text/tech_cybersecurity.md", "text/tech_cloud_computing.md"],
+        relevant_paths=["text/ai_agents.md", "text/tech_cloud_computing.md", "text/ai_embeddings.md"],
         category="video_to_text",
         difficulty="hard",
     ),
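
Reviewer sketch (not part of the patch): the `GroundTruth` entries above pair a query with `relevant_paths` and, for some entries, `graded_relevance`. The snippet below is a minimal illustration of how such an entry could be scored, assuming a simple recall@k definition. The `GroundTruth` dataclass here is a hypothetical stand-in that only mirrors the field names visible in the diff, and `recall_at_k` and `grade` are illustrative names, not functions from `benchmarks/cross_modal_ablation.py`.

```python
from dataclasses import dataclass, field


@dataclass
class GroundTruth:
    """Hypothetical stand-in mirroring the fields used in the diff."""
    query: str
    relevant_paths: list[str]
    category: str
    difficulty: str
    graded_relevance: dict[str, int] = field(default_factory=dict)

    def grade(self, path: str) -> int:
        # Assumption: a relevant path without an explicit grade counts as 1,
        # and any path outside relevant_paths counts as 0.
        if path in self.graded_relevance:
            return self.graded_relevance[path]
        return 1 if path in self.relevant_paths else 0


def recall_at_k(gt: GroundTruth, ranked_paths: list[str], k: int) -> float:
    """Fraction of relevant paths that appear in the top-k results."""
    if not gt.relevant_paths:
        return 0.0
    top_k = set(ranked_paths[:k])
    hits = sum(1 for path in gt.relevant_paths if path in top_k)
    return hits / len(gt.relevant_paths)


gt = GroundTruth(
    query="developer screen recording and meeting notes about search pipeline",
    relevant_paths=["videos/coding_demo.mp4", "videos/whiteboard_session.mp4"],
    category="text_to_video",
    difficulty="medium",
    graded_relevance={"videos/coding_demo.mp4": 2, "videos/whiteboard_session.mp4": 1},
)
# Made-up ranking used only to exercise the helper.
ranked = [
    "videos/coding_demo.mp4",
    "videos/cooking_tutorial.mp4",
    "videos/whiteboard_session.mp4",
]
print(recall_at_k(gt, ranked, k=1), recall_at_k(gt, ranked, k=3))
```

With the more specific queries in this patch, an entry that lists both the `.mp4` and its `.transcript.json` only reaches full recall when the sidecar is indexed as its own searchable item.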

diff --git a/docs/RELEASE.md b/docs/RELEASE.md
@@ -43,6 +43,18 @@ Then run the expanded benchmark:
 .venv/bin/python benchmarks/cross_modal_ablation.py --backend mlx --expansion-profile caption_only --output benchmarks/results/cross_modal_ablation_results.json
 ```
 
+The committed video corpus is an episodic fixture set rather than generic toy clips. Before trusting video-related benchmark changes, confirm the generated sidecars still include searchable `text`, timed segments, and related image/document metadata:
+
+```bash
+.venv/bin/python -m pytest -q tests/test_video_corpus.py tests/test_video_sidecars.py
+```
+
+The shell video-quality UAT uses a deterministic backend by default so CI and local smoke runs are not gated on live model quality. To exercise the installed vision-language backend on this host, opt in explicitly:
+
+```bash
+UAT_VIDEO_LIVE=1 bash tests/uat/test_video_quality.sh
+```
+
 The benchmark now checkpoints to JSON as it runs. If the run is interrupted, the output file still contains partial results plus progress metadata.
 
 After a complete or partial benchmark run, generate the cross-modal diagnosis report:

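Reviewer sketch (not part of the patch): the `docs/RELEASE.md` hunk above says the video-quality UAT is deterministic by default and switches to the live backend only when `UAT_VIDEO_LIVE=1` is set. The real gate lives in `tests/uat/test_video_quality.sh`, which this diff does not show, so the Python below only illustrates the opt-in pattern. The function name `video_backend_mode` and the rule that only the exact value `"1"` enables live mode are assumptions.

```python
import os


def video_backend_mode(environ=None) -> str:
    """Return "live" only when UAT_VIDEO_LIVE is exactly "1", else "deterministic"."""
    # Accept an explicit mapping so the gate can be tested without touching os.environ.
    env = os.environ if environ is None else environ
    return "live" if env.get("UAT_VIDEO_LIVE") == "1" else "deterministic"


print(video_backend_mode({}))
print(video_backend_mode({"UAT_VIDEO_LIVE": "1"}))
```

Treating every value other than the explicit opt-in as deterministic keeps CI runs stable when the variable is unset or set to something unexpected.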
diff --git a/tests/test_video_corpus.py b/tests/test_video_corpus.py
@@ -0,0 +1,59 @@
+"""Regression tests for the committed episodic video corpus."""
+
+from __future__ import annotations
+
+import importlib.util
+import json
+import sys
+import unittest
+from pathlib import Path
+
+
+REPO_ROOT = Path(__file__).resolve().parent.parent
+GENERATOR_PATH = REPO_ROOT / "tests" / "uat" / "helpers" / "generate_video_corpus.py"
+VIDEOS_DIR = REPO_ROOT / "tests" / "uat" / "corpus" / "videos"
+
+
+def _load_generator():
+    spec = importlib.util.spec_from_file_location("generate_video_corpus", GENERATOR_PATH)
+    assert spec is not None and spec.loader is not None
+    module = importlib.util.module_from_spec(spec)
+    sys.modules[spec.name] = module
+    spec.loader.exec_module(module)
+    return module
+
+
+class TestEpisodicVideoCorpus(unittest.TestCase):
+    def test_generator_specs_are_rich_episodic_fixtures(self):
+        module = _load_generator()
+
+        self.assertEqual(len(module.VIDEOS), 5)
+        for spec in module.VIDEOS:
+            with self.subTest(video=spec["name"]):
+                self.assertGreaterEqual(spec["duration"], 9)
+                self.assertGreaterEqual(len(spec["images"]), 2)
+                self.assertGreaterEqual(len(spec["transcript"]), 3)
+                self.assertTrue(spec["scenario"])
+                self.assertTrue(spec["notes"])
+                self.assertTrue(spec["related_images"])
+                self.assertTrue(spec["related_documents"])
+
+    def test_committed_sidecars_include_searchable_transcript_text(self):
+        sidecars = sorted(VIDEOS_DIR.glob("*.transcript.json"))
+
+        self.assertEqual(len(sidecars), 5)
+        for sidecar in sidecars:
+            with self.subTest(sidecar=sidecar.name):
+                payload = json.loads(sidecar.read_text(encoding="utf-8"))
+                self.assertEqual(payload["memory_type"], "episodic_video_fixture")
+                self.assertTrue(payload["scenario"])
+                self.assertTrue(payload["description"])
+                self.assertTrue(payload["notes"])
+                self.assertTrue(payload["text"])
+                self.assertGreaterEqual(len(payload["segments"]), 3)
+                self.assertTrue(payload["related_images"])
+                self.assertTrue(payload["related_documents"])
+
+
+if __name__ == "__main__":
+    unittest.main()
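
Reviewer sketch (not part of the patch): the sidecar assertions in `tests/test_video_corpus.py` can also be expressed as a standalone checker that reports every problem per file instead of stopping at the first failed assertion. The field names and the `episodic_video_fixture` value come from the test above. The helper names `sidecar_problems` and `check_directory` are hypothetical.

```python
import json
from pathlib import Path

# Fields the regression test requires to be present and non-empty.
REQUIRED_KEYS = (
    "scenario",
    "description",
    "notes",
    "text",
    "related_images",
    "related_documents",
)


def sidecar_problems(payload: dict) -> list[str]:
    """Return human-readable problems; an empty list means the payload passes."""
    problems = []
    if payload.get("memory_type") != "episodic_video_fixture":
        problems.append("memory_type must be 'episodic_video_fixture'")
    for key in REQUIRED_KEYS:
        if not payload.get(key):
            problems.append(f"missing or empty field: {key}")
    if len(payload.get("segments", [])) < 3:
        problems.append("expected at least 3 transcript segments")
    return problems


def check_directory(videos_dir: Path) -> dict[str, list[str]]:
    """Map sidecar filenames to their problems; passing files are omitted."""
    report = {}
    for sidecar in sorted(videos_dir.glob("*.transcript.json")):
        payload = json.loads(sidecar.read_text(encoding="utf-8"))
        problems = sidecar_problems(payload)
        if problems:
            report[sidecar.name] = problems
    return report
```

Calling `check_directory(Path("tests/uat/corpus/videos"))` from the repo root would return an empty dict when every committed sidecar satisfies the same conditions the unit test asserts.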
diff --git a/tests/uat/README.md b/tests/uat/README.md
@@ -6,7 +6,7 @@ Manual end-to-end test suite for RecallForge. Validates correctness gates, bench
 
 - **Python 3.12+**
 - **RecallForge installed:** `pip install -e .` (from repo root)
-- **ffmpeg** (for video frame extraction and synthetic video generation)
+- **ffmpeg** (for video frame extraction and regenerating committed video fixtures)
 - **Backends:** torch (CPU/CUDA) and/or MLX (macOS ARM64)
 - **Memory requirements:**
   - MLX 4-bit embed mode: ~1.7GB
@@ -42,7 +42,7 @@ All tests live in `tests/uat/`. Each is self-contained and can be run independen
 | `test_tiered_modes.sh` | Tiered modes (embed/hybrid/full) loading and behavior |
 | `test_document_ingest.sh` | Document ingest (PDF/DOCX/PPTX extraction via CLI) |
 | `test_video_ingest.sh` | Video ingest (transcript fallback + ffmpeg frame extraction) |
-| `test_video_quality.sh` | Video retrieval quality (text/image/video query coverage) |
+| `test_video_quality.sh` | Video retrieval quality (deterministic by default; set `UAT_VIDEO_LIVE=1` for live model retrieval) |
 | `test_video_query_contract.sh` | Raw video query smoke test |
 | `test_cross_modal.sh` | ★ CROSS-MODAL SEARCH (key differentiator) |
 | `test_search_quality.sh` | Search quality (recall@5, MRR, edge cases, dedup) |
@@ -85,7 +85,7 @@ Benchmark tests are **informational** — they report metrics but don't block co
 Tests use a committed video corpus and built-in text/image fixtures in `tests/uat/corpus/`:
 
 ### Video Corpus
-A committed set of test videos with known transcripts and ground-truth frames. Used by `test_video_ingest.sh`, `test_video_quality.sh`, and `test_video_query_contract.sh` to validate cross-modal retrieval on temporal media.
+A committed set of compact episodic video fixtures with known transcripts, related-image/document metadata, and ground-truth frames. The clips cover a screen recording, outdoor field clip, architecture walkthrough, kitchen recipe memory, and product-planning whiteboard session. Used by `test_video_ingest.sh`, `test_video_quality.sh`, and `test_video_query_contract.sh` to validate cross-modal retrieval on temporal media.
 
 ### Text Documents (15 files)
 | Topic | Files |
@@ -171,7 +171,7 @@ Each test script exits 0 on success, 1 on any failure.
 
 1. **Torch video crash on Qwen3-VL (REC-44):** Known issue where torch backend crashes during video frame processing with Qwen3-VL models. Workaround: use MLX backend on Apple Silicon or skip video tests when using Qwen3-VL with torch.
 
-2. **Synthetic test images:** Generated images are simple drawings, not real photos. Cross-modal accuracy will be lower than with real-world images. This is expected.
+2. **Compact local fixtures:** Images and videos are generated/curated to stay small, deterministic, and license-safe. The video corpus now uses episodic memory scenarios with transcripts and related artifacts, but broad public benchmark claims should still be validated against larger real-world datasets.
 
 3. **First run is slow:** Models download on first use (~4GB per model). Subsequent runs use cached models.
 
@@ -185,7 +185,7 @@ Each test script exits 0 on success, 1 on any failure.
 
 8. **Video ingest depends on host capabilities:** Transcript sidecars (`.srt`, `.vtt`, `.txt`) are always supported. Frame extraction runs when `ffmpeg` and `ffprobe` are installed; otherwise video UAT validates transcript-only fallback.
 
-9. **Raw video query requires ffmpeg:** `test_video_query_contract.sh` and raw-video portions of CLI/MCP/video-quality UAT require `ffmpeg` to generate valid synthetic video fixtures. Without it, those checks skip cleanly.
+9. **Raw video query requires ffmpeg:** `test_video_query_contract.sh` and raw-video portions of CLI/MCP/video-quality UAT require `ffmpeg` for frame extraction and video fixture regeneration. Without it, those checks skip cleanly.
 
 10. **Document ingest is local-first:** DOCX and PPTX fixtures are extracted through built-in OOXML parsing. PDF ingestion uses a lightweight fallback extractor by default and gets richer parsing when optional PDF tooling is installed.
 

diff --git a/tests/uat/corpus/CORPUS_EXPANSION.md b/tests/uat/corpus/CORPUS_EXPANSION.md
@@ -6,12 +6,32 @@ This document describes the expanded RecallForge benchmark corpus and what addit
 
 - **Text documents**: 54 files (15 original + 39 new)
 - **Images**: 10 files (existing)
-- **Videos**: 5 `.mp4` files plus 5 transcript JSON placeholders
+- **Videos**: 5 compact episodic `.mp4` fixtures plus 5 rich transcript JSON sidecars
 - **Documents**: 8 generated `.docx` / `.pptx` / `.pdf` files
 - **Total corpus documents**: 82 registered in `CORPUS_DOCS`
-- **Total indexed benchmark items**: 77 searchable items (the transcript JSON placeholders are empty and not indexed)
+- **Total indexed benchmark items**: 82 searchable top-level/sidecar items, plus derived video frame and transcript child memories during video ingest
 - **Total benchmark queries**: 231 queries across all modalities
 
+## Episodic Video Corpus
+
+REC-153 replaced the earlier tiny toy clips with a license-safe episodic fixture set. The files are still small enough to commit, but each video now resembles a real personal or work memory: a screen recording, a field clip, a walkthrough, a kitchen note, or a product-planning meeting.
+
+| File | Memory Scenario | Primary Signals |
+|------|-----------------|-----------------|
+| `coding_demo.mp4` | RecallForge debugging screen recording | code editor, architecture board, action notes, reranking and transcript discussion |
+| `nature_timelapse.mp4` | Weekend trail scouting phone clip | forest, mountain, coast, route planning, park/climate notes |
+| `architecture_walkthrough.mp4` | Office and system architecture walkthrough | floor plan, service diagram, model diagram, milestone narration |
+| `cooking_tutorial.mp4` | Weeknight family recipe memory | pasta, recipe substitutions, handwritten cooking notes |
+| `whiteboard_session.mp4` | Product planning meeting | brainstorm board, parent/child memory rollups, benchmark scoring, release actions |
+
+Each `.transcript.json` sidecar now includes:
+
+- timed transcript segments used by video ingest
+- a top-level `text` field so the sidecar can also be indexed as a searchable transcript artifact
+- `scenario`, `notes`, `related_images`, and `related_documents` metadata for benchmark and documentation provenance
+
+The design follows the same broad shape as episodic-memory video benchmarks such as [Ego4D Episodic Memory](https://ego4d-data.org/docs/benchmarks/episodic-memory/): queries should be able to recover an event, scene, moment, transcript detail, or related artifact from a video-backed memory.
+
 ## New Text Documents Added (39 files)
 
 ### Technology (5 files)
@@ -120,18 +140,17 @@ To further expand the corpus for more comprehensive cross-modal testing, the fol
 39. **travel_yosemite.jpg** - Yosemite National Park
 40. **travel_grand_canyon.jpg** - Grand Canyon landscape
 
-### Recommended Videos to Add
-
-1. **tech_quantum_explainer.mp4** - Quantum computing explanation
-2. **tech_security_demo.mp4** - Cybersecurity demonstration
-3. **science_lab_experiment.mp4** - Science lab experiment
-4. **cooking_masterclass.mp4** - Professional cooking demonstration
-5. **sports_highlights.mp4** - Sports highlights reel
-6. **history_documentary.mp4** - Historical documentary clip
-7. **medicine_procedure.mp4** - Medical procedure video
-8. **music_concert.mp4** - Live music performance
-9. **art_gallery_tour.mp4** - Art gallery walkthrough
-10. **travel_vlog.mp4** - Travel destination vlog
+### Future Real-World Video Additions
+
+The committed corpus is intentionally compact and license-safe. Future benchmark expansions should add opt-in downloaded fixtures or locally supplied clips in these shapes:
+
+1. **meeting_recording_with_slides.mp4** - transcript-heavy meeting with visible slide/document references
+2. **screen_recording_debug_trace.mp4** - developer workflow with code, terminal output, and spoken issue context
+3. **mobile_walkthrough_errand.mp4** - personal memory clip with objects, location changes, and follow-up tasks
+4. **cooking_or_repair_procedure.mp4** - procedural video with step ordering and recipe/tool notes
+5. **document_review_session.mp4** - video that references PDFs, decks, and handwritten annotations
+6. **travel_or_field_visit_clip.mp4** - visually rich outdoor clip with route, weather, and place notes
+7. **classroom_or_tutorial_clip.mp4** - instructional video with transcript-heavy concepts and whiteboard imagery
 
 ## Benchmark Query Distribution
 

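Reviewer sketch (not part of the patch): the `CORPUS_EXPANSION.md` diff above says each sidecar carries timed transcript segments plus a top-level `text` field so the sidecar can be indexed on its own. One plausible way to keep the two in sync is to derive `text` from the segments. The segment keys `start`, `end`, and `text` and the helper names below are assumptions, since the diff does not show `generate_video_corpus.py` or a sidecar file.

```python
def transcript_text(segments: list[dict]) -> str:
    """Join segment text in start-time order into one searchable string."""
    ordered = sorted(segments, key=lambda seg: seg["start"])
    pieces = [seg["text"].strip() for seg in ordered]
    return " ".join(piece for piece in pieces if piece)


def build_sidecar(scenario: str, segments: list[dict], **metadata) -> dict:
    """Assemble a sidecar-shaped dict; extra metadata is passed through unchanged."""
    return {
        "memory_type": "episodic_video_fixture",
        "scenario": scenario,
        "segments": segments,
        # Derived field: always rebuilt from segments so the two cannot drift apart.
        "text": transcript_text(segments),
        **metadata,
    }


# Illustrative segments only; not the committed fixture content.
segments = [
    {"start": 3.0, "end": 6.0, "text": "Second line. "},
    {"start": 0.0, "end": 3.0, "text": "First line."},
]
sidecar = build_sidecar("demo scenario", segments, notes="illustrative only")
print(sidecar["text"])
```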
diff --git a/tests/uat/corpus/videos/architecture_walkthrough.mp4 b/tests/uat/corpus/videos/architecture_walkthrough.mp4