Expand weak cross-modal benchmark categories by brianmeyer · Pull Request #29 · brianmeyer/recallforge

brianmeyer · 2026-05-17T18:08:23Z

Summary

Expand the REC-168 weak media-query categories to 20 queries each: image_to_text, image_to_document, video_to_text, video_to_image, and video_to_document.
Add grounded media-query variants with explicit source-path provenance and graded relevance where applicable.
Add regression coverage for the weak-category query floor and media source path preservation.
Update the UAT corpus benchmark distribution docs to reflect 231 total queries, and document the difference between parent-memory and asset-level metrics.
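The regression coverage described above can be pictured with a minimal sketch. It is not the code in tests/test_cross_modal_benchmark_defs.py: the query record shape (dicts with `id`, `category`, and `source_path` keys) and the helper names `categories_below_floor` and `queries_missing_source_path` are assumptions made for illustration. Only the five category names and the floor of 20 come from this PR.

```python
from collections import Counter

# Category names and the floor of 20 are taken from the PR summary.
WEAK_CATEGORIES = (
    "image_to_text",
    "image_to_document",
    "video_to_text",
    "video_to_image",
    "video_to_document",
)
MIN_QUERIES_PER_CATEGORY = 20


def categories_below_floor(queries, floor=MIN_QUERIES_PER_CATEGORY):
    """Return {category: count} for weak categories with fewer than `floor` queries.

    Hypothetical helper: assumes each query is a dict with a "category" key.
    """
    counts = Counter(q["category"] for q in queries)
    return {c: counts[c] for c in WEAK_CATEGORIES if counts[c] < floor}


def queries_missing_source_path(queries):
    """Return ids of weak-category queries that lack a non-empty source path.

    Hypothetical helper: assumes "id" and "source_path" keys on each query.
    """
    return [
        q["id"]
        for q in queries
        if q["category"] in WEAK_CATEGORIES and not q.get("source_path")
    ]
```

A test built on helpers like these would assert that `categories_below_floor(...)` returns an empty dict and that `queries_missing_source_path(...)` returns an empty list for the loaded benchmark definitions, so that dropping a query or losing provenance fails fast.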

Testing

python3 benchmarks/cross_modal_ablation.py --dry-run
python3 -m pytest -q tests/test_cross_modal_benchmark_defs.py tests/test_cross_modal_diagnostics.py
python3 -m pytest -q

Expand weak cross-modal benchmark categories

32ba244

brianmeyer merged commit 1b4d609 into master May 17, 2026
4 checks passed

brianmeyer deleted the codex/rec-160-expanded-cross-modal-benchmark branch May 17, 2026 18:11