
Expand weak cross-modal benchmark categories #29

Merged: brianmeyer merged 1 commit into master from codex/rec-160-expanded-cross-modal-benchmark on May 17, 2026

Conversation

@brianmeyer
Owner

Summary

  • Expand the REC-168 weak media-query categories to 20 queries each: image_to_text, image_to_document, video_to_text, video_to_image, and video_to_document.
  • Add grounded media-query variants with explicit source-path provenance and graded relevance where applicable.
  • Add regression coverage for the weak-category query floor and media source path preservation.
  • Update the UAT corpus benchmark distribution docs to reflect 231 total queries, and document the parent-memory vs. asset-level metrics.
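
The query-floor and source-path checks described above could be sketched roughly as follows. This is a minimal illustration, not the code in tests/test_cross_modal_benchmark_defs.py: the query record shape (`id`, `category`, `source_path`) and both helper names are hypothetical, and only the five category names and the floor of 20 come from this PR.

```python
from collections import Counter

# Category names and the floor of 20 are taken from the PR summary.
WEAK_CATEGORIES = (
    "image_to_text",
    "image_to_document",
    "video_to_text",
    "video_to_image",
    "video_to_document",
)
QUERY_FLOOR = 20


def categories_below_floor(queries, floor=QUERY_FLOOR):
    """Return {category: count} for weak categories with fewer than `floor` queries.

    Assumes each query is a dict with a "category" key (hypothetical schema).
    """
    counts = Counter(q["category"] for q in queries)
    return {c: counts[c] for c in WEAK_CATEGORIES if counts[c] < floor}


def queries_missing_source_path(queries):
    """Return ids of weak-category queries whose source path is absent or empty.

    Assumes "id" and "source_path" keys (hypothetical schema).
    """
    return [
        q["id"]
        for q in queries
        if q["category"] in WEAK_CATEGORIES and not q.get("source_path")
    ]


if __name__ == "__main__":
    # Toy data: exactly 20 grounded queries per weak category.
    toy = [
        {"id": f"{c}-{i}", "category": c, "source_path": f"media/{c}/{i}.bin"}
        for c in WEAK_CATEGORIES
        for i in range(QUERY_FLOOR)
    ]
    assert categories_below_floor(toy) == {}
    assert queries_missing_source_path(toy) == []
```

Returning the offending categories and ids, rather than a bare boolean, keeps a failing regression test readable: the assertion message shows which category fell under the floor or which query lost its provenance.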

Research grounding

Tests

  • python3 benchmarks/cross_modal_ablation.py --dry-run
  • python3 -m pytest -q tests/test_cross_modal_benchmark_defs.py tests/test_cross_modal_diagnostics.py
  • python3 -m pytest -q

@brianmeyer brianmeyer merged commit 1b4d609 into master May 17, 2026
4 checks passed
@brianmeyer brianmeyer deleted the codex/rec-160-expanded-cross-modal-benchmark branch May 17, 2026 18:11