# patch2region-lite

A compact Python toolkit for region-level scoring from patch-level relevance maps.
It is designed for multimodal RAG pipelines where you have:
- patch/token relevance scores from a VLM encoder
- OCR bounding boxes from a document/image parser
- a need to rank/filter only the useful regions
## What it does

- Convert each patch index to a patch bounding box.
- Compute IoU between each patch and each OCR region.
- Aggregate patch scores into a region relevance score with IoU weighting.
- Keep only regions above a score threshold.
- Optionally normalize by overlap to reduce large-box bias.
No extra training required. This is pure inference-time post-processing.
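The steps above can be sketched in plain Python. The helper functions below are hypothetical illustrations of the described post-processing, not the package's actual API:

```python
def patch_bbox(index, n_x, n_y, width, height):
    """Convert a row-major patch index to an (x0, y0, x1, y1) pixel box."""
    pw, ph = width / n_x, height / n_y
    col, row = index % n_x, index // n_x
    return (col * pw, row * ph, (col + 1) * pw, (row + 1) * ph)

def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def region_score(region_bbox, scores, n_x, n_y, width, height, min_iou=0.02):
    """IoU-weighted sum of patch scores; patches below min_iou are ignored."""
    total, weight = 0.0, 0.0
    for i, s in enumerate(scores):
        w = iou(patch_bbox(i, n_x, n_y, width, height), region_bbox)
        if w >= min_iou:
            total += w * s
            weight += w
    # dividing by the summed IoU weights is the overlap normalization
    # that reduces large-box bias
    return total / weight if weight > 0 else 0.0
```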
## Install

```bash
pip install -e .
```

## Usage

```bash
patch2region-lite score \
  --patch-scores examples/patch_scores.json \
  --regions examples/ocr_regions.json \
  --image-width 1600 \
  --image-height 1200 \
  --min-iou 0.02 \
  --output /tmp/regions_ranked.json
```
```bash
# inspect which patches contributed to one region
patch2region-lite explain \
  --patch-scores examples/patch_scores.json \
  --regions examples/ocr_regions.json \
  --region-id r3 \
  --image-width 1600 \
  --image-height 1200 \
  --output /tmp/r3_explain.json
```
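Conceptually, `explain` breaks a region's score into per-patch contributions (IoU weight times patch score). The function below is a hypothetical stand-in for that computation, not the tool's actual output format:

```python
def contributions(region_bbox, scores, n_x, n_y, width, height, min_iou=0.02):
    """Per-patch contributions to one region, sorted largest first."""
    pw, ph = width / n_x, height / n_y
    out = []
    for i, s in enumerate(scores):
        col, row = i % n_x, i // n_x
        box = (col * pw, row * ph, (col + 1) * pw, (row + 1) * ph)
        # IoU of this patch against the region
        ix0, iy0 = max(box[0], region_bbox[0]), max(box[1], region_bbox[1])
        ix1, iy1 = min(box[2], region_bbox[2]), min(box[3], region_bbox[3])
        inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
        pa = (box[2] - box[0]) * (box[3] - box[1])
        ra = (region_bbox[2] - region_bbox[0]) * (region_bbox[3] - region_bbox[1])
        w = inter / (pa + ra - inter) if inter > 0 else 0.0
        if w >= min_iou:
            out.append({"patch": i, "iou": w, "contribution": w * s})
    return sorted(out, key=lambda d: -d["contribution"])
```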
## Input formats

`patch_scores.json`:

```json
{
  "n_patches_x": 4,
  "n_patches_y": 4,
  "scores": [0.1, 0.2, 0.9, 0.8, 0.2, 0.1, 0.7, 0.5, 0.1, 0.1, 0.6, 0.4, 0.1, 0.2, 0.3, 0.2]
}
```

`ocr_regions.json`:
```json
[
  {"id": "r1", "bbox": [120, 180, 900, 380], "text": "Q4 revenue table"},
  {"id": "r2", "bbox": [100, 700, 1400, 980], "text": "footnotes"}
]
```

## Use cases

- region-level context pruning before MLLM generation
- patch heatmap debugging and threshold tuning
- experiments on multimodal retrieval precision/recall trade-off
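Putting the two input formats together, a plain-Python version of the `score` command might look like the sketch below. It implements the described IoU-weighted, overlap-normalized aggregation; it is an illustration, not the package's actual code:

```python
# the example inputs from the "Input formats" section, inlined
patch_scores = {"n_patches_x": 4, "n_patches_y": 4,
                "scores": [0.1, 0.2, 0.9, 0.8, 0.2, 0.1, 0.7, 0.5,
                           0.1, 0.1, 0.6, 0.4, 0.1, 0.2, 0.3, 0.2]}
regions = [{"id": "r1", "bbox": [120, 180, 900, 380], "text": "Q4 revenue table"},
           {"id": "r2", "bbox": [100, 700, 1400, 980], "text": "footnotes"}]
W, H = 1600, 1200

def rank_regions(patch_scores, regions, width, height, min_iou=0.02):
    """Score every region and return them sorted by descending relevance."""
    nx, ny = patch_scores["n_patches_x"], patch_scores["n_patches_y"]
    pw, ph = width / nx, height / ny
    ranked = []
    for reg in regions:
        rx0, ry0, rx1, ry1 = reg["bbox"]
        total = weight = 0.0
        for i, s in enumerate(patch_scores["scores"]):
            x0, y0 = (i % nx) * pw, (i // nx) * ph
            x1, y1 = x0 + pw, y0 + ph
            inter = (max(0.0, min(x1, rx1) - max(x0, rx0))
                     * max(0.0, min(y1, ry1) - max(y0, ry0)))
            union = pw * ph + (rx1 - rx0) * (ry1 - ry0) - inter
            w = inter / union if union > 0 else 0.0
            if w >= min_iou:          # drop patches with negligible overlap
                total, weight = total + w * s, weight + w
        ranked.append({"id": reg["id"],
                       "score": total / weight if weight else 0.0})
    return sorted(ranked, key=lambda r: -r["score"])
```

On these example inputs, `r2` outranks `r1`: it overlaps the high-scoring patches in the lower rows of the grid.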
## License

MIT
## Tips

- Tune `threshold` and `min_iou` together on a held-out set.
- Use `explain` output to debug failure cases before model retraining.
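One way to tune the two knobs together is a joint grid sweep against ground-truth labels. In this sketch, `score_regions` and the labeled `held_out` documents are placeholders, not part of the package:

```python
from itertools import product

def sweep(held_out, score_regions, thresholds, min_ious):
    """Return (best_f1, (threshold, min_iou)) over a joint grid search."""
    best = (0.0, None)
    for t, m in product(thresholds, min_ious):
        tp = fp = fn = 0
        for doc in held_out:
            scored = score_regions(doc, min_iou=m)    # {region_id: score}
            kept = {rid for rid, s in scored.items() if s >= t}
            gold = doc["relevant_region_ids"]          # ground-truth id set
            tp += len(kept & gold)
            fp += len(kept - gold)
            fn += len(gold - kept)
        # micro-averaged F1 over all held-out documents
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[0]:
            best = (f1, (t, m))
    return best
```

Sweeping jointly matters because a looser `min_iou` lets more low-overlap patches contribute, which shifts the score distribution and therefore the best `threshold`.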