eloriana/patch2region-lite

patch2region-lite

A compact Python toolkit for region-level scoring from patch-level relevance maps.

It is designed for multimodal RAG pipelines where you have:

  • patch/token relevance scores from a VLM encoder
  • OCR bounding boxes from a document/image parser
  • a need to rank regions and keep only the useful ones

Core idea

  1. Convert each patch index to its bounding box in image pixel coordinates.
  2. Compute IoU between each patch and each OCR region.
  3. Aggregate patch scores into a region relevance score, weighting each patch by its IoU with the region.
  4. Keep only regions whose score is above the threshold.
  5. Optionally normalize each region score by its total overlap to reduce the bias toward large boxes.

No extra training required. This is pure inference-time post-processing.
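The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the package's actual implementation: the function names (`patch_bbox`, `iou`, `score_regions`), the row-major patch ordering, the `[x1, y1, x2, y2]` box layout, and the choice to normalize by the sum of IoU weights are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2) in pixels


def patch_bbox(index: int, n_x: int, n_y: int, width: float, height: float) -> Box:
    """Step 1: map a flat patch index (assumed row-major) to its pixel box."""
    row, col = divmod(index, n_x)
    pw, ph = width / n_x, height / n_y
    return (col * pw, row * ph, (col + 1) * pw, (row + 1) * ph)


def iou(a: Box, b: Box) -> float:
    """Step 2: intersection-over-union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def score_regions(
    scores: Sequence[float],
    n_x: int,
    n_y: int,
    regions: List[Dict],
    width: float,
    height: float,
    min_iou: float = 0.02,
    threshold: float = 0.0,
    normalize: bool = True,
) -> List[Dict]:
    """Steps 3-5: IoU-weighted aggregation, optional normalization, thresholding."""
    if len(scores) != n_x * n_y:
        raise ValueError("scores must contain n_x * n_y entries")
    ranked = []
    for region in regions:
        box = tuple(region["bbox"])
        weighted, total_weight = 0.0, 0.0
        for i, s in enumerate(scores):
            w = iou(patch_bbox(i, n_x, n_y, width, height), box)
            if w >= min_iou:  # ignore patches that barely touch the region
                weighted += w * s
                total_weight += w
        # Hypothetical normalization: divide by the summed IoU weights.
        score = weighted / total_weight if normalize and total_weight > 0 else weighted
        if score >= threshold:
            ranked.append({"id": region["id"], "score": score})
    return sorted(ranked, key=lambda r: r["score"], reverse=True)
```

Without normalization, a large box accumulates weight from many patches and tends to outrank a small box with the same average relevance, which is the large-box bias that step 5 targets.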

Install

pip install -e .

CLI example

patch2region-lite score \
  --patch-scores examples/patch_scores.json \
  --regions examples/ocr_regions.json \
  --image-width 1600 \
  --image-height 1200 \
  --min-iou 0.02 \
  --output /tmp/regions_ranked.json

# inspect which patches contributed to one region
patch2region-lite explain \
  --patch-scores examples/patch_scores.json \
  --regions examples/ocr_regions.json \
  --region-id r3 \
  --image-width 1600 \
  --image-height 1200 \
  --output /tmp/r3_explain.json
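The README does not document what the explain output contains. As a rough sketch of the idea, the snippet below lists the patches that overlap one region together with their IoU and their contribution (IoU times patch score). The function name `explain_region`, the returned field names, the row-major patch order, and the `[x1, y1, x2, y2]` box layout are assumptions for illustration only.

```python
from typing import Dict, List, Sequence


def explain_region(
    scores: Sequence[float],
    n_x: int,
    n_y: int,
    bbox: Sequence[float],
    width: float,
    height: float,
    min_iou: float = 0.02,
) -> List[Dict]:
    """Return per-patch IoU and contribution for one region, largest first."""
    pw, ph = width / n_x, height / n_y
    rx1, ry1, rx2, ry2 = bbox
    region_area = (rx2 - rx1) * (ry2 - ry1)
    rows = []
    for i, s in enumerate(scores):
        row, col = divmod(i, n_x)  # assumed row-major patch order
        px1, py1 = col * pw, row * ph
        iw = min(px1 + pw, rx2) - max(px1, rx1)
        ih = min(py1 + ph, ry2) - max(py1, ry1)
        if iw <= 0 or ih <= 0:
            continue  # patch does not overlap the region
        inter = iw * ih
        overlap = inter / (pw * ph + region_area - inter)
        if overlap >= min_iou:
            rows.append({"patch": i, "iou": overlap, "contribution": overlap * s})
    return sorted(rows, key=lambda r: r["contribution"], reverse=True)
```

Sorting by contribution makes it easy to see whether a region's score comes from one strongly relevant patch or from many weakly overlapping ones.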

Input format

patch_scores.json

{
  "n_patches_x": 4,
  "n_patches_y": 4,
  "scores": [0.1, 0.2, 0.9, 0.8, 0.2, 0.1, 0.7, 0.5, 0.1, 0.1, 0.6, 0.4, 0.1, 0.2, 0.3, 0.2]
}
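To make the layout concrete, here is a small sketch that parses the example above and reshapes the flat `scores` list into a grid. It assumes the scores are stored in row-major order (first `n_patches_x` values form the top row); the README does not state the ordering, so check it against your encoder's patch layout.

```python
import json

raw = """
{
  "n_patches_x": 4,
  "n_patches_y": 4,
  "scores": [0.1, 0.2, 0.9, 0.8, 0.2, 0.1, 0.7, 0.5,
             0.1, 0.1, 0.6, 0.4, 0.1, 0.2, 0.3, 0.2]
}
"""

data = json.loads(raw)
n_x, n_y = data["n_patches_x"], data["n_patches_y"]
scores = data["scores"]

# The flat list must cover the whole grid.
if len(scores) != n_x * n_y:
    raise ValueError(f"expected {n_x * n_y} scores, got {len(scores)}")

# Assumed row-major: row r occupies scores[r * n_x : (r + 1) * n_x].
grid = [scores[r * n_x:(r + 1) * n_x] for r in range(n_y)]

# Locate the highest-scoring patch as (row, col).
best_index = max(range(len(scores)), key=scores.__getitem__)
best_row, best_col = divmod(best_index, n_x)
print(best_row, best_col, scores[best_index])
```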

ocr_regions.json

[
  {"id": "r1", "bbox": [120, 180, 900, 380], "text": "Q4 revenue table"},
  {"id": "r2", "bbox": [100, 700, 1400, 980], "text": "footnotes"}
]
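The `bbox` values appear to be `[x1, y1, x2, y2]` corner coordinates in pixels: read as `[x, y, width, height]`, region `r2` would extend past the 1200-pixel image height used in the CLI example. Under that assumption, a quick sanity check before scoring might look like the sketch below; `validate_regions` is a hypothetical helper, not part of the package.

```python
from typing import Dict, List


def validate_regions(regions: List[Dict], width: float, height: float) -> List[str]:
    """Return a list of human-readable problems; empty means the input looks sane."""
    problems = []
    seen_ids = set()
    for region in regions:
        rid = region.get("id")
        if rid in seen_ids:
            problems.append(f"{rid}: duplicate id")
        seen_ids.add(rid)
        bbox = region.get("bbox", [])
        if len(bbox) != 4:
            problems.append(f"{rid}: bbox must have 4 numbers")
            continue
        x1, y1, x2, y2 = bbox  # assumed corner format
        if not (x1 < x2 and y1 < y2):
            problems.append(f"{rid}: bbox has non-positive width or height")
        elif x1 < 0 or y1 < 0 or x2 > width or y2 > height:
            problems.append(f"{rid}: bbox extends outside the image")
    return problems


regions = [
    {"id": "r1", "bbox": [120, 180, 900, 380], "text": "Q4 revenue table"},
    {"id": "r2", "bbox": [100, 700, 1400, 980], "text": "footnotes"},
]
print(validate_regions(regions, 1600, 1200))
```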

Use cases

  • region-level context pruning before MLLM generation
  • patch heatmap debugging and threshold tuning
  • experiments on multimodal retrieval precision/recall trade-off
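As an illustration of the first use case, the sketch below builds a pruned text context from ranked regions under a character budget. It assumes each ranked entry carries `id`, `score`, and `text` fields; the actual schema of the ranked output file is not documented here, and `prune_context` is a hypothetical helper.

```python
from typing import Dict, List


def prune_context(ranked: List[Dict], max_chars: int = 2000, top_k: int = 5) -> str:
    """Concatenate the text of the highest-scoring regions within a budget."""
    parts, used = [], 0
    best_first = sorted(ranked, key=lambda r: r["score"], reverse=True)
    for region in best_first[:top_k]:
        text = region.get("text", "")
        if not text or used + len(text) > max_chars:
            continue  # skip empty regions and ones that would exceed the budget
        parts.append(f"[{region['id']}] {text}")
        used += len(text)
    return "\n".join(parts)
```

The resulting string can be placed in the MLLM prompt in place of the full OCR dump, so the model only sees text from regions the relevance map supports.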

License

MIT

Practical tips

  • Tune threshold and min_iou together on a held-out set.
  • Use the explain output to debug failure cases before resorting to model retraining.
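One way to tune the two parameters jointly is a small grid sweep scored by precision and recall on labeled held-out data. The sketch below is an assumption about how such a sweep could be organized: `score_fn` stands in for whatever produces region scores at a given `min_iou` (for example, a wrapper around the scoring command), and `relevant` is the set of region ids a human marked as useful.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Set


def precision_recall(kept: Set[str], relevant: Set[str]):
    """Precision and recall of the kept region ids against the labeled set."""
    tp = len(kept & relevant)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


def sweep(
    score_fn: Callable[[float], Dict[str, float]],  # hypothetical: min_iou -> {region_id: score}
    relevant: Set[str],
    thresholds: Iterable[float],
    min_ious: Iterable[float],
) -> List[Dict]:
    """Evaluate every (threshold, min_iou) pair and sort by F1, best first."""
    results = []
    for threshold, min_iou in product(thresholds, min_ious):
        scores = score_fn(min_iou)
        kept = {rid for rid, s in scores.items() if s >= threshold}
        p, r = precision_recall(kept, relevant)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        results.append(
            {"threshold": threshold, "min_iou": min_iou,
             "precision": p, "recall": r, "f1": f1}
        )
    return sorted(results, key=lambda x: x["f1"], reverse=True)
```

Sweeping both values in one grid matters because `min_iou` changes which patches feed each region score, which in turn shifts where a good threshold sits.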
