Implementation companion to Resource Efficient Perception for Vision Systems. The work targets high-resolution vision under tight GPU memory: images are processed in patches, combined with a global context, and fed to downstream heads so models can be trained and deployed on large fields of view—including on resource-constrained hardware (e.g. Jetson-class devices). Results are reported across seven benchmarks spanning classification, object detection, and segmentation.
| Path | Description |
|---|---|
| `classification_detection_patchgd/` | PatchGD for classification (`patchGD.py`, `oursPatchGDv1.py`, …) and detection (`detection.py`, `fcos.py`). See `classification_detection_patchgd/README.md`. |
| `segmentation_patchgd/` | PatchGD-style segmentation experiments. |
1. Python 3 (3.8+ recommended).
2. PyTorch — install the wheel that matches your CUDA version (or the CPU build) from pytorch.org.
3. Per subproject:
   - Classification & detection (this repo's PatchGD scripts):

     ```
     cd classification_detection_patchgd
     pip install -r requirements.txt
     ```

     Dependencies covered: `torch`/`torchvision`, `Pillow`, `numpy`, `opencv-python`, `matplotlib`, `torchmetrics`, `fvcore`, `pytorch-warmup`, `ultralytics`. (Install `torch` first as above.)
   - Segmentation subfolder (pinned stack used there):

     ```
     pip install -r segmentation_patchgd/requirements.txt
     ```
Before training, set `basePath` in `classification_detection_patchgd/constants.py` to your data root.
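Concretely, the setting looks like the following (an illustrative excerpt only — the actual file may define additional constants, and the path shown is a placeholder):

```python
# classification_detection_patchgd/constants.py (illustrative excerpt)
basePath = "/data/datasets"  # set this to your data root before training
```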
- Image classification (PatchGD, UltraMNIST, PANDA, ImageFolder datasets such as AID)
- Object detection (FCOS + patch-based features)
- Image segmentation (see `segmentation_patchgd/`)
```
@article{subramanyam2024resource,
  title={Resource Efficient Perception for Vision Systems},
  author={Subramanyam, A V and Singal, Niyati and Verma, Vinay K},
  journal={arXiv preprint arXiv:2405.07166},
  year={2024}
}
```

This repository studies localized, memory-bounded perception: patch-based computation preserves fine structure while global context stabilizes semantics, enabling competitive accuracy on large images within practical memory budgets.
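The patch-then-aggregate scheme described above can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the repository's implementation: the patch size, the per-patch encoder (a channel-wise mean here), and the mean-pooled global context are all placeholder choices.

```python
import numpy as np

def extract_patches(image, patch=64):
    """Tile an HxWxC image into non-overlapping patch x patch tiles."""
    H, W, C = image.shape
    rows, cols = H // patch, W // patch
    tiles = image[:rows * patch, :cols * patch]
    tiles = tiles.reshape(rows, patch, cols, patch, C)
    return tiles.transpose(0, 2, 1, 3, 4).reshape(rows * cols, patch, patch, C)

def encode_patch(tile):
    """Placeholder per-patch encoder: channel-wise mean as a feature vector."""
    return tile.mean(axis=(0, 1))

def patch_forward(image, patch=64):
    """Encode patches one at a time (bounding peak memory to a single patch),
    then pool a global context vector over all patch features."""
    feats = np.stack([encode_patch(t) for t in extract_patches(image, patch)])
    global_ctx = feats.mean(axis=0)  # global context aggregated from patches
    return feats, global_ctx

img = np.random.rand(256, 256, 3).astype(np.float32)
feats, ctx = patch_forward(img, patch=64)
print(feats.shape, ctx.shape)  # (16, 3) (3,)
```

Because each patch is encoded independently, only one patch (plus the small feature buffer) needs to be resident at a time, which is what makes large fields of view tractable under a fixed memory budget.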