
Localized-Perception-Constrained-Vision-Systems

Implementation companion to Resource Efficient Perception for Vision Systems. The work targets high-resolution vision under tight GPU memory: images are processed in patches, combined with a global context, and fed to downstream heads so models can be trained and deployed on large fields of view—including on resource-constrained hardware (e.g. Jetson-class devices). Results are reported across seven benchmarks spanning classification, object detection, and segmentation.
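The core recipe above — tile a large image into patches and pair them with a cheap global summary — can be sketched as follows. This is an illustrative numpy sketch, not the repo's code; the names `patchify` and `global_context` are assumptions, not the project's API.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into an (n_rows, n_cols, patch, patch, C) grid."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0, "image must tile evenly"
    grid = image.reshape(H // patch, patch, W // patch, patch, C)
    return grid.transpose(0, 2, 1, 3, 4)

def global_context(image: np.ndarray, size: int) -> np.ndarray:
    """Cheap global summary: average-pool the full image down to (size, size, C)."""
    H, W, C = image.shape
    return image.reshape(size, H // size, size, W // size, C).mean(axis=(1, 3))

img = np.random.rand(512, 512, 3).astype(np.float32)
patches = patchify(img, patch=128)   # (4, 4, 128, 128, 3): local detail
ctx = global_context(img, size=8)    # (8, 8, 3): coarse global view
```

Only one patch (plus the small context tensor) needs to reside on the GPU at a time, which is what makes large fields of view fit in tight memory budgets.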

Repository layout

| Path | Description |
| --- | --- |
| `classification_detection_patchgd/` | PatchGD for classification (`patchGD.py`, `oursPatchGDv1.py`, …) and detection (`detection.py`, `fcos.py`). See `classification_detection_patchgd/README.md`. |
| `segmentation_patchgd/` | PatchGD-style segmentation experiments. |

Install

1. Python 3 (3.8+ recommended).

2. PyTorch — install the wheel that matches your CUDA or CPU from pytorch.org.

3. Per subproject

  • Classification & detection (this repo’s PatchGD scripts):

    cd classification_detection_patchgd
    pip install -r requirements.txt

    Dependencies covered: torch/torchvision, Pillow, numpy, opencv-python, matplotlib, torchmetrics, fvcore, pytorch-warmup, ultralytics. (Install torch first as above.)

  • Segmentation subfolder (pinned stack used there):

    pip install -r segmentation_patchgd/requirements.txt

Before training, set basePath in classification_detection_patchgd/constants.py to your data root.
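The file's exact contents are not reproduced here; the setting it expects is shaped roughly like this, with the path below being a placeholder for your own data root:

```python
# classification_detection_patchgd/constants.py (illustrative value only)
basePath = "/path/to/data"  # directory containing your datasets
```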

Experiments

  1. Image classification (PatchGD, UltraMNIST, PANDA, ImageFolder datasets such as AID)
  2. Object detection (FCOS + patch-based features)
  3. Image segmentation (see segmentation_patchgd/)
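The PatchGD idea behind these experiments — maintain a latent grid over all patch positions, refresh only a sampled subset of cells per inner step, and let the head see the whole grid — can be sketched in a hedged, toy form. This is not the repo's implementation; the linear "encoder" and "head" below are stand-ins, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, d = 4, 4, 16          # latent grid over a 4x4 patch layout
W_enc = rng.standard_normal((128 * 128 * 3, d)) * 0.01   # toy patch encoder
W_head = rng.standard_normal((n_rows * n_cols * d, 10))  # toy 10-class head

def encode_patch(p: np.ndarray) -> np.ndarray:
    """Map one (128, 128, 3) patch to a d-dim feature."""
    return p.reshape(-1) @ W_enc

def patchgd_step(patches: np.ndarray, Z: np.ndarray, k: int = 4) -> np.ndarray:
    """Refresh k randomly sampled grid cells in-place, then score from Z."""
    idx = rng.choice(n_rows * n_cols, size=k, replace=False)
    for i in idx:
        r, c = divmod(i, n_cols)
        Z[r, c] = encode_patch(patches[r, c])   # only k patches touched per step
    return Z.reshape(-1) @ W_head               # head sees the whole latent grid

patches = rng.random((n_rows, n_cols, 128, 128, 3), dtype=np.float32)
Z = np.zeros((n_rows, n_cols, d))
for _ in range(4):                              # inner steps gradually fill Z
    logits = patchgd_step(patches, Z)
```

In the real method the encoder and head are trained networks and gradients flow through the sampled patches each inner step; the sketch only shows the memory-bounded fill-and-score loop.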

Citation

@article{subramanyam2024resource,
  title={Resource Efficient Perception for Vision Systems},
  author={Subramanyam, A V and Singal, Niyati and Verma, Vinay K},
  journal={arXiv preprint arXiv:2405.07166},
  year={2024}
}

Conclusion

This repository studies localized, memory-bounded perception: patch-based computation preserves fine structure while global context stabilizes semantics, enabling competitive accuracy on large images within practical memory budgets.