A lightweight, parameter-efficient dense readout on top of frozen vision backbones (DINOv3). It matches or outperforms state-of-the-art parameter-efficient fine-tuning (PEFT) methods on semantic segmentation, object detection, pose estimation, and contour prediction. Because it trains very few parameters, it is robust to overfitting on small datasets. It can also be combined with LoRA for further flexibility.
Code coming soon.
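
Until the official code is available, the general setup (a frozen backbone with a small trainable dense head) can be sketched roughly as follows. This is not the authors' readout, whose design is not described here. It is a generic per-patch linear head shown only to illustrate the frozen-backbone training pattern. `ToyBackbone` and `LinearDenseReadout` are hypothetical names, the backbone is a tiny stand-in rather than DINOv3, and all sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn


class ToyBackbone(nn.Module):
    """Hypothetical stand-in for a frozen ViT-style backbone (NOT DINOv3)."""

    def __init__(self, dim: int = 64, patch: int = 16):
        super().__init__()
        # One feature vector per non-overlapping patch.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.embed(x)  # (B, dim, H / patch, W / patch)


class LinearDenseReadout(nn.Module):
    """Generic dense head (assumption): 1x1 conv per patch, then upsampling."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.proj = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, feats: torch.Tensor, out_size: tuple) -> torch.Tensor:
        logits = self.proj(feats)
        # Bring patch-level predictions back to image resolution.
        return nn.functional.interpolate(
            logits, size=out_size, mode="bilinear", align_corners=False
        )


# Freeze the backbone: none of its parameters receive gradients.
backbone = ToyBackbone().eval()
for p in backbone.parameters():
    p.requires_grad_(False)

# Only the head is trainable, so only its parameters go to the optimizer.
head = LinearDenseReadout(dim=64, num_classes=5)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

images = torch.randn(2, 3, 64, 64)
with torch.no_grad():
    feats = backbone(images)
logits = head(feats, out_size=tuple(images.shape[-2:]))

trainable = sum(p.numel() for p in head.parameters())
frozen = sum(p.numel() for p in backbone.parameters())
print(f"trainable: {trainable}, frozen: {frozen}, logits: {tuple(logits.shape)}")
```

In this pattern the backbone features can be computed under `torch.no_grad()`, so training cost and the risk of overfitting are governed by the small head alone. Combining it with LoRA would additionally place low-rank trainable adapters inside the backbone, which this sketch does not attempt.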