Skip to content

kushagrathisside/Image-Segmentation-Using-ViT-Finetuning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

Fine-Tuning DINOv2 for Terrain Understanding using RUGD

Overview

This project focuses on adapting a self-supervised Vision Transformer (DINOv2) for terrain understanding in unstructured outdoor environments using the RUGD dataset.

While DINOv2 provides strong general-purpose visual representations, real-world robotics applications require domain adaptation to:

  • Terrain-specific textures
  • Traversability patterns
  • Natural environmental variability

This work fine-tunes DINOv2 to improve feature sensitivity toward ground-level semantic distinctions relevant to robotics and navigation systems. rugd


Objective

To bridge the gap between:

Foundation Vision Models
and
Terrain-aware perception for real-world deployment

By adapting DINOv2 embeddings for:

  • Outdoor ground scene understanding
  • Unstructured terrain classification
  • Robust visual representation learning

Dataset

RUGD (Robot Unstructured Ground Driving Dataset)

Used for:

  • Ground-level scene diversity
  • Natural terrain variability
  • Robotics-relevant visual semantics

Includes terrain categories such as:

  • Dirt
  • Grass
  • Gravel
  • Asphalt
  • Vegetation
  • Obstacles

This enables learning of representations aligned with real-world navigation contexts.


Methodology

1. Feature Extraction

  • Leveraged pretrained DINOv2 Vision Transformer backbone
  • Extracted patch-level embeddings from terrain images

2. Fine-Tuning Strategy

  • Adapted representation layers to RUGD-specific terrain patterns
  • Improved discriminability across terrain classes
  • Preserved foundational visual generalization

3. Supervised Adaptation

  • Trained classification head on extracted features
  • Optimized model for terrain-aware semantic separation

4. Evaluation

  • Assessed class-level performance
  • Observed improved contextual sensitivity to ground textures

Key Contributions

  • Adapted a foundation vision model for robotics terrain perception
  • Enabled domain-specific representation learning without full model retraining
  • Demonstrated applicability of self-supervised models in real-world robotic sensing tasks

Applications

This approach supports:

  • Vision-based navigation
  • Autonomous mobility
  • Terrain classification systems
  • Cross-domain robustness studies

Especially relevant for:

  • Robotics
  • Autonomous systems
  • Field AI deployments

Tech Stack

  • DINOv2
  • PyTorch
  • Vision Transformers
  • RUGD Dataset

Future Work

  • Integration with mobility or navigation pipelines
  • Robustness testing across varied environmental conditions
  • Transferability analysis to unseen terrains

About

FineTuned ViT for segmentation tasks on multiple frame sizes and optimizations

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors