RobvanGastel
Finetuning self-supervised vision encoders for segmentation
Summary
This repository enables efficient finetuning of DINOv2 and DINOv3 self-supervised visual encoders for tasks like image segmentation. It targets researchers and engineers seeking to adapt powerful pre-trained models with minimal computational overhead. By employing Low-Rank Adaptation (LoRA), it allows task-specific finetuning with significantly fewer trainable parameters, preserving original encoder weights and reducing resource demands.
How It Works
The project starts from a pre-trained DINOv2 or DINOv3 encoder, whose representations of natural images are robust. Finetuning uses Low-Rank Adaptation (LoRA): small, trainable low-rank matrices are injected into the transformer layers while the pre-trained weights stay frozen, so only the added matrices are updated. A lightweight decoder, either a 1x1 convolution or a Feature Pyramid Network (FPN), is trained on top of the adapted encoder to produce the segmentation output.
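The LoRA idea described above can be sketched in a few lines. This is a minimal illustration in pure Python, not the repository's implementation: the class name LoRALinear, the helper lora_param_counts, the scaling factor alpha, and the rank values are all assumptions made for the example. It shows a frozen weight W combined with a trainable low-rank update B·A, and compares parameter counts for a 768-wide projection (the hidden size of a ViT-Base encoder) at an illustrative rank of 4.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Hypothetical sketch: frozen weight W plus a low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen: never updated during finetuning
        # A (r x d_in) gets a small random init; B (d_out x r) starts at zero,
        # so the adapted layer initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        # Only A and B would receive gradient updates.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_counts(d_in, d_out, r):
    """Return (parameters in the full weight, parameters in the LoRA update)."""
    return d_in * d_out, r * (d_in + d_out)


layer = LoRALinear([[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]], r=1)
print(layer.forward([1.0, 1.0, 1.0]))  # equals W @ x because B starts at zero

full, lora = lora_param_counts(768, 768, r=4)  # rank 4 is an assumed value
print(full, lora)  # 589824 6144
```

Under these assumed values the low-rank update holds roughly 1% of the parameters of the weight it adapts, which is the source of the reduced training cost described above.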
Quick Start & Requirements
Installation requires a Python 3.11 Conda environment:
conda create --name dino python=3.11
conda activate dino
pip install -e .
Advanced visualization with FeatUp additionally requires the CUDA toolkit development tools (cudatoolkit-dev) and cuDNN, and the CUDA_HOME and LD_LIBRARY_PATH environment variables must be configured.
An example finetuning command for VOC is:
python main.py --exp_name base_voc --dataset voc --size base --dino_type dinov3 --img_dim 308 308 --epochs 50 --use_fpn
Walkthroughs are available in Explanation.ipynb and Embedding_visualization.ipynb.
Highlighted Details
Maintenance & Community
Maintained by RobvanGastel, with recent updates in August/September 2025. No specific community channels or roadmap are detailed in the README.
Licensing & Compatibility
The README does not state the project's license, so the licensing terms should be confirmed before commercial use or integration into closed-source projects.
Limitations & Caveats
The FeatUp visualization setup demands complex CUDA/cuDNN configuration. The absence of a stated software license is a potential adoption blocker. Performance on corrupted datasets can fluctuate.