Vision foundation model for distilling large models
NVlabs/RADIO provides AM-RADIO, an Agglomerative Vision Foundation Model that distills multiple vision foundation models into a single, versatile backbone. It aims to outperform and replace traditional vision backbones across various domains, with strong performance in image classification, segmentation, and vision-language tasks. The framework suits researchers and developers who want one unified, high-performing vision model.
How It Works
RADIO integrates diverse vision foundation models like CLIP variants, DINOv2, and SAM through a distillation process. This agglomerative approach allows it to preserve and combine unique features from its teachers, such as text grounding and segmentation correspondence. The model architecture is based on Vision Transformers (ViTs) and supports arbitrary input resolutions, including non-square images, with an efficient variant (E-RADIO) offering significant speedups.
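To make the agglomerative idea concrete, the following is a minimal, framework-free sketch of multi-teacher feature distillation. It assumes one lightweight adaptor head per teacher on top of a shared student feature and a weighted cosine-distance loss. The names `linear_head` and `multi_teacher_loss` and the per-teacher weights are hypothetical illustrations; RADIO's actual training objective and adaptor design may differ.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def linear_head(matrix: List[Vector]) -> Callable[[Vector], Vector]:
    """Hypothetical adaptor: a linear projection from student space to a teacher's space."""
    def apply(v: Vector) -> Vector:
        return [sum(w * x for w, x in zip(row, v)) for row in matrix]
    return apply


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; 0.0 when the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def multi_teacher_loss(
    student_feature: Vector,
    adaptors: Dict[str, Callable[[Vector], Vector]],
    teacher_features: Dict[str, Vector],
    weights: Dict[str, float],
) -> float:
    """Weighted sum over teachers of the distance between each adaptor's output and that teacher's feature."""
    total = 0.0
    for name, head in adaptors.items():
        prediction = head(student_feature)
        total += weights[name] * cosine_distance(prediction, teacher_features[name])
    return total


# Toy example: one shared student feature, two teacher-specific heads.
student = [1.0, 0.0]
adaptors = {
    "clip": linear_head([[1.0, 0.0], [0.0, 1.0]]),     # identity projection
    "dino_v2": linear_head([[0.0, 1.0], [1.0, 0.0]]),  # swaps the two channels
}
teacher_features = {"clip": [2.0, 0.0], "dino_v2": [0.0, 3.0]}
weights = {"clip": 0.5, "dino_v2": 1.0}

print(multi_teacher_loss(student, adaptors, teacher_features, weights))
```

Because every teacher gets its own head, a single student representation can be pulled toward several differently shaped feature spaces at once; the per-teacher weights control how much each teacher influences the shared backbone.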
Quick Start & Requirements
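Via PyTorch Hub:
import torch
model = torch.hub.load('NVlabs/RADIO', 'radio_model', version='radio_v2.5-h', progress=True)
Via Hugging Face Transformers:
from transformers import AutoModel
model = AutoModel.from_pretrained("nvidia/RADIO", trust_remote_code=True)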
Highlighted Details
Supports teacher-specific adaptors (clip, siglip, dino_v2, sam) and allows fetching intermediate layer activations.
Maintenance & Community
The project is actively developed by NVIDIA Research. Released versions include RADIOv2.5 and C-RADIO (for commercial use), and related research papers include PHI-S and FeatSharp.
Licensing & Compatibility
The primary license is the NVIDIA Source Code License-NC, which restricts commercial use. However, the C-RADIO model is released under the NVIDIA Open Model License Agreement, which permits use in commercial products.
Limitations & Caveats
E-RADIO performs best on images whose dimensions are divisible by 32, and its efficiency depends on correctly setting the window size for its attention blocks. Older versions (e.g., RADIOv2.1) exhibited mode-switching issues at higher resolutions; these were addressed in RADIOv2.5.
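A common way to meet the divisible-by-32 constraint is to pad the image up to the next multiple before inference. The sketch below only computes the target size and the padding amounts. It assumes padding is applied on the bottom and right edges rather than resizing, and the function name `pad_to_multiple` is hypothetical, not part of RADIO.

```python
from typing import Tuple


def pad_to_multiple(height: int, width: int, multiple: int = 32) -> Tuple[int, int, int, int]:
    """Return (padded_h, padded_w, pad_bottom, pad_right) so both sides are multiples of `multiple`."""
    if height <= 0 or width <= 0 or multiple <= 0:
        raise ValueError("height, width, and multiple must be positive")
    # Ceiling division rounds each side up to the next multiple.
    padded_h = -(-height // multiple) * multiple
    padded_w = -(-width // multiple) * multiple
    return padded_h, padded_w, padded_h - height, padded_w - width


print(pad_to_multiple(500, 333))  # a non-square image needing padding on both axes
print(pad_to_multiple(512, 512))  # already aligned, so no padding is needed
```

Padding keeps the original pixels and aspect ratio intact, at the cost of a few extra border tokens; resizing to the nearest multiple is the alternative when slight distortion is acceptable.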