no-time-to-train by miquel-espinosa

Training-free instance segmentation via reference images

Created 7 months ago
281 stars

Top 92.9% on SourcePulse

Project Summary

This project addresses the high cost of data annotation for instance segmentation by introducing a training-free, reference-based approach. It enables users to segment new object instances using only a few reference images, eliminating the need for extensive fine-tuning or complex prompt engineering. The primary benefit is achieving state-of-the-art performance with significantly reduced data and computational overhead, making advanced segmentation accessible for researchers and practitioners with limited resources.

How It Works

The core methodology leverages powerful foundation models, specifically DinoV2 for semantic feature extraction and SAM2 for segmentation. The system constructs a memory bank from provided reference images, aggregates their representations, and then employs semantic-aware feature matching to identify correspondences between these references and target images. This allows for the automatic generation of instance-level segmentation masks directly, bypassing traditional training pipelines.
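The memory-bank and matching idea can be sketched as follows. This is a minimal illustration only: random arrays stand in for real DinoV2 patch features, the aggregation is simple mean pooling, and the function names and threshold are hypothetical, not taken from the repository.

```python
import numpy as np

def build_memory_bank(reference_feats):
    """Aggregate per-reference feature vectors into one class prototype.
    Here: L2-normalise each reference, mean-pool, re-normalise."""
    protos = np.stack([f / np.linalg.norm(f) for f in reference_feats])
    proto = protos.mean(axis=0)
    return proto / np.linalg.norm(proto)

def match_target(prototype, target_feats, threshold=0.1):
    """Cosine similarity between the prototype and per-patch target features.
    Returns indices of matching patches (which would then seed SAM2 prompts)
    along with the full similarity vector."""
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    sims = t @ prototype
    return np.nonzero(sims >= threshold)[0], sims

# Stand-ins for real features (dim 768, typical of ViT-B-scale backbones)
rng = np.random.default_rng(0)
refs = [rng.normal(size=768) for _ in range(3)]   # 3 reference images
proto = build_memory_bank(refs)
target = rng.normal(size=(196, 768))              # 14x14 patch grid
matches, sims = match_target(proto, target)
print(len(matches), "candidate patches for SAM2 prompting")
```

In the actual pipeline, the matched patch locations would be converted into point or box prompts for SAM2, which produces the final instance masks.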

Quick Start & Requirements

Installation involves cloning the repository, creating a conda environment from environment.yml, and installing SAM2 and DinoV2 from source. Users must download the COCO dataset and pre-trained SAM2/DinoV2 checkpoints. Key dependencies include conda, pip, git, wget, a GPU (CUDA likely required), and the specified datasets/checkpoints. Official project page and arXiv paper links are provided for further details.

Highlighted Details

  • Training-Free: Operates without any fine-tuning or prompt engineering, relying solely on reference images.
  • SOTA Performance: Achieves state-of-the-art results on benchmarks like COCO (36.8% nAP for 30-shot), PASCAL VOC Few-Shot (71.2% nAP50), and Cross-Domain FSOD, surpassing previous training-free methods.
  • Foundation Model Integration: Effectively combines DinoV2 and SAM2 for robust semantic understanding and segmentation.
  • Custom Dataset Support: Offers detailed instructions and scripts for adapting the pipeline to custom datasets, including automatic mask generation from bounding boxes using SAM2.
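The box-to-mask step for custom datasets can be illustrated with a hedged sketch. `box_to_mask` and its fallback are hypothetical, not the repository's script: a real run would prompt a loaded SAM2 predictor with the box, while the fallback here simply rasterises the box so the example is self-contained.

```python
import numpy as np

def box_to_mask(image_hw, box, sam2_predictor=None):
    """Turn a COCO-style [x, y, w, h] bounding box into a binary instance mask.
    With a SAM2 predictor supplied, the box would be used as a prompt; the
    self-contained fallback just fills the box region."""
    h, w = image_hw
    if sam2_predictor is not None:
        # Placeholder for a real SAM2 call on an already-set image, e.g. a
        # box-prompted predict(); exact usage depends on the SAM2 API version.
        raise NotImplementedError("plug in a real SAM2 predictor here")
    mask = np.zeros((h, w), dtype=bool)
    x, y, bw, bh = (int(round(v)) for v in box)
    mask[max(y, 0):min(y + bh, h), max(x, 0):min(x + bw, w)] = True
    return mask

mask = box_to_mask((480, 640), [100, 50, 120, 80])
print(mask.sum())  # area of the filled box: 9600 pixels
```

The coarse box-filled masks would be replaced by SAM2's tight instance masks in the real pipeline; the point of the sketch is only the input/output shape of the conversion.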

Maintenance & Community

The project is maintained by its authors: Miguel Espinosa, Chenhongyi Yang, Linus Ericsson, Steven McDonagh, and Elliot J. Crowley. Updates as recent as July 2025 suggest ongoing development; however, the README provides no links to community channels (e.g., Discord, Slack) or a public roadmap.

Licensing & Compatibility

No open-source license is stated in the README, making it difficult to assess suitability for commercial use or closed-source integration without contacting the authors.

Limitations & Caveats

The project is explicitly described as "research code — expect a bit of chaos!", indicating potential instability or incomplete features. Performance can be sensitive to the quality and characteristics of reference images (e.g., mask area, center location). Analysis suggests potential confusion between visually similar classes due to backbone feature geometry overlap.

Health Check

  • Last Commit: 5 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 9 stars in the last 30 days
