autodistill by autodistill

Tool for training supervised models using foundation models, no labeling needed

Created 2 years ago

2,565 stars

Top 18.1% on SourcePulse

Project Summary

Autodistill enables users to train custom computer vision models without manual data labeling by leveraging large foundation models. It targets developers and researchers seeking to rapidly deploy efficient, specialized models for edge or cloud inference, bypassing the traditional bottleneck of data annotation.

How It Works

Autodistill uses a distillation pipeline. A large, capable "Base Model" (e.g., Grounded SAM, LLaVA) labels unlabeled images according to an "Ontology", which maps the prompts given to the Base Model to the class names saved in the dataset. The resulting auto-labeled dataset trains a smaller, faster "Target Model" (e.g., YOLOv8, DETR), producing a deployable "Distilled Model". This reduces reliance on human annotators and paid labeling services.
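To make the data flow concrete, here is a toy, dependency-free Python sketch. It is not the autodistill API. Images are stand-in 2-D feature vectors, `base_model` is a hypothetical rule standing in for a foundation model such as Grounded SAM, and `NearestCentroid` is a hypothetical stand-in for a target model such as YOLOv8. Only the shape of the pipeline (ontology, then auto-labels, then a small trained model) mirrors the description above.

```python
"""Toy illustration of the Base Model -> auto-label -> Target Model flow.

Everything here is hypothetical and NOT part of the autodistill package:
images are 2-D feature vectors, the "base model" is a hand-written rule
standing in for an expensive foundation model, and the "target model" is a
nearest-centroid classifier.
"""
from math import dist

# Hypothetical ontology: prompt sent to the base model -> class name stored in the dataset.
ONTOLOGY = {"shipping container": "container", "forklift truck": "forklift"}

# Hypothetical reference points the stand-in base model associates with each prompt.
_PROMPT_ANCHORS = {"shipping container": (0.0, 0.0), "forklift truck": (10.0, 10.0)}


def base_model(image, prompts):
    """Stand-in for a slow foundation model: return the best-matching prompt."""
    return min(prompts, key=lambda p: dist(image, _PROMPT_ANCHORS[p]))


def auto_label(images, ontology):
    """Label every image, translating the chosen prompt to a class name via the ontology."""
    prompts = list(ontology)
    return [(img, ontology[base_model(img, prompts)]) for img in images]


class NearestCentroid:
    """Small, fast 'target model' trained only on the auto-labeled data."""

    def fit(self, dataset):
        sums, counts = {}, {}
        for (x, y), label in dataset:
            sx, sy = sums.get(label, (0.0, 0.0))
            sums[label] = (sx + x, sy + y)
            counts[label] = counts.get(label, 0) + 1
        # One centroid (mean feature vector) per auto-assigned class.
        self.centroids = {
            label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()
        }
        return self

    def predict(self, image):
        """Return the class whose centroid is closest to the image."""
        return min(self.centroids, key=lambda c: dist(image, self.centroids[c]))


unlabeled = [(0.5, 1.0), (1.0, -0.5), (9.0, 11.0), (10.5, 9.5)]
dataset = auto_label(unlabeled, ONTOLOGY)      # no human labels involved
distilled = NearestCentroid().fit(dataset)     # the "Distilled Model"
print(distilled.predict((8.0, 8.0)))           # forklift
```

The expensive model runs once per training image; afterwards only the cheap model is needed at inference time, which is the trade-off the pipeline is built around.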

Quick Start & Requirements

Install via pip: pip install autodistill autodistill-grounded-sam autodistill-yolov8
Requires Python 3.8+.
Example command: autodistill images --base="grounding_dino" --target="yolov8" --ontology '{"prompt": "label"}' --output="./dataset"
Colab Notebook: how-to-auto-train-yolov8-model-with-autodistill.ipynb
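The `--ontology` flag in the example command takes a JSON object mapping each prompt to a class label, and it must be shell-quoted correctly. The sketch below assembles that command from Python so the JSON is always valid and quoted. The helper `build_autodistill_cmd` is hypothetical; the flags and values are copied from the example command, and the positional `images` argument is passed through verbatim as it appears there.

```python
import json
import shlex


def build_autodistill_cmd(ontology, base="grounding_dino", target="yolov8",
                          output="./dataset", images="images"):
    """Hypothetical helper: build the argv list for the example CLI call."""
    return [
        "autodistill",
        images,                    # positional argument, copied from the example
        f"--base={base}",
        f"--target={target}",
        "--ontology",
        json.dumps(ontology),      # JSON object: prompt -> class label
        f"--output={output}",
    ]


cmd = build_autodistill_cmd({"prompt": "label"})
print(shlex.join(cmd))
# To actually run it (requires the packages installed above):
# import subprocess; subprocess.run(cmd, check=True)
```

The printed string omits the double quotes around `grounding_dino` and `yolov8` that the example uses; the shell strips those anyway, so the resulting arguments are the same.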

Highlighted Details

Supports object detection, instance segmentation, and classification tasks.
A compatibility table lists the supported Base and Target models (e.g., Grounding DINO, SAM-CLIP, YOLOv8, DETR).
A pluggable interface lets new models be added as separate plugin packages.
Optional deployment to Roboflow for edge and cloud applications.
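The pluggable design can be illustrated with a name-keyed plugin registry, in which base and target models are swapped independently. This Python sketch is a hypothetical illustration only: `LabelerPlugin`, `TrainerPlugin`, the registries, and the toy `KeywordLabeler` and `MajorityTrainer` plugins are invented here and are not autodistill's actual classes.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import Counter


class LabelerPlugin(ABC):
    """Hypothetical base-model interface: map an image to a class name."""

    def __init__(self, ontology: dict[str, str]):
        self.ontology = ontology  # prompt -> class name

    @abstractmethod
    def label(self, image: str) -> str: ...


class TrainerPlugin(ABC):
    """Hypothetical target-model interface: learn from (image, label) pairs."""

    @abstractmethod
    def train(self, dataset: list[tuple[str, str]]) -> None: ...

    @abstractmethod
    def predict(self, image: str) -> str: ...


BASE_PLUGINS: dict[str, type[LabelerPlugin]] = {}
TARGET_PLUGINS: dict[str, type[TrainerPlugin]] = {}


def register(registry, name):
    """Class decorator that adds a plugin to a registry under a string name."""
    def decorator(cls):
        registry[name] = cls
        return cls
    return decorator


@register(BASE_PLUGINS, "keyword")
class KeywordLabeler(LabelerPlugin):
    """Toy stand-in: 'detects' a prompt if it appears in the file name."""

    def label(self, image):
        for prompt, class_name in self.ontology.items():
            if prompt in image:
                return class_name
        return "background"


@register(TARGET_PLUGINS, "majority")
class MajorityTrainer(TrainerPlugin):
    """Toy stand-in: always predicts the most common auto-label."""

    def train(self, dataset):
        self.most_common = Counter(lbl for _, lbl in dataset).most_common(1)[0][0]

    def predict(self, image):
        return self.most_common


def distill(base: str, target: str, ontology: dict[str, str], images: list[str]):
    """Look up both plugins by name, auto-label the images, then train the target."""
    labeler = BASE_PLUGINS[base](ontology)
    dataset = [(img, labeler.label(img)) for img in images]
    model = TARGET_PLUGINS[target]()
    model.train(dataset)
    return model


model = distill("keyword", "majority",
                {"container": "container", "truck": "forklift"},
                ["container_01.jpg", "container_02.jpg", "truck_01.jpg"])
print(model.predict("new.jpg"))  # container
```

Because `distill` only depends on the two abstract interfaces, adding a model means writing one new class and registering it; nothing else changes. Resolving plugins by string name is one way a CLI flag such as `--base` could be mapped to a class, though this summary does not say how autodistill does it.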

Maintenance & Community

Developed by the Roboflow team.
Community resources include tutorials, guides, and a roadmap.
Open to contributions via a contributing guide.

Licensing & Compatibility

The core autodistill package is licensed under Apache 2.0.
Individual Base and Target model plugins may have their own licenses; users must check each plugin.
Generally compatible with commercial use, provided the licenses of the underlying models permit it.

Limitations & Caveats

Performance and accuracy depend heavily on the chosen Base Model and Ontology configuration.
Some model integrations are marked as "work in progress."
The project is positioned as an evolving system with ongoing development.

Health Check

Last Commit: 8 months ago
Responsiveness: 1 week
Pull Requests (30d): 0
Issues (30d): 0
Star History: 58 stars in the last 30 days

Explore Similar Projects

RSPrompter by KyanChen

PyTorch code for remote sensing instance segmentation via visual foundation models

Created 2 years ago

Updated 1 year ago

YOLOU by jizhishutong

Unified object detection study and deployment toolkit

Created 3 years ago

Updated 3 years ago

yolo_research by positive666

YOLO research and improvement project

Created 4 years ago

Updated 2 months ago

Starred by

Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and

Wing Lian (Founder of Axolotl AI).

xtreme1 by xtreme1-io

Open-source platform for multimodal training data annotation

Created 3 years ago

Updated 6 months ago

Starred by

Calvin French-Owen (Cofounder of Segment),

Luis Capelo (Cofounder of Lightning AI), and

1 more.

igel by nidhaloff

ML tool for training, testing, and using models without code

Created 5 years ago

Updated 1 month ago

awesome-yolo-object-detection by coderonion

YOLO collection for object detection tasks

Created 3 years ago

Updated 7 months ago

practical-ml-vision-book by GoogleCloudPlatform

Code for computer vision book

Created 5 years ago

Updated 1 year ago

label-studio-ml-backend by HumanSignal

SDK for wrapping ML code into a web server for Label Studio automation

Created 5 years ago

Updated 2 days ago

Starred by

Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI),

Travis Fischer (Founder of Agentic), and

8 more.

corenet by apple

DNN toolkit for training standard and novel models

Created 1 year ago

Updated 3 months ago

Starred by

Jeff Hammerbacher (Cofounder of Cloudera),

Travis Fischer (Founder of Agentic), and

4 more.

oumi by oumi-ai

Open-source platform for end-to-end foundation model lifecycle

Created 1 year ago

Updated 1 day ago

Starred by

Bob van Luijt (Cofounder of Weaviate) and

Lysandre Debut (Chief Open-Source Officer at Hugging Face).

Semantic-Segmentation-Suite by GeorgeSeif

TensorFlow suite for semantic segmentation model training/testing

Created 8 years ago

Updated 4 years ago

mmsegmentation by open-mmlab

Semantic segmentation toolbox and benchmark

Created 5 years ago

Updated 1 year ago
