maestro by roboflow

CLI/SDK for fine-tuning multimodal models

Created 2 years ago

2,652 stars

Top 17.6% on SourcePulse


6 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Georgios Konstantopoulos

CTO, General Partner at Paradigm

Omar Sanseviero

DevRel at Google DeepMind

Jeremy Howard

Cofounder of fast.ai

and 2 more!

Project Summary

Maestro is a Python library that simplifies fine-tuning of multimodal models, specifically vision-language models (VLMs) such as Florence-2, PaliGemma 2, and Qwen2.5-VL. It gives researchers and developers a unified interface for configuration, data handling, and training loop setup, and it encapsulates best practices for reproducibility and efficiency.

How It Works

Maestro uses a modular architecture in which each supported model has its own core training module. This allows per-model optimization strategies such as LoRA and QLoRA, along with graph freezing, to keep hardware requirements manageable. Data handling is standardized on a JSONL format, and a single CLI and Python API hide the setup details so that the fine-tuning workflow stays consistent across models.
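To make the JSONL convention concrete, here is a minimal sketch of writing and reading such a file with the standard library. The record keys (`image`, `prefix`, `suffix`) and the file name `annotations.jsonl` are assumptions for illustration; this summary does not specify the schema, so check the project documentation for the exact fields each model expects.

```python
import json
import tempfile
from pathlib import Path

# Assumed record layout: one JSON object per line holding an image file name,
# a prompt ("prefix"), and the expected output ("suffix"). These key names are
# illustrative assumptions, not confirmed by the summary above.
records = [
    {"image": "0001.jpg", "prefix": "caption", "suffix": "a red bicycle"},
    {"image": "0002.jpg", "prefix": "caption", "suffix": "two dogs on a beach"},
]


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    annotations = Path(tmp) / "annotations.jsonl"  # hypothetical file name
    write_jsonl(annotations, records)
    loaded = read_jsonl(annotations)
```

Because each line is an independent JSON object, a file like this can be streamed or appended to without parsing the whole dataset at once.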

Quick Start & Requirements

Install: pip install "maestro[paligemma_2]" (the bracketed extra pulls in the dependencies for that model).
Prerequisites: a Python environment, plus the dependencies for the model you plan to fine-tune.
Usage: CLI (maestro paligemma_2 train --dataset "dataset/location" ...) or Python API (from maestro.trainer.models.paligemma_2.core import train).
Resources: Colab notebooks are available for quick experimentation.
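The two entry points in the usage line can be driven from one set of settings. The sketch below is hedged: only the `dataset` option and the import path come from the usage line above. The other keys (`epochs`, `optimization_strategy`), the exact flag spellings, and the idea that `train` accepts a plain dict are assumptions, and `build_cli_command` and `run_training` are hypothetical helpers, not part of Maestro.

```python
import shlex

# Hypothetical settings. Only "dataset" appears in the usage line above;
# the other keys are assumed for illustration.
config = {
    "dataset": "dataset/location",
    "epochs": 10,
    "optimization_strategy": "qlora",
}


def build_cli_command(model, settings):
    """Render settings as a `maestro <model> train` command line.

    Flag spellings are an assumption: each key is passed through as --<key>.
    """
    args = ["maestro", model, "train"]
    for key, value in settings.items():
        args += [f"--{key}", str(value)]
    return shlex.join(args)


def run_training(settings):
    """Call the Python entry point named in the usage line.

    Passing a plain dict is an assumption about the function's signature.
    The import is deferred so this sketch loads without maestro installed.
    """
    from maestro.trainer.models.paligemma_2.core import train

    return train(settings)


command = build_cli_command("paligemma_2", config)
print(command)
```

Keeping the settings in one dict means the same run can be reproduced from a shell or a notebook. Call `run_training(config)` only in an environment where the model-specific extra is installed.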

Highlighted Details

Supports fine-tuning of Florence-2, PaliGemma 2, and Qwen2.5-VL.
Integrates LoRA, QLoRA, and graph freezing for efficient training.
Provides a unified CLI and Python API for simplified workflow.
Uses a consistent JSONL format for data handling.
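As background on why LoRA reduces hardware requirements, the sketch below counts trainable parameters for one weight matrix under full fine-tuning versus a low-rank adapter. It illustrates the general LoRA technique, not Maestro's implementation; the layer size and rank are arbitrary example values.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in weight is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for a LoRA adapter on the same weight.

    The update is factored as B @ A, where B is d_out x rank and A is
    rank x d_in, so only those two small matrices are trained.
    """
    return rank * (d_out + d_in)


# Example values, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)        # 16777216
lora = lora_params(d_out, d_in, rank)  # 65536
ratio = lora / full                    # 0.00390625, i.e. 1/256

print(f"full: {full}, lora: {lora}, ratio: {ratio}")
```

QLoRA applies the same adapter idea on top of a quantized base model, which lowers memory use further.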

Maintenance & Community

The project is actively maintained by Roboflow. Community discussions and contributions are welcomed via GitHub Discussions. A Discord server is available for support and conversation.

Licensing & Compatibility

The specific license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification of the license terms.

Limitations & Caveats

Some fine-tuning recipes are marked as experimental (e.g., Florence-2 object detection, Qwen2.5-VL object detection). The README recommends creating dedicated Python environments for each model due to potential dependency conflicts.

Health Check

Last Commit

2 weeks ago

Responsiveness

1 day

Star History

7 stars in the last 30 days