CLI/SDK for fine-tuning multimodal models
Maestro is a Python library designed to simplify the fine-tuning of multimodal models, specifically targeting vision-language models (VLMs) like Florence-2, PaliGemma 2, and Qwen2.5-VL. It aims to accelerate the fine-tuning process for researchers and developers by providing a unified interface for configuration, data handling, and training loop setup, encapsulating best practices for reproducibility and efficiency.
How It Works
Maestro uses a modular architecture in which each supported model has its own core training module. This lets each model use tailored optimization strategies, such as LoRA and QLoRA, along with features like graph freezing to fit within available hardware. The library standardizes data handling on a JSONL format and exposes a single CLI and Python API that abstract away complex setup, giving a streamlined and consistent fine-tuning workflow.
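As a rough illustration of the JSONL convention (one JSON object per line), the sketch below writes and then validates a small annotations file using only the standard library. The record keys (`image`, `prefix`, `suffix`) and the `train/annotations.jsonl` layout are assumptions for illustration. This summary does not confirm them, so check the Maestro README for the exact schema each model expects.

```python
import json
from pathlib import Path

# Assumed record schema (hypothetical for this sketch): each line pairs an
# image file with a text prompt ("prefix") and the target answer ("suffix").
REQUIRED_KEYS = {"image", "prefix", "suffix"}


def write_jsonl(records, path):
    """Write one JSON object per line (the JSON Lines convention)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Parse a JSONL file, skipping blank lines and checking the assumed keys."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            missing = REQUIRED_KEYS - set(record)
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            records.append(record)
    return records


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical split layout: <dataset>/train/annotations.jsonl
        target = Path(tmp) / "train" / "annotations.jsonl"
        write_jsonl(
            [{"image": "0001.jpg", "prefix": "caption en", "suffix": "A red car."}],
            target,
        )
        print(read_jsonl(target))
```

Validating every line up front surfaces a malformed record with its line number before a long training run starts, rather than partway through an epoch.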
Quick Start & Requirements
Install with model-specific dependencies: pip install "maestro[paligemma_2]".
Train from the CLI (maestro paligemma_2 train --dataset "dataset/location" ...) or from the Python API (from maestro.trainer.models.paligemma_2.core import train).
Highlighted Details
Maintenance & Community
The project is actively maintained by Roboflow. Community discussions and contributions are welcomed via GitHub Discussions. A Discord server is available for support and conversation.
Licensing & Compatibility
The specific license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification of the license terms.
Limitations & Caveats
Some fine-tuning recipes are marked as experimental (e.g., Florence-2 object detection, Qwen2.5-VL object detection). The README recommends creating dedicated Python environments for each model due to potential dependency conflicts.
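One way to follow the per-model environment advice is sketched below with the standard-library `venv` module. The extras names `florence_2` and `qwen_2_5_vl` and the `.venv-<extra>` directory naming are assumptions (only `maestro[paligemma_2]` appears in this summary). The function creates the environments and returns the pip commands rather than running them.

```python
import sys
import venv
from pathlib import Path

# Hypothetical extras names: only "paligemma_2" is confirmed by the summary;
# "florence_2" and "qwen_2_5_vl" are assumed by analogy.
MODEL_EXTRAS = ("florence_2", "paligemma_2", "qwen_2_5_vl")


def env_python(env_dir):
    """Path to the interpreter inside a venv (Scripts\\ on Windows, bin/ elsewhere)."""
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def create_model_envs(root, extras=MODEL_EXTRAS, with_pip=True):
    """Create one isolated venv per model and return its pip install command."""
    commands = {}
    for extra in extras:
        env_dir = Path(root) / f".venv-{extra}"
        venv.create(env_dir, with_pip=with_pip)
        # Installing each model's extra into its own env keeps one model's
        # pinned dependencies from clashing with another's.
        commands[extra] = [
            str(env_python(env_dir)), "-m", "pip", "install", f"maestro[{extra}]",
        ]
    return commands
```

Each returned command can then be executed with `subprocess.run(cmd, check=True)`. Keeping the environments separate means upgrading one model's dependencies cannot break another's.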