Finetuning script for Qwen2-VL and Qwen2.5-VL models
Top 34.7% on SourcePulse
This repository provides an open-source implementation for fine-tuning Alibaba Cloud's Qwen2-VL and Qwen2.5-VL multimodal large language models. It targets researchers and developers working with vision-language models, offering a streamlined process for adapting these powerful models to custom datasets and tasks, including support for multi-image and video inputs.
How It Works
The fine-tuning process leverages Hugging Face Transformers and the Liger-Kernel for memory-efficient training. It supports various fine-tuning strategies including supervised fine-tuning (SFT), full fine-tuning, and parameter-efficient methods like LoRA and DoRA. The implementation is designed to handle diverse data formats, including LLaVA-spec JSON files, and offers flexibility in configuring training parameters, learning rates for different model components (vision tower, projector, language model), and quantization.
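As a concrete illustration, a minimal LLaVA-spec training sample might look like the sketch below. The field names (`id`, `image`, `conversations`, the `<image>` placeholder) follow the common LLaVA JSON convention; the exact schema this repository expects may differ, so treat this as an assumed example rather than the repo's definitive format.

```python
import json

# Hypothetical minimal LLaVA-style sample: one image plus a conversation.
# The "<image>" placeholder marks where the image is injected into the prompt.
sample = {
    "id": "example-0001",
    "image": "images/cat.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat animal is shown?"},
        {"from": "gpt", "value": "The image shows a cat."},
    ],
}

# Training data in this convention is a JSON file containing a list of such samples.
print(json.dumps([sample], indent=2))
```

Multi-image or video samples typically extend this by supplying a list of image paths or a video path, with a matching number of placeholders in the prompt.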
Quick Start & Requirements
Using Docker:

```shell
docker pull john119/vlm
docker run --gpus all -it -v /host/path:/docker/path --name vlm --ipc=host john119/vlm /bin/bash
```

Installing with pip:

```shell
pip install -r requirements.txt --index-url https://download.pytorch.org/whl/cu126
pip install qwen-vl-utils flash-attn --no-build-isolation
```

Installing with conda:

```shell
conda env create -f environment.yaml
conda activate train
pip install qwen-vl-utils flash-attn --no-build-isolation
```

Dependencies are installed against the CUDA 12.6 (cu126) PyTorch wheel index.

Highlighted Details
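After installation, a quick Python check can confirm that the key packages resolve in the active environment. This is a minimal sketch; the module names are assumed from the pip commands above and are not part of the repository itself.

```python
import importlib.util

def installed(name: str) -> bool:
    """Return True if the named module can be found in the current environment."""
    return importlib.util.find_spec(name) is not None

# Package import names assumed from the install commands above.
for pkg in ("torch", "transformers", "qwen_vl_utils", "flash_attn"):
    print(f"{pkg}: {'ok' if installed(pkg) else 'missing'}")
```

Note that the pip package `qwen-vl-utils` imports as `qwen_vl_utils`, and `flash-attn` as `flash_attn`.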
Maintenance & Community
The project is actively maintained, with recent additions including DPO support and Qwen2.5-VL compatibility. It builds on the LLaVA-NeXT, Mipha, and Qwen2-VL-7B-Instruct projects.
Licensing & Compatibility
Licensed under the Apache-2.0 License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
Liger-Kernel is not compatible with QLoRA. A known issue with `libcudnn_cnn_train.so.8` may require unsetting `LD_LIBRARY_PATH`. GRPO support is listed as a future TODO.