finetune-Qwen2-VL by zhangfaen

Fine-tuning script for Qwen2-VL models

created 11 months ago
366 stars

Top 78.1% on sourcepulse

Project Summary

This repository provides a streamlined Python-based solution for fine-tuning the Qwen2-VL multimodal large language model, targeting researchers and developers who wish to adapt the model to custom datasets. It offers a simpler alternative to heavier frameworks like LLaMA-Factory, enabling quick experimentation and deployment of fine-tuned Qwen2-VL models.

How It Works

The project leverages Hugging Face's transformers library and accelerate for distributed training. It supports mixed-precision training (bfloat16 compute with float32 accumulation), which the author reports improves validation loss, and uses flash_attention_2 for faster training. The code is kept deliberately simple so users can plug in their own data and modify the training loop.
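A minimal sketch of that setup, assuming the Qwen/Qwen2-VL-2B-Instruct checkpoint and a placeholder DataLoader (both illustrative, not the repo's exact code):

```python
import torch
from accelerate import Accelerator
from transformers import Qwen2VLForConditionalGeneration

# bf16 mixed precision via Accelerate; flash attention needs the
# flash-attn package and a recent CUDA GPU.
accelerator = Accelerator(mixed_precision="bf16")

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",              # assumed checkpoint size
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

train_loader = ...  # placeholder: your DataLoader of processor-prepared batches

model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)

model.train()
for batch in train_loader:
    loss = model(**batch).loss
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```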

Quick Start & Requirements

  • Install: Clone the repository and install dependencies with pip install -r requirements.txt.
  • Prerequisites: Python 3.10, a CUDA-capable GPU (multiple GPUs for distributed training), and the av package for video decoding.
  • Usage: Run ./finetune.sh for single-GPU fine-tuning or ./finetune_distributed.sh for multi-GPU. Test with test_on_official_model.py and test_on_trained_model_by_us.py; a smoke-test sketch follows this list.
  • Resources: Video data processing requires significant GPU RAM; the batch size may need to be reduced.
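For a quick smoke test along the lines of test_on_official_model.py, a hedged sketch using the standard transformers inference API (checkpoint name and image path are illustrative):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True)
inputs = processor(text=[prompt], images=[Image.open("demo.jpg")],
                   return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```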

Highlighted Details

  • Supports fine-tuning Qwen2.5-VL-3B (as of 2025/02/08).
  • Uses torchvision.io.VideoReader for faster video data loading (sketched below).
  • Includes example fine-tuning scripts and toy data for a quick start.
  • Demonstrates multi-GPU fine-tuning with Hugging Face Accelerate and its DeepSpeed plugin.
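The VideoReader path, in a minimal sketch (requires torchvision built with video support, which is where the av dependency comes in; the file name and sampling stride are illustrative):

```python
import torch
from torchvision.io import VideoReader

# Decode frames sequentially; each item is a dict holding a CHW uint8
# tensor under "data" and a timestamp under "pts".
reader = VideoReader("clip.mp4", "video")
frames = [f["data"] for i, f in enumerate(reader) if i % 10 == 0]  # every 10th frame
video = torch.stack(frames)  # (T, C, H, W), ready for the processor
```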

Maintenance & Community

The project is maintained by zhangfaen. Updates indicate ongoing development, including support for newer Qwen2.5 models and performance optimizations.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. The project builds on Hugging Face models, which typically carry permissive licenses suitable for commercial use, but this should be verified before adoption.

Limitations & Caveats

The provided toy data is minimal, and the training code does not include an evaluation step by default; comprehensive assessment requires implementing one manually. The README notes that video data can be memory-intensive, which may require reducing the batch size or adjusting the configuration (one common mitigation is sketched below).
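One common mitigation, sketched with assumed numbers: shrink the per-device batch and recover the effective batch size through gradient accumulation in Accelerate. This assumes model, optimizer, and the dataloader were prepared via accelerator.prepare, as in the training sketch above:

```python
from accelerate import Accelerator

# Assumed numbers: per-device batch of 1, accumulated over 8 steps for
# an effective per-device batch of 8.
accelerator = Accelerator(mixed_precision="bf16", gradient_accumulation_steps=8)

def training_step(model, optimizer, batch):
    # With a prepared optimizer, Accelerate only syncs gradients and
    # applies optimizer.step() every 8th call.
    with accelerator.accumulate(model):
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```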

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 13 stars in the last 90 days
