zjysteven/lmms-finetune: Minimal codebase for finetuning large multimodal models
Top 77.8% on SourcePulse
This repository provides a minimal, unified codebase for fine-tuning a wide array of large multimodal models (LMMs), including image-only, interleaved, and video models. It targets researchers and practitioners seeking a straightforward and flexible framework for experimenting with and adapting LMMs, leveraging Hugging Face's official implementations for seamless integration and inference.
How It Works
The framework abstracts core fine-tuning components such as model loading and data collation, enabling easy integration of new LMMs. It builds on Hugging Face's transformers library, so fine-tuned models retain compatibility with standard Hugging Face inference pipelines. The codebase prioritizes simplicity and transparency, making it easier to understand, modify, and quickly iterate on fine-tuning strategies. Supported strategies include full fine-tuning, LoRA, and Q-LoRA for the LLM component, and full fine-tuning or LoRA for the vision encoder.
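Because fine-tuned checkpoints remain ordinary Hugging Face models, they can be loaded for inference with the usual transformers APIs. A minimal sketch, assuming a LLaVA-style model was fine-tuned (the checkpoint path, image file, and prompt format are illustrative, not outputs of this repo):

```python
# Inference sketch for a fine-tuned LLaVA-style checkpoint.
# The path "./checkpoints/llava-1.5-7b-finetuned" is hypothetical; the
# processor/model classes depend on which LMM family was fine-tuned.
from transformers import AutoProcessor, LlavaForConditionalGeneration
from PIL import Image

model_path = "./checkpoints/llava-1.5-7b-finetuned"  # hypothetical output dir
processor = AutoProcessor.from_pretrained(model_path)
model = LlavaForConditionalGeneration.from_pretrained(model_path, device_map="auto")

image = Image.open("example.jpg")  # hypothetical input image
prompt = "USER: <image>\nDescribe this image. ASSISTANT:"  # LLaVA-1.5 style
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```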
Quick Start & Requirements
Create and activate a conda environment (conda create -n lmms-finetune python=3.10 -y; conda activate lmms-finetune), then install the requirements (python -m pip install -r requirements.txt). Optionally, install Flash Attention (python -m pip install --no-cache-dir --no-build-isolation flash-attn).
Highlighted Details
- A web UI (python webui.py).
- A script (merge_lora_weights.py) for merging LoRA weights into a standalone model; a sketch of the underlying operation follows this list.
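The repo's merge_lora_weights.py is the supported path for merging; the sketch below only illustrates what that operation typically involves when done directly with the peft library (model ID and paths are hypothetical, not the script's actual interface):

```python
# Illustrative LoRA-merge sketch using the peft library directly;
# the repository's merge_lora_weights.py is the supported tool.
# All paths below are hypothetical placeholders.
from peft import PeftModel
from transformers import LlavaForConditionalGeneration

base = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")
lora = PeftModel.from_pretrained(base, "./checkpoints/lora-adapter")
merged = lora.merge_and_unload()  # fold LoRA deltas into the base weights
merged.save_pretrained("./checkpoints/merged-model")  # standalone HF model
```

Once merged, the result loads like any other transformers checkpoint, with no peft dependency at inference time.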
Maintenance & Community
The project builds directly on Hugging Face transformers. The most recent commit was about a month ago, and activity is currently flagged as inactive.
Licensing & Compatibility
Limitations & Caveats
Known issues may occur when per_device_batch_size is 1 or when text-only instances dominate the dataset. In some cases, installing transformers from GitHub is required.