LoRA training/inference scripts for video diffusion models
This repository provides scripts for training and inference of LoRA models for video generation with the HunyuanVideo, Wan2.1, and FramePack architectures. It is an unofficial project aimed at researchers and power users who want to fine-tune these advanced video diffusion models with custom LoRAs.
How It Works
Musubi Tuner leverages Low-Rank Adaptation (LoRA) for efficient fine-tuning of large video diffusion models. It supports multiple architectures including HunyuanVideo, Wan2.1, and FramePack, offering flexibility in model choice. The project emphasizes pre-caching of latents and text encoder outputs to optimize training performance and reduce VRAM usage. It also incorporates features like PyTorch Dynamo optimization and various attention mechanisms (SDPA, FlashAttention, xformers, SageAttention) for speed and memory efficiency.
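The typical workflow caches latents and text encoder outputs before training begins. The commands below are a minimal sketch of the HunyuanVideo path following the script names documented in the repository; the dataset.toml file, model paths, and the specific arguments shown are illustrative placeholders, so check the repository documentation for the authoritative options.
# 1) Pre-cache VAE latents for the dataset described in dataset.toml
python cache_latents.py --dataset_config dataset.toml --vae path/to/vae
# 2) Pre-cache text encoder outputs (LLM and CLIP encoders)
python cache_text_encoder_outputs.py --dataset_config dataset.toml --text_encoder1 path/to/text_encoder1 --text_encoder2 path/to/text_encoder2
# 3) Train the LoRA with SDPA attention and bf16 mixed precision
accelerate launch hv_train_network.py --dit path/to/dit --dataset_config dataset.toml --sdpa --mixed_precision bf16 --network_module networks.lora --network_dim 32 --output_dir output --output_name my-video-lora
# 4) Generate a sample video with the trained LoRA applied
python hv_generate_video.py --dit path/to/dit --vae path/to/vae --text_encoder1 path/to/text_encoder1 --text_encoder2 path/to/text_encoder2 --attn_mode sdpa --prompt "a cat walking on grass" --video_size 544 960 --video_length 25 --infer_steps 30 --lora_weight output/my-video-lora.safetensors --save_path output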
Quick Start & Requirements
pip install -r requirements.txt
Alternatively, use the experimental uv-based installation (cu124 PyTorch builds).
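A full setup might look like the following sketch; the repository URL and the cu124 PyTorch wheel index are assumptions based on the standard install flow, so verify them against the README.
# clone the repository and install dependencies (versions are illustrative)
git clone https://github.com/kohya-ss/musubi-tuner.git
cd musubi-tuner
# install a CUDA 12.4 PyTorch build first; match the index to your CUDA version
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt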
Recommended VRAM: 12GB for image training, 24GB+ for video training. Recommended RAM: 64GB+.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
This project is unofficial, experimental, and under active development, meaning features and APIs may change without notice. Video training features are still under development, and some functionalities may not work as expected. Production use is not recommended.