musubi-tuner by kohya-ss

LoRA training/inference scripts for video diffusion models

Created 8 months ago
1,134 stars

Top 33.9% on SourcePulse

Project Summary

This repository provides scripts for training and inference of LoRA models for video generation using HunyuanVideo, Wan2.1, and FramePack architectures. It is an unofficial project aimed at researchers and power users interested in fine-tuning video diffusion models. The primary benefit is enabling custom LoRA training for these advanced video generation models.

How It Works

Musubi Tuner leverages Low-Rank Adaptation (LoRA) for efficient fine-tuning of large video diffusion models. It supports multiple architectures including HunyuanVideo, Wan2.1, and FramePack, offering flexibility in model choice. The project emphasizes pre-caching of latents and text encoder outputs to optimize training performance and reduce VRAM usage. It also incorporates features like PyTorch Dynamo optimization and various attention mechanisms (SDPA, FlashAttention, xformers, SageAttention) for speed and memory efficiency.
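
To ground the LoRA idea, here is a minimal PyTorch sketch of a low-rank adapter wrapped around a frozen linear layer. This is illustrative only; musubi-tuner's actual module names, injection points, and training loop differ.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable low-rank update:
    y = base(x) + (alpha / r) * up(down(x))."""

    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the LoRA factors are trained
        self.down = nn.Linear(base.in_features, r, bias=False)
        self.up = nn.Linear(r, base.out_features, bias=False)
        nn.init.kaiming_uniform_(self.down.weight)
        nn.init.zeros_(self.up.weight)  # update starts at zero, so output is initially unchanged
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))
```

Because only the two small factor matrices receive gradients, optimizer state and gradient memory scale with the rank r rather than with the full weight matrix, which is what makes fine-tuning large video diffusion models tractable on consumer GPUs.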

Quick Start & Requirements

  • Installation: pip install -r requirements.txt (an experimental uv-based install is also available).
  • Prerequisites: Python 3.10+, PyTorch 2.5.1+ with matching CUDA version (e.g., cu124). Recommended VRAM: 12GB for image training, 24GB+ for video training. Recommended RAM: 64GB+.
  • Setup: Requires downloading base models (HunyuanVideo, Wan2.1, FramePack) and configuring dataset paths via TOML files. Latent and text encoder output pre-caching are mandatory steps before training (see the sketch after this list).
  • Docs: Dataset Configuration Guide, Training Guide, Inference Guide.
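
As a rough illustration of the pre-caching step, the sketch below encodes each batch once and writes latents to disk so the VAE (and, analogously, the text encoders) never occupy VRAM during training. The `vae` object and the diffusers-style `encode(...).latent_dist` API are assumptions for illustration, not musubi-tuner's actual interface.

```python
from pathlib import Path

import torch

@torch.no_grad()
def cache_latents(vae, dataloader, out_dir: str) -> None:
    """One-time pass: encode pixels to latents and save them to disk.

    Training then reads the cached latents directly, so the VAE can be
    unloaded entirely, cutting per-step compute and VRAM.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    vae.eval().cuda()
    for i, batch in enumerate(dataloader):
        # Hypothetical diffusers-style VAE call; adapt to the real model.
        latents = vae.encode(batch["pixels"].cuda()).latent_dist.sample()
        torch.save(latents.cpu(), out / f"latents_{i:06d}.pt")
```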

Highlighted Details

  • Supports training and inference for HunyuanVideo, Wan2.1, and FramePack.
  • Offers experimental support for PyTorch Dynamo optimization for faster training.
  • Includes options for memory-saving techniques like FP8, blocks_to_swap, and various attention mechanisms.
  • Provides scripts for LoRA merging and conversion to ComfyUI-compatible formats (merging sketched below).
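
Merging a trained LoRA back into base weights (e.g., before converting for ComfyUI) is conceptually a single rank-r update per layer. Below is a minimal sketch under the standard LoRA formulation; the project's actual merge scripts additionally handle per-layer key mapping, dtypes, and naming conventions.

```python
import torch

def merge_lora_weight(
    w: torch.Tensor,     # base weight, shape (out_features, in_features)
    down: torch.Tensor,  # LoRA down projection, shape (r, in_features)
    up: torch.Tensor,    # LoRA up projection, shape (out_features, r)
    alpha: float,
    r: int,
) -> torch.Tensor:
    """Fold the low-rank update into the base weight:
    W' = W + (alpha / r) * up @ down."""
    return w + (alpha / r) * (up @ down)
```

After merging, the layer behaves identically to the adapted model but carries no extra parameters at inference time.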

Maintenance & Community

  • Active development with recent updates in April 2025 adding FramePack support and batch inference modes.
  • GitHub Discussions are enabled for community Q&A.
  • Project is supported via GitHub Sponsors.

Licensing & Compatibility

  • Primarily licensed under Apache License 2.0.
  • Code modified from HunyuanVideo remains under that project's license; code modified from Wan2.1 and FramePack remains under Apache 2.0.
  • Compatible with commercial use under Apache 2.0 terms.

Limitations & Caveats

This project is unofficial and experimental, and it is under active development: features and APIs may change without notice. Video training support in particular is still evolving, and some functionality may not work as expected. Production use is not recommended.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 34
  • Issues (30d): 54
  • Star History: 168 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 5 more.

ai-toolkit by ostris (6k stars, top 0.9%)
Training toolkit for finetuning diffusion models
Created 2 years ago, updated 14 hours ago
Starred by Alex Yu (Research Scientist at OpenAI; Former Cofounder of Luma AI), Lianmin Zheng (Coauthor of SGLang, vLLM), and 2 more.

HunyuanVideo by Tencent-Hunyuan (11k stars, top 0.2%)
PyTorch code for video generation research
Created 9 months ago, updated 3 weeks ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Zack Li (Cofounder of Nexa AI), and 19 more.

LLaVA by haotian-liu (24k stars, top 0.2%)
Multimodal assistant with GPT-4 level capabilities
Created 2 years ago, updated 1 year ago