musubi-tuner by kohya-ss

LoRA training/inference scripts for video diffusion models

created 7 months ago
821 stars

Top 44.1% on sourcepulse

Project Summary

This repository provides scripts for training and inference of LoRA models for video generation using HunyuanVideo, Wan2.1, and FramePack architectures. It is an unofficial project aimed at researchers and power users interested in fine-tuning video diffusion models. The primary benefit is enabling custom LoRA training for these advanced video generation models.

How It Works

Musubi Tuner leverages Low-Rank Adaptation (LoRA) for efficient fine-tuning of large video diffusion models. It supports multiple architectures including HunyuanVideo, Wan2.1, and FramePack, offering flexibility in model choice. The project emphasizes pre-caching of latents and text encoder outputs to optimize training performance and reduce VRAM usage. It also incorporates features like PyTorch Dynamo optimization and various attention mechanisms (SDPA, FlashAttention, xformers, SageAttention) for speed and memory efficiency.
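The low-rank idea behind this fine-tuning approach can be sketched in a few lines of NumPy: instead of updating a large frozen weight matrix, training learns two small matrices whose product forms the adaptation. The names and dimensions below are illustrative, not the project's actual code.

```python
import numpy as np

# Frozen base weight of a hypothetical linear layer.
d_out, d_in, rank, alpha = 64, 128, 4, 8
W = np.random.randn(d_out, d_in)

# LoRA trains only these two small matrices.
A = np.random.randn(rank, d_in) * 0.01   # down-projection
B = np.zeros((d_out, rank))              # up-projection, zero-initialized

def lora_forward(x):
    # Base output plus the scaled low-rank correction.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = np.random.randn(d_in)
# With B zero-initialized, the LoRA path contributes nothing at the start,
# so outputs initially match the frozen base layer exactly.
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters: rank * (d_in + d_out) instead of d_in * d_out.
print(rank * (d_in + d_out), "vs", d_in * d_out)
```

This is why LoRA keeps VRAM low: only the small `A` and `B` matrices (here 768 parameters instead of 8192) receive gradients and optimizer state.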

Quick Start & Requirements

  • Installation: pip install -r requirements.txt or experimental uv installation.
  • Prerequisites: Python 3.10+, PyTorch 2.5.1+ with matching CUDA version (e.g., cu124). Recommended VRAM: 12GB for image training, 24GB+ for video training. Recommended RAM: 64GB+.
  • Setup: Requires downloading base models (HunyuanVideo, Wan2.1, FramePack) and configuring dataset paths via TOML files. Latent and text encoder output pre-caching are mandatory steps before training.
  • Docs: Dataset Configuration Guide, Training Guide, Inference Guide.
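A dataset TOML of the kind referenced above might look like the following sketch; the exact keys are defined in the project's Dataset Configuration Guide, so treat the names and values here as illustrative assumptions rather than a canonical config.

```toml
# Illustrative dataset config; verify key names against the
# Dataset Configuration Guide before use.
[general]
resolution = [960, 544]
caption_extension = ".txt"
batch_size = 1

# An image dataset entry.
[[datasets]]
image_directory = "/path/to/images"
cache_directory = "/path/to/cache/images"

# A video dataset entry.
[[datasets]]
video_directory = "/path/to/videos"
cache_directory = "/path/to/cache/videos"
```

The `cache_directory` paths matter because latent and text encoder pre-caching write their outputs there before training begins.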

Highlighted Details

  • Supports training and inference for HunyuanVideo, Wan2.1, and FramePack.
  • Offers experimental support for PyTorch Dynamo optimization for faster training.
  • Includes options for memory-saving techniques like FP8, blocks_to_swap, and various attention mechanisms.
  • Provides scripts for LoRA merging and conversion to ComfyUI compatible formats.
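In outline, a typical run chains the mandatory caching steps before training. The commands below are a hedged sketch: the script names and flags are assumptions based on the repository's described workflow and should be checked against the Training Guide before use.

```shell
# 1. Pre-cache VAE latents for the dataset (mandatory).
python cache_latents.py --dataset_config dataset.toml \
  --vae /path/to/vae.safetensors

# 2. Pre-cache text encoder outputs (mandatory).
python cache_text_encoder_outputs.py --dataset_config dataset.toml \
  --text_encoder1 /path/to/text_encoder1 \
  --text_encoder2 /path/to/text_encoder2

# 3. Train a LoRA (HunyuanVideo-style example; Wan2.1 and FramePack
#    use analogous scripts per the docs).
accelerate launch hv_train_network.py \
  --dit /path/to/dit.safetensors \
  --dataset_config dataset.toml \
  --network_module networks.lora --network_dim 32 \
  --mixed_precision bf16 --sdpa
```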

Maintenance & Community

  • Active development with recent updates in April 2025 adding FramePack support and batch inference modes.
  • GitHub Discussions are enabled for community Q&A.
  • Project is supported via GitHub Sponsors.

Licensing & Compatibility

  • Primarily licensed under Apache License 2.0.
  • Code modified from HunyuanVideo follows its license. Code modified from Wan2.1 and FramePack also follows Apache 2.0.
  • Compatible with commercial use under Apache 2.0 terms.

Limitations & Caveats

This project is unofficial, experimental, and under active development, meaning features and APIs may change without notice. Video training features are still under development, and some functionalities may not work as expected. Production use is not recommended.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 8
  • Issues (30d): 22
  • Star history: 223 stars in the last 90 days

Starred by Ying Sheng (author of SGLang), Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), and 1 more.
