Finetuning script for text-to-video models
This repository provides tools for finetuning ModelScope's Text-to-Video model using the Diffusers library. It targets researchers and developers who want to customize video generation models, offering LoRA training and conversion of trained models for use in web UIs.
How It Works
The project leverages the Diffusers library for finetuning video diffusion models. It supports training from scratch or finetuning existing models like ModelScope's Text-to-Video and community-provided checkpoints such as ZeroScope. The architecture allows for LoRA (Low-Rank Adaptation) training, enabling efficient finetuning with reduced computational resources, and includes options for gradient checkpointing and memory-efficient attention mechanisms (Xformers, Torch 2.0 SDP) to manage VRAM usage.
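As a rough illustration, these memory options can be toggled on the UNet through standard Diffusers calls. The following is a minimal sketch, not the repository's actual training code; module paths vary across Diffusers versions, and the local model path assumes the Quick Start layout below.

from diffusers import UNet3DConditionModel
from diffusers.models.attention_processor import AttnProcessor2_0

# Load the ModelScope UNet from the local Diffusers checkout
# (path assumed from the Quick Start step below).
unet = UNet3DConditionModel.from_pretrained(
    "./models/model_scope_diffusers", subfolder="unet"
)

# Trade compute for VRAM: recompute activations during the backward pass.
unet.enable_gradient_checkpointing()

# Option A: Xformers memory-efficient attention (requires the xformers package).
# unet.enable_xformers_memory_efficient_attention()

# Option B: PyTorch 2.0 scaled dot-product attention.
unet.set_attn_processor(AttnProcessor2_0())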
Quick Start & Requirements
git clone https://github.com/ExponentialML/Text-To-Video-Finetuning.git
cd Text-To-Video-Finetuning
git lfs install
git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b ./models/model_scope_diffusers/
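With the weights downloaded, a quick sanity check is to run stock Diffusers inference against the local copy before finetuning. A minimal sketch follows; the prompt and output file name are placeholders, and the repository also ships its own inference.py entry point.

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load the locally cloned ModelScope weights in half precision.
pipe = DiffusionPipeline.from_pretrained(
    "./models/model_scope_diffusers", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # keeps peak VRAM low; requires accelerate

# Depending on the Diffusers version, .frames is a list of arrays or a
# batched array; adjust indexing accordingly.
video_frames = pipe("a panda playing guitar", num_frames=16).frames
export_to_video(video_frames, "sample.mp4")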
Highlighted Details
Trained models can be converted to the .ckpt format for use in web UIs.
Maintenance & Community
The repository is archived and will no longer be updated; the author recommends the damo-vilab/i2vgen-xl repository for ongoing development. Issues and PRs are kept for posterity.
Licensing & Compatibility
The repository itself does not explicitly state a license in the README. The underlying models it references may have their own licenses. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
This repository is archived and no longer maintained; the author directs users to an alternative repository for current development. LoRA files trained with stable_lora are not compatible with the repository's inference.py script, and merging LoRA weights is not supported.