Text-To-Video-Finetuning by ExponentialML

Finetuning script for text-to-video models

Created 2 years ago

692 stars

Top 49.3% on SourcePulse

3 Experts Love This Project

Chenlin Meng, Cofounder of Pika
Omar Sanseviero, DevRel at Google DeepMind
Ajay Jain, Cofounder of Genmo

Project Summary

This repository provides tools for finetuning ModelScope's Text-to-Video model using the Diffusers library. It targets researchers and developers who want to customize video generation models, and it supports LoRA training and conversion of trained models for use in web UIs.

How It Works

The project uses the Diffusers library to finetune video diffusion models. It supports training from scratch or finetuning existing models such as ModelScope's Text-to-Video and community checkpoints such as ZeroScope. LoRA (Low-Rank Adaptation) training enables finetuning with fewer trainable parameters and lower compute cost. Gradient checkpointing and memory-efficient attention (Xformers or PyTorch 2.0 scaled dot-product attention, SDP) help manage VRAM usage.
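
To make the LoRA idea concrete, here is a minimal sketch of the generic low-rank update in plain Python. It is not taken from this repository's training code: the function names are hypothetical, and the alpha / rank scaling is the common LoRA convention, assumed here for illustration. The frozen weight W stays untouched while two small matrices B (d_out x r) and A (r x d_in) are trained.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_out * d_in            # every entry of W is trainable
    lora = rank * (d_in + d_out)   # only B (d_out x r) and A (r x d_in) are trainable
    return full, lora


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(W, A, B, alpha):
    """Compute W + (alpha / rank) * (B @ A), the weight the layer effectively uses."""
    rank = len(A)                  # A has one row per rank dimension
    scale = alpha / rank
    delta = matmul(B, A)           # low-rank update with the same shape as W
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


if __name__ == "__main__":
    full, lora = lora_param_counts(d_out=1024, d_in=1024, rank=16)
    print(f"full finetune: {full} params, LoRA: {lora} params")
```

For a 1024 x 1024 layer at rank 16, the adapter trains 32,768 parameters instead of 1,048,576, which is where the reduced resource cost comes from.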

Quick Start & Requirements

Install: git clone https://github.com/ExponentialML/Text-To-Video-Finetuning.git && cd Text-To-Video-Finetuning && git lfs install && git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b ./models/model_scope_diffusers/
Environment: Python 3.10, PyTorch >= 2.0 recommended.
Hardware: RTX 3090 recommended; 16GB VRAM GPUs can train with optimizations (validation off, Xformers/SDP, gradient checkpointing, 256 resolution, LoRA).
Docs: https://github.com/ExponentialML/Text-To-Video-Finetuning
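
After the clone step, a quick sanity check can confirm the checkpoint is usable before starting a training run. The sketch below is an illustration, not part of the repository: the helper names are hypothetical, and the list of expected subfolders is an assumption based on the usual Diffusers pipeline layout. The loader uses the standard Diffusers `DiffusionPipeline.from_pretrained` call and needs `torch`, `diffusers`, and `accelerate` installed.

```python
from pathlib import Path

# Assumed layout of a Diffusers text-to-video checkpoint folder.
EXPECTED_PARTS = ("scheduler", "text_encoder", "tokenizer", "unet", "vae")


def missing_checkpoint_parts(model_dir):
    """List the expected files and folders that are absent from model_dir."""
    root = Path(model_dir)
    missing = []
    if not (root / "model_index.json").is_file():
        missing.append("model_index.json")
    missing += [name for name in EXPECTED_PARTS if not (root / name).is_dir()]
    return missing


def load_pipeline(model_dir="./models/model_scope_diffusers/"):
    """Load the cloned checkpoint for inference with memory-saving options."""
    # Imported lazily so the check above works without the heavy dependencies.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)
    pipe.enable_model_cpu_offload()  # moves submodules to the GPU only when needed
    pipe.vae.enable_slicing()        # decodes frames in slices to lower peak VRAM
    return pipe
```

Calling `missing_checkpoint_parts("./models/model_scope_diffusers/")` should return an empty list if `git lfs` pulled the full checkpoint. A non-empty list usually points to an incomplete clone.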

Highlighted Details

Supports LoRA training compatible with Stable Diffusion WebUI extensions.
Includes scripts for converting trained Diffusers models to .ckpt format.
Offers automatic video captioning using Video-BLIP2-Preprocessor.
Allows finetuning from various community models like ZeroScope and Potat1.

Maintenance & Community

The repository is archived and will no longer be updated, with the author recommending the damo-vilab/i2vgen-xl repository for ongoing development. Issues and PRs are kept for posterity.

Licensing & Compatibility

The repository itself does not explicitly state a license in the README. The underlying models it references may have their own licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

This repository is archived and no longer maintained, and the author directs users to an alternative repository for current development. LoRA files trained with stable_lora are not compatible with the repository's inference.py script, and merging LoRA weights is not supported.

Health Check

Last Commit: 2 years ago
Responsiveness: Inactive

Star History

1 star in the last 30 days