Text-To-Video-Finetuning by ExponentialML

Finetuning script for text-to-video models

Created 2 years ago
687 stars

Top 49.5% on SourcePulse

Project Summary

This repository provides tools for finetuning ModelScope's Text-to-Video model using the Diffusers library. It targets researchers and developers interested in customizing video generation models, offering capabilities for LoRA training and model conversion for web UIs.

How It Works

The project uses the Diffusers library to finetune video diffusion models. It supports training from scratch or finetuning existing models such as ModelScope's Text-to-Video and community-provided checkpoints such as ZeroScope. LoRA (Low-Rank Adaptation) training is available for finetuning with reduced computational resources. To keep VRAM usage manageable, the project also offers gradient checkpointing and memory-efficient attention (Xformers or PyTorch 2.0 scaled dot-product attention, SDP).
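
To give a rough sense of why LoRA reduces training cost, the sketch below counts trainable parameters for a single linear layer, once fully finetuned and once with a low-rank adapter. The layer size (1024 x 1024) and rank (16) are illustrative assumptions, not values taken from this repository, and the helper names are hypothetical.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Weights updated when the whole d_out x d_in matrix is finetuned."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Weights in the two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return rank * d_in + d_out * rank


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Share of the layer's weights that LoRA trains, relative to full finetuning."""
    return lora_param_count(d_in, d_out, rank) / full_param_count(d_in, d_out)


if __name__ == "__main__":
    # Assumed example values, chosen only for illustration.
    d_in, d_out, rank = 1024, 1024, 16
    print("full finetuning:", full_param_count(d_in, d_out))
    print("LoRA adapter:   ", lora_param_count(d_in, d_out, rank))
    print("fraction:       ", trainable_fraction(d_in, d_out, rank))
```

The frozen base weights still have to be held in memory. The saving comes from the much smaller set of weights that need gradients and optimizer state.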

Quick Start & Requirements

  • Install: git clone https://github.com/ExponentialML/Text-To-Video-Finetuning.git && cd Text-To-Video-Finetuning && git lfs install && git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b ./models/model_scope_diffusers/
  • Environment: Python 3.10, PyTorch >= 2.0 recommended.
  • Hardware: RTX 3090 recommended; 16GB VRAM GPUs can train with optimizations (validation off, Xformers/SDP, gradient checkpointing, 256 resolution, LoRA).
  • Docs: https://github.com/ExponentialML/Text-To-Video-Finetuning
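
The 16GB optimizations in the hardware note can be collected into a single settings object and checked before a run. The sketch below is a hypothetical illustration: the field names and default values are assumptions, not the repository's actual configuration keys, and the check only mirrors the five options named in that note.

```python
from dataclasses import dataclass


@dataclass
class TrainSettings:
    # All field names and defaults are hypothetical, chosen for this sketch.
    run_validation: bool = True
    memory_efficient_attention: str = "none"  # "xformers", "sdp", or "none"
    gradient_checkpointing: bool = False
    resolution: int = 512
    use_lora: bool = False


def missing_low_vram_optimizations(settings: TrainSettings) -> list[str]:
    """Return the optimizations from the hardware note that are not yet applied."""
    missing = []
    if settings.run_validation:
        missing.append("turn validation off")
    if settings.memory_efficient_attention not in ("xformers", "sdp"):
        missing.append("enable Xformers or Torch 2.0 SDP attention")
    if not settings.gradient_checkpointing:
        missing.append("enable gradient checkpointing")
    if settings.resolution > 256:
        missing.append("train at 256 resolution")
    if not settings.use_lora:
        missing.append("train a LoRA instead of the full model")
    return missing


if __name__ == "__main__":
    for item in missing_low_vram_optimizations(TrainSettings()):
        print("-", item)
```

An empty list means every listed optimization is in place. It does not guarantee that a particular run fits in 16GB.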

Highlighted Details

  • Supports LoRA training compatible with Stable Diffusion WebUI extensions.
  • Includes scripts for converting trained Diffusers models to .ckpt format.
  • Offers automatic video captioning using Video-BLIP2-Preprocessor.
  • Allows finetuning from various community models like ZeroScope and Potat1.

Maintenance & Community

The repository is archived and will no longer be updated, with the author recommending the damo-vilab/i2vgen-xl repository for ongoing development. Issues and PRs are kept for posterity.

Licensing & Compatibility

The repository itself does not explicitly state a license in the README. The underlying models it references may have their own licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The repository is archived and no longer maintained; the author points users to damo-vilab/i2vgen-xl instead. LoRA files trained with stable_lora are not compatible with the repository's inference.py script, and merging LoRA weights is not supported.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
