t2v-turbo by Ji4chenLi

Text-to-video generation research paper implementation

created 1 year ago
302 stars

Top 89.3% on sourcepulse

Project Summary

This repository provides official implementations for T2V-Turbo and T2V-Turbo-v2, advanced text-to-video generation models. It targets researchers and developers looking to achieve fast, high-quality video synthesis, addressing the quality bottleneck in video consistency models through novel reward feedback and conditional guidance techniques.

How It Works

The models build upon diffusion-based video generation, incorporating techniques like mixed reward feedback and enhanced conditional guidance. T2V-Turbo-v2 specifically focuses on improving post-training through data augmentation, reward mechanisms, and refined conditional guidance, aiming for superior video quality and consistency with fewer inference steps.
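As a rough illustration of mixed reward feedback, the sketch below subtracts weighted rewards from a distillation loss, so that minimizing the objective pushes the rewards up. The function name, the reward names (image, video), and the weights are hypothetical and are not taken from the repository; the actual training objective may differ.

```python
from typing import Mapping


def mixed_reward_objective(
    distill_loss: float,
    rewards: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Combine a distillation loss with weighted reward terms.

    Hypothetical sketch: every name here is an assumption made for
    illustration, not the repository's API.
    """
    # Each reward is scaled by its own weight; a missing weight raises KeyError.
    weighted_reward = sum(weights[name] * rewards[name] for name in rewards)
    # Higher reward lowers the objective, so minimizing it maximizes reward.
    return distill_loss - weighted_reward


# Example with made-up numbers: one image reward and one video reward.
objective = mixed_reward_objective(
    distill_loss=1.0,
    rewards={"image": 0.5, "video": 0.25},
    weights={"image": 1.0, "video": 2.0},
)
```

In a real training loop the loss and rewards would be differentiable tensors rather than floats; plain numbers are used here only to keep the sketch self-contained.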

Quick Start & Requirements

  • Installation: pip install accelerate transformers diffusers webdataset loralib peft pytorch_lightning open_clip_torch==2.24.0 hpsv2 image-reward wandb av einops packaging omegaconf opencv-python kornia moviepy imageio torchdata==0.8.0 decord torchaudio bitsandbytes langdetect scipy git+https://github.com/openai/CLIP.git
  • Optional (for performance): after cloning https://github.com/Dao-AILab/flash-attention.git, run pip install flash-attn --no-build-isolation. To add xformers, run conda install xformers -c xformers.
  • Prerequisites: PyTorch, Hugging Face libraries, Gradio for demos, and specific model checkpoints (e.g., VideoCrafter2). CUDA is recommended for GPU acceleration.
  • Demo: Launch with python app.py after downloading checkpoints and installing Gradio.
  • Docs: Project pages linked in README: T2V-Turbo, T2V-Turbo-v2
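Before launching the demo, it can help to confirm that the packages from the installation step are importable. The following is a minimal sketch using only the standard library; the subset of packages and the pip-name to import-name mapping are assumptions chosen for illustration, and the helper is not part of the repository.

```python
import importlib.util
from typing import Dict, List

# Assumed mapping from pip package name to import name for a few of the
# dependencies listed above. Some packages install under a different name.
PIP_TO_IMPORT: Dict[str, str] = {
    "accelerate": "accelerate",
    "transformers": "transformers",
    "diffusers": "diffusers",
    "pytorch_lightning": "pytorch_lightning",
    "open_clip_torch": "open_clip",
    "opencv-python": "cv2",
    "einops": "einops",
    "omegaconf": "omegaconf",
}


def missing_packages(pip_to_import: Dict[str, str]) -> List[str]:
    """Return the pip names whose import name cannot be found."""
    missing = []
    for pip_name, import_name in pip_to_import.items():
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(import_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(PIP_TO_IMPORT)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

This only checks that a module can be located; it does not verify pinned versions such as open_clip_torch==2.24.0.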

Highlighted Details

  • Achieves 16-step generation for T2V-Turbo-v2 and 4-step generation for T2V-Turbo.
  • Supports multiple resolutions, including 320x512 and 256x256.
  • Offers local Gradio demos for interactive testing.
  • Includes scripts for both training and inference.
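The step counts and resolutions above could be captured in a small sampling configuration, sketched below. The function, the dictionary keys, and the reading of 320x512 as height by width are assumptions for illustration; the summary does not say which resolution belongs to which model, and the repository may support resolutions beyond the two listed.

```python
from typing import Dict, Set, Tuple, Union

# Step counts as stated in the highlights above.
DEFAULT_STEPS: Dict[str, int] = {"T2V-Turbo": 4, "T2V-Turbo-v2": 16}

# Resolutions named in the highlights, read as (height, width). Assumption.
LISTED_RESOLUTIONS: Set[Tuple[int, int]] = {(320, 512), (256, 256)}


def make_sampling_config(
    model: str, height: int, width: int
) -> Dict[str, Union[str, int]]:
    """Build a hypothetical sampling config for one of the two models."""
    if model not in DEFAULT_STEPS:
        raise ValueError(f"unknown model: {model!r}")
    if (height, width) not in LISTED_RESOLUTIONS:
        raise ValueError(f"resolution {height}x{width} is not one of the listed ones")
    return {
        "model": model,
        "num_inference_steps": DEFAULT_STEPS[model],
        "height": height,
        "width": width,
    }


config = make_sampling_config("T2V-Turbo-v2", 320, 512)
```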

Maintenance & Community

  • Active development with releases for T2V-Turbo-v2 in October 2024.
  • Replicate demo and API available for T2V-Turbo-v2.
  • Primary contributor: Ji4chenLi.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify a license, which may hinder commercial adoption. For the demos, macOS users must set the device to mps and Intel GPU users must set it to xpu. Training requires specific data preparation (WebVid-10M in webdataset format) and additional model downloads.
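The device choice described above can be sketched as a small helper that picks cuda, xpu, mps, or cpu. This is a hedged illustration rather than the repository's code: the function names and the preference order are assumptions, and the demo scripts may expect the device to be set differently.

```python
def pick_device(cuda_ok: bool, xpu_ok: bool, mps_ok: bool) -> str:
    """Choose a device string from availability flags (assumed order)."""
    if cuda_ok:
        return "cuda"
    if xpu_ok:
        return "xpu"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends; fall back to cpu without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    cuda_ok = torch.cuda.is_available()
    # torch.xpu is only present in newer PyTorch builds, so guard the lookup.
    xpu_ok = hasattr(torch, "xpu") and torch.xpu.is_available()
    # torch.backends.mps is likewise absent in older builds.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(cuda_ok, xpu_ok, mps_ok)


if __name__ == "__main__":
    print("Selected device:", detect_device())
```

Separating the pure choice (pick_device) from the probing (detect_device) keeps the preference order easy to check without a GPU.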

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 90 days
