Text-to-video generation research paper implementation
Top 89.3% on sourcepulse
This repository provides official implementations for T2V-Turbo and T2V-Turbo-v2, advanced text-to-video generation models. It targets researchers and developers looking to achieve fast, high-quality video synthesis, addressing the quality bottleneck in video consistency models through novel reward feedback and conditional guidance techniques.
How It Works
The models build on diffusion-based video generation, distilling it into a consistency model trained with mixed reward feedback and enhanced conditional guidance. T2V-Turbo-v2 focuses on the post-training stage, combining data augmentation, reward mechanisms, and refined conditional guidance to reach higher video quality and consistency with fewer inference steps.
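As a rough illustration of the mixed reward feedback idea, the sketch below combines a per-frame image-text reward (the dependency list includes hpsv2 and image-reward) with a clip-level video-text reward into a single training signal. The callables `image_rm` and `video_rm` and the weighting scheme are assumptions for illustration, not the repository's actual interfaces.

```python
import torch

def mixed_reward_loss(frames: torch.Tensor, prompt: str, image_rm, video_rm,
                      w_img: float = 1.0, w_vid: float = 1.0) -> torch.Tensor:
    """Hypothetical sketch: mix image-level and video-level reward scores.

    frames:   (num_frames, C, H, W) decoded video frames
    image_rm: callable(frames, prompt) -> per-frame scores, shape (num_frames,)
    video_rm: callable(frames, prompt) -> clip-level score, scalar tensor
    """
    frame_scores = image_rm(frames, prompt)   # text-image alignment per frame
    clip_score = video_rm(frames, prompt)     # text-video / temporal alignment
    reward = w_img * frame_scores.mean() + w_vid * clip_score
    return -reward  # minimizing the negative reward maximizes the mixed reward
```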
Quick Start & Requirements
Install the Python dependencies:
pip install accelerate transformers diffusers webdataset loralib peft pytorch_lightning open_clip_torch==2.24.0 hpsv2 image-reward wandb av einops packaging omegaconf opencv-python kornia moviepy imageio torchdata==0.8.0 decord torchaudio bitsandbytes langdetect scipy git+https://github.com/openai/CLIP.git
After cloning https://github.com/Dao-AILab/flash-attention.git, install FlashAttention: pip install flash-attn --no-build-isolation
Install xformers: conda install xformers -c xformers
After downloading the checkpoints and installing Gradio, launch the demo: python app.py
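Beyond the Gradio demo, the practical payoff of consistency distillation is that sampling collapses to a handful of denoiser calls. The loop below is a generic sketch of few-step consistency sampling in latent space, not the repository's inference code; the `denoiser` callable, latent shape, and noise schedule are illustrative assumptions.

```python
import torch

@torch.no_grad()
def few_step_sample(denoiser, text_emb, num_steps=4,
                    shape=(1, 4, 16, 40, 64), device="cuda"):
    """Hypothetical sketch of few-step consistency sampling.

    denoiser(x_t, sigma, text_emb) -> predicted clean latent x_0
    shape: (batch, channels, frames, height, width) of the video latent.
    """
    # Descending noise levels; a distilled model needs only a handful of them.
    sigmas = torch.linspace(1.0, 0.0, num_steps + 1, device=device)
    x = torch.randn(shape, device=device) * sigmas[0]
    for i in range(num_steps):
        x0_pred = denoiser(x, sigmas[i], text_emb)   # one-call clean estimate
        if i < num_steps - 1:
            # Re-noise the clean estimate down to the next noise level.
            x = x0_pred + sigmas[i + 1] * torch.randn_like(x0_pred)
        else:
            x = x0_pred
    return x  # decode with the VAE afterwards to get pixel-space frames
```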
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README does not specify a license, which may impact commercial adoption. macOS users require a specific device configuration (mps) for the demos, and Intel GPU users need xpu. Training requires specific data preparation (WebVid-10M in webdataset format) and additional model downloads.
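To make the device caveat concrete, here is a minimal sketch of how the demo's device could be chosen across NVIDIA, Apple Silicon, and Intel GPU machines. The helper name and selection logic are assumptions; the repository's actual configuration flag is not specified here.

```python
import torch

def pick_device() -> str:
    """Hypothetical helper: choose the accelerator the demos should run on."""
    if torch.cuda.is_available():
        return "cuda"                                  # NVIDIA GPUs (default path)
    if torch.backends.mps.is_available():
        return "mps"                                   # Apple Silicon / macOS Metal backend
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"                                   # Intel GPUs via the XPU backend
    return "cpu"

device = torch.device(pick_device())
```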