Ji4chenLi: Text-to-video generation research paper implementation
Top 87.6% on SourcePulse
This repository provides official implementations for T2V-Turbo and T2V-Turbo-v2, advanced text-to-video generation models. It targets researchers and developers looking to achieve fast, high-quality video synthesis, addressing the quality bottleneck in video consistency models through novel reward feedback and conditional guidance techniques.
How It Works
The models build upon diffusion-based video generation, incorporating techniques like mixed reward feedback and enhanced conditional guidance. T2V-Turbo-v2 specifically focuses on improving post-training through data augmentation, reward mechanisms, and refined conditional guidance, aiming for superior video quality and consistency with fewer inference steps.
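To make the reward-feedback idea concrete, here is a minimal, hypothetical training-step sketch in PyTorch. The names generator, image_reward, and video_reward are placeholders for a consistency-distilled video generator and image-text / video-text reward models (the dependency list includes hpsv2 and image-reward), not the repository's actual API.

```python
# Hypothetical sketch of a mixed reward-feedback update step; generator,
# image_reward, and video_reward are placeholders, not the repository's API.
import torch

def reward_feedback_step(generator, image_reward, video_reward,
                         prompts, optimizer, w_img=1.0, w_vid=1.0):
    # Few-step generation: a consistency-distilled generator produces a
    # short video clip in only a handful of denoising steps.
    videos = generator(prompts, num_inference_steps=4)

    # Mixed reward: an image-text reward scored on sampled frames plus a
    # video-text reward scored on the whole clip.
    frames = videos[:, ::4]                      # subsample frames for the image reward
    r_img = image_reward(frames, prompts).mean()
    r_vid = video_reward(videos, prompts).mean()

    # Maximize the weighted reward by gradient ascent on the generator.
    loss = -(w_img * r_img + w_vid * r_vid)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```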
Quick Start & Requirements
Install the Python dependencies:
pip install accelerate transformers diffusers webdataset loralib peft pytorch_lightning open_clip_torch==2.24.0 hpsv2 image-reward peft wandb av einops packaging omegaconf opencv-python kornia moviepy imageio torchdata==0.8.0 decord torchaudio bitsandbytes langdetect scipy git+https://github.com/openai/CLIP.git
Install FlashAttention with pip install flash-attn --no-build-isolation after cloning https://github.com/Dao-AILab/flash-attention.git.
Install xFormers with conda install xformers -c xformers.
Launch the demo with python app.py after downloading the checkpoints and installing Gradio.
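As an optional, illustrative check (not part of the repository), you can verify that the pinned packages resolved to the expected versions before launching the demo:

```python
# Illustrative sanity check: confirm the pinned packages from the install
# command resolved to the expected versions.
import importlib.metadata as md

expected = {"open_clip_torch": "2.24.0", "torchdata": "0.8.0"}
for pkg in ["torch", "diffusers", "transformers", "peft", "xformers",
            "open_clip_torch", "torchdata"]:
    try:
        version = md.version(pkg)
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")
        continue
    note = ""
    if pkg in expected and version != expected[pkg]:
        note = f" (expected {expected[pkg]})"
    print(f"{pkg}: {version}{note}")
```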
Highlighted Details
Maintenance & Community
Last updated 9 months ago; the project is marked inactive.
Licensing & Compatibility
Limitations & Caveats
The README does not specify a license, which may limit commercial adoption. macOS users must set the demo device to mps, and Intel GPU users need xpu. Training requires specific data preparation (WebVid-10M in webdataset format) and additional model downloads.
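A minimal device-selection sketch, assuming a standard PyTorch setup; the repository's own app.py may handle this differently:

```python
# Pick a device for the demo: NVIDIA CUDA, Intel XPU, Apple MPS, or CPU.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")   # NVIDIA GPUs
elif getattr(torch, "xpu", None) is not None and torch.xpu.is_available():
    device = torch.device("xpu")    # Intel GPUs (requires a PyTorch build with XPU support)
elif torch.backends.mps.is_available():
    device = torch.device("mps")    # Apple Silicon on macOS
else:
    device = torch.device("cpu")
print(f"Running demo on {device}")
```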