HunyuanVideo by Tencent-Hunyuan

PyTorch code for video generation research

created 8 months ago
10,781 stars

Top 4.8% on sourcepulse

Project Summary

HunyuanVideo is an open-source framework for large-scale video generation, aiming to match or exceed closed-source model performance. It targets researchers and developers in AI video generation, offering a robust foundation for creating high-quality, diverse, and text-aligned video content.

How It Works

HunyuanVideo employs a unified architecture for image and video generation using a Transformer with Full Attention. It utilizes a "Dual-stream to Single-stream" approach, processing modalities separately before fusing them. A key innovation is the use of a Decoder-Only MLLM as a text encoder, offering improved image-text alignment and detail description over traditional CLIP or T5 encoders. Video compression is handled by a 3D VAE with CausalConv3D, reducing token count for efficient diffusion transformer processing.
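
As a rough illustration of the CausalConv3D idea (not the repository's implementation; the class name and kernel defaults here are assumptions), temporal padding is applied only on the past side of the time axis, so each output frame depends on the current and earlier frames:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv3d(nn.Module):
    """Illustrative causal 3D convolution: pad the time axis on the past side
    only, so frame t never sees frames later than t. Spatial dims use the
    usual symmetric 'same' padding."""

    def __init__(self, in_ch, out_ch, kernel_size=(3, 3, 3)):
        super().__init__()
        kt, kh, kw = kernel_size
        self.time_pad = kt - 1                              # past frames only
        self.space_pad = (kw // 2, kw // 2, kh // 2, kh // 2)
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size)

    def forward(self, x):                                   # x: (B, C, T, H, W)
        # F.pad order for 5D input: (W_left, W_right, H_top, H_bottom, T_front, T_back)
        x = F.pad(x, self.space_pad + (self.time_pad, 0))
        return self.conv(x)

video = torch.randn(1, 3, 9, 64, 64)                        # batch, channels, frames, H, W
print(CausalConv3d(3, 16)(video).shape)                     # torch.Size([1, 16, 9, 64, 64])
```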

Quick Start & Requirements

  • Install: Clone the repo, create a conda environment, install PyTorch (CUDA 11.8 or 12.4), flash-attention, and other dependencies via requirements.txt.
  • Prerequisites: NVIDIA GPU with CUDA 11.8/12.4+, Python 3.10.9.
  • Hardware: Minimum 45 GB GPU memory for 544x960 video at 129 frames, 60 GB for 720x1280 at 129 frames; 80 GB recommended. Linux OS.
  • Links: Project Page, Paper, Diffusers Integration (a minimal inference sketch via Diffusers follows this list).
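
For reference, the Diffusers integration can be exercised with a short script along the lines of the sketch below. The model repository id (`hunyuanvideo-community/HunyuanVideo`) and the conservative resolution/frame settings are assumptions chosen to keep memory low; consult the Diffusers documentation for the recommended configuration.

```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

# Assumed community model id; verify against the Diffusers docs / Hugging Face.
model_id = "hunyuanvideo-community/HunyuanVideo"

transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()          # decode the latent video in tiles to save VRAM
pipe.enable_model_cpu_offload()   # keep only the active sub-model on the GPU

frames = pipe(
    prompt="A cat walks on the grass, realistic style.",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(frames, "hunyuan_video.mp4", fps=15)
```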

Highlighted Details

  • Outperforms leading closed-source models in human evaluations, particularly in motion quality.
  • Offers FP8 quantized weights for reduced GPU memory usage (a rough sketch of the idea follows this list).
  • Supports parallel inference via xDiT for multi-GPU acceleration.
  • Includes a prompt rewrite module for enhanced text-to-video alignment.
  • Released an Image-to-Video (I2V) model based on the same framework.
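
The FP8 weights reduce memory by storing transformer weights in an 8-bit floating-point format and upcasting them on the fly. The sketch below only illustrates that general idea with PyTorch's `float8_e4m3fn` dtype and a hypothetical `quantize_linear_` helper; it is not the repository's quantization code, and the actual scaling scheme may differ.

```python
import torch
import torch.nn as nn

def quantize_linear_(layer: nn.Linear) -> None:
    """Hypothetical helper: store a Linear layer's weight in float8_e4m3fn
    with a per-tensor scale, and dequantize on the fly in forward()."""
    w = layer.weight.data
    scale = w.abs().max() / torch.finfo(torch.float8_e4m3fn).max
    layer.weight_fp8 = (w / scale).to(torch.float8_e4m3fn)   # ~4x smaller than fp32
    layer.weight_scale = scale
    del layer.weight                                          # drop the full-precision copy

    def forward(x, layer=layer):
        w = layer.weight_fp8.to(x.dtype) * layer.weight_scale  # upcast per call
        return nn.functional.linear(x, w, layer.bias)

    layer.forward = forward

lin = nn.Linear(4096, 4096).eval()
quantize_linear_(lin)
y = lin(torch.randn(2, 4096))
print(y.shape)  # torch.Size([2, 4096])
```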

Maintenance & Community

  • Active development with recent releases including FP8 weights and Diffusers integration.
  • Community contributions are highlighted, including ComfyUI wrappers and optimization projects.
  • Links to WeChat and Discord are available for community engagement.

Licensing & Compatibility

  • The repository itself is not explicitly licensed in the README. Model weights are available on Hugging Face.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The README mentions a "fast version" used for current releases, which differs from the "high-quality version" used in benchmark evaluations, implying potential quality trade-offs in the released model.
  • Installation can be complex, with specific CUDA and PyTorch version requirements and potential floating-point exceptions requiring troubleshooting.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 4
  • Issues (30d): 7
  • Star History: 990 stars in the last 90 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind), Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), and 1 more.

  • CogVideo by zai-org — text-to-video generation models (CogVideoX, CogVideo). Top 0.4%, 12k stars, created 3 years ago, updated 1 month ago.