HunyuanVideo by Tencent-Hunyuan

PyTorch code for video generation research

Created 9 months ago
11,043 stars

Top 4.6% on SourcePulse

Project Summary

HunyuanVideo is an open-source framework for large-scale video generation, aiming to match or exceed closed-source model performance. It targets researchers and developers in AI video generation, offering a robust foundation for creating high-quality, diverse, and text-aligned video content.

How It Works

HunyuanVideo employs a unified architecture for image and video generation using a Transformer with Full Attention. It utilizes a "Dual-stream to Single-stream" approach, processing modalities separately before fusing them. A key innovation is the use of a Decoder-Only MLLM as a text encoder, offering improved image-text alignment and detail description over traditional CLIP or T5 encoders. Video compression is handled by a 3D VAE with CausalConv3D, reducing token count for efficient diffusion transformer processing.
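To make the token-count reduction concrete, here is a minimal sketch that estimates how many tokens the diffusion transformer would see for a given clip. The compression ratios (4x temporal, 8x spatial) and the 2x2 spatial patch size are assumptions chosen for illustration and are not stated in this summary; the function name is hypothetical. The sketch also assumes the causal convention in which the first frame is encoded on its own and the remaining frames are compressed in groups.

```python
def latent_token_count(frames, height, width,
                       t_ratio=4, s_ratio=8, patch_hw=2):
    """Estimate transformer tokens after a causal 3D VAE plus patchification.

    t_ratio, s_ratio and patch_hw are assumed values, not taken from the summary.
    """
    # Causal temporal compression: first frame alone, then groups of t_ratio.
    if (frames - 1) % t_ratio != 0:
        raise ValueError("frames must have the form t_ratio * k + 1")
    # Spatial dims must divide cleanly by the VAE stride times the patch size.
    stride = s_ratio * patch_hw
    if height % stride != 0 or width % stride != 0:
        raise ValueError(f"height and width must be multiples of {stride}")

    latent_t = (frames - 1) // t_ratio + 1
    latent_h = height // s_ratio
    latent_w = width // s_ratio
    # Each token covers one latent frame and a patch_hw x patch_hw latent patch.
    return latent_t * (latent_h // patch_hw) * (latent_w // patch_hw)


if __name__ == "__main__":
    for size in [(129, 544, 960), (129, 720, 1280)]:
        pixels = size[0] * size[1] * size[2]
        tokens = latent_token_count(*size)
        print(size, "pixels:", pixels, "tokens:", tokens)
```

Under these assumed ratios the sequence length shrinks by roughly three orders of magnitude relative to the raw pixel count, which is what makes full attention over the whole clip tractable.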

Quick Start & Requirements

  • Install: Clone the repo, create a conda environment, install PyTorch (CUDA 11.8 or 12.4), flash-attention, and other dependencies via requirements.txt.
  • Prerequisites: NVIDIA GPU with CUDA 11.8/12.4+, Python 3.10.9.
  • Hardware: At least 45 GB of GPU memory for 544x960 at 129 frames and 60 GB for 720x1280 at 129 frames; 80 GB is recommended. Linux OS.
  • Links: Project Page, Paper, Diffusers Integration.
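The memory figures in the hardware bullet can be turned into a small pre-flight check. The sketch below is illustrative only: the function and table names are hypothetical, the thresholds are copied from the bullet, and the resolution tuples keep the order in which the summary lists them. How the available memory is measured is left to the caller.

```python
# Minimum GPU memory in GB per generation setting, copied from the hardware
# bullet. Ordered from most to least demanding.
MEMORY_REQUIREMENTS_GB = [
    ((720, 1280, 129), 60),
    ((544, 960, 129), 45),
]


def largest_supported_setting(gpu_memory_gb):
    """Return the most demanding listed setting that fits, or None if none do."""
    for setting, required_gb in MEMORY_REQUIREMENTS_GB:
        if gpu_memory_gb >= required_gb:
            return setting
    return None


if __name__ == "__main__":
    for mem in (24, 48, 80):
        print(mem, "GB ->", largest_supported_setting(mem))
```

A result of None means neither listed setting meets its stated minimum; the FP8 weights mentioned under Highlighted Details may lower the requirement, but this summary does not say by how much.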

Highlighted Details

  • Outperforms leading closed-source models in human evaluations, particularly in motion quality.
  • Offers FP8 quantized weights for reduced GPU memory usage.
  • Supports parallel inference via xDiT for multi-GPU acceleration.
  • Includes a prompt rewrite module for enhanced text-to-video alignment.
  • Released an Image-to-Video (I2V) model based on the same framework.
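As a rough illustration of why FP8 weights reduce memory use, the following sketch estimates the storage needed for model weights alone at different precisions. The parameter count is a placeholder assumption (this summary does not state the model size), the function name is hypothetical, and the estimate ignores activations, the text encoder, the VAE, and framework overhead.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def weight_memory_gib(num_params, dtype):
    """Return the weight storage in GiB for num_params parameters."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    assumed_params = 13_000_000_000  # placeholder, not taken from the summary
    for dtype in ("bf16", "fp8"):
        gib = weight_memory_gib(assumed_params, dtype)
        print(f"{dtype}: {gib:.1f} GiB of weights")
```

Halving the bytes per parameter halves the weight footprint, but total GPU memory during generation falls by less than half because activations and other components are unaffected.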

Maintenance & Community

  • Active development with recent releases including FP8 weights and Diffusers integration.
  • Community contributions are highlighted, including ComfyUI wrappers and optimization projects.
  • Links to WeChat and Discord are available for community engagement.

Licensing & Compatibility

  • The repository itself is not explicitly licensed in the README. Model weights are available on Hugging Face.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The README mentions a "fast version" used for current releases, which differs from the "high-quality version" used in benchmark evaluations, implying potential quality trade-offs in the released model.
  • Installation can be complex: it requires specific CUDA and PyTorch versions, and floating point exceptions may occur that need troubleshooting.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 5
  • Star history: 178 stars in the last 30 days
