HunyuanVideo-I2V by Tencent-Hunyuan

Image-to-video generation framework

created 5 months ago
1,602 stars

Top 26.7% on sourcepulse

Project Summary

HunyuanVideo-I2V is an open-source PyTorch framework for image-to-video generation, built on the HunyuanVideo model. It generates videos from static images, supports customizable effects via LoRA training, and speeds up inference through multi-GPU parallel processing. The project targets researchers and developers working on video generation.

How It Works

The model injects reference image information into the video generation process using a token replacement technique. It uses a pre-trained Multimodal Large Language Model (MLLM) with a decoder-only architecture as the text encoder. The MLLM processes the input image to produce semantic image tokens, which are concatenated with the video latent tokens. Full attention is then computed over the combined token sequence, so every token can attend to every other token and the model can integrate the image and text modalities into a coherent video.
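The concatenate-then-attend step described above can be illustrated with a toy example. This is a minimal sketch in plain Python, not the project's implementation: the token counts, the embedding size, the random token values, and the use of identity projections (queries, keys, and values all equal to the tokens) are assumptions chosen to keep the example small.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def full_attention(tokens):
    """Single-head self-attention with no mask.

    Assumption for brevity: Q = K = V = tokens (no learned projections).
    Returns the attended tokens and the attention weight matrix.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weight_rows = [], []
    for query in tokens:
        # Score this query against every token in the combined sequence.
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        weight_rows.append(weights)
        outputs.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    return outputs, weight_rows


def random_tokens(count, dim, rng):
    """Hypothetical stand-in for encoder output: random Gaussian vectors."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM = 8                                      # illustrative embedding size
image_tokens = random_tokens(4, DIM, rng)    # stand-in for semantic image tokens
video_tokens = random_tokens(6, DIM, rng)    # stand-in for video latent tokens

# Concatenate along the sequence axis, then attend over the whole sequence.
combined = image_tokens + video_tokens
attended, weights = full_attention(combined)

print(len(attended), "tokens, each attending to", len(weights[0]), "tokens")
```

Because no mask is applied, every row of the weight matrix spans all ten tokens, so video tokens draw on image tokens and vice versa.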

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies via requirements.txt. Conda environment setup is recommended.
  • Prerequisites: NVIDIA GPU with CUDA 12.4 or 11.8, Python 3.11.9.
  • GPU Memory: Minimum 60GB for 720p inference, 79GB for 360p LoRA training. Tested on 80GB GPUs.
  • Docker: Pre-built Docker image available for CUDA 12.4.
  • Documentation: Project Page, Pretrained Models.
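The memory figures above can be checked before a run is started. The following is a rough sketch, not part of the project: the task names and helper functions are hypothetical, and it treats the listed "GB" figures as directly comparable to the GiB value reported by PyTorch, which is an assumption.

```python
# Minimum GPU memory per task, taken from the requirements listed above.
# The dictionary keys are hypothetical labels, not names used by the project.
REQUIRED_GB = {
    "inference_720p": 60,
    "lora_training_360p": 79,
}


def fits(total_gb, task):
    """Return True if a GPU with total_gb of memory meets the listed minimum."""
    return total_gb >= REQUIRED_GB[task]


def detect_gpu_memory_gb():
    """Return total memory of GPU 0 in GiB, or None if PyTorch or CUDA is missing."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


detected = detect_gpu_memory_gb()
if detected is None:
    print("No CUDA GPU detected; cannot check memory requirements.")
else:
    for task in REQUIRED_GB:
        status = "ok" if fits(detected, task) else "insufficient"
        print(f"{task}: need {REQUIRED_GB[task]} GB, have {detected:.1f} GiB ({status})")
```

A total-memory check is only a first filter: memory already in use by other processes is not accounted for.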

Highlighted Details

  • Supports 720p resolution and up to 129 frames (5 seconds) of video generation.
  • Offers customizable LoRA training for special effects.
  • Integrates with xDiT for parallel inference, significantly reducing latency on multi-GPU setups.
  • Provides options for stable vs. high-dynamic video generation via --i2v-stability and --flow-shift parameters.
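To give some intuition for what a flow-shift value does, the sketch below applies the timestep shift commonly used in flow-matching samplers, t' = s·t / (1 + (s − 1)·t). Whether --flow-shift maps to exactly this formula in this repository is an assumption, and the shift values 7.0 and 17.0 are illustrative only, not documented defaults.

```python
def shift_timestep(t, shift):
    """Warp a timestep t in [0, 1] with the common flow-matching shift formula.

    A shift of 1.0 leaves t unchanged; larger shifts push intermediate
    timesteps toward 1.0, so more sampling steps are spent at high noise.
    """
    return shift * t / (1.0 + (shift - 1.0) * t)


def shifted_schedule(num_steps, shift):
    """Uniform timesteps from 1.0 down to 0.0, warped by the shift."""
    uniform = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift_timestep(t, shift) for t in uniform]


for shift in (1.0, 7.0, 17.0):  # illustrative values only
    schedule = shifted_schedule(4, shift)
    print(shift, [round(t, 3) for t in schedule])
```

The endpoints 0 and 1 are fixed for any shift; only the spacing of the intermediate timesteps changes.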

Maintenance & Community

  • Active development with recent updates (March 2025) for LoRA training and parallel inference.
  • Community contributions include ComfyUI wrappers and GPU-optimized versions.
  • Links to WeChat and Discord are available for community engagement.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires substantial GPU resources (at least 60GB of VRAM for 720p inference and 79GB for 360p LoRA training). The license for the code and model weights is not clearly stated, which may hinder commercial adoption.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 3
  • Star history: 240 stars in the last 90 days
