HunyuanVideo-I2V by Tencent-Hunyuan

Image-to-video generation framework

Created 6 months ago
1,678 stars

Top 25.2% on SourcePulse

Project Summary

HunyuanVideo-I2V is an open-source PyTorch framework for image-to-video generation, built upon the HunyuanVideo model. It allows users to create videos from static images, offering customizable effects via LoRA training and enhanced inference speeds through parallel processing. The project targets researchers and developers interested in advanced video generation techniques.

How It Works

The model injects reference-image information into the video generation process via a token replacement technique. It uses a pre-trained Multimodal Large Language Model (MLLM) with a decoder-only architecture as the text encoder. The MLLM processes the input image into semantic image tokens, which are concatenated with the video latent tokens; full attention is then computed across the combined sequence, letting the model integrate both image and text modalities for coherent video generation.
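The core idea of the paragraph above can be sketched in a few lines of PyTorch. This is an illustrative toy, not the actual HunyuanVideo-I2V code: the token counts, embedding dimension, and the single attention layer are assumptions chosen for clarity.

```python
import torch

# Illustrative sizes (assumptions, not the real model's dimensions).
batch, n_img, n_vid, dim, heads = 1, 16, 64, 128, 8

# Semantic image tokens, as would come from the MLLM encoder.
image_tokens = torch.randn(batch, n_img, dim)
# Video latent tokens from the diffusion backbone.
video_tokens = torch.randn(batch, n_vid, dim)

# Concatenate along the sequence axis so a single attention pass
# can mix image and video information.
tokens = torch.cat([image_tokens, video_tokens], dim=1)  # (1, 80, 128)

# Full self-attention over the combined sequence (one layer as a stand-in
# for the model's transformer blocks).
attn = torch.nn.MultiheadAttention(dim, heads, batch_first=True)
out, _ = attn(tokens, tokens, tokens)

print(out.shape)  # (1, 80, 128): every video token attends to every image token
```

Because attention is computed over the full concatenated sequence, every video latent token can attend to every image token, which is what lets the generated frames stay anchored to the reference image.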

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies via requirements.txt. Conda environment setup is recommended.
  • Prerequisites: NVIDIA GPU with CUDA 12.4 or 11.8, Python 3.11.9.
  • GPU Memory: Minimum 60GB for 720p inference, 79GB for 360p LoRA training. Tested on 80GB GPUs.
  • Docker: Pre-built Docker image available for CUDA 12.4.
  • Documentation: Project Page, Pretrained Models.

Highlighted Details

  • Supports 720p resolution and up to 129 frames (5 seconds) of video generation.
  • Offers customizable LoRA training for special effects.
  • Integrates with xDiT for parallel inference, significantly reducing latency on multi-GPU setups.
  • Provides options for stable vs. high-dynamic video generation via --i2v-stability and --flow-shift parameters.

Maintenance & Community

  • Active development with recent updates (March 2025) for LoRA training and parallel inference.
  • Community contributions include ComfyUI wrappers and GPU-optimized versions.
  • Links to WeChat and Discord are available for community engagement.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires substantial GPU resources (60GB+ VRAM for inference, 79GB+ for training). The specific license for the model weights and code is not clearly stated, which may impact commercial adoption.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 46 stars in the last 30 days
