HunyuanVideo-I2V by Tencent-Hunyuan

Image-to-video generation framework

Created 6 months ago
1,678 stars

Top 25.2% on SourcePulse

Project Summary

HunyuanVideo-I2V is an open-source PyTorch framework for image-to-video generation, built upon the HunyuanVideo model. It allows users to create videos from static images, offering customizable effects via LoRA training and enhanced inference speeds through parallel processing. The project targets researchers and developers interested in advanced video generation techniques.

How It Works

The model injects reference-image information into the video generation process via a token replacement technique. It uses a pre-trained Multimodal Large Language Model (MLLM) with a decoder-only architecture as the text encoder. The MLLM processes the input image into semantic image tokens, which are concatenated with the video latent tokens; full attention is then computed across the combined sequence, letting the model integrate both image and text modalities for coherent video generation.
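The core idea of the paragraph above can be sketched in a few lines of PyTorch. This is an illustrative toy, not the actual HunyuanVideo-I2V code: the token counts, embedding dimension, and the single attention layer are assumptions chosen for clarity.

```python
import torch

# Illustrative sizes (assumptions, not the real model's dimensions).
batch, n_img, n_vid, dim, heads = 1, 16, 64, 128, 8

# Semantic image tokens, as would come from the MLLM encoder.
image_tokens = torch.randn(batch, n_img, dim)
# Video latent tokens from the diffusion backbone.
video_tokens = torch.randn(batch, n_vid, dim)

# Concatenate along the sequence axis so a single attention pass
# can mix image and video information.
tokens = torch.cat([image_tokens, video_tokens], dim=1)  # (1, 80, 128)

# Full self-attention over the combined sequence (one layer as a stand-in
# for the model's transformer blocks).
attn = torch.nn.MultiheadAttention(dim, heads, batch_first=True)
out, _ = attn(tokens, tokens, tokens)

print(out.shape)  # (1, 80, 128): every video token attends to every image token
```

Because attention is computed over the full concatenated sequence, every video latent token can attend to every image token, which is what lets the generated frames stay anchored to the reference image.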

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies via requirements.txt. Conda environment setup is recommended.
  • Prerequisites: NVIDIA GPU with CUDA 12.4 or 11.8, Python 3.11.9.
  • GPU Memory: Minimum 60GB for 720p inference, 79GB for 360p LoRA training. Tested on 80GB GPUs.
  • Docker: Pre-built Docker image available for CUDA 12.4.
  • Documentation: Project Page, Pretrained Models.

Highlighted Details

  • Supports 720p resolution and up to 129 frames (5 seconds) of video generation.
  • Offers customizable LoRA training for special effects.
  • Integrates with xDiT for parallel inference, significantly reducing latency on multi-GPU setups.
  • Provides options for stable vs. high-dynamic video generation via --i2v-stability and --flow-shift parameters.

Maintenance & Community

  • Active development with recent updates (March 2025) for LoRA training and parallel inference.
  • Community contributions include ComfyUI wrappers and GPU-optimized versions.
  • Links to WeChat and Discord are available for community engagement.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires substantial GPU resources (60GB+ VRAM for inference, 79GB+ for training). The specific license for the model weights and code is not clearly stated, which may impact commercial adoption.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 46 stars in the last 30 days
