InfiniteTalk by MeiGen-AI

Unlimited-length talking video generation

Created 1 month ago
1,639 stars

Top 25.7% on SourcePulse

View on GitHub
Project Summary

InfiniteTalk addresses the challenge of generating realistic talking videos from audio, supporting both image-to-video and video-to-video synthesis. It is designed for researchers and developers in AI-driven media creation, offering precise lip-sync and synchronized head movements, body posture, and facial expressions for unlimited video durations.

How It Works

InfiniteTalk utilizes a novel sparse-frame video dubbing framework. It synthesizes new video content by accurately synchronizing lip movements, head motion, body posture, and facial expressions with an input audio track. This approach allows for "infinite-length" video generation while maintaining identity consistency and reducing distortions compared to previous methods like MultiTalk.
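The "infinite-length" property comes from generating the video in chunks while carrying context across chunk boundaries. The sketch below illustrates that general idea only; the function name, chunk length, and overlap size are hypothetical and do not reflect InfiniteTalk's actual API or internals.

```python
# Illustrative sketch of chunked long-video generation with overlapping
# context frames. All names/parameters here are hypothetical, not
# InfiniteTalk's real interface.

def plan_chunks(total_frames: int, chunk_len: int = 81, overlap: int = 25):
    """Split a long frame sequence into overlapping generation chunks.

    Each chunk after the first reuses `overlap` frames from the previous
    chunk as conditioning context, so identity and motion stay consistent
    across boundaries while the total length is unbounded.
    """
    chunks = []
    start = 0
    while start < total_frames:
        end = min(start + chunk_len, total_frames)
        chunks.append((start, end))
        if end == total_frames:
            break
        start = end - overlap  # next chunk re-conditions on the tail
    return chunks
```

For example, 200 frames with an 81-frame chunk and 25-frame overlap yields four chunks, each (after the first) starting 25 frames before the previous one ended.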

Quick Start & Requirements

  • Installation: Requires creating a conda environment and installing PyTorch 2.4.1 (CUDA 12.1), xformers 0.0.28, flash-attn 2.7.4.post1, and the other dependencies listed in requirements.txt. FFmpeg is also required.
  • Prerequisites: Python 3.10, CUDA 12.1, and significant disk space for model weights (Wan2.1-I2V-14B-480P, chinese-wav2vec2-base, MeiGen-InfiniteTalk).
  • Setup: Model weights must be downloaded using huggingface-cli.
  • Resources: Supports inference on single GPU, multi-GPU, and low VRAM configurations.
  • Demos: Gradio and ComfyUI integrations are available.
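The setup steps above might look roughly like the following. Exact package versions, Hugging Face repo IDs, and local paths should be checked against the project README; the commands here are a sketch, not the project's verbatim instructions.

```shell
# Environment setup sketch -- verify versions and repo IDs against the
# InfiniteTalk README before running.
conda create -n infinitetalk python=3.10 -y
conda activate infinitetalk

pip install torch==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install xformers==0.0.28 flash-attn==2.7.4.post1
pip install -r requirements.txt

# Download model weights (repo IDs assumed; confirm in the README)
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P \
    --local-dir ./weights/Wan2.1-I2V-14B-480P
huggingface-cli download TencentGameMate/chinese-wav2vec2-base \
    --local-dir ./weights/chinese-wav2vec2-base
huggingface-cli download MeiGen-AI/InfiniteTalk \
    --local-dir ./weights/InfiniteTalk
```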

Highlighted Details

  • Achieves superior lip synchronization accuracy compared to MultiTalk.
  • Enables unlimited video duration generation.
  • Reduces hand/body distortions.
  • Supports both 480P and 720P resolutions.

Maintenance & Community

The project has released technical reports, weights, and code. Integrations with Wan2GP and ComfyUI are noted. A to-do list indicates ongoing development, including inference acceleration and LCM distillation.

Licensing & Compatibility

The models are licensed under the Apache 2.0 License. Users are granted freedom to use generated content, provided it complies with the license terms and does not involve illegal or harmful activities.

Limitations & Caveats

While InfiniteTalk supports long video generation, several caveats apply:

  • In video-to-video mode, camera movement mimicry is not identical to the original; SDEdit can improve accuracy but may introduce color shifts.
  • In image-to-video mode, color shifts can become pronounced beyond one minute; a workaround is converting the image into a video via panning/zooming.
  • FusionX LoRA can exacerbate color shifts and reduce identity preservation over longer durations.

Health Check
Last Commit

3 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
4
Issues (30d)
93
Star History
1,606 stars in the last 30 days

Explore Similar Projects

Starred by Shane Thomas (Cofounder of Mastra), Alex Yu (Research Scientist at OpenAI; former Cofounder of Luma AI), and 2 more.

Wav2Lip by Rudrabha

0.2%
12k
Lip-syncing tool for generating videos from speech
Created 5 years ago
Updated 2 months ago