InfiniteTalk by MeiGen-AI

Unlimited-length talking video generation

Created 1 month ago
1,639 stars

Top 25.7% on SourcePulse

View on GitHub
Project Summary

InfiniteTalk addresses the challenge of generating realistic talking videos from audio, supporting both image-to-video and video-to-video synthesis. It is designed for researchers and developers in AI-driven media creation, offering precise lip-sync and synchronized head movements, body posture, and facial expressions for unlimited video durations.

How It Works

InfiniteTalk utilizes a novel sparse-frame video dubbing framework. It synthesizes new video content by accurately synchronizing lip movements, head motion, body posture, and facial expressions with an input audio track. This approach allows for "infinite-length" video generation while maintaining identity consistency and reducing distortions compared to previous methods like MultiTalk.
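The "infinite-length" property comes from generating the video in chunks while carrying context across chunk boundaries. The sketch below illustrates that general idea only; the function name, chunk length, and overlap size are hypothetical and do not reflect InfiniteTalk's actual API or internals.

```python
# Illustrative sketch of chunked long-video generation with overlapping
# context frames. All names/parameters here are hypothetical, not
# InfiniteTalk's real interface.

def plan_chunks(total_frames: int, chunk_len: int = 81, overlap: int = 25):
    """Split a long frame sequence into overlapping generation chunks.

    Each chunk after the first reuses `overlap` frames from the previous
    chunk as conditioning context, so identity and motion stay consistent
    across boundaries while the total length is unbounded.
    """
    chunks = []
    start = 0
    while start < total_frames:
        end = min(start + chunk_len, total_frames)
        chunks.append((start, end))
        if end == total_frames:
            break
        start = end - overlap  # next chunk re-conditions on the tail
    return chunks
```

For example, 200 frames with an 81-frame chunk and 25-frame overlap yields four chunks, each (after the first) starting 25 frames before the previous one ended.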

Quick Start & Requirements

  • Installation: Requires creating a conda environment and installing PyTorch 2.4.1 (CUDA 12.1), xformers 0.0.28, flash-attn 2.7.4.post1, and the other dependencies listed in requirements.txt. FFmpeg is also required.
  • Prerequisites: Python 3.10, CUDA 12.1, and significant disk space for model weights (Wan2.1-I2V-14B-480P, chinese-wav2vec2-base, MeiGen-InfiniteTalk).
  • Setup: Model weights must be downloaded using huggingface-cli.
  • Resources: Supports inference on single GPU, multi-GPU, and low VRAM configurations.
  • Demos: Gradio and ComfyUI integrations are available.
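The setup steps above might look roughly like the following. Exact package versions, Hugging Face repo IDs, and local paths should be checked against the project README; the commands here are a sketch, not the project's verbatim instructions.

```shell
# Environment setup sketch -- verify versions and repo IDs against the
# InfiniteTalk README before running.
conda create -n infinitetalk python=3.10 -y
conda activate infinitetalk

pip install torch==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install xformers==0.0.28 flash-attn==2.7.4.post1
pip install -r requirements.txt

# Download model weights (repo IDs assumed; confirm in the README)
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P \
    --local-dir ./weights/Wan2.1-I2V-14B-480P
huggingface-cli download TencentGameMate/chinese-wav2vec2-base \
    --local-dir ./weights/chinese-wav2vec2-base
huggingface-cli download MeiGen-AI/InfiniteTalk \
    --local-dir ./weights/InfiniteTalk
```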

Highlighted Details

  • Achieves superior lip synchronization accuracy compared to MultiTalk.
  • Enables unlimited video duration generation.
  • Reduces hand/body distortions.
  • Supports both 480P and 720P resolutions.

Maintenance & Community

The project has released technical reports, weights, and code. Integrations with Wan2GP and ComfyUI are noted. A to-do list indicates ongoing development, including inference acceleration and LCM distillation.

Licensing & Compatibility

The models are licensed under the Apache 2.0 License. Users are granted freedom to use generated content, provided it complies with the license terms and does not involve illegal or harmful activities.

Limitations & Caveats

While InfiniteTalk supports long video generation, several caveats apply:

  • In video-to-video mode, camera movement mimicry is not identical to the original; SDEdit can improve accuracy but may introduce color shifts.
  • In image-to-video mode, color shifts can become pronounced beyond one minute; a workaround is converting the image into a video via panning/zooming.
  • FusionX LoRA can exacerbate color shifts and reduce identity preservation over longer durations.

Health Check
Last Commit

3 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
4
Issues (30d)
93
Star History
1,606 stars in the last 30 days

Explore Similar Projects

Starred by Shane Thomas (Cofounder of Mastra), Alex Yu (Research Scientist at OpenAI; former Cofounder of Luma AI), and 2 more.

Wav2Lip by Rudrabha

0.2%
12k
Lip-syncing tool for generating videos from speech
Created 5 years ago
Updated 2 months ago