Audio-driven multi-person conversational video generation
Top 18.9% on SourcePulse
MultiTalk is an open-source framework for generating audio-driven multi-person conversational videos. It allows users to create videos featuring multiple characters interacting, singing, or performing cartoon actions, driven by audio input and optional text prompts. The primary benefit is enabling realistic, synchronized multi-character video generation from audio.
How It Works
MultiTalk utilizes a novel framework that takes multi-stream audio, a reference image, and a prompt to generate videos. It focuses on achieving consistent lip synchronization with the audio and enabling direct virtual human control via prompts. The architecture supports realistic conversations, interactive character control, and generalization to cartoon characters and singing, with flexible resolution output.
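As a rough illustration, the inputs such a framework consumes can be sketched as a simple request structure. Field names here are hypothetical placeholders, loosely modeled on the kind of JSON input the repo uses; this is not MultiTalk's actual API:

```python
# Hypothetical sketch of assembling multi-person generation inputs:
# one audio stream per speaker, a reference image, and a text prompt.
# Key names are illustrative, not MultiTalk's confirmed schema.
def build_generation_request(audio_streams, reference_image, prompt):
    if not audio_streams:
        raise ValueError("at least one audio stream is required")
    return {
        # multi-stream audio: one entry per on-screen person
        "cond_audio": {f"person{i + 1}": path
                       for i, path in enumerate(audio_streams)},
        "cond_image": reference_image,  # reference image of the character(s)
        "prompt": prompt,               # optional text control of the scene
    }

request = build_generation_request(
    ["speaker1.wav", "speaker2.wav"], "ref.png",
    "Two people having an animated conversation",
)
```

The per-person audio streams are what allow each character's lips to be synchronized independently.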
Quick Start & Requirements
Setup requires installing the Python dependencies via `pip install -r requirements.txt`; FFmpeg installation is also necessary. Three model checkpoints must be downloaded (Wan2.1-I2V-14B-480P, chinese-wav2vec2-base, and MeiGen-MultiTalk), with the MeiGen-MultiTalk weights then linked or copied into the base model directory. The code supports low-VRAM inference (e.g. setting `num_persistent_param_in_dit` to 0) and multi-GPU inference, and offers optimizations like TeaCache (2-3x speedup) and INT8 quantization.
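The step of linking or copying the MeiGen-MultiTalk weights into the base model directory can be sketched in plain Python. The directory layout and file names here are placeholders; the repo documents the exact files involved:

```python
import shutil
from pathlib import Path

def place_multitalk_weights(base_dir, multitalk_dir, copy=False):
    """Symlink (or copy) each file from the MultiTalk weights directory
    into the base model directory, skipping names that already exist.
    Layout is illustrative, not the repo's documented structure."""
    base, mt = Path(base_dir), Path(multitalk_dir)
    placed = []
    for src in sorted(mt.iterdir()):
        dst = base / src.name
        if dst.exists():
            continue  # never clobber base-model files
        if copy:
            shutil.copy2(src, dst)
        else:
            dst.symlink_to(src.resolve())
        placed.append(dst.name)
    return placed
```

Symlinking avoids duplicating multi-gigabyte checkpoints on disk; copying is the safer choice on filesystems without symlink support.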
Maintenance & Community
The project has seen recent updates (July 2025), including INT8 quantization, SageAttention2.2, and FusionX LoRA support. Community contributions are highlighted, with integrations into platforms like Replicate, Gradio demos, and ComfyUI. A Google Colab example is also available.
Licensing & Compatibility
The models are licensed under the Apache 2.0 License, which grants freedom to use generated content provided usage complies with the license terms and applicable laws; harmful, illegal, or misleading content is prohibited.
Limitations & Caveats
While 720p inference is mentioned, the current code primarily supports 480p, with 720p requiring multiple GPUs. Longer video generation (beyond 81 frames) may reduce prompt-following performance. The project is actively being developed, with items like LCM distillation and a 1.3B model still on the todo list.
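For a sense of scale: assuming the base model's commonly cited 16 fps output rate (an assumption, not stated in this summary), 81 frames is roughly five seconds of video, so longer audio implies multiple generation windows. A rough window planner:

```python
FPS = 16         # assumed output frame rate; not confirmed by this summary
MAX_FRAMES = 81  # per-pass frame limit noted in the caveats above

def plan_windows(audio_seconds):
    """Split a clip's total frame count into consecutive generation
    windows of at most MAX_FRAMES frames each."""
    total = int(audio_seconds * FPS)
    windows = []
    start = 0
    while start < total:
        end = min(start + MAX_FRAMES, total)
        windows.append((start, end))
        start = end
    return windows
```

A 12-second clip, for example, spans three windows, which is where the noted drop in prompt-following over long generations would come into play.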