OmniCustom: Multimodal AI for synchronized audio-video generation
OmniCustom is an open-source framework for synchronized audio-video customization, enabling users to generate videos that precisely match a reference image's visual identity and a reference audio's timbre, while allowing the speech content to be freely specified via text prompts. It targets researchers and developers in generative AI for media synthesis, offering a novel approach to controllable and synchronized AV content creation.
How It Works
OmniCustom employs a joint audio-video generation model. It takes a reference image and reference audio as input, preserving their visual and auditory characteristics respectively, while a text prompt dictates the speech content to be synthesized. The framework integrates several pre-trained models to achieve synchronized, customized AV output: OVI for base generation, Naturalspeech 3 for timbre embeddings, InsightFace for face embeddings, and LivePortrait for reference image cropping.
Quick Start & Requirements
- Set up a Python 3.10 environment (python=3.10), install the requirements (pip install -r requirements.txt), and install Flash Attention (pip install flash-attn --no-build-isolation).
- Use download_weights.py and huggingface-cli download to obtain the necessary checkpoints (OmniCustom, Naturalspeech 3, InsightFace, LivePortrait, MMAudio, Wan2.2) and place them in the ckpts/ directory.
- Edit OmniCustom/configs/inference/inference_fusion.yaml to control generation quality, resolution, and input balancing.
- Run bash ./inference.sh, or run infer.py with the desired configuration.

Highlighted Details
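The setup steps above can be sketched as a shell session. Only the commands explicitly named in the README (pip install -r requirements.txt, flash-attn, download_weights.py, huggingface-cli download, inference.sh) are taken from the source; the conda environment name and the checkpoint repository ID are placeholders, not confirmed details:

```shell
# Create an isolated Python 3.10 environment
# (conda is an assumption; the env name "omnicustom" is a placeholder)
conda create -n omnicustom python=3.10 -y
conda activate omnicustom

# Install project requirements, then Flash Attention
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

# Fetch checkpoints into ckpts/
# (exact Hugging Face repo IDs are not given in the README; <repo-id> is a placeholder)
python download_weights.py
huggingface-cli download <repo-id> --local-dir ckpts/

# Adjust OmniCustom/configs/inference/inference_fusion.yaml as needed, then run inference
bash ./inference.sh
```

Note that flash-attn compiles CUDA kernels at install time, so the --no-build-isolation flag requires torch to already be present in the environment.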
Maintenance & Community
The README does not mention notable contributors, sponsorships, community channels (e.g., Discord, Slack), or a public roadmap.
Licensing & Compatibility
The license for the OmniCustom project is not specified in the README. This omission prevents an assessment of its compatibility for commercial use or integration into closed-source projects.
Limitations & Caveats
Currently, only inference codes and model checkpoints are publicly available; training codes and an evaluation benchmark are listed as future open-source targets. The substantial 80 GB VRAM requirement presents a significant barrier to entry for users without high-end hardware. The absence of a specified license is a critical adoption blocker.
Maintainer: haoheliu · Last updated: 1 week ago · Status: Inactive