OpenMOSS
Scalable foundation model for synchronized video-audio generation
MOVA is a foundation model for synchronized video and audio generation that aims to end the "silent era" of open-source video synthesis. It targets researchers and developers who need high-fidelity video with tightly aligned audio. Unlike cascaded pipelines, it generates both modalities in a single inference pass.
How It Works
MOVA employs an Asymmetric Dual-Tower Architecture, leveraging pre-trained video and audio models. These towers are fused via a bidirectional cross-attention mechanism, enabling rich modality interaction. This native bimodal generation approach avoids error accumulation inherent in cascaded systems and achieves precise lip-sync and environment-aware sound effects.
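The fusion step described above can be illustrated with a minimal NumPy sketch. This is not MOVA's implementation: the function names, token counts, dimensions, and random weights are all assumptions chosen for illustration, and the real model's layer layout is not given in this summary. The sketch only shows the general idea of two towers with different widths exchanging information through cross-attention in both directions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, context, w_q, w_k, w_v):
    # Queries come from one modality; keys and values come from the other.
    q = queries @ w_q                          # (n_q, d_attn)
    k = context @ w_k                          # (n_c, d_attn)
    v = context @ w_v                          # (n_c, d_query_model)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_q, n_c)
    return softmax(scores, axis=-1) @ v        # (n_q, d_query_model)

def bidirectional_fusion(video_tokens, audio_tokens, params):
    # Each tower attends to the other; residual adds keep each tower's own width.
    video_update = cross_attend(video_tokens, audio_tokens, *params["video_from_audio"])
    audio_update = cross_attend(audio_tokens, video_tokens, *params["audio_from_video"])
    return video_tokens + video_update, audio_tokens + audio_update

# Hypothetical sizes: the towers are "asymmetric" in that their widths differ.
rng = np.random.default_rng(0)
d_video, d_audio, d_attn = 16, 8, 8
params = {
    "video_from_audio": (
        rng.normal(size=(d_video, d_attn)),   # query projection (video)
        rng.normal(size=(d_audio, d_attn)),   # key projection (audio)
        rng.normal(size=(d_audio, d_video)),  # value projection back to video width
    ),
    "audio_from_video": (
        rng.normal(size=(d_audio, d_attn)),   # query projection (audio)
        rng.normal(size=(d_video, d_attn)),   # key projection (video)
        rng.normal(size=(d_video, d_audio)),  # value projection back to audio width
    ),
}
video = rng.normal(size=(6, d_video))    # 6 hypothetical video tokens
audio = rng.normal(size=(10, d_audio))   # 10 hypothetical audio tokens

fused_video, fused_audio = bidirectional_fusion(video, audio, params)
print(fused_video.shape, fused_audio.shape)  # (6, 16) (10, 8)
```

Because both updates are computed from the same inputs in one pass, neither modality is generated first and then conditioned on, which is the property the summary contrasts with cascaded systems.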
Quick Start & Requirements
Create a conda environment (conda create -n mova python=3.13 -y, conda activate mova) and install the package (pip install -e .). Training requires pip install -e ".[train]".
Download the model weights (hf download OpenMOSS-Team/MOVA-360p).
Highlighted Details
Maintenance & Community
The project was released on January 29, 2026. Key features like checkpoints, multi-GPU inference, LoRA fine-tuning, NPU support, and SGLang integration are complete. Pending items include a Technical Report, Generation Workflow, and Diffusers Integration. Acknowledgements list contributions from several other open-source projects. No direct community links (Discord/Slack) or social handles are provided.
Licensing & Compatibility
The license type is not explicitly stated in the provided README. This is a significant adoption blocker, particularly for commercial use or integration into closed-source projects.
Limitations & Caveats
Training on 8-second, 360p videos with consumer hardware such as an RTX 4090 is not recommended because of high resource requirements and slow training speed; reducing the resolution or frame count is suggested. A technical report and Diffusers integration are still pending, and the absence of a clearly defined license remains a critical caveat for adoption.