NVlabs: Omni-modal LLM for joint perception and reasoning
OmniVinci: Omni-Modal LLM for Joint Understanding
OmniVinci is an open-source, omni-modal Large Language Model (LLM) designed for joint understanding across vision, audio, and language. It addresses the need for AI systems that perceive and reason across multiple sensory inputs, offering enhanced performance with significantly reduced training data compared to existing models. The project targets researchers and developers working on multimodal AI applications in fields like robotics, medical AI, and smart factories.
How It Works
OmniVinci introduces three key architectural innovations: OmniAlignNet for strengthening alignment between vision and audio embeddings in a shared latent space, Temporal Embedding Grouping for capturing relative temporal alignment between vision and audio signals, and Constrained Rotary Time Embedding for encoding absolute temporal information. These are supported by a data curation and synthesis pipeline generating 24 million single-modal and omni-modal conversations. This approach allows modalities to mutually reinforce perception and reasoning, leading to improved performance and efficiency.
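This page names the three components but does not describe their internals. The Python sketch below is a hypothetical reading of the two temporal ones, not NVIDIA's implementation. It assumes every vision and audio token carries a timestamp in seconds. Temporal Embedding Grouping is approximated by sorting tokens into fixed-length time chunks with vision before audio inside each chunk. Constrained Rotary Time Embedding is approximated by a standard rotary embedding whose angle is driven by a timestamp clamped to an assumed maximum horizon. The names group_by_time, rotary_time_embed, chunk_s, and t_max are illustrative only.

```python
import math
from typing import List, Tuple

# Hypothetical token layout: (modality, timestamp in seconds, embedding vector).
Token = Tuple[str, float, List[float]]


def group_by_time(vision: List[Token], audio: List[Token], chunk_s: float) -> List[Token]:
    """Interleave vision and audio tokens chunk by chunk (an assumed grouping scheme)."""
    modality_rank = {"vision": 0, "audio": 1}
    # Sort key: time-chunk index, then vision before audio, then exact timestamp.
    return sorted(
        vision + audio,
        key=lambda tok: (int(tok[1] // chunk_s), modality_rank[tok[0]], tok[1]),
    )


def rotary_time_embed(x: List[float], t: float, t_max: float, base: float = 10000.0) -> List[float]:
    """Rotate feature pairs by angles proportional to a clamped, normalized timestamp."""
    if len(x) % 2:
        raise ValueError("embedding dimension must be even")
    # Assumed 'constraint': clamp t to [0, t_max] and rescale to [0, 1] so angles stay bounded.
    t_norm = min(max(t, 0.0), t_max) / t_max
    d = len(x)
    out: List[float] = []
    for i in range(0, d, 2):
        # Same frequency ladder as standard RoPE: base^(-2j/d) for pair index j = i / 2.
        theta = t_norm * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out
```

Because each step is a pure rotation, the embedding norm is unchanged, and any timestamp beyond t_max maps to the same angles as t_max. That is one simple way to keep absolute-time encodings bounded on long inputs; the actual model may constrain time differently.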
Quick Start & Requirements
Download the model with huggingface-cli download nvidia/omnivinci --local-dir ./omnivinci --local-dir-use-symlinks False.
Set up the Python environment with bash ./environment_setup.sh (the setup is based on the NVILA codebase).
Inference uses the Hugging Face transformers library. The example uses torch_dtype="torch.float16" and device_map="auto", so a GPU is expected for practical use.
Model page: https://huggingface.co/nvidia/omnivinci
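The same download-and-load flow can be sketched in Python. This is a minimal sketch under assumptions, not the README's code: the helper names build_load_kwargs, download_weights, and load_model are hypothetical, AutoModel is assumed as the loader class (the README may use a different Auto class or a custom one from the remote code), and the dtype is passed as the torch.float16 object rather than the README's string form. device_map="auto" additionally requires the accelerate package.

```python
from typing import Any, Dict

REPO_ID = "nvidia/omnivinci"  # model repo named in the README
LOCAL_DIR = "./omnivinci"     # same target directory as the huggingface-cli command


def build_load_kwargs(dtype_name: str = "float16") -> Dict[str, Any]:
    """Loader arguments mirroring the README example: fp16 weights, automatic device placement."""
    return {
        "trust_remote_code": True,   # the repo ships custom modeling code; review it first
        "torch_dtype": dtype_name,   # converted to a torch.dtype inside load_model()
        "device_map": "auto",        # needs accelerate; spreads layers over available devices
    }


def download_weights(local_dir: str = LOCAL_DIR) -> str:
    """Python counterpart of the huggingface-cli download step; returns the local path."""
    from huggingface_hub import snapshot_download  # imported lazily: needs network access
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


def load_model(model_dir: str = LOCAL_DIR):
    """Load the checkpoint. AutoModel is an assumption, not confirmed by the README."""
    import torch
    from transformers import AutoModel

    kwargs = build_load_kwargs()
    kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])  # "float16" -> torch.float16
    return AutoModel.from_pretrained(model_dir, **kwargs)
```

Typical use would be model = load_model(download_weights()), run after bash ./environment_setup.sh has installed the dependencies.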
Maintenance & Community
The README does not provide specific details on community channels (e.g., Discord, Slack), roadmap, or notable sponsorships. The project is presented as an initiative from NVIDIA.
Licensing & Compatibility
The README does not explicitly state the software license. It cites an arXiv paper, which suggests a research-oriented release, but terms for commercial use or closed-source linking are not specified.
Limitations & Caveats
The inference code requires trust_remote_code=True, which executes Python code shipped with the model repository and therefore warrants a careful security review. The project is presented as a recent release ("OmniVinci-9B is released!"), and the provided README excerpt does not list detailed limitations or known issues.
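One practical starting point for that review is to enumerate the Python files in the downloaded snapshot, since those are what trust_remote_code=True will execute. The helper below is a hypothetical sketch using only the standard library; remote_code_files is an illustrative name, and the directory is assumed to be the ./omnivinci folder from the download step.

```python
from pathlib import Path
from typing import List


def remote_code_files(snapshot_dir: str) -> List[str]:
    """Return sorted relative paths of every .py file under a downloaded model snapshot."""
    root = Path(snapshot_dir)
    # rglob walks subdirectories too, so helper modules in nested folders are included.
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.py") if p.is_file()
    )
```

For example, remote_code_files("./omnivinci") lists the files to read before loading. After reviewing them, passing a specific commit hash as the revision argument of from_pretrained keeps the reviewed code from changing on a later download.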