NVIDIA Cosmos: Physical AI platform with omnimodal world models
Top 6.1% on SourcePulse
NVIDIA Cosmos is an open platform for building Physical AI applications, aimed at developers of robots, autonomous vehicles, and smart infrastructure. It provides world models, datasets, and tools for unified processing and generation across modalities. By integrating vision, language, audio, and action capabilities into a single framework, it aims to simplify the development of complex AI systems.
How It Works
Cosmos 3 uses a unified Mixture-of-Transformers (MoT) architecture that combines an autoregressive (AR) transformer for reasoning with a diffusion-model (DM) transformer for multimodal generation. A 3D (multi-dimensional) rotary position embedding (mRoPE) gives images, video, audio, and action sequences consistent spatial and temporal positions. The platform exposes two runtime surfaces: the Reasoner takes text and vision inputs and produces text outputs for understanding and planning, and the Generator takes multimodal inputs and generates vision, sound, and action sequences for simulation and policy learning.
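The description above does not spell out how mRoPE assigns positions. The sketch below assumes a scheme modeled on Qwen2-VL's M-RoPE, which is not confirmed for Cosmos 3: text tokens share one index across the temporal, height, and width axes, video patches advance each axis independently, and the head dimension is split into three sections, each rotated by its own axis. Equal-sized sections, the names `mrope_position_ids` and `rotate`, and the omission of audio and action tokens are all simplifying assumptions for illustration.

```python
import math
from typing import List, Tuple

Pos3 = Tuple[int, int, int]  # (temporal, height, width) position index


def mrope_position_ids(n_text_before: int, grid: Tuple[int, int, int],
                       n_text_after: int) -> List[Pos3]:
    """Assign 3D positions to a [text, video patches, text] token sequence (hypothetical)."""
    ids: List[Pos3] = []
    # Text tokens: the 1D index is replicated on all three axes.
    for p in range(n_text_before):
        ids.append((p, p, p))
    start = n_text_before
    frames, height, width = grid
    # Video patches: temporal, height, and width indices advance independently.
    for t in range(frames):
        for h in range(height):
            for w in range(width):
                ids.append((start + t, start + h, start + w))
    # Trailing text resumes one past the largest index used by the video.
    nxt = start + max(frames, height, width)
    for p in range(n_text_after):
        ids.append((nxt + p, nxt + p, nxt + p))
    return ids


def rotate(vec: List[float], pos: Pos3, base: float = 10000.0) -> List[float]:
    """Rotary embedding with the head dim split into three equal axis sections (assumed)."""
    if len(vec) % 6:
        raise ValueError("head dim must split into three even-sized sections")
    section = len(vec) // 3
    out = list(vec)
    for axis in range(3):
        offset = axis * section
        for i in range(section // 2):
            # Standard RoPE frequency schedule, applied within this axis's section.
            theta = pos[axis] * base ** (-2 * i / section)
            c, s = math.cos(theta), math.sin(theta)
            a, b = vec[offset + 2 * i], vec[offset + 2 * i + 1]
            out[offset + 2 * i] = a * c - b * s
            out[offset + 2 * i + 1] = a * s + b * c
    return out
```

For two leading text tokens, a 2x2x2 patch grid, and one trailing text token, `mrope_position_ids(2, (2, 2, 2), 1)` returns 11 positions ending at `(4, 4, 4)`. Because each section is an ordinary rotary embedding, the query-key dot product depends only on the per-axis offset between two tokens, which is what lets attention treat spatial and temporal distance consistently.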
Quick Start & Requirements
Installation typically uses the uv package manager (uv venv, uv pip install), and the environments use Python 3.13. For research and development with the Generator, the diffusers library and related dependencies are installed. For production inference, the Generator is served with vLLM-Omni (via Docker or a direct install), while the Reasoner is served with vLLM. Prerequisites include Linux, NVIDIA GPUs of the Ampere, Hopper, or Blackwell architectures, and CUDA 13 (recommended) or CUDA 12.8. A Hugging Face access token is required for authentication. Setup involves downloading the models, which is resource-intensive. The README links to the Cosmos website, the Hugging Face collection, and dedicated documentation for Diffusers and vLLM-Omni.
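The prerequisites above can be checked before starting a long model download. The sketch below is a hypothetical helper, not part of Cosmos. The compute-capability mapping (major 8 for Ampere, excluding Ada Lovelace's sm_89; major 9 for Hopper; majors 10 to 12 for Blackwell) and reading the token from the `HF_TOKEN` environment variable (the variable `huggingface_hub` honors) are assumptions. The CUDA version and compute capability are passed in because the standard library cannot query them; with PyTorch installed, `torch.cuda.get_device_capability()` returns the latter.

```python
import os
import platform
import sys
from typing import List, Optional, Tuple

Version = Tuple[int, int]

# Assumed mapping from compute-capability major version to the README's architectures.
ARCH_BY_MAJOR = {8: "Ampere", 9: "Hopper", 10: "Blackwell", 11: "Blackwell", 12: "Blackwell"}
ADA_LOVELACE = (8, 9)  # shares major 8 with Ampere but is not in the README's list


def check_environment(system: str, python: Version, cuda: Version,
                      capability: Version, hf_token: Optional[str]
                      ) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the documented prerequisites (hypothetical helper)."""
    errors: List[str] = []
    warnings: List[str] = []
    if system != "Linux":
        errors.append(f"Linux is required, found {system}")
    if capability == ADA_LOVELACE or capability[0] not in ARCH_BY_MAJOR:
        errors.append(f"GPU sm_{capability[0]}{capability[1]} is not Ampere, Hopper, or Blackwell")
    if cuda < (12, 8):
        errors.append(f"CUDA {cuda[0]}.{cuda[1]} is older than 12.8")
    elif cuda[0] < 13:
        warnings.append("CUDA 13 is recommended")
    if python != (3, 13):
        warnings.append(f"Python {python[0]}.{python[1]} differs from the documented 3.13")
    if not hf_token:
        errors.append("Hugging Face access token missing (set HF_TOKEN)")
    return errors, warnings


def check_local(cuda: Version, capability: Version) -> Tuple[List[str], List[str]]:
    """Fill in OS, Python version, and token from the running process."""
    return check_environment(platform.system(), tuple(sys.version_info[:2]),
                             cuda, capability, os.environ.get("HF_TOKEN"))
```

For example, `check_environment("Linux", (3, 13), (12, 8), (9, 0), "hf_example")` returns no errors and the single warning `"CUDA 13 is recommended"`. Separating errors from warnings mirrors the README's distinction between hard requirements and recommendations.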
Maintenance & Community
The project is actively developed by NVIDIA, with releases announced through news updates. The README does not list community channels (e.g., Discord, Slack) or a public roadmap. An email address is provided for custom licensing inquiries.
Licensing & Compatibility
The NVIDIA Cosmos source code and models are released under the OpenMDW-1.1 License, whose terms govern usage and distribution. Whether commercial use or closed-source linking is permitted depends on those terms. For custom licensing, contact cosmos-license@nvidia.com.
Limitations & Caveats
Cosmos 3 can produce artifacts in generated outputs, particularly in long, high-resolution, or physically complex scenarios. Common issues include temporal inconsistency, unstable motion, inaccurate sound-video alignment, imperfect action-state consistency, object morphing, inaccurate 3D structure, and implausible physical dynamics. Applications that require high-fidelity simulation, safety-critical control, or complex multi-agent behavior need additional validation and system-level safety analysis before deployment.
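Validation of this kind is application-specific, and the README does not prescribe a method. As one illustrative screen for temporal inconsistency or unstable motion, the hypothetical helpers below flag frames whose mean absolute pixel change from the previous frame exceeds a multiple of the clip's median change. The frame representation (flattened grayscale values in [0, 1]), the default `factor` of 3, and the `floor` value are assumptions. A heuristic like this catches only abrupt jumps and is no substitute for the system-level safety analysis described above.

```python
from statistics import median
from typing import List, Sequence

Frame = Sequence[float]  # one frame as flattened grayscale pixels in [0, 1] (assumed)


def frame_deltas(frames: Sequence[Frame]) -> List[float]:
    """Mean absolute pixel change between each pair of consecutive frames."""
    deltas: List[float] = []
    for prev, cur in zip(frames, frames[1:]):
        if not cur or len(prev) != len(cur):
            raise ValueError("frames must be non-empty and the same size")
        deltas.append(sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur))
    return deltas


def flag_jumps(frames: Sequence[Frame], factor: float = 3.0,
               floor: float = 1e-3) -> List[int]:
    """Indices of frames whose change from the previous frame is anomalously large.

    A frame is flagged when its delta exceeds `factor` times the median delta;
    `floor` keeps near-static clips from flagging tiny numerical noise.
    """
    deltas = frame_deltas(frames)
    if not deltas:
        return []
    typical = max(median(deltas), floor)
    # deltas[i] is the change from frame i to frame i + 1, so report i + 1.
    return [i + 1 for i, d in enumerate(deltas) if d > factor * typical]
```

For example, a clip whose uniform brightness steps through 0.0, 0.1, 0.2, 0.9, 1.0 is flagged at frame index 3, where the 0.7 jump occurs. Using the median rather than the mean keeps a single large jump from inflating the baseline it is compared against.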