cosmos  by NVIDIA

Physical AI platform with omnimodal world models

Created 1 year ago
8,597 stars

Top 6.1% on SourcePulse

GitHubView on GitHub
Project Summary

NVIDIA Cosmos is an open platform for building Physical AI applications, targeting developers of robots, autonomous vehicles, and smart infrastructure. It provides world models, datasets, and tools to enable unified processing and generation across multiple modalities, simplifying the development of complex AI systems by integrating vision, language, audio, and action capabilities into a single framework.

How It Works

Cosmos 3 utilizes a unified Mixture-of-Transformers (MoT) architecture, combining an autoregressive (AR) transformer for reasoning tasks and a diffusion transformer (DM) for multimodal generation. This core architecture, enhanced by a 3D multi-dimensional rotary position embedding (mRoPE), enables consistent processing of spatial and temporal data across images, video, audio, and action sequences. The platform exposes two runtime surfaces: the Reasoner, which processes text and vision inputs to produce text outputs for understanding and planning, and the Generator, which handles multimodal inputs to generate vision, sound, and action sequences for simulation and policy learning.

Quick Start & Requirements

Installation typically involves using the uv package manager (uv venv, uv pip install). For research and development with the Generator, diffusers and related libraries are installed. Production inference for the Generator uses vLLM-Omni (via Docker or direct install), while the Reasoner uses vLLM. Key prerequisites include Linux, NVIDIA GPUs (Ampere, Hopper, Blackwell architectures), and CUDA 13 (recommended) or CUDA 12.8. Python 3.13 is used for environment management. A Hugging Face access token is required for authentication. Setup involves downloading models, which can be compute-intensive. Links to Cosmos Website, Hugging Face collection, and specific documentation for Diffusers and vLLM-Omni are available.

Highlighted Details

  • Omnimodal World Models: Cosmos 3 processes and generates language, images, video, audio, and action sequences within a unified framework.
  • Dual Surfaces: Offers a "Reasoner" for world understanding and decision-making, and a "Generator" for world simulation and synthetic data creation.
  • Model Family: Includes Cosmos3-Nano (16B) for compact applications and Cosmos3-Super (64B) for frontier-scale performance.
  • Flexible Deployment: Supports research via Hugging Face Diffusers and production inference via OpenAI-compatible APIs using vLLM-Omni (Generator) and vLLM (Reasoner).
  • Built-in Guardrails: Includes safety mechanisms for prompt screening and output blurring, with options to disable them per request or server-wide.

Maintenance & Community

The project is actively developed by NVIDIA, with releases announced via news updates. Specific community channels (e.g., Discord, Slack) or a public roadmap are not explicitly detailed in the README. Contact for custom licensing is provided via email.

Licensing & Compatibility

The NVIDIA Cosmos source code and models are released under the OpenMDW-1.1 License. This license may have specific terms regarding usage and distribution. For custom licensing requirements, users are directed to contact cosmos-license@nvidia.com. Compatibility for commercial use or closed-source linking would depend on the specific terms of the OpenMDW-1.1 License.

Limitations & Caveats

Cosmos 3 can produce artifacts in generated outputs, particularly in long, high-resolution, or physically complex scenarios. Common issues include temporal inconsistency, unstable motion, inaccurate sound-video alignment, imperfect action-state consistency, object morphing, inaccurate 3D structure, and implausible physical dynamics. Applications requiring high-fidelity simulation, safety-critical control, or complex multi-agent behaviors necessitate additional validation and system-level safety analysis before deployment.

Health Check
Last Commit

13 hours ago

Responsiveness

Inactive

Pull Requests (30d)
11
Issues (30d)
1
Star History
524 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.