cosmos-reason2  by nvidia-cosmos

Physical AI and robotics reasoning VLM

Created 6 months ago
278 stars

Top 93.5% on SourcePulse

GitHubView on GitHub
Project Summary

Summary

NVIDIA Cosmos-Reason2 provides open, customizable reasoning vision language models (VLMs) for physical AI and robotics. These models enable embodied agents to understand and act in the real world by employing long chain-of-thought reasoning, physics understanding, and common sense, addressing complex scenarios beyond human annotations.

How It Works

Cosmos Reason models are post-trained using physical common sense and embodied reasoning data, incorporating supervised fine-tuning and reinforcement learning. They leverage chain-of-thought capabilities to interpret world dynamics and spatial-temporal relationships without requiring human annotations. This approach allows robots and vision AI agents to plan actions by reasoning about physics and common sense, excelling in diverse, long-tail physical world scenarios.

Quick Start & Requirements

  • Primary install/run command: Installation via Python virtual environment (using uv) or Docker container.
  • Non-default prerequisites: System dependencies (curl, ffmpeg, git, git-lfs, unzip), Python environment manager (uv), Hugging Face CLI, NVIDIA GPU with CUDA 12.8/13.0, NVIDIA Container Toolkit (for Docker).
  • Hardware: Minimum 24GB GPU memory for Cosmos-Reason2-2B, 32GB for Cosmos-Reason2-8B. Tested on Hopper and Blackwell architectures (NVIDIA H100, GB200, DGX Spark, NVIDIA Jetson AGX Thor).
  • Links: Hugging Face (models), Cosmos Cookbook, Transformers (>=4.57.0), vLLM (>=0.11.0).

Highlighted Details

  • Model Family: Cosmos-Reason2-2B and Cosmos-Reason2-8B are available.
  • Core Capability: Focuses on physical common sense and embodied reasoning for AI agents.
  • Spatial-Temporal Understanding: Designed to interpret complex real-world dynamics and physics.
  • Deployment: Integrates with the transformers library and is recommended for online serving via vllm.

Maintenance & Community

Recent updates include improved documentation and troubleshooting guidance. No specific community links (e.g., Discord, Slack) or notable contributor details are provided in the README.

Licensing & Compatibility

  • License Type: NVIDIA Cosmos source code is released under the Apache 2.0 License. NVIDIA Cosmos models are released under the NVIDIA Open Model License.
  • Compatibility: Custom licenses are available via cosmos-license@nvidia.com. Commercial use requires careful review of the NVIDIA Open Model License terms.

Limitations & Caveats

The repository primarily contains documentation, examples, and utilities; inference can be run independently. Official validation is limited to Hopper and Blackwell architectures; other configurations may not be supported. vLLM inference for Jetson AGX Thor is noted as "coming soon." The NVIDIA Open Model License may impose restrictions on commercial deployment.

Health Check
Last Commit

4 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
3
Star History
62 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.