nvidia-cosmos
Physical AI and robotics reasoning VLM
Top 93.5% on SourcePulse
Summary
NVIDIA Cosmos-Reason2 provides open, customizable reasoning vision language models (VLMs) for physical AI and robotics. These models enable embodied agents to understand and act in the real world by combining long chain-of-thought reasoning, physics understanding, and common sense, so they can handle complex scenarios that human annotations do not cover.
How It Works
Cosmos Reason models are post-trained using physical common sense and embodied reasoning data, incorporating supervised fine-tuning and reinforcement learning. They leverage chain-of-thought capabilities to interpret world dynamics and spatial-temporal relationships without requiring human annotations. This approach allows robots and vision AI agents to plan actions by reasoning about physics and common sense, excelling in diverse, long-tail physical world scenarios.
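To make the chain-of-thought step concrete, here is a minimal sketch of separating a model's reasoning trace from the final answer an agent would act on. It assumes, purely for illustration, that the response wraps its reasoning in `<think>...</think>` tags followed by the answer; the summary above does not specify the output format, and `split_reasoning` and `ReasonedOutput` are hypothetical names, not part of any Cosmos API.

```python
import re
from typing import NamedTuple


class ReasonedOutput(NamedTuple):
    reasoning: str  # chain-of-thought text, empty if no tagged block was found
    answer: str     # text the agent would act on


# ASSUMPTION: reasoning is wrapped in <think>...</think>. The real output
# format may differ, so check the model card before relying on this.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> ReasonedOutput:
    """Split a raw response into (reasoning, answer)."""
    match = _THINK_RE.search(text)
    if match is None:
        # No tagged reasoning: treat the whole response as the answer.
        return ReasonedOutput(reasoning="", answer=text.strip())
    reasoning = match.group(1).strip()
    # Everything after the closing tag is taken as the answer.
    answer = text[match.end():].strip()
    return ReasonedOutput(reasoning=reasoning, answer=answer)


if __name__ == "__main__":
    sample = (
        "<think>The cup is near the table edge; pushing it left would "
        "make it fall.</think>\nMove the cup toward the center first."
    )
    result = split_reasoning(sample)
    print(result.reasoning)
    print(result.answer)
```

Keeping the reasoning separate from the answer lets a planner log or inspect the trace while passing only the answer to downstream control code.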
Quick Start & Requirements
uv) or Docker container.
curl, ffmpeg, git, git-lfs, unzip), Python environment manager (uv), Hugging Face CLI, NVIDIA GPU with CUDA 12.8/13.0, NVIDIA Container Toolkit (for Docker).
Highlighted Details
transformers library and is recommended for online serving via vllm.
Maintenance & Community
Recent updates include improved documentation and troubleshooting guidance. No specific community links (e.g., Discord, Slack) or notable contributor details are provided in the README.
Licensing & Compatibility
Limitations & Caveats
The repository primarily contains documentation, examples, and utilities; inference can be run independently. Official validation is limited to Hopper and Blackwell architectures; other configurations may not be supported. vLLM inference for Jetson AGX Thor is noted as "coming soon." The NVIDIA Open Model License may impose restrictions on commercial deployment.
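Since validation is limited to Hopper and Blackwell, a quick local check of the GPU architecture can save a failed setup. The sketch below is one way to do that, not a tool from the repository. It assumes that CUDA compute capability 9.x corresponds to Hopper and 10.x or 12.x to Blackwell parts; whether the README's validation covers every GPU in those families is not stated. `validated_arch` and `query_compute_caps` are hypothetical helper names, and the `compute_cap` query field requires a reasonably recent NVIDIA driver.

```python
import subprocess
from typing import Optional

# ASSUMPTION: compute capability major versions mapped to the two
# architectures the summary names. Anything else is reported as unvalidated.
VALIDATED_ARCHS = {9: "Hopper", 10: "Blackwell", 12: "Blackwell"}


def validated_arch(compute_cap: str) -> Optional[str]:
    """Return the architecture name for a 'major.minor' string, or None."""
    major_text = compute_cap.strip().split(".")[0]
    if not major_text.isdigit():
        return None
    return VALIDATED_ARCHS.get(int(major_text))


def query_compute_caps() -> list[str]:
    """Ask nvidia-smi for each GPU's compute capability; [] if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        # No driver installed, or the driver does not support this query.
        return []
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for cap in query_compute_caps():
        arch = validated_arch(cap)
        status = f"validated ({arch})" if arch else "not officially validated"
        print(f"compute capability {cap}: {status}")
```

On a machine without an NVIDIA driver the script prints nothing, because `query_compute_caps` returns an empty list instead of raising.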
4 weeks ago
Inactive