cosmos-cookbook  by nvidia-cosmos

NVIDIA Cosmos ecosystem recipes for real-world AI applications

Created 6 months ago
278 stars

Top 93.6% on SourcePulse

Project Summary

This repository provides post-training scripts and samples for the NVIDIA Cosmos ecosystem, featuring World Foundation Models (WFMs) designed for robotics, simulation, autonomous systems, and physical scene understanding. It targets engineers and researchers seeking to leverage WFMs for real-world, domain-specific AI applications, offering step-by-step workflows and case studies to accelerate development.

How It Works

The Cosmos Cookbook guides users through the NVIDIA Cosmos ecosystem, which includes advanced WFMs like Cosmos Predict for visuomotor control and planning, and Cosmos Reason for tasks such as 3D autonomous vehicle grounding, industrial safety, and video analysis. The approach focuses on providing practical recipes and post-training scripts that demonstrate the application of these models, enabling state-of-the-art performance in areas like robot policy learning and intelligent transportation scene understanding.

Quick Start & Requirements

  • Installation: Requires Git LFS, Python 3.10+, and uv (a Python package manager). Setup consists of cloning the repository and running "just install" (a recipe for the just command runner). The documentation can be served locally with "just serve-external".
  • Prerequisites: Full GPU workflows are supported only on Ubuntu Linux with NVIDIA GPUs. macOS and Windows are not supported for GPU tasks (WSL is recommended on Windows). Cloud deployment options are available.
  • Links: Full Documentation (asset link): https://github.com/user-attachments/assets/bb444b93-d6af-4e25-8bd0-ca5891b26276. Git LFS: git-lfs.com. uv: astral.sh/uv.
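
The setup steps above can be sketched as a small pre-flight check. This is a minimal sketch, not the project's own tooling: the repository URL is an assumption inferred from the project name and owner, the presence of "just" as a required tool is inferred from the "just install" command, and the function names are hypothetical. The script only reports missing tools and prints the commands; it does not run them.

```python
import shutil
import sys

# Tools named in the Quick Start. "just" is inferred from the
# "just install" command rather than listed explicitly as a requirement.
REQUIRED_TOOLS = ("git", "git-lfs", "uv", "just")
MIN_PYTHON = (3, 10)

# Assumed URL, inferred from the project name and owner; verify before use.
REPO_URL = "https://github.com/nvidia-cosmos/cosmos-cookbook.git"


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) interpreter meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def setup_commands(repo_url=REPO_URL):
    """List the Quick Start commands in order, without executing them.

    The two "just" commands are meant to be run from inside the clone.
    """
    return [
        ["git", "lfs", "install"],
        ["git", "clone", repo_url],
        ["just", "install"],
        ["just", "serve-external"],
    ]


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("Missing tools:", missing_tools() or "none")
    for command in setup_commands():
        print("$", " ".join(command))
```

Keeping the commands as data rather than executing them makes the sketch safe to run on any machine, including ones where the tools are not yet installed.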

Highlighted Details

  • Features recipes for fine-tuning video models for visuomotor control, with reported results of 98.33% on LIBERO and 71.1% on RoboCasa (described as state of the art).
  • Includes workflows for 3D autonomous vehicle grounding and intelligent transportation scene understanding using Cosmos Reason.
  • Demonstrates zero-shot industrial safety compliance and hazard detection in warehouse environments.
  • Offers GPU-accelerated video analysis pipelines for summarization, Q&A, and live stream alerts.

Maintenance & Community

The project includes a contributing guide and uses GitHub Issues for bug reports and feature requests. NVIDIA GTC 2026 sessions and the NVIDIA Cosmos Cookoff competition are highlighted as community engagement activities.

Licensing & Compatibility

The source code is licensed under Apache 2.0, while the NVIDIA Cosmos models are under the NVIDIA Open Model License. Custom licensing inquiries can be directed to cosmos-license@nvidia.com. The Apache 2.0 license generally permits commercial use, but the model license terms require specific review.

Limitations & Caveats

GPU-accelerated workflows are exclusively supported on Ubuntu Linux; other operating systems lack direct support for these core functionalities. The repository's reliance on numerous media files necessitates Git LFS for proper cloning and management.
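
The platform constraint above can be checked before attempting a GPU workflow. The following is a heuristic sketch under stated assumptions, not a check provided by the repository: it treats an "ubuntu" ID in the OS release file plus an nvidia-smi executable on PATH as a proxy for "Ubuntu Linux with NVIDIA GPUs", and the function names are hypothetical.

```python
import platform
import shutil


def gpu_workflow_supported(system, distro_id, has_nvidia_smi):
    """Heuristic: Ubuntu Linux plus a visible NVIDIA driver tool."""
    return system == "Linux" and distro_id == "ubuntu" and has_nvidia_smi


def detect():
    """Collect the three inputs from the current machine."""
    system = platform.system()
    distro_id = ""
    if system == "Linux":
        try:
            # freedesktop_os_release() exists in Python 3.10+, which
            # matches the minimum version stated in the Quick Start.
            distro_id = platform.freedesktop_os_release().get("ID", "")
        except (OSError, AttributeError):
            pass  # no os-release file, or an older interpreter
    return system, distro_id, shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print("GPU workflows likely supported:", gpu_workflow_supported(*detect()))
```

Finding nvidia-smi on PATH shows only that driver tooling is installed, not that a usable GPU is attached, so a positive result is a hint rather than a guarantee.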

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 27
  • Issues (30d): 6
  • Star history: 55 stars in the last 30 days
