cosmos-transfer1  by nvidia-cosmos

World-to-world transfer model for bridging simulated and real-world environments

created 5 months ago
565 stars

Top 57.8% on sourcepulse

Project Summary

Cosmos-Transfer1 is a multimodal, conditional world generation model that bridges simulated and real-world environments for applications such as robotics and autonomous vehicles. It generates visual simulations from one or more control inputs (segmentation, depth, edge, LiDAR, and HD maps), guided by a text prompt and, optionally, an RGB video.

How It Works

The model uses a ControlNet-based architecture for single-modality generation and a MultiControlNet approach for multimodal inputs. In the multimodal case, several conditioning signals are combined through spatiotemporal control maps, which gives flexible and precise control over the generated visual simulation. An optional 4K upscaler can raise the resolution of the output video.
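
To make the idea of combining conditioning signals through spatiotemporal control maps more concrete, here is a minimal sketch in plain Python. It is not the project's implementation: the function name `fuse_controls`, the flattened-list representation, and the rule of rescaling weights only where they sum above 1 are all assumptions chosen for illustration.

```python
from typing import Dict, List


def fuse_controls(
    features: Dict[str, List[float]],
    weights: Dict[str, List[float]],
) -> List[float]:
    """Blend per-modality control features using per-location weights.

    Hypothetical helper, not part of Cosmos-Transfer1. Every list is a
    flattened (time, height, width) grid, so index i is one
    spatiotemporal location.
    """
    names = sorted(features)
    if not names:
        raise ValueError("at least one modality is required")
    size = len(features[names[0]])
    for name in names:
        if len(features[name]) != size or len(weights[name]) != size:
            raise ValueError(f"shape mismatch for modality {name!r}")

    fused = []
    for i in range(size):
        total = sum(weights[name][i] for name in names)
        # Assumption: rescale only where the weights sum above 1, so that
        # no location receives more than one unit of control overall.
        scale = 1.0 / total if total > 1.0 else 1.0
        fused.append(
            sum(features[name][i] * weights[name][i] * scale for name in names)
        )
    return fused


# Two locations: depth dominates the first, edge dominates the second.
features = {"depth": [1.0, 1.0], "edge": [3.0, 3.0]}
weights = {"depth": [1.0, 0.0], "edge": [0.0, 1.0]}
print(fuse_controls(features, weights))
```

Because the weights vary by location, one modality can steer one region or time span of the video while another modality steers the rest.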

Quick Start & Requirements

  • Installation: Refer to INSTALL.md for environment setup.
  • Prerequisites: A multi-GPU setup is recommended for inference and training. Some model variants, such as the AV sample models, may have additional requirements.
  • Resources: Inference and training scripts are available for various pre-trained models, including multimodal, single-modality, AV-specific, and upscaling variants. Post-training and pre-training scripts are also provided.
  • Links: Product Website, Hugging Face, Paper, Paper Website.

Highlighted Details

  • Supports single-modality generation (e.g., depth, segmentation, edge, LiDAR) via ControlNet.
  • Enables multimodal generation with adaptive spatiotemporal control maps for combined inputs.
  • Includes specialized models for autonomous vehicle applications (LiDAR, HDMap).
  • Offers a 4K upscaler for 720p to 4K video resolution enhancement.
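
As a quick sanity check on what the 720p to 4K upscaler implies, the arithmetic below compares the two frame sizes. It assumes "720p" means 1280x720 and "4K" means UHD at 3840x2160; the summary does not state the exact output dimensions, and `upscale_factors` is a hypothetical helper, not part of the project.

```python
from typing import Tuple


def upscale_factors(
    src: Tuple[int, int] = (1280, 720),   # assumed 720p frame size
    dst: Tuple[int, int] = (3840, 2160),  # assumed 4K UHD frame size
) -> Tuple[float, float]:
    """Return (per-axis scale, pixel-count ratio) between two resolutions."""
    scale_x = dst[0] / src[0]
    scale_y = dst[1] / src[1]
    if scale_x != scale_y:
        raise ValueError("source and target aspect ratios differ")
    pixel_ratio = (dst[0] * dst[1]) / (src[0] * src[1])
    return scale_x, pixel_ratio


print(upscale_factors())  # (3.0, 9.0)
```

Under these assumptions each frame grows 3x along each axis, so the upscaler has to produce 9x as many pixels per frame.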

Maintenance & Community

The project is developed by NVIDIA. Further community engagement details are not specified in the README.

Licensing & Compatibility

  • Source Code: Apache 2.0 License.
  • Models: NVIDIA Open Model License.
  • Restrictions: A custom license is available on request. The model incorporates Llama Guard 3 for content moderation, which is subject to its own license.

Limitations & Caveats

Several model variants are marked "Coming soon" in the README, so not every listed variant or feature is available yet. The project also relies on third-party open-source software that carries its own licensing terms.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 12
  • Issues (30d): 17
  • Star history: 193 stars in the last 90 days
