cosmos-transfer1 by nvidia-cosmos

World-to-world transfer model for bridging simulated and real-world environments

Created 10 months ago
755 stars

Top 46.2% on SourcePulse

Project Summary

Cosmos-Transfer1 is a multimodal conditional world generation model designed to bridge the gap between simulated and real-world environments for applications like robotics and autonomous vehicles. It enables users to generate visual simulations based on various input modalities, including segmentation, depth, edge, LiDAR, and HDMaps, with text prompts and optional RGB video conditioning.

How It Works

The model leverages a ControlNet-based architecture for single-modality generation and a MultiControlNet approach for multimodal inputs. This allows for flexible and precise control over generated visual simulations by combining multiple conditional signals with spatiotemporal control maps. An optional 4K upscaler is also provided for enhancing video resolution.
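To make the idea of combining several control signals with spatiotemporal maps concrete, here is a toy sketch in plain Python. It is not code from the repository. The function name `fuse_control_outputs`, the nested-list clip layout (frames × rows × columns), and the per-location normalization of weights are all assumptions chosen for illustration. Each modality's control branch produces an output. A weight map of the same shape then decides how much that modality contributes at every frame and pixel.

```python
from typing import Dict, List

# Hypothetical layout: a clip is T frames, each an H x W grid of floats.
Clip = List[List[List[float]]]


def fuse_control_outputs(branch_outputs: Dict[str, Clip],
                         weight_maps: Dict[str, Clip]) -> Clip:
    """Blend per-modality control-branch outputs using spatiotemporal weights.

    Assumption for illustration: at every (t, y, x) location the weights are
    normalized across modalities, so each location receives a convex
    combination of the branch outputs. Locations whose weights are all 0
    are left at 0.0.
    """
    names = list(branch_outputs)
    first = branch_outputs[names[0]]
    T, H, W = len(first), len(first[0]), len(first[0][0])
    fused: Clip = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                # Total weight of all modalities at this location.
                total = sum(weight_maps[n][t][y][x] for n in names)
                if total == 0:
                    continue  # no modality is active here
                fused[t][y][x] = sum(
                    weight_maps[n][t][y][x] * branch_outputs[n][t][y][x]
                    for n in names
                ) / total
    return fused


if __name__ == "__main__":
    depth = [[[2.0, 2.0]]]  # one frame, one row, two pixels
    edge = [[[4.0, 4.0]]]
    out = fuse_control_outputs(
        {"depth": depth, "edge": edge},
        {"depth": [[[3.0, 1.0]]], "edge": [[[1.0, 1.0]]]},
    )
    print(out)  # [[[2.5, 3.0]]]
```

In this example, depth dominates the left pixel with a 3:1 weighting, giving (3·2 + 1·4) / 4 = 2.5. The right pixel is weighted equally, giving 3.0. In the actual model the weighting presumably acts on feature tensors inside the network rather than on output pixel values.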

Quick Start & Requirements

  • Installation: Refer to INSTALL.md for environment setup.
  • Prerequisites: A multi-GPU setup is recommended for inference and training. Some model variants (e.g., the AV sample models) may have additional requirements.
  • Resources: Inference and training scripts are available for various pre-trained models, including multimodal, single-modality, AV-specific, and upscaling variants. Post-training and pre-training scripts are also provided.
  • Links: Product Website, Hugging Face, Paper, Paper Website.
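Inference scripts take a per-run description of the prompt, the input video, and the active control modalities. The snippet below sketches how such a description could be generated programmatically as JSON. It is hypothetical. The function `build_control_spec` and the key names (`prompt`, `input_video_path`, `control_weight`, `input_control`) are assumptions about the spec layout, not a confirmed format. Check the example spec files and INSTALL.md in the repository before relying on them.

```python
import json
from typing import Dict, Optional


def build_control_spec(prompt: str,
                       input_video_path: str,
                       control_weights: Dict[str, float],
                       control_paths: Optional[Dict[str, str]] = None) -> str:
    """Return a JSON string describing one (hypothetical) multimodal request.

    All key names are assumptions for illustration; verify them against the
    JSON spec files shipped with the repository.
    """
    spec = {"prompt": prompt, "input_video_path": input_video_path}
    for modality, weight in control_weights.items():
        if weight < 0:
            raise ValueError(f"negative control weight for {modality!r}")
        entry = {"control_weight": weight}
        # Optionally point a modality at a precomputed control video.
        if control_paths and modality in control_paths:
            entry["input_control"] = control_paths[modality]
        spec[modality] = entry
    return json.dumps(spec, indent=2)


if __name__ == "__main__":
    print(build_control_spec(
        "a rainy city street at dusk",
        "input.mp4",
        {"edge": 0.5, "depth": 0.5},
        {"depth": "depth.mp4"},
    ))
```

Keeping one entry per modality makes it easy to switch between single-modality runs (one key) and multimodal runs (several keys with their own weights).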

Highlighted Details

  • Supports single-modality generation (e.g., depth, segmentation, edge, LiDAR) via ControlNet.
  • Enables multimodal generation with adaptive spatiotemporal control maps for combined inputs.
  • Includes specialized models for autonomous vehicle applications (LiDAR, HDMap).
  • Offers a 4K upscaler for 720p to 4K video resolution enhancement.
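For a sense of scale, the 720p-to-4K upscaler's job can be quantified with simple arithmetic. The code assumes 720p means 1280×720 and that "4K" means UHD 3840×2160. The exact output resolution is not stated in the README. Under those assumptions, each dimension grows 3× and the pixel count grows 9×.

```python
from typing import Tuple


def upscale_factors(src: Tuple[int, int], dst: Tuple[int, int]) -> Tuple[float, float, float]:
    """Return (width factor, height factor, pixel-count factor) from src to dst."""
    (sw, sh), (dw, dh) = src, dst
    return dw / sw, dh / sh, (dw * dh) / (sw * sh)


if __name__ == "__main__":
    # Assumed resolutions: 720p = 1280x720, 4K UHD = 3840x2160.
    print(upscale_factors((1280, 720), (3840, 2160)))  # (3.0, 3.0, 9.0)
```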

Maintenance & Community

The project is developed by NVIDIA. Further community engagement details are not specified in the README.

Licensing & Compatibility

  • Source Code: Apache 2.0 License.
  • Models: NVIDIA Open Model License.
  • Restrictions: A custom license is available by contacting NVIDIA. The model incorporates Llama Guard 3 for content moderation, and Llama Guard 3 is subject to its own license.

Limitations & Caveats

Several model variants are marked as "Coming soon," so some features and checkpoints are not yet available. The project also relies on third-party open-source software that carries its own licensing terms.

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 13 stars in the last 30 days
