cosmos-transfer1 by nvidia-cosmos

World-to-world transfer model for bridging simulated and real-world environments

Created 1 year ago

778 stars

Top 45.0% on SourcePulse

Project Summary

Cosmos-Transfer1 is a multimodal conditional world generation model designed to bridge the gap between simulated and real-world environments for applications such as robotics and autonomous vehicles. It generates visual simulations conditioned on one or more input modalities (segmentation, depth, edge, LiDAR, and HDMap), combined with text prompts and, optionally, RGB video.

How It Works

The model leverages a ControlNet-based architecture for single-modality generation and a MultiControlNet approach for multimodal inputs. This allows for flexible and precise control over generated visual simulations by combining multiple conditional signals with spatiotemporal control maps. An optional 4K upscaler is also provided for enhancing video resolution.
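The idea of combining several conditional signals through spatiotemporal control maps can be illustrated with a minimal sketch. This is not the project's implementation: the function name `blend_controls`, the nested-list grid layout `[t][y][x]`, and the rule of rescaling weights when they sum to more than 1 at a location are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical layout: one scalar per (time, row, column) location.
Grid = List[List[List[float]]]


def blend_controls(features: Dict[str, Grid], weights: Dict[str, Grid]) -> Grid:
    """Blend per-modality control signals using per-location weight maps.

    Assumption for this sketch: if the weights at a location sum to more
    than 1, they are rescaled to sum to 1; otherwise they are used as-is.
    """
    names = list(features)
    if not names:
        raise ValueError("at least one modality is required")
    if set(names) != set(weights):
        raise ValueError("features and weights must name the same modalities")

    reference = features[names[0]]
    blended: Grid = []
    for t, frame in enumerate(reference):
        out_frame = []
        for y, row in enumerate(frame):
            out_row = []
            for x in range(len(row)):
                total = sum(weights[n][t][y][x] for n in names)
                scale = 1.0 / total if total > 1.0 else 1.0
                out_row.append(
                    sum(weights[n][t][y][x] * scale * features[n][t][y][x] for n in names)
                )
            out_frame.append(out_row)
        blended.append(out_frame)
    return blended


# One frame, one row, two columns; modality names are placeholders.
features = {"depth": [[[1.0, 2.0]]], "edge": [[[3.0, 4.0]]]}
weights = {"depth": [[[0.5, 1.0]]], "edge": [[[0.5, 1.0]]]}
result = blend_controls(features, weights)
```

Because the weights vary by location and frame, one modality can dominate in one region of the video while another dominates elsewhere, which is the kind of flexibility the multimodal setup is described as providing.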

Quick Start & Requirements

Installation: Refer to INSTALL.md for environment setup.
Prerequisites: Multi-GPU support is recommended for inference and training. Some model variants, such as the AV sample models, may have additional requirements.
Resources: Inference and training scripts are available for various pre-trained models, including multimodal, single-modality, AV-specific, and upscaling variants. Post-training and pre-training scripts are also provided.
Links: Product Website, Hugging Face, Paper, Paper Website.

Highlighted Details

Supports single-modality generation (e.g., depth, segmentation, edge, LiDAR) via ControlNet.
Enables multimodal generation with adaptive spatiotemporal control maps for combined inputs.
Includes specialized models for autonomous vehicle applications (LiDAR, HDMap).
Offers a 4K upscaler for 720p to 4K video resolution enhancement.

Maintenance & Community

The project is developed by NVIDIA. Further community engagement details are not specified in the README.

Licensing & Compatibility

Source Code: Apache 2.0 License.
Models: NVIDIA Open Model License.
Restrictions: A custom license is available on request. The model incorporates Llama Guard 3 for content moderation, which is subject to its own license.

Limitations & Caveats

Several model variants are marked as "Coming soon" and are not yet available. The project relies on third-party open-source software with separate licensing terms.

Explore Similar Projects

t2v-turbo by Ji4chenLi

VideoTuna by VideoVerses

YUME by stdstu12

LongVie by Vchitect

MoneyPrinterAICreate by q1uki

kandinsky-5 by kandinskylab

HY-WorldPlay by Tencent-Hunyuan

guizang-s-prompt by op7418

EasyAnimate by aigc-apps

LTX-2 by Lightricks

ShortGPT by RayVentura

MoneyPrinterTurbo by harry0703