Suite of neural tokenizers for image and video processing
NVIDIA Cosmos Tokenizer provides a suite of neural image and video tokenizers for efficient visual data compression. By converting visual data into compact continuous latent representations or discrete tokens, it supports the development of large autoregressive transformers and diffusion models, offering high compression ratios and improved quality over existing tokenizers.
How It Works
Cosmos Tokenizer employs an encoder-decoder neural architecture that compresses images by 8x or 16x along each spatial dimension; video tokenizers add a further 4x or 8x temporal compression, yielding total compression factors of up to 2048x (16 x 16 x 8). It offers both continuous latent space and discrete token outputs, allowing flexibility for different downstream model architectures, and prioritizes high visual quality while achieving faster processing than state-of-the-art tokenizers.
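The compression arithmetic above can be sketched directly. The helper below is illustrative only (it is not part of the Cosmos Tokenizer API) and ignores the channel dimension of the latent, which the stated "up to 2048x" figure also leaves out:

```python
def latent_grid(frames, height, width, ct=8, cs=16):
    """Size of the latent grid for temporal factor `ct` and spatial factor `cs`.

    Illustrative helper, not a Cosmos Tokenizer function: each ct x cs x cs
    spatio-temporal block of pixels maps to one latent element.
    """
    return (frames // ct, height // cs, width // cs)


# 16x spatial (per dimension) combined with 8x temporal compression gives
# the maximum total factor quoted in the text:
total_compression = 16 * 16 * 8
print(total_compression)  # -> 2048

# For example, a 32-frame 1024x1024 clip maps to a 4x64x64 latent grid:
print(latent_grid(32, 1024, 1024))  # -> (4, 64, 64)
```

An 8x-spatial, 4x-temporal configuration would analogously give a 256x total factor, which is why the text describes a range of compression rates rather than a single number.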
Quick Start & Requirements
Install the prerequisites (ffmpeg, git-lfs), then install the package via pip3 install -e . A Docker setup is also provided. Requires Python 3.x.
Highlighted Details
Maintenance & Community
See the NVIDIA/Cosmos repository for the latest updates.
Licensing & Compatibility
Limitations & Caveats
The repository is now read-only; active development and support have moved to the NVIDIA/Cosmos repository. TensorRT inference is listed as "coming soon."