Cosmos-Tokenizer by NVIDIA

Suite of neural tokenizers for image and video processing

created 9 months ago
1,656 stars

Top 26.0% on sourcepulse

Project Summary

NVIDIA Cosmos Tokenizer provides a suite of image and video neural tokenizers designed for efficient visual data compression. By converting visual data into compact continuous latent representations or discrete tokens, it enables the development of large auto-regressive transformers and diffusion models, while delivering high compression ratios and improved reconstruction performance over existing methods.

How It Works

Cosmos Tokenizer uses an encoder-decoder neural architecture that compresses images by 8x or 16x spatially, and videos additionally by 4x or 8x temporally, yielding total compression factors of up to 2048x (8x temporal x 16x x 16x spatial). It can output either a continuous latent space or discrete tokens, allowing flexibility for different downstream model architectures, and it aims to maintain high visual quality while running faster than state-of-the-art tokenizers.
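
As a quick sanity check on the headline 2048x figure, the sketch below works through the shape arithmetic for a video tokenizer with 8x temporal and 16x16 spatial compression. The example resolution and the causal first-frame handling are illustrative assumptions, not values taken from the repository; only the compression factors come from the description above.

    # Shape arithmetic for a hypothetical video tokenizer with 8x temporal and
    # 16x16 spatial compression.
    frames, height, width = 33, 1024, 1024        # example input video: T, H, W
    t_factor, s_factor = 8, 16                    # temporal and spatial factors

    # Assumed causal layout: the first frame is kept as-is and the remaining
    # T - 1 frames are grouped into windows of t_factor.
    latent_t = 1 + (frames - 1) // t_factor       # 1 + 32 // 8 = 5
    latent_h = height // s_factor                 # 1024 // 16 = 64
    latent_w = width // s_factor                  # 1024 // 16 = 64

    pixels_per_latent_cell = t_factor * s_factor * s_factor
    print(f"latent grid: {latent_t} x {latent_h} x {latent_w}")
    print(f"total compression factor: {pixels_per_latent_cell}x")  # 8 * 16 * 16 = 2048x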

Quick Start & Requirements

  • Installation: Clone the repository, install the system dependencies (ffmpeg, git-lfs), then install the package in editable mode with pip3 install -e . from the repo root. A Docker setup is also provided.
  • Prerequisites: CUDA-enabled GPU (for inference), ffmpeg, git-lfs, Python 3.x.
  • Models: Pre-trained checkpoints are available on Hugging Face (see the download sketch after this list).
  • Resources: Expect to download large checkpoint files.
  • Documentation: Website, Paper, Hugging Face.
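
A minimal sketch of one way to fetch a pretrained checkpoint with the huggingface_hub client is shown below. The repo_id and local directory are assumptions about the published model names and layout; check the Hugging Face page linked from the repository for the exact identifiers.

    # Minimal sketch: download a pretrained tokenizer checkpoint from Hugging Face.
    # repo_id and local_dir are assumptions, not confirmed names.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="nvidia/Cosmos-Tokenizer-CV8x8x8",              # assumed model repo name
        local_dir="pretrained_ckpts/Cosmos-Tokenizer-CV8x8x8",  # assumed local layout
    )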

Highlighted Details

  • Achieves up to 2048x total compression for video data.
  • Offers up to 12x faster processing than SOTA tokenizers.
  • Provides both continuous and discrete tokenization options.
  • Includes CLI and PyTorch inference APIs, with NeMo/TensorRT integration planned (a usage sketch follows this list).
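
The sketch below illustrates the PyTorch inference path for a continuous video tokenizer. The module path, class name, and checkpoint layout are assumptions based on the upstream README and should be verified against the (now read-only) repository before use.

    # Hedged sketch of the PyTorch inference API; names are assumptions from the
    # upstream README, not guaranteed to match the pinned revision.
    import torch
    from cosmos_tokenizer.video_lib import CausalVideoTokenizer

    model_name = "Cosmos-Tokenizer-CV4x8x8"       # continuous video, 4x temporal, 8x8 spatial
    video = torch.randn(1, 3, 9, 512, 512, dtype=torch.bfloat16, device="cuda")  # B, C, T, H, W

    # Encode to a continuous latent (roughly [1, 16, 3, 64, 64] for this variant).
    encoder = CausalVideoTokenizer(checkpoint_enc=f"pretrained_ckpts/{model_name}/encoder.jit")
    (latent,) = encoder.encode(video)

    # Decode back to pixel space; the output shape should match the input video.
    decoder = CausalVideoTokenizer(checkpoint_dec=f"pretrained_ckpts/{model_name}/decoder.jit")
    reconstruction = decoder.decode(latent)
    print(latent.shape, reconstruction.shape)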

Maintenance & Community

  • Status: Repository is read-only as of February 10th, 2025; refer to NVIDIA/Cosmos for latest updates.
  • Contributors: Fitsum Reda, Jinwei Gu, Xian Liu, Songwei Ge, Ting-Chun Wang, Haoxiang Wang, Ming-Yu Liu.
  • Resources: NVIDIA Cosmos, NVIDIA Blog, YouTube.

Licensing & Compatibility

  • Models: NVIDIA Open Model License (commercially usable; derivative models allowed).
  • Code: Apache 2.0 license.
  • Compatibility: Both the models and the code permit commercial use.

Limitations & Caveats

The repository is now read-only, with active development and support moved to the NVIDIA/Cosmos repository. TensorRT inference is listed as "coming soon."

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 47 stars in the last 90 days
