Cosmos-Tokenizer by NVIDIA

Suite of neural tokenizers for image and video processing

created 9 months ago
1,656 stars

Top 26.0% on sourcepulse

Project Summary

NVIDIA Cosmos Tokenizer provides a suite of image and video neural tokenizers designed for efficient visual data compression. By converting visual data into compact continuous latent representations or discrete tokens, it enables the development of large auto-regressive transformers and diffusion models, while delivering high compression ratios and improved reconstruction performance over existing methods.

How It Works

Cosmos Tokenizer uses an encoder-decoder neural architecture that compresses images by 8x or 16x spatially, and videos additionally by 4x or 8x temporally, yielding total compression factors of up to 2048x (8x temporal x 16x x 16x spatial). It can output either a continuous latent space or discrete tokens, allowing flexibility for different downstream model architectures, and it aims to maintain high visual quality while running faster than state-of-the-art tokenizers.
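
As a quick sanity check on the headline 2048x figure, the sketch below works through the shape arithmetic for a video tokenizer with 8x temporal and 16x16 spatial compression. The example resolution and the causal first-frame handling are illustrative assumptions, not values taken from the repository; only the compression factors come from the description above.

    # Shape arithmetic for a hypothetical video tokenizer with 8x temporal and
    # 16x16 spatial compression.
    frames, height, width = 33, 1024, 1024        # example input video: T, H, W
    t_factor, s_factor = 8, 16                    # temporal and spatial factors

    # Assumed causal layout: the first frame is kept as-is and the remaining
    # T - 1 frames are grouped into windows of t_factor.
    latent_t = 1 + (frames - 1) // t_factor       # 1 + 32 // 8 = 5
    latent_h = height // s_factor                 # 1024 // 16 = 64
    latent_w = width // s_factor                  # 1024 // 16 = 64

    pixels_per_latent_cell = t_factor * s_factor * s_factor
    print(f"latent grid: {latent_t} x {latent_h} x {latent_w}")
    print(f"total compression factor: {pixels_per_latent_cell}x")  # 8 * 16 * 16 = 2048x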

Quick Start & Requirements

  • Installation: Clone the repository, install the system dependencies (ffmpeg, git-lfs), then install the package in editable mode with pip3 install -e . from the repo root. A Docker setup is also provided.
  • Prerequisites: CUDA-enabled GPU (for inference), ffmpeg, git-lfs, Python 3.x.
  • Models: Pre-trained checkpoints are available on Hugging Face (see the download sketch after this list).
  • Resources: Expect to download large checkpoint files.
  • Documentation: Website, Paper, Hugging Face.
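
A minimal sketch of one way to fetch a pretrained checkpoint with the huggingface_hub client is shown below. The repo_id and local directory are assumptions about the published model names and layout; check the Hugging Face page linked from the repository for the exact identifiers.

    # Minimal sketch: download a pretrained tokenizer checkpoint from Hugging Face.
    # repo_id and local_dir are assumptions, not confirmed names.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="nvidia/Cosmos-Tokenizer-CV8x8x8",              # assumed model repo name
        local_dir="pretrained_ckpts/Cosmos-Tokenizer-CV8x8x8",  # assumed local layout
    )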

Highlighted Details

  • Achieves up to 2048x total compression for video data.
  • Offers up to 12x faster processing than SOTA tokenizers.
  • Provides both continuous and discrete tokenization options.
  • Includes CLI and PyTorch inference APIs, with NeMo/TensorRT integration planned (a usage sketch follows this list).
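
The sketch below illustrates the PyTorch inference path for a continuous video tokenizer. The module path, class name, and checkpoint layout are assumptions based on the upstream README and should be verified against the (now read-only) repository before use.

    # Hedged sketch of the PyTorch inference API; names are assumptions from the
    # upstream README, not guaranteed to match the pinned revision.
    import torch
    from cosmos_tokenizer.video_lib import CausalVideoTokenizer

    model_name = "Cosmos-Tokenizer-CV4x8x8"       # continuous video, 4x temporal, 8x8 spatial
    video = torch.randn(1, 3, 9, 512, 512, dtype=torch.bfloat16, device="cuda")  # B, C, T, H, W

    # Encode to a continuous latent (roughly [1, 16, 3, 64, 64] for this variant).
    encoder = CausalVideoTokenizer(checkpoint_enc=f"pretrained_ckpts/{model_name}/encoder.jit")
    (latent,) = encoder.encode(video)

    # Decode back to pixel space; the output shape should match the input video.
    decoder = CausalVideoTokenizer(checkpoint_dec=f"pretrained_ckpts/{model_name}/decoder.jit")
    reconstruction = decoder.decode(latent)
    print(latent.shape, reconstruction.shape)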

Maintenance & Community

  • Status: Repository is read-only as of February 10th, 2025; refer to NVIDIA/Cosmos for latest updates.
  • Contributors: Fitsum Reda, Jinwei Gu, Xian Liu, Songwei Ge, Ting-Chun Wang, Haoxiang Wang, Ming-Yu Liu.
  • Resources: NVIDIA Cosmos, NVIDIA Blog, YouTube.

Licensing & Compatibility

  • Models: NVIDIA Open Model License (commercially usable; derivative models allowed).
  • Code: Apache 2.0 license.
  • Compatibility: Both the models and the code permit commercial use.

Limitations & Caveats

The repository is now read-only, with active development and support moved to the NVIDIA/Cosmos repository. TensorRT inference is listed as "coming soon."

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 47 stars in the last 90 days
