Image tokenization framework for autoregressive generation research
This repository provides ImageFolder, a flexible framework for autoregressive image generation using advanced tokenization techniques. It targets researchers and practitioners in generative AI, offering improved image quality and robustness through novel quantization methods and latent perturbation.
How It Works
ImageFolder implements a hierarchical quantization approach combining Product Quantization (PQ) and Residual Quantization (RQ). It supports various quantization units like Vector Quantization (VQ), Lookup-Free Quantization (LFQ), and Binary Spherical Quantization (BSQ). A key feature is the integration of latent perturbation (LP) to enhance robustness, as demonstrated in the RobustTok component. This allows for plug-and-play integration of robustness improvements into existing quantization pipelines.
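The combination described above can be illustrated with a small NumPy sketch. It is not the repository's implementation: the function names (nearest_code, residual_quantize, product_residual_quantize, perturb_latents) are hypothetical, the codebooks are plain arrays rather than learned modules, and the Gaussian-noise perturbation is only an assumed stand-in for whatever latent perturbation scheme RobustTok actually uses.

```python
import numpy as np

def nearest_code(x, codebook):
    """Plain vector quantization: nearest codebook row for each row of x."""
    # x: (n, d), codebook: (k, d) -> squared distances of shape (n, k)
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

def residual_quantize(x, codebooks):
    """Residual quantization: each level quantizes what earlier levels left over."""
    residual = np.array(x, dtype=float)  # working copy, never mutates the input
    recon = np.zeros_like(residual)
    indices = []
    for cb in codebooks:
        idx, q = nearest_code(residual, cb)
        indices.append(idx)
        recon += q
        residual -= q
    return np.stack(indices, axis=1), recon  # tokens: (n, levels)

def product_residual_quantize(x, branch_codebooks):
    """Product quantization: split channels into branches, then apply
    residual quantization to each branch independently."""
    # The channel count must divide evenly by the number of branches.
    chunks = np.split(x, len(branch_codebooks), axis=1)
    outs = [residual_quantize(c, cbs) for c, cbs in zip(chunks, branch_codebooks)]
    tokens = np.stack([o[0] for o in outs], axis=1)       # (n, branches, levels)
    recon = np.concatenate([o[1] for o in outs], axis=1)  # (n, d)
    return tokens, recon

def perturb_latents(x, rng, alpha=0.5, scale=0.1):
    """Assumed stand-in for latent perturbation: add Gaussian noise to a
    random fraction alpha of the latent vectors."""
    mask = rng.random(x.shape[0]) < alpha
    noise = rng.normal(0.0, scale, x.shape)
    return np.where(mask[:, None], x + noise, x)
```

In this sketch each latent vector yields one token per branch and per residual level, so two branches with two levels each give four tokens per latent. Swapping nearest_code for an LFQ- or BSQ-style unit would change only the innermost step.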
Quick Start & Requirements
conda env create -f environment.yml
Dataset format: torchvision.datasets.ImageFolder
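For readers unfamiliar with the layout that torchvision.datasets.ImageFolder expects, the sketch below mimics its documented convention using only the standard library: one subdirectory per class under the root, with class indices assigned in sorted name order. The helper scan_image_folder and the extension list are hypothetical and for illustration only. They are not torchvision's code, and the exact dataset this repository expects is not stated in the summary above.

```python
from pathlib import Path

# Assumed subset of image extensions, for illustration only.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}

def scan_image_folder(root):
    """List (path, class_index) pairs for a root/<class_name>/<image> layout."""
    root = Path(root)
    # Each immediate subdirectory is a class; indices follow sorted name order.
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        for path in sorted((root / name).rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
                samples.append((path, class_to_idx[name]))
    return classes, samples
```

Running this on a prepared dataset directory is a quick way to check that every class folder is picked up before starting a long training job.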
Highlighted Details
Maintenance & Community
The project is associated with Adobe Research and has recent updates, including the release of RobustTok code. It cites related work from FoundationVision and ControlVAR.
Licensing & Compatibility
The README does not state a license. The Hugging Face links and the research focus of the project suggest it is intended for research use, but that is an inference rather than a stated term. Anyone planning commercial use should confirm the licensing terms with the authors first.
Limitations & Caveats
Training commands and configurations are provided, but dataset preparation and environment setup may still need careful attention. The README notes that rFID (reconstruction FID) does not always correlate with gFID (generation FID), so the checkpoint with the best rFID is not necessarily the best one for generation, and checkpoint selection may need tuning.