ImageFolder  by lxa9867

Image tokenization framework for autoregressive generation research

created 10 months ago
278 stars

Top 94.3% on sourcepulse

Project Summary

This repository provides ImageFolder, a framework for training image tokenizers and autoregressive image generators. It targets researchers and practitioners in generative AI, and aims to improve image quality and robustness through configurable quantization methods and latent perturbation.

How It Works

ImageFolder implements a hierarchical quantization approach that combines Product Quantization (PQ) and Residual Quantization (RQ). It supports several quantization units, including Vector Quantization (VQ), Lookup-Free Quantization (LFQ), and Binary Spherical Quantization (BSQ). A key feature is latent perturbation (LP), which is intended to enhance robustness and is demonstrated in the RobustTok component. LP is designed as a plug-and-play addition to existing quantization pipelines.
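
To make the combination of PQ and RQ concrete, here is a minimal NumPy sketch. It is not the repository's implementation: it assumes plain VQ (nearest-neighbour lookup) as the quantization unit, uses random codebooks instead of learned ones, and leaves out multi-scale handling and latent perturbation. The function names, shapes, and constants are all hypothetical.

```python
import numpy as np

def nearest_code(x, codebook):
    """Index of the nearest codebook entry (squared L2) for each row of x."""
    dist = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)

def residual_quantize(x, codebooks):
    """RQ: each level quantizes whatever the previous levels failed to capture."""
    recon = np.zeros_like(x)
    indices = []
    for codebook in codebooks:
        idx = nearest_code(x - recon, codebook)  # quantize the current residual
        recon = recon + codebook[idx]            # accumulate the reconstruction
        indices.append(idx)
    return np.stack(indices, axis=1), recon      # indices: (N, levels)

def product_residual_quantize(x, branch_codebooks):
    """PQ: split the channel dimension into branches and run RQ on each branch."""
    chunks = np.split(x, len(branch_codebooks), axis=1)
    results = [residual_quantize(c, cbs) for c, cbs in zip(chunks, branch_codebooks)]
    indices = np.stack([r[0] for r in results], axis=1)      # (N, branches, levels)
    recon = np.concatenate([r[1] for r in results], axis=1)  # (N, dim)
    return indices, recon

# Hypothetical sizes chosen only for illustration.
N, DIM, BRANCHES, LEVELS, CODES = 16, 8, 2, 3, 32
rng = np.random.default_rng(0)
latents = rng.normal(size=(N, DIM))
branch_codebooks = [
    [rng.normal(size=(CODES, DIM // BRANCHES)) * 0.5 ** level for level in range(LEVELS)]
    for _ in range(BRANCHES)
]

indices, recon = product_residual_quantize(latents, branch_codebooks)
print(indices.shape, recon.shape)
```

Each latent vector ends up represented by BRANCHES × LEVELS small integer indices, which is the kind of token grid an autoregressive generator would then model.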

Quick Start & Requirements

  • Install dependencies using conda env create -f environment.yml.
  • Requires PyTorch and a CUDA-enabled GPU for training.
  • Datasets should be organized in a format compatible with torchvision.datasets.ImageFolder.
  • Pre-trained tokenizers and generators are available on Hugging Face.
  • Links: ImageFolder Project Page, XQ-GAN Weights, RobustTok Weights
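
The dataset layout expected by torchvision.datasets.ImageFolder is one subdirectory per class under a root directory, with class indices assigned in sorted order of the directory names. The sketch below checks a directory tree against that convention using only the standard library. The helper name scan_image_folder, the class names, and the file names are hypothetical, and the extension set is only a subset of what torchvision accepts.

```python
import tempfile
from pathlib import Path

# Assumption: a subset of the image extensions torchvision recognises.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}

def scan_image_folder(root):
    """Return (class_to_idx, samples) for a root/<class_name>/<image> layout."""
    root = Path(root)
    # Class names are the subdirectory names, indexed in sorted order.
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        for path in sorted((root / name).rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((path, class_to_idx[name]))
    return class_to_idx, samples

# Build a tiny placeholder tree (empty files) to show the expected structure.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "train"
    layout = {"class_a": ["0001.JPEG", "0002.JPEG"], "class_b": ["0003.JPEG"]}
    for class_name, files in layout.items():
        (root / class_name).mkdir(parents=True)
        for file_name in files:
            (root / class_name / file_name).touch()

    class_to_idx, samples = scan_image_folder(root)
    labels = [label for _, label in samples]
    print(class_to_idx, labels)
```

With real image files in place, the same root directory can be passed to torchvision.datasets.ImageFolder(root), whose class_to_idx attribute should match the mapping computed here.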

Highlighted Details

  • Reports state-of-the-art results on ImageNet, with pFID as low as 2.28 (VP+LP) and 2.60 for VAR generators.
  • Supports fine-tuning with 'full', 'lora', or 'frozen' encoder/decoder tuning methods.
  • Integrates semantic regularization using DINOv2 or CLIP for improved generation quality.
  • Offers multi-scale quantization and product quantization for efficient tokenization.
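
The three tuning methods differ in which parameters receive gradient updates. The NumPy sketch below illustrates the general idea for a single linear layer, using the standard LoRA formulation (a frozen weight plus a scaled low-rank update). It is an illustration only: the class LoRALinear, the helper trainable_parameters, and the rank and alpha values are hypothetical and are not taken from the repository.

```python
import numpy as np

class LoRALinear:
    """Linear layer y = x W^T with an optional low-rank update (alpha / r) * B A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # pretrained weight
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))  # small random init
        self.B = np.zeros((out_dim, rank))                    # zero init: no change at start
        self.scale = alpha / rank

    def __call__(self, x):
        base = x @ self.weight.T
        update = self.scale * (x @ self.A.T) @ self.B.T
        return base + update

def trainable_parameters(layer, mode):
    """Which arrays an optimizer would update under each tuning method."""
    if mode == "full":
        return {"weight": layer.weight}       # update the pretrained weight itself
    if mode == "lora":
        return {"A": layer.A, "B": layer.B}   # weight stays frozen, adapters train
    if mode == "frozen":
        return {}                             # nothing in this layer is updated
    raise ValueError(f"unknown tuning mode: {mode!r}")

W = np.arange(12.0).reshape(3, 4)
layer = LoRALinear(W, rank=2)
x = np.ones((5, 4))

# Because B starts at zero, the adapted layer initially matches the base layer.
print(np.allclose(layer(x), x @ W.T))
for mode in ("full", "lora", "frozen"):
    count = sum(p.size for p in trainable_parameters(layer, mode).values())
    print(mode, count)
```

In this toy layer, 'lora' trains rank × (in_dim + out_dim) values instead of in_dim × out_dim; the saving only becomes meaningful when the rank is much smaller than the layer dimensions.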

Maintenance & Community

The project is associated with Adobe Research and has recent updates, including the release of RobustTok code. It cites related work from FoundationVision and ControlVAR.

Licensing & Compatibility

The README does not state a license. The project appears to be intended for research use; anyone planning commercial use should ask the authors to clarify the terms.

Limitations & Caveats

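Training commands and configurations are provided, but dataset preparation and environment setup are not covered in detail and may require careful attention. The README notes that reconstruction FID (rFID) does not always correlate with generation FID (gFID), so the tokenizer checkpoint with the best rFID is not necessarily the best one for generation; this should be kept in mind when deciding which checkpoints to save.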

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 90 days
