ImageFolder by lxa9867

Image tokenization framework for autoregressive generation research

Created 11 months ago
288 stars

Top 91.2% on SourcePulse

Project Summary

This repository provides ImageFolder, a flexible image tokenization framework for autoregressive image generation. It targets researchers and practitioners in generative AI and aims to improve image quality and robustness through product and residual quantization and latent perturbation.

How It Works

ImageFolder implements a hierarchical quantization approach combining Product Quantization (PQ) and Residual Quantization (RQ). It supports various quantization units like Vector Quantization (VQ), Lookup-Free Quantization (LFQ), and Binary Spherical Quantization (BSQ). A key feature is the integration of latent perturbation (LP) to enhance robustness, as demonstrated in the RobustTok component. This allows for plug-and-play integration of robustness improvements into existing quantization pipelines.
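The hierarchical scheme above can be illustrated with a toy, plain-Python sketch. It assumes VQ as the quantization unit and splits each latent vector into equal-sized sub-vectors, one per product branch. Each sub-vector is then residual-quantized with that branch's own stack of codebooks. `perturb_indices` shows one simple, assumed form of latent perturbation: randomly replacing token indices. All function names and codebooks are hypothetical illustrations. They are not the repository's implementation, and the perturbation is not necessarily RobustTok's exact scheme.

```python
import random

def nearest(codebook, vec):
    """Return (index, codeword) of the entry in `codebook` closest to `vec` (squared L2)."""
    best = min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))
    return best, codebook[best]

def residual_quantize(vec, codebooks):
    """Residual quantization: each stage quantizes what earlier stages left over."""
    residual = list(vec)
    recon = [0.0] * len(vec)
    indices = []
    for cb in codebooks:
        idx, code = nearest(cb, residual)
        indices.append(idx)
        recon = [r + c for r, c in zip(recon, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, recon

def product_residual_quantize(vec, branch_codebooks):
    """Product quantization over branches: split `vec` into equal sub-vectors and
    residual-quantize each one with its own branch's codebooks."""
    d = len(vec) // len(branch_codebooks)
    all_indices, recon = [], []
    for b, codebooks in enumerate(branch_codebooks):
        idx, part = residual_quantize(vec[b * d:(b + 1) * d], codebooks)
        all_indices.append(idx)
        recon.extend(part)
    return all_indices, recon

def perturb_indices(indices, codebook_size, p, rng):
    """Hypothetical latent perturbation: with probability p, replace a token index
    with a random one to mimic sampling errors made by a generator."""
    return [rng.randrange(codebook_size) if rng.random() < p else i for i in indices]

# Toy setup (hypothetical): two 2-D branches, each with two residual stages.
stage1 = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
stage2 = [[0.0, 0.0], [0.25, 0.25], [-0.25, -0.25]]
indices, recon = product_residual_quantize(
    [1.2, 1.3, -0.9, -1.1], [[stage1, stage2], [stage1, stage2]])
print(indices)  # [[1, 1], [2, 0]]
print(recon)    # [1.25, 1.25, -1.0, -1.0]
noisy = perturb_indices(indices[0], codebook_size=3, p=0.5, rng=random.Random(0))
```

Each branch contributes its own token sequence, so a latent is described by several short index lists instead of one. The residual stages progressively refine each branch's reconstruction.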

Quick Start & Requirements

  • Install dependencies using conda env create -f environment.yml.
  • Requires PyTorch and a CUDA-enabled GPU for training.
  • Datasets should be organized in a format compatible with torchvision.datasets.ImageFolder.
  • Pre-trained tokenizers and generators are available on Hugging Face.
  • Links: ImageFolder Project Page, XQ-GAN Weights, RobustTok Weights
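`torchvision.datasets.ImageFolder` expects a root directory that contains one sub-directory per class, with images inside. The stdlib-only sketch below mimics how that class indexes such a tree. Sorted class directories are mapped to 0..N-1, and files are filtered by extension. This lets you check the structure without installing torchvision. The helper name, the extension subset, and the ImageNet-style synset folder names are illustrative assumptions. The exact root path the repository's configs expect is not stated here.

```python
import os
import tempfile

# A subset of the extensions torchvision's ImageFolder accepts by default.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".webp")

def index_image_folder(root):
    """Hypothetical helper mirroring ImageFolder's indexing of root/<class>/<image>:
    sorted class sub-directories map to 0..N-1; non-image files are skipped."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        for dirpath, _, fnames in sorted(os.walk(os.path.join(root, name))):
            for fname in sorted(fnames):
                # Case-insensitive match, so ImageNet's ".JPEG" files are kept.
                if fname.lower().endswith(IMG_EXTENSIONS):
                    samples.append((os.path.join(dirpath, fname), class_to_idx[name]))
    return class_to_idx, samples

# Build a tiny throwaway tree, e.g. root/n01440764/a.JPEG, and index it.
with tempfile.TemporaryDirectory() as root:
    for cls, fname in [("n01443537", "b.JPEG"), ("n01440764", "a.JPEG"),
                       ("n01440764", "notes.txt")]:
        os.makedirs(os.path.join(root, cls), exist_ok=True)
        with open(os.path.join(root, cls, fname), "w"):
            pass  # create an empty placeholder file
    class_to_idx, samples = index_image_folder(root)

print(class_to_idx)  # {'n01440764': 0, 'n01443537': 1}
print(len(samples))  # 2
```

With torchvision installed, `torchvision.datasets.ImageFolder(root)` exposes the same mapping through its `class_to_idx` attribute.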

Highlighted Details

  • Achieves state-of-the-art results on ImageNet with pFID scores as low as 2.28 (VP+LP) and 2.60 for VAR generators.
  • Supports fine-tuning with 'full', 'lora', or 'frozen' encoder/decoder tuning methods.
  • Integrates semantic regularization using DINOv2 or CLIP for improved generation quality.
  • Offers multi-scale quantization and product quantization for efficient tokenization.
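Multi-scale quantization can be pictured as coarse-to-fine residual quantization in the style of VAR. Each scale quantizes a downsampled copy of whatever the coarser scales have not yet explained. The toy sketch below works on a 1-D signal with a scalar codebook, area-average downsampling, and nearest-neighbour upsampling. These choices, the scale schedule, and all function names are simplifying assumptions; real tokenizers operate on 2-D feature maps with learned vector codebooks. This is not the repository's code.

```python
def downsample(x, n):
    """Area-average a 1-D signal down to n values (len(x) must be divisible by n)."""
    k = len(x) // n
    return [sum(x[i * k:(i + 1) * k]) / k for i in range(n)]

def upsample(x, n):
    """Nearest-neighbour upsample a 1-D signal to n values (n divisible by len(x))."""
    k = n // len(x)
    return [v for v in x for _ in range(k)]

def quantize_scalar(x, codebook):
    """Map each value to the index of its nearest codebook scalar."""
    return [min(range(len(codebook)), key=lambda i: abs(codebook[i] - v)) for v in x]

def multiscale_quantize(signal, scales, codebook):
    """Coarse-to-fine residual quantization: at each scale, quantize a downsampled
    copy of the remaining residual, upsample it back, and subtract it."""
    residual = list(signal)
    recon = [0.0] * len(signal)
    token_maps = []
    for s in scales:
        idx = quantize_scalar(downsample(residual, s), codebook)
        token_maps.append(idx)
        up = upsample([codebook[i] for i in idx], len(signal))
        recon = [r + u for r, u in zip(recon, up)]
        residual = [r - u for r, u in zip(residual, up)]
    return token_maps, recon

token_maps, recon = multiscale_quantize(
    [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0], scales=[1, 2, 4, 8],
    codebook=[-1.0, -0.5, 0.0, 0.5, 1.0])
print(token_maps)  # [[3], [3, 1], [2, 2, 2, 2], [2, 2, 2, 2, 2, 2, 2, 2]]
print(recon)       # [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

The coarse scales capture the overall level and the left/right split. The finer scales receive a zero residual here and emit the "zero" token, which is how later scales add only the detail that earlier ones missed.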

Maintenance & Community

The project is associated with Adobe Research and has recent updates, including the release of RobustTok code. It cites related work from FoundationVision and ControlVAR.

Licensing & Compatibility

The repository does not explicitly state a license in the README. However, the presence of Hugging Face links and the nature of the project suggest it is intended for research use. Commercial use would require clarification.

Limitations & Caveats

Training commands and configurations are provided, but dataset preparation and environment setup may need extra care. The README notes that rFID (reconstruction FID) does not always correlate with gFID (generation FID), so the checkpoint with the best reconstruction may not give the best generation results when saving or selecting checkpoints.
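For reference, rFID and gFID are the same Fréchet Inception Distance applied to different image sets, which is why they can diverge. rFID compares the tokenizer's reconstructions of real images with the originals. gFID compares samples from the generator with real data. Let $\mu$ and $\Sigma$ be the mean and covariance of Inception-v3 features for the real set $r$ and for the compared set $g$:

```latex
\mathrm{FID}(r, g) = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```

One plausible reason for the mismatch is that a tokenizer can reconstruct well (low rFID) while producing a latent space that is harder for the generator to model. In that case, gFID has to be measured directly.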

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Alex Yu (Research Scientist at OpenAI; Former Cofounder of Luma AI) and Phil Wang (Prolific Research Paper Implementer).

Cosmos-Tokenizer by NVIDIA

0.1%
2k
Suite of neural tokenizers for image and video processing
Created 10 months ago
Updated 7 months ago