ImageFolder by lxa9867

Image tokenization framework for autoregressive generation research

Created 1 year ago

303 stars

Top 88.4% on SourcePulse

Project Summary

This repository provides ImageFolder, a framework for autoregressive image generation built around image tokenization. It targets researchers and practitioners in generative AI, and aims to improve image quality and robustness through its quantization methods and latent perturbation.

How It Works

ImageFolder implements a hierarchical quantization approach that combines Product Quantization (PQ) and Residual Quantization (RQ). The quantization unit can be Vector Quantization (VQ), Lookup-Free Quantization (LFQ), or Binary Spherical Quantization (BSQ). Latent perturbation (LP), demonstrated in the RobustTok component, is integrated to improve robustness and is designed to plug into existing quantization pipelines.
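To make the PQ and RQ combination concrete, here is a minimal pure-Python sketch. It is not the repository's implementation: it assumes PQ splits the latent into equal channel chunks, that each chunk has its own RQ stack, and that the quantization unit is plain nearest-neighbour VQ. All function names, sizes, and the random codebooks are hypothetical.

```python
import random


def nearest(codebook, vec):
    """VQ unit: index of the codeword closest to vec (squared L2 distance)."""
    def sq_dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vec))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def residual_quantize(vec, codebooks):
    """RQ: each level quantizes the residual left by the previous levels."""
    residual = list(vec)
    recon = [0.0] * len(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        code = codebook[idx]
        indices.append(idx)
        recon = [r + c for r, c in zip(recon, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, recon


def product_residual_quantize(vec, branch_codebooks):
    """PQ over RQ: split vec into equal chunks, one RQ stack per chunk."""
    chunk = len(vec) // len(branch_codebooks)
    tokens, recon = [], []
    for b, codebooks in enumerate(branch_codebooks):
        part = vec[b * chunk:(b + 1) * chunk]
        idxs, part_recon = residual_quantize(part, codebooks)
        tokens.append(idxs)
        recon.extend(part_recon)
    return tokens, recon


def random_codebook(rng, size, dim, scale):
    """Hypothetical stand-in for a learned codebook."""
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(size)]


rng = random.Random(0)
DIM, BRANCHES, LEVELS, CODES = 8, 2, 3, 16  # illustrative sizes only
branch_codebooks = [
    [random_codebook(rng, CODES, DIM // BRANCHES, 0.5 ** level)
     for level in range(LEVELS)]
    for _ in range(BRANCHES)
]
latent = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
tokens, recon = product_residual_quantize(latent, branch_codebooks)
print(tokens)  # one list of LEVELS codebook indices per branch
```

Each latent vector ends up as BRANCHES x LEVELS discrete indices, and the reconstruction is the concatenation of the per-branch sums of the selected codewords. Swapping `nearest` for an LFQ or BSQ unit would change only how a single level picks its code.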

Quick Start & Requirements

Install dependencies with: conda env create -f environment.yml
Requires PyTorch and a CUDA-enabled GPU for training.
Datasets should be organized in a format compatible with torchvision.datasets.ImageFolder.
Pre-trained tokenizers and generators are available on Hugging Face.
Links: ImageFolder Project Page, XQ-GAN Weights, RobustTok Weights
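For the dataset requirement above, torchvision.datasets.ImageFolder expects one subdirectory per class under the dataset root, with that class's images inside it. The standard-library sketch below checks such a layout before training. The helper name, the extension subset, and the file names are assumptions for illustration and are not part of the repository.

```python
import tempfile
from pathlib import Path

# Assumption: a subset of the image extensions torchvision accepts by default.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def summarize_layout(root):
    """Return {class_name: image_count} for a root/<class>/<image> tree."""
    summary = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        summary[class_dir.name] = sum(
            1 for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTS
        )
    return summary


# Build a tiny throwaway tree to show the expected shape.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("cat/a.jpg", "cat/b.png", "dog/c.jpeg", "dog/notes.txt"):
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    layout = summarize_layout(tmp)

print(layout)  # {'cat': 2, 'dog': 1}
```

A class folder that reports zero images is worth fixing before launching a training run.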

Highlighted Details

Reports state-of-the-art results on ImageNet, with pFID scores as low as 2.28 (VP+LP) and 2.60 for VAR generators.
Supports fine-tuning with 'full', 'lora', or 'frozen' encoder/decoder tuning methods.
Integrates semantic regularization using DINOv2 or CLIP for improved generation quality.
Offers multi-scale quantization and product quantization for efficient tokenization.
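The three tuning modes listed above differ mainly in how many encoder/decoder weights are updated. The sketch below uses the generic LoRA formulation (a frozen weight W plus a scaled low-rank update B A). Whether the repository's 'lora' option follows exactly this form is not stated in the summary, so treat the function names, shapes, and scaling as assumptions.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def trainable_param_count(mode, d_out, d_in, rank):
    """Trainable weights for one d_out x d_in linear layer under each mode."""
    if mode == "full":
        return d_out * d_in            # every weight is updated
    if mode == "lora":
        return rank * (d_in + d_out)   # only A (rank x d_in) and B (d_out x rank)
    if mode == "frozen":
        return 0                       # the layer is left untouched
    raise ValueError(f"unknown tuning mode: {mode!r}")


def lora_forward(x, w, a, b, alpha):
    """y = W x + (alpha / r) * B (A x); W stays fixed, only A and B would train."""
    rank = len(a)
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [y + (alpha / rank) * u for y, u in zip(base, update)]


w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2 x 3 weight (hypothetical)
a = [[0.5, 0.5, 0.5]]                   # rank-1 down-projection
b_zero = [[0.0], [0.0]]                 # B starts at zero in standard LoRA
x = [1.0, 2.0, 3.0]

print(lora_forward(x, w, a, b_zero, alpha=2.0))  # [1.0, 2.0]
print(trainable_param_count("lora", d_out=2, d_in=3, rank=1))  # 5
```

Because B starts at zero, the adapted layer initially reproduces the pre-trained output, which is the usual reason LoRA is a low-risk choice for fine-tuning a pre-trained tokenizer.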

Maintenance & Community

The project is associated with Adobe Research and has recent updates, including the release of RobustTok code. It cites related work from FoundationVision and ControlVAR.

Licensing & Compatibility

The README does not state a license. The project's academic nature and its Hugging Face weight releases suggest it is intended for research use, but that is an inference rather than a grant of rights. Commercial use would require clarification from the authors.

Limitations & Caveats

Training commands and configurations are provided, but dataset preparation and environment setup may need careful attention. The README notes that rFID (reconstruction FID) does not always correlate with gFID (generation FID), so the checkpoint with the best rFID is not necessarily the best one for generation. Keep this in mind when deciding which checkpoints to save.

