ControlAR by hustvl

Research paper for controllable image generation using autoregressive models

Created 1 year ago

316 stars

Top 85.6% on SourcePulse

Project Summary

ControlAR introduces a novel conditional decoding strategy for autoregressive models, enabling controllable image generation with spatial conditioning. It targets researchers and practitioners in generative AI who seek to integrate fine-grained control into autoregressive pipelines, offering an alternative to diffusion-based methods.

How It Works

ControlAR treats spatial control as a sequence-to-sequence problem, integrating conditioning information directly into the autoregressive generation process without requiring special tokens or resolution-aware prompts. This approach allows for arbitrary-resolution image generation and offers flexibility in handling various control modalities like edges, depth maps, and segmentation masks.
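The sequence view described above can be illustrated with a toy decoding loop. This is a minimal sketch, not the repository's implementation. It assumes that each control token is fused with the previous image token by element-wise addition, and it uses a deterministic stand-in (`toy_next_token`) in place of a real transformer. All function names here are hypothetical.

```python
def fuse(image_emb, control_emb):
    """Element-wise sum of one image-token embedding and one control-token embedding.
    Additive fusion is an assumption made for this sketch."""
    return [a + b for a, b in zip(image_emb, control_emb)]


def toy_next_token(context, vocab_size):
    """Stand-in for a transformer: maps the fused context to a token id deterministically."""
    score = sum(sum(vec) for vec in context)
    return int(abs(score) * 1000) % vocab_size


def generate(control_embs, embed_table, start_emb):
    """Emit one image token per control token, so the control sequence sets the output length."""
    vocab_size = len(embed_table)
    context, tokens = [], []
    prev = start_emb
    for ctrl in control_embs:
        # Condition this step on the previous image token fused with the control token.
        context.append(fuse(prev, ctrl))
        tok = toy_next_token(context, vocab_size)
        tokens.append(tok)
        prev = embed_table[tok]
    return tokens


# Hypothetical toy data: a 4-entry codebook and 6 control tokens of dimension 2.
embed_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
control_embs = [[0.01 * i, 0.02 * i] for i in range(6)]
tokens = generate(control_embs, embed_table, start_emb=[0.0, 0.0])
```

Because the loop runs once per control token, a larger control map yields a longer image-token sequence without any extra resolution prompt, which is the property the summary attributes to the method.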

Quick Start & Requirements

Install: Clone the repository, create a Conda environment with Python 3.10, install PyTorch 2.1.2 (cu118 build), then run pip install -r requirements.txt. Additional dependencies: openmim, mmengine, mmcv==2.1.0, mmsegmentation>=1.0.0, and mmdet.
Prerequisites: CUDA 11.8 is recommended for PyTorch installation. Datasets (ImageNet, ADE20K, COCOStuff, MultiGen-20M) need to be downloaded and preprocessed.
Resources: Requires significant disk space for datasets (over 1TB for MultiGen-20M).
Links: arXiv Paper, HuggingFace Demo, HuggingFace Checkpoints.
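After installation, a short script can confirm that the environment matches the version pins listed above. The following is a sketch that uses only the Python standard library. The `EXPECTED` table restates the pins from the install notes, the helper names are hypothetical, and the version parsing is deliberately simple (it ignores local build tags such as `+cu118` and does not handle pre-release suffixes).

```python
from importlib import metadata

# Pins taken from the install notes; None means "any version".
EXPECTED = {
    "torch": ("==", "2.1.2"),
    "mmcv": ("==", "2.1.0"),
    "mmsegmentation": (">=", "1.0.0"),
    "mmengine": None,
    "mmdet": None,
    "openmim": None,
}


def parse(version):
    """Turn '2.1.2+cu118' into (2, 1, 2), dropping the local build tag."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def satisfies(installed, rule):
    """Check an installed version string against an (operator, version) rule."""
    if rule is None:
        return True
    op, wanted = rule
    have, want = parse(installed), parse(wanted)
    return have == want if op == "==" else have >= want


def report():
    """Return one status line per expected package."""
    lines = []
    for name, rule in EXPECTED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            lines.append(f"{name}: not installed")
            continue
        status = "ok" if satisfies(installed, rule) else "version mismatch"
        lines.append(f"{name}: {installed} ({status})")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Running the script inside the Conda environment prints one line per package, which makes a missing or mismatched dependency visible before any dataset preprocessing starts.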

Highlighted Details

Supports arbitrary-resolution image generation with autoregressive models.
Treats spatial controls as token sequences that are integrated directly into autoregressive decoding.
Offers flexibility with multiple control types (Canny, HED, Lineart, Depth, Segmentation Masks).
Recent updates include a control strength factor and larger control encoders.
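Two of the points above, arbitrary resolution and the control strength factor, can be made concrete with a small sketch. Both helpers are hypothetical. The 16-pixel tokenizer stride is an assumption chosen for illustration, and scaling the control embedding before an additive fusion is only one plausible way such a factor could act; the summary does not say how the repository applies it.

```python
def token_grid(height, width, patch=16):
    """Rows and columns of image tokens for a given resolution.
    The default stride of 16 pixels per token is an assumption."""
    if height % patch or width % patch:
        raise ValueError("height and width must be multiples of the patch size")
    return height // patch, width // patch


def apply_strength(image_emb, control_emb, strength=1.0):
    """Add a control embedding to an image-token embedding, scaled by `strength`.
    strength=0.0 ignores the control signal; larger values weight it more."""
    return [x + strength * c for x, c in zip(image_emb, control_emb)]


rows, cols = token_grid(512, 768)            # non-square input
weak = apply_strength([1.0, 2.0], [0.5, -1.0], strength=0.0)
strong = apply_strength([1.0, 2.0], [0.5, -1.0], strength=2.0)
```

Under these assumptions a 512x768 input maps to a 32x48 token grid, and a strength of zero leaves the image-token embedding unchanged.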

Maintenance & Community

The project is associated with authors from Huazhong University of Science and Technology and The University of Hong Kong. It has been accepted to ICLR 2025.

Licensing & Compatibility

The README does not state a license, so the terms for commercial use or closed-source linking are unspecified.

Limitations & Caveats

The README's table marks some control types (e.g., HED Edge, Segmentation Mask) as not supporting arbitrary-resolution generation. Training code and details are available, but the README focuses mainly on inference and evaluation.

Explore Similar Projects

Autoregressive-Models-in-Vision-Survey by ChaofanTao

UniWorld by PKU-YuanGroup

UltraPixel by catcathh

qwen2vl-flux by erwold

SemanticStyleGAN by seasonSH

Liquid by FoundationVision

Lumina-mGPT-2.0 by Alpha-VLLM

MultiDiffusion by omerbt

LlamaGen by FoundationVision

DemoFusion by PRIS-CV

image-gpt by openai

IP-Adapter by tencent-ailab