Research paper for controllable image generation using autoregressive models
ControlAR introduces a novel conditional decoding strategy for autoregressive models, enabling controllable image generation with spatial conditioning. It targets researchers and practitioners in generative AI who seek to integrate fine-grained control into autoregressive pipelines, offering an alternative to diffusion-based methods.
How It Works
ControlAR treats spatial control as a sequence-to-sequence problem, integrating conditioning information directly into the autoregressive generation process without requiring special tokens or resolution-aware prompts. This approach allows for arbitrary-resolution image generation and offers flexibility in handling various control modalities like edges, depth maps, and segmentation masks.
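As a rough illustration of the idea above, the following minimal sketch shows one way conditioning could be folded into autoregressive decoding: the control map is flattened into one control vector per position, and each step fuses the previous image token's embedding with the control vector for the position being predicted. This is a toy under stated assumptions, not the repository's code. The names control_to_tokens, fuse, generate, toy_embed, and toy_predict are hypothetical, fusion by element-wise addition is an assumption of the sketch, and the "model" is a stand-in arithmetic function rather than a trained network.

```python
from typing import Callable, List, Sequence

Vector = List[float]
VOCAB_SIZE = 16  # toy vocabulary size (assumption for this sketch)
DIM = 4          # toy embedding width (assumption for this sketch)


def control_to_tokens(control_map: Sequence[Sequence[float]]) -> List[Vector]:
    """Flatten an H x W control map into one DIM-wide vector per position (raster order)."""
    return [[float(value)] * DIM for row in control_map for value in row]


def fuse(image_emb: Vector, control_emb: Vector) -> Vector:
    """Assumed fusion rule: element-wise addition of image and control embeddings."""
    return [a + b for a, b in zip(image_emb, control_emb)]


def generate(
    control_map: Sequence[Sequence[float]],
    embed: Callable[[int], Vector],
    predict_next: Callable[[List[Vector]], int],
    start_id: int = 0,
) -> List[int]:
    """Emit one image token per control position, left to right."""
    history: List[Vector] = []  # fused inputs seen so far
    tokens: List[int] = []
    previous = start_id
    for control_emb in control_to_tokens(control_map):
        # Condition this step on the control signal for the position being predicted.
        history.append(fuse(embed(previous), control_emb))
        previous = predict_next(history)
        tokens.append(previous)
    return tokens


def toy_embed(token_id: int) -> Vector:
    """Stand-in embedding table: repeat the token id DIM times."""
    return [float(token_id)] * DIM


def toy_predict(history: List[Vector]) -> int:
    """Stand-in for a trained model: hash the fused history into a token id."""
    return int(sum(sum(vec) for vec in history)) % VOCAB_SIZE


edges_2x2 = [[1, 1], [1, 1]]
edges_3x5 = [[0, 1, 0, 1, 0]] * 3
print(len(generate(edges_2x2, toy_embed, toy_predict)))
print(len(generate(edges_3x5, toy_embed, toy_predict)))
```

Because the loop runs once per control position, the output length follows the size of the control map, which is one way to picture how resolution can vary without special tokens. No extra tokens are prepended to the sequence in this sketch; the control signal only enters through the fused inputs.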
Quick Start & Requirements
Set up an environment (python=3.10), install PyTorch (2.1.2+cu118), and then install the requirements (pip install -r requirements.txt). Additional dependencies include openmim, mmengine, mmcv==2.1.0, mmsegmentation>=1.0.0, and mmdet.
Highlighted Details
Maintenance & Community
The project is associated with authors from Huazhong University of Science and Technology and The University of Hong Kong. It has been accepted to ICLR 2025.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The provided table marks some control types (e.g., HED Edge, Segmentation Mask) as not supporting arbitrary-resolution generation. Training details and code are available, but the README focuses primarily on inference and evaluation.