ControlAR by hustvl

Code for the research paper on controllable image generation using autoregressive models

Created 11 months ago
288 stars

Top 91.2% on SourcePulse

Project Summary

ControlAR introduces a novel conditional decoding strategy for autoregressive models, enabling controllable image generation with spatial conditioning. It targets researchers and practitioners in generative AI who seek to integrate fine-grained control into autoregressive pipelines, offering an alternative to diffusion-based methods.

How It Works

ControlAR treats spatial control as a sequence-to-sequence problem, integrating conditioning information directly into the autoregressive generation process without requiring special tokens or resolution-aware prompts. This approach allows for arbitrary-resolution image generation and offers flexibility in handling various control modalities like edges, depth maps, and segmentation masks.
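The summary describes conditional decoding only at a high level. The toy sketch below illustrates the general idea under stated assumptions and is not the repository's code. It assumes that each control token is fused with the input at the matching sequence position, using simple addition as an illustrative choice, instead of being prepended as a prefix. A linear scorer over the previous token stands in for the autoregressive transformer. `strength` is a hypothetical multiplier in the spirit of the control strength factor mentioned under Highlighted Details. `sequence_length` contrasts the number of positions processed with the prefix alternative, assuming one control token per image token.

```python
import random
from typing import List, Sequence


def add_scaled(a: Sequence[float], b: Sequence[float], scale: float) -> List[float]:
    """Element-wise a + scale * b."""
    return [x + scale * y for x, y in zip(a, b)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def conditional_decode(
    control_tokens: Sequence[Sequence[float]],
    embed: Sequence[Sequence[float]],
    out_proj: Sequence[Sequence[float]],
    bos: Sequence[float],
    strength: float = 1.0,  # hypothetical control-strength multiplier
) -> List[int]:
    """Toy greedy decoding with one step per control token.

    At step t the model input is the previous token's embedding fused
    (here: added) with control token t, so the control signal does not
    occupy extra sequence positions. A linear scorer over that single
    fused vector stands in for the real transformer.
    """
    tokens: List[int] = []
    prev = list(bos)
    for c_t in control_tokens:
        h = add_scaled(prev, c_t, strength)  # per-position fusion
        logits = [dot(w, h) for w in out_proj]
        next_tok = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_tok)
        prev = list(embed[next_tok])
    return tokens


def sequence_length(num_image_tokens: int, mode: str) -> int:
    """Positions processed when each image token has one control token."""
    if mode == "prefix":       # control tokens prepended before image tokens
        return 2 * num_image_tokens
    if mode == "conditional":  # control fused into each decoding step
        return num_image_tokens
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    random.seed(0)
    vocab, dim = 4, 4
    out_proj = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(vocab)]
    embed = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab)]
    targets = [2, 0, 3, 1]
    # Strong one-hot control tokens steer each step toward its target token.
    controls = [[10.0 if j == t else 0.0 for j in range(dim)] for t in targets]
    print(conditional_decode(controls, embed, out_proj, bos=[0.0] * dim))  # → [2, 0, 3, 1]
```

The design point the toy makes is that the control input changes *what* each step sees without lengthening the sequence, which is why fusion-style conditioning avoids the doubled sequence length of a control prefix.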

Quick Start & Requirements

  • Install: Clone the repository, create a Conda environment (python=3.10), install PyTorch (2.1.2+cu118), and then install requirements (pip install -r requirements.txt). Additional dependencies include openmim, mmengine, mmcv==2.1.0, mmsegmentation>=1.0.0, and mmdet.
  • Prerequisites: CUDA 11.8 is recommended for PyTorch installation. Datasets (ImageNet, ADE20K, COCOStuff, MultiGen-20M) need to be downloaded and preprocessed.
  • Resources: Requires significant disk space for datasets (over 1TB for MultiGen-20M).
  • Links: arXiv Paper, HuggingFace Demo, HuggingFace Checkpoints.
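Because the install steps pin exact versions, a quick check before running inference can catch a mismatched environment. The script below is a hypothetical helper, not part of the repository. It reads installed versions with the standard-library `importlib.metadata` and compares them against the pins listed above (Python 3.10, torch 2.1.2 built for cu118, mmcv==2.1.0, mmsegmentation>=1.0.0). Its version parser is deliberately simple: it keeps leading digits of each dot-separated part and drops local suffixes such as `+cu118`.

```python
import sys
from importlib import metadata
from itertools import takewhile
from typing import Dict, Optional, Tuple

# Pins as listed in the Quick Start; update if the repository changes them.
REQUIREMENTS = {
    "torch": ("==", "2.1.2"),
    "mmcv": ("==", "2.1.0"),
    "mmsegmentation": (">=", "1.0.0"),
}


def parse_version(v: str) -> Tuple[int, ...]:
    """'2.1.2+cu118' -> (2, 1, 2); '1.0.0rc1' -> (1, 0, 0). Simplified parsing."""
    public = v.split("+", 1)[0]  # drop the local version suffix
    parts = []
    for piece in public.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed: str, op: str, required: str) -> bool:
    a, b = parse_version(installed), parse_version(required)
    n = max(len(a), len(b))
    a += (0,) * (n - len(a))  # pad so (2, 1) compares equal to (2, 1, 0)
    b += (0,) * (n - len(b))
    if op == "==":
        return a == b
    if op == ">=":
        return a >= b
    raise ValueError(f"unsupported operator: {op}")


def installed_version(package: str) -> Optional[str]:
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> Dict[str, bool]:
    """Map each requirement to whether the current environment meets it."""
    results = {"python==3.10": sys.version_info[:2] == (3, 10)}
    for pkg, (op, req) in REQUIREMENTS.items():
        ver = installed_version(pkg)
        results[f"{pkg}{op}{req}"] = ver is not None and satisfies(ver, op, req)
    torch_ver = installed_version("torch")
    results["torch+cu118"] = torch_ver is not None and torch_ver.endswith("+cu118")
    return results


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{'OK  ' if ok else 'FAIL'} {requirement}")
```

For anything beyond a quick sanity check, the `packaging` library's version handling is more robust than this simplified parser.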

Highlighted Details

  • Supports arbitrary-resolution image generation with autoregressive models.
  • Integrates spatial controls from a sequence perspective.
  • Offers flexibility with multiple control types (Canny, HED, Lineart, Depth, Segmentation Masks).
  • Recent updates include a control strength factor and larger control encoders.
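One way to read the arbitrary-resolution claim: if decoding runs one step per control token, then the control map's size sets the image-token grid and therefore the output resolution. The arithmetic below assumes a VQ tokenizer with a 16x spatial downsampling factor. That factor is an assumption here, so check the tokenizer configuration actually used. `token_grid` is a hypothetical helper for illustration.

```python
from typing import Tuple


def token_grid(height: int, width: int, downsample: int = 16) -> Tuple[int, int, int]:
    """Return (rows, cols, total tokens) for an image of the given pixel size.

    Assumes the tokenizer maps each downsample x downsample patch to one token.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be multiples of the downsampling factor")
    rows, cols = height // downsample, width // downsample
    return rows, cols, rows * cols


if __name__ == "__main__":
    print(token_grid(512, 512))  # → (32, 32, 1024)
    print(token_grid(512, 768))  # → (32, 48, 1536)
```

Under these assumptions, a 512x768 control map would require 1,536 decoding steps rather than the 1,024 of a square 512x512 input.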

Maintenance & Community

The project is associated with authors from Huazhong University of Science and Technology and The University of Hong Kong. It has been accepted to ICLR 2025.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README's table marks some control types (e.g., HED Edge, Segmentation Mask) as not supporting arbitrary-resolution generation. Training code and instructions are available, but the README focuses primarily on inference and evaluation.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days
