ControlAR by hustvl

Research paper for controllable image generation using autoregressive models

Created 1 year ago
303 stars

Top 88.1% on SourcePulse

Project Summary

ControlAR introduces a novel conditional decoding strategy for autoregressive models that enables controllable image generation with spatial conditioning. It targets researchers and practitioners in generative AI who want fine-grained control in autoregressive pipelines, and it offers an alternative to diffusion-based methods.

How It Works

ControlAR treats spatial control as a sequence-to-sequence problem, integrating conditioning information directly into the autoregressive generation process without requiring special tokens or resolution-aware prompts. This approach allows for arbitrary-resolution image generation and offers flexibility in handling various control modalities like edges, depth maps, and segmentation masks.
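The sequence view of conditioning can be illustrated with a toy decoding loop. This is a minimal sketch under stated assumptions, not ControlAR's code: it assumes each control token is added to the input embedding of the image token at the same raster position (in the spirit of a positional encoding), and that the model predicts the next token from the whole fused prefix. `fuse`, `conditional_decode`, `toy_embed`, `toy_predict`, and the `strength` argument are hypothetical names; `strength` only loosely mirrors the control strength factor listed under Highlighted Details, and how ControlAR actually applies that factor is not described here.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def fuse(image_emb: Vector, control_emb: Vector, strength: float = 1.0) -> Vector:
    """Hypothetical per-token fusion: add the scaled control embedding to the image embedding."""
    if len(image_emb) != len(control_emb):
        raise ValueError("image and control embeddings must have the same size")
    return [x + strength * c for x, c in zip(image_emb, control_emb)]


def conditional_decode(
    control_tokens: Sequence[Vector],
    embed: Callable[[int], Vector],
    predict_next: Callable[[List[Vector]], int],
    start_token: int,
    strength: float = 1.0,
) -> List[int]:
    """Generate exactly one image token per control token, in raster order."""
    tokens: List[int] = [start_token]
    fused_prefix: List[Vector] = []
    for control in control_tokens:
        # Assumption: the input at this step is the latest image token fused with
        # the control token of the position being generated.
        fused_prefix.append(fuse(embed(tokens[-1]), control, strength))
        # The autoregressive model conditions on the whole fused prefix.
        tokens.append(predict_next(fused_prefix))
    return tokens[1:]  # drop the start token


# Toy stand-ins so the sketch runs; a real model would be a transformer.
def toy_embed(token: int) -> Vector:
    return [float(token), 0.0]


def toy_predict(prefix: List[Vector]) -> int:
    last = prefix[-1]
    return int(round(last[0] + last[1])) % 16


controls = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]
print(conditional_decode(controls, toy_embed, toy_predict, start_token=0))                # [1, 3, 6]
print(conditional_decode(controls, toy_embed, toy_predict, start_token=0, strength=0.0))  # [0, 0, 0]
```

With `strength=0.0` the control signal vanishes and the toy model ignores it entirely, which is the intuition behind exposing a strength knob at all.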

Quick Start & Requirements

  • Install: Clone the repository, create a Conda environment with Python 3.10, install PyTorch 2.1.2 built for CUDA 11.8 (2.1.2+cu118), then install the requirements with pip install -r requirements.txt. Additional dependencies: openmim, mmengine, mmcv==2.1.0, mmsegmentation>=1.0.0, and mmdet.
  • Prerequisites: CUDA 11.8 is recommended for PyTorch installation. Datasets (ImageNet, ADE20K, COCOStuff, MultiGen-20M) need to be downloaded and preprocessed.
  • Resources: Requires significant disk space for datasets (over 1TB for MultiGen-20M).
  • Links: arXiv Paper, HuggingFace Demo, HuggingFace Checkpoints.
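Before downloading the datasets, the version pins and disk requirement above can be checked with a small standard-library script. This is a sketch under our own assumptions: the pin list is transcribed from the bullets (Python 3.10, torch 2.1.2, mmcv 2.1.0, mmsegmentation>=1.0.0), the 1 TB free-space floor is read off the MultiGen-20M note, and the helper names (`parse_version`, `satisfies`, `check_environment`, `has_room_for_datasets`) are hypothetical, not part of the ControlAR repository. The version parser is deliberately simplified and is not a full PEP 440 implementation.

```python
import re
import shutil
import sys
from importlib import metadata
from typing import Callable, List, Tuple

# Pins transcribed from the Quick Start notes (our own encoding of them).
REQUIRED_PYTHON = (3, 10)
REQUIREMENTS = [
    ("torch", "==", "2.1.2"),  # the installed build is expected to be 2.1.2+cu118
    ("mmcv", "==", "2.1.0"),
    ("mmsegmentation", ">=", "1.0.0"),
]
MIN_FREE_BYTES = 10**12  # "over 1TB" for MultiGen-20M; 1 TB is treated as a floor


def parse_version(version: str) -> Tuple[int, ...]:
    """Numeric release parts only: '2.1.2+cu118' -> (2, 1, 2)."""
    release = version.split("+", 1)[0]  # drop a local tag such as +cu118
    parts: List[int] = []
    for piece in release.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # e.g. '0rc1': keep 0, then stop
            break
    return tuple(parts)


def satisfies(installed: str, op: str, required: str) -> bool:
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so (1, 0) compares equal to (1, 0, 0)
    b += (0,) * (width - len(b))
    if op == "==":
        return a == b
    if op == ">=":
        return a >= b
    raise ValueError(f"unsupported operator: {op}")


def check_environment(
    get_version: Callable[[str], str] = metadata.version,
    python_version: Tuple[int, ...] = sys.version_info[:2],
) -> List[str]:
    """Return human-readable problems; an empty list means every pin is met."""
    problems: List[str] = []
    if tuple(python_version) != REQUIRED_PYTHON:
        problems.append(f"python {tuple(python_version)} != {REQUIRED_PYTHON}")
    for name, op, required in REQUIREMENTS:
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not satisfies(installed, op, required):
            problems.append(f"{name} {installed} does not satisfy {op}{required}")
    return problems


def has_room_for_datasets(path: str = ".", min_free: int = MIN_FREE_BYTES) -> bool:
    """True if the filesystem holding `path` has at least `min_free` bytes free."""
    return shutil.disk_usage(path).free >= min_free


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
    if not has_room_for_datasets():
        print("WARNING: less than 1 TB free; MultiGen-20M may not fit")
```

Passing `get_version` and `python_version` as parameters keeps the check testable without a real CUDA install.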

Highlighted Details

  • Supports arbitrary-resolution image generation with autoregressive models.
  • Integrates spatial controls from a sequence perspective.
  • Offers flexibility with multiple control types (Canny, HED, Lineart, Depth, Segmentation Masks).
  • Recent updates include a control strength factor and larger control encoders.
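One way to see why a sequence view permits arbitrary resolutions: if the control map is tokenized on the same grid as the output image, the number of decoding steps is simply the number of control tokens, so the control input fixes the output size. The sketch below assumes a tokenizer with 16× spatial downsampling and raster (row-major) token order; both are illustrative assumptions, and `token_grid`, `raster_order`, and `grid_position` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def token_grid(height: int, width: int, patch: int = 16) -> Tuple[int, int]:
    """Token grid (rows, cols) for an image, assuming `patch`x downsampling."""
    if height % patch or width % patch:
        raise ValueError("height and width must be multiples of the patch size")
    return height // patch, width // patch


def raster_order(grid: Sequence[Sequence[int]]) -> List[int]:
    """Flatten a 2-D grid of control tokens row by row into a 1-D sequence."""
    return [tok for row in grid for tok in row]


def grid_position(index: int, cols: int) -> Tuple[int, int]:
    """Map a position in the raster sequence back to (row, col)."""
    return divmod(index, cols)


rows, cols = token_grid(512, 768)
print(rows, cols, rows * cols)  # 32 48 1536
```

Under these assumptions a 512×768 control map yields 1,536 decoding steps, versus 1,024 for 512×512; the decoding loop itself does not change, only its length.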

Maintenance & Community

The project is associated with authors from Huazhong University of Science and Technology and The University of Hong Kong. It has been accepted to ICLR 2025.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README's table marks some control types (e.g., HED Edge, Segmentation Mask) as not supporting arbitrary-resolution generation. Training details and code are available, but the README focuses mainly on inference and evaluation.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 10 stars in the last 30 days
