ControlAR by hustvl

Code for the research paper on controllable image generation using autoregressive models

created 10 months ago
282 stars

Project Summary

ControlAR introduces a novel conditional decoding strategy for autoregressive models, enabling controllable image generation with spatial conditioning. It targets researchers and practitioners in generative AI who seek to integrate fine-grained control into autoregressive pipelines, offering an alternative to diffusion-based methods.

How It Works

ControlAR approaches spatial control from a sequence perspective. A lightweight control encoder converts the spatial input into control tokens. During decoding, each image token is generated from a per-token fusion of control and image tokens, so the conditioning enters the autoregressive generation process directly instead of being prefilled as a prompt prefix. No special tokens or resolution-aware prompts are required. This design supports arbitrary-resolution image generation and handles various control modalities such as edges, depth maps, and segmentation masks.
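Feeding conditioning directly into autoregressive generation can be illustrated with a toy decoder. The sketch below is a minimal illustration, not ControlAR's implementation. It assumes that the control encoder has already produced one control vector per image-token position and that fusion is element-wise addition, in the spirit of a positional encoding. `embed`, `predict_next`, and the `strength` multiplier are hypothetical stand-ins for the token embedding table, the transformer, and a control strength factor.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def fuse(image_emb: Vector, control_emb: Vector, strength: float = 1.0) -> Vector:
    """Per-token fusion, assumed here to be scaled addition (like a positional encoding)."""
    if len(image_emb) != len(control_emb):
        raise ValueError("image and control embeddings must have the same width")
    return [i + strength * c for i, c in zip(image_emb, control_emb)]


def conditional_decode(
    embed: Callable[[int], Vector],
    predict_next: Callable[[List[Vector]], int],
    control_tokens: Sequence[Vector],
    start_token: int,
    strength: float = 1.0,
) -> List[int]:
    """Greedy AR decoding: the input at step t (embedding of the previous token)
    is fused with control token t before the model predicts image token t."""
    tokens = [start_token]
    history: List[Vector] = []  # fused inputs seen so far (the model's context)
    for control in control_tokens:
        history.append(fuse(embed(tokens[-1]), control, strength))
        tokens.append(predict_next(history))
    return tokens[1:]  # drop the start token: one image token per control token


def toy_embed(token: int) -> Vector:
    """Hypothetical 1-D embedding table."""
    return [0.5 * token]


def toy_predict_next(history: List[Vector]) -> int:
    """Hypothetical 'model': emit 1 when the latest fused input reaches 1.0."""
    return 1 if history[-1][0] >= 1.0 else 0


if __name__ == "__main__":
    controls = [[1.0], [0.0], [1.0]]
    print(conditional_decode(toy_embed, toy_predict_next, controls, start_token=0))  # [1, 0, 1]
    print(conditional_decode(toy_embed, toy_predict_next, controls, start_token=0, strength=0.0))  # [0, 0, 0]
```

With `strength=1.0` the decoded tokens follow the control pattern; with `strength=0.0` the control contribution vanishes and the toy model falls back to its unconditioned output.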

Quick Start & Requirements

  • Install: Clone the repository, create a Conda environment (python=3.10), install PyTorch (2.1.2+cu118), and then install requirements (pip install -r requirements.txt). Additional dependencies include openmim, mmengine, mmcv==2.1.0, mmsegmentation>=1.0.0, and mmdet.
  • Prerequisites: CUDA 11.8 is recommended for PyTorch installation. Datasets (ImageNet, ADE20K, COCOStuff, MultiGen-20M) need to be downloaded and preprocessed.
  • Resources: Requires significant disk space for datasets (over 1TB for MultiGen-20M).
  • Links: arXiv Paper, HuggingFace Demo, HuggingFace Checkpoints.
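The pins above are strict: a specific CUDA 11.8 build of PyTorch and an exact `mmcv` version. For that reason, it can help to check an environment before running anything. The following is a minimal sketch that is not part of the repository. The pin list is copied from the Quick Start above (the repository's `requirements.txt` may pin more), and `check_environment` and its helpers are hypothetical. The sketch relies on the convention that CUDA builds of PyTorch carry a local version tag such as `+cu118`.

```python
import re
from typing import Dict, List, Optional, Tuple

# Pins copied from the Quick Start summary; requirements.txt may add more.
REQUIREMENTS = [
    ("python", "==", "3.10"),
    ("torch", "==", "2.1.2"),
    ("mmcv", "==", "2.1.0"),
    ("mmsegmentation", ">=", "1.0.0"),
]
EXPECTED_CUDA_TAG = "cu118"  # local tag on CUDA 11.8 PyTorch wheels


def parse_version(text: str) -> Tuple[int, ...]:
    """'2.1.2+cu118' -> (2, 1, 2): keep only the leading dotted numeric part."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def local_tag(text: str) -> Optional[str]:
    """'2.1.2+cu118' -> 'cu118'; None when there is no '+local' suffix."""
    return text.split("+", 1)[1] if "+" in text else None


def _pad(version: Tuple[int, ...], width: int) -> Tuple[int, ...]:
    return version + (0,) * (width - len(version))


def satisfies(installed: str, op: str, required: str) -> bool:
    have, want = parse_version(installed), parse_version(required)
    width = max(len(have), len(want))
    if op == "==":
        # Prefix match, so "==3.10" behaves like "==3.10.*" (a simplification of PEP 440).
        return _pad(have, width)[: len(want)] == want
    if op == ">=":
        return _pad(have, width) >= _pad(want, width)
    raise ValueError(f"unsupported operator: {op!r}")


def check_environment(installed: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means every pin is met."""
    problems = []
    for name, op, required in REQUIREMENTS:
        version = installed.get(name)
        if version is None:
            problems.append(f"{name}: not installed (need {op}{required})")
        elif not satisfies(version, op, required):
            problems.append(f"{name}: found {version}, need {op}{required}")
    torch_version = installed.get("torch")
    if torch_version is not None and local_tag(torch_version) != EXPECTED_CUDA_TAG:
        problems.append(f"torch: built for {local_tag(torch_version)}, expected {EXPECTED_CUDA_TAG}")
    return problems


if __name__ == "__main__":
    sample = {"python": "3.10.14", "torch": "2.1.2+cu118", "mmcv": "2.1.0", "mmsegmentation": "1.2.2"}
    print(check_environment(sample))  # []
```

In a live environment, the dictionary could be filled from `platform.python_version()`, `torch.__version__`, and `importlib.metadata.version("mmcv")` / `importlib.metadata.version("mmsegmentation")`.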

Highlighted Details

  • Supports arbitrary-resolution image generation with autoregressive models.
  • Integrates spatial controls from a sequence perspective.
  • Offers flexibility with multiple control types (Canny, HED, Lineart, Depth, Segmentation Masks).
  • Recent updates include a control strength factor and larger control encoders.
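The arbitrary-resolution and sequence-perspective points can be read as sequence-length arithmetic. The sketch below assumes an image tokenizer that downsamples by 16 in each spatial dimension and emits tokens in raster order. The factor of 16 is a common choice for VQ tokenizers in autoregressive image models, but it is an assumption here and was not read from the ControlAR README. Under those assumptions, a control map at the target resolution yields exactly one control token per image token, so changing the resolution only changes the sequence length. The helper names are hypothetical.

```python
from typing import Tuple

DOWNSAMPLE = 16  # assumed tokenizer stride per spatial dimension (hypothetical)


def token_grid(height: int, width: int, downsample: int = DOWNSAMPLE) -> Tuple[int, int]:
    """Rows and columns of the token grid for an image of the given size."""
    if height % downsample or width % downsample:
        raise ValueError(f"{height}x{width} is not a multiple of {downsample}")
    return height // downsample, width // downsample


def sequence_length(height: int, width: int, downsample: int = DOWNSAMPLE) -> int:
    """Number of image tokens to decode, and hence of aligned control tokens."""
    rows, cols = token_grid(height, width, downsample)
    return rows * cols


def raster_index(row: int, col: int, cols: int) -> int:
    """Position of grid cell (row, col) in the flattened raster-order sequence."""
    return row * cols + col


if __name__ == "__main__":
    print(sequence_length(512, 512))  # 32 x 32 grid -> 1024
    print(sequence_length(512, 768))  # 32 x 48 grid -> 1536
    # Image token 48 and control token 48 both describe cell (1, 0) of a 48-column grid.
    print(raster_index(1, 0, cols=48))  # 48
```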

Maintenance & Community

The project is associated with authors from Huazhong University of Science and Technology and The University of Hong Kong. It has been accepted to ICLR 2025.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

A table in the README marks some control types (e.g., HED edges and segmentation masks) as not supporting arbitrary-resolution generation. Training details and code are provided, but the README focuses primarily on inference and evaluation.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 33 stars in the last 90 days
