Autoregressive-Models-in-Vision-Survey by ChaofanTao

Survey for autoregressive models in vision

Created 1 year ago

786 stars


Project Summary

This repository is a survey of autoregressive models in computer vision, aimed at researchers and practitioners. It collects recent advances, techniques, and applications of autoregressive modeling across visual tasks, and serves as a reference for this fast-moving area.

How It Works

The survey organizes autoregressive vision models by application: image generation (unconditional, text-to-image, image-to-image), video generation, 3D generation, and multimodal tasks. It also groups them by what the model predicts at each step: pixel-wise generation, token-wise generation over discrete tokens produced by a learned tokenizer such as VQ-VAE, and scale-wise generation, which predicts successively finer token maps. Throughout, it examines how autoregressive models exploit sequential dependencies by factorizing the distribution of an image into a product of per-step conditionals, and notes specific benchmarks on which they outperform other generative paradigms.
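To make the pixel-, token-, and scale-wise distinction concrete, here is a minimal, self-contained Python sketch. It is not code from the survey or from any paper it covers: the four-entry codebook, `toy_next_token_logits` (a hypothetical stand-in for a trained transformer), and all function names are assumptions for illustration. `quantize` mimics VQ-style tokenization by nearest-codebook lookup, `sample_tokens` draws tokens one at a time in raster order from p(x_t | x_<t), `sequence_log_prob` scores a sequence with the chain-rule factorization log p(x) = Σ_t log p(x_t | x_<t), and `sample_scales` sketches next-scale prediction, where each token map is conditioned on all coarser ones.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical 4-entry codebook of 2-D vectors; a real VQ-VAE learns
# a much larger codebook of higher-dimensional codes.
CODEBOOK: List[Tuple[float, float]] = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# A "model" maps the prefix of already-generated tokens to next-token logits.
Model = Callable[[List[int]], List[float]]


def quantize(features: Sequence[Sequence[float]]) -> List[int]:
    """VQ-style tokenization: replace each feature vector by the index of its nearest code."""
    indices = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, code)) for code in CODEBOOK]
        indices.append(min(range(len(CODEBOOK)), key=lambda i: dists[i]))
    return indices


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_next_token_logits(prefix: List[int]) -> List[float]:
    """Hypothetical stand-in for a trained transformer: favors repeating the last token."""
    logits = [0.0] * len(CODEBOOK)
    if prefix:
        logits[prefix[-1]] = 2.0
    return logits


def sample_tokens(num_tokens: int, model: Model, rng: random.Random) -> List[int]:
    """Token-wise generation in raster order: draw x_t ~ p(x_t | x_<t), one token per step."""
    tokens: List[int] = []
    for _ in range(num_tokens):
        probs = softmax(model(tokens))
        tokens.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return tokens


def sequence_log_prob(tokens: List[int], model: Model) -> float:
    """Chain-rule factorization: log p(x) = sum over t of log p(x_t | x_<t)."""
    total = 0.0
    for t, tok in enumerate(tokens):
        total += math.log(softmax(model(tokens[:t]))[tok])
    return total


def sample_scales(sides: Sequence[int], model: Model, rng: random.Random) -> List[List[int]]:
    """Scale-wise generation: each side x side token map is conditioned on all coarser maps.

    Simplification: every position within one scale is drawn independently from the
    same distribution given the coarser maps, since the toy model has no positional input.
    """
    maps: List[List[int]] = []
    for side in sides:
        context = [tok for m in maps for tok in m]  # flattened coarser scales
        probs = softmax(model(context))
        maps.append(rng.choices(range(len(probs)), weights=probs, k=side * side))
    return maps


if __name__ == "__main__":
    feats = [(0.1, -0.2), (0.9, 0.1), (0.2, 0.8), (1.1, 0.9)]
    print(quantize(feats))  # [0, 1, 2, 3]
    rng = random.Random(0)
    grid = sample_tokens(16, toy_next_token_logits, rng)  # a 4x4 map in raster order
    print(grid, sequence_log_prob(grid, toy_next_token_logits))
    print([len(m) for m in sample_scales([1, 2, 4], toy_next_token_logits, rng)])  # [1, 4, 16]
```

Pixel-wise generation follows the same loop as `sample_tokens`, with quantized pixel intensities (for example, 256 values per color channel) as the vocabulary instead of codebook indices. The sketch favors readability over speed: in training, real models compute all conditionals of a sequence in a single causally masked forward pass rather than one prefix at a time.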

Quick Start & Requirements

This repository is a survey and does not involve direct code execution or installation. It provides links to research papers and their associated code repositories.

Highlighted Details

Accepted to TMLR 2025.
Actively updated with new research, including papers from 2025.
Comprehensive taxonomy covering diverse visual generation tasks.
Includes links to papers, code, and related projects for each entry.

Maintenance & Community

The repository is actively maintained by the authors, who welcome contributions, feedback, and suggestions for missed papers or updates. Its contributors come from numerous academic institutions, whose affiliations are listed in the repository.

Licensing & Compatibility

The repository itself is not licensed for software use. Individual papers and code repositories linked from the survey are governed by their own licenses.

Limitations & Caveats

As a survey, this repository does not provide executable code or models. Its value lies in its curated, categorized collection of research papers; users must consult the individual linked resources for implementation details and performance results.


Explore Similar Projects

Awesome-Autoregressive-Visual-Generation by lxa9867

ControlAR by hustvl

X-Omni by X-Omni-Team

Awesome-Evaluation-of-Visual-Generation by ziqihuangg

awesome-conditional-content-generation by haofanwang

Liquid by FoundationVision

Lumina-mGPT-2.0 by Alpha-VLLM

Generative-AI by fnzhan

VBench by Vchitect

zero123 by cvlab-columbia

mmagic by open-mmlab

Janus by deepseek-ai