Autoregressive-Models-in-Vision-Survey  by ChaofanTao

Survey for autoregressive models in vision

created 1 year ago
661 stars

Top 51.7% on sourcepulse

Project Summary

This repository is a curated survey of autoregressive models in computer vision, aimed at researchers and practitioners. It consolidates recent advances, techniques, and applications of autoregressive modeling across visual tasks.

How It Works

The survey categorizes autoregressive models by their application in vision: image generation (unconditional, text-to-image, image-to-image), video generation, 3D generation, and multimodal tasks. It details three core approaches: pixel-wise generation, token-wise generation using various tokenization strategies (e.g., VQ-VAE, learned tokenizers), and scale-wise generation. In each case the model produces visual content sequentially, conditioning every new element on the elements already generated. The survey reports that autoregressive models outperform other generative paradigms on some specific benchmarks.
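As a rough illustration of the token-wise approach described above, the sketch below samples a small grid of tokens one at a time in raster order, conditioning each token on the tokens before it. It is not taken from the survey or from any linked repository. The function names and the toy scoring rule (`toy_logits`, which simply favors repeating the previous token) are hypothetical stand-ins for a trained network and a real tokenizer.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(prefix, vocab_size=4):
    """Hypothetical stand-in for a trained model.

    Returns scores for the next token given all previous tokens.
    This toy rule only looks at the last token and favors repeating it.
    """
    logits = [0.0] * vocab_size
    if prefix:
        logits[prefix[-1]] = 2.0
    return logits


def sample_sequence(logits_fn, length, rng):
    """Sample tokens one at a time, each conditioned on the prefix so far."""
    tokens = []
    for _ in range(length):
        probs = softmax(logits_fn(tokens))
        next_token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(next_token)
    return tokens


def log_likelihood(logits_fn, tokens):
    """Chain rule: log p(x) is the sum of log p(x_i | x_<i) over positions."""
    total = 0.0
    for i, tok in enumerate(tokens):
        probs = softmax(logits_fn(tokens[:i]))
        total += math.log(probs[tok])
    return total


def to_grid(tokens, width):
    """Reshape a raster-order token sequence into rows of a 2D grid."""
    return [tokens[i:i + width] for i in range(0, len(tokens), width)]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = sample_sequence(toy_logits, length=16, rng=rng)
    for row in to_grid(tokens, width=4):
        print(row)
    print("log-likelihood:", log_likelihood(toy_logits, tokens))
```

In a real token-wise system, the integers would be codebook indices from a tokenizer such as VQ-VAE, and a decoder would map the finished grid back to pixels. Pixel-wise generation follows the same loop with pixel values in place of tokens, while scale-wise generation predicts a whole coarser-to-finer resolution level at each step rather than a single token.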

Quick Start & Requirements

This repository is a survey and does not involve direct code execution or installation. It provides links to research papers and their associated code repositories.

Highlighted Details

  • Accepted to TMLR 2025.
  • Actively updated with new research, including papers from 2025.
  • Comprehensive taxonomy covering diverse visual generation tasks.
  • Includes links to papers, code, and related projects for each entry.

Maintenance & Community

The authors actively maintain the repository and welcome contributions, feedback, and suggestions for missed papers or updates. The repository lists numerous academic institutions as contributor affiliations.

Licensing & Compatibility

The repository itself is not licensed for software use. Individual papers and code repositories linked within the survey will have their own respective licenses.

Limitations & Caveats

As a survey, this repository does not provide executable code or models. Its value lies in its curated, categorized collection of research papers. For implementation details and performance results, users must consult the individual linked resources.

Health Check
Last commit: 4 days ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History
129 stars in the last 90 days
