Survey for autoregressive models in vision
This repository is a survey of autoregressive models in computer vision, aimed at researchers and practitioners. It consolidates recent advances, techniques, and applications of autoregressive modeling across visual tasks.
How It Works
The survey organizes autoregressive models by vision application: image generation (unconditional, text-to-image, image-to-image), video generation, 3D generation, and multimodal tasks. It covers three core approaches: pixel-wise generation, token-wise generation using tokenization strategies such as VQ-VAE and other learned tokenizers, and scale-wise generation. In each case the model generates visual content sequentially, predicting each element conditioned on the elements already generated. The survey notes that this approach outperforms other generative paradigms on some benchmarks.
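The token-wise case can be illustrated with a small sketch. It is not taken from the survey or from any linked repository: the vocabulary size, the grid size, and the `toy_next_token_logits` function are hypothetical stand-ins for a trained tokenizer and model. It only shows the sampling loop, where each token is drawn from a distribution conditioned on the tokens before it and the sequence log-probability is the sum of the per-step log-probabilities.

```python
import math
import random

VOCAB_SIZE = 8  # hypothetical codebook size, e.g. from a VQ-style tokenizer
GRID = 4        # hypothetical 4x4 token grid, so 16 tokens per image


def toy_next_token_logits(prefix):
    """Stand-in for a trained model: logits for the next token.

    The logits favour tokens close to the previous one so that the
    dependence on the prefix is visible. A real model would be a
    neural network conditioned on the whole prefix.
    """
    prev = prefix[-1] if prefix else 0
    return [-abs(tok - prev) for tok in range(VOCAB_SIZE)]


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(value - m) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(num_tokens, rng):
    """Draw tokens one at a time, each conditioned on those before it."""
    tokens = []
    log_prob = 0.0
    for _ in range(num_tokens):
        probs = softmax(toy_next_token_logits(tokens))
        tok = rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0]
        log_prob += math.log(probs[tok])  # log p(x) = sum_t log p(x_t | x_<t)
        tokens.append(tok)
    return tokens, log_prob


def to_grid(tokens, width):
    """Reshape the flat raster-order sequence into rows of tokens."""
    return [tokens[i:i + width] for i in range(0, len(tokens), width)]


rng = random.Random(0)
tokens, log_prob = sample_tokens(GRID * GRID, rng)
grid = to_grid(tokens, GRID)
```

In a real pipeline the resulting token grid would be passed to the tokenizer's decoder to obtain pixels. Pixel-wise generation follows the same loop with pixel intensities in place of codebook indices.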
Quick Start & Requirements
This repository is a survey and does not involve direct code execution or installation. It provides links to research papers and their associated code repositories.
Highlighted Details
Maintenance & Community
The repository is actively maintained by the authors, welcoming contributions, feedback, and suggestions for missed papers or updates. It lists numerous academic institutions and affiliations for its contributors.
Licensing & Compatibility
The repository itself is not licensed for software use. The papers and code repositories linked from the survey carry their own licenses.
Limitations & Caveats
As a survey, this repository does not provide executable code or models. Its value is in its curated collection of research papers and their categorization, requiring users to consult individual linked resources for implementation details and performance.