Visual generation model using a hybrid autoregressive transformer
HART (Hybrid Autoregressive Transformer) is a visual generation model designed to produce high-quality 1024x1024 images, aiming to rival diffusion models in quality and efficiency. It targets researchers and practitioners in generative AI seeking faster, more efficient image synthesis.
How It Works
HART employs a novel hybrid tokenizer that decomposes continuous image latents into discrete tokens (for global structure) and continuous residual tokens. A scalable-resolution discrete autoregressive model handles the discrete component, while a lightweight diffusion module (37M parameters) models the residuals. This hybrid approach significantly improves reconstruction quality over discrete-only methods and enables efficient generation of high-resolution images.
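The decomposition described above can be illustrated with a toy sketch. This is not HART's tokenizer: it assumes a single flat codebook and nearest-neighbour quantization, and every name in it (`nearest_code`, `hybrid_tokenize`, `reconstruct`, the codebook and latent sizes) is hypothetical. It only shows why keeping a continuous residual next to the discrete token recovers what quantization alone discards. In HART the residuals are modeled by the diffusion module; here they are simply carried through.

```python
import random


def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])),
    )


def hybrid_tokenize(latents, codebook):
    """Split each latent into a discrete token id and a continuous residual."""
    token_ids, residuals = [], []
    for vec in latents:
        idx = nearest_code(vec, codebook)
        token_ids.append(idx)
        # The residual is whatever the chosen codebook entry failed to capture.
        residuals.append([v - c for v, c in zip(vec, codebook[idx])])
    return token_ids, residuals


def reconstruct(token_ids, residuals, codebook):
    """Rebuild latents from codebook entries plus residuals."""
    return [
        [c + r for c, r in zip(codebook[i], res)]
        for i, res in zip(token_ids, residuals)
    ]


def squared_error(a, b):
    """Total squared error between two lists of vectors."""
    return sum((x - y) ** 2 for va, vb in zip(a, b) for x, y in zip(va, vb))


# Toy data (assumed sizes): 16 codebook entries and 8 latents, all 4-dimensional.
rng = random.Random(0)
dim, num_codes, num_latents = 4, 16, 8
codebook = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_codes)]
latents = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_latents)]

token_ids, residuals = hybrid_tokenize(latents, codebook)

# Discrete-only reconstruction drops the residuals; hybrid keeps them.
zero_residuals = [[0.0] * dim for _ in range(num_latents)]
discrete_only = reconstruct(token_ids, zero_residuals, codebook)
hybrid = reconstruct(token_ids, residuals, codebook)

discrete_err = squared_error(latents, discrete_only)
hybrid_err = squared_error(latents, hybrid)
```

In this sketch `hybrid_err` is zero up to floating-point rounding while `discrete_err` is not, which mirrors the reconstruction-quality argument above in a much simplified setting.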
Quick Start & Requirements
Install: set up an environment with python=3.10, install the CUDA toolkit, and run pip install -e .
Launch the app: python app.py --model_path /path/to/model --text_model_path /path/to/Qwen2 --shield_model_path /path/to/ShieldGemma2B
Use sample.py for single or multiple prompts.
Highlighted Details
Maintenance & Community
The project comes from mit-han-lab. The codebase is inspired by VAR and MAR. Key contributors include Haotian Tang, Yecheng Wu, Shang Yang, Enze Xie, Junsong Chen, Junyu Chen, Zhuoyang Zhang, Han Cai, Yao Lu, and Song Han.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README does not specify any limitations or known issues. The project appears to be recent (2024).