hart by mit-han-lab

Visual generation model using a hybrid autoregressive transformer

Created 1 year ago

648 stars

Top 51.4% on SourcePulse

Project Summary

HART (Hybrid Autoregressive Transformer) is a visual generation model designed to produce high-quality 1024x1024 images, aiming to rival diffusion models in quality and efficiency. It targets researchers and practitioners in generative AI seeking faster, more efficient image synthesis.

How It Works

HART uses a hybrid tokenizer that decomposes continuous image latents into two parts: discrete tokens, which capture the global structure, and continuous residual tokens, which carry the information the discrete tokens cannot represent. A scalable-resolution discrete autoregressive model handles the discrete component, while a lightweight residual diffusion module (37M parameters) models the residuals. This hybrid approach significantly improves reconstruction quality over discrete-only methods and enables efficient generation of high-resolution images.
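The decomposition described above can be illustrated with a toy example. The following is a minimal sketch, not HART's tokenizer: it assumes a single-stage nearest-neighbour codebook lookup on random vectors, and every name in it (`nearest_code`, `hybrid_tokenize`, `reconstruct`, the codebook size) is hypothetical. It only shows why keeping a continuous residual next to the discrete index recovers what quantization alone loses.

```python
import random


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def hybrid_tokenize(latents, codebook):
    """Split each latent into a discrete index and a continuous residual."""
    indices, residuals = [], []
    for vector in latents:
        idx = nearest_code(vector, codebook)
        indices.append(idx)
        # The residual is whatever the chosen codebook entry fails to capture.
        residuals.append([v - c for v, c in zip(vector, codebook[idx])])
    return indices, residuals


def reconstruct(indices, residuals, codebook):
    """Add each residual back onto its quantized vector."""
    return [
        [c + r for c, r in zip(codebook[i], res)]
        for i, res in zip(indices, residuals)
    ]


def mse(a, b):
    """Mean squared error over two equally shaped lists of vectors."""
    count = sum(len(row) for row in a)
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / count


# Toy data (assumption): 16 codebook entries and 8 latents of dimension 4.
random.seed(0)
DIM = 4
codebook = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(16)]
latents = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(8)]

indices, residuals = hybrid_tokenize(latents, codebook)
zero_residuals = [[0.0] * DIM for _ in latents]

discrete_only = reconstruct(indices, zero_residuals, codebook)
hybrid = reconstruct(indices, residuals, codebook)

print("discrete-only MSE:", mse(latents, discrete_only))
print("hybrid MSE:", mse(latents, hybrid))
```

In this sketch the residual is stored exactly, so the hybrid reconstruction matches the input up to floating-point rounding. In HART the residual is not stored but generated by the residual diffusion module, so the real reconstruction is approximate.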

Quick Start & Requirements

Install: Clone the repo, create a conda environment (python=3.10), install the CUDA toolkit, and run pip install -e . from the repository root.
Prerequisites: CUDA toolkit, Python 3.10. Requires downloading pre-trained models for HART (0.7b-1024px), Qwen2-VL-1.5B-Instruct, and optionally ShieldGemma-2B for safety.
Demo: Launch with python app.py --model_path /path/to/model --text_model_path /path/to/Qwen2 --shield_model_path /path/to/ShieldGemma2B.
Inference: Use sample.py for single or multiple prompts.
Docs: Paper, Demo, Project.
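As a small convenience, the demo launch line above can be assembled from Python. This is a sketch under stated assumptions: `build_demo_command` is a hypothetical helper, not part of the repository; it uses only the three flags quoted in the Demo line, and the paths are the same placeholders.

```python
import shlex


def build_demo_command(model_path, text_model_path, shield_model_path):
    """Assemble the demo launch command as an argument list.

    Hypothetical helper: only the flags quoted in the Demo line are used.
    """
    return [
        "python", "app.py",
        "--model_path", model_path,
        "--text_model_path", text_model_path,
        "--shield_model_path", shield_model_path,
    ]


# Placeholder paths, exactly as in the Demo line; replace with real locations.
command = build_demo_command(
    "/path/to/model",
    "/path/to/Qwen2",
    "/path/to/ShieldGemma2B",
)
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` from the repository root would launch the demo without going through a shell, which avoids quoting problems when a path contains spaces.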

Highlighted Details

Achieves FID of 5.38 on MJHQ-30K, a 31% improvement over VAR.
Outperforms diffusion models in FID and CLIP score.
Offers 4.5-7.7x higher throughput and 6.9-13.4x lower MACs than state-of-the-art diffusion models.
Hybrid tokenizer improves reconstruction FID from 2.11 to 0.30.
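To put the reconstruction figure on the same footing as the percentage quoted for generation FID, the relative change can be computed directly. A short sketch follows; `relative_improvement` is a hypothetical helper, and the only inputs are the two reconstruction FID values listed above (lower FID is better).

```python
def relative_improvement(before, after):
    """Fractional reduction for a lower-is-better metric such as FID."""
    return (before - after) / before


# Reconstruction FID values quoted above.
recon_before, recon_after = 2.11, 0.30

gain = relative_improvement(recon_before, recon_after)
print(f"reconstruction FID reduction: {gain:.1%}")
```

With these inputs the reduction works out to roughly 86%.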

Maintenance & Community

The project comes from mit-han-lab. The codebase is inspired by VAR and MAR. Key contributors include Haotian Tang, Yecheng Wu, Shang Yang, Enze Xie, Junsong Chen, Junyu Chen, Zhuoyang Zhang, Han Cai, Yao Lu, and Song Han.

Explore Similar Projects

fm-boosting by CompVis

Awesome-Generation-Acceleration by xuyang-liu16

FastGen by NVlabs

CCSR by csslc

DiffPIR by yuanzhi-zhu

taesd by madebyollin

DiffusionFastForward by mikonvergence

pytorch-stable-diffusion by hkproj

k-diffusion by crowsonkb

VAR by FoundationVision

guided-diffusion by openai

latent-diffusion by CompVis