hart  by mit-han-lab

Visual generation model using a hybrid autoregressive transformer

Created 11 months ago
632 stars

Top 52.4% on SourcePulse

Project Summary

HART (Hybrid Autoregressive Transformer) is a visual generation model designed to produce high-quality 1024x1024 images, aiming to rival diffusion models in quality and efficiency. It targets researchers and practitioners in generative AI seeking faster, more efficient image synthesis.

How It Works

HART employs a novel hybrid tokenizer that decomposes continuous image latents into discrete tokens (for global structure) and continuous residual tokens. A scalable-resolution discrete autoregressive model handles the discrete component, while a lightweight diffusion module (37M parameters) models the residuals. This hybrid approach significantly improves reconstruction quality over discrete-only methods and enables efficient generation of high-resolution images.
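
The decomposition described above can be illustrated with a deliberately simplified sketch. It is not HART's tokenizer: the codebook, the 2-D latents, and the function names (`nearest_code`, `hybrid_tokenize`, `reconstruct`) are all hypothetical, and a single nearest-neighbour lookup stands in for the real quantizer. It only shows why keeping a continuous residual next to each discrete token can recover detail that a discrete-only representation drops.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest_code(latent: Sequence[float], codebook: Sequence[Vector]) -> int:
    """Index of the codebook entry closest to `latent` (squared L2 distance)."""
    def sq_dist(code: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(latent, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def hybrid_tokenize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[Vector]]:
    """Split each latent into a discrete index plus a continuous residual."""
    indices: List[int] = []
    residuals: List[Vector] = []
    for z in latents:
        i = nearest_code(z, codebook)
        indices.append(i)
        # Residual = whatever the chosen codebook entry fails to capture.
        residuals.append(tuple(a - b for a, b in zip(z, codebook[i])))
    return indices, residuals


def reconstruct(
    indices: Sequence[int], residuals: Sequence[Vector], codebook: Sequence[Vector]
) -> List[Vector]:
    """Add each residual back onto its codebook entry."""
    return [
        tuple(c + r for c, r in zip(codebook[i], res))
        for i, res in zip(indices, residuals)
    ]


# Toy data (hypothetical): three 2-D latents and a three-entry codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
latents = [(0.9, 1.2), (-0.8, 0.7), (0.1, -0.2)]

indices, residuals = hybrid_tokenize(latents, codebook)
discrete_only = [codebook[i] for i in indices]      # loses the residual detail
hybrid = reconstruct(indices, residuals, codebook)  # recovers the latents

print(indices)  # [1, 2, 0]
```

In this toy setting the residuals are stored exactly, so the hybrid reconstruction matches the input up to floating-point error. In HART the residuals are instead modeled by the lightweight diffusion module, so the recovery is approximate.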

Quick Start & Requirements

  • Install: Clone the repo, create a conda environment (python=3.10), install the CUDA toolkit, then run pip install -e . from the repository root.
  • Prerequisites: CUDA toolkit and Python 3.10. Download the pre-trained HART checkpoint (0.7b-1024px), the Qwen2-VL-1.5B-Instruct text model, and optionally ShieldGemma-2B for safety.
  • Demo: Launch with python app.py --model_path /path/to/model --text_model_path /path/to/Qwen2 --shield_model_path /path/to/ShieldGemma2B.
  • Inference: Use sample.py for single or multiple prompts.
  • Docs: Paper, Demo, Project.
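
As a small convenience, the demo command above can be assembled programmatically. The helper below is a hypothetical sketch, not part of the repository. It uses only the script name and the three flags quoted in the Demo bullet, and it makes no assumption about other options `app.py` may accept.

```python
import shlex
from typing import List


def build_demo_command(
    model_path: str, text_model_path: str, shield_model_path: str
) -> List[str]:
    """Build the argv list for the demo launch quoted in the Quick Start."""
    return [
        "python", "app.py",
        "--model_path", model_path,
        "--text_model_path", text_model_path,
        "--shield_model_path", shield_model_path,
    ]


# Placeholder paths copied from the Demo bullet; replace with real locations.
cmd = build_demo_command(
    "/path/to/model", "/path/to/Qwen2", "/path/to/ShieldGemma2B"
)
print(shlex.join(cmd))
```

To actually launch the demo, the list could be passed to `subprocess.run(cmd, check=True)` from the repository root, assuming the environment from the Install step is active.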

Highlighted Details

  • Achieves FID of 5.38 on MJHQ-30K, a 31% improvement over VAR.
  • Outperforms diffusion models in both FID and CLIP score.
  • Offers 4.5-7.7x higher throughput and 6.9-13.4x lower MACs (multiply-accumulate operations) than those diffusion models.
  • Hybrid tokenizer improves reconstruction FID from 2.11 to 0.30.
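
To put the reconstruction figures in relative terms, here is a short worked calculation. The helper name `relative_reduction` is made up for illustration; the two FID values are the ones quoted above, and lower FID is better.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after` (for lower-is-better metrics)."""
    return (before - after) / before


# Reconstruction FID quoted above: 2.11 (discrete-only) vs 0.30 (hybrid tokenizer).
recon = relative_reduction(2.11, 0.30)
print(f"{recon:.1%}")  # 85.8%
```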

Maintenance & Community

The project comes from MIT HAN Lab (mit-han-lab). The codebase is inspired by VAR and MAR. Key contributors include Haotian Tang, Yecheng Wu, Shang Yang, Enze Xie, Junsong Chen, Junyu Chen, Zhuoyang Zhang, Han Cai, Yao Lu, and Song Han.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify any limitations or known issues. The project appears to be recent (2024).

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 14 stars in the last 30 days
