Infinity by FoundationVision

Text-to-image model for high-resolution image synthesis using bitwise autoregressive modeling

Created 9 months ago
1,440 stars

Top 28.4% on SourcePulse

View on GitHub
Project Summary

Infinity is a Bitwise Visual AutoRegressive Modeling framework for high-resolution image synthesis, aimed at researchers and developers in computer vision and generative AI. It reports state-of-the-art results, outperforming strong diffusion models on benchmarks such as GenEval and ImageReward while generating images faster.

How It Works

Infinity redefines autoregressive modeling with a bitwise token prediction framework, featuring an "infinite-vocabulary" tokenizer and classifier, and a bitwise self-correction mechanism. The core innovation lies in a bitwise multi-scale residual quantizer enabling extremely large vocabularies ($2^{32}$ or $2^{64}$), and an Infinite-Vocabulary Classifier (IVC) that predicts bits instead of full indices, drastically reducing parameter count and improving stability. Bitwise Self-Correction (BSC) addresses the train-test discrepancy common in autoregressive models.
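
To make the bitwise prediction concrete, below is a minimal, illustrative PyTorch sketch of the Infinite-Vocabulary Classifier idea: the head emits one logit per bit rather than a softmax over the full vocabulary, so a 2^32-entry codebook needs only 32 outputs. Class and variable names here are hypothetical and not taken from the Infinity codebase.

```python
import torch
import torch.nn as nn

class BitwiseClassifier(nn.Module):
    """Sketch of an IVC-style head: d binary logits instead of 2**d classes."""
    def __init__(self, hidden_dim: int, num_bits: int):
        super().__init__()
        # Parameter count grows linearly in num_bits, versus exponentially
        # (2**num_bits output units) for a conventional classifier head.
        self.to_bits = nn.Linear(hidden_dim, num_bits)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden_dim) -> per-bit logits: (batch, num_bits)
        return self.to_bits(h)

hidden_dim, num_bits = 1024, 32          # 32 bits ~ an effective 2**32 vocabulary
head = BitwiseClassifier(hidden_dim, num_bits)
h = torch.randn(4, hidden_dim)
bits = (head(h) > 0).long()              # hard bit decisions, shape (4, 32)
weights = 2 ** torch.arange(num_bits)    # compose bits back into a token index
token_ids = (bits * weights).sum(dim=-1) # shape (4,)
```

Training such a head would typically use a per-bit binary cross-entropy loss, which is what keeps the effective vocabulary size decoupled from the parameter count.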

Quick Start & Requirements

  • Installation: pip3 install -r requirements.txt
  • Prerequisites: torch>=2.5.1 (required for FlexAttention) and the Hugging Face transformers library for the flan-t5-xl text encoder. Weights for flan-t5-xl and the Infinity models must be downloaded separately.
  • Demo: An interactive web demo is available at https://opensource.bytedance.com/gmpt/t2i/invite.
  • Inference Notebooks: interactive_infer.ipynb and interactive_infer_8b.ipynb walk through inference step by step; a minimal sketch of the prompt-encoding flow appears after this list.
  • Docker: A Docker image is available for local inference reproduction.
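
The notebooks' inference flow begins by encoding the prompt with flan-t5-xl through Hugging Face transformers. The sketch below shows only that prompt-encoding step using real transformers APIs; the final generation call is a hypothetical placeholder, since the actual entry point (and the checkpoint loading for the visual tokenizer and Infinity transformer) lives in the repository's notebooks and scripts.

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Text encoder: the flan-t5-xl weights must be available locally or on the Hub.
text_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
text_encoder = T5EncoderModel.from_pretrained("google/flan-t5-xl").to(device).eval()

prompt = "a photo of a red fox in the snow, high detail"
with torch.no_grad():
    tokens = text_tokenizer(prompt, return_tensors="pt").to(device)
    text_embeddings = text_encoder(**tokens).last_hidden_state  # (1, seq_len, 2048)

# Hypothetical placeholder -- see interactive_infer.ipynb for the real interface:
# image = infinity_model.generate(text_embeddings, resolution=1024)
```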

Highlighted Details

  • Selected as a CVPR 2025 Oral presentation.
  • Infinity-8B weights and code released.
  • Outperforms SD3-Medium and SDXL, improving GenEval from 0.62 to 0.73 and ImageReward from 0.87 to 0.96 relative to SD3-Medium.
  • Generates 1024x1024 images in 0.8 seconds, 2.6x faster than SD3-Medium.
  • Offers visual tokenizer weights with varying vocabulary sizes ($2^{16}$ to $2^{64}$) and corresponding FID/PSNR scores.
  • Provides Infinity-2B and Infinity-8B checkpoints.

Maintenance & Community

Recent activity, responsiveness, and star growth are summarized in the Health Check section below.

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

  • Infinity-20B checkpoints are listed as "Coming Soon".
  • Training scripts require distributed setup (torchrun) and potentially extensive resources for larger models and resolutions.
  • FlexAttention requires a recent PyTorch release (torch>=2.5.1).

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 45 stars in the last 30 days
