Infinity by FoundationVision

Text-to-image model for high-resolution image synthesis using bitwise autoregressive modeling

Created 9 months ago
1,440 stars

Top 28.4% on SourcePulse

View on GitHub
Project Summary

Infinity is a Bitwise Visual AutoRegressive Modeling framework for high-resolution image synthesis, aimed at researchers and developers in computer vision and generative AI. It reports state-of-the-art results, outperforming strong diffusion models on benchmarks such as GenEval and ImageReward while generating images faster.

How It Works

Infinity redefines autoregressive modeling with a bitwise token prediction framework, featuring an "infinite-vocabulary" tokenizer and classifier, and a bitwise self-correction mechanism. The core innovation lies in a bitwise multi-scale residual quantizer enabling extremely large vocabularies ($2^{32}$ or $2^{64}$), and an Infinite-Vocabulary Classifier (IVC) that predicts bits instead of full indices, drastically reducing parameter count and improving stability. Bitwise Self-Correction (BSC) addresses the train-test discrepancy common in autoregressive models.
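
To make the bitwise prediction concrete, below is a minimal, illustrative PyTorch sketch of the Infinite-Vocabulary Classifier idea: the head emits one logit per bit rather than a softmax over the full vocabulary, so a 2^32-entry codebook needs only 32 outputs. Class and variable names here are hypothetical and not taken from the Infinity codebase.

```python
import torch
import torch.nn as nn

class BitwiseClassifier(nn.Module):
    """Sketch of an IVC-style head: d binary logits instead of 2**d classes."""
    def __init__(self, hidden_dim: int, num_bits: int):
        super().__init__()
        # Parameter count grows linearly in num_bits, versus exponentially
        # (2**num_bits output units) for a conventional classifier head.
        self.to_bits = nn.Linear(hidden_dim, num_bits)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden_dim) -> per-bit logits: (batch, num_bits)
        return self.to_bits(h)

hidden_dim, num_bits = 1024, 32          # 32 bits ~ an effective 2**32 vocabulary
head = BitwiseClassifier(hidden_dim, num_bits)
h = torch.randn(4, hidden_dim)
bits = (head(h) > 0).long()              # hard bit decisions, shape (4, 32)
weights = 2 ** torch.arange(num_bits)    # compose bits back into a token index
token_ids = (bits * weights).sum(dim=-1) # shape (4,)
```

Training such a head would typically use a per-bit binary cross-entropy loss, which is what keeps the effective vocabulary size decoupled from the parameter count.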

Quick Start & Requirements

  • Installation: pip3 install -r requirements.txt
  • Prerequisites: torch>=2.5.1 (required for FlexAttention) and the Hugging Face transformers library for the flan-t5-xl text encoder. Weights for flan-t5-xl and the Infinity models must be downloaded separately.
  • Demo: An interactive web demo is available at https://opensource.bytedance.com/gmpt/t2i/invite.
  • Inference Notebooks: interactive_infer.ipynb and interactive_infer_8b.ipynb walk through inference step by step; a minimal sketch of the prompt-encoding flow appears after this list.
  • Docker: A Docker image is available for local inference reproduction.
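
The notebooks' inference flow begins by encoding the prompt with flan-t5-xl through Hugging Face transformers. The sketch below shows only that prompt-encoding step using real transformers APIs; the final generation call is a hypothetical placeholder, since the actual entry point (and the checkpoint loading for the visual tokenizer and Infinity transformer) lives in the repository's notebooks and scripts.

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Text encoder: the flan-t5-xl weights must be available locally or on the Hub.
text_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
text_encoder = T5EncoderModel.from_pretrained("google/flan-t5-xl").to(device).eval()

prompt = "a photo of a red fox in the snow, high detail"
with torch.no_grad():
    tokens = text_tokenizer(prompt, return_tensors="pt").to(device)
    text_embeddings = text_encoder(**tokens).last_hidden_state  # (1, seq_len, 2048)

# Hypothetical placeholder -- see interactive_infer.ipynb for the real interface:
# image = infinity_model.generate(text_embeddings, resolution=1024)
```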

Highlighted Details

  • Selected as a CVPR 2025 Oral presentation.
  • Infinity-8B weights and code released.
  • Outperforms SD3-Medium and SDXL, improving GenEval from 0.62 to 0.73 and ImageReward from 0.87 to 0.96 relative to SD3-Medium.
  • Generates 1024x1024 images in 0.8 seconds, 2.6x faster than SD3-Medium.
  • Offers visual tokenizer weights with varying vocabulary sizes ($2^{16}$ to $2^{64}$) and corresponding FID/PSNR scores.
  • Provides Infinity-2B and Infinity-8B checkpoints.

Maintenance & Community

Recent activity, responsiveness, and star growth are summarized in the Health Check section below.

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

  • Infinity-20B checkpoints are listed as "Coming Soon".
  • Training scripts require distributed setup (torchrun) and potentially extensive resources for larger models and resolutions.
  • FlexAttention requires a recent PyTorch release (torch>=2.5.1).

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 45 stars in the last 30 days
