Text-to-image model for high-resolution image synthesis using bitwise autoregressive modeling
Infinity is a Bitwise Visual AutoRegressive Modeling framework for high-resolution image synthesis, aimed at researchers and developers in computer vision and generative AI. It reports state-of-the-art results, outperforming diffusion models on benchmarks such as GenEval and ImageReward while generating images faster.
How It Works
Infinity redefines autoregressive modeling with a bitwise token prediction framework, featuring an "infinite-vocabulary" tokenizer and classifier, and a bitwise self-correction mechanism. The core innovation lies in a bitwise multi-scale residual quantizer enabling extremely large vocabularies ($2^{32}$ or $2^{64}$), and an Infinite-Vocabulary Classifier (IVC) that predicts bits instead of full indices, drastically reducing parameter count and improving stability. Bitwise Self-Correction (BSC) addresses the train-test discrepancy common in autoregressive models.
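To make the parameter saving concrete, here is a minimal sketch of why predicting bits is cheaper than predicting full indices. It is not taken from the Infinity codebase: the function names, the hidden size of 2048, and the sign-based one-bit-per-channel quantization are illustrative assumptions, and the multi-scale residual structure is left out entirely.

```python
def sign_quantize(z):
    """Quantize each channel to +1 or -1, i.e. one bit per channel (assumed scheme)."""
    return [1 if v >= 0 else -1 for v in z]


def bits_to_index(bits):
    """Pack +/-1 bits into the single integer index a conventional classifier would predict."""
    index = 0
    for b in bits:
        index = (index << 1) | (1 if b > 0 else 0)
    return index


def index_classifier_params(hidden_dim, num_bits):
    """Weight count of one linear layer with a logit for every one of the 2**num_bits indices."""
    return hidden_dim * (2 ** num_bits)


def bitwise_classifier_params(hidden_dim, num_bits):
    """Weight count of num_bits independent binary classifiers (2 logits each)."""
    return hidden_dim * 2 * num_bits


if __name__ == "__main__":
    bits = sign_quantize([0.3, -1.2, 0.0, 2.5])
    print(bits, bits_to_index(bits))            # [1, -1, 1, 1] 11

    hidden_dim, num_bits = 2048, 32             # hypothetical sizes for illustration
    print(index_classifier_params(hidden_dim, num_bits))    # 8796093022208
    print(bitwise_classifier_params(hidden_dim, num_bits))  # 131072
```

Under these assumed sizes, a classifier over all $2^{32}$ indices would need about 8.8 trillion weights, while 32 binary classifiers need about 131 thousand. The index-based head grows exponentially in the number of bits; the bitwise head grows linearly.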
Quick Start & Requirements
Install dependencies: pip3 install -r requirements.txt
Requirements: torch>=2.5.1 (for FlexAttention) and Hugging Face transformers (for flan-t5-xl).
Weights for flan-t5-xl and the Infinity models must be downloaded separately.
The notebooks interactive_infer.ipynb and interactive_infer_8b.ipynb are provided for detailed inference.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
torchrun) and potentially extensive resources for larger models and resolutions.