FoundationVision
Text-to-image model for high-resolution image synthesis using bitwise autoregressive modeling
Infinity is a novel Bitwise Visual AutoRegressive Modeling framework for high-resolution image synthesis, targeting researchers and developers in computer vision and generative AI. It offers state-of-the-art performance, significantly outperforming diffusion models in benchmarks like GenEval and ImageReward, while achieving faster generation speeds.
How It Works
Infinity redefines autoregressive modeling with a bitwise token prediction framework, featuring an "infinite-vocabulary" tokenizer and classifier, and a bitwise self-correction mechanism. The core innovation lies in a bitwise multi-scale residual quantizer enabling extremely large vocabularies ($2^{32}$ or $2^{64}$), and an Infinite-Vocabulary Classifier (IVC) that predicts bits instead of full indices, drastically reducing parameter count and improving stability. Bitwise Self-Correction (BSC) addresses the train-test discrepancy common in autoregressive models.
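The parameter saving from predicting bits instead of full indices can be illustrated with simple arithmetic. The sketch below is not taken from the Infinity codebase: the function names are hypothetical, the hidden size of 2048 is an assumed value chosen for illustration, and each classifier is modeled as a single bias-free linear layer.

```python
def full_index_classifier_params(hidden_dim: int, num_bits: int) -> int:
    """Weights of one linear layer mapping hidden_dim to 2**num_bits index logits."""
    return hidden_dim * (2 ** num_bits)


def bitwise_classifier_params(hidden_dim: int, num_bits: int) -> int:
    """Weights of num_bits binary classifiers, each producing 2 logits (bit = 0 or 1)."""
    return hidden_dim * 2 * num_bits


def bits_to_index(bits) -> int:
    """Pack a sequence of 0/1 bits (most significant first) into one integer index."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


HIDDEN_DIM = 2048  # assumption: illustrative hidden size, not read from the repo
NUM_BITS = 32      # vocabulary of 2**32 entries, as described above

full_params = full_index_classifier_params(HIDDEN_DIM, NUM_BITS)
bit_params = bitwise_classifier_params(HIDDEN_DIM, NUM_BITS)

print(full_params)               # 8796093022208 (about 8.8 trillion weights)
print(bit_params)                # 131072 (about 0.13 million weights)
print(bits_to_index([1, 0, 1]))  # 5
```

Under these assumptions the full-index head grows exponentially with the number of bits, while the bitwise head grows linearly, which is why a vocabulary of $2^{32}$ or $2^{64}$ only becomes practical with bit-level prediction.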
Quick Start & Requirements
Install: pip3 install -r requirements.txt
Requirements: torch>=2.5.1 (for FlexAttention), Hugging Face transformers for flan-t5-xl. Weights for flan-t5-xl and Infinity models must be downloaded separately.
Inference: interactive_infer.ipynb and interactive_infer_8b.ipynb are provided for detailed inference.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
torchrun) and potentially extensive resources for larger models and resolutions.