Fast text-to-image synthesis with Diffusion Transformers
PixArt-α is a PyTorch implementation of a Diffusion Transformer (DiT) model for photorealistic text-to-image synthesis, designed for significantly faster training and competitive generation quality compared to existing large-scale models. It targets researchers and developers in the AI-generated content (AIGC) community seeking to build high-quality, low-cost generative models.
How It Works
PixArt-α uses a Transformer backbone for the diffusion model and injects text conditioning through cross-attention. Training is decomposed into three stages, each targeting one goal: learning pixel dependency, learning text-image alignment, and improving image aesthetic quality. To strengthen text-image alignment, the model is trained on highly informative data, specifically dense pseudo-captions generated by a large Vision-Language model. The result is a 0.6B-parameter model that achieves competitive FID scores with substantially reduced training time and cost.
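To make the cross-attention idea concrete, here is a minimal sketch in plain Python. It is not the repository's code: it assumes a single head and identity projections, whereas a real DiT block would use learned query/key/value projections, multiple heads, and batched tensors in PyTorch. The function and variable names are illustrative only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_tokens, text_tokens):
    """Single-head cross-attention with identity projections (illustrative).

    image_tokens: list of N vectors of length d, used as queries
    text_tokens:  list of M vectors of length d, used as keys and values
    Returns N vectors, each a weighted average of the text vectors.
    """
    d = len(image_tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in image_tokens:
        # Similarity of this image token to every text token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in text_tokens]
        weights = softmax(scores)
        # Mix the text vectors according to the attention weights.
        mixed = [sum(w * v[j] for w, v in zip(weights, text_tokens)) for j in range(d)]
        out.append(mixed)
    return out


# Toy data: three image tokens attend over two text tokens.
image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_tokens = [[2.0, 0.0], [0.0, 2.0]]
conditioned = cross_attention(image_tokens, text_tokens)
```

Each image token pulls in more of the text token it is most similar to, which is how the prompt can steer individual image regions during denoising.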
Quick Start & Requirements
git clone https://github.com/PixArt-alpha/PixArt-alpha.git
cd PixArt-alpha
pip install -r requirements.txt
The diffusers integration supports GPUs with as little as 8GB of memory.
A local demo script (python app/app.py), Docker support, and Hugging Face/Google Colab demos are provided.
Highlighted Details
Maintenance & Community
The project is actively developed with recent updates including PixArt-δ (LCM and ControlNet) releases and diffusers integration. A Discord community is available for discussions and contributions.
Licensing & Compatibility
The repository's primary license is not explicitly stated in the README. However, the integration with Hugging Face diffusers suggests compatibility with its ecosystem. Specific model weights may have different licenses.
Limitations & Caveats
The base repository's inference requires significant GPU memory (23GB+). The diffusers integration lowers the VRAM requirement to about 8GB, but users should verify compatibility for the specific model they plan to run. The project also includes experimental features and ongoing development for newer versions such as PixArt-Σ.
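As a rough illustration of the lower-memory path, the sketch below loads the model through diffusers with half precision and model CPU offload. Several things here are assumptions not stated above: the checkpoint name PixArt-alpha/PixArt-XL-2-1024-MS, the helper names load_pixart_pipeline and generate, and the choice of offload as the memory-saving technique. Whether this actually fits in 8GB depends on the checkpoint, resolution, and library versions, so treat it as a starting point rather than a guaranteed recipe.

```python
def load_pixart_pipeline(model_id="PixArt-alpha/PixArt-XL-2-1024-MS", cpu_offload=True):
    """Build a PixArt-α pipeline via diffusers (hypothetical helper).

    model_id is an assumed checkpoint name; substitute the one you intend to use.
    Heavy imports stay inside the function so this file loads without a GPU stack.
    """
    import torch
    from diffusers import PixArtAlphaPipeline

    # Half precision roughly halves the memory needed for weights.
    pipe = PixArtAlphaPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    if cpu_offload:
        # Keeps only the currently active sub-model on the GPU (needs accelerate).
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")
    return pipe


def generate(prompt, out_path="pixart_sample.png"):
    """Generate one image for the prompt and save it (hypothetical helper)."""
    pipe = load_pixart_pipeline()
    image = pipe(prompt).images[0]
    image.save(out_path)
    return out_path


# Example call (downloads weights and needs a CUDA GPU):
# generate("a small cactus wearing a straw hat in the desert")
```

Offloading trades speed for memory, so generation is slower than keeping the whole pipeline on the GPU.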