PyTorch implementation of DALL-E, OpenAI's text-to-image transformer
Top 9.2% on sourcepulse
This repository provides a PyTorch implementation of OpenAI's DALL-E, a text-to-image generation model. It allows researchers and developers to replicate, train, and experiment with DALL-E, offering flexibility in VAE choices and attention mechanisms.
How It Works
The project implements DALL-E as an autoregressive transformer over a single sequence of text tokens followed by image tokens (produced by a discrete VAE); generated image tokens are decoded back into pixels by the VAE. It supports multiple VAE backends, including OpenAI's pretrained discrete VAE and Taming Transformers' VQGAN, and offers advanced features such as reversible networks for deeper models and sparse attention variants (axial, conv-like) to manage computational cost.
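As a rough illustration of how the two modalities share one autoregressive sequence, the sketch below offsets image token ids past the text vocabulary so both fit in a single embedding table. The vocabulary sizes are invented for the example, not the library's defaults.

```python
# Conceptual sketch only -- not the library's internal code.
NUM_TEXT_TOKENS = 10000   # assumed BPE text vocabulary size
NUM_IMAGE_TOKENS = 8192   # assumed discrete-VAE codebook size

def build_sequence(text_ids, image_ids):
    """Concatenate text tokens and offset image tokens into one
    sequence, as an autoregressive transformer would consume them."""
    # Shift image ids past the text vocabulary so the two modalities
    # never collide in the shared embedding table.
    offset_image = [t + NUM_TEXT_TOKENS for t in image_ids]
    return text_ids + offset_image

seq = build_sequence([5, 17, 42], [0, 8191])
# image ids land in [NUM_TEXT_TOKENS, NUM_TEXT_TOKENS + NUM_IMAGE_TOKENS)
```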
Quick Start & Requirements
```
pip install dalle-pytorch
```
Maintenance & Community
The project has received contributions from several individuals, though activity has since slowed. Support and discussion take place primarily through GitHub issues.
Licensing & Compatibility
The license is not stated in this summary. Users should verify the upstream repository's license before commercial use or integration into closed-source projects.
Limitations & Caveats
Training DALL-E from scratch requires significant computational resources and large datasets; the README notes that matching the results of OpenAI's original paper requires a transformer depth of 64 layers, which is computationally intensive. The author has also shifted focus toward DALL-E 2, so this implementation may see limited future development.
Last updated about 1 year ago; the project is marked inactive.