PyTorch implementation of DALL-E 2, a text-to-image synthesis neural network
This repository provides a PyTorch implementation of OpenAI's DALL-E 2 text-to-image synthesis model. It focuses on replicating the diffusion prior network, which generates CLIP image embeddings from CLIP text embeddings, enabling greater diversity in generated images. The project is aimed at researchers and developers interested in state-of-the-art generative models.
How It Works
The implementation follows DALL-E 2's architecture, utilizing a diffusion prior network that predicts CLIP image embeddings from CLIP text embeddings. This is achieved through a diffusion process, where a causal transformer acts as the denoising network. The project also includes components for training the CLIP model itself and a decoder (U-Net based) for generating images from the learned embeddings, supporting cascaded diffusion for higher resolutions and latent diffusion with VQGAN-VAE.
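The denoising loop at the heart of the prior can be illustrated with a toy, dependency-free sketch. Everything here is a simplification: the schedule is a standard DDPM linear beta schedule, and the `denoise` oracle stands in for the learned causal transformer, which in the real model predicts the clean CLIP image embedding from the noisy embedding, the timestep, and the text embedding.

```python
import math
import random

random.seed(0)

T = 50  # number of diffusion timesteps (small, for illustration)

# Linear beta schedule and cumulative alpha products, as in standard DDPM.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)

# Stand-in for the "true" CLIP image embedding the prior should produce.
target_embed = [0.5, -1.0, 2.0]

def denoise(x_t, t, text_embed):
    # Placeholder for the learned causal transformer: the real network
    # predicts the clean image embedding from (noisy embedding, timestep,
    # text embedding); here we cheat and return the target directly.
    return target_embed

def sample_prior(text_embed, dim=3):
    # Ancestral sampling: start from pure noise, repeatedly predict the
    # clean embedding, then step toward it via the DDPM posterior mean.
    x = [random.gauss(0, 1) for _ in range(dim)]
    for t in reversed(range(T)):
        x0_hat = denoise(x, t, text_embed)
        if t == 0:
            return x0_hat
        ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
        beta_t = betas[t]
        coef_x0 = math.sqrt(ab_prev) * beta_t / (1.0 - ab_t)
        coef_xt = math.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        var = beta_t * (1.0 - ab_prev) / (1.0 - ab_t)
        x = [coef_x0 * x0 + coef_xt * xt + math.sqrt(var) * random.gauss(0, 1)
             for x0, xt in zip(x0_hat, x)]
    return x

image_embed = sample_prior(text_embed=None)
```

With the oracle denoiser the loop converges exactly to the target embedding; in the real model, the quality of the transformer's prediction is what this training regime optimizes.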
Quick Start & Requirements
pip install dalle2-pytorch
Highlighted Details
Trainer classes (DecoderTrainer, DiffusionPriorTrainer) are provided for simplified training loops and EMA management.
Maintenance & Community
The project acknowledges significant contributions from various individuals and sponsorship from Stability AI. It is actively developed, with a comprehensive "Todo" list indicating ongoing work and planned features. Community engagement is encouraged via the LAION community.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Given the project's dependencies (PyTorch, Hugging Face libraries), a permissive license is likely, but users should verify before commercial use.
Limitations & Caveats
The README notes that as of May 2022, the implementation is no longer state-of-the-art, with newer architectures like Imagen surpassing it. Training can be computationally intensive, requiring significant GPU resources and time. Some advanced features like latent diffusion require pre-trained VQGAN-VAE models.