DALLE2-pytorch by lucidrains

PyTorch implementation of DALL-E 2, a text-to-image synthesis neural network

Created 3 years ago

11,341 stars

Top 4.5% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of OpenAI's DALL-E 2 text-to-image synthesis model. Its central piece is the diffusion prior network, which generates CLIP image embeddings from CLIP text embeddings and thereby allows more varied images for a given prompt. The project is aimed at researchers and developers working with diffusion-based generative models.

How It Works

The implementation follows DALL-E 2's architecture. A diffusion prior network predicts CLIP image embeddings from CLIP text embeddings through a diffusion process in which a causal transformer acts as the denoising network. The project also includes components for training the CLIP model itself and a U-Net-based decoder that generates images from the predicted embeddings. The decoder supports cascaded diffusion for higher resolutions and latent diffusion with a VQGAN-VAE.
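To make the "diffusion process" on embeddings more concrete, here is a minimal conceptual sketch of the forward (noising) step that a denoising network would learn to undo. It is written in plain Python and is not taken from the repository: the function names `cosine_alpha_bar` and `q_sample`, the cosine noise schedule, and the toy 3-dimensional "embedding" are all assumptions made for illustration.

```python
import math
import random

def cosine_alpha_bar(t, timesteps, s=0.008):
    """Fraction of the original signal kept at step t (cosine schedule, assumed here)."""
    def f(u):
        return math.cos((u / timesteps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)

def q_sample(x0, t, timesteps, rng):
    """Noise a clean embedding x0 to step t: sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    a_bar = cosine_alpha_bar(t, timesteps)
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(a_bar)
    noise_scale = math.sqrt(max(0.0, 1.0 - a_bar))
    xt = [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]
    return xt, noise

# Toy stand-in for a CLIP image embedding (real embeddings have hundreds of dimensions).
rng = random.Random(0)
clean_embedding = [1.0, -2.0, 0.5]
unchanged, _ = q_sample(clean_embedding, 0, 100, rng)    # t = 0: no noise added yet
fully_noised, _ = q_sample(clean_embedding, 100, 100, rng)  # t = T: almost pure noise
```

In the prior, the denoising network would receive a noised image embedding together with the text embedding as conditioning and be trained to recover the clean embedding; the sketch only shows the corruption side of that setup.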

Quick Start & Requirements

Install via pip: pip install dalle2-pytorch
Requires PyTorch.
GPU with CUDA is highly recommended for training and inference.
Links: Official Docs, Hugging Face Checkpoints
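Before training or sampling, it can help to confirm that the requirements above are actually met. The following is a small sketch, assuming the pip package installs under the import name `dalle2_pytorch`; the helper `check_environment` is hypothetical and not part of the library.

```python
import importlib.util

def check_environment():
    """Report whether PyTorch, the package, and a CUDA device appear to be available."""
    report = {
        "torch": importlib.util.find_spec("torch") is not None,
        # Import name assumed from the pip package name dalle2-pytorch.
        "dalle2_pytorch": importlib.util.find_spec("dalle2_pytorch") is not None,
        "cuda": False,
    }
    if report["torch"]:
        try:
            import torch
            report["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is treated the same as "no CUDA".
            report["cuda"] = False
    return report

report = check_environment()
for name, available in report.items():
    print(f"{name}: {'ok' if available else 'missing'}")
```

The script runs without PyTorch installed; it simply reports the missing pieces instead of raising.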

Highlighted Details

Implements the diffusion prior network, a key component for DALL-E 2's generation variety.
Supports cascaded diffusion models for high-resolution image synthesis.
Integrates latent diffusion with VQGAN-VAE for efficient generation.
Includes inpainting capabilities using the RePaint formulation.
Offers trainer classes (DecoderTrainer, DiffusionPriorTrainer) for simplified training loops and EMA management.
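As background for the EMA management mentioned in the last item, the sketch below shows the general exponential moving average update that such trainers typically maintain alongside the trained ("online") weights. It is a plain-Python illustration, not the code of DecoderTrainer or DiffusionPriorTrainer; the function `ema_update`, the dictionary-of-floats parameter layout, and the decay value are assumptions.

```python
def ema_update(ema_params, online_params, beta=0.99):
    """Blend online weights into the EMA copy: ema = beta * ema + (1 - beta) * online."""
    for name, value in online_params.items():
        ema_params[name] = beta * ema_params[name] + (1.0 - beta) * value
    return ema_params

# Toy "model" with a single scalar weight that changes at every training step.
online = {"w": 0.0}
ema = dict(online)  # the EMA copy starts from the same weights
for step in range(1, 4):
    online["w"] = float(step)          # stand-in for an optimizer update
    ema_update(ema, online, beta=0.5)  # small beta chosen so the effect is visible
```

The EMA copy lags behind the online weights and smooths out step-to-step fluctuations, which is why it is commonly the copy used for sampling.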

Maintenance & Community

The README credits a number of individual contributors and thanks Stability AI for sponsorship. It also carries a comprehensive "Todo" list of completed and planned features, although the last commit was about a year ago. Community engagement is encouraged via the LAION community.

Licensing & Compatibility

The README does not state a license explicitly. Check the LICENSE file in the repository, along with the licenses of its dependencies and of any pretrained checkpoints, before commercial use.

Limitations & Caveats

The README notes that, as of May 2022, DALL-E 2 is no longer state-of-the-art, with newer architectures such as Imagen surpassing it. Training is computationally intensive and requires significant GPU resources and time. Some advanced features, such as latent diffusion, require pre-trained VQGAN-VAE models.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

11 stars in the last 30 days