DALLE2-pytorch  by lucidrains

PyTorch implementation of DALL-E 2, a text-to-image synthesis neural network

created 3 years ago
11,301 stars

Top 4.6% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of OpenAI's DALL-E 2 text-to-image synthesis model. Its focus is the diffusion prior network, which generates CLIP image embeddings from CLIP text embeddings and gives the model greater variety in the images it generates. The project is aimed at researchers and developers interested in state-of-the-art generative models.

How It Works

The implementation follows DALL-E 2's architecture. A diffusion prior network predicts CLIP image embeddings from CLIP text embeddings through a diffusion process in which a causal transformer acts as the denoising network. The project also includes components for training the CLIP model itself and a U-Net-based decoder that generates images from the predicted embeddings. The decoder supports cascaded diffusion for higher resolutions and latent diffusion with a VQGAN-VAE.
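The forward (noising) half of a diffusion process like the one described above can be illustrated in a few lines. This is a minimal sketch in plain Python, not code from the repository: it assumes a cosine noise schedule, and the names `cosine_alpha_bar` and `q_sample` are hypothetical. It moves a clean embedding toward Gaussian noise; a denoising network is then trained to reverse that corruption.

```python
import math
import random


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal fraction at step t under a cosine schedule (assumed here)."""
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def q_sample(x0, t, num_steps, noise=None):
    """Noise a clean embedding x0 to step t: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a = cosine_alpha_bar(t, num_steps)
    if noise is None:
        # Draw standard Gaussian noise, one value per embedding dimension.
        noise = [random.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(a)
    noise_scale = math.sqrt(max(0.0, 1.0 - a))  # clamp guards against rounding
    return [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]


# A toy 4-dimensional "image embedding", noised halfway through 100 steps.
embedding = [0.3, -1.2, 0.8, 0.05]
half_noised = q_sample(embedding, t=50, num_steps=100)
```

At step 0 the embedding is returned unchanged, and at the final step it is almost entirely noise. In the setup described above, the prior's transformer would see such a noised image embedding together with the text embedding and learn to recover the clean one.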

Quick Start & Requirements

Highlighted Details

  • Implements the diffusion prior network, a key component for DALL-E 2's generation variety.
  • Supports cascaded diffusion models for high-resolution image synthesis.
  • Integrates latent diffusion with VQGAN-VAE for efficient generation.
  • Includes inpainting capabilities using the RePaint formulation.
  • Offers trainer classes (DecoderTrainer, DiffusionPriorTrainer) for simplified training loops and EMA management.
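
Two of the items above reduce to short update rules, sketched here in plain Python. This is an illustration under stated assumptions, not the repository's code: the function names `ema_update` and `repaint_blend` are hypothetical, and parameters and pixels are flat lists rather than tensors. The first function is the standard exponential moving average of weights, which is the kind of bookkeeping a trainer class takes over. The second is the masking step at the heart of RePaint-style inpainting, where known pixels come from a noised copy of the original image and unknown pixels come from the model's sample.

```python
def ema_update(ema_params, params, decay=0.99):
    """Move each averaged weight a small step toward the current weight."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


def repaint_blend(known_noised, generated, mask):
    """Combine two samples at the same noise level.

    mask is 1 where the original pixel is known (keep it) and
    0 where the pixel must be inpainted (use the model's sample).
    """
    return [m * k + (1 - m) * g for k, g, m in zip(known_noised, generated, mask)]


# EMA: the averaged weights lag behind the raw weights during training.
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.9)

# Inpainting: keep the first pixel, regenerate the second.
blended = repaint_blend([1.0, 2.0], [9.0, 8.0], mask=[1, 0])
```

In a full sampler the blend would be applied at every reverse diffusion step, with `known_noised` re-noised to the current step each time.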

Maintenance & Community

The README credits a number of individual contributors and thanks Stability AI for sponsorship. It carries a comprehensive "Todo" list of planned features, although the health check on this page reports the repository as inactive. Community engagement is encouraged via the LAION community.

Licensing & Compatibility

The README does not state a license explicitly. The nature of the project and its dependencies (PyTorch, Hugging Face libraries) suggests a permissive license, but users should check the repository's license file before any commercial use.

Limitations & Caveats

The README notes that, as of May 2022, the implementation is no longer state-of-the-art, having been surpassed by newer architectures such as Imagen. Training is computationally intensive and requires significant GPU resources and time. Some advanced features, such as latent diffusion, require pre-trained VQGAN-VAE models.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History
75 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Travis Fischer (Founder of Agentic), and 3 more.

consistency_models by openai

0.0%
6k
PyTorch code for consistency models research paper
created 2 years ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 12 more.

stablediffusion by Stability-AI

0.1%
41k
Latent diffusion model for high-resolution image synthesis
created 2 years ago
updated 1 month ago
Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis

0.1%
71k
Latent text-to-image diffusion model
created 3 years ago
updated 1 year ago