DALLE-pytorch by lucidrains

PyTorch implementation of DALL-E, OpenAI's text-to-image transformer

Created 5 years ago

5,634 stars

Top 8.9% on SourcePulse

12 Experts Love This Project

Elvis Saravia

Founder of DAIR.AI

Luis Capelo

Cofounder of Lightning AI

Ben Firshman

Cofounder of Replicate

Chenlin Meng

Cofounder of Pika

and 8 more!

Project Summary

This repository provides a PyTorch implementation of OpenAI's DALL-E, a text-to-image generation model. It allows researchers and developers to replicate, train, and experiment with DALL-E, offering flexibility in VAE choices and attention mechanisms.

How It Works

The project implements DALL-E as a transformer that takes text tokens and visual tokens (produced by a discrete VAE) as a single input sequence and predicts image tokens autoregressively; the VAE then decodes those tokens into an image. It supports several VAE backends, including OpenAI's pretrained VAE and the VQGAN from Taming Transformers, and offers reversible networks for training deeper models and sparse attention variants (axial, conv-like) to reduce computational cost.
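As a rough illustration of the sequence the transformer sees, the sketch below computes how many visual tokens an image yields and how text and image tokens might be joined into one sequence. It is a minimal sketch, not the library's code: the helper names image_token_grid and build_sequence are hypothetical, the sizes (256-pixel images, 3 downsampling layers, 10000 text tokens) are example values, and shifting image token ids by the text vocabulary size is an assumption made here so that both modalities share one vocabulary.

```python
def image_token_grid(image_size: int, num_layers: int) -> int:
    """Side length of the VAE feature map after num_layers 2x downsamples."""
    side, remainder = divmod(image_size, 2 ** num_layers)
    if remainder:
        raise ValueError("image_size must be divisible by 2 ** num_layers")
    return side


def build_sequence(text_tokens, image_tokens, num_text_tokens):
    """Concatenate text and image tokens into one sequence.

    Image token ids are shifted by num_text_tokens so that both modalities
    share a single vocabulary (an illustrative assumption for this sketch).
    """
    return list(text_tokens) + [t + num_text_tokens for t in image_tokens]


# Example values (assumptions): 256x256 images, 3 downsampling layers.
side = image_token_grid(image_size=256, num_layers=3)
image_seq_len = side * side          # one token per feature-map cell
text_seq_len = 256                   # example text length
total_seq_len = text_seq_len + image_seq_len

sequence = build_sequence([5, 17, 0], [3, 8191], num_text_tokens=10000)

print(side, image_seq_len, total_seq_len)  # 32 1024 1280
print(sequence)                            # [5, 17, 0, 10003, 18191]
```

The point of the arithmetic is that sequence length, and therefore attention cost, grows with the square of the feature-map side, which is why the sparse attention options matter.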

Quick Start & Requirements

Install via pip: pip install dalle-pytorch
Requires PyTorch. A CUDA-capable GPU is strongly recommended for training.
Official quick start and documentation available at https://github.com/lucidrains/DALLE-pytorch/wiki.

Highlighted Details

Supports training custom VAEs or using the pretrained VAEs from OpenAI and Taming Transformers.
Implements reversible networks for scaling transformer depth and various sparse attention types for efficiency.
Includes functionality for text generation and CLIP-based ranking of generated images.
Offers robust distributed training support via DeepSpeed and Horovod.
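To make the reversible-network item above concrete, here is a minimal sketch of the general reversible residual coupling, in which a block's inputs can be recomputed exactly from its outputs, so intermediate activations need not be stored during training. This shows the generic technique only and is not taken from the repository: f and g are toy stand-ins for the attention and feed-forward sub-layers, and all names are hypothetical.

```python
def f(x):
    # Toy stand-in for an attention sub-layer (hypothetical).
    return [2.0 * v + 1.0 for v in x]


def g(x):
    # Toy stand-in for a feed-forward sub-layer (hypothetical).
    return [v * v for v in x]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def reversible_forward(x1, x2):
    """y1 = x1 + f(x2); y2 = x2 + g(y1)."""
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def reversible_inverse(y1, y2):
    """Recover the inputs from the outputs by undoing each step in reverse."""
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2


x1, x2 = [1.0, 2.0], [3.0, 4.0]
y1, y2 = reversible_forward(x1, x2)
print(y1, y2)                      # [8.0, 11.0] [67.0, 125.0]
print(reversible_inverse(y1, y2))  # ([1.0, 2.0], [3.0, 4.0])
```

Because the inverse needs only the outputs, activation memory stays roughly constant as depth grows, at the cost of recomputing f and g during the backward pass.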

Maintenance & Community

The project has received contributions from several individuals. It no longer appears to be actively maintained: the Health Check below reports the last commit as 1 year ago and responsiveness as Inactive. GitHub issues are the most likely venue for questions and discussion.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README snippet. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

Training DALL-E from scratch requires significant computational resources and large datasets. The README notes that matching the results of OpenAI's original paper requires a transformer depth of 64 layers, which is computationally intensive. The project also mentions a move towards DALL-E 2, so this implementation may see little further development.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

5 stars in the last 30 days