DALLE-pytorch by lucidrains

PyTorch implementation of DALL-E, OpenAI's text-to-image transformer

Created 4 years ago
5,628 stars

Top 9.1% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of OpenAI's DALL-E, a text-to-image generation model. It allows researchers and developers to replicate, train, and experiment with DALL-E, offering flexibility in VAE choices and attention mechanisms.

How It Works

The project implements DALL-E as a transformer that models text tokens and image tokens (produced by a discrete VAE) as a single sequence. At generation time it predicts image tokens autoregressively from the text, and the VAE decodes those tokens into an image. It supports several VAE backends, including OpenAI's pretrained VAE and the VQGAN from Taming Transformers. It also offers reversible networks for training deeper models and several sparse attention variants (axial, conv-like) to reduce computational cost.
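To make the single-sequence idea concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function name `build_joint_sequence`, the `pad_id` value, and the choice to shift image tokens past the text vocabulary are all assumptions made for illustration.

```python
from typing import List


def build_joint_sequence(
    text_tokens: List[int],
    image_tokens: List[int],
    num_text_tokens: int,
    text_seq_len: int,
    pad_id: int = 0,
) -> List[int]:
    """Hypothetical helper: build one token sequence from text and image tokens.

    The text is truncated or padded to a fixed length, and image tokens are
    shifted by the text vocabulary size so the two vocabularies do not collide.
    This offset scheme is an assumption, not necessarily what the repo does.
    """
    text = text_tokens[:text_seq_len]
    text = text + [pad_id] * (text_seq_len - len(text))
    shifted_image = [num_text_tokens + token for token in image_tokens]
    return text + shifted_image


# Three text tokens padded to length 4, followed by four image tokens
# shifted by a toy text vocabulary of size 10.
seq = build_joint_sequence([5, 9, 2], [0, 7, 3, 1], num_text_tokens=10, text_seq_len=4)
print(seq)  # [5, 9, 2, 0, 10, 17, 13, 11]
```

A transformer trained on sequences like this can be asked to continue a text-only prefix, and the tokens it emits in the image range are what the VAE decoder would turn back into pixels.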

Quick Start & Requirements

Highlighted Details

  • Supports training custom VAEs or using the pretrained VAEs from OpenAI and Taming Transformers.
  • Implements reversible networks for scaling transformer depth and various sparse attention types for efficiency.
  • Includes functionality for text generation and CLIP-based ranking of generated images.
  • Offers robust distributed training support via DeepSpeed and Horovod.
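The reversible-network item in the list above rests on a simple coupling trick: a layer's inputs can be recomputed exactly from its outputs, so intermediate activations need not be stored during training. The sketch below shows that general technique in plain Python. It is not the repository's implementation, and `f` and `g` are toy stand-ins (assumptions) for the attention and feed-forward sublayers.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Fn = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def reversible_forward(x1: Vector, x2: Vector, f: Fn, g: Fn) -> Tuple[Vector, Vector]:
    """Coupling step: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def reversible_inverse(y1: Vector, y2: Vector, f: Fn, g: Fn) -> Tuple[Vector, Vector]:
    """Recover the inputs from the outputs by undoing the two additions in reverse order."""
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2


def f(v: Vector) -> Vector:
    # Toy stand-in for an attention sublayer (assumption).
    return [2.0 * x for x in v]


def g(v: Vector) -> Vector:
    # Toy stand-in for a feed-forward sublayer (assumption).
    return [x + 1.0 for x in v]


x1, x2 = [1.0, 2.0], [3.0, 4.0]
y1, y2 = reversible_forward(x1, x2, f, g)
r1, r2 = reversible_inverse(y1, y2, f, g)
print(y1, y2)  # [7.0, 10.0] [11.0, 15.0]
print(r1, r2)  # [1.0, 2.0] [3.0, 4.0]
```

Because the inverse works for any `f` and `g`, activation memory stays roughly constant as layers are added, at the price of recomputing each sublayer during the backward pass.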

Maintenance & Community

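The project has received contributions from several individuals. However, the Health Check section lists the last commit as 1 year ago and responsiveness as Inactive, so it does not appear to be actively maintained. Community support and discussion are likely available through GitHub issues and any community channels linked from the repository.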

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README snippet. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

Training DALL-E from scratch requires significant computational resources and large datasets. The README notes that achieving results comparable to OpenAI's original paper requires a depth of 64 layers, which is computationally intensive. The project also mentions moving towards DALL-E 2, indicating potential future deprecation of this specific implementation.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 5 stars in the last 30 days
