DALLE-pytorch by lucidrains

PyTorch implementation of DALL-E, OpenAI's text-to-image transformer

created 4 years ago
5,623 stars

Top 9.2% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of OpenAI's DALL-E, a text-to-image generation model. It allows researchers and developers to replicate, train, and experiment with DALL-E, offering flexibility in VAE choices and attention mechanisms.

How It Works

The project implements DALL-E as an autoregressive transformer over a single sequence of text tokens followed by image tokens. The image tokens come from a discrete VAE, which also decodes generated tokens back into pixels. Several VAE backends are supported, including OpenAI's pretrained VAE and the VQGAN from Taming Transformers. To manage computational cost, the project offers reversible networks for deeper models and several sparse attention patterns (axial, conv-like).
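The sequence layout and one sparse attention pattern can be sketched in plain Python. This is a minimal illustration, not the repository's code: the sizes, the function names `build_sequence` and `axial_row_mask`, and the choice to offset image ids by the text vocabulary size are all assumptions made for the example.

```python
# Illustrative sizes only (assumptions, not the repository's defaults).
TEXT_SEQ_LEN = 4        # text tokens per sample
IMAGE_GRID = 3          # image tokens form a 3 x 3 grid
NUM_TEXT_TOKENS = 100   # text vocabulary size


def build_sequence(text_ids, image_ids):
    """Concatenate text and image token ids into one sequence.

    Assumption: image ids are shifted by the text vocabulary size so the
    two vocabularies do not collide in a shared output space.
    """
    assert len(text_ids) == TEXT_SEQ_LEN
    assert len(image_ids) == IMAGE_GRID * IMAGE_GRID
    return list(text_ids) + [NUM_TEXT_TOKENS + i for i in image_ids]


def axial_row_mask():
    """Return mask[q][k] == True where query q may attend to key k.

    Attention is causal (k <= q). Text keys are visible to every later
    position; image keys are visible only to image queries in the same
    grid row.
    """
    n = TEXT_SEQ_LEN + IMAGE_GRID * IMAGE_GRID
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(q + 1):
            if k < TEXT_SEQ_LEN:
                mask[q][k] = True
            else:
                # k is an image position, so q (>= k) is one as well.
                q_row = (q - TEXT_SEQ_LEN) // IMAGE_GRID
                k_row = (k - TEXT_SEQ_LEN) // IMAGE_GRID
                mask[q][k] = q_row == k_row
    return mask


sequence = build_sequence([1, 2, 3, 4], list(range(9)))
mask = axial_row_mask()
allowed = sum(sum(row) for row in mask)
dense = len(mask) * (len(mask) + 1) // 2
print(f"{allowed} of {dense} causal query/key pairs are kept")
```

The saving over dense causal attention grows with the image grid, since each image query sees one row of image keys rather than all earlier ones.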

Quick Start & Requirements

Highlighted Details

  • Supports training custom VAEs or using the pretrained VAEs from OpenAI and Taming Transformers.
  • Implements reversible networks for scaling transformer depth and various sparse attention types for efficiency.
  • Includes functionality for text generation and CLIP-based ranking of generated images.
  • Supports distributed training via DeepSpeed and Horovod.
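The idea behind CLIP-based ranking is to score each generated image against the prompt and keep the best matches. The sketch below shows only the ranking step, assuming the text and image embeddings have already been computed by some CLIP-like encoder. The function names and toy vectors are hypothetical, and this is not the repository's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(text_embedding, image_embeddings, top_k=None):
    """Return candidate indices sorted from best to worst match."""
    scored = [
        (cosine_similarity(text_embedding, emb), idx)
        for idx, emb in enumerate(image_embeddings)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    order = [idx for _, idx in scored]
    return order if top_k is None else order[:top_k]


# Toy 2-D embeddings standing in for real encoder outputs.
text_embedding = [1.0, 0.0]
image_embeddings = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ranking = rank_candidates(text_embedding, image_embeddings)
```

In practice one would generate many candidates per prompt and pass `top_k` to keep only the few highest-scoring images.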

Maintenance & Community

The project has seen contributions from several individuals, but with the last commit about a year ago and no pull request or issue activity in the past 30 days, it does not appear to be actively maintained. Discussion most likely happens through GitHub issues.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README snippet. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

Training DALL-E from scratch requires significant computational resources and large datasets. The README notes that achieving results comparable to OpenAI's original paper requires a depth of 64 layers, which is computationally intensive. The project also mentions moving towards DALL-E 2, indicating potential future deprecation of this specific implementation.
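To give a rough sense of why depth drives cost, the sketch below uses a common back-of-the-envelope estimate for a standard transformer block: about 4·d² weights for the attention projections plus 8·d² for a feed-forward layer with 4x expansion. The model width of 1024 and the comparison depth of 12 are assumptions chosen for illustration; the repository's actual layers (and therefore its exact parameter counts) may differ.

```python
def estimate_block_params(dim, ff_mult=4):
    """Approximate weight count for one standard transformer block.

    Ignores biases, layer norms, and embeddings.
    """
    attention = 4 * dim * dim           # Q, K, V, and output projections
    feed_forward = 2 * ff_mult * dim * dim  # up- and down-projection
    return attention + feed_forward


def estimate_transformer_params(dim, depth, ff_mult=4):
    """Approximate weight count for a stack of `depth` blocks."""
    return depth * estimate_block_params(dim, ff_mult)


DIM = 1024  # hypothetical model width, not taken from the source
shallow = estimate_transformer_params(DIM, depth=12)
deep = estimate_transformer_params(DIM, depth=64)
print(f"depth 12: {shallow:,} weights")
print(f"depth 64: {deep:,} weights")
```

Under these assumptions, the 64-layer stack holds a little over five times the weights of the 12-layer one, before accounting for activation memory during training.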

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 25 stars in the last 90 days
