DALLE-mtf by EleutherAI

DALL-E implementation for large-scale training

Created 4 years ago
434 stars

Top 68.5% on SourcePulse

Project Summary

This repository provides an implementation of OpenAI's DALL-E model in the Mesh-TensorFlow framework, targeting large-scale training. It aims to let researchers and practitioners train models comparable to or larger than the original 12-billion-parameter DALL-E, with a focus on efficient distributed training.

How It Works

The project leverages Mesh-TensorFlow for distributed training, allowing for efficient scaling across multiple accelerators. It follows the DALL-E architecture, which involves a VAE (Variational Autoencoder) to compress images into discrete tokens and a transformer model to generate these tokens conditioned on text. The Mesh-TensorFlow framework facilitates the partitioning of model and data across a TPU mesh, enabling training of very large models.
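The tokenization stage described above can be sketched with a toy nearest-neighbour quantization, the core operation a discrete VAE's encoder output goes through. This is an illustration only, not the repository's actual Mesh-TensorFlow VAE; the codebook and patch sizes are arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: 8 codebook vectors and 6 image patches, each 4-dimensional.
codebook = rng.normal(size=(8, 4))   # learned code embeddings (here: random)
patches = rng.normal(size=(6, 4))    # encoder outputs for 6 image patches

# Nearest-neighbour quantization: each patch is replaced by the index of
# its closest codebook vector, yielding a discrete token sequence that a
# transformer can then model autoregressively, conditioned on text.
dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
tokens = dists.argmin(axis=1)        # shape (6,): one token id per patch

print(tokens.shape)
```

In the full pipeline, a decoder maps the quantized codes back to pixels, and the transformer only ever sees the token ids, which is what makes training on image-text pairs tractable at scale.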

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip3 install -r requirements.txt.
  • Prerequisites: Requires Google Cloud Platform account, a storage bucket, and TPUs. Untested on GPUs but theoretically supported.
  • Setup: Use ctpu up --vm-only to create a VM connected to your GCP resources.
  • Documentation: Configuration details for VAE and DALL-E training are provided in the README.
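The steps above can be collected into a short setup sequence. This is a sketch: the repository URL is inferred from the project name, and the `ctpu` flags beyond `--vm-only` (zone, TPU name) depend on your own GCP configuration.

```shell
# Clone and install dependencies (repository URL assumed from the project name).
git clone https://github.com/EleutherAI/DALLE-mtf.git
cd DALLE-mtf
pip3 install -r requirements.txt

# Create a VM connected to your GCP resources; run before training on TPUs.
ctpu up --vm-only
```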

Highlighted Details

  • Implements DALL-E architecture in Mesh-TensorFlow for large-scale training.
  • Includes a VAE pretraining pipeline for image tokenization.
  • Supports custom dataset formatting with JSONL files and image directories.
  • Configuration examples for VAE and DALL-E models are detailed.
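The JSONL dataset format mentioned above can be sketched as follows. The exact field names the repository expects are defined in its README and are not reproduced here, so `image_path` and `caption` are hypothetical placeholders.

```python
import json

# Hypothetical JSONL records pairing images with captions; the field names
# "image_path" and "caption" are illustrative placeholders, not the schema
# DALLE-mtf necessarily expects.
records = [
    {"image_path": "images/0001.jpg", "caption": "a cat sitting on a mat"},
    {"image_path": "images/0002.jpg", "caption": "a red bicycle by a wall"},
]

# JSONL: one JSON object per line.
with open("dataset.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Reading it back is symmetric: parse each line independently.
with open("dataset.jsonl") as f:
    loaded = [json.loads(line) for line in f]

print(len(loaded))
```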

Maintenance & Community

This project is from EleutherAI, a research collective focused on open-source AI. Specific contributor details beyond Ben Wang and Aran Komatsuzaki are not explicitly listed.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking would require clarification of the license.

Limitations & Caveats

The project is marked as "[WIP]" (Work In Progress). No pre-trained models are available yet. Training is primarily designed for TPUs, with GPU support being theoretical. A public, large-scale dataset for DALL-E training is still in development.

Health Check

  • Last Commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

diffusion by mosaicml

  • Diffusion model training code
  • 707 stars · 0%; created 2 years ago; updated 8 months ago
  • Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 1 more.

ai-toolkit by ostris

  • Training toolkit for finetuning diffusion models
  • 6k stars · 0.9%; created 2 years ago; updated 14 hours ago
  • Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 5 more.