DALL-E implementation for large-scale training
This repository provides an implementation of OpenAI's DALL-E model in the Mesh-TensorFlow framework, targeting large-scale training. It aims to let researchers and practitioners train models comparable to or larger than the original 12-billion-parameter DALL-E, with a focus on efficient distributed training.
How It Works
The project leverages Mesh-TensorFlow for distributed training, allowing efficient scaling across multiple accelerators. It follows the DALL-E architecture: a discrete VAE (dVAE) compresses each image into a grid of discrete tokens, and an autoregressive transformer generates those image tokens conditioned on the text tokens. Mesh-TensorFlow partitions both the model and the data across a TPU mesh using named tensor dimensions and layout rules, enabling training of very large models.
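To make the partitioning idea concrete, here is a minimal Mesh-TensorFlow sketch of a single dense layer, not code from this repository. The dimension names (batch, io, hidden), the mesh axes (rows, cols), and the device list are all illustrative assumptions; they show how layout rules map tensor dimensions onto mesh dimensions to shard computation.

```python
# Minimal Mesh-TensorFlow sketch: named tensor dimensions plus layout
# rules determine how tensors are sharded across a mesh of devices.
# All names (rows/cols, batch/io/hidden) are illustrative, not from the repo.
import mesh_tensorflow as mtf
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

graph = mtf.Graph()
mesh = mtf.Mesh(graph, "example_mesh")

# Logical tensor dimensions; mtf shards tensors by dimension *name*.
batch = mtf.Dimension("batch", 8)
io = mtf.Dimension("io", 64)
hidden = mtf.Dimension("hidden", 128)

x_tf = tf.random.normal([8, 64])
x = mtf.import_tf_tensor(mesh, x_tf, shape=mtf.Shape([batch, io]))
w = mtf.get_variable(mesh, "w", mtf.Shape([io, hidden]))
y = mtf.relu(mtf.einsum([x, w], output_shape=mtf.Shape([batch, hidden])))

# A 2x2 mesh: shard "batch" along "rows" and "hidden" along "cols".
mesh_shape = mtf.Shape([mtf.Dimension("rows", 2), mtf.Dimension("cols", 2)])
layout_rules = [("batch", "rows"), ("hidden", "cols")]
devices = ["/cpu:0"] * 4  # on TPU this would use SimdMeshImpl instead
mesh_impl = mtf.placement_mesh_impl.PlacementMeshImpl(
    mesh_shape, layout_rules, devices)

# Lower the mtf graph to ordinary TensorFlow ops.
lowering = mtf.Lowering(graph, {mesh: mesh_impl})
y_tf = lowering.export_to_tf_tensor(y)
```

Because sharding is driven entirely by dimension names, the same model code scales from one device to a large TPU mesh by changing only the mesh shape and layout rules.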
Quick Start & Requirements
Install the dependencies:

pip3 install -r requirements.txt

Then run

ctpu up --vm-only

to create a VM connected to your GCP resources.
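Once the VM is up, TPU training in this style of codebase typically goes through a TF1 TPUEstimator setup. The snippet below is a generic sketch of resolving a Cloud TPU and pointing a run configuration at a GCS bucket; the TPU name, zone, project, and bucket path are placeholder assumptions, and the repository's actual entry point may differ.

```python
# Generic Cloud TPU setup sketch (TF1-style); not this repo's actual
# entry point. "my-tpu", the zone/project, and the GCS bucket are
# hypothetical placeholders.
import tensorflow.compat.v1 as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu="my-tpu", zone="us-central1-b", project="my-gcp-project")

run_config = tf.estimator.tpu.RunConfig(
    cluster=resolver,
    model_dir="gs://my-bucket/dalle-runs",  # checkpoints go to GCS
    tpu_config=tf.estimator.tpu.TPUConfig(iterations_per_loop=100),
)
# A tf.estimator.tpu.TPUEstimator built from this config would then
# drive the Mesh-TensorFlow training loop.
```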
Maintenance & Community
This project is from EleutherAI, a research collective focused on open-source AI. The credited contributors are Ben Wang and Aran Komatsuzaki; no other contributor details are listed.
Licensing & Compatibility
The README does not state a license, so suitability for commercial use or closed-source linking cannot be determined without clarification from the maintainers.
Limitations & Caveats
The project is marked "[WIP]" (work in progress), and no pre-trained models are available yet. Training is designed primarily for TPUs; GPU support remains theoretical. A public, large-scale dataset suitable for DALL-E training is still in development.
The repository was last updated about three years ago and is marked inactive.