DALLE-mtf by EleutherAI

DALL-E implementation for large-scale training

created 4 years ago
434 stars

Top 69.6% on sourcepulse

Project Summary

This repository provides an implementation of OpenAI's DALL-E model within the Mesh-TensorFlow framework, targeting large-scale training. It aims to enable researchers and practitioners to train models comparable to or larger than the original 12 billion parameter DALL-E, with a focus on efficient distributed training.

How It Works

The project leverages Mesh-TensorFlow for distributed training, allowing for efficient scaling across multiple accelerators. It follows the DALL-E architecture, which involves a VAE (Variational Autoencoder) to compress images into discrete tokens and a transformer model to generate these tokens conditioned on text. The Mesh-TensorFlow framework facilitates the partitioning of model and data across a TPU mesh, enabling training of very large models.
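The two-stage pipeline above can be sketched in miniature. This is a toy illustration, not the repository's code: the sizes, the nearest-neighbour "VAE encoder", and the function names are all hypothetical stand-ins chosen to show the structure (images become discrete codebook indices, then text and image tokens are concatenated into one autoregressive sequence).

```python
import numpy as np

# Hypothetical sizes, not the repo's actual configuration.
VOCAB_TEXT = 1000        # text token vocabulary
VOCAB_IMAGE = 512        # discrete VAE codebook size
TEXT_LEN = 32            # fixed text sequence length
IMAGE_TOKENS = 16 * 16   # a 16x16 grid of image tokens

def vae_encode(image: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Toy stand-in for the VAE encoder: map each image patch to the
    index of its nearest codebook vector, yielding discrete tokens."""
    patches = image.reshape(IMAGE_TOKENS, -1)                 # (256, patch_dim)
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)                               # (256,) indices

def build_sequence(text_tokens: np.ndarray, image_tokens: np.ndarray) -> np.ndarray:
    """The transformer models [text; image] as one autoregressive sequence,
    with image tokens offset into their own vocabulary range."""
    return np.concatenate([text_tokens, image_tokens + VOCAB_TEXT])

rng = np.random.default_rng(0)
codebook = rng.normal(size=(VOCAB_IMAGE, 48))                 # 48 = patch_dim here
image = rng.normal(size=(16, 16, 48))
text = rng.integers(0, VOCAB_TEXT, size=TEXT_LEN)

seq = build_sequence(text, vae_encode(image, codebook))
print(seq.shape)  # (288,) = 32 text tokens + 256 image tokens
```

At generation time, the transformer is conditioned on the text prefix and samples image tokens, which the VAE decoder then maps back to pixels; Mesh-TensorFlow's role is to shard the transformer's weights and activations across the TPU mesh, which this sketch omits.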

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip3 install -r requirements.txt.
  • Prerequisites: Requires a Google Cloud Platform account, a storage bucket, and TPUs. GPUs are untested but theoretically supported.
  • Setup: Use ctpu up --vm-only to create a VM connected to your GCP resources.
  • Documentation: Configuration details for VAE and DALL-E training are provided in the README.

Highlighted Details

  • Implements DALL-E architecture in Mesh-TensorFlow for large-scale training.
  • Includes a VAE pretraining pipeline for image tokenization.
  • Supports custom dataset formatting with JSONL files and image directories.
  • Configuration examples for VAE and DALL-E models are detailed.
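For the JSONL-based dataset formatting mentioned above, a caption file pairs each image with its text, one JSON object per line. The field names below are hypothetical; the actual keys expected by the repository's data pipeline are documented in its README and may differ.

```python
import json
from pathlib import Path

# Hypothetical records; field names here are illustrative only.
records = [
    {"image_path": "images/0001.jpg", "caption": "a red bicycle leaning on a wall"},
    {"image_path": "images/0002.jpg", "caption": "two dogs playing in the snow"},
]

out = Path("captions.jsonl")
with out.open("w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")  # one JSON object per line

# Reading back: JSONL is parsed line by line, not as a single JSON document.
loaded = [json.loads(line) for line in out.read_text().splitlines()]
print(len(loaded))  # 2
```

The line-per-record layout is what makes JSONL convenient for large datasets: files can be streamed, split, and sharded without parsing the whole file at once.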

Maintenance & Community

This project is from EleutherAI, a research collective focused on open-source AI. Specific contributor details beyond Ben Wang and Aran Komatsuzaki are not explicitly listed.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking would require clarification of the license.

Limitations & Caveats

The project is marked as "[WIP]" (Work In Progress). No pre-trained models are available yet. Training is primarily designed for TPUs, with GPU support being theoretical. A public, large-scale dataset for DALL-E training is still in development.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 90 days

Explore Similar Projects

Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM

Top 0.3%, 1k stars
Transformer library for flexible model development
created 3 years ago
updated 7 months ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai) and Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code).

GPT2 by ConnorJL

Top 0%, 1k stars
GPT2 training implementation, supporting TPUs and GPUs
created 6 years ago
updated 2 years ago
Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Lysandre Debut (Chief Open-Source Officer at Hugging Face), and 5 more.

gpt-neo by EleutherAI

Top 0.0%, 8k stars
GPT-2/3-style model implementation using mesh-tensorflow
created 5 years ago
updated 3 years ago