dalle-mini by borisdayma

Text-to-image model for generating images from text prompts

Created 4 years ago

14,815 stars

Top 3.4% on SourcePulse

20 Experts Love This Project

Andrej Karpathy

Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n

David Ha

Cofounder of Sakana AI

Wei-Lin Chiang

Cofounder of LMArena

Benjamin Bolte

Cofounder of K-Scale Labs

and 16 more!

Project Summary

This repository provides DALL·E Mini, an open-source model for generating images from text prompts. It's designed for researchers and developers interested in text-to-image synthesis, offering a functional implementation that can be run locally or via hosted services.

How It Works

DALL·E Mini uses a VQGAN-f16-16384 model to encode images into discrete tokens and decode them back, and a transformer-based sequence-to-sequence model to generate images from text. The README credits foundational papers on text-to-image synthesis and transformer variants as the basis for this architecture.
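
The model name encodes two numbers that determine how an image is represented as tokens. A minimal sketch of that arithmetic follows, assuming the common reading of `f16` as a spatial downsampling factor of 16 and `16384` as the codebook size, and assuming a 256×256 input image. The helper names are hypothetical, and nothing here calls the dalle-mini package.

```python
from typing import Tuple

# Assumed reading of "VQGAN-f16-16384": downsampling factor 16, 16384 codebook entries.
DOWNSAMPLE_FACTOR = 16
CODEBOOK_SIZE = 16384


def token_grid(image_size: int, factor: int = DOWNSAMPLE_FACTOR) -> Tuple[int, int]:
    """Return (tokens per side, total tokens) for a square image (hypothetical helper)."""
    if image_size <= 0 or image_size % factor != 0:
        raise ValueError("image_size must be a positive multiple of the factor")
    side = image_size // factor
    return side, side * side


def bits_per_token(codebook_size: int = CODEBOOK_SIZE) -> int:
    """Return the number of bits needed to index one codebook entry."""
    return (codebook_size - 1).bit_length()


if __name__ == "__main__":
    side, total = token_grid(256)  # 256x256 is an assumed image size
    print(f"{side}x{side} grid, {total} tokens, {bits_per_token()} bits each")
```

Under these assumptions, the sequence-to-sequence model only has to predict a short sequence of discrete tokens per image rather than raw pixels, and the VQGAN decoder turns those tokens back into an image.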

Quick Start & Requirements

Installation: pip install dalle-mini for inference. For development: pip install -e ".[dev]".
Dependencies: Python. The README does not state hardware requirements (e.g., GPU, VRAM), although they are implied for practical use.
Resources: A notebook for step-by-step pipeline experimentation is available. Trained models are hosted on Hugging Face Model Hub.
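
Once the install command above has run, one way to confirm that the package is visible to the current interpreter is to query its installed version through the standard library. This is a sketch using `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical, and the distribution name `dalle-mini` is taken from the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("dalle-mini")
    if version is None:
        print("dalle-mini is not installed in this environment")
    else:
        print(f"dalle-mini {version} is installed")
```

This check only reports what the packaging metadata says. It does not verify that model weights can be downloaded or that an accelerator is available.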

Highlighted Details

Offers a functional implementation of DALL·E Mini for local use.
Provides links to hosted versions (Craiyon) and community projects (DALL·E Playground, DALL·E Flow).
Includes extensive references to foundational research papers in AI and computer vision.
Trained models are available on Hugging Face Model Hub.

Maintenance & Community

The README credits a notable list of authors and thanks communities and organizations including Hugging Face and Google TPU Research Cloud. Community interaction is encouraged via the LAION Discord.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. This requires further investigation for commercial use or closed-source integration.

Limitations & Caveats

The README does not specify hardware requirements for running the model locally, nor does it detail performance benchmarks or limitations of the generated images. The absence of a clear license is a significant caveat for adoption.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

14 stars in the last 30 days