dalle-mini by borisdayma

Text-to-image model for generating images from text prompts

created 4 years ago
14,815 stars

Top 3.5% on sourcepulse

Project Summary

This repository provides DALL·E Mini, an open-source model for generating images from text prompts. It's designed for researchers and developers interested in text-to-image synthesis, offering a functional implementation that can be run locally or via hosted services.

How It Works

DALL·E Mini employs a VQGAN-f16-16384 model for image encoding/decoding and a transformer-based sequence-to-sequence model for text-to-image generation. This architecture draws inspiration from foundational papers in text-to-image synthesis and transformer variants, aiming for efficient and high-quality image generation from textual descriptions.
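To make the naming concrete, here is a minimal Python sketch of the arithmetic implied by the VQGAN-f16-16384 label, read as a downsampling factor of 16 and a codebook of 16,384 entries. The 256×256 input size and the helper name latent_grid are assumptions chosen for illustration; they are not taken from the repository's code.

```python
import math


def latent_grid(image_size: int, downsample: int = 16, codebook_size: int = 16384):
    """Return (tokens_per_side, total_tokens, bits_per_token) for a square image.

    Hypothetical helper: it only illustrates the arithmetic behind the
    "f16-16384" label and does not call any dalle-mini code.
    """
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by the downsampling factor")
    side = image_size // downsample      # tokens along one edge of the latent grid
    total = side * side                  # length of the image-token sequence
    bits = math.log2(codebook_size)      # information carried by one token index
    return side, total, bits


# Assumed 256x256 input image.
side, total, bits = latent_grid(256)
print(f"{side}x{side} grid, {total} tokens, {bits:.0f} bits per token")
```

Under these assumptions the sequence-to-sequence model would only need to predict 256 discrete token indices per image, which the VQGAN decoder then maps back to pixels.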

Quick Start & Requirements

  • Installation: pip install dalle-mini for inference. For development: pip install -e ".[dev]".
  • Dependencies: Python. Specific hardware requirements (e.g., GPU, VRAM) are not explicitly detailed but are implied for practical use.
  • Resources: A notebook for step-by-step pipeline experimentation is available. Trained models are hosted on Hugging Face Model Hub.
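After running the install command above, a quick way to confirm the package is visible to the current Python environment is sketched below. It uses only the standard library; the import name dalle_mini is assumed to be the module installed by the dalle-mini distribution, and is_installed is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "dalle_mini" is the assumed import name for the dalle-mini package.
if is_installed("dalle_mini"):
    print("dalle_mini is importable")
else:
    print("dalle_mini not found; try: pip install dalle-mini")
```

This check says nothing about GPU availability or VRAM, which the README leaves unspecified.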

Highlighted Details

  • Offers a functional implementation of DALL·E Mini for local use.
  • Provides links to hosted versions (Craiyon) and community projects (DALL·E Playground, DALL·E Flow).
  • Includes extensive references to foundational research papers in AI and computer vision.
  • Trained models are available on Hugging Face Model Hub.

Maintenance & Community

The README credits a notable list of authors and thanks several communities and organizations, including Hugging Face and Google TPU Research Cloud. Community interaction is encouraged via the LAION Discord. The project is described as active, although the last recorded commit was one year ago.

Licensing & Compatibility

The provided README does not state a license. Check the repository for a license file before commercial use or closed-source integration.

Limitations & Caveats

The README does not specify hardware requirements for running the model locally, nor does it detail performance benchmarks or limitations of the generated images. The absence of a clear license is a significant caveat for adoption.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 58 stars in the last 90 days
