dalle-mini  by borisdayma

Text-to-image model for generating images from text prompts

Created 4 years ago
14,809 stars

Top 3.3% on SourcePulse

Project Summary

This repository provides DALL·E Mini, an open-source model for generating images from text prompts. It's designed for researchers and developers interested in text-to-image synthesis, offering a functional implementation that can be run locally or via hosted services.

How It Works

DALL·E Mini employs a VQGAN-f16-16384 model for image encoding/decoding and a transformer-based sequence-to-sequence model for text-to-image generation. This architecture draws inspiration from foundational papers in text-to-image synthesis and transformer variants, aiming for efficient and high-quality image generation from textual descriptions.
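The name VQGAN-f16-16384 can be read as a downsampling factor of 16 and a codebook of 16,384 entries. The sketch below works through the arithmetic this implies for the token sequence the transformer has to produce. It is an illustration only: the VQGANSpec class is a hypothetical helper, not part of the dalle-mini package, and the 256-pixel image size is an assumption rather than something stated above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VQGANSpec:
    """Hypothetical helper describing a VQGAN by its two naming parameters."""

    downsample_factor: int = 16   # the "f16" in VQGAN-f16-16384
    codebook_size: int = 16384    # the "16384" in VQGAN-f16-16384

    def token_grid(self, image_size: int) -> Tuple[int, int]:
        """Rows and columns of discrete codes for a square image."""
        if image_size % self.downsample_factor != 0:
            raise ValueError("image size must be a multiple of the downsample factor")
        side = image_size // self.downsample_factor
        return side, side

    def sequence_length(self, image_size: int) -> int:
        """Number of image tokens the sequence-to-sequence model must generate."""
        rows, cols = self.token_grid(image_size)
        return rows * cols

    def bits_per_token(self) -> int:
        """Bits needed to index one codebook entry."""
        return (self.codebook_size - 1).bit_length()


IMAGE_SIZE = 256  # assumption: square 256x256 images

spec = VQGANSpec()
grid = spec.token_grid(IMAGE_SIZE)
seq_len = spec.sequence_length(IMAGE_SIZE)
bits = spec.bits_per_token()

print(f"token grid: {grid[0]}x{grid[1]}")
print(f"image tokens per picture: {seq_len}")
print(f"bits per token: {bits}")
```

Under these assumptions the text-to-image model only has to predict a few hundred discrete tokens per picture, which the VQGAN decoder then turns back into pixels.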

Quick Start & Requirements

  • Installation: pip install dalle-mini for inference. For development: pip install -e ".[dev]".
  • Dependencies: Python. The README does not state hardware requirements (GPU, VRAM), although an accelerator is likely needed for practical use.
  • Resources: A notebook for step-by-step pipeline experimentation is available. Trained models are hosted on Hugging Face Model Hub.
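After running the install command above, a quick check like the following can confirm that the package is importable before opening the notebook. This is a minimal sketch: missing_packages is a hypothetical helper, and the list of import names (dalle_mini plus jax and flax as its assumed runtime dependencies) is an assumption, not a list taken from the README.

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [
        name for name in import_names
        if importlib.util.find_spec(name) is None
    ]


# Assumption: "dalle_mini" is the import name installed by `pip install dalle-mini`,
# and jax/flax are the libraries it relies on at runtime.
REQUIRED = ["dalle_mini", "jax", "flax"]

missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

The check only looks up module specs and does not import anything, so it stays fast even when a large framework is installed.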

Highlighted Details

  • Offers a functional implementation of DALL·E Mini for local use.
  • Provides links to hosted versions (Craiyon) and community projects (DALL·E Playground, DALL·E Flow).
  • Cites an extensive list of foundational research papers in AI and computer vision.
  • Trained models are available on Hugging Face Model Hub.

Maintenance & Community

The README credits a list of authors and thanks communities and organizations such as Hugging Face and the Google TPU Research Cloud. Community discussion takes place on the LAION Discord. Recent activity is low: the last commit was about a year ago, with no pull requests or issues in the past 30 days.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. This requires further investigation for commercial use or closed-source integration.

Limitations & Caveats

The README does not specify hardware requirements for running the model locally, nor does it detail performance benchmarks or limitations of the generated images. The absence of a clear license is a significant caveat for adoption.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days
