Mini-DALLE3 by Zeqiang-Lai

Text-to-image research paper using LLMs for interactive prompting

created 1 year ago
313 stars

Top 87.4% on sourcepulse

Project Summary

Mini-DALLE3 offers an experimental, interactive text-to-image and text-to-text experience, aiming to replicate the interleaved capabilities of DALL-E 3 and ChatGPT. It targets users seeking a conversational interface for image generation and manipulation, providing a novel way to integrate LLM-driven text and visual content creation.

How It Works

The project leverages large language models (LLMs) to process user prompts and generate images using Stable Diffusion XL (SDXL) via IP-Adapter. This approach allows for an interleaved conversational flow where text and image generation can occur dynamically within the same interaction, mimicking a more natural and intuitive user experience.
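To make the "interleaved" idea concrete, the sketch below shows one way a single LLM reply could be split into ordered text and image-prompt segments. This is an illustration only, not the project's code. The `<image>...</image>` tag convention and the names `split_interleaved`, `Segment`, and `render` are hypothetical. In a real system, each image segment would be sent to SDXL rather than replaced by a placeholder string.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical convention: the LLM is assumed to wrap each image description
# in <image>...</image>. This is an illustration, not Mini-DALLE3's format.
IMAGE_TAG = re.compile(r"<image>(.*?)</image>", re.DOTALL)


@dataclass
class Segment:
    kind: str     # "text" or "image"
    content: str  # plain text, or the prompt to send to the image model


def split_interleaved(reply: str) -> List[Segment]:
    """Split an LLM reply into ordered text and image-prompt segments."""
    segments: List[Segment] = []
    pos = 0
    for match in IMAGE_TAG.finditer(reply):
        text = reply[pos:match.start()].strip()
        if text:
            segments.append(Segment("text", text))
        segments.append(Segment("image", match.group(1).strip()))
        pos = match.end()
    tail = reply[pos:].strip()
    if tail:
        segments.append(Segment("text", tail))
    return segments


def render(segments: List[Segment]) -> List[str]:
    """Placeholder renderer: a real pipeline would call SDXL for image segments."""
    return [
        f"[image generated from: {s.content}]" if s.kind == "image" else s.content
        for s in segments
    ]
```

For example, `split_interleaved("Here is a cat. <image>a tabby cat on a sofa</image> Want another style?")` yields a text segment, an image segment, and a second text segment, in that order. Keeping the segments ordered is what lets text and images appear within the same conversational turn.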

Quick Start & Requirements

  • Download the required model checkpoints and place them in checkpoints/models/sdxl_models.
  • Run the Gradio web demo with: export OPENAI_API_KEY="your key" && python -m minidalle3.web.
  • Supports LLMs other than ChatGPT (e.g., ChatGLM, Baichuan, InternLM) by pointing OPENAI_API_BASE at a compatible endpoint and running the corresponding LLM module.
  • Requires an OpenAI API key or a compatible LLM endpoint.
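The Quick Start steps can be wrapped in a small pre-flight launcher. Below is a minimal sketch under stated assumptions. It checks only that checkpoints/models/sdxl_models exists and is non-empty, because the exact checkpoint files are not listed here. It then runs `python -m minidalle3.web` with `OPENAI_API_KEY` and, optionally, `OPENAI_API_BASE` set. The helper names `checkpoints_ready`, `build_env`, and `launch_web_demo` are hypothetical and are not part of the project.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Dict, Optional

# Directory named in the Quick Start. The expected files inside it are not
# specified, so this only checks that the directory exists and is non-empty.
CHECKPOINT_DIR = Path("checkpoints/models/sdxl_models")


def checkpoints_ready(root: Path = Path(".")) -> bool:
    """Return True if the SDXL checkpoint directory exists and contains entries."""
    target = root / CHECKPOINT_DIR
    return target.is_dir() and any(target.iterdir())


def build_env(api_key: str, api_base: Optional[str] = None) -> Dict[str, str]:
    """Copy the current environment and add the OpenAI-style settings."""
    env = dict(os.environ)
    env["OPENAI_API_KEY"] = api_key
    if api_base:  # only needed when targeting an OpenAI-compatible non-OpenAI endpoint
        env["OPENAI_API_BASE"] = api_base
    return env


def launch_web_demo(api_key: str, api_base: Optional[str] = None) -> int:
    """Run `python -m minidalle3.web` with the configured environment."""
    if not checkpoints_ready():
        raise FileNotFoundError(f"missing checkpoints in {CHECKPOINT_DIR}")
    cmd = [sys.executable, "-m", "minidalle3.web"]
    return subprocess.call(cmd, env=build_env(api_key, api_base))
```

Calling `launch_web_demo("your key")` from the repository root is equivalent to the shell one-liner above, except that it fails early with a clear error when the checkpoint directory is missing.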

Highlighted Details

  • Interactive, interleaved text-to-image and text-to-text generation.
  • Gradio-based web demo for easy access.
  • Supports multiple LLM backends beyond ChatGPT.
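Backend switching through OPENAI_API_BASE works because each backend exposes the same OpenAI-style chat-completions interface. The sketch below builds such a request with only the standard library. It assumes the base URL ends in something like `/v1` and that the backend accepts the public OpenAI `/chat/completions` request shape. The default model name is also an assumption, since each backend names its models differently. `build_chat_request` and `send` are hypothetical helpers, not the project's client code.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    api_base: str,
    api_key: str,
    messages: List[Dict[str, str]],
    model: str = "gpt-3.5-turbo",  # assumption: model names vary per backend
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    url = api_base.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

Because only the base URL and model name change between backends, the same conversation loop can target ChatGPT or a locally served, OpenAI-compatible model without code changes.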

Maintenance & Community

The project is authored by Zeqiang Lai, Xizhou Zhu, Jifeng Dai, and Yu Qiao. The README does not describe any further community channels.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Llama LLM support is not yet implemented, and Qwen has not been tested. Several planned features, including multi-image generation, image selection, and prompt refinement, are still in the TODO list, indicating an experimental and incomplete state.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days
