Text-to-image research paper using LLMs for interactive prompting
Mini-DALLE3 offers an experimental, interactive text-to-image and text-to-text experience that aims to replicate the interleaved image-and-text capabilities of DALL-E 3 within ChatGPT. It targets users who want a conversational interface for generating and manipulating images, integrating LLM-driven text and visual content creation in a single dialogue.
How It Works
The project uses large language models (LLMs) to interpret user prompts and decide when to produce images, which are then generated with Stable Diffusion XL (SDXL) and IP-Adapter. This enables an interleaved conversational flow in which text and image generation occur dynamically within the same interaction, mimicking a more natural and intuitive user experience; a minimal sketch of such a loop is shown below.
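To make the control flow concrete, here is a minimal, hypothetical sketch of an interleaved loop: the LLM is instructed to wrap image prompts in tags, and tagged spans are routed to an SDXL pipeline. The tag convention, system prompt, and model names are assumptions for illustration, not the project's actual scheme, and the IP-Adapter conditioning used for image editing is omitted.

```python
# Illustrative sketch only -- not Mini-DALLE3's actual API or prompting scheme.
import re
import torch
from openai import OpenAI
from diffusers import StableDiffusionXLPipeline

client = OpenAI()  # reads OPENAI_API_KEY (and OPENAI_API_BASE, if set)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Assumed convention: the LLM embeds SDXL prompts inside <image>...</image> tags.
SYSTEM = ("You are a drawing assistant. When the user asks for an image, "
          "embed a concise SDXL prompt inside <image>...</image> tags.")

def chat_turn(history: list[dict], user_msg: str):
    """One interleaved turn: get the LLM reply, render any embedded image prompts."""
    history.append({"role": "user", "content": user_msg})
    reply = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "system", "content": SYSTEM}] + history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    # Route each tagged prompt to the SDXL pipeline.
    prompts = re.findall(r"<image>(.*?)</image>", reply, re.S)
    images = [pipe(prompt=p).images[0] for p in prompts]
    return reply, images
```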
Quick Start & Requirements
Download the required model checkpoints into `checkpoints/models/sdxl_models`, then set your OpenAI API key and launch the web UI:

`export OPENAI_API_KEY="your key" && python -m minidalle3.web`

A custom or self-hosted LLM can be used by setting `OPENAI_API_BASE` and running the corresponding LLM modules, as sketched below.
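As a concrete illustration of what the `OPENAI_API_BASE` override does, the standard `openai` Python client accepts an equivalent `base_url` argument. The endpoint URL and model name below are placeholders, not values from the project:

```python
from openai import OpenAI

# Point the client at a self-hosted, OpenAI-compatible endpoint instead of
# api.openai.com. URL and model name are illustrative placeholders.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # same effect as OPENAI_API_BASE
    api_key="not-needed-for-local",
)
resp = client.chat.completions.create(
    model="local-llm",
    messages=[{"role": "user", "content": "Draw a cat wearing a hat."}],
)
print(resp.choices[0].message.content)
```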
Maintenance & Community
The project is authored by Zeqiang Lai, Xizhou Zhu, Jifeng Dai, and Yu Qiao. Further community engagement details are not provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Llama LLM support is not yet implemented, and Qwen has not been tested. Several planned features, including multi-image generation, image selection, and prompt refinement, remain on the TODO list, reflecting the project's experimental and incomplete state.