AI pipeline for long-form text-to-image generation
This project provides a pipeline for generating long-form, illustrated stories using GPT-3 and Stable Diffusion. It's designed for users who want to create visually rich content from lengthy text, automating the process of generating and integrating illustrations.
How It Works
The pipeline uses GPT-3 to generate illustration ideas from the input text and to translate them into "prompt-English", phrasing optimized for Stable Diffusion. Stable Diffusion then generates an image from each prompt. The pipeline uses multi-GPU processing to maximize throughput.
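The two-stage flow (language model writes a prompt, image model renders it) can be sketched as below. This is a hypothetical skeleton, not the project's code: `illustrate`, `Illustration`, `write_prompt` and `render` are illustrative names, and the stand-in functions replace the real GPT-3 and Stable Diffusion calls so the sketch runs without an API key or a GPU.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Illustration:
    passage: str   # source text the illustration belongs to
    prompt: str    # "prompt-English" produced by the language model
    image: object  # whatever the image generator returns (e.g. a PIL image)


def illustrate(
    passages: List[str],
    write_prompt: Callable[[str], str],
    render: Callable[[str], object],
) -> List[Illustration]:
    """Hypothetical two-stage pipeline.

    write_prompt stands in for the GPT-3 call that turns a passage into an
    illustration idea phrased as a Stable Diffusion prompt; render stands in
    for Stable Diffusion itself.
    """
    # Stage 1: one prompt per passage from the language model.
    prompts = [write_prompt(p) for p in passages]
    # Stage 2: one image per prompt from the image model.
    images = [render(q) for q in prompts]
    return [Illustration(p, q, img) for p, q, img in zip(passages, prompts, images)]


# Stand-ins so the sketch runs offline (assumptions, not real model calls).
def fake_prompt(passage: str) -> str:
    return "storybook illustration, " + passage.lower().rstrip(".")


def fake_render(prompt: str) -> str:
    return f"<image: {prompt}>"


result = illustrate(["A fox crossed the frozen river."], fake_prompt, fake_render)
print(result[0].prompt)  # storybook illustration, a fox crossed the frozen river
```

Keeping the stages separate means every prompt can be produced before any GPU work starts, which is one plausible way to keep the image stage busy; whether the project orders its work this way is not stated.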
Quick Start & Requirements
Requires the OPENAI_TOKEN environment variable.
Usage: bash run.sh -f <name_of_txtfile_in_texts_dir>
Requires the CompVis/stable-diffusion-v1-4 model.
Requires nltk; after installing it, run nltk.download('punkt').
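A preflight step matching these requirements might check the token, read the story from the texts directory and split it into passages. The sketch below is hypothetical: `load_passages`, `naive_split`, the directory name `texts` and the default of three sentences per passage are assumptions, not settings taken from the project. The punkt requirement suggests the project splits the story into sentences, but how it groups them is not stated.

```python
import os
import re
from pathlib import Path
from typing import Callable, List, Optional


def naive_split(text: str) -> List[str]:
    """Crude sentence splitter: break after ., ! or ? followed by whitespace.
    nltk's punkt tokenizer handles abbreviations such as "Dr." far better."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def load_passages(
    filename: str,
    texts_dir: str = "texts",  # assumed directory name
    sentences_per_passage: int = 3,  # assumed grouping size
    split: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Hypothetical helper: check requirements, read the story, group sentences."""
    if not os.environ.get("OPENAI_TOKEN"):
        raise RuntimeError("OPENAI_TOKEN is not set")
    if sentences_per_passage < 1:
        raise ValueError("sentences_per_passage must be >= 1")
    path = Path(texts_dir) / filename
    text = path.read_text(encoding="utf-8")  # raises FileNotFoundError if missing
    sentences = (split or naive_split)(text)
    n = sentences_per_passage
    # Join consecutive runs of n sentences; the last passage may be shorter.
    return [" ".join(sentences[i:i + n]) for i in range(0, len(sentences), n)]
```

With nltk installed and the punkt data downloaded, passing split=nltk.tokenize.sent_tokenize swaps the regex fallback for the punkt tokenizer.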
Highlighted Details
Outputs a .docx file containing the generated images and their prompts.
Maintenance & Community
The project is presented as a "weekend hackathon project" with an open invitation for comments and pull requests. There are no explicit mentions of maintainers, sponsorships, or community channels like Discord/Slack.
Licensing & Compatibility
The README does not explicitly state a license. Because the pipeline depends on the OpenAI API and Stable Diffusion models, users should check that their intended use complies with the OpenAI terms of service and the license of the Stable Diffusion model weights.
Limitations & Caveats
The project is primarily optimized for a specific hardware configuration (2x 24GB GPUs) and requires manual adjustment for other setups. Several output formats and prompt generation methods are listed as "Todo" or "Future" enhancements, indicating the current functionality is limited.
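One way to adapt the work split to a machine with a different number of GPUs is to distribute prompts round-robin over whatever devices exist and reassemble the results in story order. This is a swapped-in technique, not the project's method: `shard_prompts` and `merge_results` are hypothetical names, and the device strings follow PyTorch's "cuda:N" convention. On a real machine the list could come from [f"cuda:{i}" for i in range(torch.cuda.device_count())], with one worker per device holding its own Stable Diffusion pipeline. Per-GPU memory settings would still need tuning for cards smaller than 24 GB.

```python
from typing import Dict, List, Sequence, Tuple


def shard_prompts(
    prompts: Sequence[str], devices: Sequence[str]
) -> Dict[str, List[Tuple[int, str]]]:
    """Round-robin prompts over devices, keeping each prompt's original index."""
    if not devices:
        raise ValueError("need at least one device")
    shards: Dict[str, List[Tuple[int, str]]] = {d: [] for d in devices}
    for i, prompt in enumerate(prompts):
        shards[devices[i % len(devices)]].append((i, prompt))
    return shards


def merge_results(outputs: Dict[str, List[Tuple[int, object]]]) -> List[object]:
    """Reassemble per-device results into the original story order."""
    indexed = [pair for results in outputs.values() for pair in results]
    return [result for _, result in sorted(indexed, key=lambda pair: pair[0])]


devices = ["cuda:0", "cuda:1"]  # assumed: one entry per available GPU
shards = shard_prompts(["p0", "p1", "p2"], devices)
# Stand-in for per-device image generation: uppercase each prompt.
fake_outputs = {d: [(i, p.upper()) for i, p in items] for d, items in shards.items()}
print(merge_results(fake_outputs))  # ['P0', 'P1', 'P2']
```

Round-robin balances the number of prompts per GPU rather than the time each takes; that is adequate when every image is generated with the same settings.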