long_stable_diffusion by sharonzhou

AI pipeline for long-form text-to-image generation

Created 3 years ago

689 stars

Top 49.4% on SourcePulse


Project Summary

This project provides a pipeline for generating long-form, illustrated stories using GPT-3 and Stable Diffusion. It's designed for users who want to create visually rich content from lengthy text, automating the process of generating and integrating illustrations.

How It Works

The pipeline uses GPT-3 to generate illustration ideas from the input text, then rewrites each idea as "prompt-English", a prompt phrased in the style Stable Diffusion responds to best. Stable Diffusion renders these prompts as images, and the work is spread across multiple GPUs to increase throughput.
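The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function names `build_prompts` and `shard_round_robin` are hypothetical, the two GPT-3 stages are passed in as plain callables so that no API call is assumed, and round-robin assignment is only one plausible way to divide prompts between GPUs.

```python
from typing import Callable, Dict, List


def build_prompts(
    passages: List[str],
    suggest_illustration: Callable[[str], str],
    to_prompt_english: Callable[[str], str],
) -> List[str]:
    """Run the two text stages: passage -> illustration idea -> image prompt.

    Both callables are placeholders for the GPT-3 calls; their names and
    signatures are assumptions made for this sketch.
    """
    return [to_prompt_english(suggest_illustration(p)) for p in passages]


def shard_round_robin(prompts: List[str], num_gpus: int = 2) -> Dict[int, List[str]]:
    """Assign prompts to GPU indices in round-robin order.

    The default of 2 mirrors the hardware the project targets; the
    sharding strategy itself is an assumption.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    shards: Dict[int, List[str]] = {gpu: [] for gpu in range(num_gpus)}
    for i, prompt in enumerate(prompts):
        shards[i % num_gpus].append(prompt)
    return shards


if __name__ == "__main__":
    # Trivial stand-ins for the language-model stages.
    ideas = lambda passage: f"a scene showing {passage}"
    translate = lambda idea: f"{idea}, detailed illustration"

    prompts = build_prompts(["a fox", "a river", "a storm"], ideas, translate)
    for gpu, batch in shard_round_robin(prompts).items():
        print(gpu, batch)
```

In a real run, each shard would be handed to a separate Stable Diffusion worker pinned to its GPU; that step is left out here because it depends on the model-loading code.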

Quick Start & Requirements

Install dependencies and set the OPENAI_TOKEN environment variable.
Run with bash run.sh -f <name_of_txtfile_in_texts_dir>.
Requires 2 GPUs with 24GB memory each (configurable).
Needs access to CompVis/stable-diffusion-v1-4.
For "extracts" method, install nltk and run nltk.download('punkt').
Official documentation: none is linked separately; the README appears to be the only documentation.
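Before launching a long run, it can help to check the two prerequisites the steps above name: the OPENAI_TOKEN variable and the input text file. The sketch below is a hypothetical helper, not part of the project. It assumes the input directory is called `texts` (inferred from the placeholder in the run command) and that the `-f` argument is the file name exactly as it appears in that directory; both should be checked against the repository.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def preflight(
    txt_name: str,
    texts_dir: str = "texts",  # assumed directory name
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems: List[str] = []
    if not env.get("OPENAI_TOKEN"):
        problems.append("OPENAI_TOKEN is not set")
    if not (Path(texts_dir) / txt_name).is_file():
        problems.append(f"{txt_name} not found in {texts_dir}/")
    return problems


def run_command(txt_name: str) -> List[str]:
    """Build the documented invocation: bash run.sh -f <name>."""
    return ["bash", "run.sh", "-f", txt_name]


if __name__ == "__main__":
    name = "story.txt"  # hypothetical input file
    issues = preflight(name)
    if issues:
        print("Not ready:", "; ".join(issues))
    else:
        print("Run:", " ".join(run_command(name)))
```

The check does not cover GPU memory or model access, which depend on the local machine and the account used to download CompVis/stable-diffusion-v1-4.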

Highlighted Details

Automates the creation of illustrated stories from long-form text.
Supports two methods for prompt generation: "sections" and "extracts".
Outputs results in a .docx file containing images and prompts.
Optimized for multi-GPU processing (specifically 2x 24GB GPUs).
Includes a "translation layer" for improving Stable Diffusion prompts.
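This summary does not say how the "sections" and "extracts" methods differ; the only hint is that "extracts" requires NLTK's punkt sentence tokenizer. The sketch below shows one plausible reading, purely as an assumption: "sections" groups paragraphs into chunks of bounded size, while "extracts" samples individual sentences. The function names, the `max_chars` and `every` parameters, and the regex sentence splitter (a dependency-free stand-in for punkt) are all illustrative and not taken from the project.

```python
import re
from typing import List


def split_sections(text: str, max_chars: int = 1000) -> List[str]:
    """Group blank-line-separated paragraphs into sections of bounded size.

    A single paragraph longer than max_chars is kept whole.
    """
    sections: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            sections.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        sections.append(current)
    return sections


def split_extracts(text: str, every: int = 3) -> List[str]:
    """Keep every `every`-th sentence, starting with the first.

    Uses a naive regex splitter in place of NLTK's punkt tokenizer.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return sentences[::every]


if __name__ == "__main__":
    story = "The fox ran. The river rose! Was it a storm? It was."
    print(split_sections(story, max_chars=40))
    print(split_extracts(story, every=2))
```

Either function yields a list of passages, each of which would then be turned into one illustration prompt.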

Maintenance & Community

The project is presented as a "weekend hackathon project" with an open invitation for comments and pull requests. There are no explicit mentions of maintainers, sponsorships, or community channels like Discord/Slack.

Licensing & Compatibility

The README does not explicitly state a license. Because the pipeline depends on the OpenAI API and Stable Diffusion models, users should check that their intended use complies with OpenAI's terms of service and the licenses of the underlying models.

Limitations & Caveats

The project is primarily optimized for a specific hardware configuration (2x 24GB GPUs) and requires manual adjustment for other setups. Several output formats and prompt generation methods are listed as "Todo" or "Future" enhancements, indicating the current functionality is limited.

Health Check

Last Commit: 3 years ago
Responsiveness: Inactive
Star History: 0 stars in the last 30 days