long_stable_diffusion  by sharonzhou

AI pipeline for long-form text-to-image generation

created 2 years ago
690 stars

Top 50.2% on sourcepulse

Project Summary

This project provides a pipeline for generating long-form, illustrated stories using GPT-3 and Stable Diffusion. It's designed for users who want to create visually rich content from lengthy text, automating the process of generating and integrating illustrations.

How It Works

The pipeline uses GPT-3 to generate illustration ideas from the input text and translates them into "prompt-English", a phrasing better suited to Stable Diffusion. These prompts are then fed into Stable Diffusion to generate the images. Image generation runs across multiple GPUs to increase throughput.
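The three stages described above can be sketched as a small function that takes each stage as a callable. This is only an illustration of the data flow, not the project's code: `split_into_sections`, `illustrate`, and the `ideate`, `translate`, and `render` parameters are hypothetical names, and the paragraph-packing splitter is an assumed stand-in for however the project actually divides the text.

```python
from typing import Callable, List, Tuple


def split_into_sections(text: str, max_chars: int = 1000) -> List[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    Assumed stand-in splitter. A single paragraph longer than max_chars
    is kept whole rather than cut mid-sentence.
    """
    sections: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            sections.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        sections.append(current)
    return sections


def illustrate(
    text: str,
    ideate: Callable[[str], str],     # hypothetical: section -> illustration idea (GPT-3 in the source)
    translate: Callable[[str], str],  # hypothetical: idea -> "prompt-English"
    render: Callable[[str], object],  # hypothetical: prompt -> image (Stable Diffusion in the source)
    max_chars: int = 1000,
) -> List[Tuple[str, object]]:
    """Run idea generation, prompt translation, and rendering per section."""
    results = []
    for section in split_into_sections(text, max_chars):
        prompt = translate(ideate(section))
        results.append((prompt, render(prompt)))
    return results
```

Passing the stages in as callables keeps the sketch runnable without API keys or GPUs; in a real run, `ideate` and `translate` would wrap calls to a language model and `render` would wrap an image model.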

Quick Start & Requirements

  • Install dependencies and set OPENAI_TOKEN environment variable.
  • Run with bash run.sh -f <name_of_txtfile_in_texts_dir>.
  • Requires 2 GPUs with 24GB memory each (configurable).
  • Needs access to CompVis/stable-diffusion-v1-4.
  • For "extracts" method, install nltk and run nltk.download('punkt').
  • Official documentation: none is linked; the README serves as the primary reference.
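Before launching `run.sh`, the requirements above could be checked with a short pre-flight script. This is a minimal sketch under stated assumptions: `preflight` and `ensure_punkt` are hypothetical helpers that are not part of the project, the input directory is assumed to be named `texts`, and the file name is assumed to include its extension. Only `OPENAI_TOKEN` and `nltk.download('punkt')` come from the steps above.

```python
import os
from pathlib import Path
from typing import List, Mapping


def preflight(
    name: str,
    texts_dir: str = "texts",              # assumed directory name
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return a list of setup problems; an empty list means none were found."""
    problems = []
    if not env.get("OPENAI_TOKEN"):
        problems.append("OPENAI_TOKEN is not set")
    if not (Path(texts_dir) / name).is_file():
        problems.append(f"{name} not found in {texts_dir}/")
    return problems


def ensure_punkt() -> None:
    """Fetch the punkt tokenizer data; only needed for the "extracts" method."""
    import nltk  # imported lazily so preflight() works without nltk installed

    nltk.download("punkt")
```

For example, `preflight("story.txt")` returns an empty list when the token is set and `texts/story.txt` exists, and a list of human-readable messages otherwise.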

Highlighted Details

  • Automates the creation of illustrated stories from long-form text.
  • Supports two methods for prompt generation: "sections" and "extracts".
  • Outputs results in a .docx file containing images and prompts.
  • Optimized for multi-GPU processing (specifically 2x 24GB GPUs).
  • Includes a "translation layer" for improving Stable Diffusion prompts.

Maintenance & Community

The project is presented as a "weekend hackathon project" with an open invitation for comments and pull requests. There are no explicit mentions of maintainers, sponsorships, or community channels like Discord/Slack.

Licensing & Compatibility

The README does not explicitly state a license. Given the use of OpenAI API and Stable Diffusion models, users should verify compatibility with their terms of service and any underlying model licenses.

Limitations & Caveats

The project is primarily optimized for a specific hardware configuration (2x 24GB GPUs) and requires manual adjustment for other setups. Several output formats and prompt generation methods are listed as "Todo" or "Future" enhancements, indicating the current functionality is limited.
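The summary does not say how the GPU count is configured, so the following is only one way such an adjustment might look. It splits a list of prompts round-robin across however many devices are available. `shard_prompts` is a hypothetical helper, and the `cuda:<index>` labels follow PyTorch's device naming as an assumption about the runtime.

```python
from typing import Dict, List


def shard_prompts(prompts: List[str], n_gpus: int = 2) -> Dict[str, List[str]]:
    """Assign prompts to devices round-robin so each GPU gets a near-equal share."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    # prompts[i::n_gpus] takes every n_gpus-th prompt starting at index i.
    return {f"cuda:{i}": prompts[i::n_gpus] for i in range(n_gpus)}
```

With the default of two devices, five prompts are split three and two; with `n_gpus=1`, everything lands on `cuda:0`. Memory limits per device are a separate concern that this sketch does not address.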

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days
