Dreambooth-Stable-Diffusion by JoePenna

Dreambooth implementation for Stable Diffusion via Textual Inversion

Created 3 years ago

3,220 stars

Top 14.6% on SourcePulse

7 Experts Love This Project

Luis Capelo

Cofounder of Lightning AI

Carol Willing

Core Contributor to CPython, Jupyter

Charlie Holtz

Founder of Melty

Ben Firshman

Cofounder of Replicate

and 3 more!

Project Summary

This repository provides an implementation of Dreambooth for Stable Diffusion that lets users train custom faces, objects, and styles into diffusion models. It is aimed primarily at filmmakers, concept artists, and digital artists who need to integrate specific subjects or aesthetics into their creative workflows. The result is the ability to generate novel imagery with personalized elements, which streamlines the concept and production pipeline.

How It Works

This implementation builds on the Textual Inversion work of Gal et al. and incorporates Dreambooth's ideas of regularization and a prior-preservation loss. The core approach fine-tunes a Stable Diffusion model on a small set of user-provided images of a specific subject or style, using a unique textual token to represent it. The aim is to embed new concepts efficiently, without extensive computational resources or massive datasets.
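The way a prior-preservation term is combined with the subject loss can be sketched in a few lines. This is a minimal illustration in noise-prediction form, not the repository's code: the function names, the plain-list inputs, and the prior_weight parameter are all assumptions made for the example.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_noise,
                    prior_pred, prior_noise, prior_weight=1.0):
    """Subject loss plus a weighted prior-preservation term.

    instance_*: predicted and true noise for the user's subject images.
    prior_*:    predicted and true noise for regularization (class) images.
    prior_weight is a hypothetical name for the weighting factor.
    """
    instance_term = mse(instance_pred, instance_noise)
    prior_term = mse(prior_pred, prior_noise)
    return instance_term + prior_weight * prior_term


if __name__ == "__main__":
    print(dreambooth_loss([1.0, 2.0], [1.0, 0.0], [0.0], [1.0], prior_weight=0.5))
```

The second term penalizes drift on generic class images, which is what keeps the fine-tuned model from forgetting what the broader class looks like.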

Quick Start & Requirements

Installation: Clone the repository, then install dependencies with pip install -r requirements.txt inside a Python 3.10 virtual environment.
Prerequisites: Python 3.10, Git, PyTorch (version 1.13.1+cu117 recommended for CUDA 11.7), and a Stable Diffusion checkpoint file (e.g., v1-5-pruned-emaonly-pruned.ckpt).
Cloud Setup: Jupyter notebooks are provided for cloud platforms like RunPod and Vast.ai, requiring instances with at least 24GB VRAM (e.g., RTX 3090, RTX 4090, A5000).
Documentation: Extensive guides are available for cloud and local setup, configuration, and debugging.
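Before starting a run, the requirements above can be checked with a small preflight script. The sketch below is an assumption-laden illustration rather than anything shipped with the repository: preflight, REQUIRED_PYTHON, and MIN_VRAM_GB are hypothetical names, and the VRAM figure is passed in by the caller rather than queried from the GPU.

```python
import sys
from pathlib import Path

REQUIRED_PYTHON = (3, 10)   # Python version named in the requirements
MIN_VRAM_GB = 24            # VRAM floor named for cloud instances


def preflight(checkpoint_path, python_version=None, vram_gb=None):
    """Return a list of human-readable problems; an empty list means OK.

    python_version defaults to the running interpreter's (major, minor).
    vram_gb is optional and supplied by the caller (hypothetical input).
    """
    problems = []
    version = tuple(python_version or sys.version_info[:2])[:2]
    if version != REQUIRED_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found, 3.10 expected")
    if not Path(checkpoint_path).is_file():
        problems.append(f"checkpoint not found: {checkpoint_path}")
    if vram_gb is not None and vram_gb < MIN_VRAM_GB:
        problems.append(f"{vram_gb} GB VRAM is below the {MIN_VRAM_GB} GB minimum")
    return problems


if __name__ == "__main__":
    for problem in preflight("v1-5-pruned-emaonly-pruned.ckpt"):
        print("WARNING:", problem)
```

Returning a list of problems instead of raising on the first one lets a user fix everything in a single pass before renting GPU time.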

Highlighted Details

Supports training on GPUs with 24GB VRAM, with cloud-based solutions provided for users without local high-end hardware.
Offers detailed guidance on captioning training images and managing multiple subjects/concepts using folder structures and special tokens (S, C).
Includes debugging tips and strategies for common issues like poor likeness or style bleed-through, emphasizing prompt engineering.
The project is a fork of an earlier implementation, with the author noting a shift in focus and a renaming to "The Repo Formerly Known As 'Dreambooth'".
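As an illustration of how special caption tokens might work, the sketch below expands a caption template by substituting whole-word S and C placeholders. Treating S as the subject token and C as the class word is an assumption here, as are the function name and the example values; the repository's own guide is the authority on the actual behavior.

```python
def expand_caption(template, subject_token, class_word):
    """Replace standalone 'S' and 'C' words in a caption template.

    Only whole whitespace-separated words are replaced, so capital
    letters inside other words are left alone. Hypothetical helper,
    not the repository's implementation.
    """
    replacements = {"S": subject_token, "C": class_word}
    words = template.split()
    return " ".join(replacements.get(word, word) for word in words)


if __name__ == "__main__":
    print(expand_caption("S being a good C", "effy", "person"))
```

Matching whole words rather than single characters avoids mangling captions such as "Close up of S", where the leading capital C belongs to an ordinary word.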

Maintenance & Community

The project has seen contributions from multiple individuals, and the README mentions an active Discord community for further help and discussion. The author, Joe Penna (MysteryGuitarMan), is a filmmaker who uses the tool for professional projects.

Licensing & Compatibility

The README does not explicitly state a license, although the work it builds on is typically MIT licensed. Separately, the README cautions users against training on others' art without permission and against using artists' names in prompts, suggesting a strong ethical stance.

Limitations & Caveats

Fine-tuning with this implementation may shift generated images toward the training subject, which can also affect other, similar classes. Training two subjects consecutively is not straightforward. The resulting model files can be large (11-12GB) before pruning, though a pruner is provided. The author notes that YouTube tutorials may be outdated because Docker images on platforms like RunPod are updated frequently.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

4 stars in the last 30 days