Dreambooth-Stable-Diffusion by JoePenna

Dreambooth implementation for Stable Diffusion via Textual Inversion

created 2 years ago
3,225 stars

Project Summary

This repository provides an implementation of Dreambooth for Stable Diffusion, enabling users to train custom faces, objects, and styles into diffusion models. It is primarily targeted at filmmakers, concept artists, and digital artists who need to integrate specific subjects or aesthetics into their creative workflows. The benefit is the ability to generate novel imagery with personalized elements, streamlining the concept and production pipeline.

How It Works

This implementation builds on Textual Inversion (Gal et al.) and borrows two ideas from Dreambooth: regularization images and a prior-preservation loss. A Stable Diffusion model is fine-tuned on a small set of user-provided images of a specific subject or style, with a unique text token standing in for that subject in prompts. The aim is to embed new concepts from a handful of images rather than a massive dataset, although training still requires a GPU with at least 24GB VRAM.
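
The prior-preservation idea from the Dreambooth paper can be written as a sum of two denoising losses: one on the user's subject images (captioned with the unique token) and one, weighted, on generic regularization images of the same class. The sketch below is a framework-free illustration under that assumption. Plain Python lists stand in for the noise tensors and the model's noise predictions, and the names `prior_preservation_loss` and `prior_weight` are hypothetical. The repository's own training code may compute or weight these terms differently.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(
    instance_pred: Sequence[float],
    instance_noise: Sequence[float],
    class_pred: Sequence[float],
    class_noise: Sequence[float],
    prior_weight: float = 1.0,  # hypothetical default; real configs may differ
) -> float:
    """Instance denoising loss plus a weighted denoising loss on class images.

    instance_*: noise added to / predicted for the subject images
                (captioned with the unique token).
    class_*:    noise added to / predicted for regularization images of the
                same class (e.g. "person"), which discourages the model from
                forgetting what the class looks like in general.
    """
    instance_loss = mse(instance_pred, instance_noise)
    prior_loss = mse(class_pred, class_noise)
    return instance_loss + prior_weight * prior_loss


if __name__ == "__main__":
    loss = prior_preservation_loss(
        instance_pred=[0.1, -0.2, 0.3],
        instance_noise=[0.0, -0.2, 0.5],
        class_pred=[0.4, 0.4],
        class_noise=[0.5, 0.2],
    )
    print(f"combined loss: {loss:.4f}")
```

Setting `prior_weight=0` reduces the objective to plain fine-tuning on the subject images, which removes the term meant to counter drift of the whole class toward the subject.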

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies using pip install -r requirements.txt within a Python 3.10 virtual environment.
  • Prerequisites: Python 3.10, Git, PyTorch (version 1.13.1+cu117 recommended for CUDA 11.7), and a Stable Diffusion checkpoint file (e.g., v1-5-pruned-emaonly-pruned.ckpt).
  • Cloud Setup: Jupyter notebooks are provided for cloud platforms like RunPod and Vast.ai, requiring instances with at least 24GB VRAM (e.g., RTX 3090, RTX 4090, A5000).
  • Documentation: Extensive guides are available for cloud and local setup, configuration, and debugging.
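
As a convenience, the prerequisites above can be checked before starting a long training run. This is a hypothetical helper, not part of the repository: its thresholds are copied from this summary (Python 3.10, PyTorch 1.13.1+cu117, at least 24GB VRAM), and `torch` is imported only if it is installed. VRAM is compared in decimal gigabytes because PyTorch reports `total_memory` in bytes, and a card sold as 24GB may report slightly less than 24 GiB.

```python
import sys
from typing import List, Optional, Tuple

# Thresholds taken from the prerequisites listed in this summary.
REQUIRED_PYTHON = (3, 10)
RECOMMENDED_TORCH = "1.13.1+cu117"
MIN_VRAM_GB = 24.0  # decimal gigabytes (1 GB = 10**9 bytes)


def check_prerequisites(
    python_version: Tuple[int, int],
    torch_version: Optional[str],
    vram_gb: Optional[float],
) -> List[str]:
    """Return human-readable problems; an empty list means everything matches."""
    problems: List[str] = []
    if python_version != REQUIRED_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found; 3.10 is recommended"
        )
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif torch_version != RECOMMENDED_TORCH:
        problems.append(f"PyTorch {torch_version} found; {RECOMMENDED_TORCH} is recommended")
    if vram_gb is None:
        problems.append("no CUDA GPU detected")
    elif vram_gb < MIN_VRAM_GB:
        problems.append(f"GPU has {vram_gb:.1f} GB of VRAM; at least 24 GB is needed")
    return problems


def detect_environment() -> Tuple[Tuple[int, int], Optional[str], Optional[float]]:
    """Read the Python version, plus PyTorch version and GPU memory if PyTorch is installed."""
    python_version = (sys.version_info.major, sys.version_info.minor)
    try:
        import torch
    except ImportError:
        return python_version, None, None
    vram_gb = None
    if torch.cuda.is_available():
        # total_memory is in bytes; convert to decimal GB so a "24GB" card passes.
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    return python_version, torch.__version__, vram_gb


if __name__ == "__main__":
    issues = check_prerequisites(*detect_environment())
    print("\n".join(issues) if issues else "All listed prerequisites are met.")
```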

Highlighted Details

  • Supports training on GPUs with 24GB VRAM, with cloud-based solutions provided for users without local high-end hardware.
  • Offers detailed guidance on captioning training images and managing multiple subjects/concepts using folder structures and special tokens (S, C).
  • Includes debugging tips and strategies for common issues like poor likeness or style bleed-through, emphasizing prompt engineering.
  • The project began as a fork of an earlier implementation; the author notes that its focus has since shifted and has renamed it "The Repo Formerly Known As 'Dreambooth'".
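
The folder-and-token approach to captioning can be illustrated with a small helper that derives a caption from an image's path. The layout here is entirely hypothetical and is not the repository's documented format: each subject lives in a folder named `<token>_<class word>`, the file name (without extension) is the caption, and the placeholders `S` and `C` in it are replaced by the token and class word. The tokens `zwx` and `qrt` are made-up examples. Consult the project's own captioning guide for the actual rules.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def caption_for(path: str) -> str:
    """Derive 'zwx person wearing a hat' from
    'training/zwx_person/S C wearing a hat.jpg' (hypothetical layout)."""
    p = PurePosixPath(path)
    # Folder name is "<token>_<class word>"; split on the first underscore only.
    token, _, class_word = p.parent.name.partition("_")
    placeholders = {"S": token, "C": class_word}
    # Replace whole-word placeholders; all other words are kept verbatim.
    return " ".join(placeholders.get(word, word) for word in p.stem.split())


def build_captions(paths: Iterable[str]) -> Dict[str, str]:
    """Map every image path to its caption, skipping non-image files."""
    return {
        path: caption_for(path)
        for path in paths
        if PurePosixPath(path).suffix.lower() in IMAGE_SUFFIXES
    }


if __name__ == "__main__":
    files = [
        "training/zwx_person/S C wearing a hat.jpg",
        "training/qrt_style/a city street, in the style of S.png",
        "training/zwx_person/notes.txt",
    ]
    for path, caption in build_captions(files).items():
        print(f"{caption!r} <- {path}")
```

Keeping one folder per subject lets several concepts share a single training run while each image still carries the right token in its caption.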

Maintenance & Community

The project has received contributions from multiple people, and the README points to an active Discord community for help and discussion. The author, Joe Penna (MysteryGuitarMan), is a filmmaker who uses the tool on professional projects.

Licensing & Compatibility

The README does not explicitly state a license, though the upstream work it builds on is typically MIT-licensed. The README also cautions users against training on other people's art without permission and against using artists' names in prompts, reflecting a strong ethical stance.

Limitations & Caveats

Fine-tuning can shift generated images of the subject's whole class toward the training subject, affecting other members of that class. Training two subjects consecutively is not straightforward. Unpruned model files are large (11–12GB), though a pruner is provided. The author also notes that YouTube tutorials may be outdated because the Docker images on platforms like RunPod are updated frequently.
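
Much of an unpruned checkpoint's size is state that is only needed to resume training. The sketch below shows what a checkpoint pruner typically does; it is not the repository's pruner. It assumes a PyTorch Lightning-style checkpoint dict with the model weights under `"state_dict"` (other entries, such as optimizer states, are dropped) and EMA weight copies stored under a `model_ema.` key prefix, as in the latent-diffusion codebase. `prune_checkpoint` and `prune_file` are hypothetical names.

```python
from typing import Any, Dict


def prune_checkpoint(ckpt: Dict[str, Any], drop_ema: bool = True) -> Dict[str, Any]:
    """Return a new checkpoint containing only model weights.

    Assumes weights live under ckpt["state_dict"]; every other top-level entry
    (e.g. optimizer states, LR schedulers, callbacks) is discarded.
    """
    state_dict = ckpt["state_dict"]
    kept = {
        key: value
        for key, value in state_dict.items()
        # Assumed convention: EMA copies of the weights use the "model_ema." prefix.
        if not (drop_ema and key.startswith("model_ema."))
    }
    return {"state_dict": kept}


def prune_file(src: str, dst: str, half: bool = False) -> None:
    """Load a .ckpt with PyTorch, prune it, optionally cast floats to fp16, and save."""
    import torch  # imported lazily so prune_checkpoint works without PyTorch

    ckpt = torch.load(src, map_location="cpu")
    pruned = prune_checkpoint(ckpt)
    if half:
        pruned["state_dict"] = {
            key: value.half()
            if isinstance(value, torch.Tensor) and value.is_floating_point()
            else value
            for key, value in pruned["state_dict"].items()
        }
    torch.save(pruned, dst)
```

Storing the remaining floating-point weights in half precision (`half=True`) roughly halves their size again, at the cost of full fp32 precision if the model is trained further.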

Health Check

Last commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 16 stars in the last 90 days
