Dreambooth-Stable-Diffusion by XavierXiao

Dreambooth implementation for Stable Diffusion

created 2 years ago
7,751 stars

Top 6.8% on sourcepulse

View on GitHub
Project Summary

This repository implements Google's Dreambooth technique for fine-tuning Stable Diffusion models with a small set of custom images. It allows users to personalize text-to-image generation by teaching the model new subjects or styles, building upon the Stable Diffusion architecture with minimal modifications to the Textual Inversion codebase.

How It Works

The implementation fine-tunes the entire Stable Diffusion UNet, unlike Textual Inversion, which only optimizes word embeddings. This approach, following the original Dreambooth paper, enables deeper customization. It uses gradient checkpointing to reduce GPU memory usage and a placeholder token (hardcoded as "sks") to represent the new subject in prompts. Regularization images, generated with a class-specific prompt, act as a prior-preservation signal that keeps the model from overfitting and forgetting the general class.
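The prior-preservation idea behind the regularization images can be sketched as a combined loss: a reconstruction term on the few subject ("instance") images plus a weighted term on class images. The snippet below is an illustrative NumPy sketch; the function name, the plain MSE form, and the prior_weight parameter are assumptions for exposition, not the repository's actual code.

```python
import numpy as np

def prior_preservation_loss(instance_pred, instance_target,
                            class_pred, class_target, prior_weight=1.0):
    # Reconstruction loss on the handful of subject ("instance") images.
    instance_loss = np.mean((instance_pred - instance_target) ** 2)
    # Loss on regularization ("class") images: anchors the model to the
    # general class so fine-tuning does not erase prior knowledge.
    prior_loss = np.mean((class_pred - class_target) ** 2)
    return instance_loss + prior_weight * prior_loss

# Toy noise-prediction residuals standing in for diffusion-model outputs.
rng = np.random.default_rng(0)
pred = rng.normal(size=(4, 8))
target = np.zeros((4, 8))
loss = prior_preservation_loss(pred, target, pred, pred)
```

When the class-image predictions match their targets (as in the toy call above), only the instance term contributes; raising `prior_weight` trades subject fidelity for class preservation.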

Quick Start & Requirements

  • Install: Follow instructions from the Textual Inversion or original Stable Diffusion repositories to set up the ldm environment.
  • Pre-trained Weights: Download Stable Diffusion weights (e.g., sd-v1-4-full-ema.ckpt) from HuggingFace.
  • Training: python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume /path/to/sd-v1-4-full-ema.ckpt -n <job name> --gpus 0, --data_root /root/to/training/images --reg_data_root /root/to/regularization/images --class_word <xxx>
  • Prerequisites: Requires pre-trained Stable Diffusion model weights. Training is reported to take ~15 minutes on two A6000 GPUs.
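The --class_word argument and the hardcoded "sks" token together determine the prompts used during training: the subject images are paired with a prompt containing the placeholder plus the class word, while the regularization images use the class word alone. A hypothetical helper (not part of the repository) illustrating that prompt shape:

```python
def build_prompts(class_word: str, placeholder: str = "sks"):
    """Illustrative only: shows how the placeholder token and class word
    combine into the training and regularization prompts."""
    instance_prompt = f"photo of a {placeholder} {class_word}"
    class_prompt = f"photo of a {class_word}"
    return instance_prompt, class_prompt

print(build_prompts("dog"))
```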

Highlighted Details

  • Fine-tunes the entire UNet, offering deeper customization than embedding-only methods.
  • Supports gradient checkpointing to reduce GPU memory requirements.
  • Uses a hardcoded placeholder token ("sks") for subject injection into prompts.
  • Requires pre-generated regularization images for effective training.

Maintenance & Community

The repository is maintained by XavierXiao. No specific community channels or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. It is based on the Textual Inversion repository, which is typically under permissive licenses like MIT, but this should be verified. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The placeholder token is hardcoded, so changing it requires editing the source. The effectiveness of regularization images varies, and generating realistic ones for some classes can be difficult. The project targets the original v1-era Stable Diffusion checkpoints (e.g., sd-v1-4), not newer model versions.

Health Check
Last commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
58 stars in the last 90 days

Explore Similar Projects

Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis

0.1%
71k
Latent text-to-image diffusion model
created 3 years ago
updated 1 year ago