Dreambooth-Stable-Diffusion by XavierXiao

Dreambooth implementation for Stable Diffusion

created 2 years ago
7,751 stars

Top 6.8% on sourcepulse

View on GitHub
Project Summary

This repository implements Google's Dreambooth technique for fine-tuning Stable Diffusion models with a small set of custom images. It allows users to personalize text-to-image generation by teaching the model new subjects or styles, building upon the Stable Diffusion architecture with minimal modifications to the Textual Inversion codebase.

How It Works

The implementation fine-tunes the entire Stable Diffusion UNet, unlike Textual Inversion, which only optimizes word embeddings. This approach, following the original Dreambooth paper, enables deeper customization. It uses gradient checkpointing to reduce GPU memory usage and a placeholder token (hardcoded as "sks") to represent the new subject in prompts. Regularization images, generated with a class-specific prompt, act as a prior-preservation signal that keeps the model from overfitting and forgetting the general class.
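The prior-preservation idea behind the regularization images can be sketched as a combined loss: a reconstruction term on the few subject ("instance") images plus a weighted term on class images. The snippet below is an illustrative NumPy sketch; the function name, the plain MSE form, and the prior_weight parameter are assumptions for exposition, not the repository's actual code.

```python
import numpy as np

def prior_preservation_loss(instance_pred, instance_target,
                            class_pred, class_target, prior_weight=1.0):
    # Reconstruction loss on the handful of subject ("instance") images.
    instance_loss = np.mean((instance_pred - instance_target) ** 2)
    # Loss on regularization ("class") images: anchors the model to the
    # general class so fine-tuning does not erase prior knowledge.
    prior_loss = np.mean((class_pred - class_target) ** 2)
    return instance_loss + prior_weight * prior_loss

# Toy noise-prediction residuals standing in for diffusion-model outputs.
rng = np.random.default_rng(0)
pred = rng.normal(size=(4, 8))
target = np.zeros((4, 8))
loss = prior_preservation_loss(pred, target, pred, pred)
```

When the class-image predictions match their targets (as in the toy call above), only the instance term contributes; raising `prior_weight` trades subject fidelity for class preservation.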

Quick Start & Requirements

  • Install: Follow instructions from the Textual Inversion or original Stable Diffusion repositories to set up the ldm environment.
  • Pre-trained Weights: Download Stable Diffusion weights (e.g., sd-v1-4-full-ema.ckpt) from HuggingFace.
  • Training: python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume /path/to/sd-v1-4-full-ema.ckpt -n <job name> --gpus 0, --data_root /root/to/training/images --reg_data_root /root/to/regularization/images --class_word <xxx>
  • Prerequisites: Requires pre-trained Stable Diffusion model weights. Training is reported to take ~15 minutes on two A6000 GPUs.
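The --class_word argument and the hardcoded "sks" token together determine the prompts used during training: the subject images are paired with a prompt containing the placeholder plus the class word, while the regularization images use the class word alone. A hypothetical helper (not part of the repository) illustrating that prompt shape:

```python
def build_prompts(class_word: str, placeholder: str = "sks"):
    """Illustrative only: shows how the placeholder token and class word
    combine into the training and regularization prompts."""
    instance_prompt = f"photo of a {placeholder} {class_word}"
    class_prompt = f"photo of a {class_word}"
    return instance_prompt, class_prompt

print(build_prompts("dog"))
```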

Highlighted Details

  • Fine-tunes the entire UNet, offering deeper customization than embedding-only methods.
  • Supports gradient checkpointing to reduce GPU memory requirements.
  • Uses a hardcoded placeholder token ("sks") for subject injection into prompts.
  • Requires pre-generated regularization images for effective training.

Maintenance & Community

The repository is maintained by XavierXiao. No specific community channels or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. It is based on the Textual Inversion repository, which is typically under permissive licenses like MIT, but this should be verified. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The placeholder token is hardcoded, so changing it requires editing the source. The effectiveness of regularization images varies, and generating realistic ones for some classes can be difficult. The project targets the original v1-era Stable Diffusion checkpoints (e.g., sd-v1-4), not newer model versions.

Health Check
Last commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
58 stars in the last 90 days

Explore Similar Projects

Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis

0.1%
71k
Latent text-to-image diffusion model
created 3 years ago
updated 1 year ago