Dreambooth implementation for Stable Diffusion
This repository implements Google's Dreambooth technique for fine-tuning Stable Diffusion models on a small set of custom images. It lets users personalize text-to-image generation by teaching the model new subjects or styles, and is built on the Textual Inversion codebase with minimal modifications.
How It Works
The implementation fine-tunes the entire Stable Diffusion Unet model, unlike Textual Inversion which only optimizes word embeddings. This approach, inspired by the original Dreambooth paper, enables more profound customization. It leverages gradient checkpointing for reduced GPU memory usage and uses a placeholder token (hardcoded as "sks") to represent the new subject within prompts. Regularization images, generated using a class-specific prompt, are used to prevent the model from overfitting and forgetting general concepts.
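The combined objective described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name, argument names, and the prior_weight parameter are hypothetical, and it assumes an epsilon-predicting UNet callable in the usual latent-diffusion setup.

```python
import torch
import torch.nn.functional as F

def dreambooth_loss(unet, noisy_latents, timesteps, text_embeds, noise,
                    reg_noisy_latents, reg_timesteps, reg_text_embeds,
                    reg_noise, prior_weight=1.0):
    """Sketch of one Dreambooth training step: instance loss on the
    custom images plus a prior-preservation loss on regularization
    images. `unet` is assumed to predict the added noise (epsilon)."""
    # Instance loss: reconstruct the noise for images of the new
    # subject, prompted with the placeholder token ("sks").
    pred = unet(noisy_latents, timesteps, text_embeds)
    instance_loss = F.mse_loss(pred, noise)

    # Prior-preservation loss: the same objective on class images
    # generated from the class-word prompt, which discourages the
    # model from forgetting the general class concept.
    reg_pred = unet(reg_noisy_latents, reg_timesteps, reg_text_embeds)
    prior_loss = F.mse_loss(reg_pred, reg_noise)

    return instance_loss + prior_weight * prior_loss
```

The regularization term is what distinguishes Dreambooth-style fine-tuning from naive fine-tuning on the subject images alone.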
Quick Start & Requirements
Create the ldm conda environment from environment.yaml, then download the full EMA checkpoint (sd-v1-4-full-ema.ckpt) from HuggingFace. Training is launched with:

python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume /path/to/sd-v1-4-full-ema.ckpt -n <job name> --gpus 0, --data_root /root/to/training/images --reg_data_root /root/to/regularization/images --class_word <xxx>
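The --class_word argument pairs with the hardcoded "sks" placeholder to form the two kinds of prompts involved: one for the training images of the specific subject and one for generating regularization images of the class. The templates below are illustrative of this pairing, not the exact strings used in the repository.

```python
PLACEHOLDER = "sks"  # rare-token placeholder hardcoded in this repo

def training_prompt(class_word: str) -> str:
    """Illustrative prompt for the custom training images: the
    placeholder token identifies the subject within its class."""
    return f"a photo of {PLACEHOLDER} {class_word}"

def regularization_prompt(class_word: str) -> str:
    """Illustrative prompt for generating regularization images:
    the class word alone, preserving the model's class prior."""
    return f"a photo of {class_word}"
```

After training, the subject is invoked by including "sks" plus the class word in any prompt (e.g. "a painting of sks dog in the style of Van Gogh").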
Maintenance & Community
The repository is maintained by XavierXiao. No specific community channels or roadmap are mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license. It is based on the Textual Inversion repository, which is typically under permissive licenses like MIT, but this should be verified. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The placeholder token is hardcoded, requiring manual code changes to modify. The effectiveness of regularization images can vary, and generating realistic ones for certain classes might be challenging. The project is based on older Stable Diffusion versions.
The repository was last updated about two years ago and appears inactive.