T2I-Adapter for controllable text-to-image diffusion models (SD-XL)
This repository provides T2I-Adapter, a method for enhancing controllability in text-to-image diffusion models, specifically Stable Diffusion XL (SDXL). It offers lightweight adapters (around 77M parameters) that can be combined with pre-trained SDXL models to guide image generation using various conditioning inputs like sketches, Canny edges, line art, and depth maps. This approach allows for efficient fine-tuning and enables users to leverage the high-quality generation capabilities of SDXL with precise control.
How It Works
T2I-Adapter introduces small, trainable adapter modules that are injected into the diffusion model's architecture. These adapters process conditioning information (e.g., edge maps, pose skeletons) and fuse it with the text prompt's representation. The core advantage is that the large pre-trained diffusion model (SDXL) remains frozen, while only the adapters are trained. This significantly reduces computational cost and memory requirements for fine-tuning, enabling the addition of new control modalities without retraining the entire model.
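The frozen-backbone idea can be sketched in a few lines. The toy model below is purely illustrative (function names, channel counts, and spatial sizes are invented for the example and do not reflect the actual SDXL architecture); it shows the core mechanism: the adapter turns a conditioning image into one feature map per scale, and those maps are simply added to the frozen UNet's features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three (channels, spatial_size) scales, standing in for UNet feature levels.
SCALES = [(320, 64), (640, 32), (1280, 16)]

def frozen_unet_features():
    # Stand-in for the frozen SDXL UNet's intermediate feature maps.
    # In T2I-Adapter these weights are never updated.
    return [rng.standard_normal((c, s, s)) for c, s in SCALES]

def adapter_features(condition_map):
    # Stand-in for the small trainable adapter: it maps a conditioning
    # image (e.g. a Canny edge map) to one feature map per scale.
    return [np.resize(condition_map, (c, s, s)) * 0.1 for c, s in SCALES]

def fuse(unet_feats, adapter_feats):
    # T2I-Adapter-style fusion: adapter features are added to the UNet
    # features at matching scales, injecting the control signal.
    return [u + a for u, a in zip(unet_feats, adapter_feats)]

edge_map = rng.standard_normal((1, 64, 64))  # toy conditioning input
fused = fuse(frozen_unet_features(), adapter_features(edge_map))
print([f.shape for f in fused])  # shapes are unchanged by the fusion
```

Because gradients only flow into `adapter_features`, training touches roughly 77M parameters instead of the billions in SDXL itself.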
Quick Start & Requirements
Install the base requirements and the SDXL-adapter branch of diffusers:

pip install -r requirements.txt
pip install git+https://github.com/huggingface/diffusers.git@t2iadapterxl

Additional dependencies: controlnet_aux==0.0.7, transformers, accelerate, and safetensors. Inference requires at least 15GB of GPU memory.
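Once the dependencies are installed, inference follows the standard diffusers pattern. The sketch below assumes the Canny SDXL adapter checkpoint and a locally prepared edge map (`canny_edges.png` is a hypothetical file); exact model IDs and defaults may differ from the README's examples.

```python
import torch
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter
from diffusers.utils import load_image

# Load the ~77M-parameter adapter and attach it to a frozen SDXL base model.
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")

# A precomputed Canny edge map serves as the conditioning input.
canny = load_image("canny_edges.png")  # hypothetical local file
image = pipe(
    prompt="a photo of a modern house, golden hour",
    image=canny,
    adapter_conditioning_scale=0.8,  # strength of the control signal
).images[0]
image.save("out.png")
```

Lowering `adapter_conditioning_scale` relaxes the structural constraint, letting the text prompt dominate; raising it toward 1.0 follows the edge map more strictly.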
Maintenance & Community
The project is a collaboration between Tencent ARC Lab and Hugging Face. Updates are regularly posted, including the integration of SDXL support and new adapter types. Links to Hugging Face demos and tutorials are provided.
Licensing & Compatibility
The repository does not explicitly state a license in the README. The models are hosted on Hugging Face, where each model card carries its own license terms; use in commercial or closed-source projects would require verifying the licenses of the specific models.
Limitations & Caveats
The README notes that some SDXL adapters are still under development and may need further improvement due to limited computing resources. Inference requires a substantial amount of GPU memory (15GB+). The repository history was recently rewritten with BFG to reduce its size, which may break existing clones; re-cloning is recommended.