T2I-Adapter for controllable text-to-image diffusion models (SD-XL)
This repository provides T2I-Adapter, a method for enhancing controllability in text-to-image diffusion models, specifically Stable Diffusion XL (SDXL). It offers lightweight adapters (around 77M parameters) that can be combined with pre-trained SDXL models to guide image generation using various conditioning inputs like sketches, Canny edges, line art, and depth maps. This approach allows for efficient fine-tuning and enables users to leverage the high-quality generation capabilities of SDXL with precise control.
How It Works
T2I-Adapter introduces small, trainable adapter modules that are injected into the diffusion model's architecture. These adapters process conditioning information (e.g., edge maps, pose skeletons) and fuse it with the text prompt's representation. The core advantage is that the large pre-trained diffusion model (SDXL) remains frozen, while only the adapters are trained. This significantly reduces computational cost and memory requirements for fine-tuning, enabling the addition of new control modalities without retraining the entire model.
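The frozen-backbone idea can be sketched in a few lines. The toy model below is purely illustrative (function names, channel counts, and spatial sizes are invented for the example and do not reflect the actual SDXL architecture); it shows the core mechanism: the adapter turns a conditioning image into one feature map per scale, and those maps are simply added to the frozen UNet's features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three (channels, spatial_size) scales, standing in for UNet feature levels.
SCALES = [(320, 64), (640, 32), (1280, 16)]

def frozen_unet_features():
    # Stand-in for the frozen SDXL UNet's intermediate feature maps.
    # In T2I-Adapter these weights are never updated.
    return [rng.standard_normal((c, s, s)) for c, s in SCALES]

def adapter_features(condition_map):
    # Stand-in for the small trainable adapter: it maps a conditioning
    # image (e.g. a Canny edge map) to one feature map per scale.
    return [np.resize(condition_map, (c, s, s)) * 0.1 for c, s in SCALES]

def fuse(unet_feats, adapter_feats):
    # T2I-Adapter-style fusion: adapter features are added to the UNet
    # features at matching scales, injecting the control signal.
    return [u + a for u, a in zip(unet_feats, adapter_feats)]

edge_map = rng.standard_normal((1, 64, 64))  # toy conditioning input
fused = fuse(frozen_unet_features(), adapter_features(edge_map))
print([f.shape for f in fused])  # shapes are unchanged by the fusion
```

Because gradients only flow into `adapter_features`, training touches roughly 77M parameters instead of the billions in SDXL itself.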
Quick Start & Requirements
Install the base requirements and the SDXL-adapter branch of diffusers:

pip install -r requirements.txt
pip install git+https://github.com/huggingface/diffusers.git@t2iadapterxl

Additional dependencies: controlnet_aux==0.0.7, transformers, accelerate, and safetensors. Inference requires at least 15GB of GPU memory.
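Once the dependencies are installed, inference follows the standard diffusers pattern. The sketch below assumes the Canny SDXL adapter checkpoint and a locally prepared edge map (`canny_edges.png` is a hypothetical file); exact model IDs and defaults may differ from the README's examples.

```python
import torch
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter
from diffusers.utils import load_image

# Load the ~77M-parameter adapter and attach it to a frozen SDXL base model.
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")

# A precomputed Canny edge map serves as the conditioning input.
canny = load_image("canny_edges.png")  # hypothetical local file
image = pipe(
    prompt="a photo of a modern house, golden hour",
    image=canny,
    adapter_conditioning_scale=0.8,  # strength of the control signal
).images[0]
image.save("out.png")
```

Lowering `adapter_conditioning_scale` relaxes the structural constraint, letting the text prompt dominate; raising it toward 1.0 follows the edge map more strictly.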
Maintenance & Community
The project is a collaboration between Tencent ARC Lab and Hugging Face. Updates are regularly posted, including the integration of SDXL support and new adapter types. Links to Hugging Face demos and tutorials are provided.
Licensing & Compatibility
The repository does not explicitly state a license in the README. The models are hosted on Hugging Face, where each model card carries its own license terms; use in commercial or closed-source projects would require verifying the licenses of the specific models.
Limitations & Caveats
The README notes that some SDXL adapters are still under development and may need further improvement due to limited computing resources. Inference requires a substantial amount of GPU memory (15GB+). The repository history was recently rewritten with BFG to reduce its size, which may break existing clones; re-cloning is recommended.