EasyControl by Xiaojiu-z

DiT framework for efficient, flexible diffusion model control

Created 4 months ago · 1,623 stars · Top 26.5% on sourcepulse

View on GitHub
Project Summary

EasyControl provides a unified framework for adding efficient and flexible conditional control to Diffusion Transformer (DiT) models, addressing limitations in existing DiT ecosystems. It targets researchers and developers working with DiT architectures, enabling plug-and-play functionality, multi-condition coordination, and improved generation flexibility for tasks like style transfer and image manipulation.

How It Works

EasyControl integrates control mechanisms via a lightweight Condition Injection LoRA module. It employs a Position-Aware Training Paradigm and combines Causal Attention with KV Cache technology. This approach enhances model compatibility, allowing for plug-and-play integration and style-preserving control, while also supporting diverse resolutions, aspect ratios, and multi-condition combinations with improved inference efficiency.
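The exact modules are described in the paper and repository; as a rough illustration of the KV Cache idea, the PyTorch sketch below (all class and argument names are invented, single-head attention for brevity) encodes the condition tokens once, caches their keys and values, and lets image tokens attend to both streams on later denoising steps. Because only image tokens act as queries here, the condition tokens never attend back to the image, mirroring the causal-attention constraint.

    # Illustrative sketch only (not the project's actual API): condition
    # keys/values are computed once and cached, then reused at every later
    # denoising step.
    import torch
    import torch.nn.functional as F

    class CachedConditionAttention(torch.nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.to_q = torch.nn.Linear(dim, dim)
            self.to_k = torch.nn.Linear(dim, dim)
            self.to_v = torch.nn.Linear(dim, dim)

        def forward(self, image_tokens, condition_tokens=None, kv_cache=None):
            q = self.to_q(image_tokens)
            if kv_cache is None:
                # First step: encode the condition tokens once and cache K/V.
                kv_cache = (self.to_k(condition_tokens), self.to_v(condition_tokens))
            cond_k, cond_v = kv_cache
            # Image tokens attend to themselves plus the cached condition tokens;
            # condition tokens are never queries, so they cannot see the image.
            k = torch.cat([cond_k, self.to_k(image_tokens)], dim=1)
            v = torch.cat([cond_v, self.to_v(image_tokens)], dim=1)
            return F.scaled_dot_product_attention(q, k, v), kv_cache

    attn = CachedConditionAttention(dim=64)
    image = torch.randn(1, 256, 64)       # noisy image tokens
    cond = torch.randn(1, 128, 64)        # condition tokens (e.g. a Canny map)
    out, cache = attn(image, condition_tokens=cond)  # first step builds the cache
    out, cache = attn(image, kv_cache=cache)         # later steps reuse it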

Quick Start & Requirements

  • Install: Create a conda environment (conda create -n easycontrol python=3.10), activate it (conda activate easycontrol), and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.10, PyTorch with CUDA support. Recommended hardware: 1x NVIDIA H100/H800/A100 with ~80GB GPU memory for training.
  • Download Models: Models can be downloaded from Hugging Face or via provided Python scripts (a minimal download sketch follows this list).
  • Docs/Demo: Hugging Face demo available at https://huggingface.co/spaces/Xiaojiu-Z/EasyControl.
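If you prefer scripting the download, the minimal huggingface_hub sketch below shows one way to fetch the checkpoints; the repository id is an assumption, so verify the actual model repo and file layout in the project README.

    # Minimal download sketch (the repo_id below is an assumption; check the
    # project README for the actual checkpoint repository before use).
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="EasyControl/EasyControl",  # assumed checkpoint repository
        local_dir="./checkpoints",
    )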

Highlighted Details

  • Supports single and multi-condition control (e.g., Canny, Depth, Pose, Subject, Inpainting).
  • Offers a Ghibli-style portrait generation LoRA.
  • Integrates with CFG-Zero* for boosted image fidelity and controllability.
  • ComfyUI Node support via jax-explorer.

Maintenance & Community

  • Active development with recent releases of training code, simple API, and pre-trained checkpoints.
  • Community integration via Hugging Face Spaces.
  • Contact information for collaboration is provided.

Licensing & Compatibility

  • Code released under Apache License 2.0 for academic and commercial use.
  • Released checkpoints are for research purposes only.

Limitations & Caveats

Recommended training hardware is substantial (a single NVIDIA H100/H800/A100 with roughly 80 GB of GPU memory). Inference code is released, but the Gradio demo notes that hardware constraints may limit high-resolution generation on personal machines.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 4
  • Star History: 214 stars in the last 90 days
