EasyControl by Xiaojiu-z

DiT framework for efficient, flexible diffusion model control

Created 4 months ago · 1,623 stars · Top 26.5% on sourcepulse

View on GitHub
Project Summary

EasyControl provides a unified framework for adding efficient and flexible conditional control to Diffusion Transformer (DiT) models, addressing limitations in existing DiT ecosystems. It targets researchers and developers working with DiT architectures, enabling plug-and-play functionality, multi-condition coordination, and improved generation flexibility for tasks like style transfer and image manipulation.

How It Works

EasyControl integrates control mechanisms via a lightweight Condition Injection LoRA module. It employs a Position-Aware Training Paradigm and combines Causal Attention with KV Cache technology. This approach enhances model compatibility, allowing for plug-and-play integration and style-preserving control, while also supporting diverse resolutions, aspect ratios, and multi-condition combinations with improved inference efficiency.
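The exact modules are described in the paper and repository; as a rough illustration of the KV Cache idea, the PyTorch sketch below (all class and argument names are invented, single-head attention for brevity) encodes the condition tokens once, caches their keys and values, and lets image tokens attend to both streams on later denoising steps. Because only image tokens act as queries here, the condition tokens never attend back to the image, mirroring the causal-attention constraint.

    # Illustrative sketch only (not the project's actual API): condition
    # keys/values are computed once and cached, then reused at every later
    # denoising step.
    import torch
    import torch.nn.functional as F

    class CachedConditionAttention(torch.nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.to_q = torch.nn.Linear(dim, dim)
            self.to_k = torch.nn.Linear(dim, dim)
            self.to_v = torch.nn.Linear(dim, dim)

        def forward(self, image_tokens, condition_tokens=None, kv_cache=None):
            q = self.to_q(image_tokens)
            if kv_cache is None:
                # First step: encode the condition tokens once and cache K/V.
                kv_cache = (self.to_k(condition_tokens), self.to_v(condition_tokens))
            cond_k, cond_v = kv_cache
            # Image tokens attend to themselves plus the cached condition tokens;
            # condition tokens are never queries, so they cannot see the image.
            k = torch.cat([cond_k, self.to_k(image_tokens)], dim=1)
            v = torch.cat([cond_v, self.to_v(image_tokens)], dim=1)
            return F.scaled_dot_product_attention(q, k, v), kv_cache

    attn = CachedConditionAttention(dim=64)
    image = torch.randn(1, 256, 64)       # noisy image tokens
    cond = torch.randn(1, 128, 64)        # condition tokens (e.g. a Canny map)
    out, cache = attn(image, condition_tokens=cond)  # first step builds the cache
    out, cache = attn(image, kv_cache=cache)         # later steps reuse it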

Quick Start & Requirements

  • Install: Create a conda environment (conda create -n easycontrol python=3.10), activate it (conda activate easycontrol), and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.10, PyTorch with CUDA support. Recommended hardware: 1x NVIDIA H100/H800/A100 with ~80GB GPU memory for training.
  • Download Models: Models can be downloaded from Hugging Face or via provided Python scripts (a minimal download sketch follows this list).
  • Docs/Demo: Hugging Face demo available at https://huggingface.co/spaces/Xiaojiu-Z/EasyControl.
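If you prefer scripting the download, the minimal huggingface_hub sketch below shows one way to fetch the checkpoints; the repository id is an assumption, so verify the actual model repo and file layout in the project README.

    # Minimal download sketch (the repo_id below is an assumption; check the
    # project README for the actual checkpoint repository before use).
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="EasyControl/EasyControl",  # assumed checkpoint repository
        local_dir="./checkpoints",
    )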

Highlighted Details

  • Supports single and multi-condition control (e.g., Canny, Depth, Pose, Subject, Inpainting).
  • Offers a Ghibli-style portrait generation LoRA.
  • Integrates with CFG-Zero* for boosted image fidelity and controllability.
  • ComfyUI Node support via jax-explorer.

Maintenance & Community

  • Active development with recent releases of training code, simple API, and pre-trained checkpoints.
  • Community integration via Hugging Face Spaces.
  • Contact information for collaboration is provided.

Licensing & Compatibility

  • Code released under Apache License 2.0 for academic and commercial use.
  • Released checkpoints are for research purposes only.

Limitations & Caveats

Recommended training hardware is substantial (a single NVIDIA H100/H800/A100 with roughly 80 GB of GPU memory). Inference code is released, but the Gradio demo notes that hardware constraints may limit high-resolution generation on personal machines.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 4
  • Star History: 214 stars in the last 90 days
