Image creation/editing via instruction-based content filling (research paper)
ACE++ unifies reference image generation, local editing, and controllable generation into a single framework, enabling one model to adapt to a wider range of tasks. It targets users needing versatile image manipulation capabilities, offering improved control and consistency over generated or edited images.
How It Works
ACE++ is a post-training model built on FLUX.1-Fill-dev. It provides specialized LoRA models for portrait consistency, subject consistency, and local editing (redrawing a masked area while preserving its structure). An additional FFT model supports a broader range of image-to-image tasks, though it may perform worse than the LoRA models on their specific tasks. Unlike the LoRA models, the FFT model adds 64 input channels that represent the latent pixel space of the edited image, widening the base model's input from 384 to 448 channels.
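The channel change described above (384 + 64 = 448) can be illustrated with a small sketch. It assumes the widening is done by appending new input columns to an existing projection weight and initializing them to zero, which is a common way to extend a pretrained layer; the source does not say how ACE++ actually initializes the new channels, and the helper names `expand_input_weight` and `linear` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

BASE_IN_CHANNELS = 384   # input width of the base model, per the text above
EXTRA_IN_CHANNELS = 64   # channels added by the FFT model, per the text above


def expand_input_weight(weight: Matrix, extra_in: int, fill: float = 0.0) -> Matrix:
    """Return a copy of `weight` (out_features x in_features) with `extra_in`
    new input columns appended to every row, each set to `fill`.

    Zero fill is an assumption for illustration, not a documented ACE++ choice.
    """
    return [row + [fill] * extra_in for row in weight]


def linear(weight: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# Tiny stand-in for the base projection: 2 output features, 384 inputs.
base_weight = [[0.01 * (i + 1)] * BASE_IN_CHANNELS for i in range(2)]
wide_weight = expand_input_weight(base_weight, EXTRA_IN_CHANNELS)

x_base = [1.0] * BASE_IN_CHANNELS
x_extra = [5.0] * EXTRA_IN_CHANNELS  # stand-in for the extra latent channels

out_base = linear(base_weight, x_base)
out_wide = linear(wide_weight, x_base + x_extra)

print(len(base_weight[0]), "->", len(wide_weight[0]))
```

With zero-filled columns the widened layer initially ignores the extra channels, so its output matches the base layer until training updates the new weights.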
Quick Start & Requirements
Install: git clone https://github.com/ali-vilab/ACE_plus.git and pip install -r repo_requirements.txt.
Configure: set FLUX_FILL_PATH and the paths for the specific ACE++ models (portrait, subject, local editing, or FFT).
Additional dependencies: scepter (for some FFT tasks), and potentially community nodes for depth/contour extraction.
Run: python demo_lora.py or python demo_fft.py after setting environment variables.
ComfyUI workflows: workflow/ComfyUI-ACE_Plus/.
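Because the demos depend on environment variables being set first, a small pre-flight check can catch a missing one before a long model load. The sketch below is not part of the repository. `FLUX_FILL_PATH` comes from the steps above, while `ACE_MODEL_PATH` and the example file path are hypothetical placeholders for whichever model path variable and location your setup uses.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(
    required: Iterable[str], environ: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names in `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


# FLUX_FILL_PATH is named in the quick start. ACE_MODEL_PATH is a hypothetical
# placeholder; substitute the variable name the repository actually expects.
REQUIRED = ["FLUX_FILL_PATH", "ACE_MODEL_PATH"]

# Illustrative environment (hypothetical path) instead of the real os.environ.
example_env = {"FLUX_FILL_PATH": "/models/flux1-fill-dev.safetensors"}

missing = missing_env_vars(REQUIRED, example_env)
if missing:
    print("Set these before running the demo:", ", ".join(missing))
```

Calling `missing_env_vars(REQUIRED)` with no second argument checks the real process environment instead of the example dictionary.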
Highlighted Details
Maintenance & Community
The project is from Tongyi Lab, Alibaba Group. Recent updates include code and model releases, a HuggingFace demo, training code, ComfyUI workflows, and an FFT model. The team acknowledges community feedback on artifacts and stability, with future focus shifting to post-training on the Wan series of foundational models due to challenges with FLUX.
Licensing & Compatibility
ACE++ is a post-training model based on FLUX.1-dev. Users must adhere to the FLUX.1-dev open-source license. Test materials are for academic research and communication.
Limitations & Caveats
Instruction following for tasks like object deletion or addition can be flawed, with repainting recommended for such edits. Generated results, particularly hands, may exhibit artifacts and distortions. The FFT model's performance may be lower than LoRA models for specific tasks.