ACE_plus by ali-vilab

Image creation/editing via instruction-based content filling (research paper)

Created 1 year ago

1,357 stars

Top 29.2% on SourcePulse

Project Summary

ACE++ unifies reference image generation, local editing, and controllable generation into a single framework, enabling one model to adapt to a wider range of tasks. It targets users needing versatile image manipulation capabilities, offering improved control and consistency over generated or edited images.

How It Works

ACE++ is a post-training model built on FLUX.1-Fill-dev. It provides three specialized LoRA models: portrait consistency, subject consistency, and local editing (redrawing a masked area while preserving its structure). A separate fully fine-tuned (FFT) model supports a broader range of image-to-image tasks, at some cost in performance relative to the LoRA models. The FFT model adds 64 input channels that carry the latent representation of the edited image, which widens the base model's input from 384 to 448 channels.
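The channel arithmetic can be illustrated with a small, dependency-free sketch. It only shows how 64 extra channels appended to a 384-channel input token give 448 channels. The helper names (concat_channels, widen_weight, linear) are hypothetical, and the zero-initialization of the new weight columns is an assumption made for illustration, not something the project is confirmed to do.

```python
# Channel counts quoted in the summary.
BASE_IN_CHANNELS = 384      # input width of the base model
EXTRA_EDIT_CHANNELS = 64    # channels added by the FFT model
FFT_IN_CHANNELS = BASE_IN_CHANNELS + EXTRA_EDIT_CHANNELS


def concat_channels(base_token, edit_token):
    """Append the edited-image channels to one base input token."""
    if len(base_token) != BASE_IN_CHANNELS:
        raise ValueError("base token must have %d channels" % BASE_IN_CHANNELS)
    if len(edit_token) != EXTRA_EDIT_CHANNELS:
        raise ValueError("edit token must have %d channels" % EXTRA_EDIT_CHANNELS)
    return list(base_token) + list(edit_token)


def widen_weight(weight, extra):
    """Add `extra` input columns to a weight matrix (list of rows).

    Assumption for illustration: the new columns start at zero, so the
    widened layer initially reproduces the original layer's output.
    """
    return [list(row) + [0.0] * extra for row in weight]


def linear(weight, x):
    """Bias-free linear projection of one token."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


if __name__ == "__main__":
    base_token = [0.01 * i for i in range(BASE_IN_CHANNELS)]
    edit_token = [1.0] * EXTRA_EDIT_CHANNELS
    token = concat_channels(base_token, edit_token)

    # Toy layer with two outputs, standing in for an input projection.
    weight = [[0.5] * BASE_IN_CHANNELS, [-0.25] * BASE_IN_CHANNELS]
    widened = widen_weight(weight, EXTRA_EDIT_CHANNELS)

    print(len(token))
    print(linear(widened, token) == linear(weight, base_token))
```

With zero-initialized columns, the widened projection returns the same output as the original until training updates the new weights. That is one common way to extend a pretrained layer without disturbing it at the start.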

Quick Start & Requirements

Install: git clone https://github.com/ali-vilab/ACE_plus.git and pip install -r repo_requirements.txt.
Base Model: Requires FLUX.1-Fill-dev, downloadable from HuggingFace.
Environment Variables: Set FLUX_FILL_PATH and paths for specific ACE++ models (portrait, subject, local editing, or FFT).
Dependencies: Python, scepter (for some FFT tasks), and potentially community nodes for depth/contour extraction.
Demo: python demo_lora.py or python demo_fft.py after setting environment variables.
ComfyUI: Workflows are provided in workflow/ComfyUI-ACE_Plus/.
Docs: HuggingFace Demo
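Because the demos depend on environment variables, a short pre-flight check can catch a missing path before a model load fails. The sketch below is an illustration, not part of the repository. FLUX_FILL_PATH comes from the text above, while PORTRAIT_MODEL_PATH is an assumed name for one of the ACE++ model paths and should be checked against the repository README.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # FLUX_FILL_PATH is named in the summary; PORTRAIT_MODEL_PATH is an
    # assumed variable name, so confirm it in the repository README.
    required = ["FLUX_FILL_PATH", "PORTRAIT_MODEL_PATH"]
    missing = missing_env_vars(required)
    if missing:
        raise SystemExit("Set these variables first: " + ", ".join(missing))
    print("Environment looks complete; run python demo_lora.py next.")
```

Passing a plain dictionary as environ makes the helper easy to test without touching the real environment.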

Highlighted Details

Offers three specialized LoRA models (Portrait, Subject, LocalEditing) and a general-purpose FFT model.
Supports various tasks including character ID consistency, subject consistency, face swapping, and regional editing.
Provides ComfyUI workflows and tutorials for community integration.
Training code is available for custom data fine-tuning, with LoRA training requiring ~38-40GB GPU memory.

Maintenance & Community

The project comes from Tongyi Lab, Alibaba Group. Updates to date include the code and model releases, a HuggingFace demo, training code, ComfyUI workflows, and the FFT model. The team acknowledges community feedback on artifacts and stability, and plans to shift future work to post-training on the Wan series of foundation models because of challenges with FLUX.

Licensing & Compatibility

ACE++ is a post-training model based on the FLUX.1-dev series (the base checkpoint is FLUX.1-Fill-dev), so users must adhere to the FLUX.1-dev license terms. Test materials are for academic research and communication.

Limitations & Caveats

Instruction following can be unreliable for tasks such as deleting or adding objects; repainting the region is recommended for such edits. Generated results may show artifacts and distortions, particularly in hands. The FFT model may perform worse than the LoRA models on the specific tasks those models target.

Health Check

Last Commit: 10 months ago
Responsiveness: Inactive

Star History

9 stars in the last 30 days