ACE_plus  by ali-vilab

Image creation/editing via instruction-based content filling (research paper)

created 7 months ago
1,247 stars

Top 32.3% on sourcepulse

Project Summary

ACE++ unifies reference image generation, local editing, and controllable generation into a single framework, enabling one model to adapt to a wider range of tasks. It targets users needing versatile image manipulation capabilities, offering improved control and consistency over generated or edited images.

How It Works

ACE++ is a post-training model built upon the FLUX.1-Fill-dev foundation. It introduces specialized LoRA models for portrait consistency, subject consistency, and local editing (redrawing masked areas while preserving structure). An additional FFT model offers broader image-to-image task support, though with a performance trade-off compared to LoRA models. The FFT model uniquely incorporates 64 additional channels to represent latent pixel space from edited images, modifying the base model's input channels from 384 to 448.
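The channel arithmetic above (384 + 64 = 448) can be illustrated with a toy input projection. This is a minimal sketch, not the ACE++ implementation: the helper names are hypothetical, and zero-initialising the new columns is an assumption (a common way to widen an input layer without changing its output on the original channels). The source does not say how the extra weights are initialised.

```python
BASE_IN_CHANNELS = 384  # base model input channels, from the text
EXTRA_CHANNELS = 64     # channels added by the FFT model, from the text


def widen_input_weights(weight, extra):
    """Append `extra` zero columns to each row of an (out, in) weight matrix.

    Hypothetical helper; zero initialisation is an assumption.
    """
    return [row + [0.0] * extra for row in weight]


def project(weight, x):
    """Plain matrix-vector product: one dot product per output row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# Toy projection with 2 output features over the 384 base channels.
base_weight = [[0.01 * (i + j) for j in range(BASE_IN_CHANNELS)] for i in range(2)]
wide_weight = widen_input_weights(base_weight, EXTRA_CHANNELS)

x_base = [1.0] * BASE_IN_CHANNELS
x_extra = [5.0] * EXTRA_CHANNELS  # stands in for the edited-image latent channels

out_base = project(base_weight, x_base)
out_wide = project(wide_weight, x_base + x_extra)

print(len(wide_weight[0]))   # width of the widened input
print(out_base == out_wide)  # zero columns leave the output unchanged
```

With zero-initialised columns the widened layer starts out behaving exactly like the base layer, and training can then learn to use the 64 new channels.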

Quick Start & Requirements

  • Install: git clone https://github.com/ali-vilab/ACE_plus.git and pip install -r repo_requirements.txt.
  • Base Model: Requires FLUX.1-Fill-dev, downloadable from HuggingFace.
  • Environment Variables: Set FLUX_FILL_PATH and paths for specific ACE++ models (portrait, subject, local editing, or FFT).
  • Dependencies: Python, scepter (for some FFT tasks), and potentially community nodes for depth/contour extraction.
  • Demo: python demo_lora.py or python demo_fft.py after setting environment variables.
  • ComfyUI: Workflows are provided in workflow/ComfyUI-ACE_Plus/.
  • Docs: HuggingFace Demo
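Because the demos depend on environment variables, a small pre-flight check can save a failed run. The sketch below is an illustration only: missing_env_vars is a hypothetical helper, not part of the repository. FLUX_FILL_PATH is the only variable name given above; the variable names for the individual ACE++ models are not listed here, so add them from the repository README.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty.

    Hypothetical helper; `environ` defaults to the real process environment.
    """
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# FLUX_FILL_PATH comes from the list above. Append the variable names for the
# ACE++ model(s) you plan to use (portrait, subject, local editing, or FFT).
REQUIRED = ["FLUX_FILL_PATH"]

# Example with a stand-in environment; the path is a placeholder.
example_env = {"FLUX_FILL_PATH": "/path/to/FLUX.1-Fill-dev"}
print(missing_env_vars(REQUIRED, example_env))  # prints []
```

Calling missing_env_vars(REQUIRED) with no second argument checks the real environment; an empty result means the listed variables are set before running demo_lora.py or demo_fft.py.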

Highlighted Details

  • Offers three specialized LoRA models (Portrait, Subject, LocalEditing) and a general-purpose FFT model.
  • Supports various tasks including character ID consistency, subject consistency, face swapping, and regional editing.
  • Provides ComfyUI workflows and tutorials for community integration.
  • Training code is available for custom data fine-tuning, with LoRA training requiring ~38-40GB GPU memory.

Maintenance & Community

The project is from Tongyi Lab, Alibaba Group. Recent updates include code and model releases, a HuggingFace demo, training code, ComfyUI workflows, and an FFT model. The team acknowledges community feedback on artifacts and stability, with future focus shifting to post-training on the Wan series of foundational models due to challenges with FLUX.

Licensing & Compatibility

ACE++ is a post-training model based on the FLUX.1-dev family (FLUX.1-Fill-dev), so users must adhere to the FLUX.1-dev license. Test materials are intended for academic research and communication.

Limitations & Caveats

Instruction following for tasks like object deletion or addition can be flawed, with repainting recommended for such edits. Generated results, particularly hands, may exhibit artifacts and distortions. The FFT model's performance may be lower than LoRA models for specific tasks.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 4
  • Star History: 211 stars in the last 90 days
