PhotoMaker by TencentARC

Photo customization research paper using stacked ID embedding

created 1 year ago
10,050 stars

Top 5.1% on sourcepulse

Project Summary

PhotoMaker customizes realistic human photos quickly by stacking identity embeddings, with no per-identity LoRA training. It targets researchers and artists who want diverse, high-fidelity images with precise text control, and it works with existing Stable Diffusion XL pipelines and with LoRA modules used for styling.

How It Works

PhotoMaker employs a stacked ID embedding approach, integrating identity information directly into the diffusion process without requiring additional LoRA training. This method allows for efficient personalization, preserving key facial features and enabling stylistic control through text prompts and compatibility with other LoRA modules.
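To make the word "stacking" concrete, here is a toy Python sketch. It is not the authors' implementation: it assumes each reference photo has already been encoded into a fixed-length vector (the encoder is out of scope), and both function names are hypothetical. Stacking keeps one vector per photo as a sequence, whereas averaging would collapse them into a single vector.

```python
from typing import List

Vector = List[float]


def stack_id_embeddings(per_image: List[Vector]) -> List[Vector]:
    """Keep one embedding per reference photo, giving an (N, D) sequence.

    Hypothetical helper for illustration; not part of the PhotoMaker package.
    """
    if not per_image:
        raise ValueError("need at least one reference embedding")
    dim = len(per_image[0])
    if any(len(vec) != dim for vec in per_image):
        raise ValueError("all embeddings must have the same length")
    return [list(vec) for vec in per_image]


def average_id_embeddings(per_image: List[Vector]) -> Vector:
    """Contrast case: collapse N embeddings into a single D-length vector."""
    stacked = stack_id_embeddings(per_image)
    count = len(stacked)
    return [sum(column) / count for column in zip(*stacked)]


# Three made-up 4-dimensional embeddings standing in for three photos.
photos = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 2.0],
]
stacked = stack_id_embeddings(photos)
averaged = average_id_embeddings(photos)
print(len(stacked), len(stacked[0]))  # sequence length N, embedding size D
print(len(averaged))                  # a single vector of size D
```

The stacked form preserves what differs between photos, which is one plausible reason supplying several reference images can help; the averaged form discards that per-photo detail.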

Quick Start & Requirements

  • Install: run pip install -r requirements.txt, then pip install git+https://github.com/TencentARC/PhotoMaker.git
  • Prerequisites: Python >= 3.8, PyTorch >= 2.0.0. bfloat16 is recommended for best speed; use torch.float16 if the GPU does not support bfloat16. Minimum GPU memory: 11 GB.
  • Resources: Official demos available on Hugging Face Spaces and Replicate.
  • Docs: Paper, Project Page
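The bfloat16-or-float16 choice in the prerequisites can be sketched as a small helper. This is a minimal illustration, not code from the PhotoMaker repository: the function names are hypothetical, and the float32 fallback for machines without CUDA is an assumption the source does not cover. The only PyTorch calls used are torch.cuda.is_available() and torch.cuda.is_bf16_supported().

```python
def choose_dtype_name(cuda_available: bool, bf16_supported: bool) -> str:
    """Return the name of the torch dtype to load the pipeline with."""
    if not cuda_available:
        # Assumption: fall back to full precision when no CUDA GPU is present.
        return "float32"
    # Prefer bfloat16 as recommended; otherwise use float16.
    return "bfloat16" if bf16_supported else "float16"


def detect_dtype_name() -> str:
    """Query PyTorch, if installed, and pick a dtype name for this machine."""
    try:
        import torch
    except ImportError:
        # Without PyTorch nothing can be probed; report the no-GPU default.
        return choose_dtype_name(False, False)
    cuda_available = torch.cuda.is_available()
    bf16_supported = cuda_available and torch.cuda.is_bf16_supported()
    return choose_dtype_name(cuda_available, bf16_supported)


print(detect_dtype_name())
```

With PyTorch installed, getattr(torch, detect_dtype_name()) turns the returned name into the dtype object to pass when loading the pipeline.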

Highlighted Details

  • PhotoMaker V2 offers improved ID fidelity and maintains V1's generation quality and editability.
  • Integrates with ControlNet, T2I-Adapter, and IP-Adapter for enhanced control.
  • Compatible with other LoRA modules for advanced stylization.
  • Supports customization via ComfyUI nodes and various community implementations.

Maintenance & Community

  • Developed by Tencent ARC Lab and Nankai University MCG-NKU.
  • Inspired by IP-Adapter, FastComposer, and T2I-Adapter.
  • Active community support with numerous third-party integrations (WebUI, ComfyUI, Windows, Replicate).

Licensing & Compatibility

  • The repository does not explicitly state a license. The underlying models and dependencies may have their own licenses.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The primary repository does not specify a license, creating ambiguity for commercial use.
  • While V2 improves ID fidelity, users are advised to upload multiple photos for best results.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 1

Star History

  • 177 stars in the last 90 days
