InfiniteYou by bytedance

Photo recrafting research paper with identity preservation using Diffusion Transformers

Created 7 months ago
2,600 stars

Top 18.0% on SourcePulse

Project Summary

InfiniteYou (InfU) is a framework for high-fidelity, identity-preserving image generation using Diffusion Transformers (DiTs). It is aimed at users who need to recraft photos while preserving a person's identity. It addresses limitations of existing methods, such as poor text-image alignment and low generation quality, by introducing InfuseNet and a multi-stage training strategy.

How It Works

InfU uses InfuseNet to inject identity features into a DiT base model (FLUX.1-dev) through residual connections. This approach improves identity similarity without compromising generation capabilities. A multi-stage training process includes pretraining and supervised fine-tuning (SFT) on synthetic single-person-multiple-sample (SPMS) data. This process further improves text-image alignment and image quality, and it mitigates face copy-pasting.
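A deliberately tiny toy can illustrate the general idea of residual injection. It is a sketch under stated assumptions, not InfuseNet itself:

  • The "DiT blocks" are plain elementwise scalings.
  • The identity branch (`identity_residual`) is a hypothetical per-block projection.
  • `id_scale` is an illustrative knob.

The toy shows only that identity features enter as an additive term after each block, while the base path is left unchanged.

```python
from typing import List

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def scale(v: Vector, s: float) -> Vector:
    return [s * x for x in v]


def base_block(hidden: Vector, weight: float) -> Vector:
    """Stand-in for one pretrained DiT block (here just an elementwise scaling)."""
    return scale(hidden, weight)


def identity_residual(id_features: Vector, block_index: int) -> Vector:
    """Hypothetical identity branch: a per-block projection of the ID features."""
    return scale(id_features, 0.1 * (block_index + 1))


def forward(hidden: Vector, id_features: Vector, block_weights: List[float],
            id_scale: float = 1.0) -> Vector:
    for i, weight in enumerate(block_weights):
        hidden = base_block(hidden, weight)
        # Identity information is added to the block output as a residual;
        # id_scale=0.0 leaves the base model's computation unchanged.
        hidden = add(hidden, scale(identity_residual(id_features, i), id_scale))
    return hidden


if __name__ == "__main__":
    print(forward([1.0, 2.0], [0.5, -0.5], [1.0, 1.0], id_scale=0.0))  # [1.0, 2.0]
    print(forward([1.0, 2.0], [0.5, -0.5], [1.0, 1.0]))
```

With `id_scale=0.0` the toy returns exactly what the base stack alone would produce. In miniature, this shows why an additive residual branch can add identity control while leaving the base model's behavior intact when the injected signal is small.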

Quick Start & Requirements

  • Install via pip install -r requirements.txt.
  • Requires Python 3.x.
  • VRAM requirements: ~43GB (bf16), ~30GB (CPU offload), ~24GB (8-bit quantization), ~16GB (8-bit + CPU offload).
  • Local inference script: python test.py --id_image <path> --prompt <text>.
  • Online Hugging Face demo available.
  • Official ComfyUI node: bytedance/ComfyUI_InfiniteYou.
  • Official project page and paper available.
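The VRAM figures above can be turned into a quick planning helper. This sketch makes two assumptions:

  • The listed numbers are rough peak requirements.
  • When several options fit, the one with the fewest memory-saving measures is preferred.

The mode labels are descriptive strings, not actual `test.py` flags. Only `--id_image` and `--prompt` are documented here, so the options that enable quantization or offloading should be taken from the repository.

```python
from typing import List, Optional

# Approximate VRAM needs (GB) from the Quick Start list, ordered from the
# fewest to the most memory-saving measures. Labels are descriptive only.
VRAM_MODES = [
    ("bf16", 43),
    ("CPU offload", 30),
    ("8-bit quantization", 24),
    ("8-bit + CPU offload", 16),
]


def pick_memory_mode(available_gb: float) -> Optional[str]:
    """Return the first mode whose approximate requirement fits, or None."""
    for mode, required_gb in VRAM_MODES:
        if available_gb >= required_gb:
            return mode
    return None  # below ~16 GB none of the listed configurations fit


def build_command(id_image: str, prompt: str) -> List[str]:
    """Argument list for the documented local inference command."""
    return ["python", "test.py", "--id_image", id_image, "--prompt", prompt]


if __name__ == "__main__":
    print(pick_memory_mode(24))  # 8-bit quantization
    print(build_command("face.jpg", "a portrait in a snowy forest"))
```

The returned list could be passed to `subprocess.run(..., check=True)` from the repository root.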

Highlighted Details

  • Achieves state-of-the-art performance, surpassing baselines like FLUX.1-dev IP-Adapter and PuLID-FLUX in identity similarity, text-image alignment, and image quality.
  • Features a plug-and-play design, compatible with various FLUX.1-dev variants, ControlNets, LoRAs, and IP-Adapter for extended flexibility.
  • Offers two model variants: aes_stage2 (better aesthetics/alignment) and sim_stage1 (higher ID similarity).
  • Includes optional LoRAs for realism and anti-blur, and memory reduction options (8-bit quantization, CPU offloading).
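The choice between the two released variants and the optional LoRAs can be captured in a small configuration helper. Only the variant names `aes_stage2` and `sim_stage1` come from the summary. The priority keys, the `InfUSetup` class, and the LoRA labels are hypothetical, and the sketch does not load any weights.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

# Variant names are from the project summary; the priority keys are hypothetical.
VARIANT_FOR_PRIORITY = {
    "identity": "sim_stage1",    # higher ID similarity
    "aesthetics": "aes_stage2",  # better aesthetics and text-image alignment
}

# Descriptive labels for the optional LoRAs, not actual file or flag names.
OPTIONAL_LORAS = ("realism", "anti-blur")


@dataclass
class InfUSetup:
    variant: str
    loras: List[str] = field(default_factory=list)


def plan_setup(priority: str, loras: Sequence[str] = ()) -> InfUSetup:
    """Validate the request and return the chosen variant plus LoRA labels."""
    if priority not in VARIANT_FOR_PRIORITY:
        raise ValueError(f"priority must be one of {sorted(VARIANT_FOR_PRIORITY)}")
    unknown = [name for name in loras if name not in OPTIONAL_LORAS]
    if unknown:
        raise ValueError(f"unknown LoRA labels: {unknown}")
    return InfUSetup(VARIANT_FOR_PRIORITY[priority], list(loras))


if __name__ == "__main__":
    print(plan_setup("identity"))
    print(plan_setup("aesthetics", ["realism", "anti-blur"]))
```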

Maintenance & Community

  • Project initiated by ByteDance.
  • Official ComfyUI node available, with several unofficial contributions.
  • GGUF version and custom LoRAs are available from the community.

Licensing & Compatibility

  • Code licensed under Apache License 2.0.
  • Model licensed under Creative Commons Attribution-NonCommercial 4.0 International Public License (academic research purposes only).
  • Dependencies such as InsightFace and FLUX.1-dev must be used in accordance with their original licenses. Commercial use is restricted by the model license.

Limitations & Caveats

The model is licensed for non-commercial, academic research purposes only. Users must ensure compliance with local laws and the licenses of all dependencies.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 29 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Zhiqiang Xie (Coauthor of SGLang), and 1 more.

Sana by NVlabs

0.4%
4k
Image synthesis research paper using a linear diffusion transformer
Created 11 months ago
Updated 5 days ago
Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Assaf Elovic (Cofounder of Tavily), and 2 more.

facechain by modelscope

0.1%
9k
AI toolchain for generating personalized digital-twin portraits
Created 2 years ago
Updated 3 months ago