InfiniteYou by bytedance

Research paper on identity-preserving photo recrafting with Diffusion Transformers

Created 1 year ago

2,667 stars

Top 17.4% on SourcePulse

Project Summary

InfiniteYou (InfU) is a framework for high-fidelity, identity-preserving image generation built on Diffusion Transformers (DiTs). It targets users who need to recraft photos while keeping the subject's identity intact. It addresses limitations of existing methods, such as poor text-image alignment and low generation quality, by introducing InfuseNet and a multi-stage training strategy.

How It Works

InfU uses InfuseNet to inject identity features into a DiT base model (FLUX.1-dev) via residual connections. This approach enhances identity similarity without compromising the base model's generation capabilities. A multi-stage training process, consisting of pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment and image quality and mitigates face copy-pasting.
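
The residual injection described above can be illustrated with a toy sketch. This is not InfuseNet's actual architecture: the names `add_scaled`, `run_blocks`, and `scale` are hypothetical, the "blocks" are plain functions on lists of floats, and the residuals are hard-coded stand-ins for what an identity branch would produce. It only shows the general pattern of adding a side-branch output to each block's hidden state while leaving the base blocks untouched.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def add_scaled(hidden: Vector, residual: Vector, scale: float) -> Vector:
    """Return hidden + scale * residual, element by element."""
    if len(hidden) != len(residual):
        raise ValueError("hidden and residual must have the same length")
    return [h + scale * r for h, r in zip(hidden, residual)]


def run_blocks(
    hidden: Vector,
    blocks: Sequence[Block],
    residuals: Sequence[Vector],
    scale: float = 1.0,
) -> Vector:
    """Apply each base block, then add that block's identity residual."""
    if len(blocks) != len(residuals):
        raise ValueError("need exactly one residual per block")
    for block, residual in zip(blocks, residuals):
        hidden = block(hidden)                         # base model, unchanged
        hidden = add_scaled(hidden, residual, scale)   # injected identity signal
    return hidden


def double(v: Vector) -> Vector:
    """Toy stand-in for a transformer block."""
    return [2.0 * x for x in v]


blocks = [double, double]
residuals = [[0.5, 0.5], [1.0, -1.0]]  # hypothetical identity-branch outputs

base_only = run_blocks([1.0, 2.0], blocks, residuals, scale=0.0)
with_identity = run_blocks([1.0, 2.0], blocks, residuals, scale=1.0)
```

With `scale=0.0` the residuals contribute nothing and the result is exactly what the base blocks alone would produce, which is the sense in which a residual side branch leaves the base model's behavior recoverable.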

Quick Start & Requirements

Install via pip install -r requirements.txt.
Requires Python 3.x.
VRAM requirements: ~43GB (bf16), ~30GB (CPU offload), ~24GB (8-bit quantization), ~16GB (8-bit + CPU offload).
Local inference script: python test.py --id_image <path> --prompt <text>.
Online Hugging Face demo available.
Official ComfyUI node: bytedance/ComfyUI_InfiniteYou.
Official project page and paper available.
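
As a rough aid for the VRAM figures and the inference command listed above, the sketch below picks a memory mode for a given GPU and assembles the command line. The helper names (`pick_memory_mode`, `build_command`) and the mode labels are hypothetical, the thresholds are the approximate numbers from the list, and preferring the least aggressive mode that fits is an assumption. Only the `--id_image` and `--prompt` flags from the list are used; the flags that enable quantization or offloading are not stated here, so they are left out.

```python
import shlex
from typing import List, Optional, Tuple

# (hypothetical mode label, approximate peak VRAM in GB), from the list above.
# Ordered from least to most aggressive memory reduction.
MEMORY_MODES: List[Tuple[str, int]] = [
    ("bf16", 43),
    ("cpu_offload", 30),
    ("8bit", 24),
    ("8bit+cpu_offload", 16),
]


def pick_memory_mode(vram_gb: float) -> Optional[str]:
    """Return the first listed mode whose approximate need fits, else None."""
    for name, needed_gb in MEMORY_MODES:
        if vram_gb >= needed_gb:
            return name
    return None


def build_command(id_image: str, prompt: str) -> List[str]:
    """Build the argv list for the local inference script."""
    return ["python", "test.py", "--id_image", id_image, "--prompt", prompt]


if __name__ == "__main__":
    cmd = build_command("face.jpg", "a portrait, studio lighting")
    print("suggested mode for 24 GB:", pick_memory_mode(24))
    print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal
```

Passing the argv list to `subprocess.run(cmd, check=True)` avoids shell-quoting problems with prompts that contain spaces or quotes.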

Highlighted Details

Reports state-of-the-art results, surpassing baselines such as FLUX.1-dev IP-Adapter and PuLID-FLUX in identity similarity, text-image alignment, and image quality.
Features a plug-and-play design, compatible with various FLUX.1-dev variants, ControlNets, LoRAs, and IP-Adapter for extended flexibility.
Offers two model variants: aes_stage2 (better aesthetics/alignment) and sim_stage1 (higher ID similarity).
Includes optional LoRAs for realism and anti-blur, and memory reduction options (8-bit quantization, CPU offloading).
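
The trade-off between the two variants and the optional LoRAs can be captured in a small options object. This is a sketch only: `RunOptions`, `choose_variant`, and the field names are hypothetical and do not correspond to the project's actual configuration. The variant names and their descriptions are taken from the list above.

```python
from dataclasses import dataclass

# Variant names and trade-offs as described in the list above.
VARIANTS = {
    "aes_stage2": "better aesthetics and text-image alignment",
    "sim_stage1": "higher identity similarity",
}


@dataclass(frozen=True)
class RunOptions:
    """Hypothetical bundle of choices for one generation run."""
    variant: str = "aes_stage2"
    realism_lora: bool = False    # optional realism LoRA
    anti_blur_lora: bool = False  # optional anti-blur LoRA

    def __post_init__(self) -> None:
        if self.variant not in VARIANTS:
            known = ", ".join(sorted(VARIANTS))
            raise ValueError(f"unknown variant {self.variant!r}; expected one of: {known}")


def choose_variant(prioritize_identity: bool) -> str:
    """Pick sim_stage1 when identity similarity matters most, else aes_stage2."""
    return "sim_stage1" if prioritize_identity else "aes_stage2"


options = RunOptions(variant=choose_variant(prioritize_identity=True), realism_lora=True)
```

Validating the variant name at construction time surfaces a typo immediately rather than after the model weights have started loading.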

Maintenance & Community

Project initiated by ByteDance.
Official ComfyUI node available, with several unofficial contributions.
GGUF version and custom LoRAs are available from the community.

Licensing & Compatibility

Code licensed under Apache License 2.0.
Model licensed under Creative Commons Attribution-NonCommercial 4.0 International Public License (academic research purposes only).
Dependencies like InsightFace and FLUX.1-dev must follow their original licenses. Commercial use is restricted by the model license.

Limitations & Caveats

The model is licensed for non-commercial, academic research purposes only. Users must ensure compliance with local laws and the licenses of all dependencies.
