InfiniteYou by bytedance

Research project for identity-preserving photo recrafting with Diffusion Transformers

created 5 months ago
2,523 stars

Top 19.0% on sourcepulse

Project Summary

InfiniteYou (InfU) is a framework for high-fidelity, identity-preserving image generation built on Diffusion Transformers (DiTs). It targets users who need to recraft photos while keeping the subject's identity intact. It addresses limitations of existing methods, such as poor text-image alignment and low generation quality, by introducing InfuseNet and a multi-stage training strategy.

How It Works

InfU uses InfuseNet to inject identity features into a DiT base model (FLUX.1-dev) via residual connections. This approach improves identity similarity without compromising the base model's generation capabilities. A multi-stage training process, consisting of pretraining followed by supervised fine-tuning (SFT) on synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment and image quality and mitigates face copy-pasting.
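The idea of injecting features through residual connections can be illustrated with a toy sketch. This is not the InfU implementation: in the real system InfuseNet is a learned network that predicts the residuals from the identity image, and this summary does not say where in FLUX.1-dev they are added. The names add_residual and run_with_injection, the scale parameter, and the one-residual-per-block layout are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def add_residual(hidden: Vector, residual: Vector, scale: float = 1.0) -> Vector:
    """Return hidden + scale * residual, element-wise."""
    if len(hidden) != len(residual):
        raise ValueError("hidden and residual must have the same length")
    return [h + scale * r for h, r in zip(hidden, residual)]


def run_with_injection(
    x: Vector,
    base_blocks: Sequence[Block],
    residuals: Sequence[Vector],
    scale: float = 1.0,
) -> Vector:
    """Run the base blocks in order, adding one identity residual after each.

    Assumption: one residual per block. The base blocks themselves are left
    unchanged, which is the point of a residual-style injection.
    """
    if len(base_blocks) != len(residuals):
        raise ValueError("need exactly one residual per base block")
    hidden = list(x)
    for block, residual in zip(base_blocks, residuals):
        hidden = block(hidden)
        hidden = add_residual(hidden, residual, scale)
    return hidden


def double(v: Vector) -> Vector:
    """Stand-in for a base model block (hypothetical, for the toy run only)."""
    return [2.0 * a for a in v]


# Toy run: two stand-in blocks and two hand-written "identity" residuals.
injected = run_with_injection([1.0, 2.0], [double, double], [[0.5, 0.5], [1.0, -1.0]])
# With scale=0.0 the residuals vanish and the base model output is recovered.
baseline = run_with_injection([1.0, 2.0], [double, double], [[0.5, 0.5], [1.0, -1.0]], scale=0.0)
```

Because the residuals are only added to the block outputs, setting the scale to zero reproduces the base model exactly, which is one way to read the claim that identity injection need not compromise generation capabilities.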

Quick Start & Requirements

  • Install via pip install -r requirements.txt.
  • Requires Python 3.x.
  • VRAM requirements: ~43GB (bf16), ~30GB (CPU offload), ~24GB (8-bit quantization), ~16GB (8-bit + CPU offload).
  • Local inference script: python test.py --id_image <path> --prompt <text>.
  • Online Hugging Face demo available.
  • Official ComfyUI node: bytedance/ComfyUI_InfiniteYou.
  • Official project page and paper available.
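
The VRAM figures above can be turned into a small lookup for choosing a memory configuration. The sketch below is a hypothetical helper, not part of the project: pick_memory_mode and the mode labels are illustrative names rather than real command-line flags, the thresholds are the approximate values listed above, and preferring the least aggressive memory-saving option that fits is an assumption.

```python
from typing import Optional

# Approximate VRAM requirements (GB) from the list above, largest first.
# The labels are descriptive only and do not correspond to actual CLI flags.
MEMORY_MODES = [
    (43.0, "bf16"),
    (30.0, "bf16 + CPU offload"),
    (24.0, "8-bit quantization"),
    (16.0, "8-bit quantization + CPU offload"),
]


def pick_memory_mode(vram_gb: float) -> Optional[str]:
    """Return the first (least memory-constrained) mode that fits, else None."""
    for required_gb, label in MEMORY_MODES:
        if vram_gb >= required_gb:
            return label
    return None


# Example: a 24 GB card falls into the 8-bit quantization tier.
mode_for_24gb = pick_memory_mode(24)
```

Since the published numbers are approximate, a GPU sitting exactly at a threshold may still run out of memory in practice; treating the result as a starting point is safer than treating it as a guarantee.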

Highlighted Details

  • Reported to achieve state-of-the-art performance, surpassing baselines such as FLUX.1-dev IP-Adapter and PuLID-FLUX in identity similarity, text-image alignment, and image quality.
  • Features a plug-and-play design, compatible with various FLUX.1-dev variants, ControlNets, LoRAs, and IP-Adapter for extended flexibility.
  • Offers two model variants: aes_stage2 (better aesthetics/alignment) and sim_stage1 (higher ID similarity).
  • Includes optional LoRAs for realism and anti-blur, and memory reduction options (8-bit quantization, CPU offloading).

Maintenance & Community

  • Project initiated by ByteDance.
  • Official ComfyUI node available, with several unofficial contributions.
  • GGUF version and custom LoRAs are available from the community.

Licensing & Compatibility

  • Code licensed under Apache License 2.0.
  • Model licensed under Creative Commons Attribution-NonCommercial 4.0 International Public License (academic research purposes only).
  • Dependencies like InsightFace and FLUX.1-dev must follow their original licenses. Commercial use is restricted by the model license.

Limitations & Caveats

The model is licensed for non-commercial, academic research purposes only. Users must ensure compliance with local laws and the licenses of all dependencies.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 396 stars in the last 90 days
