XVerse by bytedance

Consistent multi-subject image synthesis with fine-grained control

Created 2 months ago
593 stars

Top 54.9% on SourcePulse

Project Summary

XVerse provides a novel method for multi-subject image synthesis, enabling independent control over individual subjects' identities and semantic attributes without affecting global image features. It targets researchers and users needing precise control in personalized image generation, offering high-fidelity and editable outputs.

How It Works

XVerse converts each reference image into offsets that are applied to the text-stream modulation of specific tokens. Because the offsets act per token, each subject's characteristics and semantics can be controlled separately, which keeps generated images consistent and editable. The method builds on a Diffusion Transformer (DiT) architecture.
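The idea of token-specific modulation offsets can be illustrated with a toy sketch. It assumes modulation takes the scale-and-shift form common in DiT-style models, and it uses one scalar scale and shift per token for brevity. The function name `modulate`, the offset values, and the token positions are all hypothetical; this is not XVerse's implementation.

```python
def modulate(tokens, scale, shift, offsets=None):
    """Apply scale-and-shift modulation x * (1 + scale) + shift to each token.

    `offsets` maps a token index to a (d_scale, d_shift) pair that is added
    to the shared scale and shift for that token only (illustrative).
    """
    offsets = offsets or {}
    out = []
    for i, tok in enumerate(tokens):
        d_scale, d_shift = offsets.get(i, (0.0, 0.0))
        out.append([x * (1.0 + scale + d_scale) + shift + d_shift for x in tok])
    return out


# Toy prompt with 5 text tokens, each a 2-dimensional feature vector.
tokens = [[1.0, 2.0] for _ in range(5)]

# Shared modulation: every token is treated the same way.
base = modulate(tokens, scale=0.5, shift=0.0)

# Hypothetical offsets for two subject tokens (positions 1 and 4), standing in
# for values that would be derived from each subject's reference image.
subject_offsets = {1: (0.5, 1.0), 4: (-0.5, 0.0)}
steered = modulate(tokens, scale=0.5, shift=0.0, offsets=subject_offsets)
```

Only the tokens listed in `subject_offsets` change; all other tokens receive exactly the shared modulation, which is the property that lets one subject be adjusted without disturbing the rest of the prompt.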

Quick Start & Requirements

  • Installation: Requires Python 3.10.16, PyTorch 2.6.0 (with CUDA 12.4), flash-attn 2.7.4.post1, and httpx 0.23.3. Dependencies are installed via pip install -r requirements.txt.
  • Checkpoints: Download required checkpoints, including a face recognition model (model_ir_se50.pth) from InsightFace_Pytorch, and set environment variables for model paths.
  • Demo: Run the local Gradio demo with python run_gradio.py.
  • Inference: Use python inference_single_sample.py with specified parameters for single or multiple subjects.
  • Low-VRAM: Supports inference on 16GB VRAM (--use_lower_vram True) or 24GB VRAM (--use_low_vram True). Quantized models (bnb-nf4, GGUF) further reduce VRAM requirements.
  • Links: Hugging Face Space demo available.
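Since the installation step pins exact versions, a quick check before running the demo can save time. The following is a minimal sketch using only the standard library. The distribution names `torch` and `flash-attn` are assumed to match what pip installs, and the helper `check_versions` is hypothetical rather than part of the repository.

```python
import sys
from importlib import metadata

# Versions as listed in the installation requirements above.
PINNED = {"torch": "2.6.0", "flash-attn": "2.7.4.post1", "httpx": "0.23.3"}
PYTHON = (3, 10, 16)


def check_versions(pinned, installed_version=metadata.version):
    """Return {package: (expected, found)} for each mismatch or missing package."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore a local build tag such as "+cu124" when comparing versions.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    if sys.version_info[:3] != PYTHON:
        wanted = ".".join(map(str, PYTHON))
        print(f"Python {sys.version.split()[0]} found, expected {wanted}")
    for name, (expected, found) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

The script prints nothing when the environment matches the pinned versions. It does not verify the CUDA build or the downloaded checkpoints.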

Highlighted Details

  • Supports quantized diffusion models (bnb-nf4, GGUF) for reduced VRAM usage.
  • Offers low-VRAM inference modes for consumer-grade GPUs (16GB/24GB VRAM).
  • Enables precise, independent control over multiple subjects within a single image.
  • Includes a benchmark dataset (XVerseBench) for evaluation.
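The two low-VRAM tiers map onto the flags given in Quick Start, which a small helper can select automatically. This is a sketch under stated assumptions: the function `low_vram_args` is hypothetical, the cut-offs are read directly from the 16GB and 24GB figures, and the source does not say at what memory size neither flag is needed, so cards above 24GB are treated like the 24GB tier here. Other inference parameters are left out because the source does not list them.

```python
def low_vram_args(vram_gb):
    """Pick the low-VRAM flag documented for a GPU with `vram_gb` of memory."""
    if vram_gb < 16:
        # Below the smallest documented tier; quantized models may still fit.
        raise ValueError("less than 16GB VRAM: consider the quantized models")
    if vram_gb < 24:
        return ["--use_lower_vram", "True"]
    # Assumption: 24GB and above use the 24GB mode.
    return ["--use_low_vram", "True"]


# Build the argument list for a 16GB card; pass it to subprocess.run to launch.
command = ["python", "inference_single_sample.py", *low_vram_args(16)]
```

Raising an error below 16GB, rather than guessing a flag, keeps the helper within what the documentation actually states.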

Maintenance & Community

The project is actively developed with recent updates in July 2025. A Hugging Face Space demo is available.

Licensing & Compatibility

The code is licensed under Apache 2.0. The dataset is released under CC0, but users must also comply with the license of DreamBench++, from which it is adapted.

Limitations & Caveats

Quantized models may lead to performance degradation, requiring parameter re-adjustment. CPU offloading significantly reduces inference speed. The project is under active development, with features like a benchmark leaderboard and ComfyUI implementation still pending.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 3
  • Star history: 15 stars in the last 30 days
