Consistent multi-subject image synthesis with fine-grained control
XVerse provides a novel method for multi-subject image synthesis, enabling independent control over individual subjects' identities and semantic attributes without affecting global image features. It targets researchers and users needing precise control in personalized image generation, offering high-fidelity and editable outputs.
How It Works
XVerse transforms reference images into offsets for token-specific text-stream modulation. Because the offsets act only on the text tokens tied to each subject, the method offers granular control over subject characteristics and semantics while keeping generated images consistent and editable. It builds on the DiT (Diffusion Transformer) architecture, which has proven effective for image generation tasks.
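As a rough, hedged illustration of the idea (not the project's actual code), the sketch below shows how per-subject offsets derived from a reference image could modulate only the text tokens bound to that subject, leaving the rest of the prompt untouched. All names here (TokenModulationOffset, subject_mask, etc.) are hypothetical.

```python
import torch
import torch.nn as nn

class TokenModulationOffset(nn.Module):
    """Hypothetical sketch: map pooled reference-image features to shift/scale
    offsets applied to the text-stream hidden states of a DiT block."""

    def __init__(self, ref_dim: int, text_dim: int):
        super().__init__()
        # Project reference features into one shift and one scale offset.
        self.to_offsets = nn.Linear(ref_dim, 2 * text_dim)

    def forward(self, text_tokens, ref_features, subject_mask):
        # text_tokens:  (B, T, D) text-stream hidden states
        # ref_features: (B, R)    pooled features of one reference image
        # subject_mask: (B, T)    1.0 where a token describes this subject
        shift, scale = self.to_offsets(ref_features).chunk(2, dim=-1)  # (B, D) each
        mask = subject_mask.unsqueeze(-1)                              # (B, T, 1)
        # Apply the offsets only to this subject's tokens, so other subjects
        # and global image features are unaffected.
        return text_tokens + mask * (scale.unsqueeze(1) * text_tokens + shift.unsqueeze(1))

# Toy usage: one subject occupies the first two prompt tokens.
tokens = torch.randn(1, 8, 64)
mod = TokenModulationOffset(ref_dim=32, text_dim=64)
mask = torch.tensor([[1., 1., 0., 0., 0., 0., 0., 0.]])
out = mod(tokens, torch.randn(1, 32), mask)
print(out.shape)  # torch.Size([1, 8, 64])
```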
Quick Start & Requirements
- Install dependencies: pip install -r requirements.txt.
- Download the face recognition model (model_ir_se50.pth) from InsightFace_Pytorch, and set environment variables for model paths.
- Launch the Gradio demo: python run_gradio.py.
- Or run python inference_single_sample.py with specified parameters for single or multiple subjects.
- VRAM needs can be reduced (--use_lower_vram True) or kept to 24GB (--use_low_vram True). Quantized models (bnb-nf4, GGUF) further reduce VRAM requirements.
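As a minimal sketch of wiring these steps together from Python: the environment-variable name and model path below are placeholders (assumptions, not the project's documented interface), while the --use_low_vram flag comes from the notes above; check the repository README for the exact variables and arguments.

```python
import os
import subprocess

# Placeholder name: the actual environment variables for model paths
# are documented in the XVerse repository.
os.environ["FACE_ID_MODEL_PATH"] = "./checkpoints/model_ir_se50.pth"

# Single-sample inference in 24GB-VRAM mode; subject/prompt arguments are
# omitted here and must follow the repository's examples.
subprocess.run(
    ["python", "inference_single_sample.py", "--use_low_vram", "True"],
    check=True,
)
```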
Highlighted Details
Maintenance & Community
The project is actively developed with recent updates in July 2025. A Hugging Face Space demo is available.
Licensing & Compatibility
The code is licensed under Apache 2.0. The dataset is CC0, but users must also comply with the license of dreambench++, from which it is adapted.
Limitations & Caveats
Quantized models may lead to performance degradation, requiring parameter re-adjustment. CPU offloading significantly reduces inference speed. The project is under active development, with features like a benchmark leaderboard and ComfyUI implementation still pending.