XVerse by bytedance

Consistent multi-subject image synthesis with fine-grained control

Created 8 months ago

620 stars

Top 53.2% on SourcePulse

Project Summary

XVerse provides a novel method for multi-subject image synthesis, enabling independent control over individual subjects' identities and semantic attributes without affecting global image features. It targets researchers and users needing precise control in personalized image generation, offering high-fidelity and editable outputs.

How It Works

XVerse converts each reference image into offsets that modulate the text stream only at the token referring to that subject. Because the modulation is token-specific, one subject's identity and attributes can be adjusted without disturbing the other subjects or the rest of the image, which keeps outputs consistent and editable. The method is built on a DiT (Diffusion Transformer) architecture.
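The idea of a token-specific offset can be illustrated with a toy example. This is only a sketch, not XVerse's implementation: it assumes an adaLN-style scale-and-shift modulation (common in DiT models but not confirmed by this summary), and the function name, parameters, and numbers are all hypothetical.

```python
from typing import List, Sequence, Set


def modulate_tokens(
    tokens: Sequence[Sequence[float]],
    base_scale: float,
    base_shift: float,
    subject_positions: Set[int],
    scale_offset: float,
    shift_offset: float,
) -> List[List[float]]:
    """Scale-and-shift every token; add the offsets only at subject tokens.

    Assumption: modulation has the form (1 + scale) * x + shift. In a real
    model the offsets would be derived from a reference image; here they
    are plain numbers chosen for illustration.
    """
    out: List[List[float]] = []
    for i, tok in enumerate(tokens):
        scale, shift = base_scale, base_shift
        if i in subject_positions:
            # Only the token that names the subject receives the offset.
            scale += scale_offset
            shift += shift_offset
        out.append([(1.0 + scale) * x + shift for x in tok])
    return out


# Three text tokens with two features each; token 1 names the subject.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = modulate_tokens(tokens, 0.0, 0.0, {1}, 0.5, 1.0)
print(result)  # tokens 0 and 2 are unchanged; only token 1 is modulated
```

With zero base modulation, tokens 0 and 2 pass through untouched, which mirrors the claim that a subject can be controlled without affecting the rest of the prompt.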

Quick Start & Requirements

Installation: Requires Python 3.10.16, PyTorch 2.6.0 (with CUDA 12.4), flash-attn 2.7.4.post1, and httpx 0.23.3. Dependencies are installed via pip install -r requirements.txt.
Checkpoints: Download required checkpoints, including a face recognition model (model_ir_se50.pth) from InsightFace_Pytorch, and set environment variables for model paths.
Demo: Run the local Gradio demo with python run_gradio.py.
Inference: Use python inference_single_sample.py with specified parameters for single or multiple subjects.
Low-VRAM: Supports inference on 16GB VRAM (--use_lower_vram True) or 24GB VRAM (--use_low_vram True). Quantized models (bnb-nf4, GGUF) further reduce VRAM requirements.
Links: Hugging Face Space demo available.
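As a rough sketch of how the low-VRAM options above might be chosen, the helper below maps available GPU memory to the two flags named in this section. The script name and flags come from the summary; the helper names, the exact cut-offs between tiers, and the optional `full_vram_gb` threshold are assumptions for illustration. The other inference parameters are not listed here and are left out.

```python
from typing import List, Optional


def low_vram_flags(vram_gb: float, full_vram_gb: Optional[float] = None) -> List[str]:
    """Pick a low-VRAM flag for inference_single_sample.py.

    Assumptions: 24 GB or more uses --use_low_vram, 16 GB up to 24 GB uses
    --use_lower_vram. full_vram_gb is a hypothetical threshold above which
    no flag is passed; the summary does not state such a value.
    """
    if full_vram_gb is not None and vram_gb >= full_vram_gb:
        return []
    if vram_gb >= 24:
        return ["--use_low_vram", "True"]
    if vram_gb >= 16:
        return ["--use_lower_vram", "True"]
    raise ValueError(
        "Under 16 GB of VRAM: consider the quantized models (bnb-nf4, GGUF)."
    )


def build_command(vram_gb: float, full_vram_gb: Optional[float] = None) -> List[str]:
    """Assemble the command line; subject and prompt parameters are omitted."""
    return ["python", "inference_single_sample.py", *low_vram_flags(vram_gb, full_vram_gb)]


print(" ".join(build_command(16)))
print(" ".join(build_command(24)))
```

The resulting list can be passed to `subprocess.run` once the remaining inference parameters have been appended.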

Highlighted Details

Supports quantized diffusion models (bnb-nf4, GGUF) for reduced VRAM usage.
Offers low-VRAM inference modes for consumer-grade GPUs (16GB/24GB VRAM).
Enables precise, independent control over multiple subjects within a single image.
Includes a benchmark dataset (XVerseBench) for evaluation.

Maintenance & Community

The project is actively developed with recent updates in July 2025. A Hugging Face Space demo is available.

Licensing & Compatibility

The code is licensed under Apache 2.0. The dataset is released under CC0, but because it is adapted from DreamBench++, users must also comply with the DreamBench++ license.

Limitations & Caveats

Quantized models may degrade results, and inference parameters may need to be re-tuned to compensate. CPU offloading significantly reduces inference speed. The project is still under development: a benchmark leaderboard and a ComfyUI implementation are planned but not yet available.

Health Check

Last Commit

4 months ago

Responsiveness

1 day

Star History

6 stars in the last 30 days