Subject-to-image model for single/multi-subject customization
UNO is a diffusion model for subject-driven image generation that delivers high subject consistency under both single- and multi-subject conditioning. It targets researchers and developers in generative AI who need finer controllability in image synthesis.
How It Works
UNO is a multi-image conditioned subject-to-image model trained iteratively: the model synthesizes high-consistency subject-paired data that is fed back into training. Architecturally, it builds on diffusion transformers for in-context generation and introduces progressive cross-modal alignment and a universal rotary position embedding (UnoPE), yielding stronger subject consistency and controllability than text-only conditioning.
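As a rough intuition for the universal rotary position embedding, the sketch below gives each reference image a diagonally offset 2-D position grid so its rotary phases never collide with the target canvas. The names, grid sizes, and offset rule here are illustrative assumptions, not UNO's actual implementation.

```python
# Hypothetical sketch of the idea behind a universal rotary position embedding:
# each extra reference image gets a shifted 2-D position grid so its tokens
# occupy coordinates distinct from the target image's grid.
import itertools

def position_ids(height: int, width: int, row0: int = 0, col0: int = 0):
    """2-D (row, col) ids for a height x width token grid, shifted by (row0, col0)."""
    return [(row0 + r, col0 + c) for r, c in itertools.product(range(height), range(width))]

# The target latent grid occupies rows/cols [0, 32); each reference image is
# placed past the bottom-right corner so rotary phases stay distinct per image.
target = position_ids(32, 32)
ref1 = position_ids(32, 32, row0=32, col0=32)   # first subject image
ref2 = position_ids(32, 32, row0=64, col0=64)   # second subject image
print(target[0], ref1[0], ref2[0])  # (0, 0) (32, 32) (64, 64)
```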
Quick Start & Requirements
- Install dependencies with `pip install -r requirements.txt`, then `pip install -e .` for inference or `pip install -e .[train]` for training.
- Download the model checkpoints with `huggingface-cli download` (a programmatic equivalent is sketched below).
- Launch the Gradio demo with `python app.py`. For low VRAM usage (≈16GB), use `python app.py --offload --name flux-dev-fp8`.
- Run command-line inference with `python inference.py --prompt "..." --image_paths "..."`.
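For scripted setups, the checkpoint download step can also be done from Python via `huggingface_hub`. The repo id below is an assumption; substitute the official one from the project page.

```python
# Programmatic checkpoint download, equivalent to the huggingface-cli step above.
# The repo id "bytedance-research/UNO" is an assumption, not confirmed by this page.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(repo_id="bytedance-research/UNO")
print(f"Checkpoints downloaded to {ckpt_dir}")
```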
Highlighted Details
Maintenance & Community
The project is actively developed by ByteDance's Intelligent Creation Team. Updates include FP8 mode support, a Gradio demo, and the release of training code, inference code, and model checkpoints. Community contributions include several ComfyUI node implementations.
Licensing & Compatibility
Limitations & Caveats
UNO exhibits room for improvement in generalization due to dataset constraints. The CC BY-NC 4.0 license restricts commercial use of the models.