UNO by bytedance

Subject-to-image model for single/multi-subject customization

Created 9 months ago

1,343 stars

Top 29.7% on SourcePulse

Project Summary

UNO is a diffusion model for subject-driven image generation, enabling high-consistency results for both single and multi-subject conditioning. It targets researchers and developers in generative AI seeking advanced controllability in image synthesis.

How It Works

UNO is an iteratively trained, multi-image conditioned subject-to-image model. It leverages diffusion transformers for in-context generation and incorporates progressive cross-modal alignment and universal rotary position embeddings. This approach allows for high-consistency data synthesis and enhanced controllability, outperforming traditional text-to-image models in subject-specific generation.
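For readers unfamiliar with rotary position embeddings, the following is a minimal sketch of the standard one-dimensional form only. It is not UNO's universal variant, which this summary does not describe in detail, and the names rope and dot are hypothetical helpers chosen for illustration. It shows the property that makes rotary embeddings useful in attention: the dot product of two rotated vectors depends only on the difference between their positions.

```python
import math


def rope(vec, pos, base=10000.0):
    """Apply standard 1D rotary position embedding to an even-length vector.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by an angle
    pos * base ** (-i / d), where d is the vector length.
    Generic sketch only; not UNO's own position-embedding scheme.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # 2D rotation of the pair by theta
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same relative offset (2) at different absolute positions.
score_near = dot(rope(q, 5), rope(k, 3))
score_far = dot(rope(q, 12), rope(k, 10))
```

Here score_near and score_far agree up to floating-point error, because each rotation only shifts the relative angle between the paired components.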

Quick Start & Requirements

Installation: pip install -r requirements.txt followed by pip install -e . (for inference) or pip install -e .[train] (for training).
Prerequisites: Python 3.10 to 3.12 (>= 3.10, <= 3.12). AMD GPUs, NVIDIA RTX 50 series GPUs, and macOS MPS each require a specific PyTorch version.
Checkpoints: Approximately 37 GB of disk space required. Checkpoints can be downloaded automatically during inference or manually via huggingface-cli download.
Demo: Run python app.py. For low VRAM usage (≈16GB), use python app.py --offload --name flux-dev-fp8.
Inference: python inference.py --prompt "..." --image_paths "..."
Documentation: Project Page
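The prerequisites above can be checked before downloading anything. The following is a minimal sketch, not part of the UNO repository: the helper names python_supported, enough_disk, and inference_command are hypothetical, and it assumes that --image_paths accepts several space-separated paths. The version range and the 37 GB figure come from the list above.

```python
import shutil
import sys

# Figures taken from the Quick Start list above.
MIN_PY, MAX_PY = (3, 10), (3, 12)
CHECKPOINT_GB = 37  # approximate checkpoint size


def python_supported(version=None):
    """Return True if (major, minor) lies in the 3.10 to 3.12 range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_PY <= (major, minor) <= MAX_PY


def enough_disk(path=".", needed_gb=CHECKPOINT_GB):
    """Return True if the filesystem holding `path` has `needed_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb


def inference_command(prompt, image_paths):
    """Build the argument list for inference.py (hypothetical helper).

    Assumption: --image_paths takes one or more space-separated paths.
    """
    return [sys.executable, "inference.py",
            "--prompt", prompt,
            "--image_paths", *image_paths]
```

The returned list can be passed to subprocess.run(..., check=True) from the repository root once both checks pass.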

Highlighted Details

Supports low VRAM usage (≈16GB) with FP8 mode and offloading.
Handles aspect ratios and resolutions beyond the 512-pixel buckets it was trained on.
Offers both single-subject and multi-subject generation within a unified model.
Includes example inference scripts and evaluation on the Dreambooth benchmark.

Maintenance & Community

The project is actively developed by ByteDance's Intelligent Creation Team. Updates include FP8 mode support, a Gradio demo, and the release of training code, inference code, and model checkpoints. Community contributions include several ComfyUI node implementations.

Licensing & Compatibility

Code License: Apache 2.0
Model License: CC BY-NC 4.0 (Non-commercial use)
Base Model License: Must adhere to original FLUX.1-dev licensing terms.
Compatibility: Suitable for academic research. Commercial use is restricted by the CC BY-NC 4.0 license for the models.

Limitations & Caveats

UNO's generalization is limited by constraints in its training dataset. The CC BY-NC 4.0 license restricts commercial use of the models.

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Star History

9 stars in the last 30 days