UNO by bytedance

Subject-to-image model for single/multi-subject customization

Created 9 months ago
1,343 stars

Top 29.7% on SourcePulse

Project Summary

UNO is a diffusion model for subject-driven image generation, enabling high-consistency results for both single and multi-subject conditioning. It targets researchers and developers in generative AI seeking advanced controllability in image synthesis.

How It Works

UNO is an iteratively trained, multi-image conditioned subject-to-image model. It leverages diffusion transformers for in-context generation and incorporates progressive cross-modal alignment and universal rotary position embeddings. This approach allows for high-consistency data synthesis and enhanced controllability, outperforming traditional text-to-image models in subject-specific generation.
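For readers unfamiliar with rotary position embeddings, the sketch below shows the standard one-dimensional form only. It is not UNO's universal variant, whose details are not given in this summary. The function names `rope` and `dot` and the `base` value of 10000 are illustrative assumptions. The point it demonstrates is that after rotation, the dot product between two vectors depends only on the difference between their positions.

```python
import math


def rope(vec, pos, base=10000.0):
    """Apply standard 1D rotary position embedding to an even-length vector.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by an angle
    pos * base ** (-i / d), so lower dimensions rotate faster.
    NOTE: generic RoPE for illustration, not UNO's universal variant.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.0]
    # Same relative offset (2) at different absolute positions.
    print(dot(rope(q, 3), rope(k, 1)))
    print(dot(rope(q, 7), rope(k, 5)))
```

The two printed values should agree up to floating-point error, which is the relative-position property that makes rotary embeddings attractive when tokens from several images share one sequence.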

Quick Start & Requirements

  • Installation: Run pip install -r requirements.txt, then pip install -e . for inference or pip install -e .[train] for training.
  • Prerequisites: Python 3.10 through 3.12 (>= 3.10, <= 3.12). AMD GPUs, NVIDIA RTX 50 series GPUs, and macOS MPS each require a specific PyTorch version.
  • Checkpoints: Approximately 37 GB of disk space required. Checkpoints can be downloaded automatically during inference or manually via huggingface-cli download.
  • Demo: Run python app.py. For low VRAM usage (≈16GB), use python app.py --offload --name flux-dev-fp8.
  • Inference: python inference.py --prompt "..." --image_paths "..."
  • Documentation: Project Page
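The requirements above can be checked before installing. The following is a minimal preflight sketch, assuming only the figures stated in this summary (Python 3.10 to 3.12, roughly 37 GB for checkpoints, and the app.py flags shown in the Demo item). The helper names `python_supported`, `enough_disk`, and `demo_command` are hypothetical and are not part of the UNO repository.

```python
import shutil
import sys

# Figures taken from the summary above; adjust if the project changes them.
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 12)
CHECKPOINT_GB = 37  # approximate disk space needed for checkpoints


def python_supported(version=None):
    """Return True if the (major, minor) version lies within 3.10..3.12."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def enough_disk(path=".", needed_gb=CHECKPOINT_GB):
    """Return True if `path` has at least `needed_gb` gigabytes free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb


def demo_command(low_vram=False):
    """Build the demo command line; low_vram adds the FP8 + offload flags."""
    cmd = ["python", "app.py"]
    if low_vram:
        cmd += ["--offload", "--name", "flux-dev-fp8"]
    return cmd


if __name__ == "__main__":
    print("Python version OK:", python_supported())
    print("Disk space OK:", enough_disk())
    print("Demo command:", " ".join(demo_command(low_vram=True)))
```

The command is returned as a list so it can be passed directly to `subprocess.run` from the repository root once the checks pass.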

Highlighted Details

  • Supports low VRAM usage (≈16GB) with FP8 mode and offloading.
  • Handles aspect ratios and resolutions beyond the 512-resolution buckets it was trained on.
  • Offers both single-subject and multi-subject generation within a unified model.
  • Includes example inference scripts and evaluation on the Dreambooth benchmark.

Maintenance & Community

The project is actively developed by ByteDance's Intelligent Creation Team. Updates include FP8 mode support, a Gradio demo, and the release of training code, inference code, and model checkpoints. Community contributions include several ComfyUI node implementations.

Licensing & Compatibility

  • Code License: Apache 2.0
  • Model License: CC BY-NC 4.0 (Non-commercial use)
  • Base Model License: Must adhere to original FLUX.1-dev licensing terms.
  • Compatibility: Suitable for academic research. Commercial use is restricted by the CC BY-NC 4.0 license for the models.

Limitations & Caveats

UNO's generalization is limited by constraints in its training dataset and has room to improve. The CC BY-NC 4.0 license restricts commercial use of the models.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days