UNO by bytedance

Subject-to-image model for single/multi-subject customization

created 4 months ago
1,178 stars

Top 33.7% on sourcepulse

Project Summary

UNO is a diffusion model for subject-driven image generation, enabling high-consistency results for both single and multi-subject conditioning. It targets researchers and developers in generative AI seeking advanced controllability in image synthesis.

How It Works

UNO is an iteratively trained, multi-image conditioned subject-to-image model. It leverages diffusion transformers for in-context generation and incorporates progressive cross-modal alignment and universal rotary position embeddings. This approach allows for high-consistency data synthesis and enhanced controllability, outperforming traditional text-to-image models in subject-specific generation.
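The summary does not say how the position embeddings treat multiple reference images. As a purely illustrative sketch (not taken from UNO's code), one common way to keep several images distinguishable under rotary position embeddings is to give each image its own non-overlapping block of (row, column) indices, for example by shifting every reference image diagonally past the grids that precede it. The function names and the diagonal-offset scheme below are assumptions made for this example.

```python
def grid_positions(height, width, row_offset=0, col_offset=0):
    """Return (row, col) position indices for a height x width token grid."""
    return [
        (r + row_offset, c + col_offset)
        for r in range(height)
        for c in range(width)
    ]


def offset_reference_positions(target_hw, reference_hws):
    """Assign position indices to a target grid and several reference grids.

    Hypothetical scheme: the target keeps indices starting at (0, 0); each
    reference grid is shifted diagonally past everything placed before it,
    so no two images share a row index or a column index.
    """
    target_h, target_w = target_hw
    layout = [grid_positions(target_h, target_w)]
    row_off, col_off = target_h, target_w
    for h, w in reference_hws:
        layout.append(grid_positions(h, w, row_off, col_off))
        row_off += h
        col_off += w
    return layout


if __name__ == "__main__":
    # One 2x2 target grid and two small reference grids.
    for grid in offset_reference_positions((2, 2), [(1, 1), (1, 2)]):
        print(grid)
```

The resulting indices would then be fed to whatever rotary embedding the model uses; that step is omitted here because the summary gives no details about it.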

Quick Start & Requirements

  • Installation: pip install -r requirements.txt followed by pip install -e . (for inference) or pip install -e .[train] (for training).
  • Prerequisites: Python 3.10 through 3.12 (>= 3.10, <= 3.12). AMD GPUs, NVIDIA RTX 50 series GPUs, and macOS MPS each require a specific PyTorch version.
  • Checkpoints: The checkpoints need approximately 37 GB of disk space. They can be downloaded automatically during inference or manually via huggingface-cli download.
  • Demo: Run python app.py. For low VRAM usage (≈16GB), use python app.py --offload --name flux-dev-fp8.
  • Inference: python inference.py --prompt "..." --image_paths "..."
  • Documentation: Project Page
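
Before installing, it can help to confirm that the environment meets the requirements listed above. The following is a minimal sketch that checks the Python version range and the free disk space; the helper names are hypothetical, and treating 1 GB as 10^9 bytes is an assumption, since the 37 GB figure is only approximate.

```python
import shutil
import sys

MIN_PY = (3, 10)    # lower bound from the Prerequisites bullet
MAX_PY = (3, 12)    # upper bound from the Prerequisites bullet
CHECKPOINT_GB = 37  # approximate figure from the Checkpoints bullet


def python_supported(version=None):
    """Return True if (major, minor) lies within 3.10..3.12 inclusive."""
    if version is None:
        version = sys.version_info[:2]
    return MIN_PY <= tuple(version[:2]) <= MAX_PY


def enough_disk(path=".", required_gb=CHECKPOINT_GB, free_bytes=None):
    """Return True if `path` has at least `required_gb` GB free.

    `free_bytes` can be passed explicitly, which is handy for testing.
    Assumption: 1 GB is counted as 10**9 bytes.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9


if __name__ == "__main__":
    print("Python version OK:", python_supported())
    print("Disk space OK:", enough_disk())
```

The check does not cover the PyTorch version, because the summary does not state which versions each GPU platform needs.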

Highlighted Details

  • Supports low VRAM usage (≈16GB) with FP8 mode and offloading.
  • Capable of handling various aspect ratios and resolutions beyond its 512 training buckets.
  • Offers both single-subject and multi-subject generation within a unified model.
  • Includes example inference scripts and evaluation on the Dreambooth benchmark.
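
The demo and inference commands from the Quick Start section can also be assembled programmatically, for instance when scripting several runs. This is a sketch only: the script names and flags are the ones quoted above, the helper functions are hypothetical, and passing several paths after a single --image_paths flag for multi-subject runs is an assumption that should be checked against the project's own examples.

```python
import shlex


def demo_command(low_vram=False):
    """Build the argument list for the Gradio demo.

    With low_vram=True, adds the flags the summary lists for the
    roughly 16 GB VRAM mode (FP8 weights plus offloading).
    """
    cmd = ["python", "app.py"]
    if low_vram:
        cmd += ["--offload", "--name", "flux-dev-fp8"]
    return cmd


def inference_command(prompt, image_paths):
    """Build the argument list for inference.py.

    Assumption: multiple reference images are passed as separate
    arguments after one --image_paths flag.
    """
    if not image_paths:
        raise ValueError("at least one reference image path is required")
    return ["python", "inference.py", "--prompt", prompt,
            "--image_paths", *image_paths]


if __name__ == "__main__":
    # Print shell-quoted commands; pass the lists to subprocess.run to execute.
    print(shlex.join(demo_command(low_vram=True)))
    print(shlex.join(inference_command("a cat", ["a.png"])))
```

Building the command as a list rather than a single string avoids quoting problems when a prompt contains spaces or shell metacharacters.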

Maintenance & Community

The project is actively developed by ByteDance's Intelligent Creation Team. Updates include FP8 mode support, a Gradio demo, and the release of training code, inference code, and model checkpoints. Community contributions include several ComfyUI node implementations.

Licensing & Compatibility

  • Code License: Apache 2.0
  • Model License: CC BY-NC 4.0 (Non-commercial use)
  • Base Model License: Must adhere to original FLUX.1-dev licensing terms.
  • Compatibility: Suitable for academic research. Commercial use is restricted by the CC BY-NC 4.0 license for the models.

Limitations & Caveats

UNO's generalization is limited by dataset constraints and leaves room for improvement. The CC BY-NC 4.0 model license also rules out commercial use of the models.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 5

Star History

235 stars in the last 90 days
