OneDiffusion by lehduong

Versatile diffusion model for bidirectional image synthesis and understanding (CVPR 2025 paper)

Created 9 months ago
651 stars

Top 51.2% on SourcePulse

Project Summary

OneDiffusion is a diffusion model for large-scale, bidirectional image synthesis and understanding. It gives computer vision researchers and practitioners a unified framework for tasks such as text-to-image generation, image editing, and multiview synthesis: a single model that handles multiple modalities and operations.

How It Works

OneDiffusion uses a single diffusion architecture that accepts several kinds of conditional input and produces several kinds of output. Users specify the task and its conditions through a prompt-based interface that combines natural language with image inputs. Because tasks are selected through task tokens and conditioning information, different tokens can be combined zero-shot, enabling new applications without task-specific fine-tuning.
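As a rough illustration of the prompt-based interface, the sketch below prefixes a caption with a task token. The `[[task]]` token syntax, the entries in `TASK_TOKENS`, and the `build_prompt` helper are assumptions made for illustration, not the project's API; check the repository for the tokens the released model actually expects.

```python
# Hypothetical task tokens; the released model's vocabulary may differ.
TASK_TOKENS = {
    "text2image": "[[text2image]]",
    "image_editing": "[[image_editing]]",
    "multiview": "[[multiview]]",
}


def build_prompt(task: str, caption: str) -> str:
    """Prefix a caption with the (assumed) token for the requested task."""
    try:
        token = TASK_TOKENS[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
    caption = caption.strip()
    if not caption:
        raise ValueError("caption must not be empty")
    return f"{token} {caption}"


if __name__ == "__main__":
    print(build_prompt("text2image", "a cat wearing a witch hat"))
```

Keeping the token table in one place makes it easy to swap in the real token names once they are confirmed against the repository.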

Quick Start & Requirements

  • Installation: Use Conda to create an environment and install dependencies:
    conda create -n onediffusion_env python=3.8
    conda activate onediffusion_env
    pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118
    pip install "git+https://github.com/facebookresearch/pytorch3d.git"
    pip install -r requirements.txt
    
  • Prerequisites: CUDA 11.8, Python 3.8.
  • Demo: GPU VRAM requirements depend on the captioner: at least 21GB with the Molmo captioner, 27GB with LLaVA, or 12GB with manual captioning.
  • Resources: An official Hugging Face Space is available for the demo.
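The demo VRAM figures above can be turned into a quick feasibility check. This is a minimal sketch: the mode names (`molmo`, `llava`, `manual`) and the `usable_captioners` helper are hypothetical labels introduced here, and only the gigabyte thresholds come from the list above.

```python
from typing import List

# Minimum VRAM (GB) per demo captioning mode, taken from the figures above.
# The mode names are illustrative labels, not options defined by the project.
VRAM_REQUIRED_GB = {"molmo": 21, "llava": 27, "manual": 12}


def usable_captioners(available_gb: float) -> List[str]:
    """Return the captioning modes that fit in memory, cheapest first."""
    fits = [
        (need, name)
        for name, need in VRAM_REQUIRED_GB.items()
        if need <= available_gb
    ]
    return [name for _need, name in sorted(fits)]


if __name__ == "__main__":
    # Example: a 24GB card.
    print(usable_captioners(24))
```

The helper uses `typing.List` rather than `list[str]` so that it also runs on the Python 3.8 environment created above.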

Highlighted Details

  • Supports text-to-image, ID customization, multiview generation, condition-to-image, subject-driven generation, and text-guided image editing.
  • Supports subject-driven generation after fine-tuning on the Subject-200K and OmniEdit datasets.
  • Demonstrates zero-shot task combinations, though robustness may vary.

Maintenance & Community

  • Official repository for the CVPR 2025 paper "One Diffusion to Generate Them All".
  • A Hugging Face Space has been released for the demo.

Licensing & Compatibility

  • Model weights are released under a CC BY-NC license due to training on non-commercially licensed datasets.
  • Not suitable for commercial use.

Limitations & Caveats

Zero-shot task combinations are not guaranteed to be robust, and the order of the prompt and captions can change the model's behavior. Fine-tuning on the combined task is recommended to improve performance and simplify usage.

Health Check

  • Last Commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days
