Versatile diffusion model for bidirectional image synthesis and understanding (CVPR 2025 paper)
OneDiffusion is a versatile, large-scale diffusion model for bidirectional image synthesis and understanding across diverse tasks. It targets computer vision researchers and practitioners who need a unified framework for tasks like text-to-image generation, image editing, and multiview synthesis, offering a single model that handles multiple modalities and operations.
How It Works
OneDiffusion leverages a unified diffusion architecture that supports various conditional inputs and outputs. It employs a flexible prompt-based interface, allowing users to specify tasks and conditions using natural language and image inputs. The model's strength lies in its ability to perform zero-shot task combinations by integrating different task tokens and conditioning information, enabling novel applications without task-specific fine-tuning.
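The task-token interface can be sketched in a few lines. The snippet below is illustrative only: the token strings and field names are assumptions chosen to mirror the description above, not the repository's actual API.

# Minimal sketch of the unified task interface described above.
# Task-token strings and field names are illustrative assumptions,
# not the repository's exact API.
from dataclasses import dataclass, field

@dataclass
class TaskRequest:
    """One request to the unified model: a task token, a caption,
    and zero or more conditioning images."""
    task_token: str                  # e.g. "text2image", "image_editing"
    caption: str
    cond_images: list = field(default_factory=list)

    def to_prompt(self) -> str:
        # The task token is prepended so a single model can route
        # between generation and understanding tasks.
        return f"[[{self.task_token}]] {self.caption}"

# Text-to-image: text is the only condition.
t2i = TaskRequest("text2image", "a watercolor fox in a snowy forest")

# Image editing: same interface, plus a source image as a condition.
edit = TaskRequest("image_editing", "make the fox wear a red scarf",
                   cond_images=["fox.png"])

print(t2i.to_prompt())   # [[text2image]] a watercolor fox in a snowy forest
print(edit.to_prompt())  # [[image_editing]] make the fox wear a red scarf

Zero-shot task combinations follow the same pattern: combining task tokens and conditioning inputs in a single request is what lets one model cover new task mixes without task-specific fine-tuning.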
Quick Start & Requirements
# Create and activate a clean environment
conda create -n onediffusion_env python=3.8
conda activate onediffusion_env
# Install PyTorch with CUDA 11.8 wheels
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118
# Build and install PyTorch3D from source
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
# Install the remaining Python dependencies
pip install -r requirements.txt
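Once dependencies are installed, inference might look like the following minimal sketch. The OneDiffusionPipeline import path, the lehduong/OneDiffusion checkpoint id, and the [[text2image]] task token are assumptions modeled on the project's conventions; consult the repository README for the exact entry points.

# Minimal text-to-image sketch; the names below are assumptions,
# not a verified API.
import torch
from onediffusion.diffusion.pipelines.onediffusion import OneDiffusionPipeline  # assumed module path

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = OneDiffusionPipeline.from_pretrained("lehduong/OneDiffusion")  # assumed checkpoint id
pipe = pipe.to(device=device, dtype=torch.bfloat16)

# The task token routes the unified model to plain text-to-image generation.
prompt = "[[text2image]] A bipedal black cat wearing a worn leather jacket"
image = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=4.0,
    height=1024,
    width=1024,
).images[0]
image.save("sample.png")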
Limitations & Caveats
Performance on zero-shot task combinations may not be robust, and prompt/caption order can affect behavior. Fine-tuning is recommended for combined tasks to improve performance and simplify usage.