thu-ml: Unified diffusion framework for multi-modal generation
Top 28.3% on SourcePulse
UniDiffuser is a unified diffusion framework that handles multiple data modalities (image and text) within a single model. Instead of training separate diffusion models for the marginal, conditional, and joint distributions, it unifies all of them as a single noise-prediction task. This makes it a versatile and efficient option for researchers and practitioners in multi-modal generative AI who want one model rather than several.
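One model can cover the marginal, conditional, and joint distributions because of how the per-modality timesteps are set at sampling time. Following the formulation in the UniDiffuser paper, a modality fixed at timestep 0 (clean) acts as the condition, and a modality fixed at the final timestep T (pure noise) is effectively marginalized out. The sketch below is a hypothetical helper, not code from the repository. T = 1000 and the mode labels "i" and "t" are assumptions. The labels "t2i", "i2t", and "joint" come from the Quick Start.

```python
from dataclasses import dataclass

# Assumption: 1000 diffusion steps; the real value comes from the model's noise schedule.
T = 1000


@dataclass(frozen=True)
class Timesteps:
    t_img: int  # timestep fed to the model for the image modality
    t_txt: int  # timestep fed to the model for the text modality


def timesteps_for_mode(mode: str, t: int, num_steps: int = T) -> Timesteps:
    """Hypothetical helper: per-modality timesteps at sampling step t for a given mode."""
    if not 0 <= t <= num_steps:
        raise ValueError(f"t must be in [0, {num_steps}], got {t}")
    if mode == "joint":  # p(image, text): both modalities are denoised together
        return Timesteps(t_img=t, t_txt=t)
    if mode == "t2i":  # p(image | text): text stays clean at timestep 0
        return Timesteps(t_img=t, t_txt=0)
    if mode == "i2t":  # p(text | image): image stays clean at timestep 0
        return Timesteps(t_img=0, t_txt=t)
    if mode == "i":  # p(image): text is held at pure noise (timestep T)
        return Timesteps(t_img=t, t_txt=num_steps)
    if mode == "t":  # p(text): image is held at pure noise (timestep T)
        return Timesteps(t_img=num_steps, t_txt=t)
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `timesteps_for_mode("t2i", 500)` keeps the text timestep at 0 while the image is denoised from step 500. The same network then serves every mode, and only its timestep inputs change.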
How It Works
UniDiffuser parameterizes the diffusion model with a Transformer-based architecture, U-ViT. The core idea is to perturb the data of every modality at the same time, each with its own timestep, and to feed these modality-specific timesteps to the model, which then predicts the noise added to every perturbed modality. Because a single shared Transformer backbone handles all cases, one model efficiently learns image, text, text-to-image, image-to-text, and joint image-text generation without separate models or significant architectural changes.
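A single training step of this unified objective can be sketched in plain Python. Each modality is perturbed with its own independently sampled timestep, and the model regresses the concatenated noise of both modalities. This is a minimal sketch under stated assumptions. Toy float vectors stand in for the real image and text embeddings. The DDPM linear beta schedule is a stand-in, not necessarily the schedule UniDiffuser uses. `NoisePredictor` is a hypothetical interface standing in for the U-ViT network.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical interface for the noise-prediction network (U-ViT in UniDiffuser):
# (x_t, y_t, t_x, t_y) -> (predicted image noise, predicted text noise).
NoisePredictor = Callable[[Vector, Vector, int, int], Tuple[Vector, Vector]]


def linear_alpha_bar(num_steps: int = 1000, beta_start: float = 1e-4,
                     beta_end: float = 0.02) -> Vector:
    """alpha_bar[t] for t = 0..num_steps using a DDPM-style linear beta schedule (an assumption)."""
    alpha_bar = [1.0]  # t = 0 means clean data
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        alpha_bar.append(alpha_bar[-1] * (1.0 - beta))
    return alpha_bar


def perturb(x0: Sequence[float], t: int, alpha_bar: Sequence[float],
            rng: random.Random) -> Tuple[Vector, Vector]:
    """Forward diffusion: x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = math.sqrt(alpha_bar[t])
    s = math.sqrt(1.0 - alpha_bar[t])
    return [a * v + s * e for v, e in zip(x0, eps)], eps


def unidiffuser_loss(model: NoisePredictor, x0: Sequence[float], y0: Sequence[float],
                     alpha_bar: Sequence[float], rng: random.Random) -> float:
    """Mean squared error between predicted and true noise for both modalities."""
    num_steps = len(alpha_bar) - 1
    # Independent timesteps per modality: this is what lets one model learn
    # marginal, conditional, and joint distributions at once.
    t_x = rng.randint(1, num_steps)
    t_y = rng.randint(1, num_steps)
    x_t, eps_x = perturb(x0, t_x, alpha_bar, rng)
    y_t, eps_y = perturb(y0, t_y, alpha_bar, rng)
    eps_x_hat, eps_y_hat = model(x_t, y_t, t_x, t_y)
    # A single regression target: the concatenated noise of both modalities.
    target = eps_x + eps_y
    pred = list(eps_x_hat) + list(eps_y_hat)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)
```

In a real setup the loss would be averaged over a batch and minimized with a gradient-based optimizer. The point of the sketch is only that one network and one loss cover every combination of timesteps.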
Quick Start & Requirements
- Use conda to create an environment and pip to install dependencies. Key packages include torch, accelerate, transformers, and clip, with xformers and triton as optional extras for performance.
- Download autoencoder_kl.pth, caption_decoder.pth, and uvit_v0.pth or uvit_v1.pth from Hugging Face and place them in a models directory.
- Run sample_multi_v1.py (or sample_multi_v0.py) with a mode such as t2i, i2t, or joint.
- UniDiffuser is also available as UniDiffuserPipeline in the diffusers library.
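Before running the sampling script, it can help to confirm that the downloaded checkpoints are where the setup expects them. The sketch below checks only the file names listed above. The helper name `missing_checkpoints` and its `version` parameter are hypothetical and are not part of the repository.

```python
from pathlib import Path
from typing import List


def missing_checkpoints(models_dir: str = "models", version: str = "v1") -> List[str]:
    """Hypothetical helper: list the expected checkpoint files not found in models_dir."""
    if version not in ("v0", "v1"):
        raise ValueError(f"version must be 'v0' or 'v1', got {version!r}")
    # File names taken from the Quick Start; the U-ViT weights depend on the version.
    required = ["autoencoder_kl.pth", "caption_decoder.pth", f"uvit_{version}.pth"]
    root = Path(models_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```

Pair `version="v0"` with sample_multi_v0.py and the default `"v1"` with sample_multi_v1.py.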
Maintenance & Community
Its availability as UniDiffuserPipeline in the diffusers library suggests community adoption and support.
Limitations & Caveats
The README does not list any explicit limitations or known bugs. However, the code depends on specific versions of libraries such as torch and accelerate, which may cause compatibility problems with newer releases. As a research codebase accompanying published papers, it may also be subject to breaking changes.
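When diagnosing such version mismatches, a first step is to record which versions are actually installed. The sketch below uses only the standard library's `importlib.metadata`. The package list mirrors the dependencies named in the Quick Start. The pinned versions the project expects are not stated here and must be taken from the repository itself. The distribution name "clip" is an assumption, since CLIP is often installed from source and the name may differ.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Dependencies named in the Quick Start; "clip" as a distribution name is an assumption.
KEY_PACKAGES = ("torch", "accelerate", "transformers", "clip", "xformers", "triton")


def installed_versions(packages: Iterable[str] = KEY_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:>12}: {version or 'not installed'}")
```

Comparing this output against the repository's pinned requirements shows which package has drifted before any debugging of the model code itself.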
2 years ago
Inactive
zqiu24
afiaka87
YangLing0818
ai-forever
deep-floyd
Sygil-Dev