Compositional Visual Generation with Composable Diffusion Models (PyTorch)
This project provides a PyTorch implementation for compositional visual generation using diffusion models, enabling users to combine concepts via conjunction (AND) and negation (NOT) operators. It targets researchers and developers working with conditional diffusion models like Stable Diffusion and Point-E, offering enhanced control over generated outputs.
How It Works
The core innovation lies in applying logical operators to the conditioning signals of a diffusion model. At each sampling step, per-concept noise predictions are combined: positive weights act as conjunction (AND) and negative weights as negation (NOT), steering the sampler toward outputs that satisfy the full combined description rather than any single prompt.
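In classifier-free-guidance terms, the composed prediction adds each concept's weighted deviation from the unconditional prediction. A minimal sketch of that combination step (function and variable names are illustrative, not the repo's API):

```python
import torch

def compose_eps(eps_uncond, eps_conds, weights):
    # eps_uncond: unconditional noise prediction eps(x_t, t)
    # eps_conds:  per-concept predictions eps(x_t, t | c_i)
    # weights:    w_i > 0 acts as conjunction (AND),
    #             w_i < 0 acts as negation (NOT)
    eps = eps_uncond.clone()
    for w, eps_c in zip(weights, eps_conds):
        eps = eps + w * (eps_c - eps_uncond)
    return eps
```

The combined prediction is fed to the usual sampler step, so composing n concepts costs n extra UNet forward passes per denoising step.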
Quick Start & Requirements
pip install -e .
pip install diffusers==0.10.2
pip install open3d==0.16.0
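For orientation, here is a hedged sketch of a two-prompt conjunction at sampling time with the pinned diffusers==0.10.2 API. The model ID, prompts, weights, and loop structure are illustrative assumptions, not the project's exact inference scripts:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompts = ["a photo of a forest", "a photo of a river"]  # concepts to AND
weights = [1.0, 1.0]  # a negative weight would act as NOT

def encode(text):
    # Encode a prompt into CLIP text embeddings.
    tokens = pipe.tokenizer(
        text, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    )
    return pipe.text_encoder(tokens.input_ids.to(pipe.device))[0]

with torch.no_grad():
    uncond = encode("")
    conds = [encode(p) for p in prompts]

    pipe.scheduler.set_timesteps(50)
    latents = torch.randn(
        1, pipe.unet.config.in_channels, 64, 64,
        device=pipe.device, dtype=torch.float16,
    ) * pipe.scheduler.init_noise_sigma

    for t in pipe.scheduler.timesteps:
        inp = pipe.scheduler.scale_model_input(latents, t)
        eps_uncond = pipe.unet(inp, t, encoder_hidden_states=uncond).sample
        eps = eps_uncond.clone()
        # Conjunction: add each concept's weighted deviation from unconditional.
        for w, c in zip(weights, conds):
            eps_c = pipe.unet(inp, t, encoder_hidden_states=c).sample
            eps = eps + w * (eps_c - eps_uncond)
        latents = pipe.scheduler.step(eps, t, latents).prev_sample

    image = pipe.numpy_to_pil(pipe.decode_latents(latents))[0]
    image.save("composed.png")
```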
Highlighted Details
stable-diffusion-webui-conjunction: a community extension that brings the conjunction (AND) operator to the Stable Diffusion web UI.
Maintenance & Community
The project is associated with ECCV 2022 and MIT CSAIL. Updates in late 2022 added support for Point-E and newer Stable Diffusion versions. Discussions are available on Reddit.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Its main dependencies are under permissive licenses (diffusers, for example, is Apache-2.0), but users should verify licensing before commercial use.
Limitations & Caveats
The codebase builds on older versions of GLIDE and Improved-Diffusion, which may require specific dependency versions. The inference examples pin diffusers==0.10.2 and open3d==0.16.0, so newer releases of either library may break compatibility.