Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch by energy-based-model

PyTorch code release for a research paper on compositional generation using diffusion models

Created 3 years ago · 477 stars · Top 64.9% on sourcepulse

Project Summary

This project provides a PyTorch implementation for compositional visual generation using diffusion models, enabling users to combine concepts via conjunction (AND) and negation (NOT) operators. It targets researchers and developers working with conditional diffusion models like Stable Diffusion and Point-E, offering enhanced control over generated outputs.

How It Works

The core innovation lies in applying logical operators to the conditioning signals of a diffusion model. Rather than relying on a single monolithic prompt, the method combines the noise (score) estimates produced for each individual concept: positive weights implement conjunction (AND) and negative weights implement negation (NOT), steering the sampler toward outputs that satisfy the combined description. This yields more precise and controllable visual synthesis.
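
As an illustration, here is a minimal PyTorch sketch of that composition rule. Everything in it (the composed_eps helper, the denoiser stand-in, tensor shapes, and weight values) is hypothetical; the repository's actual scripts wrap real GLIDE, Stable Diffusion, and Point-E models:

    import torch

    def composed_eps(denoiser, x_t, t, cond_embs, weights, uncond_emb):
        """Combine per-concept noise estimates.

        weights > 0 implement AND (conjunction); weights < 0 implement
        NOT (negation), following the composable-diffusion formulation:
            eps = eps_uncond + sum_i w_i * (eps(c_i) - eps_uncond)
        """
        eps_uncond = denoiser(x_t, t, uncond_emb)
        eps = eps_uncond.clone()
        for emb, w in zip(cond_embs, weights):
            eps = eps + w * (denoiser(x_t, t, emb) - eps_uncond)
        return eps

    # Toy stand-ins so the sketch runs end to end (hypothetical shapes).
    denoiser = lambda x, t, c: x * 0.1 + c.mean() * 0.01
    x_t = torch.randn(1, 3, 64, 64)
    cond_a, cond_b, uncond = torch.randn(3, 77, 768).unbind(0)
    # "A AND B": both weights positive; "A NOT B" would use e.g. [7.5, -7.5].
    eps = composed_eps(denoiser, x_t, torch.tensor([10]),
                       [cond_a, cond_b], [7.5, 7.5], uncond)

At each sampling step the composed estimate simply replaces the single-prompt estimate, so any standard sampler can be used unchanged.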

Quick Start & Requirements

  • Install: Clone the repository and run pip install -e ., then pip install diffusers==0.10.2 and pip install open3d==0.16.0.
  • Prerequisites: Python 3.8+, Conda environment, PyTorch.
  • Resources: Inference demos are available via Google Colab; a minimal local sketch is shown after this list. Training requires datasets (CLEVR Objects/Relations, MS-COCO), which can be auto-downloaded or obtained manually via the provided Dropbox links.
  • Links: Project Page, Paper, Google Colab, Hugging Face
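
To make the composition concrete, the sketch below runs two prompts through a stock diffusers Stable Diffusion pipeline and combines their noise predictions by hand, mirroring the rule above. This is not the repository's own script: the model ID, prompts, weights, and step count are illustrative, and the code assumes the pinned diffusers==0.10.2 scheduler API (scale_model_input, init_noise_sigma):

    import torch
    from diffusers import StableDiffusionPipeline

    # Illustrative checkpoint; the repo's examples target similar SD v1 models.
    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
    pipe = pipe.to("cuda")

    @torch.no_grad()
    def encode(prompt):
        # CLIP text embedding for a single prompt.
        tokens = pipe.tokenizer(prompt, padding="max_length",
                                max_length=pipe.tokenizer.model_max_length,
                                truncation=True, return_tensors="pt")
        return pipe.text_encoder(tokens.input_ids.to(pipe.device))[0]

    # Two concepts joined by AND; a negative weight would express NOT.
    embs = [encode("a photo of a forest"), encode("thick fog")]
    weights = [7.5, 7.5]
    uncond = encode("")

    generator = torch.Generator(pipe.device).manual_seed(0)
    latents = torch.randn(1, pipe.unet.config.in_channels, 64, 64,
                          generator=generator, device=pipe.device)
    pipe.scheduler.set_timesteps(50)
    latents = latents * pipe.scheduler.init_noise_sigma

    with torch.no_grad():
        for t in pipe.scheduler.timesteps:
            inp = pipe.scheduler.scale_model_input(latents, t)
            eps_u = pipe.unet(inp, t, encoder_hidden_states=uncond).sample
            eps = eps_u.clone()
            for emb, w in zip(embs, weights):
                eps_c = pipe.unet(inp, t, encoder_hidden_states=emb).sample
                eps = eps + w * (eps_c - eps_u)
            latents = pipe.scheduler.step(eps, t, latents).prev_sample

        # Decode latents with the SD v1 VAE scaling factor.
        image = pipe.vae.decode(latents / 0.18215).sample

The decoded tensor is in [-1, 1]; converting it to a PIL image follows the usual pipeline postprocessing. The Colab demos and repository scripts handle all of this internally.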

Highlighted Details

  • Supports compositional generation with Stable Diffusion, Point-E, and GLIDE.
  • Integrates with stable-diffusion-webui-conjunction.
  • Provides both training and inference code.
  • Demonstrates results for both 2D images and 3D meshes.

Maintenance & Community

The project is associated with ECCV 2022 and MIT CSAIL. Updates in late 2022 added support for Point-E and newer Stable Diffusion versions. Discussions are available on Reddit.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Its dependencies are under permissive licenses (e.g., diffusers is Apache-2.0), but that does not determine the licensing of this codebase itself; users should verify terms before commercial use.

Limitations & Caveats

The codebase is built on older versions of GLIDE and Improved-Diffusion, which may require specific dependency versions. The inference examples pin diffusers==0.10.2 and open3d==0.16.0, so newer releases of those packages may break compatibility.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 90 days

Explore Similar Projects

  • stable-diffusion by CompVis: latent text-to-image diffusion model. 71k stars (top 0.1% on sourcepulse); created 3 years ago, updated 1 year ago. Starred by Dan Abramov (core contributor to React), Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), and 28 more.