Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch by energy-based-model

PyTorch code release for a research paper on compositional generation using diffusion models

Created 3 years ago · 477 stars · Top 64.9% on sourcepulse

Project Summary

This project provides a PyTorch implementation for compositional visual generation using diffusion models, enabling users to combine concepts via conjunction (AND) and negation (NOT) operators. It targets researchers and developers working with conditional diffusion models like Stable Diffusion and Point-E, offering enhanced control over generated outputs.

How It Works

The core innovation lies in applying logical operators to the conditioning signals of a diffusion model. Rather than relying on a single monolithic prompt, the method combines the noise (score) estimates produced for each individual concept: positive weights implement conjunction (AND) and negative weights implement negation (NOT), steering the sampler toward outputs that satisfy the combined description. This yields more precise and controllable visual synthesis.
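
As an illustration, here is a minimal PyTorch sketch of that composition rule. Everything in it (the composed_eps helper, the denoiser stand-in, tensor shapes, and weight values) is hypothetical; the repository's actual scripts wrap real GLIDE, Stable Diffusion, and Point-E models:

    import torch

    def composed_eps(denoiser, x_t, t, cond_embs, weights, uncond_emb):
        """Combine per-concept noise estimates.

        weights > 0 implement AND (conjunction); weights < 0 implement
        NOT (negation), following the composable-diffusion formulation:
            eps = eps_uncond + sum_i w_i * (eps(c_i) - eps_uncond)
        """
        eps_uncond = denoiser(x_t, t, uncond_emb)
        eps = eps_uncond.clone()
        for emb, w in zip(cond_embs, weights):
            eps = eps + w * (denoiser(x_t, t, emb) - eps_uncond)
        return eps

    # Toy stand-ins so the sketch runs end to end (hypothetical shapes).
    denoiser = lambda x, t, c: x * 0.1 + c.mean() * 0.01
    x_t = torch.randn(1, 3, 64, 64)
    cond_a, cond_b, uncond = torch.randn(3, 77, 768).unbind(0)
    # "A AND B": both weights positive; "A NOT B" would use e.g. [7.5, -7.5].
    eps = composed_eps(denoiser, x_t, torch.tensor([10]),
                       [cond_a, cond_b], [7.5, 7.5], uncond)

At each sampling step the composed estimate simply replaces the single-prompt estimate, so any standard sampler can be used unchanged.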

Quick Start & Requirements

  • Install: Clone the repository and run pip install -e ., then pip install diffusers==0.10.2 and pip install open3d==0.16.0.
  • Prerequisites: Python 3.8+, Conda environment, PyTorch.
  • Resources: Inference demos are available via Google Colab; a minimal local sketch is shown after this list. Training requires datasets (CLEVR Objects/Relations, MS-COCO), which can be auto-downloaded or obtained manually via the provided Dropbox links.
  • Links: Project Page, Paper, Google Colab, Hugging Face
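
To make the composition concrete, the sketch below runs two prompts through a stock diffusers Stable Diffusion pipeline and combines their noise predictions by hand, mirroring the rule above. This is not the repository's own script: the model ID, prompts, weights, and step count are illustrative, and the code assumes the pinned diffusers==0.10.2 scheduler API (scale_model_input, init_noise_sigma):

    import torch
    from diffusers import StableDiffusionPipeline

    # Illustrative checkpoint; the repo's examples target similar SD v1 models.
    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
    pipe = pipe.to("cuda")

    @torch.no_grad()
    def encode(prompt):
        # CLIP text embedding for a single prompt.
        tokens = pipe.tokenizer(prompt, padding="max_length",
                                max_length=pipe.tokenizer.model_max_length,
                                truncation=True, return_tensors="pt")
        return pipe.text_encoder(tokens.input_ids.to(pipe.device))[0]

    # Two concepts joined by AND; a negative weight would express NOT.
    embs = [encode("a photo of a forest"), encode("thick fog")]
    weights = [7.5, 7.5]
    uncond = encode("")

    generator = torch.Generator(pipe.device).manual_seed(0)
    latents = torch.randn(1, pipe.unet.config.in_channels, 64, 64,
                          generator=generator, device=pipe.device)
    pipe.scheduler.set_timesteps(50)
    latents = latents * pipe.scheduler.init_noise_sigma

    with torch.no_grad():
        for t in pipe.scheduler.timesteps:
            inp = pipe.scheduler.scale_model_input(latents, t)
            eps_u = pipe.unet(inp, t, encoder_hidden_states=uncond).sample
            eps = eps_u.clone()
            for emb, w in zip(embs, weights):
                eps_c = pipe.unet(inp, t, encoder_hidden_states=emb).sample
                eps = eps + w * (eps_c - eps_u)
            latents = pipe.scheduler.step(eps, t, latents).prev_sample

        # Decode latents with the SD v1 VAE scaling factor.
        image = pipe.vae.decode(latents / 0.18215).sample

The decoded tensor is in [-1, 1]; converting it to a PIL image follows the usual pipeline postprocessing. The Colab demos and repository scripts handle all of this internally.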

Highlighted Details

  • Supports compositional generation with Stable Diffusion, Point-E, and GLIDE.
  • Integrates with stable-diffusion-webui-conjunction.
  • Provides both training and inference code.
  • Demonstrates results for both 2D images and 3D meshes.

Maintenance & Community

The project is associated with ECCV 2022 and MIT CSAIL. Updates in late 2022 added support for Point-E and newer Stable Diffusion versions. Discussions are available on Reddit.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Its dependencies are under permissive licenses (e.g., diffusers is Apache-2.0), but that does not determine the licensing of this codebase itself; users should verify terms before commercial use.

Limitations & Caveats

The codebase is built on older versions of GLIDE and Improved-Diffusion, which may require specific dependency versions. The inference examples pin diffusers==0.10.2 and open3d==0.16.0, so newer releases of those packages may break compatibility.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 90 days

Explore Similar Projects

  • stable-diffusion by CompVis: latent text-to-image diffusion model. 71k stars (top 0.1% on sourcepulse); created 3 years ago, updated 1 year ago. Starred by Dan Abramov (core contributor to React), Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), and 28 more.