DatasetDM by showlab

Official code for a research paper on synthesizing perception data and annotations with diffusion models

created 2 years ago
318 stars

Top 86.3% on sourcepulse

Project Summary

DatasetDM provides the official code for synthesizing perception data together with annotations using diffusion models. It targets computer vision researchers and practitioners and generates datasets for tasks such as instance segmentation, semantic segmentation, and depth estimation, so that synthetic data can supplement real data when training models.

How It Works

DatasetDM uses a diffusion model, specifically Stable Diffusion 1.4, to generate synthetic images. A P-Decoder produces the matching perception annotations, such as segmentation masks, and GPT-4 is used to diversify the text prompts so that the generated data is more varied. This allows data generation to be targeted at a specific task and dataset.
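
The prompt-diversification idea can be illustrated with a small sketch. The project relies on GPT-4 for this step; the code below swaps in fixed templates so that it runs offline. The function name `diversify_prompts`, the template strings, the scene phrases, and the class names are all hypothetical and do not come from the repository.

```python
import itertools
import random

# Hypothetical templates and scenes; the project itself uses GPT-4 to write prompts.
TEMPLATES = [
    "a photo of a {cls} {scene}",
    "a close-up of a {cls} {scene}",
    "a {cls} seen from a distance {scene}",
]
SCENES = ["on a city street", "in a park", "indoors", "at night"]


def diversify_prompts(class_names, n_per_class, seed=0):
    """Return a dict mapping each class name to n_per_class distinct prompts."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    combos = list(itertools.product(TEMPLATES, SCENES))
    if n_per_class > len(combos):
        raise ValueError("not enough template/scene combinations")
    prompts = {}
    for cls in class_names:
        # Sample without replacement so no prompt repeats within a class.
        chosen = rng.sample(combos, n_per_class)
        prompts[cls] = [t.format(cls=cls, scene=s) for t, s in chosen]
    return prompts


if __name__ == "__main__":
    for cls, items in diversify_prompts(["dog", "bicycle"], 3).items():
        for prompt in items:
            print(cls, "->", prompt)
```

Each resulting prompt would then be passed to the image generator. A language model can produce far more varied wording than a fixed template set, which is presumably why the project uses GPT-4 here.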

Quick Start & Requirements

  • Installation:
      • Create a conda environment: conda create -n DatasetDM python=3.8
      • Install PyTorch 1.9.1 with CUDA 11.1: pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
      • Install the remaining requirements: python -m pip install -r requirements.txt
  • Prerequisites: Python 3.8, PyTorch 1.9.1, CUDA 11.1. Download the Stable Diffusion 1.4 weights (approx. 4.5 GB) and place them in ./dataset/ckpts. Use either diffusers 0.3.0, the recommended version, or the modified copy in ./model/diffusers.
  • Dataset Preparation: Requires specific directory structures for VOC, Cityscapes, COCO2017, VirtualKITTI2, NYU-Depth-V2, KITTI, and DeepFashion-MM datasets, along with corresponding prompt text files.
  • Links: Project Website (not provided), Paper (NeurIPS 2023), Google Drive for weights.
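
Before running any script, it can help to confirm that the weights actually landed in ./dataset/ckpts. The following is a minimal sketch of such a check. The helper name `summarize_checkpoint_dir` is hypothetical, and because the expected file names inside the directory are not given here, the check only counts files and sums their sizes.

```python
from pathlib import Path


def summarize_checkpoint_dir(ckpt_dir="./dataset/ckpts"):
    """Return (file_count, total_size_gb) for all files under ckpt_dir.

    Returns (0, 0.0) when the directory is missing or contains no files.
    """
    root = Path(ckpt_dir)
    if not root.is_dir():
        return 0, 0.0
    files = [p for p in root.rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes / 1e9


if __name__ == "__main__":
    count, size_gb = summarize_checkpoint_dir()
    if count == 0:
        print("No weights found in ./dataset/ckpts; download Stable Diffusion 1.4 first.")
    else:
        print(f"Found {count} file(s), {size_gb:.2f} GB in total.")
```

A total far below the roughly 4.5 GB mentioned above would suggest an incomplete download.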

Highlighted Details

  • Supports Instance Segmentation (COCO2017), Semantic Segmentation (VOC, Cityscapes), Depth Estimation, Open Pose, DeepFashion Segmentation, Open Segmentation, and Long-tail Segmentation.
  • Utilizes GPT-4 for enhanced prompt diversity in data generation.
  • Provides scripts for both training P-decoders and generating synthetic data for various tasks and datasets.
  • Includes data augmentation techniques like image splicing.
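
Image splicing can be pictured as joining several synthetic images, and their annotations, into one larger training sample. The sketch below shows one possible form, horizontal concatenation, using plain nested lists in place of image arrays. The function name `splice_horizontal` and the layout are assumptions for illustration; the repository's own splicing may arrange images differently.

```python
def splice_horizontal(images, masks):
    """Concatenate same-height images and their label masks left to right.

    Each image and mask is a list of rows; a mask holds one label per pixel.
    This is an illustrative layout, not necessarily the one DatasetDM uses.
    """
    if not images or len(images) != len(masks):
        raise ValueError("need equally many images and masks")
    height = len(images[0])
    for img, msk in zip(images, masks):
        if len(img) != height or len(msk) != height:
            raise ValueError("all inputs must share the same height")
        if any(len(ir) != len(mr) for ir, mr in zip(img, msk)):
            raise ValueError("each mask must match its image width")
    # Join row r of every input so pixels and labels stay aligned.
    spliced_img = [sum((img[r] for img in images), []) for r in range(height)]
    spliced_msk = [sum((msk[r] for msk in masks), []) for r in range(height)]
    return spliced_img, spliced_msk


if __name__ == "__main__":
    img_a, mask_a = [[10, 11], [12, 13]], [[0, 1], [1, 1]]
    img_b, mask_b = [[20], [21]], [[2], [0]]
    image, mask = splice_horizontal([img_a, img_b], [mask_a, mask_b])
    print(image)  # [[10, 11, 20], [12, 13, 21]]
    print(mask)   # [[0, 1, 2], [1, 1, 0]]
```

The key property is that the mask is transformed in exactly the same way as the image, so every label still refers to the right pixel after splicing.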

Maintenance & Community

The project is associated with NeurIPS 2023. No specific community links (Discord, Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README warns that newer diffusers releases can cause errors and recommends either the older version 0.3.0 or the modified copy bundled with the repository. The code release was initially planned for within three months of September 2023.
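
A version mismatch of this kind can be detected up front. The following is a minimal sketch that compares the pip-installed diffusers version with the recommended 0.3.0 using the standard-library `importlib.metadata` module. The helper name `diffusers_status` and its return values are hypothetical.

```python
from importlib import metadata

RECOMMENDED_DIFFUSERS = "0.3.0"  # version recommended by the README


def diffusers_status(get_version=metadata.version):
    """Return (status, installed_version) for the diffusers package.

    status is 'ok', 'mismatch', or 'missing'; installed_version is None
    when the package is not installed. get_version is injectable for tests.
    """
    try:
        installed = get_version("diffusers")
    except metadata.PackageNotFoundError:
        return "missing", None
    status = "ok" if installed == RECOMMENDED_DIFFUSERS else "mismatch"
    return status, installed


if __name__ == "__main__":
    status, installed = diffusers_status()
    print(f"diffusers: {status} (installed: {installed})")
```

A result of 'missing' is not necessarily a problem: it is also what this check would report when only the modified copy in ./model/diffusers is used and no pip package is installed.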

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days
