DatasetDM by showlab

Research paper for synthesizing data with diffusion models and perception annotations

Created 2 years ago

323 stars

Top 84.3% on SourcePulse

Project Summary

DatasetDM provides official code for synthesizing perception data with annotations using diffusion models, targeting researchers and practitioners in computer vision. It generates datasets for tasks such as instance segmentation, semantic segmentation, and depth estimation, so that perception models can be trained with additional synthetic data.

How It Works

DatasetDM uses a diffusion model, specifically Stable Diffusion 1.4, to generate synthetic images. A P-Decoder produces the matching perception annotations, such as segmentation masks, and GPT-4 is used to diversify the text prompts so that the generated data is more varied. This allows data generation to be targeted at specific tasks and datasets.
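The prompt-diversification step can be pictured with a minimal sketch. It only composes the instruction that would be sent to a language model. The function name, the wording of the instruction, and the default count are assumptions made for illustration and are not taken from the DatasetDM code. The call to GPT-4 itself is left out.

```python
def build_prompt_request(class_names, n_prompts=5):
    """Compose an instruction asking a language model for varied captions.

    Hypothetical helper: the wording below is illustrative only and is not
    the prompt used by DatasetDM.
    """
    if not class_names:
        raise ValueError("at least one class name is required")
    classes = ", ".join(class_names)
    return (
        f"Write {n_prompts} different one-sentence image descriptions. "
        f"Each description must mention: {classes}. "
        "Vary the scene, lighting, viewpoint, and background."
    )
```

Each returned caption would then serve as a text prompt for the diffusion model, with the named classes marking the objects that need annotations.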

Quick Start & Requirements

Installation:
1. Create a conda environment: conda create -n DatasetDM python=3.8
2. Install PyTorch 1.9.1 with CUDA 11.1: pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
3. Install the remaining requirements: python -m pip install -r requirements.txt
Prerequisites: Python 3.8, PyTorch 1.9.1, and CUDA 11.1. Download the Stable Diffusion 1.4 weights (approx. 4.5 GB) and place them in ./dataset/ckpts. Either install diffusers 0.3.0, the recommended version, or use the modified copy in ./model/diffusers.
Dataset Preparation: Requires specific directory structures for VOC, Cityscapes, COCO2017, VirtualKITTI2, NYU-Depth-V2, KITTI, and DeepFashion-MM datasets, along with corresponding prompt text files.
Links: Project Website (not provided), Paper (NeurIPS 2023), Google Drive for weights.
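Because the project pins old versions, a short preflight check can catch mismatches before training or generation starts. The sketch below is not part of the repository. It compares the running interpreter and installed packages against the versions listed above and checks that the checkpoint directory exists. The function names are hypothetical, and the diffusers check assumes the pip-installed package is used and not the modified copy in ./model/diffusers.

```python
import sys
from importlib import metadata
from pathlib import Path

# Pins taken from the requirements listed above.
EXPECTED_PYTHON = (3, 8)
EXPECTED_PACKAGES = {"torch": "1.9.1", "diffusers": "0.3.0"}
CKPT_DIR = Path("./dataset/ckpts")


def base_version(version):
    """Drop a local build tag so that '1.9.1+cu111' compares equal to '1.9.1'."""
    return version.split("+", 1)[0]


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(python_version=None, lookup=installed_version, ckpt_dir=CKPT_DIR):
    """Collect human-readable problems; an empty list means every check passed."""
    problems = []
    major_minor = tuple((python_version or sys.version_info)[:2])
    if major_minor != EXPECTED_PYTHON:
        problems.append(f"Python {major_minor} found, expected {EXPECTED_PYTHON}")
    for package, wanted in EXPECTED_PACKAGES.items():
        found = lookup(package)
        if found is None:
            problems.append(f"{package} is not installed (expected {wanted})")
        elif base_version(found) != wanted:
            problems.append(f"{package} {found} found, expected {wanted}")
    if not Path(ckpt_dir).is_dir():
        problems.append(f"checkpoint directory {ckpt_dir} does not exist")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the pinned versions"]:
        print(line)
```

The `lookup` and `ckpt_dir` parameters exist only so the check can be exercised without the real packages installed.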

Highlighted Details

Supports Instance Segmentation (COCO2017), Semantic Segmentation (VOC, Cityscapes), Depth Estimation, Open Pose, DeepFashion Segmentation, Open Segmentation, and Long-tail Segmentation.
Utilizes GPT-4 for enhanced prompt diversity in data generation.
Provides scripts for both training P-Decoders and generating synthetic data for various tasks and datasets.
Includes data augmentation techniques like image splicing.
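The summary does not describe how image splicing is implemented, so the following is only one plausible reading: two generated samples are placed side by side and their instance masks are merged. The function name and the list-of-rows data layout are assumptions for illustration. A real pipeline would more likely operate on arrays.

```python
def splice_horizontal(left_img, right_img, left_mask, right_mask):
    """Place two samples side by side and merge their instance masks.

    Images and masks are lists of rows. Masks hold 0 for background and
    positive integers for instance ids. Hypothetical sketch, not the
    repository's implementation.
    """
    heights = {len(left_img), len(right_img), len(left_mask), len(right_mask)}
    if len(heights) != 1:
        raise ValueError("images and masks must all have the same height")
    # Shift the right-hand ids so they cannot collide with the left-hand ones.
    offset = max((v for row in left_mask for v in row), default=0)
    image = [l + r for l, r in zip(left_img, right_img)]
    mask = [
        l + [v + offset if v else 0 for v in r]
        for l, r in zip(left_mask, right_mask)
    ]
    return image, mask
```

The id offset matters only for instance masks. For semantic masks, where values are class labels, the two masks would be concatenated unchanged.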

Maintenance & Community

The project is associated with NeurIPS 2023. No specific community links (Discord, Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project notes that updates to the diffusers library can cause errors and recommends either the older version 0.3.0 or the modified copy shipped with the code. The code release was originally planned for within three months of September 2023.
