Research paper implementation for tabular data generation via diffusion models
This repository provides the official implementation of "TabDDPM: Modelling Tabular Data with Diffusion Models" (ICML 2023). It offers a diffusion-model-based approach to generating synthetic tabular data, aimed at machine learning and data science researchers and practitioners who need robust synthetic data generation. The motivation is that diffusion models may produce higher-quality and more diverse synthetic tabular data than traditional methods.
How It Works
TabDDPM utilizes a diffusion probabilistic model adapted for tabular data. The core idea involves a forward diffusion process that gradually adds noise to the tabular data and a reverse denoising process that learns to remove this noise, thereby generating new data samples. This approach aims to capture complex data distributions and dependencies within tabular datasets more effectively than GAN-based or VAE-based methods.
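The forward and reverse processes described above can be sketched for numerical columns with the standard Gaussian (DDPM) formulas. This is a minimal NumPy sketch under stated assumptions, not the repository's code. The linear beta schedule, the 1,000 steps, and every function name (`linear_beta_schedule`, `q_sample`, `predict_x0`, `p_sample_step`) are illustrative. The true noise stands in for the prediction a trained denoising network would make. The paper pairs Gaussian diffusion for numerical features with multinomial diffusion for categorical features, and that categorical part is not shown here.

```python
import numpy as np


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Linear variance schedule beta_1..beta_T (an assumed choice for this sketch)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    """Forward process: draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise


def predict_x0(x_t: np.ndarray, t: int, eps_hat: np.ndarray, alpha_bar: np.ndarray) -> np.ndarray:
    """Invert the forward formula given a noise estimate (normally a network's output)."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])


def p_sample_step(x_t, t, eps_hat, betas, alpha_bar, rng):
    """One reverse step x_t -> x_{t-1} using the epsilon-parameterised DDPM mean."""
    alpha_t = 1.0 - betas[t]
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_t)
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)


# Usage on a toy table: 4 rows, 3 numerical columns (assumed already normalised).
rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_s)
x0 = rng.standard_normal((4, 3))
x_t, noise = q_sample(x0, 500, alpha_bar, rng)
x0_rec = predict_x0(x_t, 500, noise, alpha_bar)  # oracle noise gives exact recovery
x_prev = p_sample_step(x_t, 500, noise, betas, alpha_bar, rng)
```

Because the oracle noise is used, `predict_x0` recovers `x0` exactly. In practice a network is trained to predict `noise` from `x_t` and `t`, and sampling repeats `p_sample_step` from the last step down to 0, starting from pure Gaussian noise.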
Quick Start & Requirements
conda create -n tddpm python=3.9.7
conda activate tddpm
pip install torch==1.10.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
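After installation, it can help to confirm that the environment resolved the pinned PyTorch build rather than a newer one pulled in by another dependency. The sketch below uses only the standard library (`importlib.metadata`), so it runs even where torch is missing. The helper names are hypothetical. It assumes the wheel records its local CUDA tag (`+cu111`) in the installed package metadata, as wheels from the PyTorch index normally do.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

PINNED_TORCH = "1.10.1"    # version pinned by the pip command in Quick Start
PINNED_CUDA_TAG = "cu111"  # CUDA 11.1 local-version tag from the same command


def parse_torch_version(v: str) -> Tuple[str, Optional[str]]:
    """Split a version such as '1.10.1+cu111' into ('1.10.1', 'cu111')."""
    base, sep, local = v.partition("+")
    return base, (local if sep else None)


def installed_torch_version() -> Optional[str]:
    """Return the installed torch distribution version, or None if torch is absent."""
    try:
        return version("torch")
    except PackageNotFoundError:
        return None


def matches_pin(v: Optional[str]) -> bool:
    """True only if both the base version and the CUDA tag match the pin."""
    if v is None:
        return False
    base, tag = parse_torch_version(v)
    return base == PINNED_TORCH and tag == PINNED_CUDA_TAG


if __name__ == "__main__":
    v = installed_torch_version()
    print(f"torch: {v!r}, matches pin: {matches_pin(v)}")
```

Checking the CUDA tag as well as the base version catches the common case where a CPU-only or differently built wheel with the same version number was installed.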
Limitations & Caveats
The repository pins an older PyTorch release (1.10.1) built against CUDA 11.1, along with Python 3.9.7. This may cause compatibility problems on newer GPUs, drivers, or software stacks that expect more recent PyTorch or CUDA versions. The lack of explicit licensing information may also be a concern for commercial use.