synthcity by vanderschaarlab

Library for generating and evaluating synthetic tabular data for privacy, fairness, and augmentation

created 3 years ago
576 stars

Top 56.9% on sourcepulse

Project Summary

Synthcity is a Python library for generating and evaluating synthetic tabular, time-series, survival, and image data. It targets researchers and practitioners who need to augment datasets, protect privacy, or test fairness, and it pairs a wide range of generative models with metrics for evaluating the data they produce.

How It Works

Synthcity uses a plugin-based architecture that makes it easy to integrate diverse generative models, including GANs (AdsGAN, CTGAN, PATEGAN), VAEs (TVAE, RTVAE), normalizing flows, Bayesian networks, and LLM-based models (GReaT). It supports specialized generators for time-series and survival data, alongside privacy- and fairness-focused methods such as DP-GAN and DECAF. The library also provides evaluation metrics for data quality, privacy, and fairness.
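
To make the plugin idea concrete, here is a minimal sketch of a name-based plugin registry. It is not synthcity's implementation: `GeneratorPlugin`, `REGISTRY`, `register`, `get_plugin`, and the toy `BootstrapSampler` are all hypothetical names invented for illustration, and the sampler only resamples the input rows, so it offers no privacy protection.

```python
import random


class GeneratorPlugin:
    """Hypothetical base class: every generator exposes fit() and generate()."""

    name = "base"

    def fit(self, rows):
        raise NotImplementedError

    def generate(self, count):
        raise NotImplementedError


# Hypothetical registry mapping plugin names to plugin classes.
REGISTRY = {}


def register(cls):
    """Class decorator that adds a plugin class to the registry under its name."""
    REGISTRY[cls.name] = cls
    return cls


@register
class BootstrapSampler(GeneratorPlugin):
    """Toy stand-in for a real model: resamples the fitted rows with replacement."""

    name = "bootstrap"

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._rows = []

    def fit(self, rows):
        self._rows = list(rows)
        return self

    def generate(self, count):
        return [self._rng.choice(self._rows) for _ in range(count)]


def get_plugin(name, **kwargs):
    """Look up a plugin class by name and instantiate it."""
    if name not in REGISTRY:
        raise KeyError(f"unknown plugin {name!r}; available: {sorted(REGISTRY)}")
    return REGISTRY[name](**kwargs)


real = [(25, "A"), (31, "B"), (47, "A")]
model = get_plugin("bootstrap", seed=42).fit(real)
synthetic = model.generate(5)
```

In this sketch, adding a new generator means writing one decorated class. Callers keep using the same lookup-by-name, `fit`, and `generate` calls regardless of which model sits behind the name.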

Quick Start & Requirements

Highlighted Details

  • Supports tabular, time-series, survival analysis, and image data generation.
  • Integrates numerous generative models, including GANs, VAEs, Normalizing Flows, Bayesian Networks, and LLM-based models.
  • Offers extensive evaluation metrics for data quality, privacy (k-anonymity, l-diversity), and fairness.
  • Includes specialized generators for time-series (TimeGAN, FourierFlows) and survival analysis (SurvivalGAN).
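
The two privacy metrics named above have simple textbook definitions, sketched below in plain Python. This is an illustration of the standard definitions only, not synthcity's metric code, which may differ in how it groups or discretizes columns. The function names, column names, and sample rows are assumptions made up for the example.

```python
from collections import Counter, defaultdict


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values in any such group (distinct l-diversity)."""
    values = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        values[key].add(row[sensitive])
    return min(len(v) for v in values.values())


# Made-up records: two quasi-identifier columns and one sensitive column.
rows = [
    {"age_band": "20-29", "zip3": "021", "diagnosis": "flu"},
    {"age_band": "20-29", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "flu"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "flu"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "flu"},
]
qi = ["age_band", "zip3"]
k = k_anonymity(rows, qi)               # smallest group has 2 rows
l = l_diversity(rows, qi, "diagnosis")  # the 30-39 group has only 1 distinct diagnosis
```

Here the table is 2-anonymous but only 1-diverse: every record in the 30-39 group shares the same diagnosis, so group membership alone reveals the sensitive value.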

Maintenance & Community

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

  • Does not handle missing data; imputation is required beforehand (e.g., using HyperImpute).
  • Image generation architectures are noted as not state-of-the-art.
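
Because missing values must be handled before fitting, a pre-check along the following lines can catch them early. This is a minimal sketch that uses column-mean imputation as a simple stand-in. It is not HyperImpute and not part of synthcity, and the helper names are hypothetical.

```python
import math
from statistics import mean


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_missing(rows):
    """Number of missing cells across all rows."""
    return sum(is_missing(v) for row in rows for v in row)


def impute_column_means(rows):
    """Replace each missing cell with the mean of the observed values in its column.

    Raises statistics.StatisticsError if a column has no observed values.
    """
    columns = list(zip(*rows))
    means = [mean(v for v in col if not is_missing(v)) for col in columns]
    return [
        [means[j] if is_missing(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


# Made-up numeric table with two missing cells.
rows = [[1.0, 10.0], [None, 20.0], [3.0, float("nan")]]
clean = impute_column_means(rows)
```

Mean imputation is crude and can distort the distributions a generator learns, which is one reason to prefer a dedicated imputation tool for real data.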

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 34 stars in the last 90 days
