synthcity by vanderschaarlab

Library for generating/evaluating synthetic tabular data for privacy, fairness, augmentation

Created 3 years ago
593 stars

Top 54.9% on SourcePulse

Project Summary

Synthcity is a comprehensive Python library for generating and evaluating synthetic tabular, time-series, survival, and image data. It targets researchers and practitioners needing to augment datasets, ensure privacy, or test fairness, offering a wide array of state-of-the-art generative models and robust evaluation metrics.

How It Works

Synthcity uses a plugin-based architecture, which makes it easy to add and swap generative models, including GANs (AdsGAN, CTGAN, PATEGAN), VAEs (TVAE, RTVAE), Normalizing Flows, Bayesian Networks, and LLM-based models (GReaT). It also provides specialized generators for time-series and survival data, as well as privacy- and fairness-oriented methods such as DP-GAN (differentially private training) and DECAF (fairness-aware, causally informed generation). The library also includes a broad suite of evaluation metrics covering data quality, privacy, and fairness.

Quick Start & Requirements

Highlighted Details

  • Supports tabular, time-series, survival analysis, and image data generation.
  • Integrates numerous generative models, including GANs, VAEs, Normalizing Flows, Bayesian Networks, and LLM-based models.
  • Offers extensive evaluation metrics for data quality, privacy (k-anonymity, l-diversity), and fairness.
  • Includes specialized generators for time-series (TimeGAN, FourierFlows) and survival analysis (SurvivalGAN).
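The two privacy notions named above have simple textbook definitions: a table is k-anonymous if every combination of quasi-identifier values is shared by at least k records, and (distinct) l-diversity requires at least l different sensitive values within each such group. The sketch below computes both on a list of dicts. It follows the textbook definitions, not synthcity's own metric implementations, which may compute these differently; the function names and the sample records are hypothetical.

```python
# Textbook k-anonymity and distinct l-diversity; not synthcity's implementation.
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def equivalence_classes(
    rows: List[Row], quasi_identifiers: Sequence[str]
) -> Dict[Tuple[object, ...], List[Row]]:
    """Group rows that share the same values on every quasi-identifier."""
    groups: Dict[Tuple[object, ...], List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in quasi_identifiers)].append(row)
    return groups


def k_anonymity(rows: List[Row], quasi_identifiers: Sequence[str]) -> int:
    """k = size of the smallest equivalence class."""
    return min(len(g) for g in equivalence_classes(rows, quasi_identifiers).values())


def distinct_l_diversity(
    rows: List[Row], quasi_identifiers: Sequence[str], sensitive: str
) -> int:
    """l = fewest distinct sensitive values found in any equivalence class."""
    return min(
        len({row[sensitive] for row in g})
        for g in equivalence_classes(rows, quasi_identifiers).values()
    )


records = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
]
qi = ["zip", "age"]
print(k_anonymity(records, qi))                      # 2
print(distinct_l_diversity(records, qi, "disease"))  # 1
```

The sample table is 2-anonymous, yet every record in the `148**` group has the same disease, so l = 1 and membership in that group alone reveals the sensitive value. That gap is why l-diversity is usually reported alongside k-anonymity.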

Maintenance & Community

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

  • Does not handle missing data; imputation is required beforehand (e.g., using HyperImpute).
  • The project notes that its image generation architectures are not state-of-the-art.
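Because synthcity expects complete data, missing values have to be filled before a generator is fitted. The sketch below swaps in plain mean/mode imputation as a simple stand-in; it does not reproduce HyperImpute. `impute_simple` is a hypothetical helper that works on a list of dicts, with `None` marking a missing cell.

```python
# Simple mean/mode imputation; a stand-in for a dedicated imputer such as HyperImpute.
from collections import Counter
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def impute_simple(rows: List[Row]) -> List[Row]:
    """Return new rows with None filled: numeric columns get the mean, others the mode."""
    fills: Dict[str, object] = {}
    for col in rows[0]:
        observed = [row[col] for row in rows if row[col] is not None]
        if not observed:
            raise ValueError(f"column {col!r} has no observed values")
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
        )
        # Mean for numeric columns, most frequent value for everything else.
        fills[col] = mean(observed) if numeric else Counter(observed).most_common(1)[0][0]
    # Build new dicts so the caller's raw data is left untouched.
    return [
        {col: (fills[col] if value is None else value) for col, value in row.items()}
        for row in rows
    ]


raw = [
    {"age": 30, "smoker": "no"},
    {"age": None, "smoker": "yes"},
    {"age": 50, "smoker": None},
    {"age": 40, "smoker": "no"},
]
clean = impute_simple(raw)
print(clean[1]["age"], clean[2]["smoker"])  # 40 no
```

Mean imputation shrinks the variance of the filled columns and weakens their correlations with other columns, and a generator trained on the result will reproduce those distortions. That is a reason to prefer a dedicated imputer for real data.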

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 1

Star History

  • 16 stars in the last 30 days
