Framework for generating high-quality structured tabular data
This framework generates high-quality structured tabular synthetic data for data sharing, model training, and system testing. The synthetic data preserves the characteristics of the original data without containing its sensitive information. It targets data scientists and engineers who need privacy-preserving data solutions.
How It Works
SDG integrates multiple statistical and LLM-based synthesis algorithms, including CTGAN for billion-scale data and a novel LLM-based model for zero-shot generation and off-table feature inference. A Data Processor module handles pre- and post-processing for various data types, null values, and custom transformations, which improves data quality and compatibility with the synthesis models.
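The following minimal pure-Python sketch illustrates the pre-process, synthesize, post-process flow described above. Everything in it is hypothetical: `NullAndCategoryProcessor` and `MarginalSampler` are illustrative names, not part of the sdgx API, and the sampler is a deliberately naive stand-in (independent sampling from each column's empirical distribution) rather than CTGAN or the LLM model. It shows only how a processor can fill nulls and encode categories before modelling, then map sampled codes back to the original labels.

```python
import random
from collections import Counter


# Hypothetical stand-in for a Data Processor (NOT the sdgx API):
# fills nulls in categorical columns and maps labels to integer codes
# before modelling, then reverses the mapping after sampling.
class NullAndCategoryProcessor:
    def __init__(self, categorical_columns):
        self.categorical_columns = list(categorical_columns)
        self.fill_values = {}
        self.encodings = {}
        self.decodings = {}

    def fit(self, rows):
        for col in self.categorical_columns:
            observed = [r[col] for r in rows if r[col] is not None]
            # The most frequent non-null value replaces nulls.
            self.fill_values[col] = Counter(observed).most_common(1)[0][0]
            categories = sorted(set(observed))
            self.encodings[col] = {c: i for i, c in enumerate(categories)}
            self.decodings[col] = dict(enumerate(categories))
        return self

    def convert(self, rows):
        """Pre-processing: fill nulls, then encode labels as integers."""
        out = []
        for r in rows:
            new = dict(r)
            for col in self.categorical_columns:
                value = new[col] if new[col] is not None else self.fill_values[col]
                new[col] = self.encodings[col][value]
            out.append(new)
        return out

    def reverse_convert(self, rows):
        """Post-processing: map integer codes back to the original labels."""
        out = []
        for r in rows:
            new = dict(r)
            for col in self.categorical_columns:
                new[col] = self.decodings[col][new[col]]
            out.append(new)
        return out


# Hypothetical toy synthesizer: samples every column independently from
# its empirical values, so it ignores correlations between columns.
class MarginalSampler:
    def fit(self, rows):
        self.columns = {col: [r[col] for r in rows] for col in rows[0]}
        return self

    def sample(self, n, seed=None):
        rng = random.Random(seed)
        return [
            {col: rng.choice(values) for col, values in self.columns.items()}
            for _ in range(n)
        ]


# Usage: a tiny table with one null in a categorical column.
real = [
    {"city": "Beijing", "age": 34},
    {"city": None, "age": 29},
    {"city": "Shanghai", "age": 41},
    {"city": "Beijing", "age": 25},
]
processor = NullAndCategoryProcessor(["city"]).fit(real)
model = MarginalSampler().fit(processor.convert(real))
synthetic = processor.reverse_convert(model.sample(5, seed=0))
print(synthetic)
```

Because the toy sampler draws each column independently, it keeps per-column distributions but discards the joint structure between columns; capturing that structure is the job of models such as CTGAN.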
Quick Start & Requirements
Install via Docker (docker pull idsteam/sdgx:latest) or pip (pip install sdgx).
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is under active development; features such as LLM integration and advanced data processing are recent additions. Refer to the latest documentation and examples for current usage, as capabilities may still change.