synthetic-credit-default-syncora  by syncora-ai

Synthetic financial data for credit risk modeling

Created 1 month ago
1,283 stars

Top 31.0% on SourcePulse

Project Summary

This dataset provides a high-fidelity synthetic version of the UCI Credit Card Default dataset, tailored for credit risk modeling, machine learning classification, and explainable AI in the financial sector. It targets AI engineers, developers, and financial data scientists who need realistic, privacy-safe data for training models without handling real customer records.

How It Works

The dataset is generated using Syncora.ai, a platform for creating privacy-safe synthetic data. It models real-world financial behavior from Taiwan, preserving the statistical properties and feature relationships of the original UCI dataset. According to the README, this approach yields 0% privacy leakage while maintaining high utility for machine learning tasks, which makes the data suitable for sensitive financial applications.

Quick Start & Requirements

  • Data can be accessed directly from the repository.
  • No specific software prerequisites are mentioned beyond standard data science libraries (e.g., Pandas, Scikit-learn) for analysis.
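A minimal sketch of a first pass with Pandas and Scikit-learn is shown here. The column names, the `default` target name, and the inline sample rows are assumptions made for illustration (they follow the original UCI naming style, not a confirmed schema of this repository), and `load_credit_data` and `fit_baseline` are hypothetical helpers. Swap the in-memory sample for the path of the file downloaded from the repository.

```python
import io

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in rows so the sketch runs without the real file.
# Column names are assumed, not taken from the repository.
SAMPLE_CSV = """LIMIT_BAL,AGE,PAY_0,BILL_AMT1,PAY_AMT1,default
20000,24,2,3913,0,1
120000,26,-1,2682,0,1
90000,34,0,29239,1518,0
50000,37,0,46990,2000,0
50000,57,-1,8617,2000,0
50000,37,0,64400,2500,0
500000,29,0,367965,55000,0
100000,23,0,11876,380,0
140000,28,0,11285,3329,1
20000,35,-2,0,0,1
200000,34,0,11073,2306,1
260000,51,-1,12261,21818,1
"""


def load_credit_data(source, target="default"):
    """Read a CSV (path or file-like) and split it into features and target."""
    frame = pd.read_csv(source)
    return frame.drop(columns=[target]), frame[target]


def fit_baseline(X, y, seed=0):
    """Fit a scaled logistic regression and return it with held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


X, y = load_credit_data(io.StringIO(SAMPLE_CSV))
model, accuracy = fit_baseline(X, y)
print(f"rows={len(X)} features={X.shape[1]} held-out accuracy={accuracy:.2f}")
```

With only a dozen stand-in rows the accuracy figure is meaningless; the point is the load, split, and fit sequence, which carries over unchanged to the full file.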

Highlighted Details

  • High similarity to real-world data distributions.
  • 0% privacy leakage, 100% synthetic.
  • Preserves feature relationships for ML-readiness.
  • Suitable for binary classification, feature engineering, XAI, and model benchmarking.
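The README does not say how privacy leakage is measured. One narrow proxy, sketched here as an assumption rather than as Syncora.ai's method, is the share of synthetic rows that appear verbatim in the real data. The function name `exact_match_rate` and the sample tuples are hypothetical.

```python
def exact_match_rate(real_rows, synthetic_rows):
    """Fraction of synthetic rows that also occur verbatim in the real data."""
    if not synthetic_rows:
        return 0.0
    real_set = {tuple(row) for row in real_rows}
    hits = sum(1 for row in synthetic_rows if tuple(row) in real_set)
    return hits / len(synthetic_rows)


# Hypothetical (LIMIT_BAL, AGE, default) records, for illustration only.
real = [(20000, 24, 1), (120000, 26, 0), (90000, 34, 0)]
synthetic = [(21000, 25, 1), (120000, 26, 0), (88000, 33, 0), (50000, 41, 1)]

rate = exact_match_rate(real, synthetic)
print(f"exact-match rate: {rate:.0%}")
```

A rate of zero on this check rules out only literal copies. It says nothing about near-duplicates or membership inference, which need distance-based or attack-based tests.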

Maintenance & Community

  • Developed by Syncora.ai.
  • No specific community links or roadmap are provided in the README.

Licensing & Compatibility

  • The README describes the dataset as 100% synthetic and "safe for public use" in education, research, open-source contributions, and AI development.
  • No specific license is mentioned, so formal reuse terms are not stated.

Limitations & Caveats

The dataset derives from UCI data collected in Taiwan in 2005, so even though the records are synthetic, its applicability to current, rapidly evolving financial markets should be validated before use. The README does not detail the generation parameters or validation metrics used by Syncora.ai.
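Since no validation metrics are published, users may want to run their own distribution checks. The sketch below computes a two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs) for a single numeric column. It is a generic check chosen for illustration, not a metric the README reports, and the age values are made up.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: maximum gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # Both empirical CDFs are step functions, so the maximum gap
    # is reached at one of the observed values.
    for x in set(a_sorted) | set(b_sorted):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical AGE values from a real and a synthetic sample.
real_age = [24, 26, 34, 37, 57]
synthetic_age = [25, 26, 33, 38, 55]

ks = ks_statistic(real_age, synthetic_age)
print(f"KS statistic for AGE: {ks:.2f}")
```

Values near 0 indicate closely matching marginal distributions and values near 1 indicate almost no overlap. The check covers one column at a time, so feature relationships need a separate comparison, for example of correlation matrices.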

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days
