SDK for statistical significance testing of deep neural networks
This library provides statistical significance testing for deep neural networks, addressing the common issue of drawing conclusions from single performance scores rather than rigorous statistical analysis. It is targeted at machine learning practitioners and researchers who need to reliably compare model performance, offering methods to mitigate the impact of stochastic factors and hyperparameter sensitivity inherent in deep learning.
How It Works
The core of the library is the "Almost Stochastic Order" (ASO) test, which compares score distributions without assuming a specific parametric form. Unlike p-value-based tests, ASO quantifies the extent to which stochastic dominance of one distribution over another is violated, summarized in a single score, $\epsilon_\text{min}$. Lower $\epsilon_\text{min}$ values indicate higher confidence that one model outperforms the other. The library also includes traditional bootstrap and permutation tests, along with Bonferroni correction for multiple comparisons and bootstrap power analysis for sample-size determination.
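A minimal sketch of such a comparison, assuming `deepsig` exposes `aso` and a `bootstrap_test` helper as named in its documentation (the exact signatures and the `seed` argument may differ across versions):

```python
import numpy as np
from deepsig import aso, bootstrap_test  # bootstrap_test name assumed from the package docs

# Simulated per-run test scores for two models (e.g., accuracies over 20 random seeds).
rng = np.random.default_rng(0)
scores_a = rng.normal(loc=0.80, scale=0.03, size=20)
scores_b = rng.normal(loc=0.77, scale=0.03, size=20)

# ASO: eps_min close to 0 favors model A, 0.5 indicates no stochastic order.
eps_min = aso(scores_a, scores_b, seed=0)

# Classical alternative included in the library: a bootstrap test returning a p-value.
p_value = bootstrap_test(scores_a, scores_b)

print(f"eps_min = {eps_min:.3f}, bootstrap p-value = {p_value:.3f}")
```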
Quick Start & Requirements

```
pip3 install deepsig
```

The `aso()` function is the primary interface for comparing two sets of scores; `multi_aso()` handles comparisons across multiple models (see the sketch below).
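A hedged usage sketch of the two entry points; the mapping-style input to `multi_aso()` and the `num_jobs` argument are assumptions based on the package's README and may vary by version:

```python
import numpy as np
from deepsig import aso, multi_aso

rng = np.random.default_rng(42)
scores = {
    "baseline": rng.normal(0.75, 0.02, size=15),
    "model_a": rng.normal(0.78, 0.02, size=15),
    "model_b": rng.normal(0.79, 0.02, size=15),
}

# Pairwise comparison: does model_a's score distribution dominate the baseline's?
eps_min = aso(scores["model_a"], scores["baseline"])

# All-vs-all comparison: one eps_min score per ordered model pair.
# Computations can be parallelized via joblib, e.g. through the num_jobs
# argument documented for these functions (argument name assumed).
eps_min_matrix = multi_aso(scores, num_jobs=2)
```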
Highlighted Details

Computations can be parallelized via joblib for faster results (see the `num_jobs` note in the sketch above).

Maintenance & Community
The project is actively maintained, with contributions from the NLPnorth group at the IT University of Copenhagen. The README links to several papers that have used the library, indicating ongoing adoption and research use.
Licensing & Compatibility
The library is released under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The README emphasizes that conclusions drawn from significance tests are only as reliable as the number of scores collected. It also notes that while ASO is generally preferred over traditional tests in deep learning settings, the choice of decision threshold for $\epsilon_\text{min}$ affects Type I error rates; declaring dominance only when $\epsilon_\text{min} < \tau$ with $\tau = 0.2$ is recommended for more confident conclusions.
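To make that threshold concrete, a hedged reading of the decision rule (illustrative values, not the library's mandated procedure):

```python
eps_min = 0.12  # hypothetical result of aso(scores_a, scores_b)
tau = 0.2       # stricter threshold recommended for more confident conclusions

if eps_min < tau:
    print("Treat model A as almost stochastically dominant over model B.")
else:
    print("No confident conclusion about dominance at this threshold.")
```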