deep-significance by Kaleidophon

SDK for statistical significance testing of deep neural networks

created 4 years ago · 336 stars · Top 83.0% on sourcepulse

Project Summary

This library provides statistical significance testing for deep neural networks, addressing the common issue of drawing conclusions from single performance scores rather than rigorous statistical analysis. It is targeted at machine learning practitioners and researchers who need to reliably compare model performance, offering methods to mitigate the impact of stochastic factors and hyperparameter sensitivity inherent in deep learning.

How It Works

The core of the library is the "Almost Stochastic Order" (ASO) test, which compares two score distributions without assuming they follow any particular parametric form. Instead of producing a p-value, ASO quantifies how strongly stochastic dominance of one distribution over the other is violated, summarized in a score ($\epsilon_\text{min}$): values close to 0 indicate high confidence that one model outperforms the other, while values near 1 indicate the opposite ordering. The library also implements traditional bootstrap and permutation tests, Bonferroni correction for multiple comparisons, and bootstrap power analysis for determining how many scores to collect.
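The stochastic-dominance idea behind ASO can be illustrated with a toy quantile comparison. This is a simplified sketch of the concept only, not deepsig's actual $\epsilon_\text{min}$ estimator, and `violation_ratio` is a made-up name:

```python
import numpy as np

def violation_ratio(scores_a, scores_b, num_quantiles=1000):
    """Toy measure of how often model A fails to dominate model B.

    Simplified illustration of the idea behind ASO: compare the two
    empirical quantile functions and report the fraction of quantile
    levels where A's scores fall below B's. 0.0 means A dominates
    everywhere; 1.0 means B dominates everywhere.
    """
    qs = np.linspace(0.0, 1.0, num_quantiles)
    quantiles_a = np.quantile(scores_a, qs)
    quantiles_b = np.quantile(scores_b, qs)
    return float(np.mean(quantiles_a < quantiles_b))
```

Clearly separated score distributions push the ratio toward 0 (or 1), while heavily overlapping distributions yield intermediate values, mirroring how lower $\epsilon_\text{min}$ values signal more confident superiority.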

Quick Start & Requirements

  • Installation: pip3 install deepsig
  • Dependencies: Python and NumPy; PyTorch, TensorFlow, and Jax are supported for tensor compatibility.
  • Usage: The aso() function is the primary interface for comparing two sets of scores. multi_aso() handles comparisons across multiple models.

Highlighted Details

  • Implements "Almost Stochastic Order" (ASO) for robust model comparison, addressing non-convex loss landscapes and stochastic factors in deep learning.
  • Supports PyTorch, TensorFlow, Jax, and NumPy arrays directly.
  • Includes functions for Bonferroni correction, bootstrap power analysis, and uncertainty reduction estimation.
  • Offers multi-threading support via joblib for faster computations.
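As a reminder of what the Bonferroni correction does, here is the standard formula as a generic sketch; the function name is made up and this is not deepsig's own helper:

```python
def bonferroni_corrected_alpha(alpha, num_comparisons):
    # Bonferroni correction: when running several significance tests at
    # once, divide the significance level by the number of comparisons
    # so the family-wise error rate stays at most alpha.
    return alpha / num_comparisons

# Comparing 3 models pairwise yields 3 tests at alpha = 0.05
print(round(bonferroni_corrected_alpha(0.05, 3), 4))  # → 0.0167
```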

Maintenance & Community

The project is actively maintained, with contributions from the NLPnorth group at the IT University of Copenhagen. The README links to several papers that have used the library, indicating ongoing adoption in research.

Licensing & Compatibility

The library is released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The README emphasizes that conclusions drawn from significance tests are only as reliable as the number of scores collected. It also notes that while ASO is generally preferred over traditional tests in deep learning, the choice of rejection threshold $\tau$ for $\epsilon_\text{min}$ affects Type I error rates; $\tau < 0.2$ is recommended for more confident conclusions.
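The sample-size caveat can be made concrete with a quick numpy simulation (illustrative only, unrelated to deepsig's power-analysis implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate per-run scores for a model whose true mean score is 0.80 with
# run-to-run noise of 0.05, and watch the spread of the estimated mean
# shrink as more runs are collected.
stds = {}
for num_runs in (3, 10, 50):
    means = [rng.normal(0.80, 0.05, num_runs).mean() for _ in range(2000)]
    stds[num_runs] = float(np.std(means))
    print(f"{num_runs:2d} runs: std of mean estimate = {stds[num_runs]:.4f}")
```

With only a handful of runs, the estimated mean score fluctuates enough to flip the apparent ranking of two similar models; collecting more scores narrows that spread.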

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days
