tsfresh by blue-yonder

Python package for time series feature extraction

created 8 years ago
8,876 stars

Top 5.8% on sourcepulse

Project Summary

tsfresh is a Python package designed for automatic feature extraction from time series data, targeting data scientists and researchers. It aims to reduce the time spent on manual feature engineering by extracting hundreds of descriptive features and then filtering out irrelevant ones using a statistically sound hypothesis testing framework, enabling more efficient model building and analysis.

How It Works

The package systematically extracts a wide array of features from each time series, covering statistical, signal processing, and nonlinear dynamics measures. Its core innovation is a built-in feature selection mechanism based on hypothesis testing: each extracted feature is tested for relevance to the given regression or classification target, and only features that pass are retained, which keeps the share of irrelevant features among those selected under control.
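
The filtering step described above can be illustrated with a small standalone sketch. The summary does not name the exact multiple-testing procedure, so the code below assumes a Benjamini-Yekutieli step-up rule, one common way to bound the false discovery rate under dependence. The function name `select_relevant`, the feature names, and the p-values are hypothetical and are not part of the tsfresh API.

```python
from typing import Dict, List


def select_relevant(p_values: Dict[str, float], fdr_level: float = 0.05) -> List[str]:
    """Keep features whose p-values pass a Benjamini-Yekutieli step-up test.

    Assumption: one p-value per feature has already been computed by testing
    that feature against the target. This is an illustrative sketch only.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic correction c(m) = 1 + 1/2 + ... + 1/m, used by the
    # Benjamini-Yekutieli variant to stay valid for dependent tests.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Rank features from most to least significant.
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    # Find the largest rank k with p_(k) <= k / (m * c(m)) * fdr_level.
    cutoff = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k / (m * c_m) * fdr_level:
            cutoff = k
    # Everything up to and including that rank is kept.
    return [name for name, _ in ranked[:cutoff]]


# Hypothetical p-values for four extracted features.
example = {"mean": 0.0001, "variance": 0.004, "noise_a": 0.6, "noise_b": 0.9}
print(select_relevant(example))
```

With these made-up inputs, the two features with small p-values are kept and the two noise features are dropped. Lowering `fdr_level` makes the filter stricter.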

Quick Start & Requirements

  • Install: pip install tsfresh
  • For matrixprofile compatibility: conda create --name tsfresh__py_3.8 python=3.8 && conda activate tsfresh__py_3.8 && pip install tsfresh[matrixprofile]
  • Docker image: docker pull nbraun/tsfresh
  • Documentation: http://tsfresh.readthedocs.io

Highlighted Details

  • Extracts hundreds of features, including basic statistics and complex nonlinear dynamics measures.
  • Employs a statistically controlled filtering procedure to remove irrelevant features.
  • Compatible with scikit-learn, pandas, and numpy.
  • Supports adding custom features.
  • Can run on local machines or clusters.

Maintenance & Community

The project has received funding from the German Federal Ministry of Education and Research. Contribution guidelines are available for those interested in expanding the library.

Licensing & Compatibility

The README does not explicitly state the license. The repository itself is distributed under the MIT license, a permissive license that allows commercial use.

Limitations & Caveats

Reproducing features computed with older matrixprofile calculators requires a dedicated Python 3.8 environment. The filtering mechanism is geared toward supervised tasks (regression and classification), although the README also cites a paper on unsupervised anomaly detection.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 3
  • Issues (30d): 0
  • Star history: 154 stars in the last 90 days
