mljar-supervised by mljar

Automated ML pipelines for tabular data

Created 7 years ago
3,250 stars

Top 14.6% on SourcePulse

Project Summary

mljar-supervised is an Automated Machine Learning (AutoML) package for tabular data. It aims to reduce the time data scientists spend on repetitive tasks such as data preprocessing, model selection, hyperparameter tuning, and report generation. By handling these steps automatically, it lets users build, understand, and deploy ML models more efficiently.

How It Works

This Python package offers four modes (Explain, Perform, Compete, Optuna), each tailored to a different user need. It integrates tree-based models (Random Forest, LightGBM, XGBoost), linear models, and neural networks. Core functionality includes feature engineering (e.g., Golden Features, text/time transformations), hyperparameter optimization via random search with hill climbing or the Optuna framework, and ensembling through greedy selection and stacking. A key differentiator is its focus on model explainability: decision tree visualizations, SHAP values, and permutation importance are compiled automatically into Markdown reports.
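
To make the greedy ensembling idea concrete, here is a minimal pure-Python sketch of greedy ensemble selection with replacement. It is an illustration of the general technique only, not the package's implementation; the names `greedy_ensemble`, `mse`, and `average`, and the use of mean squared error on out-of-fold predictions, are assumptions made for this example.

```python
from typing import Dict, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average(preds: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of several prediction vectors."""
    return [sum(col) / len(preds) for col in zip(*preds)]


def greedy_ensemble(
    oof_preds: Dict[str, Sequence[float]],  # hypothetical: model name -> out-of-fold predictions
    y_true: Sequence[float],
    max_steps: int = 10,
) -> List[str]:
    """Repeatedly add the model that most improves the averaged prediction.

    A model may be picked more than once, which gives it a larger weight.
    Selection stops when no candidate lowers the error any further.
    """
    selected: List[str] = []
    best_score = float("inf")
    for _ in range(max_steps):
        step_name, step_score = None, best_score
        for name, preds in oof_preds.items():
            trial = [oof_preds[n] for n in selected] + [preds]
            score = mse(y_true, average(trial))
            if score < step_score:
                step_name, step_score = name, score
        if step_name is None:  # no candidate improved the ensemble
            break
        selected.append(step_name)
        best_score = step_score
    return selected


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    candidates = {
        "a": [1.5, 2.5, 3.5, 4.5],  # biased upward
        "b": [0.5, 1.5, 2.5, 3.5],  # biased downward
        "c": [4.0, 4.0, 4.0, 4.0],  # constant guess
    }
    print(greedy_ensemble(candidates, y))
```

In the toy data, the two oppositely biased models cancel out when averaged, which is the kind of complementarity greedy selection is meant to find.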

Quick Start & Requirements
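
This page does not list installation steps or requirements. As a hedged sketch, the package is published on PyPI as `mljar-supervised` and exposes an `AutoML` class in `supervised.automl`; the wrapper name `train_automl` and the `"automl_results"` directory below are hypothetical choices for this example, and the default mode shown is simply the first of the four modes described above.

```python
# Install first (shell): pip install mljar-supervised


def train_automl(X_train, y_train, X_test, mode="Explain", results_path="automl_results"):
    """Fit AutoML in the given mode and return predictions for X_test.

    `mode` is one of "Explain", "Perform", "Compete", or "Optuna".
    `results_path` is the directory where models and reports are written
    (the name used here is a placeholder).
    """
    # Imported lazily so this module loads even if the package is missing.
    from supervised.automl import AutoML

    automl = AutoML(mode=mode, results_path=results_path)
    automl.fit(X_train, y_train)
    return automl.predict(X_test)
```

The inputs are expected to be tabular, for example a pandas DataFrame of features and a Series of targets.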

Highlighted Details

  • Supports a diverse set of algorithms: Baseline, Linear, Random Forest, Extra Trees, LightGBM, XGBoost, CatBoost, Neural Networks, and Nearest Neighbors.
  • Comprehensive feature engineering capabilities, including imputation, categorical encoding, Golden Features, and text/time transformations.
  • Extensive explainability features: automatic generation of SHAP plots, permutation importance, and visualization of simple decision trees.
  • Automatic generation of detailed Markdown reports for each AutoML experiment, including model performance, metrics, and visualizations.
  • Fairness-aware training (v1.0.0+) allows optimization with sensitive features and includes bias mitigation techniques.
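
Of the explainability features listed above, permutation importance is the simplest to sketch: shuffle one feature column, re-score the model, and record how much the error grows. The pure-Python version below is a generic illustration under stated assumptions, not the package's code; `permutation_importance`, `mean_abs_error`, and the choice of mean absolute error are all hypothetical names and choices for this example.

```python
import random
from typing import Callable, List, Sequence


def mean_abs_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[List[float]]], List[float]],
    X: List[List[float]],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Return the mean increase in error per feature after shuffling it."""
    rng = random.Random(seed)
    baseline = mean_abs_error(y, predict(X))
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            X_perm = [
                list(row[:col]) + [value] + list(row[col + 1:])
                for row, value in zip(X, shuffled)
            ]
            increases.append(mean_abs_error(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy model that only looks at the first feature.
    X = [[float(i), float((i * 7) % 5)] for i in range(20)]
    y = [2.0 * row[0] for row in X]
    model = lambda rows: [2.0 * r[0] for r in rows]
    print(permutation_importance(model, X, y))
```

A feature the model ignores scores zero, because shuffling it leaves every prediction unchanged.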

Maintenance & Community

The project is developed by MLJAR. The README does not detail active contributors, community channels (such as Discord or Slack), or sponsorships.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license permits commercial use and integration into closed-source projects, provided the copyright and license notice is retained.

Limitations & Caveats

The README does not explicitly list known limitations or alpha status. For the Optuna mode, it's noted that only the best model is saved after tuning, not intermediate models explored during the search.

Health Check
  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 5
  • Star history: 7 stars in the last 30 days
