mljar-supervised by mljar

Automated ML pipelines for tabular data

Created 7 years ago

3,273 stars

Top 14.2% on SourcePulse

5 Experts Love This Project

Jared Palmer

SVP at GitHub; Founder of Turborepo; Author of Formik, TSDX

Jeff Hammerbacher

Cofounder of Cloudera

Luis Capelo

Cofounder of Lightning AI

Casper Hansen

Author of AutoAWQ

and 1 more!

Project Summary

mljar-supervised is an Automated Machine Learning (AutoML) package for tabular data. It aims to reduce the time data scientists spend on repetitive tasks such as data preprocessing, model selection, hyperparameter tuning, and report generation. The framework abstracts these steps so that users can build, understand, and deploy ML models more efficiently.

How It Works

This Python package offers four distinct modes (Explain, Perform, Compete, Optuna), each tailored to a different user need. It integrates a wide array of algorithms, including tree-based models (Random Forest, LightGBM, XGBoost), linear models, and neural networks. Core functionality includes feature engineering (e.g., Golden Features, text/time transformations), hyperparameter optimization via random search with hill climbing or the Optuna framework, and ensembling via a greedy algorithm and stacking. A key differentiator is its focus on model explainability: decision tree visualizations, SHAP values, and permutation importance are all compiled automatically into Markdown reports.
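To make "random search with hill climbing" concrete, here is a minimal, standard-library sketch of the general idea: sample a few random configurations, then repeatedly nudge one parameter of the best one and keep the change only if the score improves. This is not mljar-supervised's implementation. The search space, the `toy_score` objective, and all function names are hypothetical stand-ins for real model training.

```python
import random


def random_search_then_hill_climb(score, space, n_random=10, n_steps=20, seed=0):
    """Sample random configs, then nudge one parameter of the best one at a time."""
    rng = random.Random(seed)

    # Phase 1: random search over the discrete grid.
    candidates = [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_random)]
    best = max(candidates, key=score)
    best_score = score(best)
    random_best_score = best_score

    # Phase 2: hill climbing. Move one parameter to a neighbouring grid value
    # and keep the move only when the score strictly improves.
    for _ in range(n_steps):
        name = rng.choice(sorted(space))
        values = space[name]
        idx = values.index(best[name])
        new_idx = min(max(idx + rng.choice([-1, 1]), 0), len(values) - 1)
        trial = dict(best, **{name: values[new_idx]})
        trial_score = score(trial)
        if trial_score > best_score:
            best, best_score = trial, trial_score

    return best, best_score, random_best_score


# Hypothetical search space (not taken from the library).
SPACE = {
    "max_depth": [2, 3, 4, 5, 6, 7, 8],
    "learning_rate": [0.01, 0.03, 0.1, 0.3],
}


def toy_score(cfg):
    # Toy objective that peaks at max_depth=5, learning_rate=0.1; higher is better.
    return -abs(cfg["max_depth"] - 5) - 10 * abs(cfg["learning_rate"] - 0.1)


best_cfg, best_score, random_best = random_search_then_hill_climb(toy_score, SPACE)
print(best_cfg, best_score)
```

Because a move is accepted only when it improves the score, the hill-climbing phase can never end up worse than the best random sample. It can only refine it.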

Quick Start & Requirements

Primary install: pip install mljar-supervised
Prerequisites: Python >=3.9, NumPy >=2.0,<3.
Documentation: https://supervised.mljar.com/
Source Code: https://github.com/mljar/mljar-supervised
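After installation, a typical session might look like the sketch below. It assumes the package exposes `AutoML` from `supervised.automl` with `mode`, `total_time_limit`, and `results_path` constructor arguments plus `fit` and `predict` methods. Check the documentation linked above for the current signature. The helper names `build_automl_kwargs` and `train_and_predict` are hypothetical and exist only for this illustration.

```python
# The four modes named in the summary above.
VALID_MODES = ("Explain", "Perform", "Compete", "Optuna")


def build_automl_kwargs(mode="Explain", total_time_limit=300, results_path=None):
    """Validate the mode and assemble constructor arguments (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    kwargs = {"mode": mode, "total_time_limit": total_time_limit}
    if results_path is not None:
        kwargs["results_path"] = results_path
    return kwargs


def train_and_predict(X_train, y_train, X_test, **options):
    """Fit AutoML on the training data and return predictions for X_test."""
    # Imported lazily so this file can be loaded without the package installed.
    from supervised.automl import AutoML

    automl = AutoML(**build_automl_kwargs(**options))
    automl.fit(X_train, y_train)
    return automl.predict(X_test)
```

With a pandas DataFrame split into features and target, a call such as `train_and_predict(X_train, y_train, X_test, mode="Explain")` would train within the time limit and return predictions for `X_test`.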

Highlighted Details

Supports a diverse set of algorithms: Baseline, Linear, Random Forest, Extra Trees, LightGBM, XGBoost, CatBoost, Neural Networks, and Nearest Neighbors.
Comprehensive feature engineering capabilities, including imputation, categorical encoding, Golden Features, and text/time transformations.
Extensive explainability features: automatic generation of SHAP plots, permutation importance, and visualization of simple decision trees.
Automatic generation of detailed Markdown reports for each AutoML experiment, including model performance, metrics, and visualizations.
Fairness-aware training (v1.0.0+) allows optimization with sensitive features and includes bias mitigation techniques.
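Of the explainability methods listed above, permutation importance is the simplest to illustrate: shuffle one feature column, re-score the model, and treat the drop in the metric as that feature's importance. The standard-library sketch below shows the general technique only, not the library's own code. The toy data, the one-line "model", and the function names are assumptions made for the example.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Return the mean drop in `metric` when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle a single column while leaving the others untouched.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - metric(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: the label equals feature 0; feature 1 is unrelated noise.
X = [[i % 2, (i * 7) % 5] for i in range(40)]
y = [row[0] for row in X]


def model(row):
    # Hypothetical "model" that looks only at feature 0.
    return row[0]


scores = permutation_importance(model, X, y, accuracy)
print(scores)
```

Shuffling feature 0 breaks the predictions, so its importance is positive. Shuffling feature 1 changes nothing the model looks at, so its importance is exactly zero.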

Maintenance & Community

The project is developed by MLJAR. Specific details regarding active contributors, community channels (like Discord/Slack), or sponsorships are not explicitly detailed in the provided README.

Licensing & Compatibility

License: MIT License.
Compatibility: The MIT license permits commercial use and integration into closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

The README does not explicitly list known limitations or flag alpha status. For the Optuna mode, it notes that only the best model is saved after tuning; intermediate models explored during the search are not kept.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

11 stars in the last 30 days