GPBoost by fabsig

Tree-boosting combined with Gaussian process and mixed effects models

Created 5 years ago
634 stars

Top 52.3% on SourcePulse

Project Summary

GPBoost combines tree-boosting with Gaussian process and mixed effects models to improve prediction accuracy and model flexibility. It targets data scientists and researchers working with tabular data who need to incorporate non-linearities, complex interactions, and spatial or grouped dependencies, offering a unified framework that bridges the gap between traditional boosting and latent Gaussian models.

How It Works

GPBoost models the response variable as a sum of a non-linear function (an ensemble of trees) and latent Gaussian effects (Gaussian processes or grouped random effects). This approach leverages the high predictive power of tree-boosting for fixed effects while incorporating the dependency structures and uncertainty quantification capabilities of Gaussian processes and mixed effects models. The algorithms iteratively learn covariance parameters and update the tree ensemble using gradient and Newton boosting steps.
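The decomposition described above can be made concrete with a small simulation. This is a minimal sketch of the data-generating model only, for the grouped random effects case: y = F(x) + b_group + noise. It does not use or reproduce the GPBoost API. The names fixed_effect and simulate, the shape of F, and the variance values are all assumptions chosen for illustration.

```python
import math
import random


def fixed_effect(x):
    """Hypothetical non-linear mean function F(x).

    In GPBoost this role is played by the learned tree ensemble;
    here it is a fixed formula chosen only for illustration.
    """
    return math.sin(4.0 * x) + 2.0 * x


def simulate(n_groups=20, n_per_group=10, sigma_b=1.0, sigma_eps=0.5, seed=0):
    """Draw data from y = F(x) + b_group + eps (grouped random effects).

    Returns the list of group effects and a list of (group, x, y) rows.
    All parameter values are assumptions, not GPBoost defaults.
    """
    rng = random.Random(seed)
    # One latent Gaussian effect per group, shared by all rows in that group.
    b = [rng.gauss(0.0, sigma_b) for _ in range(n_groups)]
    rows = []
    for g in range(n_groups):
        for _ in range(n_per_group):
            x = rng.random()                   # single feature in [0, 1)
            eps = rng.gauss(0.0, sigma_eps)    # independent error term
            y = fixed_effect(x) + b[g] + eps   # sum of the three parts
            rows.append((g, x, y))
    return b, rows


b, rows = simulate()
print(len(b), "groups,", len(rows), "observations")
```

Rows that share a group share the same b value, so their responses are correlated even after F(x) is accounted for. A booster that treats all rows as independent cannot represent that dependence. In the sketch F and the variances are fixed by hand, whereas GPBoost learns the tree ensemble and estimates the covariance parameters from data.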

Quick Start & Requirements

  • Installation: pip install gpboost (Python), or R CMD INSTALL gpboost_0.1.0.tar.gz to install the R package from a source tarball.
  • Prerequisites: Python 3.6+ or R 3.3+.
  • Resources: No specific hardware requirements are stated; run time and memory use grow with data size and model complexity.
  • Documentation: https://gpboost.readthedocs.io

Highlighted Details

  • Combines non-parametric tree-boosting with Gaussian processes and grouped random effects.
  • Handles non-Gaussian likelihoods via the LaGaBoost algorithm.
  • Can improve prediction accuracy over boosting that treats observations as independent, and over linear mixed models (LMMs) or GPs used alone.
  • Supports modeling of high-cardinality categorical variables and spatial/spatio-temporal data.

Maintenance & Community

The project is primarily developed by Fabio Sigrist. Companion articles were published in JMLR and TPAMI in October 2022. Open issues on GitHub include requests for ONNX conversion, multivariate models, areal models, multiclass classification, sample weights, and GPU support for GPs.

Licensing & Compatibility

Licensed under the Apache License 2.0. This permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is under active development, with several features listed as "open issues" or methodological improvements, including GPU support, multivariate models, and specific spatial models like CAR/SAR.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 8 stars in the last 30 days
