GPBoost by fabsig

Tree-boosting combined with Gaussian process and mixed effects models

created 5 years ago
623 stars

Top 53.8% on sourcepulse

Project Summary

GPBoost combines tree-boosting with Gaussian process and mixed effects models to improve prediction accuracy and model flexibility. It targets data scientists and researchers working with tabular data who need to incorporate non-linearities, complex interactions, and spatial or grouped dependencies, offering a unified framework that bridges the gap between traditional boosting and latent Gaussian models.

How It Works

GPBoost models the response variable as a sum of a non-linear function (an ensemble of trees) and latent Gaussian effects (Gaussian processes or grouped random effects). This approach leverages the high predictive power of tree-boosting for fixed effects while incorporating the dependency structures and uncertainty quantification capabilities of Gaussian processes and mixed effects models. The algorithms iteratively learn covariance parameters and update the tree ensemble using gradient and Newton boosting steps.
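The decomposition above can be illustrated with a small standard-library sketch for the simplest case, a grouped random intercept: y = F(x) + b_group + noise. This is a toy under stated assumptions, not GPBoost's algorithm or API. The function names (fixed_effect, simulate, predict_group_effects) are hypothetical, a fixed step function stands in for the learned tree ensemble, and both variances are treated as known, whereas GPBoost estimates the covariance parameters and the trees jointly.

```python
import random
from collections import defaultdict


def fixed_effect(x):
    # Hypothetical stand-in for a learned tree ensemble: a single split at x = 0.
    return 2.0 if x > 0 else -1.0


def simulate(n_groups=20, n_per_group=30, sigma_b=1.0, sigma_eps=0.5, seed=0):
    """Draw (x, group, y) rows from y = F(x) + b_group + noise."""
    rng = random.Random(seed)
    true_effects = [rng.gauss(0.0, sigma_b) for _ in range(n_groups)]
    rows = []
    for g in range(n_groups):
        for _ in range(n_per_group):
            x = rng.uniform(-2.0, 2.0)
            y = fixed_effect(x) + true_effects[g] + rng.gauss(0.0, sigma_eps)
            rows.append((x, g, y))
    return rows, true_effects


def predict_group_effects(rows, f, sigma_b, sigma_eps):
    """Posterior mean of each group's random intercept, given F and known variances."""
    resid_sum = defaultdict(float)
    count = defaultdict(int)
    for x, g, y in rows:
        resid_sum[g] += y - f(x)  # what the fixed-effect part leaves unexplained
        count[g] += 1
    var_b, var_eps = sigma_b ** 2, sigma_eps ** 2
    # Shrinks each group's mean residual toward zero; small groups shrink more.
    return {g: var_b * resid_sum[g] / (count[g] * var_b + var_eps) for g in resid_sum}


rows, true_effects = simulate()
estimates = predict_group_effects(rows, fixed_effect, sigma_b=1.0, sigma_eps=0.5)
worst = max(abs(estimates[g] - true_effects[g]) for g in estimates)
print(f"largest error across {len(estimates)} groups: {worst:.3f}")
```

The shrinkage factor is what separates a random effect from a per-group dummy variable: groups with few observations are pulled toward zero, which is one reason this kind of model copes with high-cardinality categorical variables.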

Quick Start & Requirements

  • Installation: pip install gpboost (Python), or R CMD INSTALL gpboost_0.1.0.tar.gz (R, from a source tarball).
  • Prerequisites: Python 3.6+ or R 3.3+.
  • Resources: No specific hardware requirements are mentioned; run time and memory use grow with data size and model complexity.
  • Documentation: https://gpboost.readthedocs.io

Highlighted Details

  • Combines non-parametric tree-boosting with Gaussian processes and grouped random effects.
  • Handles non-Gaussian likelihoods via the LaGaBoost algorithm.
  • Can offer better prediction accuracy than tree-boosting that assumes independent observations, or than linear mixed effects models (LMMs) or GPs alone.
  • Supports modeling of high-cardinality categorical variables and spatial/spatio-temporal data.

Maintenance & Community

The project is primarily developed by Fabio Sigrist. Companion articles were published in JMLR and TPAMI in October 2022. Open issues on GitHub include requests for ONNX conversion, multivariate models, areal models, multiclass classification, sample weights, and GPU support for GPs.

Licensing & Compatibility

Licensed under the Apache License 2.0. This permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is under active development, with several features listed as "open issues" or methodological improvements, including GPU support, multivariate models, and specific spatial models like CAR/SAR.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 24 stars in the last 90 days
