nba-prediction  by cmunch1

NBA game win probability predictor app

created 2 years ago
261 stars

Top 98.0% on sourcepulse

Project Summary

This project provides an end-to-end machine learning deployment for predicting NBA game win probabilities. It is aimed at people who want to demonstrate ML deployment skills or explore sports analytics. Daily predictions are served through a Streamlit app, with the stated aim of improving betting strategy profitability.

How It Works

The system trains gradient boosted tree models (XGBoost and LightGBM) on historical NBA game data, primarily from games.csv. Key features are engineered as per-team rolling statistics (e.g., average points, win streaks), computed only from previous games to avoid data leakage. Model outputs are calibrated with scikit-learn's CalibratedClassifierCV (sklearn.calibration.CalibratedClassifierCV) so that predicted probabilities better match observed win frequencies. A daily pipeline, orchestrated by GitHub Actions, scrapes new game data, updates features, and retrains the model; predictions are served through a Streamlit application.
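The leakage-avoiding rolling features described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names (GAME_DATE, TEAM_ID, PTS, WIN), the function name, and the window size are all assumptions made for the example. The key idea is to shift each team's series by one game before taking the rolling mean, so a game's own result never enters its features.

```python
import pandas as pd


def add_rolling_features(games: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Add per-team rolling features built only from earlier games.

    Column names here are hypothetical; adapt them to the real schema.
    """
    games = games.sort_values(["TEAM_ID", "GAME_DATE"]).copy()
    grouped = games.groupby("TEAM_ID")

    # shift(1) removes the current game from its own window, so each row
    # only sees games that team played strictly before it.
    games["PTS_AVG_PREV"] = grouped["PTS"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    games["WIN_RATE_PREV"] = grouped["WIN"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return games


# Tiny made-up sample: two teams, a handful of games each.
demo = pd.DataFrame(
    {
        "GAME_DATE": pd.to_datetime(
            ["2023-01-01", "2023-01-03", "2023-01-05", "2023-01-02", "2023-01-04"]
        ),
        "TEAM_ID": [1, 1, 1, 2, 2],
        "PTS": [100, 110, 120, 90, 95],
        "WIN": [1, 0, 1, 0, 1],
    }
)
features = add_rolling_features(demo, window=2)
print(features[["TEAM_ID", "GAME_DATE", "PTS_AVG_PREV", "WIN_RATE_PREV"]])
```

Each team's first game has no history, so its rolling features are NaN. Gradient boosted tree libraries such as XGBoost and LightGBM can handle missing values natively, though imputing or dropping those rows is another reasonable choice.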

Quick Start & Requirements

  • Install/Run: The project is primarily notebook-driven. Key components can be run via Python scripts or Jupyter notebooks. Deployment is via Streamlit Cloud.
  • Prerequisites: Python 3.x, Pandas, XGBoost, LightGBM, Scikit-learn, Optuna, Neptune.ai, Selenium/ScrapingAnt, BeautifulSoup, Streamlit. A ScrapingAnt account (free tier available) is recommended for production data scraping due to anti-bot measures on NBA.com.
  • Resources: Requires access to historical NBA data (provided via Kaggle, supplemented by live scraping). Experiment tracking via Neptune.ai requires an account.
  • Links: Project Repository: https://github.com/cmunch1/nba-prediction, Live App: https://cmunch1-nba-prediction.streamlit.app/

Highlighted Details

  • Demonstrates end-to-end ML deployment workflow, including data scraping, feature engineering, model training, calibration, and web app deployment.
  • Uses Shapley values to assess feature importance, and adversarial validation to assess train/test data distribution similarity.
  • Features automated daily data scraping and model retraining via GitHub Actions.
  • Includes probability calibration to ensure model outputs reflect true win probabilities.
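The probability calibration step listed above might look roughly like the sketch below. It is an illustration under stated assumptions, not the author's pipeline: the data is synthetic, scikit-learn's GradientBoostingClassifier stands in for XGBoost/LightGBM to keep the example self-contained, and the isotonic method and 5-fold setting are arbitrary choices. The Brier score is shown as one common way to check how well predicted probabilities match outcomes.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for engineered game features and home-win labels.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Stand-in model; an XGBoost or LightGBM classifier with the scikit-learn
# interface could be wrapped the same way.
base = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Cross-validated calibration: each fold fits the base model on part of the
# training data and learns a calibration mapping on the held-out part.
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

win_prob = calibrated.predict_proba(X_test)[:, 1]

# Lower Brier score means predicted probabilities sit closer to outcomes.
brier = brier_score_loss(y_test, win_prob)
print(f"Brier score on held-out data: {brier:.4f}")
```

Comparing this score against the uncalibrated model on the same held-out set is a simple way to judge whether calibration helped for a given dataset.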

Maintenance & Community

The project is a personal portfolio piece. The author notes a temporary removal of Hopsworks feature store and model registry due to stability concerns. Feedback is actively sought via LinkedIn and Twitter.

Licensing & Compatibility

The repository does not explicitly state a license. Without one, default copyright applies, so reuse rights are not granted automatically. Commercial use would require clarification of licensing terms from the author.

Limitations & Caveats

The project is described as a "work in progress" with ongoing iterations. While the model achieved 61.5% accuracy on the 2022-2023 regular season, it lags behind top public models (65.6%). The author acknowledges that a real-world betting strategy is significantly more complex than this model alone.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
