nba-prediction by cmunch1

NBA game win probability predictor app

Created 3 years ago

273 stars

Top 94.6% on SourcePulse

Project Summary

This project is an end-to-end machine learning deployment that predicts NBA game win probabilities. It is aimed at people who want to demonstrate ML deployment skills or explore sports analytics. Predictions are published daily through a Streamlit app, with the stated goal of improving betting strategy profitability.

How It Works

The system uses historical NBA game data, primarily from games.csv, to train gradient boosted tree models (XGBoost and LightGBM). Key features are rolling statistics for each team (e.g., average points, win streaks), computed only from previous games to avoid data leakage. Model outputs are calibrated with sklearn.calibration.CalibratedClassifierCV so that predicted probabilities better match observed win frequencies. A daily pipeline, orchestrated by GitHub Actions, scrapes new game data, updates features, and retrains the model; predictions are served through a Streamlit application.
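The leakage-safe rolling features described above can be sketched as follows. This is a minimal illustration, not the project's actual feature code: the column names (team, game_date, points, won), the window size, and the helper name add_rolling_features are assumptions chosen for the example, and the real games.csv uses a different schema.

```python
import pandas as pd


def add_rolling_features(games: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Add per-team rolling averages that use only games played before each row.

    Column names here are hypothetical, not the schema of games.csv.
    """
    games = games.sort_values(["team", "game_date"]).copy()
    grouped = games.groupby("team")

    # shift(1) removes the current game from its own window, so a row's
    # features never include the result the model is trying to predict.
    games[f"pts_avg_{window}"] = grouped["points"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    games[f"win_rate_{window}"] = grouped["won"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return games


# Tiny made-up sample: four games each for two teams.
demo = pd.DataFrame(
    {
        "team": ["BOS"] * 4 + ["LAL"] * 4,
        "game_date": pd.to_datetime(
            ["2023-01-01", "2023-01-03", "2023-01-05", "2023-01-07"] * 2
        ),
        "points": [100, 110, 120, 130, 90, 95, 105, 115],
        "won": [1, 0, 1, 1, 0, 0, 1, 1],
    }
)
features = add_rolling_features(demo, window=3)
print(features)
```

Each team's first game has no prior history, so its rolling features are NaN; gradient boosted tree libraries such as XGBoost and LightGBM can handle missing values natively, which is one reason this pattern pairs well with them.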

Quick Start & Requirements

Install/Run: The project is primarily notebook-driven. Key components can be run via Python scripts or Jupyter notebooks. Deployment is via Streamlit Cloud.
Prerequisites: Python 3.x, Pandas, XGBoost, LightGBM, Scikit-learn, Optuna, Neptune.ai, Selenium/ScrapingAnt, BeautifulSoup, Streamlit. A ScrapingAnt account (free tier available) is recommended for production data scraping due to anti-bot measures on NBA.com.
Resources: Requires access to historical NBA data (Kaggle provided, plus live scraping). Experiment tracking via Neptune.ai requires an account.
Links: Project Repository: https://github.com/cmunch1/nba-prediction, Live App: https://cmunch1-nba-prediction.streamlit.app/

Highlighted Details

Demonstrates end-to-end ML deployment workflow, including data scraping, feature engineering, model training, calibration, and web app deployment.
Utilizes Shapley values for adversarial validation to assess train/test data distribution similarity.
Features automated daily data scraping and model retraining via GitHub Actions.
Includes probability calibration to ensure model outputs reflect true win probabilities.
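As a rough illustration of the calibration step, the sketch below wraps a gradient boosted classifier in scikit-learn's CalibratedClassifierCV. It makes several assumptions: the data is synthetic (make_classification), scikit-learn's GradientBoostingClassifier stands in for the project's XGBoost and LightGBM models, and the sigmoid method and 5-fold setting are example choices rather than the author's configuration.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for engineered game features and home-win labels.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Stand-in model (assumption); the project trains XGBoost and LightGBM.
base = GradientBoostingClassifier(random_state=0)

# Cross-validated calibration: each fold fits the base model on part of the
# training data and fits the sigmoid calibrator on the held-out part.
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)

# Column 1 holds the calibrated probability of the positive class.
win_prob = calibrated.predict_proba(X_test)[:, 1]

# Brier score: mean squared error between predicted probability and outcome
# (lower is better).
brier = brier_score_loss(y_test, win_prob)
print(f"Brier score: {brier:.4f}")
```

A probability-focused metric such as the Brier score is a more direct check of calibration than accuracy, since accuracy only looks at which side of 0.5 a prediction falls on.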

Maintenance & Community

The project is a personal portfolio piece. The author notes a temporary removal of Hopsworks feature store and model registry due to stability concerns. Feedback is actively sought via LinkedIn and Twitter.

Licensing & Compatibility

The repository does not explicitly state a license. Without one, default copyright applies and no reuse rights are granted, so commercial use would require clarification from the author.

Limitations & Caveats

The project is described as a "work in progress" with ongoing iterations. While the model achieved 61.5% accuracy on the 2022-2023 regular season, it lags behind top public models (65.6%). The author acknowledges that a real-world betting strategy is significantly more complex than this model alone.

