numpy-ml  by ddbourgin

ML algorithms implemented in NumPy

created 6 years ago
16,129 stars

Top 3.0% on sourcepulse

Project Summary

This repository provides a comprehensive collection of machine learning algorithms implemented purely in NumPy. It serves as a valuable resource for researchers and practitioners seeking to understand, experiment with, or build upon fundamental ML concepts without relying on higher-level libraries like TensorFlow or PyTorch. The primary benefit is its legibility and educational value, showcasing the underlying mechanics of various algorithms.

How It Works

The project's core philosophy is to leverage NumPy for all computations, ensuring clarity and accessibility. Algorithms are implemented from scratch, often mirroring MATLAB's array-oriented approach where applicable (e.g., im2col, col2im). This design choice prioritizes understandability over raw performance, making it an excellent tool for learning and prototyping.
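As an illustration of the array-oriented style mentioned above, here is a minimal sketch of im2col for a single-channel 2-D image, used to express a convolution as one matrix product. The function names, signatures, and the lack of padding or batching are assumptions made for this example; they are not taken from the repository's own code.

```python
import numpy as np


def im2col(image, kernel_h, kernel_w, stride=1):
    """Unroll every kernel-sized patch of a 2-D image into one column.

    Hypothetical helper for illustration: no padding, no channels, no batch.
    """
    height, width = image.shape
    out_h = (height - kernel_h) // stride + 1
    out_w = (width - kernel_w) // stride + 1
    cols = np.empty((kernel_h * kernel_w, out_h * out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kernel_h, left:left + kernel_w]
            # Column index follows row-major order over output positions.
            cols[:, i * out_w + j] = patch.ravel()
    return cols


def conv2d_via_im2col(image, kernel, stride=1):
    """Cross-correlate image with kernel (no kernel flip) via one matmul."""
    kernel_h, kernel_w = kernel.shape
    out_h = (image.shape[0] - kernel_h) // stride + 1
    out_w = (image.shape[1] - kernel_w) // stride + 1
    cols = im2col(image, kernel_h, kernel_w, stride)
    return (kernel.ravel() @ cols).reshape(out_h, out_w)


image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
result = conv2d_via_im2col(image, kernel)
```

The explicit Python loops keep the patch extraction easy to follow at the cost of speed, which matches the legibility-over-performance trade-off described above.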

Quick Start & Requirements

  • Install for experimentation: git clone https://github.com/ddbourgin/numpy-ml.git && cd numpy-ml && virtualenv npml && source npml/bin/activate && pip3 install -r requirements-dev.txt
  • Install as a package: pip3 install -U numpy_ml
  • For RL environments: pip3 install -U 'numpy_ml[rl]'
  • Requirements: Python 3, NumPy. OpenAI Gym for RL components.
  • Documentation: https://github.com/ddbourgin/numpy-ml

Highlighted Details

  • Extensive coverage of classical ML models (GMM, HMM, LDA, linear models, tree-based models).
  • Detailed neural network components including layers, optimizers, losses, and activations.
  • Implementations for sequence models (n-grams), multi-armed bandits, and reinforcement learning.
  • Includes various preprocessing techniques from signal processing to text feature extraction.
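To give a sense of what NumPy-only neural network components look like, the following is a minimal sketch of a softmax activation paired with a cross-entropy loss and its gradient. It is written for this summary under the assumption of integer class labels and 2-D logits; the function names are hypothetical and do not reproduce the repository's API.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class for each row."""
    n = probs.shape[0]
    return -np.log(probs[np.arange(n), labels]).mean()


def cross_entropy_grad(probs, labels):
    """Gradient of the mean loss w.r.t. the logits: (probs - one_hot) / n."""
    n = probs.shape[0]
    grad = probs.copy()
    grad[np.arange(n), labels] -= 1.0
    return grad / n


logits = np.zeros((2, 3))       # two samples, three classes, uniform scores
labels = np.array([0, 2])       # assumed integer class indices
probs = softmax(logits)
loss = cross_entropy(probs, labels)
grad = cross_entropy_grad(probs, labels)
```

With uniform logits over three classes, each probability is 1/3 and the loss equals log(3), which makes a convenient sanity check.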

Maintenance & Community

The project appears to be a personal collection that welcomes contributions via pull requests. The main requirement for contributions is that implementations use only NumPy.

Licensing & Compatibility

The repository does not explicitly state a license in the README. This absence may pose compatibility issues for commercial or closed-source projects.

Limitations & Caveats

The project's explicit goal is "inefficient but somewhat legible," meaning performance is not optimized and may be significantly slower than libraries utilizing C/CUDA backends. The lack of a specified license is a notable caveat for adoption.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 88 stars in the last 90 days
