This repository provides handwritten Python implementations of all algorithms from Li Hang's influential "Statistical Learning Methods" textbook. It targets students and practitioners seeking a code-first understanding of machine learning fundamentals, offering direct links between theoretical formulas and practical code.
How It Works
The project translates each algorithm from the textbook into Python. Every line of code is commented, and the comments cite the corresponding formula in the book, so readers can map theory to implementation line by line. The goal is to demystify complex algorithms for learners.
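To illustrate the commenting style described above, here is a minimal sketch of a primal-form perceptron written in the same spirit. It is not taken from the repository: the function names `train_perceptron` and `predict`, the parameters `lr` and `max_epochs`, and the pure-Python list representation are all assumptions made for illustration. The comments name the rule each line implements rather than quoting the book's formula numbers.

```python
def train_perceptron(samples, labels, lr=1.0, max_epochs=100):
    """Primal-form perceptron (illustrative sketch). Labels must be +1 or -1."""
    w = [0.0] * len(samples[0])  # weight vector w, initialised to zero
    b = 0.0                      # bias b, initialised to zero
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            # Decision value: w . x + b
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            # Misclassification condition: y * (w . x + b) <= 0
            if y * activation <= 0:
                # Update rule: w <- w + lr * y * x
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                # Update rule: b <- b + lr * y
                b += lr * y
                mistakes += 1
        if mistakes == 0:
            # A full pass without errors: the training set is separated
            break
    return w, b


def predict(w, b, x):
    """Classify x with the sign of w . x + b (a value of 0 maps to -1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

A call such as `train_perceptron([[3, 3], [4, 3], [1, 1]], [1, 1, -1])` returns a weight vector and bias that can be passed to `predict`. Keeping one rule per commented line is what makes the code easy to check against the textbook derivation.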
Quick Start & Requirements
- Installation: No package installation steps are documented; the scripts are presumably run directly with a Python interpreter.
- Prerequisites: A Python environment. The MNIST dataset ships as a compressed CSV archive (107MB) that must be extracted manually before use.
- Resources: Disk space for the extracted dataset and enough compute to run the algorithms.
- Documentation: Accompanying blog posts for each chapter are linked within the README.
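The manual extraction step mentioned above can be scripted. The following is a hedged sketch, not the repository's own loader: it assumes the archive is a zip file and that each CSV row stores the label in the first column followed by integer pixel values. The function names `extract_archive` and `load_csv` are hypothetical, and the real file names and layout should be checked against the repository.

```python
import csv
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a zip archive and return the extracted CSV paths, sorted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
    return sorted(dest.rglob("*.csv"))


def load_csv(csv_path):
    """Read rows of the ASSUMED form: label, pixel_1, ..., pixel_n."""
    features, labels = [], []
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            labels.append(int(row[0]))                  # first column: class label
            features.append([int(v) for v in row[1:]])  # remaining columns: pixels
    return features, labels
```

With a placeholder archive name, usage would look like `paths = extract_archive("mnist_archive.zip", "data")` followed by `features, labels = load_csv(paths[0])`. Returning plain lists keeps the sketch free of third-party dependencies; converting to NumPy arrays is a natural next step if NumPy is available.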
Highlighted Details
- Comprehensive implementation of all algorithms from Li Hang's "Statistical Learning Methods".
- Code is heavily commented, with direct references to corresponding formulas in the book.
- Each chapter is accompanied by detailed blog posts that explain the principles and walk through the implementation.
- The project is still evolving, with plans for a published book and for contributions covering the unsupervised learning sections.
Maintenance & Community
- The project welcomes contributions via Pull Requests and Issues.
- Contact is available via WeChat (lvtengchao) or email (lvtengchao@pku.edu.cn).
- The author is involved in publishing a book and can provide referrals for MSRA internships.
- Plans for offline ML/MLP/CV training classes are mentioned.
Licensing & Compatibility
- License: CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International).
- Compatibility: The non-commercial clause restricts usage in commercial products or services. Derivative works must be shared under the same license.
Limitations & Caveats
- The repository serves as educational code examples rather than a production-ready library with a unified API.
- The CC BY-NC-SA 4.0 license strictly prohibits commercial use.
- Dataset handling (e.g., MNIST) requires manual steps such as unzipping.