ML SYS learning notes and code
This repository is a learning resource for Machine Learning Systems (ML-SYS), aimed at readers who want to bridge the gap between ML theory and practical application. It offers detailed learning notes, code examples, and analyses of key systems and techniques in the ML-SYS domain, with a particular focus on Reinforcement Learning from Human Feedback (RLHF) and efficient model serving.
How It Works
The project is structured around the author's personal learning journey. It covers RLHF system development (implementation, reward modeling, and distributed training), model serving optimization (such as latency reduction and embedding model serving), and fundamental ML system concepts (such as NCCL, PyTorch Distributed, and quantization). The content is presented through a mix of original notes, code walkthroughs, and analyses of existing research and tools like SGLang, OpenRLHF, and vLLM.
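Quantization is one of the fundamentals listed above. As a rough illustration only, and not code taken from the repository, the sketch below shows symmetric per-tensor int8 quantization in plain Python. The function names `quantize_int8` and `dequantize_int8` are hypothetical, and the scheme (a single scale that maps the largest magnitude to 127) is just one common choice, assumed here for simplicity.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization (illustrative sketch).

    Returns (codes, scale), where each code is an int in [-127, 127]
    and value is approximately code * scale.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; use scale 1.0 for empty or all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from int8 codes and the shared scale."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

With this scheme, each restored value differs from the original by at most half of `scale`, which is the trade-off between memory footprint and precision that quantized serving has to manage.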
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The content is presented as personal learning notes and may not be production-ready. Some sections are marked as incomplete or in progress, and specific issues such as NCCL hang errors are still being addressed. The author also notes that some original writings were not preferred, so parts of the content reflect subjective judgment.