Parallel training framework for large language models
PatrickStar addresses the prohibitive hardware requirements of training large-scale pre-trained language models (PTMs). It lets researchers and engineers train larger models on fewer GPUs by efficiently using both CPU and GPU memory.
How It Works
PatrickStar employs a dynamic, chunk-based memory management system for heterogeneous training. Unlike static partitioning approaches, it dynamically offloads model components that are not currently in use to CPU memory, maximizing effective GPU memory utilization. The chunk-based layout also makes collective communication efficient when scaling to multiple GPUs.
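The snippet below is a minimal, hypothetical sketch of the chunk idea in plain PyTorch, not PatrickStar's actual implementation: parameters live in fixed-size chunks in (pinned) CPU memory, and a chunk is copied to the GPU just before it is needed and evicted afterwards. The `Chunk` and `ChunkManager` names are illustrative only.

```python
# Illustrative sketch of chunk-based CPU/GPU offloading. This is NOT
# PatrickStar's actual code; Chunk and ChunkManager are hypothetical names.
import torch

class Chunk:
    """A fixed-size parameter buffer that normally lives in CPU memory."""

    def __init__(self, numel, dtype=torch.float16):
        # Pinned CPU memory allows fast, asynchronous host-to-device copies.
        self.cpu_buffer = torch.empty(
            numel, dtype=dtype, pin_memory=torch.cuda.is_available()
        )
        self.gpu_buffer = None

    def to_gpu(self, device):
        # Materialize the chunk on the GPU only when a layer needs it.
        if self.gpu_buffer is None:
            self.gpu_buffer = self.cpu_buffer.to(device, non_blocking=True)
        return self.gpu_buffer

    def to_cpu(self):
        # Write back and release the GPU copy so other chunks can reuse the memory.
        if self.gpu_buffer is not None:
            self.cpu_buffer.copy_(self.gpu_buffer)
            self.gpu_buffer = None

class ChunkManager:
    """Hands out fixed-size chunks; a real system also tracks usage and evicts."""

    def __init__(self, chunk_numel=64 * 1024 * 1024):
        self.chunk_numel = chunk_numel
        self.chunks = []

    def allocate(self):
        chunk = Chunk(self.chunk_numel)
        self.chunks.append(chunk)
        return chunk

device = "cuda" if torch.cuda.is_available() else "cpu"
manager = ChunkManager(chunk_numel=1024)  # small size for the example
chunk = manager.allocate()
weights = chunk.to_gpu(device)  # fetch just-in-time before the layer runs
# ... run the layer's forward/backward using `weights` ...
chunk.to_cpu()                  # offload once the layer no longer needs it
```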
Quick Start & Requirements
Install from source with `pip install .`. Building requires gcc version 7 or higher; the NVIDIA PyTorch image nvcr.io/nvidia/pytorch:21.06-py3 can be used as the base environment.
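For orientation, here is a hedged sketch of a PatrickStar training loop, loosely following the pattern shown in the project's README. The `initialize_engine` import path, the config keys, and the `model.backward` call are assumptions that may differ between releases; verify against the installed version.

```python
# Hedged sketch of a PatrickStar training loop, loosely following the pattern
# in the project's README. The import path, config keys, and model.backward
# call are assumptions and may differ between releases.
import torch
from patrickstar.runtime import initialize_engine  # assumed API

def model_func():
    # Return the model to be placed under PatrickStar's chunk-based management.
    return torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 1024),
    )

config = {
    # Hypothetical values; the chunk size trades memory granularity for overhead.
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True, "loss_scale": 0},
    "default_chunk_size": 64 * 1024 * 1024,
}

model, optimizer = initialize_engine(model_func=model_func, local_rank=0, config=config)

for step in range(10):
    batch = torch.randn(8, 1024, device="cuda")  # dtype/device handling follows the config
    optimizer.zero_grad()
    loss = model(batch).float().mean()
    model.backward(loss)  # the engine, not the loss tensor, drives the backward pass
    optimizer.step()
```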
Maintenance & Community
The repository's last activity was about 2 years ago, and the project is marked inactive.
Limitations & Caveats
The project's README mentions specific versions of DeepSpeed and PyTorch for benchmarks, implying potential compatibility considerations with newer versions. The primary installation method is from source, which may require more effort than pre-built packages.