PatrickStar by Tencent

Parallel training framework for large language models

Created 5 years ago
778 stars

Top 44.7% on SourcePulse

Project Summary

PatrickStar addresses the prohibitive hardware requirements for training large-scale pre-trained language models (PTMs). It offers a solution for researchers and engineers to train larger models with fewer GPUs by efficiently utilizing both CPU and GPU memory.

How It Works

PatrickStar employs a dynamic, chunk-based memory management system for heterogeneous training. Unlike static approaches, it dynamically offloads model components not currently in use to CPU memory, maximizing GPU utilization. This chunk-based approach also optimizes collective communication for multi-GPU scaling.
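The dynamic offloading idea can be sketched in a few lines. This is an illustrative toy, not PatrickStar's actual API: a manager that keeps a bounded number of parameter chunks resident on the GPU and offloads the least-recently-used ones to CPU memory when capacity is exceeded (all names here are hypothetical).

```python
# Toy sketch of dynamic chunk-based offloading (NOT PatrickStar's API).
from collections import OrderedDict

class ChunkManager:
    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity  # max chunks resident on GPU
        self.gpu = OrderedDict()          # chunk_id -> data, in LRU order
        self.cpu = {}                     # chunks offloaded to CPU memory

    def access(self, chunk_id, data=None):
        """Bring a chunk onto the GPU, evicting the LRU chunk if full."""
        if chunk_id in self.gpu:
            self.gpu.move_to_end(chunk_id)      # mark as recently used
        else:
            payload = self.cpu.pop(chunk_id, data)
            if len(self.gpu) >= self.gpu_capacity:
                victim, victim_data = self.gpu.popitem(last=False)
                self.cpu[victim] = victim_data  # offload LRU chunk to CPU
            self.gpu[chunk_id] = payload
        return self.gpu[chunk_id]

mgr = ChunkManager(gpu_capacity=2)
mgr.access("layer0", data=[0.1])
mgr.access("layer1", data=[0.2])
mgr.access("layer2", data=[0.3])  # GPU full: "layer0" is offloaded to CPU
```

In the real system, chunks are contiguous tensors sized to make CPU-GPU transfers and multi-GPU collectives efficient; the toy above only captures the residency bookkeeping.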

Quick Start & Requirements

  • Install via pip install .
  • Requires gcc version 7 or higher.
  • Tested NVIDIA NGC image: nvcr.io/nvidia/pytorch:21.06-py3
  • Official quick-start and examples are available.
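Since installation is from source, the steps above amount to something like the following (the repository URL is assumed from the project name):

```shell
# Install PatrickStar from source; requires gcc 7 or newer.
gcc --version   # confirm gcc >= 7 before building
git clone https://github.com/Tencent/PatrickStar.git
cd PatrickStar
pip install .
```

Running inside the tested NGC container (nvcr.io/nvidia/pytorch:21.06-py3) avoids most toolchain mismatches.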

Highlighted Details

  • Enables training of an 18B parameter model on 8x V100 GPUs with 240GB total GPU memory.
  • Achieves training of a 68B model on 8x A100 GPUs with 1TB CPU memory.
  • Successfully trained a GPT3-175B model on 32 GPUs.
  • Reports performance improvements over DeepSpeed at comparable model sizes.

Maintenance & Community

  • Developed by the WeChat AI Team, Tencent NLP Oteam.
  • Contact: {jiaruifang, zilinzhu, josephyu}@tencent.com

Licensing & Compatibility

  • BSD 3-Clause License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The project's README pins specific DeepSpeed and PyTorch versions for its benchmarks, so compatibility with newer versions may need verification. The primary installation method is from source, which requires more setup effort than a pre-built package.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 12 more.

Liger-Kernel by linkedin

  • Top 0.2% on SourcePulse
  • 6k stars
  • Triton kernels for efficient LLM training
  • Created 1 year ago; updated 3 days ago