PatrickStar by Tencent

Parallel training framework for large language models

Created 4 years ago
767 stars

Top 45.5% on SourcePulse

Project Summary

PatrickStar addresses the prohibitive hardware requirements of training large-scale pre-trained language models (PTMs). It lets researchers and engineers train larger models with fewer GPUs by using CPU and GPU memory together.

How It Works

PatrickStar manages model data in fixed-size chunks and moves them dynamically between CPU and GPU memory during heterogeneous training. Unlike static partitioning schemes, it offloads chunks that the current operator does not need to CPU memory, keeping GPU memory free for computation. The chunk granularity also makes collective communication efficient when scaling across multiple GPUs.
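
To make the mechanism concrete, here is a toy illustration (not PatrickStar's actual implementation; all names are invented for this sketch) of grouping parameters into fixed-size chunks that are staged onto the GPU only while an operator needs them:

    import torch

    class Chunk:
        """A fixed-size fp16 buffer holding several parameters back to back."""

        def __init__(self, numel):
            # Pinned CPU memory enables asynchronous host-to-device copies.
            self.cpu_buf = torch.empty(
                numel, dtype=torch.float16,
                pin_memory=torch.cuda.is_available(),
            )
            self.gpu_buf = None  # materialized on demand

        def fetch(self, device):
            # Copy the chunk to the GPU right before an operator touches it.
            if self.gpu_buf is None:
                self.gpu_buf = self.cpu_buf.to(device, non_blocking=True)
            return self.gpu_buf

        def release(self):
            # Evict the chunk once the operator finishes, freeing GPU memory
            # for the next chunk instead of keeping everything resident.
            self.gpu_buf = None

    # A manager would call fetch() in a pre-forward hook and release() in a
    # post-forward hook, so the GPU holds only the chunks in active use.
    if torch.cuda.is_available():
        chunk = Chunk(32 * 1024 * 1024)  # 32M fp16 elements = 64MB
        buf = chunk.fetch(torch.device("cuda"))
        # ... run the operator that reads this chunk ...
        chunk.release()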

Quick Start & Requirements

  • Install from source: pip install .
  • Requires gcc 7 or newer.
  • Tested against the NVIDIA NGC image nvcr.io/nvidia/pytorch:21.06-py3.
  • Official quick-start guide and examples are available; a minimal usage sketch follows below.
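
For orientation, here is a minimal training-step sketch following the initialize_engine pattern shown in the project's README; the exact configuration keys and call signatures may differ between versions, so check the official examples before relying on the names here:

    import torch
    from patrickstar.runtime import initialize_engine

    # The model is built inside a function so PatrickStar can intercept
    # parameter allocation and place the weights into managed chunks.
    def model_func():
        return torch.nn.Sequential(
            torch.nn.Linear(1024, 4096),
            torch.nn.GELU(),
            torch.nn.Linear(4096, 1024),
        )

    # Config keys follow the README example (optimizer, fp16, chunk size);
    # verify them against the version you install.
    config = {
        "optimizer": {
            "type": "Adam",
            "params": {
                "lr": 1e-3, "betas": (0.9, 0.999), "eps": 1e-6,
                "weight_decay": 0, "use_hybrid_adam": True,
            },
        },
        "fp16": {
            "enabled": True, "loss_scale": 0, "initial_scale_power": 8,
            "loss_scale_window": 1000, "hysteresis": 2, "min_loss_scale": 1,
        },
        "default_chunk_size": 64 * 1024 * 1024,
    }

    model, optimizer = initialize_engine(
        model_func=model_func, local_rank=0, config=config,
    )

    batch = torch.randn(8, 1024, dtype=torch.float16, device="cuda")  # stand-in data
    optimizer.zero_grad()
    loss = model(batch).float().mean()  # placeholder loss
    model.backward(loss)  # the engine, not the loss tensor, drives backward
    optimizer.step()

Note that, as in DeepSpeed, the backward pass goes through model.backward(loss) rather than loss.backward(), so the engine can schedule chunk movement during backpropagation.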

Highlighted Details

  • Trains an 18B-parameter model on a node with 8x V100 GPUs and 240GB of CPU memory (see the back-of-envelope check after this list).
  • Trains a 68B-parameter model on 8x A100 GPUs with 1TB of CPU memory.
  • Successfully trained a GPT3-175B model on 32 GPUs.
  • Reports higher training throughput than DeepSpeed on models of comparable size.
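
As a sanity check on these numbers, the usual fp16 mixed-precision Adam accounting (a ZeRO-style estimate, not a figure from the project) puts model states alone at roughly 16 bytes per parameter, which already exceeds the aggregate GPU memory of eight 32GB V100s for an 18B model:

    # 2 (fp16 param) + 2 (fp16 grad) + 4 + 4 + 4 bytes
    # (fp32 master weight, momentum, variance) = 16 bytes per parameter.
    params = 18e9
    model_state_gb = params * 16 / 1024**3
    print(f"model states: {model_state_gb:.0f} GB")    # ~268 GB
    print(f"8x 32GB V100: {8 * 32} GB of GPU memory")  # 256 GB

Since ~268GB of model states cannot fit in 256GB of GPU memory, part of the state must spill to CPU memory, which is exactly what PatrickStar's chunk manager handles.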

Maintenance & Community

  • Developed by the WeChat AI Team, Tencent NLP Oteam.
  • Contact: {jiaruifang, zilinzhu, josephyu}@tencent.com

Licensing & Compatibility

  • BSD 3-Clause License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The README pins specific DeepSpeed and PyTorch versions for its benchmarks, so newer versions may need compatibility checks. Installation is from source only, which takes more effort than installing a pre-built package.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Pawel Garbacki (Cofounder of Fireworks AI), and 11 more.

Liger-Kernel by linkedin

0.6% · 6k stars
Triton kernels for efficient LLM training
Created 1 year ago · Updated 1 day ago
Starred by François Chollet (Author of Keras; Cofounder of Ndea, ARC Prize), Chaoyu Yang (Founder of Bento), and 13 more.

neon by NervanaSystems

0% · 4k stars
Deep learning framework (discontinued)
Created 11 years ago · Updated 4 years ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Alexey Milovidov (Cofounder of ClickHouse), and 29 more.

llm.c by karpathy

0.2% · 28k stars
LLM training in pure C/CUDA, no PyTorch needed
Created 1 year ago · Updated 2 months ago