PatrickStar by Tencent

Parallel training framework for large language models

Created 5 years ago
778 stars

Top 44.7% on SourcePulse

Project Summary

PatrickStar addresses the prohibitive hardware requirements for training large-scale pre-trained language models (PTMs). It offers a solution for researchers and engineers to train larger models with fewer GPUs by efficiently utilizing both CPU and GPU memory.

How It Works

PatrickStar employs a dynamic, chunk-based memory management system for heterogeneous training. Unlike static approaches, it dynamically offloads model components not currently in use to CPU memory, maximizing GPU utilization. This chunk-based approach also optimizes collective communication for multi-GPU scaling.
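The dynamic offloading idea can be sketched in a few lines. This is an illustrative toy, not PatrickStar's actual API: a manager that keeps a bounded number of parameter chunks resident on the GPU and offloads the least-recently-used ones to CPU memory when capacity is exceeded (all names here are hypothetical).

```python
# Toy sketch of dynamic chunk-based offloading (NOT PatrickStar's API).
from collections import OrderedDict

class ChunkManager:
    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity  # max chunks resident on GPU
        self.gpu = OrderedDict()          # chunk_id -> data, in LRU order
        self.cpu = {}                     # chunks offloaded to CPU memory

    def access(self, chunk_id, data=None):
        """Bring a chunk onto the GPU, evicting the LRU chunk if full."""
        if chunk_id in self.gpu:
            self.gpu.move_to_end(chunk_id)      # mark as recently used
        else:
            payload = self.cpu.pop(chunk_id, data)
            if len(self.gpu) >= self.gpu_capacity:
                victim, victim_data = self.gpu.popitem(last=False)
                self.cpu[victim] = victim_data  # offload LRU chunk to CPU
            self.gpu[chunk_id] = payload
        return self.gpu[chunk_id]

mgr = ChunkManager(gpu_capacity=2)
mgr.access("layer0", data=[0.1])
mgr.access("layer1", data=[0.2])
mgr.access("layer2", data=[0.3])  # GPU full: "layer0" is offloaded to CPU
```

In the real system, chunks are contiguous tensors sized to make CPU-GPU transfers and multi-GPU collectives efficient; the toy above only captures the residency bookkeeping.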

Quick Start & Requirements

  • Install via pip install .
  • Requires gcc version 7 or higher.
  • Tested NVIDIA NGC image: nvcr.io/nvidia/pytorch:21.06-py3
  • Official quick-start and examples are available.
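Since installation is from source, the steps above amount to something like the following (the repository URL is assumed from the project name):

```shell
# Install PatrickStar from source; requires gcc 7 or newer.
gcc --version   # confirm gcc >= 7 before building
git clone https://github.com/Tencent/PatrickStar.git
cd PatrickStar
pip install .
```

Running inside the tested NGC container (nvcr.io/nvidia/pytorch:21.06-py3) avoids most toolchain mismatches.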

Highlighted Details

  • Enables training of an 18B parameter model on 8x V100 GPUs with 240GB total GPU memory.
  • Achieves training of a 68B model on 8x A100 GPUs with 1TB CPU memory.
  • Successfully trained a GPT3-175B model on 32 GPUs.
  • Reports performance improvements over DeepSpeed at comparable model sizes.

Maintenance & Community

  • Developed by the WeChat AI Team, Tencent NLP Oteam.
  • Contact: {jiaruifang, zilinzhu, josephyu}@tencent.com

Licensing & Compatibility

  • BSD 3-Clause License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The project's README pins specific DeepSpeed and PyTorch versions for its benchmarks, so compatibility with newer versions may need verification. The primary installation method is from source, which requires more setup effort than a pre-built package.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 12 more.

Liger-Kernel by linkedin

  • Top 0.2% on SourcePulse
  • 6k stars
  • Triton kernels for efficient LLM training
  • Created 1 year ago; updated 3 days ago