tensor_parallel by BlackSamorez

PyTorch module for multi-GPU model parallelism

created 2 years ago
657 stars

Top 51.9% on sourcepulse

Project Summary

This library enables users to effortlessly distribute PyTorch models across multiple GPUs for training and inference with minimal code changes. It's designed for researchers and practitioners working with large language models that exceed single-GPU memory capacity, offering a straightforward solution for scaling.

How It Works

The core of the library is the tp.tensor_parallel function, which automatically partitions model weights across specified GPUs. It implements tensor parallelism by splitting individual layer weights, performing computations on each GPU, and then synchronizing the results. This approach allows for linear speedups and memory savings by distributing the model's footprint. The library also supports ZeRO-3 sharding for trainable parameters not covered by tensor parallelism, further optimizing memory usage during training.
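
A minimal sketch of that call, assuming the API as described above (the sharded flag for ZeRO-3 is inferred from this summary and may differ in the actual release):

    import transformers
    import tensor_parallel as tp

    # Load the model on CPU as usual.
    model = transformers.AutoModelForCausalLM.from_pretrained("facebook/opt-13b")

    # Split each layer's weights across the listed GPUs; each device then holds
    # roughly half of the model and computes its shard of every layer.
    model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])

    # Assumed option: additionally apply ZeRO-3 sharding to trainable parameters
    # not covered by tensor parallelism, e.g. during fine-tuning.
    # model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"], sharded=True)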

Quick Start & Requirements

  • Install: pip install tensor_parallel
  • Requirements: PyTorch, transformers. Multi-GPU setup recommended.
  • Demo: Kaggle notebook available for a 40B LLM.
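
A quick-start inference sketch under these requirements (the model name and device list are placeholders; generation uses the standard transformers API):

    import transformers
    import tensor_parallel as tp

    tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-13b")
    model = transformers.AutoModelForCausalLM.from_pretrained("facebook/opt-13b")
    model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])  # the one-line change

    # Run generation as with any transformers model; inputs start on the first GPU.
    inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"].to("cuda:0")
    outputs = model.generate(inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0]))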

Highlighted Details

  • Enables running large PyTorch models on multiple GPUs with a single line of code.
  • Supports both training and inference.
  • Offers memory-efficient dispatch for loading models using convert_state_dict and accelerate.
  • Includes a save_tensor_parallel context manager for saving models to a non-parallel format.
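
As an illustration of the last point, a hedged sketch of saving a wrapped model back to an ordinary single-GPU checkpoint (assumes model was already wrapped with tp.tensor_parallel; the filename is a placeholder):

    import torch
    import tensor_parallel as tp

    # Inside this context, state_dict() returns weights gathered back into the
    # regular (non-parallel) layout, so the checkpoint can later be loaded
    # without tensor_parallel installed.
    with tp.save_tensor_parallel(model):
        torch.save(model.state_dict(), "checkpoint.pt")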

Maintenance & Community

The project is maintained by BlackSamorez. Users can report bugs and issues via the GitHub issue tracker.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The library is primarily designed for quick prototyping on a single machine. For large-scale, multi-node training, more complex solutions like DeepSpeed or Megatron-LM are recommended. Debugging NCCL errors may require setting TENSOR_PARALLEL_USE_NATIVE=1.
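
For the NCCL workaround mentioned above, a minimal sketch of setting the flag from Python (setting it in the shell before launch works equally well; doing it before the parallel model is constructed is my assumption):

    import os

    # Ask tensor_parallel to fall back to its native backend instead of NCCL,
    # which makes failures easier to trace.
    os.environ["TENSOR_PARALLEL_USE_NATIVE"] = "1"

    import tensor_parallel as tp  # wrap the model only after setting the flag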

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

1.0%
402
Lightweight training framework for model pre-training
created 1 year ago
updated 1 week ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 2 more.

S-LoRA by S-LoRA

0.1%
2k
System for scalable LoRA adapter serving
created 1 year ago
updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Zhuohan Li (Author of vLLM), and 6 more.

torchtitan by pytorch

0.9%
4k
PyTorch platform for generative AI model training research
created 1 year ago
updated 22 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Lianmin Zheng (Author of SGLang), and 13 more.

gpt-fast by pytorch-labs

0.1%
6k
PyTorch text generation for efficient transformer inference
created 1 year ago
updated 3 months ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 6 more.

FasterTransformer by NVIDIA

0.2%
6k
Optimized transformer library for inference
created 4 years ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

TensorRT-LLM by NVIDIA

0.6%
11k
LLM inference optimization SDK for NVIDIA GPUs
created 1 year ago
updated 18 hours ago