PyTorch module for multi-GPU model parallelism
This library enables users to effortlessly distribute PyTorch models across multiple GPUs for training and inference with minimal code changes. It's designed for researchers and practitioners working with large language models that exceed single-GPU memory capacity, offering a straightforward solution for scaling.
How It Works
The core of the library is the tp.tensor_parallel function, which automatically partitions model weights across the specified GPUs. It implements tensor parallelism by splitting individual layer weights, performing each shard's computation on its own GPU, and then synchronizing the results. Distributing the model's footprint this way reduces per-GPU memory use and can yield near-linear speedups. The library also supports ZeRO-3 sharding for trainable parameters not covered by tensor parallelism, further reducing memory usage during training.
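As a rough mental model, the sketch below splits one linear layer's weight matrix across two GPUs, computes each slice on its own device, and concatenates the results. This illustrates the general technique only, not the library's internal code; the device names are assumptions.

import torch
import torch.nn.functional as F

def column_parallel_linear(x, weight, devices):
    # weight has shape [out_features, in_features]; split the output rows
    # evenly, one shard per device
    shards = torch.chunk(weight, len(devices), dim=0)
    partial_outputs = []
    for shard, device in zip(shards, devices):
        # each device stores and multiplies only its own shard of the weights
        y = F.linear(x.to(device), shard.to(device))
        partial_outputs.append(y.to(devices[0]))
    # gather the per-device slices to recover the full output
    return torch.cat(partial_outputs, dim=-1)

x = torch.randn(2, 16)
w = torch.randn(64, 16)
out = column_parallel_linear(x, w, ["cuda:0", "cuda:1"])  # shape [2, 64]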
Quick Start & Requirements
pip install tensor_parallel
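A minimal quick-start sketch, assuming two local GPUs and a Hugging Face causal language model (the model name below is only an example):

import transformers
import tensor_parallel as tp

tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-13b")
model = transformers.AutoModelForCausalLM.from_pretrained("facebook/opt-13b")

# split the model's weights across the listed GPUs
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])

inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"].to("cuda:0")
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))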
Highlighted Details
- convert_state_dict helper that converts a regular checkpoint into tensor-parallel form, allowing memory-efficient loading of large models (for example together with accelerate).
- save_tensor_parallel context manager for saving models to a non-parallel format (see the sketch below).
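A short sketch of the saving workflow, assuming a tensor-parallel model spread over two GPUs (the toy model and file path are placeholders):

import torch
import torch.nn as nn
import tensor_parallel as tp

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])

with tp.save_tensor_parallel(model):
    # inside this context, state_dict() returns merged, non-parallel weights
    torch.save(model.state_dict(), "model.pt")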
Maintenance & Community
The project is actively maintained by BlackSamorez. Users can report bugs and issues via the GitHub issue tracker.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The library is primarily designed for quick prototyping on a single machine. For large-scale, multi-node training, more heavyweight solutions like DeepSpeed or Megatron-LM are recommended. Debugging NCCL errors may require setting the TENSOR_PARALLEL_USE_NATIVE=1 environment variable.