TorchShard: slicing PyTorch tensors into parallel shards
TorchShard is a PyTorch extension designed to enable efficient training of large neural networks by sharding tensors across multiple GPUs. It targets researchers and engineers working with models that have massive linear layers or a very large number of classes, offering a way to reduce GPU memory consumption and scale training.
How It Works
TorchShard implements tensor parallelism by slicing PyTorch tensors along specified dimensions. It provides drop-in replacements for torch.nn.Linear (as ts.nn.ParallelLinear) and integrates with PyTorch's distributed primitives. This approach lets linear layers and loss functions be computed in parallel, distributing the memory and compute load across the available GPUs. The API is designed to be consistent with PyTorch, minimizing the learning curve.
Quick Start & Requirements
- Install via pip: pip install torchshard
- Requires PyTorch's distributed backend to be initialized (torch.distributed.init_process_group) before the parallel layers are used; see the setup sketch below.
Highlighted Details
- Drop-in parallel replacements for nn.Linear and loss functions.
- Tensors can be sliced along the row (dim=0) or column (dim=1) dimensions.
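As a hedged illustration of the setup step referenced in the list above, the sketch below initializes PyTorch's default process group before any parallel layers are built. The ts.distributed.init_process_group helper and its group_size keyword are assumptions about TorchShard's own group setup and should be verified against the project documentation.

```python
import os
import torch.distributed as dist
import torchshard as ts

def setup(rank, world_size):
    # Single-node rendezvous settings (hypothetical values).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    # 1. TorchShard builds on PyTorch's distributed primitives, so the
    #    default process group must exist first.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # 2. Assumed TorchShard helper that sets up its parallel groups;
    #    check the exact name and arguments in the repository.
    ts.distributed.init_process_group(group_size=world_size)
```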
Maintenance & Community
The project is primarily maintained by Kaiyu Yue. Contributions are welcome via pull requests, and a contact email is provided for inquiries.
Licensing & Compatibility
The README does not explicitly state a license, so licensing should be verified before commercial use or closed-source integration.
Limitations & Caveats
The README does not specify compatibility with older PyTorch versions or other deep learning frameworks. The performance figures are based on specific hardware (NVIDIA TITAN-XP) and may vary on different GPU architectures.