torchshard by kaiyuyue

PyTorch engine for tensor slicing into parallel shards

Created 4 years ago · 300 stars · Top 88.8% on SourcePulse

Project Summary

TorchShard is a PyTorch extension designed to enable efficient training of large neural networks by sharding tensors across multiple GPUs. It targets researchers and engineers working with models that have massive linear layers or a very large number of classes, offering a way to reduce GPU memory consumption and scale training.

How It Works

TorchShard implements tensor parallelism by slicing PyTorch tensors along specified dimensions. It provides drop-in replacements for torch.nn.Linear (as ts.nn.ParallelLinear) and integrates with PyTorch's distributed primitives. This approach allows for parallel computation of linear layers and loss functions, distributing the memory and compute load across available GPUs. The API is designed to be consistent with PyTorch, minimizing the learning curve for users.
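
A minimal sketch of that drop-in usage, following the API names the project documents (ts.distributed.init_process_group, ts.nn.ParallelLinear, ts.nn.functional.parallel_cross_entropy); treat the exact signatures as assumptions to verify against your installed version:

    import torch
    import torchshard as ts

    # Build one shard group across 2 GPUs; torch.distributed must already be initialized.
    ts.distributed.init_process_group(group_size=2)

    model = torch.nn.Sequential(
        torch.nn.Linear(20, 30),              # ordinary layer, replicated on every rank
        ts.nn.ParallelLinear(30, 30, dim=0),  # weight sharded along rows
        ts.nn.ParallelLinear(30, 30, dim=1),  # weight sharded along columns
    ).cuda()

    x = torch.randn(8, 20).cuda()
    y = torch.randint(0, 30, (8,)).cuda()

    logits = model(x)                                          # shard-parallel forward
    loss = ts.nn.functional.parallel_cross_entropy(logits, y)  # parallel loss
    loss.backward()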

Quick Start & Requirements

  • Primary install: pip install torchshard
  • Prerequisites: PyTorch plus a distributed environment (e.g., initialized via torch.distributed.init_process_group; see the setup sketch after this list).
  • Links: Documents, INSTALL.md
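
As a sketch of that distributed prerequisite, a typical single-node launch via torchrun --nproc_per_node=2 train.py initializes the process group with standard PyTorch calls before any TorchShard code runs:

    import os
    import torch
    import torch.distributed as dist

    # torchrun populates RANK, WORLD_SIZE, and LOCAL_RANK in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)  # pin this process to its own GPU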

Highlighted Details

  • Enables scaling models with millions of classes or massive linear layers.
  • Offers parallel implementations for nn.Linear and loss functions.
  • Supports sharding along row (dim=0) or column (dim=1) dimensions.
  • Provides utilities for collecting sharded model states (see the sketch after this list).
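
A sketch of that last utility, assuming the collect_state_dict helper named in the project's README (verify the name and signature against your installed version): gather the shards into one full state dict, then save it once from rank 0.

    import torch
    import torch.distributed as dist
    import torchshard as ts

    # Gather sharded parameters from all ranks into a full, unsharded state dict
    # (model here is a network containing ParallelLinear layers, as sketched above).
    full_state = ts.collect_state_dict(model, model.state_dict())

    if dist.get_rank() == 0:
        torch.save(full_state, "checkpoint.pt")  # single file with complete weights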

Maintenance & Community

The project is primarily maintained by Kaiyu Yue. Contributions are welcome via pull requests, and a contact email is provided for inquiries.

Licensing & Compatibility

The README does not state a license, so licensing terms should be clarified before commercial use or closed-source integration.

Limitations & Caveats

The README does not document compatibility with older PyTorch versions or other deep learning frameworks. The reported performance figures were measured on specific hardware (NVIDIA TITAN-XP) and may vary on other GPU architectures.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai · 309 stars (0%)
Framework for large-scale transformer optimization
Created 3 years ago · Updated 3 years ago

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai · 790 stars (0%)
Toolkit for easy model parallelization
Created 4 years ago · Updated 2 years ago

Starred by Travis Addair (Cofounder of Predibase), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 10 more.

hummingbird by microsoft · 3k stars (0.0%)
Compiler for trained ML models into tensor computation
Created 5 years ago · Updated 2 months ago

Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA · 6k stars (0.1%)
Optimized transformer library for inference
Created 4 years ago · Updated 1 year ago

Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 26 more.

ColossalAI by hpcaitech · 41k stars (0.1%)
AI system for large-scale parallel training
Created 3 years ago · Updated 13 hours ago