torchshard by kaiyuyue

PyTorch engine for tensor slicing into parallel shards

Created 4 years ago · 300 stars · Top 88.8% on SourcePulse

Project Summary

TorchShard is a PyTorch extension designed to enable efficient training of large neural networks by sharding tensors across multiple GPUs. It targets researchers and engineers working with models that have massive linear layers or a very large number of classes, offering a way to reduce GPU memory consumption and scale training.

How It Works

TorchShard implements tensor parallelism by slicing PyTorch tensors along specified dimensions. It provides drop-in replacements for torch.nn.Linear (as ts.nn.ParallelLinear) and integrates with PyTorch's distributed primitives. This approach allows for parallel computation of linear layers and loss functions, distributing the memory and compute load across available GPUs. The API is designed to be consistent with PyTorch, minimizing the learning curve for users.
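
A minimal sketch of that drop-in usage, following the API names the project documents (ts.distributed.init_process_group, ts.nn.ParallelLinear, ts.nn.functional.parallel_cross_entropy); treat the exact signatures as assumptions to verify against your installed version:

    import torch
    import torchshard as ts

    # Build one shard group across 2 GPUs; torch.distributed must already be initialized.
    ts.distributed.init_process_group(group_size=2)

    model = torch.nn.Sequential(
        torch.nn.Linear(20, 30),              # ordinary layer, replicated on every rank
        ts.nn.ParallelLinear(30, 30, dim=0),  # weight sharded along rows
        ts.nn.ParallelLinear(30, 30, dim=1),  # weight sharded along columns
    ).cuda()

    x = torch.randn(8, 20).cuda()
    y = torch.randint(0, 30, (8,)).cuda()

    logits = model(x)                                          # shard-parallel forward
    loss = ts.nn.functional.parallel_cross_entropy(logits, y)  # parallel loss
    loss.backward()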

Quick Start & Requirements

  • Primary install: pip install torchshard
  • Prerequisites: PyTorch plus a distributed environment (e.g., initialized via torch.distributed.init_process_group; see the setup sketch after this list).
  • Links: Documents, INSTALL.md
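
As a sketch of that distributed prerequisite, a typical single-node launch via torchrun --nproc_per_node=2 train.py initializes the process group with standard PyTorch calls before any TorchShard code runs:

    import os
    import torch
    import torch.distributed as dist

    # torchrun populates RANK, WORLD_SIZE, and LOCAL_RANK in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)  # pin this process to its own GPU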

Highlighted Details

  • Enables scaling models with millions of classes or massive linear layers.
  • Offers parallel implementations for nn.Linear and loss functions.
  • Supports sharding along row (dim=0) or column (dim=1) dimensions.
  • Provides utilities for collecting sharded model states (see the sketch after this list).
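
A sketch of that last utility, assuming the collect_state_dict helper named in the project's README (verify the name and signature against your installed version): gather the shards into one full state dict, then save it once from rank 0.

    import torch
    import torch.distributed as dist
    import torchshard as ts

    # Gather sharded parameters from all ranks into a full, unsharded state dict
    # (model here is a network containing ParallelLinear layers, as sketched above).
    full_state = ts.collect_state_dict(model, model.state_dict())

    if dist.get_rank() == 0:
        torch.save(full_state, "checkpoint.pt")  # single file with complete weights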

Maintenance & Community

The project is primarily maintained by Kaiyu Yue. Contributions are welcome via pull requests, and a contact email is provided for inquiries.

Licensing & Compatibility

The README does not state a license, so licensing terms should be clarified before commercial use or closed-source integration.

Limitations & Caveats

The README does not document compatibility with older PyTorch versions or other deep learning frameworks. The reported performance figures were measured on specific hardware (NVIDIA TITAN-XP) and may vary on other GPU architectures.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai · 309 stars (0%)
Framework for large-scale transformer optimization
Created 3 years ago · Updated 3 years ago

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai · 790 stars (0%)
Toolkit for easy model parallelization
Created 4 years ago · Updated 2 years ago

Starred by Travis Addair (Cofounder of Predibase), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 10 more.

hummingbird by microsoft · 3k stars (0.0%)
Compiler for trained ML models into tensor computation
Created 5 years ago · Updated 2 months ago

Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA · 6k stars (0.1%)
Optimized transformer library for inference
Created 4 years ago · Updated 1 year ago

Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 26 more.

ColossalAI by hpcaitech · 41k stars (0.1%)
AI system for large-scale parallel training
Created 3 years ago · Updated 13 hours ago