torchscale by microsoft

PyTorch library for scaling Transformers

created 2 years ago
3,097 stars

Top 15.8% on sourcepulse

Project Summary

TorchScale provides a PyTorch library for building and scaling Transformer-based foundation models, targeting researchers and developers focused on AGI and large-scale AI. It offers implementations of novel architectures like RetNet, LongNet, and BitNet, aiming for improved generality, capability, stability, and efficiency across modalities.

How It Works

TorchScale implements advanced Transformer variants by parameterizing architectural choices within configuration objects. Key features include DeepNet for training stability in very deep Transformers, SubLN (from the Magneto work) for improved generality and stability, and X-MoE for efficient sparse Mixture-of-Experts layers. It also supports multiway architectures for multi-modal modeling and length-extrapolation techniques such as xPos.
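
As a rough sketch of this config-driven design, the following toggles several of these features on a single decoder config. The flag names (deepnorm, use_xmoe, moe_freq, moe_expert_count, xpos_rel_pos) follow the repository's README, but exact names, defaults, and allowed combinations should be verified against the installed version:

    # Sketch: enabling TorchScale features via configuration flags.
    # Flag names follow the project README; combinations and defaults
    # may differ across versions, so treat this as illustrative only.
    from torchscale.architecture.config import DecoderConfig
    from torchscale.architecture.decoder import Decoder

    config = DecoderConfig(
        vocab_size=64000,
        deepnorm=True,        # DeepNet normalization for very deep stacks
        use_xmoe=True,        # sparse Mixture-of-Experts (X-MoE) layers
        moe_freq=2,           # place an MoE layer every 2 blocks
        moe_expert_count=64,  # number of experts per MoE layer
        xpos_rel_pos=True,    # xPos relative positions for length extrapolation
    )
    model = Decoder(config)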

Quick Start & Requirements

  • Install: pip install torchscale
  • For optimized attention kernels, install flash-attn (Ampere or newer GPUs) or xformers (Volta or newer GPUs); xformers requires CUDA 11.8 or 12.1.
  • Example usage involves importing EncoderConfig, DecoderConfig, or EncoderDecoderConfig and instantiating the corresponding model classes; see the sketch after this list.
  • Official documentation and examples are available within the repository.
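
A minimal usage example, following the encoder snippet in the repository's README (the vocab_size value is just a placeholder):

    # Build a standard Transformer encoder from a config object.
    from torchscale.architecture.config import EncoderConfig
    from torchscale.architecture.encoder import Encoder

    config = EncoderConfig(vocab_size=64000)  # hyperparameters live on the config
    model = Encoder(config)
    print(model)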

Highlighted Details

  • Implements DeepNet for scaling Transformers to 1,000+ layers.
  • Features Foundation Transformers (Magneto) for cross-modal generality.
  • Includes Retentive Network (RetNet), proposed as a successor to the Transformer (see the sketch after this list).
  • Supports LongNet for processing extremely long sequences (up to 1 billion tokens).
  • Offers BitNet for 1-bit Transformers.
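
For RetNet, the repository exposes a dedicated config and decoder class. A minimal sketch, assuming the RetNetConfig and RetNetDecoder names from the README:

    # Instantiate a RetNet decoder; class names follow the project README.
    from torchscale.architecture.config import RetNetConfig
    from torchscale.architecture.retnet import RetNetDecoder

    config = RetNetConfig(vocab_size=64000)
    retnet = RetNetDecoder(config)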

Maintenance & Community

The project is maintained by Microsoft researchers, including Shuming Ma and Hongyu Wang. It is backed by several published research papers and welcomes community contributions via pull requests, which require signing a Contributor License Agreement (CLA).

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

While the library is comprehensive, some advanced features, such as LongViT, are marked as "in progress." The library targets PyTorch exclusively and would require adaptation for other deep learning frameworks.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 34 stars in the last 90 days

Explore Similar Projects

Starred by Jeremy Howard (cofounder of fast.ai) and Stas Bekman (author of Machine Learning Engineering Open Book; research engineer at Snowflake).

SwissArmyTransformer by THUDM

  • Transformer library for flexible model development
  • Top 0.3% · 1k stars · created 3 years ago · updated 7 months ago

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 6 more.

x-transformers by lucidrains

  • Transformer library with extensive experimental features
  • Top 0.2% · 5k stars · created 4 years ago · updated 3 days ago