PyTorch library for scaling Transformers
TorchScale provides a PyTorch library for building and scaling Transformer-based foundation models, targeting researchers and developers focused on AGI and large-scale AI. It offers implementations of novel architectures like RetNet, LongNet, and BitNet, aiming for improved generality, capability, stability, and efficiency across modalities.
How It Works
TorchScale implements advanced Transformer variants by parameterizing architectural choices within configuration objects. Key features include DeepNet for stable training of very deep Transformers, SubLN for improved generality and training stability, and X-MoE for efficient sparse Mixture-of-Experts layers. It also supports multimodal architectures and length-extrapolation techniques such as XPos.
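The sketch below illustrates this configuration-driven pattern by enabling the stability and sparsity features on a decoder; the flag names (`deepnorm`, `moe_freq`, `moe_expert_count`, `use_xmoe`) follow the upstream examples and should be verified against the installed version.

```python
# Sketch: architectural choices expressed as config flags (flag names per the
# upstream examples; verify against your installed TorchScale version).
from torchscale.architecture.config import DecoderConfig
from torchscale.architecture.decoder import Decoder

config = DecoderConfig(
    vocab_size=64000,     # placeholder vocabulary size
    deepnorm=True,        # DeepNet-style residual scaling for very deep stacks
    moe_freq=2,           # place a Mixture-of-Experts FFN every 2 layers
    moe_expert_count=8,   # experts per MoE layer
    use_xmoe=True,        # X-MoE routing for the sparse layers
)
model = Decoder(config)
```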
Quick Start & Requirements
Install the library with `pip install torchscale`. For optimized attention kernels, install `flash-attn` (Ampere+ GPUs) or `xformers` (Volta+ GPUs); CUDA 11.8 or 12.1 is required for `xformers`. Models are built by defining an `EncoderConfig`, `DecoderConfig`, or `EncoderDecoderConfig` and instantiating the respective model classes.
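A minimal end-to-end sketch of that pattern (the vocabulary size is an arbitrary placeholder):

```python
# Build a config, then instantiate the matching model class.
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

config = EncoderConfig(vocab_size=64000)  # placeholder vocabulary size
model = Encoder(config)
print(model)
```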
Highlighted Details
Maintenance & Community
The project is actively maintained by Microsoft researchers, with contributions from Shuming Ma and Hongyu Wang. It cites several related research papers and welcomes community contributions via pull requests, adhering to a Contributor License Agreement (CLA).
Licensing & Compatibility
The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
While the library is comprehensive, some advanced features such as LongViT are marked as "in progress." The library is primarily focused on PyTorch and may require adaptation for other deep learning frameworks.