torchscale by microsoft

PyTorch library for scaling Transformers

Created 2 years ago
3,114 stars

Top 15.4% on SourcePulse

View on GitHub
Project Summary

TorchScale provides a PyTorch library for building and scaling Transformer-based foundation models, targeting researchers and developers focused on AGI and large-scale AI. It offers implementations of novel architectures like RetNet, LongNet, and BitNet, aiming for improved generality, capability, stability, and efficiency across modalities.

How It Works

TorchScale implements advanced Transformer variants by parameterizing architectural choices within configuration objects. Key features include DeepNet for training stability in very deep Transformers, SubLN for improved generality and stability, and X-MoE for efficient sparse Mixture-of-Experts modeling. It also supports multimodal architectures and length-extrapolation techniques such as xPos. The sketch below illustrates this configuration-driven pattern.
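
A minimal sketch of this pattern, assuming the flag names documented in the project README (deepnorm, subln, use_xmoe); exact options may vary between versions:

    from torchscale.architecture.config import EncoderConfig
    from torchscale.architecture.encoder import Encoder

    # Architectural variants are selected with config flags rather than
    # separate model classes; deepnorm=True enables DeepNet-style residual
    # scaling for stable training of very deep stacks.
    config = EncoderConfig(vocab_size=64000, deepnorm=True)
    encoder = Encoder(config)
    # SubLN (subln=True) and sparse X-MoE (use_xmoe=True) are toggled the
    # same way on the config object.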

Quick Start & Requirements

  • Install: pip install torchscale
  • For optimized performance, install flash-attn (Ampere+ GPUs) or xformers (Volta+ GPUs). CUDA 11.8 or 12.1 required for xformers.
  • Example usage involves importing EncoderConfig, DecoderConfig, or EncoderDecoderConfig and instantiating the corresponding model class; see the sketch after this list.
  • Official documentation and examples are available within the repository.
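
A minimal quick-start sketch, mirroring the decoder example shown in the project README:

    from torchscale.architecture.config import DecoderConfig
    from torchscale.architecture.decoder import Decoder

    # Build a decoder-only Transformer language-model backbone from a config.
    config = DecoderConfig(vocab_size=64000)
    decoder = Decoder(config)
    print(decoder)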

Highlighted Details

  • Implements DeepNet for scaling Transformers to 1,000+ layers.
  • Features Foundation Transformers (Magneto) for cross-modal generality.
  • Includes Retentive Network (RetNet) as a proposed successor to the Transformer (see the RetNet sketch after this list).
  • Supports LongNet for processing extremely long sequences (up to 1 billion tokens).
  • Offers BitNet for 1-bit Transformers.
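
As a concrete example of the list above, the README instantiates RetNet through the same configuration-driven API (class names per the README; verify against the installed version):

    from torchscale.architecture.config import RetNetConfig
    from torchscale.architecture.retnet import RetNetDecoder

    # RetNet replaces softmax attention with retention; construction is
    # otherwise identical to the standard decoder.
    config = RetNetConfig(vocab_size=64000)
    retnet = RetNetDecoder(config)
    print(retnet)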

Maintenance & Community

The project is maintained by Microsoft researchers, with core contributions from Shuming Ma and Hongyu Wang, though recent commit activity has slowed (see Health Check below). It cites several related research papers and welcomes community contributions via pull requests, subject to a Contributor License Agreement (CLA).

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Some advanced features, such as LongViT, are marked as "in progress." The library targets PyTorch exclusively and would require adaptation for other deep learning frameworks.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 15 stars in the last 30 days

Explore Similar Projects

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

790 stars
Toolkit for easy model parallelization
Created 4 years ago
Updated 2 years ago