SwissArmyTransformer by THUDM

Transformer library for flexible model development

  • Created 3 years ago
  • 1,086 stars
  • Top 35.6% on sourcepulse

Project Summary

SwissArmyTransformer (SAT) is a flexible library for developing and training custom Transformer variants, targeting researchers and engineers working with large language models. It simplifies the creation of novel architectures by allowing users to compose existing models with lightweight "mixins" for features like prefix-tuning or custom embeddings, while leveraging DeepSpeed and model parallelism for efficient large-scale training.

How It Works

SAT employs a mixin-based architecture, enabling modular extension of base Transformer models (BERT, GPT, T5, GLM, etc.). Users define new functionalities as mixins, which are then added to a model instance via add_mixin. This approach allows for code reuse and rapid prototyping of model variations, such as integrating prefix-tuning or custom positional embeddings with minimal code changes. It also supports efficient inference for autoregressive models through state caching mixins.
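Below is a minimal sketch of the mixin pattern, assuming SAT's BaseMixin base class and the add_mixin hook described above. The import path, the position_embedding_forward hook name, and the commented-out model attachment are assumptions that may differ between SAT releases; treat this as an illustration rather than the library's exact API.

```python
import torch
from sat.model import BaseMixin  # import path is an assumption; older releases expose SwissArmyTransformer.model

class LearnablePositionMixin(BaseMixin):
    """Illustrative mixin that swaps in a trainable positional embedding."""
    def __init__(self, max_len, hidden_size):
        super().__init__()
        self.pos_emb = torch.nn.Embedding(max_len, hidden_size)

    # Hook name follows SAT's *_forward override convention (assumed); when a mixin
    # defines it, the base model's positional-embedding step is replaced by this one.
    def position_embedding_forward(self, position_ids, **kw_args):
        return self.pos_emb(position_ids)

# Attaching the mixin to an already-built model instance (model construction omitted):
# model.add_mixin('learnable-pos', LearnablePositionMixin(max_len=1024, hidden_size=1024))
```

Because the mixin only overrides the hooks it cares about, the same base model can carry several such mixins at once, which is what makes prototyping architecture variants cheap.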

Quick Start & Requirements

  • Install: pip install SwissArmyTransformer
  • Requirements: Python, PyTorch, DeepSpeed. GPU and CUDA recommended for training/inference.
  • Resources: Training large models (10B+ parameters) requires significant GPU memory and distributed training setup.
  • Docs: Tutorials cover loading pretrained models and custom training; a minimal loading sketch follows this list.
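As a quick-start illustration, the sketch below loads a pretrained checkpoint through SAT's AutoModel.from_pretrained entry point. The import path, the argument fields, the argument order, and the checkpoint name are assumptions and vary across SAT versions, so check the project's tutorials for the exact call.

```python
import argparse
from sat.model import AutoModel  # import path is an assumption; some releases expose `from sat import AutoModel`

# Minimal hand-rolled argument namespace; real scripts typically build this with
# SAT's own argument parser, and the required fields differ between versions.
args = argparse.Namespace(fp16=False, skip_init=False, device='cpu')

# Downloads (or locates) the named checkpoint and rebuilds it on SAT's Transformer
# backbone; the checkpoint name here is illustrative.
model, model_args = AutoModel.from_pretrained('bert-base-uncased', args)
model.eval()
```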

Highlighted Details

  • Supports modular extension of various Transformer architectures (BERT, GPT, T5, GLM, ViT) via mixins.
  • Integrates DeepSpeed ZeRO-2 and activation checkpointing for efficient large-scale training (see the config sketch after this list).
  • Enables easy addition of techniques like prefix-tuning, custom positional embeddings, and autoregressive caching.
  • Claims to be the only open-source codebase that supports finetuning T5-10B on GPUs.
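For context on the ZeRO-2 and activation-checkpointing integration, here is a hedged sketch of a generic DeepSpeed configuration expressing those two features. This is plain DeepSpeed configuration, not an SAT-specific file; SAT's training scripts may set equivalent options through their own launch arguments, and the batch sizes and fp16 setting below are illustrative.

```python
import json

# Generic DeepSpeed settings: ZeRO stage 2 shards optimizer state and gradients
# across data-parallel ranks; activation checkpointing recomputes activations
# during the backward pass to cut memory use.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "activation_checkpointing": {
        "partition_activations": True,              # split saved activations across model-parallel ranks
        "contiguous_memory_optimization": False
    }
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```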

Maintenance & Community

  • Developed by THUDM.
  • Based on DeepSpeed, Megatron-LM, and Huggingface transformers.
  • Contribution tutorials are planned.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: Permissive license allows for commercial use and integration with closed-source projects.

Limitations & Caveats

The library is focused on Transformer architectures and may require significant adaptation for non-Transformer models. While it supports very large models, using it effectively requires familiarity with distributed training frameworks such as DeepSpeed.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 90 days

Explore Similar Projects

Starred by Lewis Tunstall (Researcher at Hugging Face), Lysandre Debut (Chief Open-Source Officer at Hugging Face), and 3 more.

FARM by deepset-ai

  • 2k stars
  • NLP framework for transfer learning with BERT & Co
  • Created 6 years ago, updated 1 year ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai

  • 40k stars
  • Deep learning optimization library for distributed training and inference
  • Created 5 years ago, updated 1 day ago