lightseq by bytedance

CUDA library for sequence processing/generation, optimized for Transformer-family models

created 5 years ago
3,285 stars

Top 15.1% on sourcepulse

Project Summary

LightSeq is a high-performance library for accelerating Transformer-based models (BERT, GPT, ViT, etc.) during training and inference. It targets researchers and engineers working with NLP and CV tasks like machine translation and text generation, offering significant speedups over standard PyTorch implementations.

How It Works

LightSeq leverages custom, fused CUDA kernels built on top of NVIDIA's cuBLAS, Thrust, and CUB libraries. This approach optimizes core Transformer operations for modern GPU architectures. It supports mixed-precision training and inference (fp16, int8) and integrates with popular frameworks like Fairseq and Hugging Face, enabling easy adoption and deployment.
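
As a minimal sketch of the inference path, the snippet below follows the usage pattern from the LightSeq examples; the model file name, the batch-size argument, and the token ids are placeholders, and exact signatures may vary between versions.

    import lightseq.inference as lsi

    # Load an exported LightSeq Transformer model (protobuf or HDF5 weights).
    # "transformer.pb" and the max batch size of 8 are placeholder values.
    model = lsi.Transformer("transformer.pb", 8)

    # A batch of tokenized source sequences (token ids are illustrative).
    input_ids = [[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6]]

    # Runs the fused CUDA kernels end to end and returns decoded ids.
    outputs = model.infer(input_ids)
    print(outputs)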

Quick Start & Requirements

  • Install from PyPI: pip install lightseq (Linux, Python 3.6-3.8 only).
  • Build from Source: requires the CUDA toolkit and, for some features, HDF5; see the repository's detailed build instructions.
  • Framework Integration: the bundled examples require fairseq, transformers, seqeval, datasets, and sacremoses (see the training sketch after this list).
  • Deployment: a Docker image is available for Triton Inference Server: sudo docker pull hexisyztem/tritonserver_lightseq:22.01-1.
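
As a rough illustration of framework integration, the sketch below builds a LightSeq fused encoder layer as a drop-in PyTorch module, following the pattern in the repository's training examples; all config values are illustrative, and the exact get_config fields may differ by version.

    import torch
    from lightseq.training import LSTransformerEncoderLayer

    # Configure a fused-kernel encoder layer; every value here is illustrative.
    config = LSTransformerEncoderLayer.get_config(
        max_batch_tokens=4096,      # upper bound on tokens per batch
        max_seq_len=256,
        hidden_size=1024,
        intermediate_size=4096,
        nhead=16,
        attn_prob_dropout_ratio=0.1,
        activation_dropout_ratio=0.1,
        hidden_dropout_ratio=0.1,
        pre_layer_norm=True,
        fp16=True,                  # mixed-precision training
        local_rank=0,
    )
    layer = LSTransformerEncoderLayer(config).cuda()

    # Forward pass: hidden states plus a padding mask (in this sketch,
    # nonzero entries mark padded positions).
    x = torch.randn(8, 256, 1024, dtype=torch.half, device="cuda")
    pad_mask = torch.zeros(8, 256, dtype=torch.half, device="cuda")
    out = layer(x, pad_mask)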

Highlighted Details

  • Up to 3x speedup for fp16 training and 5x for int8 training compared to PyTorch.
  • Up to 12x speedup for fp16 inference and 15x for int8 inference compared to PyTorch.
  • Supports Transformer, BERT, BART, GPT2, ViT, T5, MT5, XGLM, VAE, Multilingual, and MoE models.
  • Offers beam search and sampling decoding algorithms (see the generation sketch below) and compatibility with DeepSpeed.
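
To illustrate the decoding side, here is a hedged sketch of sampling-based generation through the Python inference bindings, modeled on the repository's GPT example; the weight path and keyword arguments are assumptions and may differ by version.

    import lightseq.inference as lsi

    # Load an exported LightSeq GPT model; "gpt2.hdf5" is a placeholder path.
    model = lsi.Gpt("gpt2.hdf5", max_batch_size=16)

    # Sampling-based generation; in LightSeq, the choice of beam search vs.
    # sampling and the decoding hyperparameters are fixed at export time.
    prompt_ids = [[3666, 1438, 318]]  # an illustrative tokenized prompt
    output_ids = model.sample(prompt_ids)
    print(output_ids)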

Maintenance & Community

The project has released up to v3.0.0 (October 2022), which added int8 support. The README does not list community channels (Discord/Slack) or a public roadmap.

Licensing & Compatibility

The README does not explicitly state a license, so check the repository for a LICENSE file before commercial use or integration into closed-source projects.

Limitations & Caveats

The PyPI installation is restricted to Linux and Python 3.6-3.8; newer Python versions or other operating systems likely require building from source. The most recent release dates from October 2022, suggesting potential maintenance gaps.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 21 stars in the last 90 days

Explore Similar Projects

nunchaku by nunchaku-tech (Top 2.1%, 3k stars)
High-performance 4-bit diffusion model inference engine. Created 9 months ago, updated 23 hours ago.
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jaret Burkett (Founder of Ostris), and 1 more.

SageAttention by thu-ml (Top 2.4%, 2k stars)
Attention kernel for plug-and-play inference acceleration. Created 10 months ago, updated 1 week ago.
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Philipp Schmid (DevRel at Google DeepMind), and 1 more.

FasterTransformer by NVIDIA (Top 0.2%, 6k stars)
Optimized transformer library for inference. Created 4 years ago, updated 1 year ago.
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 6 more.