ByteTransformer by bytedance

High-performance BERT transformer inference on NVIDIA GPUs

Created 2 years ago · 476 stars · Top 64.1% on SourcePulse

View on GitHub
Project Summary

ByteTransformer is a high-performance inference library for BERT-like transformers, targeting developers who want to optimize inference serving on NVIDIA GPUs. It offers architecture-aware optimizations for padding-free BERT routines, delivering higher throughput and lower latency than standard implementations.

How It Works

This library provides both Python and C++ APIs, including a PyTorch plugin for easy integration. It implements end-to-end optimizations across key BERT components (QKV encoding, softmax, feed-forward network, activation, layernorm, and multi-head attention), specifically targeting padding-free execution for efficiency.
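
For illustration, here is a hedged sketch of what the PyTorch-plugin integration typically looks like. The shared-library path and op namespace are assumptions for this sketch, not ByteTransformer's documented API; a -DBUILD_THS=ON build produces a TorchScript-compatible plugin that loads like any PyTorch custom-op library.

    import torch

    # Hedged sketch: the library path and op namespace below are placeholders,
    # not ByteTransformer's documented API.
    torch.ops.load_library("build/lib/libths_ByteTransformer.so")  # hypothetical path

    batch, seqlen, hidden = 16, 64, 768
    x = torch.randn(batch, seqlen, hidden, device="cuda", dtype=torch.float16)
    mask = torch.ones(batch, seqlen, device="cuda", dtype=torch.float16)

    # Once loaded, registered ops become callable via torch.ops.<namespace>.<op>;
    # the call below is a placeholder for whatever op the plugin registers.
    # out = torch.ops.bytetransformer.bert_encoder(x, mask)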

Quick Start & Requirements

  • Installation: Build from source with CMake and make:

        git submodule update --init
        mkdir build && cd build
        cmake -DTORCH_CUDA_ARCH_LIST="8.0" -DDataType=FP16 -DBUILD_THS=ON -DCUDAARCHS="80" ..
        make

  • Prerequisites: CUDA 11.6, CMake >= 3.13, PyTorch >= 1.8, Python >= 3.7, and a GPU with compute capability 7.0 (V100), 7.5 (T4), or 8.0 (A100). A quick environment check is sketched after this list.
  • Resources: A benchmark script, benchmark/bert_bench.sh, is included.
  • Docs/Demo: Technical details are published at IEEE IPDPS 2023 (arXiv:2210.03052).
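
As a quick sanity check before building, the snippet below (a minimal sketch assuming only a CUDA-enabled PyTorch install) verifies the stated prerequisites from Python:

    import torch

    # Check the environment against the stated prerequisites
    # (PyTorch >= 1.8, CUDA 11.6, compute capability 7.0 / 7.5 / 8.0).
    assert torch.cuda.is_available(), "an NVIDIA GPU with CUDA is required"
    print("PyTorch:", torch.__version__)              # expect >= 1.8
    print("CUDA (torch build):", torch.version.cuda)  # expect 11.6
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Compute capability: {major}.{minor}")     # 7.0 (V100), 7.5 (T4), or 8.0 (A100)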

Highlighted Details

  • Achieves superior inference performance compared to PyTorch, TensorFlow, FasterTransformer, and DeepSpeed on A100 GPUs.
  • Demonstrates significant speedups: for example, BERT inference at batch size 16 and sequence length 1024 drops from 53.21 ms (PyTorch) to 24.70 ms (ByteTransformer), roughly a 2.2x speedup.
  • Optimizes padding-free BERT routines, including QKV, softmax, FFN, activation, layernorm, and multi-head attention.
  • Supports both fixed-length and variable-length transformer inputs; the padding-free idea behind variable-length support is illustrated below.
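
To make the padding-free idea concrete, the sketch below uses plain PyTorch (not ByteTransformer's API) to pack a variable-length batch: pad positions are dropped, and prefix offsets are kept so each sequence can still be located in the packed buffer.

    import torch

    # Illustration of padding-free packing, independent of ByteTransformer's API.
    seqlens = torch.tensor([7, 3, 5])  # per-sequence token counts
    hidden = 768
    padded = torch.randn(len(seqlens), int(seqlens.max()), hidden)  # [batch, max_len, hidden]

    # Pack: keep only the 15 real tokens out of the 21 padded slots.
    packed = torch.cat([padded[i, :n] for i, n in enumerate(seqlens.tolist())])
    offsets = torch.cat([torch.zeros(1, dtype=torch.long), seqlens.cumsum(0)])  # [0, 7, 10, 15]
    assert packed.shape[0] == int(seqlens.sum())

    # Compute and memory traffic now scale with sum(seqlens) rather than
    # batch * max_len, which is where the padding-free speedup comes from.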

Maintenance & Community

The README does not mention maintainers, community channels (e.g., Discord, Slack), or a project roadmap.

Licensing & Compatibility

The README does not specify a license or any compatibility notes for commercial or closed-source use.

Limitations & Caveats

Only the standard BERT transformer encoder architecture is currently supported in this repository.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Wei-Lin Chiang (cofounder of LMArena), and 13 more.

awesome-tensor-compilers by merrymercy
Top 0.1% on SourcePulse · 3k stars
Curated list of tensor compiler projects and papers
Created 5 years ago · Updated 1 year ago

Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 14 more.

BIG-bench by google
Top 0.1% on SourcePulse · 3k stars
Collaborative benchmark for probing and extrapolating LLM capabilities
Created 5 years ago · Updated 1 year ago

Starred by Lysandre Debut (Chief Open-Source Officer at Hugging Face), Shizhe Diao (author of LMFlow; Research Scientist at NVIDIA), and 14 more.

simpletransformers by ThilinaRajapakse
Top 0% on SourcePulse · 4k stars
Rapid NLP task implementation
Created 6 years ago · Updated 4 months ago

Starred by Aravind Srinivas (cofounder of Perplexity), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 16 more.

text-to-text-transfer-transformer by google-research
Top 0.1% on SourcePulse · 6k stars
Unified text-to-text transformer for NLP research
Created 6 years ago · Updated 2 days ago

Starred by Vaibhav Nivargi (cofounder of Moveworks), Chuan Li (Chief Scientific Officer at Lambda), and 5 more.

awesome-mlops by visenger
Top 0.1% on SourcePulse · 14k stars
Curated MLOps knowledge hub
Created 5 years ago · Updated 1 year ago