EET by NetEase-FuXi

PyTorch plugin for efficient Transformer-based model inference

261 stars · created 4 years ago · Top 98.0% on sourcepulse

View on GitHub: https://github.com/NetEase-FuXi/EET
Project Summary

EET (Easy and Efficient Transformer) is a PyTorch inference plugin that makes serving large Transformer-based NLP and multi-modal models faster and cheaper. It targets researchers and developers working with models like GPT-3, BERT, CLIP, Baichuan, and LLaMA, offering significant speedups and a reduced memory footprint for single-GPU inference.

How It Works

EET achieves its performance gains through a combination of CUDA kernel optimizations and quantization/sparsity algorithms. It provides low-level "Operators APIs" that can be composed to build custom model architectures, as well as higher-level "Model APIs" that seamlessly integrate with Hugging Face Transformers and Fairseq models. This layered approach allows for both deep customization and easy adoption.
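
As a rough illustration of the Model API layer, the sketch below swaps a Hugging Face BERT model for its EET counterpart. The import path and the max_batch/data_type keywords are assumptions modeled on the example directory linked under Quick Start; consult it for the exact signatures.

```python
# Minimal sketch of the Model API; class and keyword names are
# assumptions. See the example directory for exact signatures.
import torch
from transformers import BertTokenizer
from eet import EETBertModel  # assumed import path

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Assumed keywords: max_batch preallocates buffers for a fixed batch
# capacity; data_type selects the inference precision.
model = EETBertModel.from_pretrained(
    "bert-base-uncased", max_batch=1, data_type=torch.float16
)

inputs = tokenizer("EET accelerates Transformer inference.",
                   return_tensors="pt")
hidden_states = model(inputs["input_ids"].cuda())
```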

Quick Start & Requirements

  • Installation: Recommended via Docker (docker build -t eet_docker:0.1 . then nvidia-docker run ...). Alternatively, clone the repo and pip install . from source.
  • Prerequisites: CUDA >= 11.4, Python >= 3.7, GCC >= 7.4.0, PyTorch >= 1.12.0, NumPy >= 1.19.1, Fairseq == 0.10.0, Transformers >= 4.31.0 (a sanity-check sketch follows this list).
  • Resources: Runs very large ("mega-size") models on a single GPU, leveraging its reduced memory footprint.
  • Docs: https://github.com/NetEase-FuXi/EET/tree/main/example/python
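
After installing, the following generic sanity check (not part of EET itself) confirms the environment meets the version floors above:

```python
# Generic environment check against the prerequisites listed above;
# this uses only standard PyTorch introspection, not EET APIs.
import torch

assert torch.cuda.is_available(), "EET requires a CUDA-capable GPU"
print("PyTorch:", torch.__version__)        # needs >= 1.12.0
print("CUDA toolkit:", torch.version.cuda)  # needs >= 11.4
```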

Highlighted Details

  • Supports int8 quantization for further optimization.
  • Offers out-of-the-box integration with Transformers and Fairseq.
  • Provides ready-made pipelines for common NLP tasks (text-classification, fill-mask, text-generation, etc.); a usage sketch follows this list.
  • Claims 2-8x speedup for models like GPT-3 and T5.
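
The pipeline entry point is meant to feel like the Transformers one. A minimal sketch, assuming eet exposes a pipeline() factory with Transformers-style task names; the data_type keyword is an assumption, so check the example directory for the exact interface.

```python
# Sketch of a ready-made text-generation pipeline; the import path
# and the data_type keyword are assumptions modeled on the
# Transformers pipeline interface.
import torch
from eet import pipeline  # assumed import path

generator = pipeline(
    "text-generation",
    model="gpt2",
    data_type=torch.float16,  # assumed keyword for inference precision
)
print(generator("My name is EET and I"))
```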

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The project is described as "beta" in some sections.
  • Only pre-padding (left padding) is supported for GPT-3; a padding sketch follows this list.
  • The license is not clearly defined, which may impact commercial adoption.
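
Pre-padding means padding tokens are placed before the sequence rather than after it. The sketch below prepares such a batch with a standard Transformers tokenizer; the tokenizer choice is purely illustrative.

```python
# Build a left-padded (pre-padded) batch, as required for GPT-3-style
# models in EET; this uses only the standard Transformers API.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.padding_side = "left"            # pad before the tokens
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

batch = tokenizer(
    ["a short prompt", "a noticeably longer prompt to pad against"],
    padding=True, return_tensors="pt",
)
# batch["input_ids"] now has pad tokens at the start of shorter rows.
```
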
Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Julien Chaumond (Cofounder of Hugging Face), and 1 more.

parallelformers by tunib-ai

Toolkit for easy model parallelization
790 stars · created 4 years ago · updated 2 years ago

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 6 more.

AutoGPTQ by AutoGPTQ

LLM quantization package using GPTQ algorithm
5k stars · created 2 years ago · updated 3 months ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai

Deep learning optimization library for distributed training and inference
40k stars · created 5 years ago · updated 1 day ago