EET by NetEase-FuXi

PyTorch plugin for efficient Transformer-based model inference

Created 5 years ago
264 stars

Top 96.6% on SourcePulse

Project Summary

EET (Easy and Efficient Transformer) is a PyTorch inference plugin that improves the speed and cost-efficiency of large Transformer-based NLP and multi-modal models. It targets researchers and developers working with models such as GPT-3, BERT, CLIP, Baichuan, and LLaMA, offering significant speedups and reduced memory footprints for single-GPU inference.

How It Works

EET achieves its performance gains through a combination of CUDA kernel optimizations and quantization/sparsity algorithms. It provides low-level "Operators APIs" that can be composed to build custom model architectures, as well as higher-level "Model APIs" that seamlessly integrate with Hugging Face Transformers and Fairseq models. This layered approach allows for both deep customization and easy adoption.
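The layered design can be illustrated with a toy sketch in plain NumPy (this is an illustration of the Operators-API/Model-API split, not EET's actual API): a low-level operator (scaled dot-product attention) that a higher-level model class composes.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_op(q, k, v):
    # low-level "operator": scaled dot-product attention
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

class ToyEncoderLayer:
    # higher-level "model" composed from low-level operators,
    # mirroring how EET's Model APIs build on its Operators APIs
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.wq = rng.standard_normal((dim, dim)) * 0.02
        self.wk = rng.standard_normal((dim, dim)) * 0.02
        self.wv = rng.standard_normal((dim, dim)) * 0.02

    def __call__(self, x):
        # residual connection around the attention operator
        return x + attention_op(x @ self.wq, x @ self.wk, x @ self.wv)

x = np.ones((4, 8))          # (seq_len, dim)
layer = ToyEncoderLayer(8)
print(layer(x).shape)        # (4, 8)
```

In EET itself, the Model APIs play the role of `ToyEncoderLayer`: a user loads a Hugging Face or Fairseq checkpoint and gets the optimized CUDA operators underneath for free.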

Quick Start & Requirements

  • Installation: Recommended via Docker (docker build -t eet_docker:0.1 . then nvidia-docker run ...). Alternatively, clone the repo and pip install . from source.
  • Prerequisites: CUDA >= 11.4, Python >= 3.7, GCC >= 7.4.0, PyTorch >= 1.12.0, NumPy >= 1.19.1, Fairseq == 0.10.0, Transformers >= 4.31.0.
  • Resources: Supports mega-size models on a single GPU.
  • Docs: https://github.com/NetEase-FuXi/EET/tree/main/example/python
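The Python-package floors above can be checked programmatically; a minimal stdlib-only sketch (the helper names are hypothetical, the version floors are taken from the list):

```python
def version_tuple(v):
    # "1.12.0" -> (1, 12, 0); ignores local suffixes like "+cu118"
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

REQUIRED = {            # floors from the prerequisites list above
    "torch": "1.12.0",
    "numpy": "1.19.1",
    "transformers": "4.31.0",
}

def meets_floor(installed, floor):
    # True when the installed version satisfies the minimum
    return version_tuple(installed) >= version_tuple(floor)

print(meets_floor("2.0.1+cu118", REQUIRED["torch"]))  # True
```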

Highlighted Details

  • Supports int8 quantization for further optimization.
  • Offers out-of-the-box integration with Transformers and Fairseq.
  • Provides ready-made pipelines for common NLP tasks (text-classification, fill-mask, text-generation, etc.).
  • Claims 2-8x speedup for models like GPT-3 and T5.
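Int8 quantization of the kind listed above can be sketched as symmetric per-tensor quantization (a generic NumPy illustration of the technique, not EET's CUDA kernels):

```python
import numpy as np

def quantize_int8(w):
    # symmetric per-tensor quantization:
    # map [-max|w|, max|w|] onto the int8 range [-127, 127]
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # recover an approximation of the original float weights
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q.dtype, np.abs(w - w_hat).max() < scale)  # int8 True
```

Weights stored as int8 take a quarter of the memory of float32, which is where much of the footprint reduction comes from; the rounding error per weight is bounded by half the scale.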


Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The project is described as "beta" in some sections.
  • Only pre-padding is supported for GPT-3.
  • The license is not clearly defined, which may impact commercial adoption.
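The pre-padding constraint means batched GPT-3 inputs must be left-padded so every sequence's last real token is right-aligned; a minimal sketch (the pad token id of 0 is an assumption for illustration):

```python
def pre_pad(batch, pad_id=0):
    # left-pad every sequence to the batch's max length so the
    # last real token of each sequence ends at the same position,
    # as autoregressive decoding expects
    max_len = max(len(seq) for seq in batch)
    return [[pad_id] * (max_len - len(seq)) + list(seq) for seq in batch]

batch = [[5, 6, 7], [8, 9]]
print(pre_pad(batch))  # [[5, 6, 7], [0, 8, 9]]
```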
Health Check

  • Last Commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai · 789 stars (0%)
Toolkit for easy model parallelization
Created 4 years ago · Updated 3 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Maxime Labonne (Head of Post-Training at Liquid AI), and 1 more.

GPTFast by MDK8888 · 684 stars (0%)
HF Transformers accelerator for faster inference
Created 2 years ago · Updated 1 year ago
Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM · 1k stars (0%)
Transformer library for flexible model development
Created 4 years ago · Updated 1 year ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 5 more.

matmulfreellm by ridgerchu · 3k stars (0.0%)
MatMul-free language models
Created 1 year ago · Updated 4 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google · 6k stars (0.3%)
PyTorch implementation for Google's Gemma models
Created 2 years ago · Updated 10 months ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA · 6k stars (0.0%)
Optimized transformer library for inference
Created 5 years ago · Updated 2 years ago