kernl by ELS-RD

PyTorch transformer inference engine for GPU speedup

created 3 years ago
1,578 stars

Top 27.0% on sourcepulse

View on GitHub
Project Summary

Kernl is an open-source Python library designed to accelerate PyTorch transformer model inference on GPUs by several times. It targets researchers and engineers working with large language models who need to improve inference speed and reduce latency, offering a more hackable alternative to traditional inference engines.

How It Works

Kernl leverages OpenAI Triton, a Python-based language for writing GPU kernels, to rewrite critical operations such as attention, linear layers, and layernorm. This enables operator fusion, which reduces memory-bandwidth bottlenecks by avoiding the storage of intermediate results in GPU memory. Kernl also uses CUDA graphs to replay inference with near-zero kernel-launch overhead, and TorchDynamo to handle dynamic model behavior by tracing Python code and recompiling optimized computation graphs.
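The fusion idea can be sketched in plain Python (a conceptual illustration only, not Kernl code): the unfused version materializes an intermediate result that must be written and read back, while the fused version computes everything in a single pass.

```python
# Conceptual sketch of operator fusion (no GPU needed). On a GPU, the
# intermediate list below corresponds to a tensor round-tripped through
# memory; fusing the two steps removes that memory traffic.
def unfused(xs):
    shifted = [x + 1.0 for x in xs]        # intermediate result is stored
    return [max(s, 0.0) for s in shifted]  # then read back for the ReLU

def fused(xs):
    return [max(x + 1.0, 0.0) for x in xs]  # one pass, no intermediate

print(fused([-2.0, 0.5]))  # [0.0, 1.5]
```

Both functions return the same values; the difference is only in how much data moves through memory, which is exactly the bottleneck fused GPU kernels target.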

Quick Start & Requirements

  • Install via pip: pip install 'git+https://github.com/ELS-RD/kernl'
  • Requires PyTorch, Python >= 3.9, an Ampere GPU, and CUDA.
  • Docker image available for easier setup.
  • See Examples for end-to-end use cases.

Highlighted Details

  • Achieves significant speedups on transformer models like Llama v2, T5, and Whisper.
  • Kernels are written in OpenAI Triton, with individual kernels under 200 lines of code for ease of modification.
  • Supports optimization through kernel fusion and replacement of PyTorch operations with custom Triton kernels.
  • Includes extensive benchmarking tools and conventions for performance analysis.
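For a feel of what such short Triton kernels look like, here is a minimal fused add + ReLU kernel in the same style (a generic illustration, not taken from Kernl's codebase); it needs the triton package and a CUDA GPU to launch, so the import is guarded:

```python
# Minimal Triton kernel sketch (not Kernl code): fuses an elementwise add
# and a ReLU into one memory pass, the same fusion principle Kernl applies
# to attention, linear layers, and layernorm.
try:
    import triton
    import triton.language as tl

    @triton.jit
    def fused_add_relu(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK + tl.arange(0, BLOCK)
        mask = offsets < n_elements  # guard the tail of the array
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

    have_triton = True
except ImportError:
    have_triton = False  # triton not installed; shown for illustration only
```

On a machine with a GPU, such a kernel is launched over a grid of blocks, e.g. `fused_add_relu[(num_blocks,)](x, y, out, n, BLOCK=1024)` with CUDA tensors as arguments.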

Maintenance & Community

  • Developed by ELS-RD.
  • Contribution guide and Code of Conduct are available.

Licensing & Compatibility

  • License not explicitly stated in the README.

Limitations & Caveats

  • Requires specific hardware (Ampere GPU) and CUDA installation.
  • Benchmarks can take a considerable amount of time to run.
  • The project is built on newer technologies like Triton and TorchDynamo, which may still be evolving.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 16 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

  • Lightweight training framework for model pre-training
  • 1.0% · 402 stars · created 1 year ago · updated 1 week ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 5 more.

Liger-Kernel by linkedin

  • Triton kernels for efficient LLM training
  • 0.6% · 5k stars · created 1 year ago · updated 1 day ago