PyTorch transformer inference engine for GPU speedup
Kernl is an open-source Python library that accelerates PyTorch transformer model inference on GPUs, often by several times over stock PyTorch. It targets researchers and engineers working with large language models who need to cut inference latency, and it offers a more hackable alternative to traditional inference engines because its optimizations are written in Python.
How It Works
Kernl leverages OpenAI Triton, a Python-based language for writing GPU kernels, to rewrite critical operations such as attention, linear layers, and layer normalization. Expressing several operations as a single Triton kernel enables operator fusion: intermediate results stay in registers or shared memory instead of being written to and re-read from global memory, which removes memory-bandwidth bottlenecks. Kernl also records the optimized model with CUDA graphs so inference can be replayed with near-zero kernel-launch overhead, and it uses TorchDynamo to handle dynamic model behavior by tracing and recompiling optimized computation graphs. Both ideas are sketched below.
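To make fusion concrete, here is a minimal, self-contained Triton sketch (not one of Kernl's actual kernels) that fuses an elementwise add and a ReLU into one kernel, so the intermediate sum is never written back to GPU global memory:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # Fused: the sum lives only in registers, never in global memory.
    out = tl.maximum(x + y, 0.0)
    tl.store(out_ptr + offsets, out, mask=mask)

def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    fused_add_relu_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

The CUDA-graph replay idea can be sketched with PyTorch's public API; `model` and the tensor shapes below are illustrative placeholders, not anything prescribed by Kernl:

```python
import torch

model = torch.nn.Linear(512, 512).cuda().eval()
static_input = torch.randn(8, 512, device="cuda")

# Warm up so lazy initialization does not happen during capture.
for _ in range(3):
    model(static_input)
torch.cuda.synchronize()

graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_output = model(static_input)

# Replay: copy fresh data into the captured input buffer, then relaunch
# every recorded kernel in one shot with near-zero CPU launch overhead.
static_input.copy_(torch.randn(8, 512, device="cuda"))
graph.replay()
result = static_output.clone()
```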
Quick Start & Requirements
```
pip install 'git+https://github.com/ELS-RD/kernl'
```
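After installation, Kernl's documented entry point is optimize_model, which rewrites an existing Hugging Face model in place. The model name and input shapes below are illustrative; check the repository for exact hardware and PyTorch version requirements (the README has called for a recent NVIDIA GPU):

```python
import torch
from transformers import AutoModel
from kernl.model_optimization import optimize_model

model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda()
optimize_model(model)  # swaps eligible ops for Triton kernels, enables CUDA graph replay

inputs = {
    "input_ids": torch.randint(0, 30522, (1, 128), device="cuda"),
    "attention_mask": torch.ones(1, 128, dtype=torch.int64, device="cuda"),
}
# Kernl targets fp16 inference, hence inference_mode plus autocast.
with torch.inference_mode(), torch.cuda.amp.autocast():
    outputs = model(**inputs)
```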
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats