Applied AI experiments and examples for PyTorch
This repository collects applied AI experiments and examples, primarily for PyTorch. It targets researchers and engineers who want optimized kernels and practical implementations of techniques for efficient model training and inference.
How It Works
The core of the repository is a set of custom Triton and CUDA kernels that accelerate specific operations: Mixture-of-Experts (MoE) GEMM for Mixtral inference, fused Softmax, and fused RMSNorm. These kernels improve performance by optimizing memory access patterns and by fusing several operations into one kernel, so intermediate results are not written back to memory between steps. The emphasis is on inference acceleration, although some kernels are also intended for training workloads.
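To make the idea of fusion concrete, here is a minimal pure-Python sketch of the arithmetic a fused RMSNorm and a fused Softmax perform. It is an illustration only, not code from the repository: the function names, the `eps` default, and the single-pass "online" softmax formulation are assumptions chosen for clarity. Real kernels would do this work on the GPU in Triton or CUDA.

```python
import math


def rmsnorm_unfused(x, weight, eps=1e-6):
    """Reference RMSNorm written as separate steps (hypothetical helper).

    Each step stands in for a separate kernel launch that would write an
    intermediate list back to memory.
    """
    squares = [v * v for v in x]                 # intermediate 1
    mean_sq = sum(squares) / len(x)              # reduction
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    normed = [v * inv_rms for v in x]            # intermediate 2
    return [n * w for n, w in zip(normed, weight)]


def rmsnorm_fused(x, weight, eps=1e-6):
    """Same math with no intermediate lists: one read pass, one write pass."""
    acc = 0.0
    for v in x:                                  # accumulate sum of squares
        acc += v * v
    inv_rms = 1.0 / math.sqrt(acc / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def softmax_fused(row):
    """Numerically stable softmax using a single-pass running max and sum.

    The running sum is rescaled whenever a new maximum is seen, so the
    max and the normalizer are obtained in one pass over the row.
    """
    running_max = float("-inf")
    running_sum = 0.0
    for v in row:
        new_max = max(running_max, v)
        running_sum = (running_sum * math.exp(running_max - new_max)
                       + math.exp(v - new_max))
        running_max = new_max
    return [math.exp(v - running_max) / running_sum for v in row]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    w = [0.5, 1.0, 1.5, 2.0]
    print(rmsnorm_fused(x, w))
    print(softmax_fused(x))
```

The fused and unfused RMSNorm variants return the same values; the difference is how many times the data is traversed and how many temporaries are materialized, which is the cost a fused GPU kernel aims to reduce.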
Quick Start & Requirements
pip
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Some kernels are explicitly noted as inference-only: they provide no backward pass and therefore cannot be used for training. The repository contains experimental code that is still under development, so users should expect possible instability.