applied-ai  by meta-pytorch

Applied AI experiments and examples for PyTorch

Created 2 years ago
294 stars

Top 89.9% on SourcePulse

Project Summary

This repository collects applied AI experiments and examples built on PyTorch. It targets researchers and engineers who want optimized kernels and practical implementations of techniques for efficient model training and inference.

How It Works

The core of the repository is a set of custom Triton and CUDA kernels that accelerate specific operations. These include Mixture-of-Experts (MoE) GEMM for Mixtral inference, fused Softmax, and fused RMSNorm, all of which improve performance by optimizing memory access patterns and fusing operations. The emphasis is on inference acceleration, with some work aimed at efficiency for training workloads as well.
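
To make concrete what the fused Softmax and fused RMSNorm kernels compute, here is a minimal pure-Python reference for one row of each operation. It is a sketch of the standard mathematical definitions, not the repository's Triton or CUDA code; the function names and the `eps` default are assumptions chosen for illustration. A fused kernel would typically produce the same result while reading each row from GPU memory once instead of launching separate operations for the reduction and the scaling.

```python
import math


def rms_norm(row, weight, eps=1e-6):
    """Reference RMSNorm for one row: x * weight / sqrt(mean(x^2) + eps).

    `eps` is an assumed default; real implementations vary.
    """
    mean_sq = sum(x * x for x in row) / len(row)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms * w for x, w in zip(row, weight)]


def softmax(row):
    """Numerically stable softmax for one row.

    Subtracting the row maximum before exponentiating avoids overflow
    and does not change the result.
    """
    row_max = max(row)
    exps = [math.exp(x - row_max) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
    print(softmax([1.0, 2.0, 3.0]))
```

A reference like this can serve as a correctness check when comparing against an optimized kernel's output, within a floating-point tolerance.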

Quick Start & Requirements

  • Installation: Typically involves cloning the repository and installing dependencies via pip.
  • Prerequisites: PyTorch, Triton, CUDA (for CUDA kernels), and potentially specific model dependencies (e.g., for Llama).
  • Resources: Requires a GPU for running the optimized kernels and experiments.
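
Before running any of the kernels, it can help to confirm that the prerequisites listed above are importable. The following is a small sketch using only the Python standard library; the helper name `missing_prerequisites` is hypothetical, and the default module names `torch` and `triton` are the usual import names for PyTorch and Triton, assumed here rather than taken from the repository's own setup instructions.

```python
import importlib.util


def missing_prerequisites(modules=("torch", "triton")):
    """Return the names in `modules` that cannot be found for import.

    Only checks that a module is discoverable; it does not import it
    or verify that a GPU is present.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All listed prerequisites are importable.")
```

This check says nothing about CUDA or GPU availability, which still has to be verified separately on the target machine.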

Highlighted Details

  • Triton kernels for MoE (Mixtral) GEMM, fused Softmax, and fused RMSNorm.
  • CUDA programming reading group materials and recordings.
  • Recipes for fine-tuning and inference of Llama models.
  • Contributions to the NeurIPS LLM Efficiency Challenge.
  • Published papers on PyTorch 2, Triton kernel acceleration, and FSDP.

Maintenance & Community

  • The project is associated with pytorch-labs.
  • Links to Discord and lecture materials for the CUDA reading group are provided.

Licensing & Compatibility

  • License: BSD 3-Clause.
  • Compatibility: Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

Some kernels are explicitly noted as supporting inference only, meaning they do not include backward pass support for training. The repository contains experimental code, and users should be aware of potential instability or ongoing development.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days
