antares by microsoft

Compiler solution for PyTorch operator optimization on diverse accelerators

Created 5 years ago
467 stars

Top 65.2% on SourcePulse

Project Summary

Antares (AutoRT) is a compiler solution that lets PyTorch users invent, benchmark, and optimize custom operators for a range of hardware accelerators. It targets researchers and developers who need to push performance limits or connect PyTorch to custom hardware backends. It can both accelerate standard PyTorch applications and generate custom or fused operators.

How It Works

Antares uses an intermediate representation (IR) to define operations, which are then compiled and optimized for a specific backend. An operator is therefore defined once, in abstract form, and the same definition can be compiled for backends such as DirectX 12, CUDA, ROCm, and SYCL. Operators can be generated either through a programmatic API or from the command line, and both paths include an integrated tuning mechanism.
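The idea of one abstract definition, several lowerings, and a tuning step that picks the fastest can be sketched in plain Python. This is a toy illustration only: `OP_SPEC`, the `lower_*` functions, and `tune` are hypothetical names invented here, and none of this is Antares IR syntax or the AutoRT API.

```python
import time
from typing import Callable, Dict, List, Tuple

Kernel = Callable[[List[float], List[float]], List[float]]

# Hypothetical backend-neutral operator description (NOT Antares IR syntax).
OP_SPEC = {"name": "scaled_add", "expr": lambda a, b: 2.0 * a + b}


def lower_loop(spec: dict) -> Kernel:
    """Lower the spec to an explicit loop."""
    expr = spec["expr"]

    def kernel(xs: List[float], ys: List[float]) -> List[float]:
        out = []
        for a, b in zip(xs, ys):
            out.append(expr(a, b))
        return out

    return kernel


def lower_comprehension(spec: dict) -> Kernel:
    """Lower the spec to a list comprehension."""
    expr = spec["expr"]
    return lambda xs, ys: [expr(a, b) for a, b in zip(xs, ys)]


def lower_map(spec: dict) -> Kernel:
    """Lower the spec to a call to the built-in map."""
    expr = spec["expr"]
    return lambda xs, ys: list(map(expr, xs, ys))


# Stand-ins for "backends" or schedules; assumed names for illustration.
CANDIDATES: Dict[str, Callable[[dict], Kernel]] = {
    "loop": lower_loop,
    "comprehension": lower_comprehension,
    "map": lower_map,
}


def tune(spec: dict, xs: List[float], ys: List[float],
         repeats: int = 3) -> Tuple[str, Kernel, Dict[str, float]]:
    """Time every candidate lowering and return the fastest one."""
    timings: Dict[str, float] = {}
    kernels: Dict[str, Kernel] = {}
    for name, lower in CANDIDATES.items():
        kernel = lower(spec)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(xs, ys)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
        kernels[name] = kernel
    best_name = min(timings, key=timings.get)
    return best_name, kernels[best_name], timings
```

Calling `tune(OP_SPEC, xs, ys)` returns the name of the fastest lowering, the kernel itself, and the per-candidate timings. Every lowering computes the same result, which is the property a real compiler must also preserve across backends.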

Quick Start & Requirements

  • Install via pip: pip install autort
  • Requires Python 3.x.
  • Experimental support for Windows DirectX 12 and Linux CUDA.
  • Official documentation and tutorials are available.
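After running the pip command above, a quick way to confirm that the package is visible to the current interpreter is to look it up without importing it. The sketch below uses only the standard library; `package_status` is a hypothetical helper name, and the only detail taken from the source is the package name `autort`.

```python
import importlib.util


def package_status(package: str = "autort") -> str:
    """Report whether `package` can be found by the current interpreter."""
    # find_spec returns None when a top-level package is not installed.
    if importlib.util.find_spec(package) is None:
        return f"{package} is not installed; try: pip install {package}"
    return f"{package} is installed"


print(package_status())
```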

Highlighted Details

  • Supports multi-platform kernel generation and optimization (CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL, OpenCL, Android).
  • Enables creation of custom-defined or fused operators beyond PyTorch's built-in functions.
  • Can serve as a benchmark utility for device performance testing and profiling.
  • Demonstrates integration with PyTorch 2.0 for applications such as sorting, MNIST, and Llama models.
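To make the notion of a fused operator concrete, the sketch below compares three separate elementwise steps with a single fused pass that computes the same result. It is a conceptual illustration in plain Python, not code generated by Antares; the function names `unfused` and `fused` are chosen here for the example.

```python
from typing import List


def unfused(a: List[float], b: List[float], c: List[float]) -> List[float]:
    """relu(a * b + c) as three operators, each building a full intermediate."""
    prod = [x * y for x, y in zip(a, b)]        # multiply
    summed = [p + z for p, z in zip(prod, c)]   # add
    return [max(s, 0.0) for s in summed]        # relu


def fused(a: List[float], b: List[float], c: List[float]) -> List[float]:
    """The same computation in one pass, with no intermediate lists."""
    return [max(x * y + z, 0.0) for x, y, z in zip(a, b, c)]
```

On an accelerator, the fused form typically saves memory traffic because intermediates never leave registers, which is the usual motivation for generating fused kernels.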

Maintenance & Community

  • Developed by Microsoft.
  • Encourages community contributions via issues and stars.

Licensing & Compatibility

  • The provided README snippet does not state the license. Microsoft's open-source projects typically use permissive licenses such as MIT, but the license should be confirmed in the repository before commercial use.

Limitations & Caveats

  • Support for platforms like ROCm, OpenCL, SYCL, and Apple Metal is listed as experimental or requested for future releases.
  • The README indicates experimental versions for Windows DirectX 12 and Linux CUDA, suggesting potential instability or incomplete features.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
