antares by microsoft

Compiler solution for PyTorch operator optimization on diverse accelerators

created 5 years ago
478 stars

Top 64.9% on sourcepulse

Project Summary

Antares (AutoRT) is a compiler solution that lets PyTorch users invent, benchmark, and optimize custom operators for a range of hardware accelerators. It targets researchers and developers who need to push performance limits or connect PyTorch to custom hardware backends. It offers two things: acceleration of standard PyTorch applications, and generation of custom or fused operators.

How It Works

Antares uses an intermediate representation (IR) to define operations, which it then compiles and optimizes for a specific backend. Because operators are defined abstractly, the same definition can be compiled for different hardware targets such as DirectX 12, CUDA, ROCm, and SYCL. Operators can be generated either programmatically through an API or from the command line, and a tuning mechanism is integrated into the system.
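The idea of one abstract operator definition, several backends, and a tuning step can be illustrated with a small toy. The sketch below is not Antares's IR or API: `MatMulOp`, `BACKENDS`, `register`, and `tune` are hypothetical names invented for this illustration, the "backends" are two pure-Python strategies, and "tuning" just times a few candidate tile sizes.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]
Kernel = Callable[[Matrix, Matrix], Matrix]


@dataclass(frozen=True)
class MatMulOp:
    """Hypothetical abstract operator: C[n][m] = sum over k of A[n][k] * B[k][m]."""
    n: int
    k: int
    m: int


# Hypothetical registry: backend name -> function that turns an op into a kernel.
BACKENDS: Dict[str, Callable[[MatMulOp, int], Kernel]] = {}


def register(name: str):
    """Decorator that records a compile function under a backend name."""
    def decorator(compile_fn):
        BACKENDS[name] = compile_fn
        return compile_fn
    return decorator


@register("tiled_loops")
def compile_tiled(op: MatMulOp, tile: int) -> Kernel:
    """Toy backend 1: explicit loops with the reduction axis split into tiles."""
    def kernel(a: Matrix, b: Matrix) -> Matrix:
        c = [[0.0] * op.m for _ in range(op.n)]
        for k0 in range(0, op.k, tile):
            for i in range(op.n):
                for kk in range(k0, min(k0 + tile, op.k)):
                    a_ik = a[i][kk]
                    for j in range(op.m):
                        c[i][j] += a_ik * b[kk][j]
        return c
    return kernel


@register("row_dot")
def compile_row_dot(op: MatMulOp, tile: int) -> Kernel:
    """Toy backend 2: row-by-column dot products; the tile parameter is ignored."""
    def kernel(a: Matrix, b: Matrix) -> Matrix:
        columns = list(zip(*b))
        return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]
    return kernel


def tune(op: MatMulOp, backend: str, a: Matrix, b: Matrix,
         tiles: Sequence[int] = (1, 2, 8)) -> int:
    """Time each candidate tile size once and return the fastest one."""
    timings = {}
    for tile in tiles:
        kernel = BACKENDS[backend](op, tile)
        start = time.perf_counter()
        kernel(a, b)
        timings[tile] = time.perf_counter() - start
    return min(timings, key=timings.get)


# Usage: define the operator once, tune it, then run the selected kernel.
op = MatMulOp(n=2, k=3, m=2)
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
best = tune(op, "tiled_loops", a, b)
result = BACKENDS["tiled_loops"](op, best)(a, b)
```

The operator description carries no backend detail, so adding a target only means registering another compile function. A real compiler would search a far larger configuration space and measure on the actual device.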

Quick Start & Requirements

  • Install via pip: pip install autort
  • Requires Python 3.x.
  • Experimental support for Windows DirectX 12 and Linux CUDA.
  • Official documentation and tutorials are available.
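After installing, a quick environment check can confirm the two requirements listed above. This is a minimal sketch using only the Python standard library. It assumes the importable module has the same name as the pip package (`autort`), which the summary does not confirm; `module_available` and `check_environment` are hypothetical helper names.

```python
import importlib.util
import sys


def module_available(module_name: str) -> bool:
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


def check_environment(module_name: str = "autort") -> dict:
    """Report whether Python 3 is running and whether the module is installed.

    Assumption: the module name matches the pip package name "autort".
    """
    return {
        "python3": sys.version_info.major == 3,
        "module_found": module_available(module_name),
    }
```

Calling `check_environment()` returns a small dictionary. A `module_found` value of `False` suggests the install did not land in the active interpreter's environment.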

Highlighted Details

  • Supports multi-platform kernel generation and optimization (CPU, CUDA, ROCm, DirectX 12, Graphcore, SYCL, OpenCL, Android).
  • Enables creation of custom-defined or fused operators beyond PyTorch's built-in functions.
  • Can serve as a benchmark utility for device performance testing and profiling.
  • Demonstrates integration with PyTorch 2.0 for applications such as sorting, MNIST, and Llama models.

Maintenance & Community

  • Developed by Microsoft.
  • Encourages community contributions via issues and stars.

Licensing & Compatibility

  • The provided README snippet does not state the license explicitly. Microsoft's open-source projects typically use permissive licenses such as MIT, but the license should be confirmed in the repository before commercial use.

Limitations & Caveats

  • Support for platforms like ROCm, OpenCL, SYCL, and Apple Metal is listed as experimental or requested for future releases.
  • The README indicates experimental versions for Windows DirectX 12 and Linux CUDA, suggesting potential instability or incomplete features.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 10 stars in the last 90 days
