Distributed compiler for computation-communication overlapping, based on Triton
Triton-distributed is a distributed compiler based on OpenAI Triton, designed to create efficient kernels for parallel systems by overlapping computation and communication. It targets researchers and engineers developing high-performance distributed AI models, offering primitives to simplify the creation of complex communication patterns and achieve performance comparable to or better than hand-tuned libraries.
How It Works
The project extends OpenAI's Triton with a set of low-level primitives for distributed programming. These primitives abstract complex operations such as AllToAll communication and GEMM computation, letting developers focus on overlapping the two. The design aims to let programmers write kernels that match or exceed the performance of specialized libraries while taking advantage of hardware interconnects such as NVLink and InfiniBand.
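The overlap idea can be illustrated with a minimal, library-free sketch. It does not use Triton-distributed's actual primitives. The names `fetch_chunk`, `compute_chunk`, and `overlapped_pipeline` are hypothetical, a background thread stands in for the interconnect transfer, and a scaled sum stands in for a GEMM tile. The only point is the scheduling pattern: the transfer of shard i+1 is started before the computation on shard i begins.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(remote_shards, rank):
    """Hypothetical stand-in for communication: copy one peer's shard locally."""
    return list(remote_shards[rank])


def compute_chunk(chunk, weight):
    """Hypothetical stand-in for computation: a scaled sum instead of a GEMM tile."""
    return sum(x * weight for x in chunk)


def overlapped_pipeline(remote_shards, weight):
    """Process shards in order, prefetching the next shard while computing on the current one."""
    if not remote_shards:
        return []
    results = []
    # A single worker thread plays the role of the communication engine.
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = comm.submit(fetch_chunk, remote_shards, 0)
        for rank in range(len(remote_shards)):
            chunk = pending.result()  # wait until the current shard has arrived
            if rank + 1 < len(remote_shards):
                # Start the next transfer before computing, so the two can overlap.
                pending = comm.submit(fetch_chunk, remote_shards, rank + 1)
            results.append(compute_chunk(chunk, weight))
    return results


if __name__ == "__main__":
    shards = [[1, 2], [3, 4], [5, 6]]
    print(overlapped_pipeline(shards, 2))  # [6, 14, 22]
```

In a real kernel the same structure would presumably be expressed with device-side signals and asynchronous copies rather than host threads; this sketch only shows the ordering of "issue next transfer, then compute".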
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
flash_decode.py) and parts of Triton's original code.
Limitations & Caveats