ROCm library for GPU collective communication routines
The ROCm Communication Collectives Library (RCCL) provides optimized collective communication routines for GPUs, targeting researchers and developers building large-scale AI and HPC applications. It enables efficient inter-GPU communication within a single node or across multiple nodes, aiming to maximize bandwidth and minimize latency.
How It Works
RCCL implements standard collective operations like all-reduce, broadcast, and all-gather using ring and tree algorithms. It is optimized for various interconnects (PCIe, xGMI, InfiniBand, TCP/IP) and supports arbitrary numbers of GPUs in single or multi-node, multi-process applications. For performance, small operations can be batched or aggregated via the API.
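To make the ring idea concrete, here is a minimal pure-Python simulation of a sum all-reduce over a logical ring. It is only a sketch of the general ring technique (a reduce-scatter phase followed by an all-gather phase), not RCCL's implementation or its API. The function name `ring_all_reduce` and the list-of-lists representation of per-rank buffers are assumptions made for illustration.

```python
from typing import List


def ring_all_reduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce across len(buffers) ranks arranged in a ring.

    Hypothetical helper for illustration only; each inner list stands in
    for one rank's device buffer.
    """
    n = len(buffers)
    length = len(buffers[0])
    assert all(len(b) == length for b in buffers), "buffers must match in size"
    data = [list(b) for b in buffers]  # work on copies, leave inputs untouched

    # Split every buffer into n contiguous chunks (some may be empty).
    bounds = [(i * length) // n for i in range(n + 1)]

    def chunk(c: int) -> range:
        return range(bounds[c], bounds[c + 1])

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) mod n
    # to its right neighbour, which adds it into its own copy.
    for step in range(n - 1):
        outgoing = [[data[r][i] for i in chunk((r - step) % n)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for i, v in zip(chunk((r - step) % n), outgoing[r]):
                data[dst][i] += v
    # Now rank r holds the fully reduced chunk (r + 1) mod n.

    # Phase 2: all-gather. In step s, rank r forwards chunk (r + 1 - s) mod n
    # to its right neighbour, which overwrites its own copy.
    for step in range(n - 1):
        outgoing = [[data[r][i] for i in chunk((r + 1 - step) % n)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for i, v in zip(chunk((r + 1 - step) % n), outgoing[r]):
                data[dst][i] = v

    return data


if __name__ == "__main__":
    ranks = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, buf in enumerate(ring_all_reduce(ranks)):
        print(rank, buf)
```

In this scheme each rank sends and receives 2 × (n − 1) chunks of roughly 1/n of the buffer, so the data volume per rank stays close to constant as ranks are added. That property is why ring schedules are commonly used for large, bandwidth-bound messages.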
Quick Start & Requirements
Install with the install.sh script (./install.sh) or build manually with CMake. install.sh offers options for quick builds, debugging, and targeting specific GPU architectures. A manual build requires cmake .. && make -j <jobs>.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The install.sh script simplifies initial setup.