uccl by uccl-project

GPU collective communication library for ML workloads

created 7 months ago
482 stars

Top 63.5% on SourcePulse

Project Summary

UCCL is an open-source collective communication library designed to improve GPU communication performance for machine learning workloads, offering a drop-in replacement for NCCL/RCCL. It targets researchers and practitioners seeking lower latency and higher throughput, particularly in heterogeneous GPU and networking environments.

How It Works

UCCL re-architects the communication layer to maximize hardware potential, featuring a custom software transport layer that employs packet spraying across numerous network paths to avoid congestion. This approach, combined with advanced congestion control and efficient loss recovery, aims to outperform traditional single-path transports like kernel TCP and RDMA.
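To make the idea of packet spraying concrete, the toy simulation below compares pinning a whole flow to one path with choosing a path per packet. It is a minimal sketch under stated assumptions, not UCCL's transport: the function names, the uniform random path choice, and the packet and path counts are all hypothetical, and congestion control and loss recovery are left out entirely.

```python
import random
from collections import Counter


def single_path_load(flow_id: int, num_packets: int, num_paths: int) -> Counter:
    """Pin every packet of a flow to one path, as flow-level hashing would."""
    load = Counter({path: 0 for path in range(num_paths)})
    load[flow_id % num_paths] += num_packets
    return load


def sprayed_load(num_packets: int, num_paths: int, rng: random.Random) -> Counter:
    """Pick a path independently for each packet (uniform random spraying).

    Assumption: uniform random choice stands in for whatever path-selection
    policy a real transport would use.
    """
    load = Counter({path: 0 for path in range(num_paths)})
    for _ in range(num_packets):
        load[rng.randrange(num_paths)] += 1
    return load


def busiest(load: Counter) -> int:
    """Number of packets carried by the most heavily loaded path."""
    return max(load.values())


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    single = single_path_load(flow_id=7, num_packets=10_000, num_paths=8)
    sprayed = sprayed_load(num_packets=10_000, num_paths=8, rng=rng)
    print("single-path busiest link:", busiest(single))
    print("sprayed busiest link:    ", busiest(sprayed))
```

With a single path, one link carries the entire flow. With spraying, the busiest link carries roughly a 1/num_paths share. The trade-off is that packets taking different paths can arrive out of order, so a spraying transport has to handle reordering at the receiver.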

Quick Start & Requirements

  • Install by cloning the repository with git clone, then running bash build_and_install.sh [cuda|rocm].
  • Requires CUDA or ROCm.
  • Usage involves setting NCCL_NET_PLUGIN and LD_PRELOAD environment variables to point to UCCL plugins for specific network configurations (IB/RoCE, AWS EFA).
  • Official website: https://uccl-project.github.io/
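As a rough illustration of the environment-variable step, a launcher script might assemble the environment as follows. This is a sketch only: the helper name build_uccl_env and the placeholder paths are hypothetical, and which library each variable should point to for a given network configuration (IB/RoCE or AWS EFA) has to be taken from the project's own documentation.

```python
import os
from typing import Mapping, Optional


def build_uccl_env(
    net_plugin: str,
    preload: str,
    base: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return a copy of the environment with the two variables set.

    net_plugin and preload are caller-supplied paths; this helper does not
    know or check what UCCL actually expects at those paths.
    """
    env = dict(os.environ if base is None else base)
    env["NCCL_NET_PLUGIN"] = net_plugin
    # Keep any library that was already preloaded instead of overwriting it.
    existing = env.get("LD_PRELOAD")
    env["LD_PRELOAD"] = f"{preload}:{existing}" if existing else preload
    return env


if __name__ == "__main__":
    # Placeholder paths, not real UCCL file names.
    env = build_uccl_env("/path/to/net_plugin.so", "/path/to/preload_lib.so")
    print(env["NCCL_NET_PLUGIN"])
    print(env["LD_PRELOAD"])
```

The resulting dictionary can be passed as the env argument of subprocess.run when launching a training job, which keeps the settings scoped to that process rather than the whole shell.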

Highlighted Details

  • Up to 2.5x performance improvement over NCCL for AllReduce on HGX servers with H100 GPUs.
  • Up to 3.3x improvement for AlltoAll on AWS p4d instances with A100 GPUs.
  • Up to 3.7x improvement for AllReduce on AWS g4dn instances with T4 GPUs.
  • Supports heterogeneous GPU and networking vendors (Nvidia, AMD, Broadcom).
  • Aims to provide vendor-agnostic Triton kernels for collectives.

Maintenance & Community

Actively developed at UC Berkeley Sky Computing Lab and UC Davis ArtSy lab. Supported by AMD, AWS, Broadcom, CloudLab, Google Cloud, IBM, Lambda, and Mibura. Community engagement happens through GitHub issues.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is under active development, with features like dynamic membership and improved KV cache transfer still pending. The absence of a specified license may pose adoption challenges for commercial applications.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 86
  • Issues (30d): 29
  • Star history: 73 stars in the last 30 days
