ucc by openucx

Collective communication library for HPC, AI/ML, and I/O workloads

Created 5 years ago
277 stars

Top 93.6% on SourcePulse

Project Summary

Unified Collective Communication (UCC) provides a flexible, feature-rich API for collective communication operations, targeting HPC, AI/ML, and I/O workloads. It aims to deliver highly scalable and performant collectives across various programming models and runtimes, supporting nonblocking operations, flexible resource allocation, and hardware-specific collectives.

How It Works

UCC is built on a component architecture and uses UCX (Unified Communication X) as its transport layer. It treats hardware collectives as first-class citizens, enabling optimized communication patterns for GPUs and other accelerators. The design emphasizes flexible resource management and synchronization, and it supports repetitive collectives: an operation is initialized once and can then be invoked many times.

Quick Start & Requirements

  • Install: Compile from source: run ./autogen.sh, then ./configure --prefix=<ucc-install-path> --with-ucx=<ucx-install-path>, then make -j install.
  • Prerequisites: UCX is required. CUDA (11.0+) or HIP is needed only for GPU support. Doxygen is needed only to generate the documentation.
  • Setup: Compile UCX first; compile Open MPI as well if UCC will be used through MPI.
  • Links: UCX, Open MPI
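
The install step above can be sketched as a small script, run from the top of a UCC source checkout. This is an illustration rather than the project's official build script: the variable names UCC_PREFIX, UCX_PREFIX, and DRY_RUN and the helper functions run and build_ucc are hypothetical, and the default paths are placeholders. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch of the source build described above.
# UCC_PREFIX, UCX_PREFIX and DRY_RUN are hypothetical names chosen for this
# example; the default install paths are placeholders, not project defaults.
UCC_PREFIX="${UCC_PREFIX:-$HOME/ucc-install}"
UCX_PREFIX="${UCX_PREFIX:-$HOME/ucx-install}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    printf '%s\n' "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

build_ucc() {
    # Stop at the first step that fails.
    run ./autogen.sh &&
    run ./configure --prefix="$UCC_PREFIX" --with-ucx="$UCX_PREFIX" &&
    run make -j install
}

build_ucc
```

Keeping the dry-run default makes it easy to check the resolved paths before starting a long build.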

Highlighted Details

  • Supports UCX/UCP-based transports (InfiniBand, RoCE, shared memory) as well as hardware offload via SHARP.
  • Integrates with CUDA (NCCL) and HIP (RCCL) for GPU-accelerated collectives.
  • Enables UCC collectives within Open MPI and OpenSHMEM applications via specific MCA parameters.
  • Offers a flexible synchronization model and supports repetitive collective operations (initialize once, invoke many times).
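
As a sketch of the MCA-parameter route mentioned above, the following script prints launch lines that ask Open MPI and OpenSHMEM to use UCC collectives. The parameter names (coll_ucc_enable, coll_ucc_priority, scoll_ucc_enable, scoll_ucc_priority) are recalled from the UCC and Open MPI documentation and should be treated as assumptions; confirm them against your installation (for example with ompi_info) before relying on them. NP, the application names, and the helper function names are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: build launch lines that enable UCC collectives.
# MCA parameter names are assumptions to verify against your Open MPI build.
# NP and the application paths are hypothetical placeholders.
NP="${NP:-2}"

ucc_mpirun_cmd() {
    # $1 = MPI application binary
    echo "mpirun -np $NP --mca coll_ucc_enable 1 --mca coll_ucc_priority 100 $1"
}

ucc_oshrun_cmd() {
    # $1 = OpenSHMEM application binary
    echo "oshrun -np $NP --mca scoll_ucc_enable 1 --mca scoll_ucc_priority 100 $1"
}

# Print the commands so they can be reviewed before being run.
ucc_mpirun_cmd ./my_mpi_app
ucc_oshrun_cmd ./my_openshmem_app
```

The script only prints the commands; copy a printed line into a terminal, or pipe it to sh, once Open MPI has been built with UCC support.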

Maintenance & Community

The project is associated with the openucx organization. Further community and contribution details are available in the CONTRIBUTING file.

Licensing & Compatibility

UCC is BSD-style licensed, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

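Building UCC requires a UCX build, and an Open MPI build as well when integrating with MPI; both may need to be compiled from source. GPU support depends on a correctly installed and configured CUDA or HIP toolchain, and hardware-specific transports are usable only where the matching hardware and libraries are present.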

Health Check

  • Last Commit: 20 hours ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 14
  • Issues (30d): 1
  • Star History: 3 stars in the last 30 days
