ucc  by openucx

Collective communication library for HPC, AI/ML, and I/O workloads

created 5 years ago
263 stars

Top 97.0% on SourcePulse

Project Summary

Unified Collective Communication (UCC) provides a flexible, feature-rich API for collective communication operations, targeting HPC, AI/ML, and I/O workloads. It aims to deliver highly scalable and performant collectives across various programming models and runtimes, supporting nonblocking operations, flexible resource allocation, and hardware-specific collectives.

How It Works

UCC is built on a component architecture and uses UCX (Unified Communication X) as its transport layer. It treats hardware collectives as first-class citizens, enabling optimized communication patterns for GPUs and other accelerators. The design emphasizes flexible resource management and synchronization, so a collective operation can be initialized once and then executed repeatedly.

Quick Start & Requirements

  • Install: Compile from source by running ./autogen.sh, then ./configure --prefix=<ucc-install-path> --with-ucx=<ucx-install-path>, then make -j install.
  • Prerequisites: UCX is required. CUDA (11.0+) and HIP are optional and needed only for GPU support. Doxygen is used for documentation generation.
  • Setup: Build UCX first. Open MPI is needed only when UCC is used through MPI.
  • Links: UCX, Open MPI
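
The install steps above can be sketched as a small shell function. This is a minimal sketch, not the project's official build script: the function name build_ucc and its three arguments (source directory, UCC install prefix, UCX install prefix) are hypothetical, and it assumes the UCC sources are already checked out and UCX is already installed.

```shell
#!/bin/sh
# build_ucc SRC_DIR UCC_PREFIX UCX_PREFIX
# Hypothetical helper: runs the three build steps from the Install bullet
# inside SRC_DIR. All three arguments are placeholders chosen by the caller.
build_ucc() {
    src_dir=$1
    ucc_prefix=$2
    ucx_prefix=$3
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$src_dir" || exit 1
        ./autogen.sh &&
        ./configure --prefix="$ucc_prefix" --with-ucx="$ucx_prefix" &&
        make -j install
    )
}
```

A call might look like build_ucc "$HOME/src/ucc" "$HOME/ucc-install" "$HOME/ucx-install", where all three paths are examples only. The steps are chained with &&, so a failure in autogen.sh or configure stops the build before make runs.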

Highlighted Details

  • Supports UCX/UCP transports including InfiniBand, RoCE, Shared Memory, and hardware offload via SHARP.
  • Integrates with NCCL (CUDA) and RCCL (HIP) for GPU-accelerated collectives.
  • Enables UCC collectives within Open MPI and OpenSHMEM applications via specific MCA parameters.
  • Offers a flexible synchronous model and supports repetitive collective operations.
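
As an illustration of the MCA parameters mentioned above, the sketch below wraps mpirun with the settings that, to the best of my recollection of the UCC README, enable the UCC component in Open MPI (coll_ucc_*) and in OpenSHMEM (scoll_ucc_*). Treat the parameter names and the priority value 100 as assumptions to verify against your Open MPI build, for example with ompi_info. The wrapper function names are hypothetical, and the example assumes Open MPI was built with UCC support.

```shell
#!/bin/sh
# Hypothetical wrappers around mpirun. The MCA parameter names and the
# priority value are assumptions; confirm them against your Open MPI install.

# run_mpi_with_ucc NP APP [ARGS...]: launch an MPI application with the
# UCC collective component enabled and given a high priority.
run_mpi_with_ucc() {
    np=$1; shift
    mpirun --mca coll_ucc_enable 1 --mca coll_ucc_priority 100 -np "$np" "$@"
}

# run_shmem_with_ucc NP APP [ARGS...]: same idea for an OpenSHMEM application,
# using the scoll framework parameters.
run_shmem_with_ucc() {
    np=$1; shift
    mpirun --mca scoll_ucc_enable 1 --mca scoll_ucc_priority 100 -np "$np" "$@"
}
```

For example, run_mpi_with_ucc 2 ./my_mpi_app would launch two ranks of a placeholder binary named my_mpi_app.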

Maintenance & Community

The project is hosted under the openucx organization. Further community and contribution details are available in the CONTRIBUTING file.

Licensing & Compatibility

UCC is BSD-style licensed, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

Compilation requires building UCX, and Open MPI as well when integrating with MPI, from source. GPU support depends on a correctly installed and configured CUDA or HIP stack, and specific hardware transports likewise depend on the underlying platform setup.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 week
  • Pull requests (30d): 17
  • Issues (30d): 4
  • Star history: 4 stars in the last 30 days
