GDRCopy is a low-latency library enabling direct CPU access to GPU memory via NVIDIA GPUDirect RDMA. It's designed for researchers and developers requiring high-performance data transfers between CPU and GPU, offering a CPU-driven copy mechanism with minimal overhead.
How It Works
GDRCopy leverages GPUDirect RDMA APIs to create user-space mappings of GPU memory. This allows GPU memory to be treated like host memory, facilitating efficient CPU-driven data transfers. The approach minimizes overhead by avoiding intermediate copies, though an initial memory pinning phase is required.
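A minimal sketch of the typical call sequence, assuming a CUDA context is already initialized and, for simplicity, that the cudaMalloc allocation happens to be GPU-page aligned (general offset handling is sketched under Limitations & Caveats); error handling is omitted for brevity:

```c
#include <stdio.h>
#include <stdint.h>
#include <cuda_runtime.h>
#include <gdrapi.h>                      /* gdr_open, gdr_pin_buffer, gdr_map, ... */

int main(void)
{
    const size_t size = GPU_PAGE_SIZE;   /* one 64 KiB GPU page */
    char *d_buf = NULL;
    cudaMalloc((void **)&d_buf, size);   /* plain device memory; managed memory is not supported */

    gdr_t g = gdr_open();                /* talks to the gdrdrv kernel module */
    gdr_mh_t mh;
    gdr_pin_buffer(g, (unsigned long)(uintptr_t)d_buf, size, 0, 0, &mh);

    void *map_ptr = NULL;
    gdr_map(g, mh, &map_ptr, size);      /* user-space mapping of the pinned GPU pages */

    char host_src[64] = "hello from the CPU";
    char host_dst[64] = {0};
    gdr_copy_to_mapping(mh, map_ptr, host_src, sizeof(host_src));    /* CPU-driven H->D copy */
    gdr_copy_from_mapping(mh, host_dst, map_ptr, sizeof(host_dst));  /* CPU-driven D->H copy (slow path) */
    printf("read back: %s\n", host_dst);

    gdr_unmap(g, mh, map_ptr, size);
    gdr_unpin_buffer(g, mh);
    gdr_close(g);
    cudaFree(d_buf);
    return 0;
}
```

Link such a program against libgdrapi and the CUDA runtime, with the gdrdrv kernel module loaded.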
Quick Start & Requirements
- Install: build from source (make), or build RPM (build-rpm-packages.sh) or DEB (build-deb-packages.sh) packages.
- Prerequisites: NVIDIA Data Center/RTX GPU (Kepler+), CUDA >= 6.0, NVIDIA driver >= 418.40 (ppc64le) or >= 331.14 (other platforms), DKMS or equivalent for kernel module installation. GPU driver header files are also required.
- Supported Platforms: Linux x86_64, ppc64le, arm64 on RHEL8/9, Ubuntu 20.04/22.04, SLE-15, Leap 15.x.
- Links: GPUDirect RDMA
Highlighted Details
- Achieves very low CPU-driven copy overhead (e.g., ~0.09 us for small transfers).
- Host-to-Device (H-D) bandwidth reaches roughly 6-8 GB/s; Device-to-Host (D-H) bandwidth is significantly lower because CPU reads from PCIe BAR-mapped GPU memory are slow.
- Includes benchmarks for copy bandwidth (gdrcopy_copybw), latency (gdrcopy_copylat), API performance (gdrcopy_apiperf), and ping-pong latency (gdrcopy_pplat); a rough user-level timing sketch follows this list.
- Supports NUMA-aware optimizations for performance tuning.
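The latency and bandwidth figures above come from the bundled benchmarks. As a rough, hedged stand-in for what gdrcopy_copylat measures (not its actual implementation), a small helper like the hypothetical avg_copy_latency_us below times repeated CPU-driven copies into an existing mapping; mh and map_ptr are assumed to come from a setup like the earlier sketch:

```c
#include <time.h>
#include <gdrapi.h>

/* Average per-copy latency, in microseconds, of `iters` CPU-driven
 * H->D copies of `size` bytes into an existing GDRCopy mapping. */
static double avg_copy_latency_us(gdr_mh_t mh, void *map_ptr,
                                  const void *src, size_t size, int iters)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; ++i)
        gdr_copy_to_mapping(mh, map_ptr, src, size);   /* write-combined stores into the BAR mapping */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double total_us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                      (t1.tv_nsec - t0.tv_nsec) / 1e3;
    return total_us / iters;
}
```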
Maintenance & Community
- Developed and maintained by NVIDIA.
- Bug reporting via NVIDIA Developer site.
Licensing & Compatibility
- License: not explicitly stated in the README; consult the LICENSE file in the repository.
- Compatibility: Requires specific NVIDIA hardware and drivers. Does not work with CUDA managed memory.
Limitations & Caveats
- gdr_map() requires addresses aligned to GPU pages; users must ensure alignment (see the offset-handling sketch after this list).
- Handling memory regions that span across cudaMalloc allocations is not well-supported.
- The proprietary GPU driver flavor may perform suboptimally on coherent platforms and may have issues on Intel CPUs with confidential computing enabled.
- Pinning the same GPU address multiple times may consume excessive BAR1 space on some driver versions.
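To illustrate the alignment and offset handling mentioned above, here is a hedged sketch (the helper name map_device_range is hypothetical): pin the GPU pages covering an arbitrary device address, map them, and use gdr_get_info() to locate the original pointer inside the page-aligned mapping. It assumes g is an open gdr_t handle and d_ptr is a device address obtained from cudaMalloc or cuMemAlloc:

```c
#include <stddef.h>
#include <gdrapi.h>

/* Map `size` bytes starting at device address `d_ptr` (not necessarily
 * GPU-page aligned) and return a CPU pointer to that exact byte range.
 * On success, *mh, *map_base, and *map_size describe the mapping for
 * later gdr_unmap()/gdr_unpin_buffer(); returns NULL on failure. */
static void *map_device_range(gdr_t g, unsigned long d_ptr, size_t size,
                              gdr_mh_t *mh, void **map_base, size_t *map_size)
{
    /* Expand the range to whole GPU pages before pinning and mapping. */
    unsigned long aligned = d_ptr & GPU_PAGE_MASK;
    *map_size = (d_ptr + size - aligned + GPU_PAGE_SIZE - 1) & GPU_PAGE_MASK;

    if (gdr_pin_buffer(g, aligned, *map_size, 0, 0, mh) != 0)
        return NULL;
    if (gdr_map(g, *mh, map_base, *map_size) != 0) {
        gdr_unpin_buffer(g, *mh);
        return NULL;
    }

    /* The mapping starts at a GPU page boundary; gdr_get_info() reports
     * which device virtual address that boundary corresponds to. */
    gdr_info_t info;
    gdr_get_info(g, *mh, &info);
    size_t offset = (size_t)(d_ptr - info.va);
    return (char *)(*map_base) + offset;
}
```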