cudnn-frontend by NVIDIA

C++ and Python interface for NVIDIA cuDNN and high-performance kernels

Created 5 years ago

865 stars

Project Summary

NVIDIA/cudnn-frontend offers a modern, open-source, header-only C++ library and a Python interface to the NVIDIA cuDNN library. It simplifies access to cuDNN's Graph API and high-performance kernels and targets developers optimizing deep learning workloads on NVIDIA hardware. Because some kernels are open source, developers can inspect their core logic and contribute changes, which makes the project more transparent and customizable.

How It Works

The library provides a Unified Graph API for defining complex computational subgraphs as reusable cudnn_frontend::graph::Graph objects. It hides the boilerplate of the lower-level cuDNN backend API behind simplified C++ and Python bindings (the Python bindings are built with pybind11). Key advantages include built-in autotuning, support for the latest NVIDIA GPU architectures, and access to open-source, high-performance kernels, such as optimized GEMM and Native Sparse Attention, that users can build on and contribute to.
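The autotune-then-reuse pattern described above can be illustrated with a small pure-Python sketch. It is not cudnn-frontend's API: `autotune`, `PlanCache`, and the two toy "plans" are hypothetical stand-ins, assuming that a built graph can be modeled as a plain callable and that a shape tuple is a suitable cache key.

```python
import time
from typing import Callable, Dict, Hashable, List

# Hypothetical stand-in for a built execution plan: a callable mapping inputs to outputs.
Plan = Callable[[list], list]


def autotune(candidates: List[Plan], sample_input: list, repeats: int = 5) -> Plan:
    """Time every candidate on a sample input and return the fastest one."""
    if not candidates:
        raise ValueError("need at least one candidate plan")
    best_plan, best_time = candidates[0], float("inf")
    for plan in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            plan(sample_input)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_plan, best_time = plan, elapsed
    return best_plan


class PlanCache:
    """Autotune once per problem key (e.g. a shape tuple), then reuse the winner."""

    def __init__(self, candidates: List[Plan]):
        self.candidates = candidates
        self.cache: Dict[Hashable, Plan] = {}
        self.builds = 0  # number of times autotuning actually ran

    def run(self, key: Hashable, inputs: list) -> list:
        if key not in self.cache:
            self.builds += 1
            self.cache[key] = autotune(self.candidates, inputs)
        return self.cache[key](inputs)


# Two functionally identical toy "plans" that differ only in implementation.
def double_loop(xs: list) -> list:
    out = []
    for x in xs:
        out.append(2 * x)
    return out


def double_comp(xs: list) -> list:
    return [2 * x for x in xs]


if __name__ == "__main__":
    cache = PlanCache([double_loop, double_comp])
    print(cache.run((3,), [1, 2, 3]))  # [2, 4, 6]
    print(cache.run((3,), [4, 5, 6]))  # [8, 10, 12] (cached plan reused)
    print(cache.builds)  # 1
```

The design choice being illustrated is amortization: the expensive build-and-time step runs once per key, and later calls with the same key pay only for execution.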

Quick Start & Requirements

Python Installation: pip install nvidia_cudnn_frontend
C++ Integration: Include the header files; ensure the include path points to the repository's include/ directory.
Prerequisites: Python 3.8+, NVIDIA driver, CUDA Toolkit.
Build from Source: Requires python-dev and the dependencies listed in requirements.txt. The environment variables CUDAToolkit_ROOT and CUDNN_PATH can override the default CUDA Toolkit and cuDNN paths.
Documentation: Developer Guide, C++ Samples, Python Samples.
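After installing the wheel, the setup can be checked from Python with a standard-library-only sketch like the one below. It assumes two names that should be verified against the project's documentation: the distribution name `nvidia_cudnn_frontend` (taken from the pip command above) and the importable module name `cudnn`. The helper function names are hypothetical.

```python
import importlib.metadata
import importlib.util
from typing import Optional


def cudnn_frontend_version(dist_name: str = "nvidia_cudnn_frontend") -> Optional[str]:
    """Return the installed distribution's version, or None if it is not installed."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def module_importable(module_name: str = "cudnn") -> bool:
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("installed version:", cudnn_frontend_version())
    print("module found:", module_importable())
```

Neither helper imports the package itself, so running the check does not touch the GPU or load cuDNN.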

Highlighted Details

Open-Source Kernels: Includes implementations of GEMM + Amax (optimized FP8 matrix multiplication), GEMM + SwiGLU (GEMM fused with a SwiGLU activation), and NSA (Native Sparse Attention).
Unified Graph API: Enables creation of reusable, persistent graph objects for complex subgraphs.
Performance: Features built-in autotuning and support for the latest NVIDIA GPU architectures. Benchmarks for Scaled Dot-Product Attention (SDPA) on GB200 and GB300 GPUs are available.
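The fused GEMM kernels listed above have simple unfused reference semantics, sketched here in pure Python as a way to sanity-check a fused kernel's output on small inputs. This is an illustrative reference, not the project's kernels: the helper names are hypothetical, amax is assumed to be taken over the GEMM output, the gate-first column split for SwiGLU is an assumption (layouts differ between implementations), and `fp8_scale_from_amax` shows one common FP8 E4M3 scaling convention rather than necessarily the one cudnn-frontend uses.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive (m x k) @ (k x n) -> (m x n) matrix multiply."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def gemm_amax(a: Matrix, b: Matrix) -> Tuple[Matrix, float]:
    """Return D = A @ B and amax = max |D_ij|.

    A fused kernel would typically compute amax in the GEMM epilogue
    instead of making a second pass over D, as done here."""
    d = matmul(a, b)
    amax = max(abs(x) for row in d for x in row)
    return d, amax


def fp8_scale_from_amax(amax: float) -> float:
    """One common convention: choose a scale that maps amax onto the FP8 maximum."""
    return FP8_E4M3_MAX / amax if amax > 0 else 1.0


def silu(x: float) -> float:
    """Numerically stable SiLU: x * sigmoid(x)."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def gemm_swiglu(a: Matrix, w: Matrix) -> Matrix:
    """Y = A @ W; assume the first half of Y's columns is the gate, the second half 'up'."""
    y = matmul(a, w)
    half = len(y[0]) // 2
    return [[silu(r[j]) * r[half + j] for j in range(half)] for r in y]


if __name__ == "__main__":
    d, amax = gemm_amax([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, -1.0]])
    print(d)  # [[1.0, -2.0], [3.0, -4.0]]
    print(amax)  # 4.0
    print(fp8_scale_from_amax(amax))  # 112.0
    print(gemm_swiglu([[2.0, 3.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

Comparing a fused kernel against a reference like this on tiny matrices catches layout and epilogue mistakes before any performance tuning.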

Maintenance & Community

Contributions are actively welcomed. The README does not specify community channels (e.g., Discord, Slack) or list notable contributors or sponsorships.

Licensing & Compatibility

Licensed under the MIT License. This permissive license generally allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The provided README does not explicitly detail limitations, alpha status, or known bugs. Building from source is needed only for the C++ samples or to compile the Python bindings locally; otherwise, the project is meant to be consumed through the header-only C++ API or the pip-installed Python package.

Health Check

Last Commit

1 day ago

Responsiveness

Inactive

Star History

24 stars in the last 30 days