MatX by NVIDIA

C++ library for GPU numerical computing with Python-like syntax

created 3 years ago
1,340 stars

Top 30.6% on sourcepulse

Project Summary

MatX is a C++17 library designed for high-performance numerical computing on NVIDIA GPUs and CPUs, targeting researchers and engineers who need efficient tensor operations with a Python-like syntax. It aims to provide near-native performance with reduced code complexity compared to lower-level CUDA programming or even GPU libraries like CuPy.

How It Works

MatX leverages optimized backend libraries and employs efficient kernel generation for custom operations. Its core design revolves around a C++ template-based tensor abstraction that allows for operator overloading and expression fusion. This enables the compiler to optimize complex sequences of operations, minimizing intermediate data movement and maximizing computational throughput. The library supports a wide range of data types, including half-precision and complex numbers, with specialized wrappers for seamless host and device execution.
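The expression-fusion model described above can be sketched as follows. This is a minimal, unverified example based on MatX's documented `make_tensor`/`run()` pattern; it assumes a CUDA-capable GPU, the MatX headers on the include path, and that the generator and print helpers behave as in the official quick-start guide:

```cpp
#include <matx.h>  // MatX is header-only

using namespace matx;

int main() {
  MATX_ENTER_HANDLER();

  // Allocate three 1-D tensors of 16 floats.
  auto a = make_tensor<float>({16});
  auto b = make_tensor<float>({16});
  auto c = make_tensor<float>({16});

  (a = ones()).run();          // fill a with 1.0f
  (b = ones() * 2.0f).run();   // fill b with 2.0f

  // The right-hand side is a lazy expression template; run() launches
  // a single fused CUDA kernel, with no intermediate temporaries.
  (c = a * b + sin(a)).run();

  cudaDeviceSynchronize();
  print(c);

  MATX_EXIT_HANDLER();
  return 0;
}
```

The key point is that `a * b + sin(a)` builds a compile-time expression tree rather than evaluating eagerly, which is what lets the compiler fuse the whole chain into one kernel launch.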

Quick Start & Requirements

  • Installation: Header-only for application use; build tests/examples via CMake.
  • Prerequisites: CUDA 11.8 or 12.2.1+; GCC 9+, nvc++ 24.5, or Clang 17+; Linux OS. Supports Pascal to Hopper GPUs and Jetson (Jetpack 5.0+).
  • Resources: CMake fetches dependencies automatically; building tests/examples can be lengthy without parallel compilation (e.g. `make -j`).
  • Docs: Official Documentation
  • Quick Start: Quick Start Guide
  • Notebooks: Jupyter Notebooks
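Since the library is header-only, a typical CMake integration can be sketched as below. The project name `my_app` and source file are placeholders; the `matx::matx` target name follows MatX's documented CMake usage, and a pinned release tag should replace `main` in real builds:

```cmake
include(FetchContent)

FetchContent_Declare(
  MatX
  GIT_REPOSITORY https://github.com/NVIDIA/MatX.git
  GIT_TAG main  # pin a release tag for reproducible builds
)
FetchContent_MakeAvailable(MatX)

add_executable(my_app main.cu)
target_link_libraries(my_app PRIVATE matx::matx)
```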

Highlighted Details

  • Achieves over 4x speedup compared to CuPy and 2100x over NumPy for FFT resamplers on an A100 GPU.
  • Supports Python-like syntax for tensor manipulation and operations.
  • Integrates easily with existing C++ projects via CMake.
  • Offers web-based visualization of GPU data.

Maintenance & Community

  • Active development by NVIDIA.
  • Discussions board available for user interaction.
  • Issue reporting guidelines provided with specific prefixes ([BUG], [DOC], [FEA], [QST]).
  • Contribution guide available in CONTRIBUTING.md.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

  • Linux-only support due to testing limitations; Windows support is community-driven.
  • CUDA 12.0.0-12.2.0 may cause build issues with unit tests.
  • Building documentation requires several external dependencies (Doxygen, Sphinx, etc.).
Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 17
  • Issues (30d): 7

Star History

28 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 7 more.

ThunderKittens by HazyResearch
0.6% · 3k stars · created 1 year ago · updated 3 days ago
CUDA kernel framework for fast deep learning primitives
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 2 more.

gpu.cpp by AnswerDotAI
0.2% · 4k stars · created 1 year ago · updated 2 weeks ago
C++ library for portable GPU computation using WebGPU
Starred by Bojan Tunguz (AI Scientist; formerly at NVIDIA), Mckay Wrigley (Founder of Takeoff AI), and 8 more.

ggml by ggml-org
0.3% · 13k stars · created 2 years ago · updated 3 days ago
Tensor library for machine learning
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org
0.4% · 84k stars · created 2 years ago · updated 14 hours ago
C/C++ library for local LLM inference