MatX by NVIDIA

C++ library for GPU numerical computing with Python-like syntax

created 3 years ago
1,340 stars

Top 30.6% on sourcepulse

Project Summary

MatX is a C++17 library designed for high-performance numerical computing on NVIDIA GPUs and CPUs, targeting researchers and engineers who need efficient tensor operations with a Python-like syntax. It aims to provide near-native performance with reduced code complexity compared to lower-level CUDA programming or even GPU libraries like CuPy.

How It Works

MatX leverages optimized backend libraries and employs efficient kernel generation for custom operations. Its core design revolves around a C++ template-based tensor abstraction that allows for operator overloading and expression fusion. This enables the compiler to optimize complex sequences of operations, minimizing intermediate data movement and maximizing computational throughput. The library supports a wide range of data types, including half-precision and complex numbers, with specialized wrappers for seamless host and device execution.
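The expression-fusion model described above can be sketched as follows. This is a minimal, unverified example based on MatX's documented `make_tensor`/`run()` pattern; it assumes a CUDA-capable GPU, the MatX headers on the include path, and that the generator and print helpers behave as in the official quick-start guide:

```cpp
#include <matx.h>  // MatX is header-only

using namespace matx;

int main() {
  MATX_ENTER_HANDLER();

  // Allocate three 1-D tensors of 16 floats.
  auto a = make_tensor<float>({16});
  auto b = make_tensor<float>({16});
  auto c = make_tensor<float>({16});

  (a = ones()).run();          // fill a with 1.0f
  (b = ones() * 2.0f).run();   // fill b with 2.0f

  // The right-hand side is a lazy expression template; run() launches
  // a single fused CUDA kernel, with no intermediate temporaries.
  (c = a * b + sin(a)).run();

  cudaDeviceSynchronize();
  print(c);

  MATX_EXIT_HANDLER();
  return 0;
}
```

The key point is that `a * b + sin(a)` builds a compile-time expression tree rather than evaluating eagerly, which is what lets the compiler fuse the whole chain into one kernel launch.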

Quick Start & Requirements

  • Installation: Header-only for application use; build tests/examples via CMake.
  • Prerequisites: CUDA 11.8 or 12.2.1+; GCC 9+, nvc++ 24.5, or Clang 17+; Linux OS. Supports Pascal to Hopper GPUs and Jetson (Jetpack 5.0+).
  • Resources: CMake fetches dependencies automatically; building tests/examples can be lengthy without parallel compilation (e.g. `make -j`).
  • Docs: Official Documentation
  • Quick Start: Quick Start Guide
  • Notebooks: Jupyter Notebooks
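Since the library is header-only, a typical CMake integration can be sketched as below. The project name `my_app` and source file are placeholders; the `matx::matx` target name follows MatX's documented CMake usage, and a pinned release tag should replace `main` in real builds:

```cmake
include(FetchContent)

FetchContent_Declare(
  MatX
  GIT_REPOSITORY https://github.com/NVIDIA/MatX.git
  GIT_TAG main  # pin a release tag for reproducible builds
)
FetchContent_MakeAvailable(MatX)

add_executable(my_app main.cu)
target_link_libraries(my_app PRIVATE matx::matx)
```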

Highlighted Details

  • Achieves over 4x speedup compared to CuPy and 2100x over NumPy for FFT resamplers on an A100 GPU.
  • Supports Python-like syntax for tensor manipulation and operations.
  • Integrates easily with existing C++ projects via CMake.
  • Offers web-based visualization of GPU data.

Maintenance & Community

  • Active development by NVIDIA.
  • Discussions board available for user interaction.
  • Issue reporting guidelines provided with specific prefixes ([BUG], [DOC], [FEA], [QST]).
  • Contribution guide available in CONTRIBUTING.md.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

  • Linux-only support due to testing limitations; Windows support is community-driven.
  • CUDA 12.0.0-12.2.0 may cause build issues with unit tests.
  • Building documentation requires several external dependencies (Doxygen, Sphinx, etc.).
Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 17
  • Issues (30d): 7

Star History

28 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 7 more.

ThunderKittens by HazyResearch
0.6% · 3k stars · created 1 year ago · updated 3 days ago
CUDA kernel framework for fast deep learning primitives
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 2 more.

gpu.cpp by AnswerDotAI
0.2% · 4k stars · created 1 year ago · updated 2 weeks ago
C++ library for portable GPU computation using WebGPU
Starred by Bojan Tunguz (AI Scientist; formerly at NVIDIA), Mckay Wrigley (Founder of Takeoff AI), and 8 more.

ggml by ggml-org
0.3% · 13k stars · created 2 years ago · updated 3 days ago
Tensor library for machine learning
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org
0.4% · 84k stars · created 2 years ago · updated 14 hours ago
C/C++ library for local LLM inference