cyrusbehr
High-performance GPU inference C++ library
Top 43.4% on SourcePulse
A modern, no-throw C++ library for high-performance GPU inference using NVIDIA TensorRT. It simplifies ONNX model compilation into optimized TensorRT engines, offering a clean C++ API with name-keyed tensors, caller-owned CUDA streams, explicit host/device transfers, and a robust Status/Result<T> error model. The library targets developers seeking efficient deep learning inference, with optional zero-copy Python bindings for seamless integration with Python ML ecosystems.
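To make the no-throw error model concrete, here is a minimal sketch of what a Status/Result<T> pair can look like. The class layout, the StatusCode values, and the ParseBatchSize helper are all hypothetical and chosen for illustration; the library's real types may differ.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical sketch: names and members are illustrative assumptions,
// not the library's actual API.
enum class StatusCode { kOk, kInvalidArgument, kInternal };

class Status {
public:
    Status() = default;  // a default-constructed Status means success
    Status(StatusCode code, std::string message)
        : code_(code), message_(std::move(message)) {}

    bool ok() const noexcept { return code_ == StatusCode::kOk; }
    StatusCode code() const noexcept { return code_; }
    const std::string& message() const noexcept { return message_; }

private:
    StatusCode code_ = StatusCode::kOk;
    std::string message_;
};

// Holds either a value or the Status explaining why there is none.
template <typename T>
class Result {
public:
    Result(T value) : value_(std::move(value)) {}
    Result(Status error) : status_(std::move(error)) {}

    bool ok() const noexcept { return value_.has_value(); }
    const Status& status() const noexcept { return status_; }
    const T& value() const { return *value_; }  // precondition: ok()

private:
    std::optional<T> value_;
    Status status_;
};

// Example fallible function: parses a positive batch size without throwing.
inline Result<int> ParseBatchSize(const std::string& text) {
    if (text.empty() || text.size() > 9) {
        return Status(StatusCode::kInvalidArgument, "batch size must be 1-9 digits");
    }
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') {
            return Status(StatusCode::kInvalidArgument, "batch size must be numeric");
        }
        value = value * 10 + (c - '0');
    }
    if (value == 0) {
        return Status(StatusCode::kInvalidArgument, "batch size must be positive");
    }
    return value;
}
```

Callers check ok() before reading value(), so failures travel through return values rather than exceptions.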
How It Works
The library abstracts TensorRT and CUDA complexities, providing an EngineBuilder for ONNX model compilation. A key feature is its robust engine caching mechanism, keyed by ONNX content hash, build options, TensorRT version, and GPU UUID, preventing silent misuse of stale caches. It supports dynamic shapes via per-input optimization profiles and enables thread-safe, multi-stream inference using an EnginePool, with each inference call managed by a caller-provided Stream. Public headers are deliberately free of TensorRT, OpenCV, or spdlog types, employing PImpl and generated headers for API purity. Optional fused preprocessing kernels and zero-copy Python bindings leverage __cuda_array_interface__ or DLPack for direct GPU memory access, releasing the GIL during inference.
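The cache-keying idea can be sketched as follows: fold the four inputs into one digest so that a change to any of them yields a different cache entry. The CacheKeyInputs struct, the MakeCacheKey function, and the choice of 64-bit FNV-1a are assumptions made for illustration; the source does not say how the library actually derives its key.

```cpp
#include <cstdint>
#include <initializer_list>
#include <string>
#include <string_view>

// Hypothetical sketch: field names and the hashing scheme are assumptions,
// not the library's actual cache format.
struct CacheKeyInputs {
    std::string onnx_bytes;        // raw contents of the ONNX file
    std::string build_options;     // canonical string form of the build options
    std::string tensorrt_version;  // version string of the TensorRT runtime
    std::string gpu_uuid;          // UUID of the target GPU
};

// 64-bit FNV-1a, continued from a running hash so fields can be chained.
inline std::uint64_t Fnv1a(std::string_view data, std::uint64_t hash) {
    for (unsigned char byte : data) {
        hash ^= byte;
        hash *= 1099511628211ULL;  // FNV prime
    }
    return hash;
}

// Returns a 16-character hex digest covering all four inputs.
inline std::string MakeCacheKey(const CacheKeyInputs& in) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (std::string_view field :
         {std::string_view(in.onnx_bytes), std::string_view(in.build_options),
          std::string_view(in.tensorrt_version), std::string_view(in.gpu_uuid)}) {
        // Mix in the field length first so field boundaries stay unambiguous.
        hash = Fnv1a(std::to_string(field.size()) + ":", hash);
        hash = Fnv1a(field, hash);
    }
    static const char kHex[] = "0123456789abcdef";
    std::string key(16, '0');
    for (int i = 15; i >= 0; --i) {
        key[i] = kHex[hash & 0xF];
        hash >>= 4;
    }
    return key;
}
```

A production cache would more likely use a cryptographic hash of the model file; FNV-1a is used here only to keep the sketch dependency-free.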
Quick Start & Requirements
Install the C++ library with cmake --install build --prefix /opt/trtcpp, then use find_package(tensorrt_cpp_api REQUIRED) in downstream projects.
Install the Python bindings with pip install . (builds a wheel via scikit-build-core).
See docs/install.md for installation details and the examples/ directory for runnable reference programs.
Highlighted Details
Zero-copy Python bindings use __cuda_array_interface__ or DLPack, minimizing host round-trips and releasing the GIL.
Maintenance & Community
pre-commit hooks for formatting; CI includes build, CPU tests, sanitizers, and Python wheel builds.
Licensing & Compatibility
See the LICENSE file for specifics.
Limitations & Caveats
The project is scoped to Linux environments, CUDA 12, TensorRT ≥ 10, and CNN-style vision models. Windows support and advanced features such as LLM/transformer inference are explicitly out of scope.