client by triton-inference-server

SDK for simplifying Triton inference server communication (C++, Python, Java)

created 4 years ago
636 stars

Top 53.1% on sourcepulse

View on GitHub
Project Summary

This repository provides client libraries and examples for interacting with the Triton Inference Server. It offers C++, Python, and Java APIs to facilitate communication via HTTP/REST or gRPC, enabling inference, status checks, and model repository management. The libraries also support efficient data transfer using system and CUDA shared memory.

How It Works

The client libraries abstract HTTP/REST and gRPC communication with the Triton Inference Server behind convenient interfaces for sending inference requests, managing model lifecycles, and retrieving server status. A key advantage is support for system and CUDA shared memory, which bypasses serialization/deserialization overhead and yields significant performance gains, especially for large inputs and outputs.
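
For illustration, a synchronous inference call through the Python HTTP client might look like the following sketch; the model name "densenet_onnx" and tensor names "data_0"/"fc6_1" are placeholders that must match the model deployed on your server:

    import numpy as np
    import tritonclient.http as httpclient

    # Connect to a Triton server listening on the default HTTP port.
    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Describe and fill a single input tensor.
    batch = np.zeros((1, 3, 224, 224), dtype=np.float32)
    infer_input = httpclient.InferInput("data_0", list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)

    # Ask for one output tensor back.
    requested = httpclient.InferRequestedOutput("fc6_1")

    # Run inference and read the result as a NumPy array.
    response = client.infer("densenet_onnx", inputs=[infer_input], outputs=[requested])
    print(response.as_numpy("fc6_1").shape)

The gRPC client (tritonclient.grpc) exposes an equivalent interface, so switching protocols is mostly a matter of changing the imported module and the port.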

Quick Start & Requirements

  • Python: pip install tritonclient[all] (installs HTTP/REST, gRPC, and CUDA shared memory support); a minimal connectivity check is sketched after this list.
  • C++/Java: Download pre-built libraries from GitHub releases or build from source using CMake.
  • Docker: Pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk from NGC.
  • Prerequisites: Python 3.x, CMake, Maven/JDK (for Java client build), Docker (for NGC image). CUDA shared memory requires a compatible NVIDIA driver.
  • Links: Triton Client Libraries
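
After installing the Python package, a minimal connectivity and readiness check against a local server might look like this sketch (default ports assumed; "densenet_onnx" is a placeholder model name):

    import tritonclient.grpc as grpcclient

    # Default Triton gRPC port is 8001 (HTTP/REST uses 8000).
    client = grpcclient.InferenceServerClient(url="localhost:8001")

    # Liveness/readiness probes and metadata queries.
    print("server live: ", client.is_server_live())
    print("server ready:", client.is_server_ready())
    print("model ready: ", client.is_model_ready("densenet_onnx"))
    print(client.get_model_metadata("densenet_onnx"))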

Highlighted Details

  • Supports HTTP/REST and gRPC protocols.
  • Enables efficient data transfer via System Shared Memory and CUDA Shared Memory; a system shared memory sketch follows this list.
  • Includes example applications for image classification and ensemble models.
  • Offers Python AsyncIO support (Beta) and a Client Plugin API (Beta) for custom request header manipulation.
  • Supports ORCA Header Metrics for KV-cache utilization and capacity.
  • Provides client-side compression and SSL/TLS configuration.
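
The system shared memory path registers a named region with the server and points inputs at it instead of sending tensor bytes over the wire. A minimal sketch, assuming the tritonclient[all] install; the model name "my_model" and tensor names "INPUT0"/"OUTPUT0" are placeholders:

    import numpy as np
    import tritonclient.http as httpclient
    import tritonclient.utils.shared_memory as shm
    from tritonclient.utils import np_to_triton_dtype

    client = httpclient.InferenceServerClient(url="localhost:8000")

    data = np.arange(16, dtype=np.int32).reshape(1, 16)
    byte_size = data.size * data.itemsize

    # Create a system shared memory region, copy the input into it, and
    # register it with the server so the data can be read in place.
    handle = shm.create_shared_memory_region("input_data", "/input_region", byte_size)
    shm.set_shared_memory_region(handle, [data])
    client.register_system_shared_memory("input_data", "/input_region", byte_size)

    # Point the input at the region instead of attaching the tensor bytes.
    infer_input = httpclient.InferInput("INPUT0", list(data.shape), np_to_triton_dtype(data.dtype))
    infer_input.set_shared_memory("input_data", byte_size)

    response = client.infer("my_model", inputs=[infer_input])
    print(response.as_numpy("OUTPUT0"))

    # Unregister on the server and release the local region.
    client.unregister_system_shared_memory("input_data")
    shm.destroy_shared_memory_region(handle)

Output tensors can be mapped the same way via InferRequestedOutput.set_shared_memory, and the CUDA variant follows the same pattern through tritonclient.utils.cuda_shared_memory.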

Maintenance & Community

The project is actively maintained by the Triton Inference Server team. Questions and issues can be reported on the main Triton issues page.

Licensing & Compatibility

The client libraries are distributed under a permissive BSD 3-Clause license (the same license used by the Triton server), allowing commercial use and integration into closed-source applications.

Limitations & Caveats

The Java API currently supports a limited feature subset. Python AsyncIO and Client Plugin API features are in Beta and subject to change. When using CUDA shared memory with Docker, the --pid host flag is required for containers.
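
For reference, the Beta AsyncIO path mirrors the synchronous client; a minimal sketch, with the same placeholder model and tensor names used above:

    import asyncio
    import numpy as np
    import tritonclient.http.aio as aiohttpclient

    async def main():
        client = aiohttpclient.InferenceServerClient(url="localhost:8000")
        try:
            data = np.zeros((1, 3, 224, 224), dtype=np.float32)
            infer_input = aiohttpclient.InferInput("data_0", list(data.shape), "FP32")
            infer_input.set_data_from_numpy(data)
            response = await client.infer("densenet_onnx", inputs=[infer_input])
            print(response.as_numpy("fc6_1").shape)
        finally:
            # The AsyncIO client holds an open connection that must be closed explicitly.
            await client.close()

    asyncio.run(main())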

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 6
  • Issues (30d): 1
  • Star History: 20 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

serve by pytorch

Top 0.1% on sourcepulse · 4k stars
Serve, optimize, and scale PyTorch models in production
created 5 years ago · updated 3 weeks ago