backend by triton-inference-server

Triton backend tools for model execution

created 5 years ago
336 stars

Top 83.0% on sourcepulse

Project Summary

This repository provides common source, scripts, and utilities for developing custom backends for the Triton Inference Server. It targets developers building custom inference logic or integrating new frameworks with Triton, enabling efficient model execution and pre/post-processing.

How It Works

Backends are implemented as shared libraries adhering to the Triton Backend API, which defines interfaces for managing backend, model, and instance lifecycles, as well as handling inference requests and responses. This API allows backends to interact with Triton for request processing, tensor data access, and response generation, supporting both single and decoupled response patterns.
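
In sketch form, a backend reduces to a small set of C entry points exported from the shared library: optional lifecycle hooks for the backend, model, and instance, plus the execute function that receives batches of requests. The outline below is a minimal illustration only, assuming the tritonbackend.h header from Triton's core repository; error checking is elided, and a real backend reads input tensors and populates outputs where the comment indicates:

    // minimal_backend.cc -- illustrative skeleton, not a complete backend
    #include "triton/core/tritonbackend.h"

    extern "C" {

    // Lifecycle hooks: called when the backend library is loaded, when a
    // model using it is loaded, and when an execution instance is created.
    // Returning nullptr indicates success.
    TRITONSERVER_Error* TRITONBACKEND_Initialize(TRITONBACKEND_Backend* backend)
    {
      return nullptr;
    }

    TRITONSERVER_Error* TRITONBACKEND_ModelInitialize(TRITONBACKEND_Model* model)
    {
      return nullptr;
    }

    TRITONSERVER_Error* TRITONBACKEND_ModelInstanceInitialize(
        TRITONBACKEND_ModelInstance* instance)
    {
      return nullptr;
    }

    // The required entry point: Triton hands the backend a batch of requests;
    // the backend must send a response for each and then release the request.
    TRITONSERVER_Error* TRITONBACKEND_ModelInstanceExecute(
        TRITONBACKEND_ModelInstance* instance, TRITONBACKEND_Request** requests,
        const uint32_t request_count)
    {
      for (uint32_t r = 0; r < request_count; ++r) {
        TRITONBACKEND_Response* response;
        TRITONBACKEND_ResponseNew(&response, requests[r]);
        // ... access input tensors, run inference, append output tensors ...
        TRITONBACKEND_ResponseSend(
            response, TRITONSERVER_RESPONSE_COMPLETE_FINAL, nullptr /* success */);
        TRITONBACKEND_RequestRelease(requests[r], TRITONSERVER_REQUEST_RELEASE_ALL);
      }
      return nullptr;
    }

    }  // extern "C"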

Quick Start & Requirements

  • Build: mkdir build && cd build && cmake -DCMAKE_INSTALL_PREFIX:PATH=$(pwd)/install .. && make install
  • Dependencies: Requires Triton's common and core repositories. Specific tags can be set via CMake arguments (e.g., -DTRITON_COMMON_REPO_TAG=[tag]).
  • Integration: These utilities are typically pulled into a backend's own build via its CMakeLists.txt rather than by building this repository on its own; see the sketch after this list.
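
As an illustration of that integration, the official example backends fetch the common, core, and backend repositories with CMake's FetchContent and link against the utilities target this repository exports. A hedged sketch, assuming the triton-backend-utils target name from those examples and a hypothetical my-custom-backend library:

    include(FetchContent)

    # Fetch the Triton repositories at matching tags (see the CMake arguments above).
    FetchContent_Declare(repo-common
      GIT_REPOSITORY https://github.com/triton-inference-server/common.git
      GIT_TAG ${TRITON_COMMON_REPO_TAG})
    FetchContent_Declare(repo-core
      GIT_REPOSITORY https://github.com/triton-inference-server/core.git
      GIT_TAG ${TRITON_CORE_REPO_TAG})
    FetchContent_Declare(repo-backend
      GIT_REPOSITORY https://github.com/triton-inference-server/backend.git
      GIT_TAG ${TRITON_BACKEND_REPO_TAG})
    FetchContent_MakeAvailable(repo-common repo-core repo-backend)

    # my-custom-backend is a placeholder for your backend's shared library target.
    add_library(my-custom-backend SHARED minimal_backend.cc)
    target_link_libraries(my-custom-backend PRIVATE triton-backend-utils)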

Highlighted Details

  • Supports a wide range of existing backends including TensorRT, ONNX Runtime, TensorFlow, PyTorch, OpenVINO, DALI, FIL, TensorRT-LLM, and vLLM.
  • Provides a Python backend option for custom Python-based pre/post-processing or direct execution of Python scripts.
  • The Triton Backend API is C-based, offering fine-grained control over model execution and request handling.
  • Supports decoupled responses, allowing a backend to return multiple responses per request and to return them out of order (see the sketch after this list).
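
For the decoupled case, the API provides a response factory that outlives the request, so a backend can release the request early and keep emitting responses. A minimal sketch, assuming the same tritonbackend.h API; SendDecoupledResponses is a hypothetical per-request helper and error checks are elided:

    #include "triton/core/tritonbackend.h"

    // Hypothetical helper showing the decoupled flow for a single request that
    // produces num_steps responses (e.g., token-by-token generation).
    static void SendDecoupledResponses(TRITONBACKEND_Request* request, int num_steps)
    {
      TRITONBACKEND_ResponseFactory* factory;
      TRITONBACKEND_ResponseFactoryNew(&factory, request);

      // Once the inputs have been consumed, the request can be released even
      // though responses are still pending; the factory keeps the channel open.
      TRITONBACKEND_RequestRelease(request, TRITONSERVER_REQUEST_RELEASE_ALL);

      for (int step = 0; step < num_steps; ++step) {
        TRITONBACKEND_Response* response;
        TRITONBACKEND_ResponseNewFromFactory(&response, factory);
        // ... append this step's output tensors ...
        TRITONBACKEND_ResponseSend(response, 0 /* more responses follow */, nullptr);
      }

      // Signal completion without sending another response body.
      TRITONBACKEND_ResponseFactorySendFlags(
          factory, TRITONSERVER_RESPONSE_COMPLETE_FINAL);
      TRITONBACKEND_ResponseFactoryDelete(factory);
    }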

Maintenance & Community

  • This repository is part of the Triton Inference Server project. General questions can be directed to the main Triton issues page.

Licensing & Compatibility

  • The repository's license is not explicitly stated in the README. However, the Triton Inference Server repositories are released under the permissive BSD 3-Clause license, which is generally compatible with commercial and closed-source applications.

Limitations & Caveats

  • Backends developed with the "legacy custom backend" API are deprecated and must be ported to the new Triton Backend API.
  • Platform support varies across the different official backends; a "Backend-Platform Support Matrix" should be consulted.
Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 0
  • Star History: 21 stars in the last 90 days
