tutorials by triton-inference-server

Tutorials for Triton Inference Server deployment

Created 3 years ago

813 stars

Top 43.6% on SourcePulse

Project Summary

This repository provides tutorials and examples for the Triton Inference Server, targeting users migrating from traditional deep learning inference to a more streamlined "Tensor in & Tensor out" approach. It aims to familiarize users with Triton's features and ease their transition to efficient model deployment.
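As a rough illustration of what "Tensor in & Tensor out" means in practice, the sketch below builds a JSON request body in the shape used by the KServe v2 inference protocol, which Triton's HTTP endpoint follows. The tensor name `INPUT0`, the shape, and the helper `build_infer_request` are hypothetical and chosen only for illustration; they are not taken from the tutorials.

```python
import json


def build_infer_request(input_name, shape, datatype, flat_data):
    """Build a request body: one named tensor with shape, datatype, and flat data.

    The helper name and its arguments are illustrative assumptions.
    """
    # The flattened data must contain exactly prod(shape) elements.
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(flat_data):
        raise ValueError(
            f"shape {shape} needs {expected} values, got {len(flat_data)}"
        )
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(flat_data),
            }
        ]
    }


# Hypothetical tensor: a single row of four FP32 values.
body = build_infer_request("INPUT0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4])
payload = json.dumps(body)
print(payload)
```

A body like this would typically be sent as a POST to `/v2/models/<model_name>/infer` on the server's HTTP port, and the response carries output tensors in the same name, shape, datatype, and data layout.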

How It Works

The tutorials demonstrate deploying models from various frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM, OpenVINO) to Triton. They cover conceptual understanding of inference infrastructure challenges, framework-specific deployment methods, and feature-specific examples. A dedicated HuggingFace guide details various deployment strategies for HuggingFace models.
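Whatever the source framework, deployment to Triton usually starts from a model repository: one directory per model holding a `config.pbtxt` and numbered version subdirectories. The sketch below creates such a skeleton in a temporary directory. The model name `example_onnx_model`, the tensor names, and the dimensions are assumptions made for illustration, and the helper `create_model_skeleton` is hypothetical rather than part of any Triton tooling.

```python
import tempfile
from pathlib import Path

# Minimal config template; tensor names and dims are illustrative assumptions.
CONFIG_TEMPLATE = """name: "{name}"
backend: "{backend}"
max_batch_size: {max_batch_size}
input [
  {{
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }}
]
output [
  {{
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 2 ]
  }}
]
"""


def create_model_skeleton(repo_root, name, backend, version=1, max_batch_size=8):
    """Create <repo_root>/<name>/config.pbtxt and an empty version directory."""
    model_dir = Path(repo_root) / name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    config_path = model_dir / "config.pbtxt"
    config_path.write_text(
        CONFIG_TEMPLATE.format(
            name=name, backend=backend, max_batch_size=max_batch_size
        )
    )
    return config_path


repo_root = tempfile.mkdtemp(prefix="model_repository_")
config_path = create_model_skeleton(repo_root, "example_onnx_model", "onnxruntime")
print(config_path.read_text())
```

The model file itself would then be placed in the version directory (for the ONNX Runtime backend the default filename is `model.onnx`), and the server pointed at the repository with `tritonserver --model-repository=<path>`.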

Quick Start & Requirements

Examples are designed for users with a basic understanding of Triton.
Refer to the Triton Inference Server documentation for comprehensive details.
The README links to a Getting Started Checklist, an Overview Video, and a Conceptual Guide.

Highlighted Details

Includes tutorials for popular LLMs like Llama-2-7B and Falcon-7B using TensorRT-LLM and HuggingFace Transformers.
Covers deployment of models from PyTorch, TensorFlow, ONNX, TensorRT, vLLM, and OpenVINO.
Features guides on building inference infrastructure, migrating existing solutions, and agentic workflows.
Points to related repositories for Triton Server, Client, Backends, Model Analyzer, and Model Navigator.

Maintenance & Community

Contributions are welcomed via pull requests.
Requests for new examples can be submitted via issues.

Licensing & Compatibility

The README does not state a license explicitly. The main Triton Inference Server repository is published under the BSD 3-Clause license; check this repository's own LICENSE file to confirm which terms apply to the tutorials.

Limitations & Caveats

The list of supported LLMs in the tutorials is not exhaustive.
Examples assume a basic familiarity with Triton Inference Server; prior review of getting started materials is recommended.

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

1 star in the last 30 days

