TensorRT-Edge-LLM by NVIDIA

High-performance LLM/VLM inference for physical AI on edge

Created 4 months ago
260 stars

Top 97.6% on SourcePulse

Project Summary

TensorRT Edge-LLM provides a high-performance, lightweight C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs) specifically designed for NVIDIA's embedded edge platforms like Jetson and DRIVE. It enables efficient deployment of state-of-the-art AI models on resource-constrained devices, facilitating advanced AI applications in automotive, robotics, industrial IoT, and general edge computing scenarios with reduced latency and improved privacy.

How It Works

The framework provides a C++ inference runtime optimized for edge hardware. Python scripts convert HuggingFace checkpoints into ONNX models, which are then compiled into optimized TensorRT engines. Crucially, the entire pipeline, from model export through engine building to end-to-end inference, is designed to run directly on the target edge platform, minimizing data transfer and maximizing on-device performance.
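The export, build, and infer flow described above can be sketched as a shell session. The export script name below is an assumption (the repo ships its own Python conversion scripts), while `trtexec` with `--onnx`, `--saveEngine`, and `--fp16` is TensorRT's standard engine-builder CLI:

```shell
# Hedged sketch of the three-stage, on-device flow. Script and file names
# are illustrative, not the repo's actual names.

ONNX_MODEL=model.onnx
ENGINE=model.engine

# 1) Export a HuggingFace checkpoint to ONNX (illustrative script name):
# python export_hf_to_onnx.py --output "$ONNX_MODEL"

# 2) Build an optimized TensorRT engine directly on the Jetson/DRIVE target.
#    Guarded so the sketch degrades gracefully on machines without TensorRT.
if command -v trtexec >/dev/null 2>&1; then
  trtexec --onnx="$ONNX_MODEL" --saveEngine="$ENGINE" --fp16
else
  echo "trtexec not found; run on a device with TensorRT installed"
fi

# 3) The C++ runtime then deserializes "$ENGINE" for on-device generation.
```

Because every stage runs on the target device, the engine is built against the exact GPU it will serve on, which is what TensorRT's ahead-of-time optimization assumes.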

Quick Start & Requirements

  • Setup can be completed in under 15 minutes.
  • Supported NVIDIA edge platforms (e.g., Jetson, DRIVE), models, and precisions are detailed in the project's Overview and Supported Models documentation sections.
  • Refer to the Quick Start Guide and Developer Guide for comprehensive installation and usage instructions.

Highlighted Details

  • Optimized C++ runtime for high-performance LLM and VLM inference at the edge.
  • Enables deployment on resource-constrained NVIDIA Jetson and DRIVE platforms.
  • Supports model conversion from HuggingFace checkpoints to ONNX via Python scripts.
  • End-to-end inference pipeline runs entirely on edge devices.
  • Targeted use cases include automotive AI assistants, robotics interaction, industrial monitoring, and on-device chatbots.

Maintenance & Community

  • Community support is available via GitHub Issues and GitHub Discussions.
  • Further engagement can be found on the NVIDIA Developer Forums.
  • Contribution guidelines are provided for interested parties.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • This permissive license generally allows for commercial use and integration into closed-source projects.

Limitations & Caveats

  • The provided README does not explicitly detail known limitations, unsupported features, or alpha/beta status.

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 6
  • Star History: 54 stars in the last 30 days

Explore Similar Projects

Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Luis Capelo (Cofounder of Lightning AI), and 1 more.

ArcticInference by snowflakedb

0%
400 stars
vLLM plugin for high-throughput, low-latency LLM and embedding inference
Created 11 months ago
Updated 1 day ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 3 more.

LitServe by Lightning-AI

0.1%
4k stars
AI inference pipeline framework
Created 2 years ago
Updated 21 hours ago