TensorRT-Edge-LLM by NVIDIA

High-performance LLM/VLM inference for physical AI on edge

Created 4 months ago
260 stars

Top 97.6% on SourcePulse

Project Summary

TensorRT Edge-LLM provides a high-performance, lightweight C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs) specifically designed for NVIDIA's embedded edge platforms like Jetson and DRIVE. It enables efficient deployment of state-of-the-art AI models on resource-constrained devices, facilitating advanced AI applications in automotive, robotics, industrial IoT, and general edge computing scenarios with reduced latency and improved privacy.

How It Works

The framework provides a C++ inference runtime optimized for edge hardware. Python scripts convert HuggingFace checkpoints into ONNX models, which are then compiled into optimized TensorRT engines. Crucially, the entire pipeline, from model export through engine building to end-to-end inference, is designed to run directly on the target edge platform, minimizing data transfer and maximizing on-device performance.
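The export, build, and infer flow described above can be sketched as a shell session. The export script name below is an assumption (the repo ships its own Python conversion scripts), while `trtexec` with `--onnx`, `--saveEngine`, and `--fp16` is TensorRT's standard engine-builder CLI:

```shell
# Hedged sketch of the three-stage, on-device flow. Script and file names
# are illustrative, not the repo's actual names.

ONNX_MODEL=model.onnx
ENGINE=model.engine

# 1) Export a HuggingFace checkpoint to ONNX (illustrative script name):
# python export_hf_to_onnx.py --output "$ONNX_MODEL"

# 2) Build an optimized TensorRT engine directly on the Jetson/DRIVE target.
#    Guarded so the sketch degrades gracefully on machines without TensorRT.
if command -v trtexec >/dev/null 2>&1; then
  trtexec --onnx="$ONNX_MODEL" --saveEngine="$ENGINE" --fp16
else
  echo "trtexec not found; run on a device with TensorRT installed"
fi

# 3) The C++ runtime then deserializes "$ENGINE" for on-device generation.
```

Because every stage runs on the target device, the engine is built against the exact GPU it will serve on, which is what TensorRT's ahead-of-time optimization assumes.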

Quick Start & Requirements

  • Setup can be completed in under 15 minutes.
  • Supported NVIDIA edge platforms (e.g., Jetson, DRIVE), models, and precisions are detailed in the project's Overview and Supported Models documentation sections.
  • Refer to the Quick Start Guide and Developer Guide for comprehensive installation and usage instructions.

Highlighted Details

  • Optimized C++ runtime for high-performance LLM and VLM inference at the edge.
  • Enables deployment on resource-constrained NVIDIA Jetson and DRIVE platforms.
  • Supports model conversion from HuggingFace checkpoints to ONNX via Python scripts.
  • End-to-end inference pipeline runs entirely on edge devices.
  • Targeted use cases include automotive AI assistants, robotics interaction, industrial monitoring, and on-device chatbots.

Maintenance & Community

  • Community support is available via GitHub Issues and GitHub Discussions.
  • Further engagement can be found on the NVIDIA Developer Forums.
  • Contribution guidelines are provided for interested parties.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • This permissive license generally allows for commercial use and integration into closed-source projects.

Limitations & Caveats

  • The provided README does not explicitly detail known limitations, unsupported features, or alpha/beta status.

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 6
  • Star History: 54 stars in the last 30 days

Explore Similar Projects

Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Luis Capelo (Cofounder of Lightning AI), and 1 more.

ArcticInference by snowflakedb

0%
400 stars
vLLM plugin for high-throughput, low-latency LLM and embedding inference
Created 11 months ago
Updated 1 day ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 3 more.

LitServe by Lightning-AI

0.1%
4k stars
AI inference pipeline framework
Created 2 years ago
Updated 21 hours ago