TensorRT-Edge-LLM by NVIDIA

High-performance LLM/VLM inference for physical AI on edge

Created 6 months ago
329 stars

Top 83.2% on SourcePulse

Summary

TensorRT-Edge-LLM provides a high-performance, lightweight C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs), designed specifically for NVIDIA's embedded edge platforms such as Jetson and DRIVE. It enables efficient deployment of state-of-the-art AI models on resource-constrained devices, supporting advanced AI applications in automotive, robotics, industrial IoT, and general edge computing scenarios with reduced latency and improved privacy.

How It Works

The framework is built around a C++ inference runtime optimized for edge hardware. It includes Python scripts that convert HuggingFace checkpoints into ONNX models, which are then compiled into optimized TensorRT engines. Crucially, the entire pipeline (model export, engine building, and end-to-end inference) is designed to run directly on the target edge platforms, minimizing data transfer and maximizing on-device performance.

Quick Start & Requirements

  • Setup can be completed in under 15 minutes.
  • Requires a supported NVIDIA edge platform (e.g., Jetson, DRIVE); supported models and precisions are detailed in the project's Overview and Supported Models documentation sections.
  • Refer to the Quick Start Guide and Developer Guide for comprehensive installation and usage instructions.

Highlighted Details

  • Optimized C++ runtime for high-performance LLM and VLM inference on edge.
  • Enables deployment on resource-constrained NVIDIA Jetson and DRIVE platforms.
  • Supports model conversion from HuggingFace checkpoints to ONNX via Python scripts.
  • End-to-end inference pipeline runs entirely on edge devices.
  • Targeted use cases include automotive AI assistants, robotics interaction, industrial monitoring, and on-device chatbots.

Maintenance & Community

  • Community support is available via GitHub Issues and GitHub Discussions.
  • Further engagement can be found on the NVIDIA Developer Forums.
  • Contribution guidelines are provided for interested parties.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • This permissive license generally allows for commercial use and integration into closed-source projects.

Limitations & Caveats

  • The provided README does not explicitly detail known limitations, unsupported features, or alpha/beta status.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 15
  • Star History: 58 stars in the last 30 days
