NVIDIA TensorRT Edge-LLM: high-performance LLM/VLM inference for physical AI at the edge
Summary
TensorRT Edge-LLM is a lightweight, high-performance C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs), built specifically for NVIDIA's embedded edge platforms such as Jetson and DRIVE. It enables efficient deployment of state-of-the-art models on resource-constrained devices, supporting AI applications in automotive, robotics, industrial IoT, and general edge computing. Because inference stays on-device, it avoids cloud round trips, reducing latency and keeping data private.
How It Works
The framework pairs a C++ inference runtime optimized for edge hardware with Python tooling: scripts convert HuggingFace checkpoints to the ONNX format, and the resulting ONNX models are compiled into optimized TensorRT engines. Crucially, the entire pipeline (model export, engine building, and end-to-end inference) is designed to run directly on the target edge platform, minimizing data transfer and maximizing on-device performance.
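The three-stage flow described above (export, build, infer, all on the same device) can be sketched structurally. This is a minimal illustration only: every function and file name below is a hypothetical placeholder, not the project's actual scripts or APIs; consult the TensorRT Edge-LLM repository for the real tooling.

```python
# Structural sketch of the on-device deployment flow.
# All names here are hypothetical placeholders for illustration.

def export_to_onnx(hf_checkpoint: str) -> str:
    """Stand-in for the Python export step: HuggingFace checkpoint -> ONNX."""
    onnx_path = hf_checkpoint + ".onnx"
    print(f"[export] {hf_checkpoint} -> {onnx_path}")
    return onnx_path

def build_trt_engine(onnx_path: str) -> str:
    """Stand-in for compiling the ONNX graph into a TensorRT engine."""
    engine_path = onnx_path.removesuffix(".onnx") + ".engine"
    print(f"[build] {onnx_path} -> {engine_path}")
    return engine_path

def run_inference(engine_path: str, prompt: str) -> str:
    """Stand-in for the C++ runtime executing the engine."""
    print(f"[infer] running {engine_path}")
    return f"<generated text for: {prompt}>"

# All three stages run on the same edge device (e.g. a Jetson board),
# so model artifacts never leave the target platform.
engine = build_trt_engine(export_to_onnx("example-llm-checkpoint"))
print(run_inference(engine, "Describe the scene ahead."))
```

The key design point the sketch captures is ordering and locality: the engine is built from the ONNX export on the device that will serve it, since TensorRT engines are optimized for the specific hardware they are built on.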