Optimized local inference for LLMs with HuggingFace-like APIs
NanoLLM provides optimized local inference for large language models (LLMs) and multimodal AI applications. It targets developers and researchers who need efficient, on-device deployment, with support for model quantization, vision/language models, multimodal agents, speech processing, vector databases, and Retrieval-Augmented Generation (RAG). The project aims to simplify integrating these capabilities into local environments.
How It Works
NanoLLM exposes a HuggingFace-like Python API over optimized native (C++/CUDA) inference backends, targeting high performance on edge devices. It supports several quantization techniques that reduce model size and memory footprint with little accuracy loss. The architecture is modular, so components such as vision encoders, language decoders, and speech models can be combined within a unified framework.
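As a rough sketch of what that API looks like in practice, the snippet below loads a quantized model and streams generated tokens. The model name, backend (api='mlc'), and quantization setting (q4f16_ft) follow the project's documented examples but are assumptions here; verify them against the release you have installed.

from nano_llm import NanoLLM

# Load a model through the HuggingFace-like API.
# Backend and quantization values are illustrative assumptions
# based on the project's documented defaults.
model = NanoLLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    api='mlc',                # inference backend
    quantization='q4f16_ft',  # 4-bit weights, fp16 activations
)

# Stream tokens so they can be printed as they are generated.
response = model.generate("Once upon a time,", max_new_tokens=128, streaming=True)

for token in response:
    print(token, end='', flush=True)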
Quick Start & Requirements
docker pull dustynv/nano_llm:24.7-r36.2.0
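After pulling and starting the container (for example with docker run using the NVIDIA runtime, or via jetson-containers), a minimal interactive chat loop might look like the following. This is a sketch assuming the NanoLLM and ChatHistory classes and their embed_chat() / kv_cache fields behave as in the project's chat example; the model name and settings are placeholders.

from nano_llm import NanoLLM, ChatHistory

# Assumed model name and backend settings, for illustration only.
model = NanoLLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    api='mlc',
    quantization='q4f16_ft',
)

chat_history = ChatHistory(model, system_prompt="You are a helpful AI assistant.")

while True:
    prompt = input('>> ').strip()

    # Append the user turn and embed the chat so far.
    chat_history.append('user', prompt)
    embedding, _ = chat_history.embed_chat()

    # Stream the reply, reusing the chat's KV cache between turns.
    reply = model.generate(
        embedding,
        streaming=True,
        kv_cache=chat_history.kv_cache,
        stop_tokens=chat_history.template.stop,
        max_new_tokens=256,
    )

    for token in reply:
        print(token, end='', flush=True)

    # Save the bot reply into the history for the next turn.
    chat_history.append('bot', reply)
    print('\n')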
Maintenance & Community
The project is actively maintained by NVIDIA. Community resources and tutorials are available via the Jetson AI Lab.
Licensing & Compatibility
The project appears to be under a permissive license, suitable for commercial use and integration into closed-source applications. Specific license details should be verified in the repository.
Limitations & Caveats
The project is optimized for NVIDIA hardware, so performance on other platforms may vary. It is under active development, and specific model support and features may change.