NanoLLM by dusty-nv

Optimized local inference for LLMs with HuggingFace-like APIs

created 1 year ago
304 stars

Top 88.9% on sourcepulse

Project Summary

NanoLLM provides optimized local inference for large language models (LLMs) and multimodal AI applications. It targets developers and researchers who want efficient on-device deployment of LLMs, with support for quantization, vision/language models, multimodal agents, speech processing, vector databases, and Retrieval-Augmented Generation (RAG). The project aims to make these capabilities straightforward to integrate into local environments.

How It Works

NanoLLM exposes a Python API over optimized C++/CUDA inference backends, targeting maximum performance on edge devices such as NVIDIA Jetson. It supports several quantization schemes that reduce model size and memory footprint with little accuracy loss. The architecture is modular, so vision encoders, language decoders, and speech models can be combined within a unified framework.
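
As a rough illustration of that API surface, the sketch below loads a quantized model and streams a completion. It follows the general pattern from the project's documentation, but the model name, backend selector (api="mlc"), and quantization preset ("q4f16_ft") are assumptions that should be checked against the current docs.

    # Illustrative sketch only: option values below are assumptions based on the docs.
    from nano_llm import NanoLLM

    model = NanoLLM.from_pretrained(
        "meta-llama/Llama-2-7b-chat-hf",  # HuggingFace repo name or local checkpoint path
        api="mlc",                        # inference backend to use
        quantization="q4f16_ft",          # 4-bit weight quantization preset
    )

    # Generation is streamed token by token.
    response = model.generate("Once upon a time,", max_new_tokens=128, streaming=True)

    for token in response:
        print(token, end="", flush=True)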

Quick Start & Requirements

  • Install via Docker: docker pull dustynv/nano_llm:24.7-r36.2.0
  • Prerequisites: NVIDIA GPU with CUDA 12+ recommended for optimal performance; the published container tags (e.g., r36.2.0) indicate builds for NVIDIA Jetson on JetPack 6 / L4T r36.x.
  • Documentation: dusty-nv.github.io/NanoLLM
  • Tutorials: Jetson AI Lab

Highlighted Details

  • Optimized C++ inference engine for high performance.
  • Support for reduced-precision and quantized inference (e.g., FP16, INT8).
  • Integrated components for vision, language, speech, and RAG.
  • HuggingFace-like API for ease of use (see the chat sketch after this list).
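
A minimal multi-turn chat sketch, assuming the ChatHistory helper and the keyword arguments shown here (kv_cache, stop_tokens, streaming) match the current release; treat the exact signatures as illustrative, not definitive.

    # Hypothetical chat loop; argument names are assumptions modeled on the project's examples.
    from nano_llm import NanoLLM, ChatHistory

    model = NanoLLM.from_pretrained(
        "meta-llama/Llama-2-7b-chat-hf",  # example model
        api="mlc",
        quantization="q4f16_ft",
    )

    chat = ChatHistory(model, system_prompt="You are a helpful assistant.")

    while True:
        chat.append("user", input(">> "))
        embedding, _ = chat.embed_chat()      # embed the conversation so far

        reply = model.generate(
            embedding,
            streaming=True,
            kv_cache=chat.kv_cache,           # reuse cached attention state across turns
            stop_tokens=chat.template.stop,   # stop tokens from the chat template
            max_new_tokens=256,
        )

        for token in reply:
            print(token, end="", flush=True)

        chat.append("bot", reply)             # record the reply for the next turn
        print()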

Maintenance & Community

The project is maintained by dusty-nv (NVIDIA). Community resources and tutorials are available via the Jetson AI Lab.

Licensing & Compatibility

The project appears to be under a permissive license, suitable for commercial use and integration into closed-source applications. Specific license details should be verified in the repository.

Limitations & Caveats

The stack requires NVIDIA hardware (CUDA 12+), and the published containers target Jetson, so it is not intended for non-NVIDIA platforms. Specific model support and features may change between releases.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
Star History
56 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.

ktransformers by kvcache-ai

Framework for LLM inference optimization experimentation
15k stars, top 0.4%
created 1 year ago, updated 3 days ago