onprem by amaiya

Python toolkit for on-premises LLMs applied to private data

Created 2 years ago
750 stars

Top 46.3% on SourcePulse

Project Summary

OnPrem.LLM is a Python toolkit designed to simplify the integration of on-premises Large Language Models (LLMs) with private data. It targets developers and researchers needing to apply LLMs to sensitive or locally stored information, offering a unified interface for document intelligence tasks like RAG, summarization, and few-shot classification.

How It Works

The toolkit primarily uses llama-cpp-python for local LLM inference, supporting the GGUF model format and GPU offloading via CUDA or Metal. An alternative backend built on Hugging Face Transformers broadens model compatibility and eases integration with quantized models (e.g., AWQ, bitsandbytes). For data processing, it supports several PDF extraction methods, including OCR and table structure inference, and provides both dense (Chroma) and sparse vector stores for document retrieval.
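To make the dense/sparse distinction concrete: a sparse store ranks documents by weighted keyword overlap, whereas a dense store compares embedding vectors. The sketch below is a toy, standard-library illustration of sparse retrieval using BM25 scoring. `TinySparseStore` and all of its methods are hypothetical names invented for this example; it is not OnPrem.LLM's API and is not claimed to match its internals.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinySparseStore:
    """Hypothetical toy keyword index scored with BM25 (illustration only)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.term_counts = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(c.values()) for c in self.term_counts]
        self.avg_len = sum(self.lengths) / len(docs)
        # Document frequency: how many documents contain each term.
        self.doc_freq = Counter(t for c in self.term_counts for t in c)

    def score(self, query, i):
        """BM25 score of document i for the query."""
        counts, n = self.term_counts[i], len(self.docs)
        total = 0.0
        for term in set(tokenize(query)):
            tf = counts.get(term, 0)
            if tf == 0:
                continue
            df = self.doc_freq[term]
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            length_norm = 1 - self.b + self.b * self.lengths[i] / self.avg_len
            total += idf * tf * (self.k1 + 1) / (tf + self.k1 * length_norm)
        return total

    def search(self, query, k=2):
        """Return up to k (document, score) pairs with a positive score."""
        scored = [(self.score(query, i), doc) for i, doc in enumerate(self.docs)]
        scored = [(s, d) for s, d in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(d, s) for s, d in scored[:k]]


docs = [
    "GGUF models run locally through llama-cpp-python.",
    "Chroma stores dense embeddings for semantic search.",
    "OCR extracts text from scanned PDF pages.",
]
store = TinySparseStore(docs)
hits = store.search("scanned PDF OCR")
```

A sparse index like this needs no embedding model, so it is cheap to build, but it only matches literal terms; a dense store trades that cost for semantic matching.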

Quick Start & Requirements

Highlighted Details

  • Supports local LLMs (GGUF via llama-cpp-python, Hugging Face Transformers) and cloud LLMs (via LiteLLM).
  • Features Retrieval Augmented Generation (RAG) with both dense and sparse vector stores.
  • Includes pipelines for summarization, information extraction, and few-shot text classification.
  • Offers a built-in web UI for interactive use.
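As a rough picture of the RAG flow named above, the sketch below retrieves the passages most relevant to a question, packs them into a prompt, and hands that prompt to a model. Everything here is an assumption made for illustration: `retrieve`, `build_rag_prompt`, `answer`, and `echo_llm` are hypothetical helpers, the retriever is a naive word-overlap ranker, and the "model" is a stub rather than a real LLM. None of it is taken from OnPrem.LLM's code.

```python
def retrieve(question, corpus, k=2):
    """Toy retriever: rank passages by lowercase words shared with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(question, passages):
    """Number the retrieved passages and place them ahead of the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, corpus, generate):
    """Retrieve, build the prompt, then call any text-in/text-out function."""
    return generate(build_rag_prompt(question, retrieve(question, corpus)))


def echo_llm(prompt):
    """Stub standing in for a real model call (hypothetical)."""
    return f"(stub reply to a {len(prompt)}-character prompt)"


corpus = [
    "AWQ quantization support is limited to Linux systems.",
    "The toolkit offers a built-in web UI.",
    "GGUF models can be offloaded to a GPU.",
]
question = "Which systems support AWQ quantization?"
top = retrieve(question, corpus)
prompt = build_rag_prompt(question, top)
reply = answer(question, corpus, echo_llm)
```

Because `answer` accepts any callable, the same flow works whether the generator is a local model or a cloud one; only the `generate` argument changes.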

Maintenance & Community

The project is actively developed, with frequent releases (v0.13.0 as of April 2025) that have introduced features such as streamlined Ollama/cloud LLM support and an improved web UI.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Installing llama-cpp-python can be complex, especially on Windows, where using WSL is recommended. AWQ quantization support is limited to Linux systems. The absence of a clearly stated license may also hinder commercial adoption.

Health Check

  • Last commit: 22 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 3
  • Star history: 14 stars in the last 30 days
