onprem by amaiya

Python toolkit for on-premises LLMs applied to private data

created 1 year ago
730 stars

Top 48.4% on sourcepulse

Project Summary

OnPrem.LLM is a Python toolkit designed to simplify the integration of on-premises Large Language Models (LLMs) with private data. It targets developers and researchers needing to apply LLMs to sensitive or locally stored information, offering a unified interface for document intelligence tasks like RAG, summarization, and few-shot classification.

How It Works

The toolkit primarily leverages llama-cpp-python for efficient local LLM inference, supporting GGUF model formats and GPU offloading via CUDA or Metal. It also offers an alternative backend using Hugging Face Transformers, enabling broader model compatibility and easier integration with quantized models (e.g., AWQ, bitsandbytes). For data processing, it supports various PDF extraction methods, including OCR and table structure inference, and offers both dense (Chroma) and sparse vector stores for efficient document retrieval.
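The difference between sparse and dense retrieval can be illustrated with a small standard-library sketch. This is not OnPrem.LLM's implementation: the function names (`sparse_scores`, `toy_embed`, `dense_scores`) are hypothetical, the sparse scorer is a simple TF-IDF-style weighting, and the "embedding" is a toy hash of character trigrams standing in for a real embedding model such as one a Chroma-backed store would use.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


def sparse_scores(query, docs):
    """Sparse retrieval stand-in: score documents by exact term matches,
    weighting rarer terms more heavily (TF-IDF style)."""
    doc_counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    scores = []
    for counts in doc_counts:
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for c in doc_counts if term in c)  # document frequency
            if df:
                score += counts[term] * math.log(1 + n_docs / df)
        scores.append(score)
    return scores


def toy_embed(text, dim=64):
    """Toy embedding (assumption): hash character trigrams into a
    fixed-size, L2-normalized vector. A real system would use a model."""
    vec = [0.0] * dim
    padded = f" {text.lower()} "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def dense_scores(query, docs, dim=64):
    """Dense retrieval stand-in: cosine similarity between unit vectors."""
    q = toy_embed(query, dim)
    return [sum(a * b for a, b in zip(q, toy_embed(d, dim))) for d in docs]


docs = [
    "GGUF models run through llama-cpp-python",
    "Chroma stores dense vectors",
    "OCR extracts text from scanned PDFs",
]
query = "dense vectors"
print("sparse:", sparse_scores(query, docs))
print("dense: ", dense_scores(query, docs))
```

Sparse scoring rewards only exact term overlap, so documents sharing no words with the query score zero. Dense scoring compares whole vectors, so every document receives some similarity value, which is why the two approaches are often offered side by side.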

Quick Start & Requirements

Highlighted Details

  • Supports local LLMs (GGUF via llama-cpp-python, Hugging Face Transformers) and cloud LLMs (via LiteLLM).
  • Features Retrieval Augmented Generation (RAG) with both dense and sparse vector stores.
  • Includes pipelines for summarization, information extraction, and few-shot text classification.
  • Offers a built-in web UI for interactive use.
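As a rough illustration of the Retrieval Augmented Generation flow mentioned above, the sketch below ranks passages against a question, packs the top results into a prompt, and hands that prompt to a model. It is a generic outline under stated assumptions, not the toolkit's API: `overlap_score`, `build_rag_prompt`, and `answer` are hypothetical names, the word-overlap ranking is a placeholder for a real vector store, and `generate` is any caller-supplied function that maps a prompt to text.

```python
from typing import Callable, List


def overlap_score(query: str, passage: str) -> int:
    """Count distinct query words that also occur in the passage
    (placeholder for a real dense or sparse retriever)."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def build_rag_prompt(question: str, passages: List[str], k: int = 2) -> str:
    """Select the k best-matching passages and embed them in a prompt."""
    ranked = sorted(passages, key=lambda p: overlap_score(question, p), reverse=True)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(ranked[:k]))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question: str, passages: List[str],
           generate: Callable[[str], str], k: int = 2) -> str:
    """Run retrieval, then pass the assembled prompt to the model callable."""
    return generate(build_rag_prompt(question, passages, k))


passages = [
    "The toolkit supports GGUF models.",
    "Chroma is a dense vector store.",
    "WSL is recommended on Windows.",
]
# Echo the prompt instead of calling a real model, to show what the LLM would see.
print(answer("Which dense vector store is used?", passages, generate=lambda p: p, k=1))
```

Passing the model in as a callable keeps the retrieval and prompt-building steps independent of whether generation happens locally or through a cloud provider.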

Maintenance & Community

The project is under active development with frequent releases (v0.13.0 as of April 2025), which have introduced features such as streamlined Ollama/cloud LLM support and an improved Web UI.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Installing llama-cpp-python can be complex, especially on Windows, where using WSL is recommended. AWQ quantization support is limited to Linux systems. The project's license is not clearly stated, which may impact commercial adoption.

Health Check

Last commit: 3 days ago
Responsiveness: 1 day
Pull Requests (30d): 1
Issues (30d): 26

Star History

14 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Tim J. Baek (Founder of Open WebUI), and 2 more.

llmware by llmware-ai

0.2%
14k
Framework for enterprise RAG pipelines using small, specialized models
created 1 year ago
updated 1 week ago