Python toolkit for on-premises LLMs applied to private data
OnPrem.LLM is a Python toolkit designed to simplify the integration of on-premises Large Language Models (LLMs) with private data. It targets developers and researchers who need to apply LLMs to sensitive or locally stored information, offering a unified interface for document intelligence tasks such as retrieval-augmented generation (RAG), summarization, and few-shot classification.
How It Works
The toolkit primarily leverages llama-cpp-python for efficient local LLM inference, supporting GGUF model formats and GPU offloading via CUDA or Metal. It also offers an alternative backend using Hugging Face Transformers, enabling broader model compatibility and easier integration with quantized models (e.g., AWQ, bitsandbytes). For data processing, it supports various PDF extraction methods, including OCR and table structure inference, and offers both dense (Chroma) and sparse vector stores for efficient document retrieval.
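A minimal sketch of that workflow, assuming the toolkit's LLM class with its ingest and ask helpers (the folder path, question, and n_gpu_layers value are illustrative; consult the project documentation for exact signatures and defaults):

```python
from onprem import LLM

# Load a local GGUF model via llama-cpp-python; n_gpu_layers controls how many
# layers are offloaded to the GPU (the value here is an illustrative choice).
llm = LLM(n_gpu_layers=35)

# Ingest a folder of private documents into the vector store, then answer a
# question with retrieval-augmented generation over those documents.
llm.ingest("./private_docs")
result = llm.ask("What do these documents say about data retention?")
print(result["answer"])
```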
Quick Start & Requirements
pip install onprem
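Once the package and a backend are installed, a first prompt might look like the following (the default model download behavior is an assumption; see the project docs):

```python
from onprem import LLM

llm = LLM()  # loads a default local model on first use (assumed behavior)
print(llm.prompt("List three benefits of running LLMs on-premises."))
```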
Requirements: llama-cpp-python (for CPU/GPU inference), the CUDA Toolkit (for NVIDIA GPUs), or Hugging Face Transformers. For GPU acceleration, llama-cpp-python must be compiled with GGML_CUDA=on (Linux) or GGML_METAL=on (Mac).
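For GPU builds, these flags are typically passed to pip through CMAKE_ARGS when compiling llama-cpp-python (commands shown for illustration; check the llama-cpp-python documentation for your platform):

```bash
# NVIDIA GPU (Linux): build llama-cpp-python with CUDA support
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

# Apple Silicon (Mac): build with Metal support
CMAKE_ARGS="-DGGML_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```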
Highlighted Details
Supports multiple backends for local inference (llama-cpp-python, Hugging Face Transformers) and cloud LLMs (via LiteLLM).
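As a rough illustration of switching between a local backend and a cloud model, something like the following might apply; the LiteLLM-style model string and the parameter it is passed to are assumptions, so verify against the project documentation:

```python
from onprem import LLM

# Local inference through llama-cpp-python (the default backend).
local_llm = LLM()

# Hypothetical: route requests to a hosted model via LiteLLM by passing a
# provider-prefixed model string (parameter name and format are assumptions).
cloud_llm = LLM(model_url="openai/gpt-4o-mini")

print(cloud_llm.prompt("Summarize the trade-offs of on-prem vs. cloud LLMs."))
```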
Maintenance & Community
The project is under active development, with frequent releases (v0.13.0 as of April 2025) introducing features such as streamlined Ollama/cloud LLM support and an improved Web UI.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Installation of llama-cpp-python can be complex, especially on Windows, where WSL is recommended. AWQ quantization support is limited to Linux systems. The project's license is not clearly stated, which may impact commercial adoption.