Self-hosted chatbot app for local, private LLM inference
Top 4.7% on sourcepulse
This project provides a self-hosted, offline, ChatGPT-like chatbot experience powered by Llama 2 and Code Llama models. It targets users seeking a private, local AI assistant that keeps all data on their device, with support for various hardware configurations including Macs and Docker-enabled systems.
How It Works
LlamaGPT leverages llama.cpp and its Python bindings to run quantized Llama models (GGML and GGUF formats) efficiently on local hardware. It offers a Docker-based deployment for broad compatibility and an OpenAI-compatible API endpoint, allowing integration with other applications. The architecture prioritizes privacy by ensuring no data leaves the user's machine.
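As a sketch of that integration path (not code from the project itself), the snippet below points the official openai Python client (v1+) at the local endpoint. It assumes the API is reachable at http://localhost:3001, as listed under Quick Start, and that the server accepts or ignores the placeholder model name.

```python
# Minimal integration sketch, assuming:
#   - the stack is running and the OpenAI-compatible API is exposed on
#     http://localhost:3001 (the port listed under Quick Start below);
#   - the `openai` Python package (v1+) is installed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3001/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="llama-2-7b-chat",  # placeholder; the server answers with the model it was started with
    messages=[{"role": "user", "content": "Explain GGUF quantization in one sentence."}],
    max_tokens=128,
)
print(reply.choices[0].message.content)
```

Any client or tool that lets you override the OpenAI base URL should work the same way against the local endpoint.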
Quick Start & Requirements
- Mac: git clone https://github.com/getumbrel/llama-gpt.git && cd llama-gpt && ./run-mac.sh --model <model_name> (e.g., 7b, code-7b). Requires Docker and Xcode.
- Docker: git clone https://github.com/getumbrel/llama-gpt.git && cd llama-gpt && ./run.sh --model <model_name> (add --with-cuda for NVIDIA GPUs). Requires Docker.
- Kubernetes: kubectl apply -k deploy/kubernetes/. -n llama
- Models are downloaded to ./models on first run.
- Access http://localhost:3000 (UI) and http://localhost:3001 (API); a smoke-test sketch follows this list.
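To confirm a fresh deployment is serving correctly, a rough smoke test is sketched below. It assumes the default ports listed above, that the API follows the standard OpenAI /v1/chat/completions schema, and that the requests package is installed; the model name is a placeholder.

```python
# Rough post-install smoke test (assumes default ports and the standard
# OpenAI chat-completions schema on the local API).
import requests

# The web UI should answer with a normal HTML page.
ui = requests.get("http://localhost:3000", timeout=10)
print("UI:", ui.status_code)

# The API should accept an OpenAI-style chat completion request.
api = requests.post(
    "http://localhost:3001/v1/chat/completions",
    json={
        "model": "llama-2-7b-chat",  # placeholder name
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 8,
    },
    timeout=120,
)
api.raise_for_status()
print("API:", api.json()["choices"][0]["message"]["content"])
```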
Highlighted Details
Maintenance & Community
The project is actively developed by Umbrel. Support for Code Llama and CUDA was added recently. A roadmap is available, with custom model loading and model switching listed as future priorities.
Licensing & Compatibility
The project appears to be MIT licensed, based on the repository's overall structure and common Umbrel project practices. Llama 2 and Code Llama models are released under Meta's Llama 2 Community License, which permits commercial use and integration into closed-source applications, subject to Meta's acceptable-use policy and a restriction on services exceeding 700 million monthly active users.
Limitations & Caveats
Custom model loading and the ability to switch between models are not yet implemented. Performance varies significantly based on hardware, with lower-end devices showing considerably slower generation speeds.