llama-gpt  by getumbrel

Self-hosted chatbot app for local, private LLM inference

created 2 years ago
10,991 stars

Top 4.7% on sourcepulse

GitHubView on GitHub
Project Summary

This project provides a self-hosted, offline, ChatGPT-like chatbot experience powered by Llama 2 and Code Llama models. It targets users seeking a private, local AI assistant that keeps all data on their device, with support for various hardware configurations including Macs and Docker-enabled systems.

How It Works

LlamaGPT leverages llama.cpp and its Python bindings to run quantized Llama models (GGML and GGUF formats) efficiently on local hardware. It offers a Docker-based deployment for broad compatibility and an OpenAI-compatible API endpoint, allowing integration with other applications. The architecture prioritizes privacy by ensuring no data leaves the user's machine.

Quick Start & Requirements

  • UmbrelOS: Install directly from the Umbrel App Store.
  • Mac (M1/M2): git clone https://github.com/getumbrel/llama-gpt.git && cd llama-gpt && ./run-mac.sh --model <model_name> (e.g., 7b, code-7b). Requires Docker and Xcode.
  • Docker (Any System): git clone https://github.com/getumbrel/llama-gpt.git && cd llama-gpt && ./run.sh --model <model_name> (add --with-cuda for NVIDIA GPUs). Requires Docker.
  • Kubernetes: kubectl apply -k deploy/kubernetes/. -n llama.
  • Models: Downloaded automatically to /models on first run.
  • Access: http://localhost:3000 (UI), http://localhost:3001 (API).
  • Docs: https://github.com/getumbrel/llama-gpt

Highlighted Details

  • Supports Llama 2 (7B, 13B, 70B) and Code Llama (7B, 13B, 34B) models in GGML/GGUF formats.
  • Achieves up to 54 tokens/sec on an M1 Max MacBook Pro (7B model).
  • Provides an OpenAI-compatible API endpoint for integration.
  • Includes CUDA support for NVIDIA GPUs.

Maintenance & Community

The project is actively developed by Umbrel. Key features like Code Llama and CUDA support have been recently added. A roadmap is available, with custom model loading and model switching as future priorities.

Licensing & Compatibility

The project appears to be MIT licensed, based on the repository's overall structure and common Umbrel project practices. Llama 2 and Code Llama models are released under permissive licenses by Meta, allowing for commercial use and integration into closed-source applications.

Limitations & Caveats

Custom model loading and the ability to switch between models are not yet implemented. Performance varies significantly based on hardware, with lower-end devices showing considerably slower generation speeds.

Health Check
Last commit

1 year ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
74 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.