Self-hosted chatbot app for local, private LLM inference
Top 4.7% on sourcepulse
This project provides a self-hosted, offline, ChatGPT-like chatbot experience powered by Llama 2 and Code Llama models. It targets users seeking a private, local AI assistant that keeps all data on their device, with support for various hardware configurations including Macs and Docker-enabled systems.
How It Works
LlamaGPT leverages llama.cpp and its Python bindings to run quantized Llama models (GGML and GGUF formats) efficiently on local hardware. It offers a Docker-based deployment for broad compatibility and an OpenAI-compatible API endpoint, allowing integration with other applications. The architecture prioritizes privacy by ensuring no data leaves the user's machine.
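As a sketch of that integration path (not code from the project itself), the snippet below points the official openai Python client (v1+) at the local endpoint. It assumes the API is reachable at http://localhost:3001, as listed under Quick Start, and that the server accepts or ignores the placeholder model name.

```python
# Minimal integration sketch, assuming:
#   - the stack is running and the OpenAI-compatible API is exposed on
#     http://localhost:3001 (the port listed under Quick Start below);
#   - the `openai` Python package (v1+) is installed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3001/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="llama-2-7b-chat",  # placeholder; the server answers with the model it was started with
    messages=[{"role": "user", "content": "Explain GGUF quantization in one sentence."}],
    max_tokens=128,
)
print(reply.choices[0].message.content)
```

Any client or tool that lets you override the OpenAI base URL should work the same way against the local endpoint.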
Quick Start & Requirements
- Mac: git clone https://github.com/getumbrel/llama-gpt.git && cd llama-gpt && ./run-mac.sh --model <model_name> (e.g., 7b, code-7b). Requires Docker and Xcode.
- Docker: git clone https://github.com/getumbrel/llama-gpt.git && cd llama-gpt && ./run.sh --model <model_name> (add --with-cuda for NVIDIA GPUs). Requires Docker.
- Kubernetes: kubectl apply -k deploy/kubernetes/. -n llama
- Models are downloaded to ./models on first run.
- Access http://localhost:3000 (UI) and http://localhost:3001 (API); a smoke-test sketch follows this list.
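To confirm a fresh deployment is serving correctly, a rough smoke test is sketched below. It assumes the default ports listed above, that the API follows the standard OpenAI /v1/chat/completions schema, and that the requests package is installed; the model name is a placeholder.

```python
# Rough post-install smoke test (assumes default ports and the standard
# OpenAI chat-completions schema on the local API).
import requests

# The web UI should answer with a normal HTML page.
ui = requests.get("http://localhost:3000", timeout=10)
print("UI:", ui.status_code)

# The API should accept an OpenAI-style chat completion request.
api = requests.post(
    "http://localhost:3001/v1/chat/completions",
    json={
        "model": "llama-2-7b-chat",  # placeholder name
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 8,
    },
    timeout=120,
)
api.raise_for_status()
print("API:", api.json()["choices"][0]["message"]["content"])
```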
Highlighted Details
Maintenance & Community
The project is actively developed by Umbrel. Support for Code Llama and CUDA was added recently. A roadmap is available, with custom model loading and model switching listed as future priorities.
Licensing & Compatibility
The project appears to be MIT licensed, based on the repository's overall structure and common Umbrel project practices. Llama 2 and Code Llama models are released under Meta's Llama 2 Community License, which permits commercial use and integration into closed-source applications, subject to Meta's acceptable-use policy and a restriction on services exceeding 700 million monthly active users.
Limitations & Caveats
Custom model loading and the ability to switch between models are not yet implemented. Performance varies significantly based on hardware, with lower-end devices showing considerably slower generation speeds.