llama-gpt by getumbrel

Self-hosted chatbot app for local, private LLM inference

Created 2 years ago
11,002 stars

Top 4.6% on SourcePulse

Project Summary

This project provides a self-hosted, offline, ChatGPT-like chatbot experience powered by Llama 2 and Code Llama models. It targets users who want a private, local AI assistant that keeps all data on their device, and it runs on umbrelOS home servers, Apple silicon Macs, and any Docker- or Kubernetes-capable system.

How It Works

LlamaGPT leverages llama.cpp and its Python bindings to run quantized Llama models (GGML and GGUF formats) efficiently on local hardware. It offers a Docker-based deployment for broad compatibility and an OpenAI-compatible API endpoint, allowing integration with other applications. The architecture prioritizes privacy by ensuring no data leaves the user's machine.
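
Because the API mirrors the OpenAI interface, existing OpenAI clients can usually be reused by overriding only the base URL. As a minimal sketch, assuming the server exposes the conventional /v1/chat/completions route on port 3001:

  # Minimal sketch: chat completion against the local OpenAI-compatible API.
  # The route and payload follow the OpenAI chat-completions convention;
  # verify the exact routes against the repository docs.
  curl http://localhost:3001/v1/chat/completions \
    --header 'Content-Type: application/json' \
    --data '{
      "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what LlamaGPT does in one sentence."}
      ]
    }'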

Quick Start & Requirements

  • UmbrelOS: Install directly from the Umbrel App Store.
  • Mac (M1/M2): git clone https://github.com/getumbrel/llama-gpt.git && cd llama-gpt && ./run-mac.sh --model <model_name> (e.g., 7b, code-7b). Requires Docker and Xcode.
  • Docker (Any System): git clone https://github.com/getumbrel/llama-gpt.git && cd llama-gpt && ./run.sh --model <model_name> (add --with-cuda for NVIDIA GPUs; a full sequence is sketched after this list). Requires Docker.
  • Kubernetes: kubectl apply -k deploy/kubernetes/. -n llama.
  • Models: Downloaded automatically to /models on first run.
  • Access: http://localhost:3000 (UI), http://localhost:3001 (API).
  • Docs: https://github.com/getumbrel/llama-gpt
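
A first run of the Docker path, stitched together from the bullets above, looks roughly like this (the initial model download to /models can take a while):

  # Sketch of the Docker quick start described above
  git clone https://github.com/getumbrel/llama-gpt.git
  cd llama-gpt
  ./run.sh --model 7b          # add --with-cuda on hosts with an NVIDIA GPU
  # Once the model has downloaded, the UI is at http://localhost:3000
  # and the OpenAI-compatible API at http://localhost:3001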

Highlighted Details

  • Supports Llama 2 (7B, 13B, 70B) and Code Llama (7B, 13B, 34B) models in GGML/GGUF formats; selecting a variant is sketched after this list.
  • Achieves up to 54 tokens/sec on an M1 Max MacBook Pro (7B model).
  • Provides an OpenAI-compatible API endpoint for integration.
  • Includes CUDA support for NVIDIA GPUs.
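
As a sketch of how the model variants and the CUDA flag combine on the command line (the exact model identifiers should be confirmed against the model table in the README):

  # Hypothetical invocations; confirm model names in the upstream README
  ./run.sh --model 13b --with-cuda    # Llama 2 13B with NVIDIA GPU acceleration
  ./run-mac.sh --model code-7b        # Code Llama 7B on an Apple silicon Mac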

Maintenance & Community

The project is developed by Umbrel, though activity has slowed (the last commit was about a year ago; see Health Check below). Code Llama and CUDA support were added after the initial release. A roadmap is available, with custom model loading and model switching listed as future priorities.

Licensing & Compatibility

The project appears to be MIT licensed; confirm against the LICENSE file in the repository. The Llama 2 and Code Llama model weights are distributed under Meta's Llama 2 Community License, which permits commercial use, including in closed-source applications, but imposes an acceptable-use policy and requires a separate license for services exceeding 700 million monthly active users.

Limitations & Caveats

Custom model loading and the ability to switch between models are not yet implemented. Performance varies significantly based on hardware, with lower-end devices showing considerably slower generation speeds.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 20 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems").

JittorLLMs by Jittor
0.0% · 2k stars
Low-resource LLM inference library
Created 2 years ago · Updated 6 months ago

Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Gabriel Almeida (cofounder of Langflow), and 2 more.

torchchat by pytorch
0.1% · 4k stars
PyTorch-native SDK for local LLM inference across diverse platforms
Created 1 year ago · Updated 1 week ago

Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Anil Dash (former CEO of Glitch), and 23 more.

llamafile by Mozilla-Ocho
0.1% · 23k stars
Single-file LLM distribution and runtime via `llama.cpp` and Cosmopolitan Libc
Created 2 years ago · Updated 2 months ago