llama2-webui by liltom-eth

Web UI for local Llama 2 inference

created 2 years ago
1,959 stars

Top 22.9% on sourcepulse

Project Summary

This project provides a Gradio-based web UI for running Llama 2 models locally on CPU or GPU across Linux, Windows, and macOS. It aims to simplify deploying and interacting with Llama 2 variants, offering an OpenAI-compatible API so it can serve as the backend for other generative AI applications.

How It Works

The project supports multiple backends for inference: transformers (with bitsandbytes for 8-bit quantization), AutoGPTQ (for 4-bit quantization), and llama.cpp (for GGML/GGUF formats). This flexibility allows users to choose the best trade-off between performance, VRAM usage, and model precision based on their hardware. The llama2-wrapper library abstracts these backends, providing a unified interface for model loading, inference, and API serving.
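
As a sketch of that unified interface, based on usage shown in the project README (the default backend, the auto-downloaded model, and exact signatures may vary across versions):

    from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

    # Per the README, the no-argument default uses the llama.cpp backend
    # and downloads a quantized 7B chat model on first run; model_path and
    # backend_type (e.g. "gptq", "transformers") select other backends.
    llama2 = LLAMA2_WRAPPER()

    # get_prompt wraps a raw question in Llama 2's chat prompt template.
    answer = llama2(get_prompt("What is 4-bit quantization?"), temperature=0.9)
    print(answer)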

Quick Start & Requirements

  • Install via PyPI: pip install llama2-wrapper
  • Install from source: git clone https://github.com/liltom-eth/llama2-webui.git && cd llama2-webui && pip install -r requirements.txt (a launch sketch follows this list)
  • Requires a recent Python 3 environment. Specific bitsandbytes versions may be needed for older NVIDIA GPUs or on Windows.
  • For Metal acceleration on Apple silicon Macs, llama-cpp-python must be installed with Metal support enabled.
  • Official Docs: https://github.com/liltom-eth/llama2-webui
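
After a source install, the documented flow launches the Gradio UI with the repository's entry script, with model path and backend configured via the bundled .env file; the API server is a separate module. A sketch of the commands as shown in the project README (script and module names may differ across versions):

    python app.py                     # start the Gradio web UI
    python -m llama2_wrapper.server   # start the OpenAI-compatible API server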

Highlighted Details

  • Supports Llama 2 (7B, 13B, 70B) and Code Llama models in various formats (GPTQ, GGML, GGUF).
  • Offers 8-bit and 4-bit quantization for reduced VRAM usage.
  • Provides an OpenAI-compatible API endpoint (see the client sketch after this list).
  • Includes benchmark scripts for performance evaluation across different hardware.
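
Because the endpoint mirrors OpenAI's request/response schema, a standard OpenAI client can target the local server. A minimal sketch using the openai Python package; the host, port, and model name below are illustrative placeholders, not documented defaults:

    from openai import OpenAI

    # The local server does not check the API key, but the client requires one.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    response = client.chat.completions.create(
        model="llama-2-7b-chat",  # placeholder; use the model your server loaded
        messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    )
    print(response.choices[0].message.content)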

Maintenance & Community

  • Development has slowed: the last commit was about a year ago, with no pull requests or issues opened in the past 30 days (see Health Check below).
  • Links to Hugging Face repositories for models and underlying libraries are provided.

Licensing & Compatibility

  • MIT License: permissive, allowing commercial use and integration into closed-source projects, provided the license and copyright notice are retained. Note that the Llama 2 model weights themselves are distributed under Meta's separate community license.

Limitations & Caveats

  • bitsandbytes version compatibility can be an issue on older NVIDIA GPUs, potentially requiring downgrades.
  • Specific installation steps are needed for bitsandbytes on Windows and for Metal acceleration on macOS.
  • Downloading Llama 2 models requires prior access approval from Meta.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.

llama.cpp by ggml-org: C/C++ library for local LLM inference

  • Top 0.4% on sourcepulse · 84k stars
  • Created 2 years ago; updated 20 hours ago