Web UI for local Llama 2 inference
This project provides a Gradio-based web UI for running Llama 2 models locally on a range of hardware, including CPU and GPU across Linux, Windows, and macOS. It aims to simplify deploying and interacting with Llama 2 variants, offering an OpenAI-compatible API and serving as a backend for generative AI applications.
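As a quick illustration of the OpenAI-compatible API, a generative AI application can point the standard openai Python client at the locally served endpoint. The base URL, port, API key, and model name below are placeholder assumptions for illustration, not project defaults; use whatever your local server actually exposes.

```python
# Sketch of calling a local OpenAI-compatible endpoint with the openai client.
# base_url, api_key, and model are placeholders, not verified project defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama-2-7b-chat",  # placeholder; use the model your server loads
    messages=[{"role": "user", "content": "Give me a one-line summary of Llama 2."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```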
How It Works
The project supports multiple backends for inference: transformers (with bitsandbytes for 8-bit quantization), AutoGPTQ (for 4-bit quantization), and llama.cpp (for GGML/GGUF formats). This flexibility lets users choose the best trade-off between performance, VRAM usage, and model precision for their hardware. The llama2-wrapper library abstracts these backends, providing a unified interface for model loading, inference, and API serving.
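A minimal sketch of using the library directly is shown below. The LLAMA2_WRAPPER class, the get_prompt helper, and the model_path/backend_type parameters are assumptions based on the interface described above; check the repository for the exact names and defaults.

```python
# Hedged sketch: class and parameter names are assumptions, not a verified API reference.
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2 = LLAMA2_WRAPPER(
    model_path="./models/llama-2-7b-chat.Q4_0.gguf",  # local GGUF/GGML file
    backend_type="llama.cpp",  # alternatives: "transformers" (8-bit), "gptq" (4-bit)
)

prompt = get_prompt("Compare 4-bit and 8-bit quantization in one paragraph.")
print(llama2(prompt))  # run inference through the unified interface
```

Switching backend_type (together with a matching model file) is how the same application code trades precision for VRAM.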
Quick Start & Requirements
Install the library from PyPI:
pip install llama2-wrapper
Or clone the repository and install its dependencies to run the web UI:
git clone https://github.com/liltom-eth/llama2-webui.git && cd llama2-webui && pip install -r requirements.txt
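Once installed from source, the Gradio UI and the OpenAI-compatible API server are launched from the repository root. The entry points below are assumptions about the project layout; check the repository README for the exact commands and flags.
python app.py
python -m llama2_wrapper.server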
Specific bitsandbytes versions may be needed for older NVIDIA GPUs or Windows. For GGML/GGUF models, a working llama-cpp-python installation is required.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
bitsandbytes version compatibility can be an issue on older NVIDIA GPUs, potentially requiring downgrades. Known issues have also been reported around bitsandbytes and Mac Metal acceleration.