liltom-eth/llama2-webui: Web UI for local Llama 2 inference
Top 22.4% on SourcePulse
This project provides a Gradio-based web UI for running Llama 2 models locally on various hardware, including CPU and GPU across Linux, Windows, and macOS. It aims to simplify deploying and interacting with Llama 2 variants, offering an OpenAI-compatible API and serving as a backend for generative AI applications.
How It Works
The project supports multiple backends for inference: transformers (with bitsandbytes for 8-bit quantization), AutoGPTQ (for 4-bit quantization), and llama.cpp (for GGML/GGUF formats). This flexibility allows users to choose the best trade-off between performance, VRAM usage, and model precision based on their hardware. The llama2-wrapper library abstracts these backends, providing a unified interface for model loading, inference, and API serving.
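As a minimal sketch of that unified interface (following the usage shown in the project README; the model path, backend choice, and generation parameters below are illustrative assumptions):

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

# Assumed local GGML weights; swap backend_type to "transformers" or "gptq"
# to use the bitsandbytes or AutoGPTQ paths described above.
llama2_wrapper = LLAMA2_WRAPPER(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",
    backend_type="llama.cpp",
)

# get_prompt wraps the raw question in Llama 2's chat template.
answer = llama2_wrapper(get_prompt("What does quantization trade away?"), temperature=0.9)
print(answer)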
Quick Start & Requirements
Install the library from PyPI:
pip install llama2-wrapper
Or clone the repo and run the web UI from source:
git clone https://github.com/liltom-eth/llama2-webui.git && cd llama2-webui && pip install -r requirements.txt
Specific bitsandbytes versions may be needed for older NVIDIA GPUs or Windows. For the llama.cpp backend, a llama-cpp-python installation is required.
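With the API server running (the README starts it via python -m llama2_wrapper.server), any OpenAI client can talk to it; the port and model name below are assumptions about a default local setup:

from openai import OpenAI

# Point the official openai client at the local OpenAI-compatible server.
# Base URL and model id are assumed defaults; check what your server reports.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="llama-2-7b-chat",
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
)
print(resp.choices[0].message.content)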
Highlighted Details
Maintenance & Community
The last update was about 1 year ago, and the project is marked inactive.
Licensing & Compatibility
Limitations & Caveats
bitsandbytes version compatibility can be an issue on older NVIDIA GPUs, potentially requiring downgrades. Further caveats involve bitsandbytes and Mac Metal acceleration.
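For example, a downgrade pin along these lines is sometimes suggested for older NVIDIA architectures (the exact version is an assumption; verify it against your GPU and CUDA toolchain):

pip install bitsandbytes==0.38.1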