Python bindings for llama.cpp, enabling local LLM inference
This repository provides Python bindings for the llama.cpp library, enabling efficient local execution of large language models. It targets developers and researchers who need to integrate LLMs into Python applications, offering both low-level C API access and a high-level, OpenAI-compatible API for ease of use and migration.
How It Works
The package leverages ctypes to interface with the underlying C/C++ implementation of llama.cpp. This allows for direct access to the core functionalities, including model loading, tokenization, and inference. The high-level API abstracts these details, providing a familiar interface for text and chat completion, function calling, and multi-modal capabilities, while also supporting features like JSON mode and speculative decoding.
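A minimal sketch of the high-level API, assuming a GGUF model file has already been downloaded locally; the path, context size, and prompt below are placeholder values, not project defaults:

```python
from llama_cpp import Llama

# Load a local GGUF model; model_path and n_ctx are illustrative values.
llm = Llama(model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf", n_ctx=4096)

# OpenAI-style chat completion through the high-level API.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what llama.cpp does in one sentence."},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```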
Quick Start & Requirements
pip install llama-cpp-python
Hardware acceleration backends (e.g. CUDA, Metal, OpenBLAS) are enabled by passing CMake options through the CMAKE_ARGS environment variable or --config-settings during installation. Pre-built wheels for CPU and CUDA are available.
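As an illustration (not a command quoted from the README), a backend-enabled build typically looks like the following; the exact CMake flag name depends on the bundled llama.cpp version, with GGML_CUDA and GGML_METAL used in recent releases:

```bash
# Build with CUDA support; the flag name may differ on older releases.
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python

# Metal build on Apple Silicon (same caveat about the flag name).
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python
```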
Highlighted Details

- OpenAI API-compatible web server, installable via pip install 'llama-cpp-python[server]' (see the usage sketch after this list).
- Low-level ctypes bindings for direct C API interaction.
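A hedged sketch of how the bundled server is typically queried with the standard openai client; the model path, port (8000 is the usual default), and model name are assumptions rather than values taken from this summary:

```python
# Start the server in a shell first (model path is a placeholder):
#   python -m llama_cpp.server --model ./models/llama-3-8b-instruct.Q4_K_M.gguf
from openai import OpenAI

# Point the official OpenAI client at the local endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

completion = client.chat.completions.create(
    model="local-model",  # placeholder; a single loaded model serves all requests
    messages=[{"role": "user", "content": "Hello from a local model!"}],
)
print(completion.choices[0].message.content)
```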
Maintenance & Community

The project includes a test suite run with pytest and a Makefile for common development tasks.

Licensing & Compatibility

The project is distributed under the MIT license. Models must be provided in the GGUF format used by llama.cpp.
Limitations & Caveats
The README emphasizes building from source for optimal performance, noting that pre-built binaries may disable system-specific compiler optimizations. Compatibility of the pre-built wheels with specific hardware and CUDA versions is detailed in the README.
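One generic way to force a from-source build, so the compiler can apply machine-specific optimizations, is to tell pip to skip pre-built wheels; this is standard pip usage, not a command quoted from the README:

```bash
# Rebuild from source, bypassing any cached or pre-built wheel.
pip install llama-cpp-python --force-reinstall --no-cache-dir --no-binary llama-cpp-python
```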