Python bindings for llama.cpp, enabling local LLM inference
This repository provides Python bindings for the llama.cpp library, enabling efficient local execution of large language models. It targets developers and researchers who need to integrate LLMs into Python applications, offering both low-level C API access and a high-level, OpenAI-compatible API for ease of use and migration.
How It Works
The package leverages ctypes to interface with the underlying C/C++ implementation of llama.cpp. This allows for direct access to the core functionalities, including model loading, tokenization, and inference. The high-level API abstracts these details, providing a familiar interface for text and chat completion, function calling, and multi-modal capabilities, while also supporting features like JSON mode and speculative decoding.
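A minimal sketch of the high-level API, assuming a GGUF model file has already been downloaded locally; the path, context size, and prompt below are placeholder values, not project defaults:

```python
from llama_cpp import Llama

# Load a local GGUF model; model_path and n_ctx are illustrative values.
llm = Llama(model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf", n_ctx=4096)

# OpenAI-style chat completion through the high-level API.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what llama.cpp does in one sentence."},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```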
Quick Start & Requirements
pip install llama-cpp-python
Hardware acceleration backends (e.g. CUDA, Metal, OpenBLAS) are enabled by passing CMake options through the CMAKE_ARGS environment variable or --config-settings during installation. Pre-built wheels for CPU and CUDA are available.
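As an illustration (not a command quoted from the README), a backend-enabled build typically looks like the following; the exact CMake flag name depends on the bundled llama.cpp version, with GGML_CUDA and GGML_METAL used in recent releases:

```bash
# Build with CUDA support; the flag name may differ on older releases.
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python

# Metal build on Apple Silicon (same caveat about the flag name).
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python
```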
Highlighted Details

- OpenAI API-compatible web server, installable via pip install 'llama-cpp-python[server]' (see the usage sketch after this list).
- Low-level ctypes bindings for direct C API interaction.
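A hedged sketch of how the bundled server is typically queried with the standard openai client; the model path, port (8000 is the usual default), and model name are assumptions rather than values taken from this summary:

```python
# Start the server in a shell first (model path is a placeholder):
#   python -m llama_cpp.server --model ./models/llama-3-8b-instruct.Q4_K_M.gguf
from openai import OpenAI

# Point the official OpenAI client at the local endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

completion = client.chat.completions.create(
    model="local-model",  # placeholder; a single loaded model serves all requests
    messages=[{"role": "user", "content": "Hello from a local model!"}],
)
print(completion.choices[0].message.content)
```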
Maintenance & Community

The project includes a test suite run with pytest and a Makefile for common development tasks.

Licensing & Compatibility

The project is distributed under the MIT license. Models must be provided in the GGUF format used by llama.cpp.
Limitations & Caveats
The README emphasizes building from source for optimal performance, noting that pre-built binaries may disable system-specific compiler optimizations. Compatibility of the pre-built wheels with specific hardware and CUDA versions is detailed in the README.
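One generic way to force a from-source build, so the compiler can apply machine-specific optimizations, is to tell pip to skip pre-built wheels; this is standard pip usage, not a command quoted from the README:

```bash
# Rebuild from source, bypassing any cached or pre-built wheel.
pip install llama-cpp-python --force-reinstall --no-cache-dir --no-binary llama-cpp-python
```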