llama-cpp-python by abetlen

Python bindings for llama.cpp, enabling local LLM inference

Created 2 years ago
9,577 stars

Top 5.3% on SourcePulse

Project Summary

This repository provides Python bindings for the llama.cpp library, enabling efficient local execution of large language models. It targets developers and researchers who need to integrate LLMs into Python applications, offering both low-level C API access and a high-level, OpenAI-compatible API for ease of use and migration.

How It Works

The package leverages ctypes to interface with the underlying C/C++ implementation of llama.cpp. This allows for direct access to the core functionalities, including model loading, tokenization, and inference. The high-level API abstracts these details, providing a familiar interface for text and chat completion, function calling, and multi-modal capabilities, while also supporting features like JSON mode and speculative decoding.
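
As a minimal sketch of the high-level API (the model path and settings below are illustrative choices, not taken from the README):

    from llama_cpp import Llama

    # Load a local GGUF model; n_gpu_layers=-1 offloads all layers to the GPU
    # when a GPU backend was enabled at build time (illustrative settings).
    llm = Llama(model_path="./models/model.gguf", n_ctx=2048, n_gpu_layers=-1)

    # OpenAI-style chat completion via the high-level API
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Name the planets in the solar system."},
        ],
        max_tokens=128,
    )
    print(response["choices"][0]["message"]["content"])

Plain text completion works the same way by calling the Llama object directly with a prompt string.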

Quick Start & Requirements

  • Install: pip install llama-cpp-python
  • Build Configuration: Hardware acceleration (OpenBLAS, CUDA, Metal, ROCm, Vulkan, SYCL) can be enabled via the CMAKE_ARGS environment variable or --config-settings during installation (see the example after this list). Pre-built wheels for CPU and CUDA are available.
  • Prerequisites: Python 3.8+, C compiler (GCC/Clang on Linux/macOS, Visual Studio/MinGW on Windows). CUDA 12.1+ required for CUDA wheels. macOS 11.0+ for Metal wheels.
  • Documentation: https://llama-cpp-python.readthedocs.io/
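
For example, the CPU build installs directly from PyPI, while accelerated builds pass CMake flags through CMAKE_ARGS (flag names have changed across versions and backends, so treat these commands as illustrative and confirm against the documentation):

    # CPU-only install from PyPI
    pip install llama-cpp-python

    # Example accelerated builds (flag names are version-dependent)
    CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
    CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python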

Highlighted Details

  • OpenAI-compatible API for text and chat completion.
  • Supports function calling, JSON mode (see the sketch below), and multi-modal models (e.g., LLaVA).
  • Includes an OpenAI-compatible web server (pip install 'llama-cpp-python[server]').
  • Offers low-level ctypes bindings for direct C API interaction.
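
As an illustrative sketch of JSON mode (reusing the llm object from the earlier example; response_format mirrors the OpenAI parameter of the same name):

    # Constrain the output to valid JSON
    result = llm.create_chat_completion(
        messages=[
            {"role": "user", "content": "List three colors as JSON under the key 'colors'."},
        ],
        response_format={"type": "json_object"},
        temperature=0.0,
    )
    print(result["choices"][0]["message"]["content"])

The bundled web server exposes the same OpenAI-compatible endpoints over HTTP; per the documentation it is started with python -m llama_cpp.server --model <path-to-model>.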

Maintenance & Community

  • Actively developed by abetlen and contributors.
  • Development workflow includes pytest and a Makefile.
  • Community support channels are not explicitly mentioned in the README.

Licensing & Compatibility

  • License: MIT
  • Compatibility: Permissive MIT license allows for commercial use and integration into closed-source applications.

Limitations & Caveats

The README emphasizes building from source for optimal performance, since pre-built binaries may lack system-specific compiler optimizations. Supported hardware and CUDA versions for the pre-built wheels are detailed in the README.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 4
  • Issues (30d): 8
  • Star History: 100 stars in the last 30 days

Explore Similar Projects

Starred by Peter Norvig (author of "Artificial Intelligence: A Modern Approach"; Research Director at Google).

  • python-openai-demos by pamelafox — Python scripts for OpenAI API demos. 381 stars, top 0.5% on SourcePulse. Created 1 year ago; updated 1 week ago.