llama-cpp-python by abetlen

Python bindings for llama.cpp, enabling local LLM inference

created 2 years ago
9,398 stars

Top 5.4% on sourcepulse

Project Summary

This repository provides Python bindings for the llama.cpp library, enabling efficient local execution of large language models. It targets developers and researchers who need to integrate LLMs into Python applications, offering both low-level C API access and a high-level, OpenAI-compatible API for ease of use and migration.
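
For orientation, a minimal sketch of the high-level API, assuming a local GGUF model file (the path below is a placeholder):

    from llama_cpp import Llama

    # Load a local GGUF model; adjust the path to your own file.
    llm = Llama(model_path="./models/llama-model.gguf")

    # OpenAI-style chat completion through the high-level API.
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Name the planets in the solar system."},
        ]
    )
    print(response["choices"][0]["message"]["content"])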

How It Works

The package uses ctypes to bind to the underlying C/C++ implementation of llama.cpp, giving direct access to core functionality such as model loading, tokenization, and inference. The high-level API abstracts these details behind a familiar interface for text and chat completion, function calling, and multi-modal models, and also supports features such as JSON mode and speculative decoding.
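
The low-level bindings mirror the llama.cpp C API, so a load/free cycle looks roughly like the sketch below. These function names and signatures track upstream llama.cpp and have changed between releases, so treat this as illustrative rather than exact:

    import llama_cpp

    # Initialize the backend once per process.
    llama_cpp.llama_backend_init()

    # The raw C API takes char* paths, hence the bytes literal;
    # the model path is a placeholder.
    params = llama_cpp.llama_model_default_params()
    model = llama_cpp.llama_load_model_from_file(b"./models/llama-model.gguf", params)

    # ...tokenize, decode, and sample via the other llama_* bindings...

    llama_cpp.llama_free_model(model)
    llama_cpp.llama_backend_free()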

Quick Start & Requirements

  • Install: pip install llama-cpp-python
  • Build Configuration: Hardware acceleration (OpenBLAS, CUDA, Metal, ROCm, Vulkan, SYCL) can be enabled via the CMAKE_ARGS environment variable or --config-settings at install time (see the build examples after this list). Pre-built wheels for CPU and CUDA are available.
  • Prerequisites: Python 3.8+, C compiler (GCC/Clang on Linux/macOS, Visual Studio/MinGW on Windows). CUDA 12.1+ required for CUDA wheels. macOS 11.0+ for Metal wheels.
  • Documentation: https://llama-cpp-python.readthedocs.io/
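
For example, the README enables backends by passing CMake flags through CMAKE_ARGS at install time. The exact flag names have changed across releases (e.g., GGML_CUDA superseding the older LLAMA_CUBLAS), so check the current docs before copying these:

    # NVIDIA GPUs via CUDA
    CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python

    # Apple Silicon via Metal
    CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python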

Highlighted Details

  • OpenAI-compatible API for text and chat completion.
  • Supports function calling, JSON mode, and multi-modal models (e.g., Llava).
  • Includes an OpenAI-compatible web server (pip install 'llama-cpp-python[server]'); a usage sketch follows this list.
  • Offers low-level ctypes bindings for direct C API interaction.
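
As a hedged sketch of the server workflow: start it with the module entry point from the README, then talk to it with any OpenAI client. The model path, port, and model name below are placeholders, and the default server configuration does not require a real API key:

    # First, in a shell:  python -m llama_cpp.server --model ./models/llama-model.gguf
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # default server address
        api_key="sk-no-key-required",         # placeholder; not validated by default
    )
    response = client.chat.completions.create(
        model="local-model",  # placeholder; the server answers for its loaded model
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)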

Maintenance & Community

  • Actively developed by abetlen and contributors.
  • Development workflow includes pytest and a Makefile.
  • Community support channels are not explicitly mentioned in the README.

Licensing & Compatibility

  • License: MIT
  • Compatibility: Permissive MIT license allows for commercial use and integration into closed-source applications.

Limitations & Caveats

The README recommends building from source for best performance, since pre-built binaries may be compiled without system-specific optimizations. It also details which hardware and CUDA versions the pre-built wheels support.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 8
  • Issues (30d): 4
  • Star History: 408 stars in the last 90 days

Explore Similar Projects

Starred by Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), Nathan Lambert (AI Researcher at AI2), and 1 more.

unified-io-2 by allenai

0.3% · 619 stars
Unified-IO 2 code for training, inference, and demo
created 1 year ago · updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Shawn Wang (Editor of Latent Space), and 8 more.

llm by rustformers

0% · 6k stars
Rust ecosystem for LLM inference (unmaintained)
created 2 years ago · updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

0.4% · 84k stars
C/C++ library for local LLM inference
created 2 years ago · updated 14 hours ago