API wrapper for local LLM inference, emulating OpenAI's GPT endpoints
This project provides a local API server that emulates OpenAI's GPT endpoints, allowing GPT-powered applications to run with local llama.cpp models. It targets developers and users seeking cost savings, enhanced privacy, and offline capabilities for their AI applications.
How It Works
gpt-llama.cpp acts as a middleware, routing requests intended for OpenAI's GPT APIs to a local instance of llama.cpp. This approach enables seamless integration with existing GPT-based applications by presenting a familiar API interface. It leverages llama.cpp's efficient C++ implementation for local model inference, supporting various quantization levels and model architectures.
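As a concrete illustration of that drop-in pattern, the sketch below points the official openai client at the local server instead of api.openai.com. The port (8000) and model path are placeholders rather than project defaults, and the API key carries the path to the local model file, as noted under Limitations & Caveats.

```typescript
import OpenAI from "openai";

// Redirect an existing GPT-powered app to the local gpt-llama.cpp server.
// Assumes the server was started with `PORT=8000 npm start`; adjust the port
// and model path to your setup.
const client = new OpenAI({
  baseURL: "http://localhost:8000/v1",        // local server instead of api.openai.com
  apiKey: "/absolute/path/to/your/model.bin", // auth token = path to the local llama model
});

const completion = await client.chat.completions.create({
  model: "gpt-3.5-turbo",                     // the request is routed to the local model
  messages: [{ role: "user", content: "Hello from a local model!" }],
});

console.log(completion.choices[0].message.content);
```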
Quick Start & Requirements
Setup: cd gpt-llama.cpp, then npm install.
Requires a working llama.cpp installation. Follow the llama.cpp README for setup on macOS (ARM/Intel) or Windows. Python dependencies are installed via pip install -r requirements.txt within the llama.cpp directory.
Start the server with npm start. Advanced configurations can be passed as arguments (e.g., PORT=8000 npm start mlock threads 8).
Requires a local llama model (in .bin format).
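Once the server is running, a quick sanity check is a single chat-completion request. This is a sketch assuming the server was started with PORT=8000 npm start; the model path is a placeholder.

```typescript
// Minimal smoke test for a running gpt-llama.cpp server (port and model path are placeholders).
const MODEL_PATH = "/absolute/path/to/your/model.bin";

const res = await fetch("http://localhost:8000/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${MODEL_PATH}`, // the model path doubles as the API key
  },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Reply with a single short sentence." }],
  }),
});

console.log(res.status, JSON.stringify(await res.json(), null, 2));
```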
Highlighted Details
Benefits from ongoing upstream llama.cpp improvements.
Compatible with chatbot-ui, Auto-GPT, langchain, DiscGPT, and ChatGPT-Siri.
Supports embeddings (enabled with EMBEDDINGS=py).
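If embeddings are enabled with EMBEDDINGS=py, a request shaped like OpenAI's embeddings call can be sent to the same server. The sketch below assumes an OpenAI-style /v1/embeddings route is exposed and again uses a placeholder port and model path.

```typescript
// Sketch of an embeddings request (assumes gpt-llama.cpp was started with EMBEDDINGS=py
// and exposes an OpenAI-style /v1/embeddings route; port and model path are placeholders).
const MODEL_PATH = "/absolute/path/to/your/model.bin";

const res = await fetch("http://localhost:8000/v1/embeddings", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${MODEL_PATH}`,
  },
  body: JSON.stringify({ input: "Text to embed locally." }),
});

const body = await res.json();
console.log("embedding length:", body.data?.[0]?.embedding?.length);
```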
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The authentication token for API requests must be set to the absolute path of the local llama model file. The test-installation.sh
script is currently only supported on Mac.