Framework for LLM interaction, function calling, and structured output
This framework simplifies interaction with Large Language Models (LLMs) for developers and researchers. It enables structured output generation, function calling, and retrieval-augmented generation (RAG) even with models that were never fine-tuned for these tasks, by constraining generation with grammars and JSON schemas (guided sampling).
How It Works
The core innovation is guided sampling: grammars and JSON schemas constrain which tokens the LLM may emit, so its output always conforms to the desired structure. This lets models perform tasks like function calling and structured data generation without task-specific fine-tuning. The framework supports multiple LLM backends, including the llama.cpp server, llama-cpp-python, TGI (Text Generation Inference), and vLLM, offering flexibility in deployment.
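To make the mechanism concrete, here is a minimal sketch of grammar-constrained sampling using llama-cpp-python directly rather than this framework's API; the model path is a placeholder and the GBNF grammar is purely illustrative:

```python
from llama_cpp import Llama, LlamaGrammar

# Placeholder path: any local GGUF chat model works here.
llm = Llama(model_path="./models/model.gguf", n_ctx=2048)

# GBNF grammar forcing output to be a JSON object with a single "name" field.
grammar = LlamaGrammar.from_string(r'''
root   ::= "{" ws "\"name\"" ws ":" ws string ws "}"
string ::= "\"" [a-zA-Z. ]* "\""
ws     ::= [ \t\n]*
''')

out = llm(
    "Extract the person's name as JSON: 'Yesterday I met Ada Lovelace.'\nJSON: ",
    grammar=grammar,
    max_tokens=64,
)
# The sampler masks any token that would violate the grammar, so the text
# below is guaranteed to parse, e.g. {"name": "Ada Lovelace"}
print(out["choices"][0]["text"])
```

The same principle extends to JSON schemas (llama-cpp-python's `create_chat_completion` accepts a `response_format` with a schema), and the framework's function-calling and structured-output features build on this kind of constraint.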
Quick Start & Requirements
pip install llama-cpp-agent
pip install "llama-cpp-agent[rag]"  # optional extra for RAG; quotes keep shells like zsh from expanding the brackets
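A minimal chat example following the pattern in the project's documentation; the model path is a placeholder, and exact class names may differ between versions:

```python
from llama_cpp import Llama
from llama_cpp_agent import LlamaCppAgent, MessagesFormatterType
from llama_cpp_agent.providers import LlamaCppPythonProvider

# Placeholder path: point this at a local GGUF model.
llama = Llama(model_path="./models/model.gguf", n_ctx=4096)
provider = LlamaCppPythonProvider(llama)

agent = LlamaCppAgent(
    provider,
    system_prompt="You are a helpful assistant.",
    predefined_messages_formatter_type=MessagesFormatterType.CHATML,
)
print(agent.get_chat_response("What is the capital of France?"))
```

In principle, switching to another backend listed above (llama.cpp server, TGI, vLLM) should only require a different provider class; the agent code stays the same.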
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
How well the framework works with models that were not fine-tuned for structured output depends on guided sampling, and results vary by model. Compatibility with the absolute latest versions of the backend LLM libraries should be verified, though the project aims to track current releases.