forge by antoinezambelli

Self-hosted LLM agentic workflows and tool-calling framework

Created 4 months ago

2,165 stars

Top 20.1% on SourcePulse

View on GitHub

1 Expert Loves This Project

Joe Walnes

Head of Experimental Projects at Stripe

Project Summary

A Python framework for self-hosted LLM tool-calling and multi-step agentic workflows, Forge enhances the reliability and performance of local models. It targets engineers and researchers building or integrating self-hosted LLM agents, enabling them to achieve top-tier results on complex tasks through robust guardrails and intelligent context management.

How It Works

Forge acts as a reliability layer for self-hosted LLMs, employing guardrails such as rescue parsing, retry nudges, and step enforcement, alongside VRAM-aware context management with tiered compaction. This approach significantly boosts the capabilities of local models on multi-step agentic workflows. Users can leverage Forge via three main patterns: WorkflowRunner for defining and executing structured agent loops; SlotWorker for managing priority-queued inference slots in multi-agent architectures; and Guardrails middleware for integrating Forge's validation and error-handling stack into custom orchestration loops. A notable feature is its OpenAI-compatible proxy server, which transparently applies Forge's guardrails to local model interactions.

Quick Start & Requirements

Installation: Core: pip install forge-guardrails. With Anthropic client: pip install "forge-guardrails[anthropic]". For development: git clone https://github.com/antoinezambelli/forge.git, cd forge, pip install -e ".[dev]".
Prerequisites: Python 3.12+, a running LLM backend (Ollama, llama-server, Llamafile, or Anthropic API). llama-server is recommended for optimal performance.
Documentation: User Guide, Model Guide, Backend Setup, Eval Guide.

Highlighted Details

Achieves 86.5% on its 26-scenario eval suite with a Ministral-3 8B Instruct Q8 model on llama-server, positioning it as a top performer among self-hosted configurations.
Provides an OpenAI-compatible proxy server that transparently adds guardrails to local LLM interactions for clients like opencode, Continue, and aider.
The proxy automatically injects a synthetic respond tool to guide smaller models towards reliable tool-calling, essential for models that struggle with mixed text/tool output.

Maintenance & Community

No specific details on maintainers, community channels (e.g., Discord, Slack), or roadmap were found in the provided README.

Licensing & Compatibility

Licensed under the MIT License, Forge is permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

The effectiveness of Forge is contingent on the chosen LLM backend and model configuration. Smaller models may still require explicit guidance, such as the synthetic respond tool, to reliably execute tool calls. The evaluation harness primarily focuses on tool-calling and multi-step reasoning capabilities.

Health Check

Last Commit

1 day ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

116 stars in the last 30 days