Fast constrained decoding for LLMs
This library implements constrained decoding for Large Language Models (LLMs), enabling the enforcement of arbitrary context-free grammars on model outputs. It targets developers building LLM applications requiring structured, predictable outputs, offering significant speed improvements over other methods.
How It Works
llguidance computes token masks on-the-fly using a combination of Earley's algorithm for context-free grammars and a lexer based on regular expression derivatives. This approach allows for dynamic mask generation without significant startup costs, unlike methods that pre-compute all possible states. The library leverages optimized prefix tree traversal for efficient mask computation, achieving speeds of approximately 50μs per token.
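The prefix-tree idea can be illustrated with a small, self-contained sketch. This is not llguidance's code or API: the vocabulary, the `TrieNode` class, and the functions `build_trie`, `step`, and `compute_mask` are all hypothetical names invented for this example, and a hand-written three-state automaton for `-?[0-9]+` stands in for the real derivative-based lexer and Earley parser. The point it shows is that tokens sharing a prefix share work: once the lexer rejects a prefix, the whole subtree of tokens below it is skipped.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical lexer states for the toy pattern -?[0-9]+
START, AFTER_MINUS, IN_DIGITS = 0, 1, 2


@dataclass
class TrieNode:
    """One node of a prefix tree built over the tokenizer vocabulary."""
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    token_id: Optional[int] = None  # set when a token ends at this node


def build_trie(vocab: List[str]) -> TrieNode:
    """Insert every token string into a shared prefix tree."""
    root = TrieNode()
    for token_id, text in enumerate(vocab):
        node = root
        for ch in text:
            node = node.children.setdefault(ch, TrieNode())
        node.token_id = token_id
    return root


def step(state: int, ch: str) -> Optional[int]:
    """Advance the toy lexer by one character; None means a dead state."""
    if state == START and ch == "-":
        return AFTER_MINUS
    if ch in "0123456789":
        return IN_DIGITS
    return None


def compute_mask(root: TrieNode, state: int, vocab_size: int) -> List[bool]:
    """Walk the trie once, pruning every subtree whose prefix the lexer rejects."""
    mask = [False] * vocab_size
    stack = [(root, state)]
    while stack:
        node, st = stack.pop()
        if node.token_id is not None:
            mask[node.token_id] = True  # every character of this token was accepted
        for ch, child in node.children.items():
            nxt = step(st, ch)
            if nxt is not None:  # dead prefixes are never pushed, so their subtrees are skipped
                stack.append((child, nxt))
    return mask


vocab = ["-", "1", "12", "-3", "a", "1a", "--", "0"]
trie = build_trie(vocab)
mask = compute_mask(trie, START, len(vocab))
allowed = [vocab[i] for i, ok in enumerate(mask) if ok]
print(allowed)  # ['-', '1', '12', '-3', '0']
```

In a real decoder the mask would be applied to the model's logits before sampling, and the lexer state would be carried forward after each accepted token. The sketch omits both steps.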
Quick Start & Requirements
Run ./scripts/install-deps.sh to build, and ./scripts/test-guidance.sh to build and test.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The internal llguidance JSON-based format is being deprecated in favor of the Lark-like format, though the internal format is currently more powerful. The README does not specify a license, which may impact commercial adoption.