llguidance by guidance-ai

Fast constrained decoding for LLMs

Created 1 year ago
479 stars

Top 63.9% on SourcePulse

Project Summary

This library implements constrained decoding for Large Language Models (LLMs), enforcing arbitrary context-free grammars on model outputs. It targets developers building LLM applications that require structured, predictable outputs, and offers significant speed advantages over comparable constrained-decoding libraries.

How It Works

llguidance computes token masks on the fly, combining Earley's algorithm for context-free grammars with a lexer based on regular-expression derivatives. Masks are generated dynamically, so there is no significant startup cost, unlike methods that pre-compute all possible states. Mask computation walks an optimized prefix tree (trie) of the tokenizer's vocabulary, reaching speeds of approximately 50μs of CPU time per token.
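
To make the mechanism concrete, below is a toy, self-contained sketch of the regular-expression-derivative idea (this is not llguidance's code; the regex subset and helper names are invented for illustration). Taking Brzozowski derivatives of a pattern, character by character, answers exactly the question a token mask asks: can this token still lead to a valid match?

    from functools import reduce

    # Regex AST as tuples. The smart constructors simplify aggressively,
    # so within this subset a regex denotes the empty language iff it
    # normalizes to EMPTY -- which makes the mask check below sound.
    EMPTY = ("empty",)   # language = {}
    EPS   = ("eps",)     # language = {""}

    def lit(c):
        return ("lit", c)

    def seq(a, b):
        if a == EMPTY or b == EMPTY: return EMPTY
        if a == EPS: return b
        if b == EPS: return a
        return ("seq", a, b)

    def alt(a, b):
        if a == EMPTY: return b
        if b == EMPTY: return a
        if a == b: return a
        return ("alt", a, b)

    def star(a):
        if a in (EMPTY, EPS): return EPS
        return ("star", a)

    def nullable(r):
        # Does r match the empty string?
        tag = r[0]
        if tag == "eps": return True
        if tag in ("empty", "lit"): return False
        if tag == "seq": return nullable(r[1]) and nullable(r[2])
        if tag == "alt": return nullable(r[1]) or nullable(r[2])
        return True  # star

    def deriv(r, c):
        # Brzozowski derivative: the language of r after consuming c.
        tag = r[0]
        if tag in ("empty", "eps"): return EMPTY
        if tag == "lit": return EPS if r[1] == c else EMPTY
        if tag == "seq":
            d = seq(deriv(r[1], c), r[2])
            return alt(d, deriv(r[2], c)) if nullable(r[1]) else d
        if tag == "alt": return alt(deriv(r[1], c), deriv(r[2], c))
        return seq(deriv(r[1], c), r)  # star

    def token_mask(r, vocab):
        # A token is allowed iff consuming all of its characters leaves
        # a regex with a non-empty language.
        mask = []
        for tok in vocab:
            d = r
            for ch in tok:
                d = deriv(d, ch)
                if d == EMPTY: break
            mask.append(d != EMPTY)
        return mask

    digits = star(reduce(alt, [lit(c) for c in "0123456789"]))
    print(token_mask(digits, ["12", "7", "42x", "abc"]))
    # -> [True, True, False, False]

The real library avoids re-deriving every token from scratch: because it walks the tokenizer vocabulary as a prefix tree, tokens that share a prefix share the derivative work, which is what keeps per-token cost in the tens of microseconds.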

Quick Start & Requirements

  • Install Rust 1.75+ for the core library.
  • For the Python bindings: install Python 3.9+, then run ./scripts/install-deps.sh followed by ./scripts/test-guidance.sh to build and test (a sketch of the decoding loop these bindings feed appears after this list).
  • Integrations are available for Guidance, llama.cpp, Chromium, SGLang, vLLM, LLGTRT, mistral.rs, and onnxruntime-genai.
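
For orientation, here is a hedged sketch of the mask-apply-sample loop that an inference engine runs when embedding a constrained decoder. The Matcher object and its compute_mask/consume_token methods are hypothetical stand-ins for whatever the bindings expose, not llguidance's documented API:

    import numpy as np

    def constrained_decode(next_logits, matcher, eos_id, max_tokens=256):
        """next_logits(tokens) -> (vocab_size,) logits; matcher is a
        hypothetical grammar-state object (NOT llguidance's real API)."""
        tokens = []
        for _ in range(max_tokens):
            logits = next_logits(tokens)
            mask = matcher.compute_mask()             # bool, (vocab_size,)
            logits = np.where(mask, logits, -np.inf)  # forbid bad tokens
            tok = int(np.argmax(logits))              # greedy for brevity
            if tok == eos_id:
                break
            matcher.consume_token(tok)                # advance grammar state
            tokens.append(tok)
        return tokens

Greedy argmax keeps the sketch short; any sampling strategy works, since the mask only zeroes out tokens the grammar forbids.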

Highlighted Details

  • Achieves ~50μs of CPU time per token (with a 128k-token tokenizer) and negligible startup cost.
  • Supports JSON schemas, regular expressions, and Lark-like grammars (see the example after this list).
  • Integrates with major LLM inference frameworks, including llama.cpp, vLLM, and SGLang.
  • Benchmarks show significant speed advantages over LM-format-enforcer, llama.cpp grammars, Outlines, and XGrammar.
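
As a flavor of the grammar support, here is an illustrative grammar in the Lark-like style the README describes, constraining output to a fixed key-value shape; the dialect details are assumptions for illustration, not a verbatim llguidance example:

    start: "Name: " NAME ", Age: " AGE
    NAME: /[A-Z][a-z]+/
    AGE: /[0-9]{1,3}/

An output such as "Name: Ada, Age: 36" satisfies this grammar; anything else is masked out token by token during decoding.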

Maintenance & Community

  • Contributions are welcome; contributors must agree to a Microsoft CLA.
  • Adheres to the Microsoft Open Source Code of Conduct.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The internal llguidance JSON-based grammar format is being deprecated in favor of the Lark-like format, even though the internal format is currently more powerful. The README does not specify a license, which may complicate commercial adoption.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 5
  • Star History: 51 stars in the last 30 days
