llguidance  by guidance-ai

Fast constrained decoding for LLMs

created 1 year ago
347 stars

Top 81.1% on sourcepulse

Project Summary

This library implements constrained decoding for Large Language Models (LLMs), enforcing arbitrary context-free grammars on model output. It targets developers building LLM applications that need structured, predictable output, and it reports significant speed advantages over other constrained-decoding libraries.
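To make the idea of constrained decoding concrete, here is a minimal sketch of how a token mask is typically applied at sampling time. It does not use the llguidance API; the function names `apply_mask` and `greedy_pick` are hypothetical, and greedy selection stands in for whatever sampler an inference engine actually uses.

```python
import math
from typing import List


def apply_mask(logits: List[float], mask: List[bool]) -> List[float]:
    """Set the logit of every disallowed token to -inf so it can never be chosen."""
    return [x if ok else -math.inf for x, ok in zip(logits, mask)]


def greedy_pick(logits: List[float], mask: List[bool]) -> int:
    """Return the index of the highest-scoring token that the mask allows."""
    masked = apply_mask(logits, mask)
    best = max(range(len(masked)), key=masked.__getitem__)
    if masked[best] == -math.inf:
        raise ValueError("mask allows no tokens")
    return best


# Toy example: four tokens, the grammar only permits tokens 1 and 2.
logits = [2.5, 0.1, 1.7, -0.3]
mask = [False, True, True, False]
chosen = greedy_pick(logits, mask)
print(chosen)  # 2: token 0 has the highest raw logit but is masked out
```

The constraint engine's job is to produce `mask` for each decoding step; the sampler then only ever sees tokens the grammar allows.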

How It Works

llguidance computes token masks on the fly, combining Earley's algorithm for context-free grammars with a lexer based on regular expression derivatives. Because masks are generated dynamically, there is no significant startup cost, unlike methods that precompute all possible states. An optimized traversal of the tokenizer's prefix tree keeps mask computation to approximately 50μs of CPU time per token.
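The prefix-tree traversal can be illustrated with a toy sketch. This is not llguidance's implementation: the real library pairs an Earley parser with a derivative-based lexer, whereas the sketch below swaps in a hand-written state machine for the pattern `-?[0-9]+`, and all names (`TrieNode`, `build_trie`, `step`, `compute_mask`) are hypothetical. What it does show is why a trie helps: once a prefix is rejected, every token sharing that prefix is skipped without being examined.

```python
from typing import Dict, List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.token_ids: List[int] = []  # tokens whose text ends at this node


def build_trie(vocab: List[str]) -> TrieNode:
    """Organize the vocabulary into a prefix tree keyed by character."""
    root = TrieNode()
    for token_id, text in enumerate(vocab):
        node = root
        for ch in text:
            node = node.children.setdefault(ch, TrieNode())
        node.token_ids.append(token_id)
    return root


def step(state: str, ch: str) -> Optional[str]:
    """Toy recognizer for -?[0-9]+; returns the next state, or None if ch is rejected."""
    if ch in "0123456789":
        return "digits"
    if ch == "-" and state == "start":
        return "sign"
    return None


def compute_mask(root: TrieNode, state: str, vocab_size: int) -> List[bool]:
    """Walk the trie, pruning any subtree whose prefix the recognizer rejects."""
    mask = [False] * vocab_size
    stack = [(root, state)]
    while stack:
        node, st = stack.pop()
        for ch, child in node.children.items():
            nxt = step(st, ch)
            if nxt is None:
                continue  # prune: no token below this child can be valid
            for tid in child.token_ids:
                mask[tid] = True
            stack.append((child, nxt))
    return mask


vocab = ["-", "1", "12", "-3", "a", "1a", "--", "7"]
root = build_trie(vocab)
mask_start = compute_mask(root, "start", len(vocab))
mask_digits = compute_mask(root, "digits", len(vocab))
print([vocab[i] for i, ok in enumerate(mask_start) if ok])   # ['-', '1', '12', '-3', '7']
print([vocab[i] for i, ok in enumerate(mask_digits) if ok])  # ['1', '12', '7']
```

The mask depends on the current recognizer state: a leading `-` is allowed at the start but not after digits have been emitted, so it is recomputed at every decoding step.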

Quick Start & Requirements

  • Install Rust 1.75+ for the core library.
  • For Python bindings: install Python 3.9+, run ./scripts/install-deps.sh to install dependencies, then ./scripts/test-guidance.sh to build and test.
  • Integrations available for Guidance, llama.cpp, Chromium, SGLang, vLLM, LLGTRT, mistral.rs, and onnxruntime-genai.

Highlighted Details

  • Achieves ~50μs CPU time per token (128k tokenizer) with negligible startup costs.
  • Supports JSON schemas, regular expressions, and Lark-like grammars.
  • Integrations include major LLM inference frameworks like llama.cpp, vLLM, and SGLang.
  • Performance benchmarks show significant speed advantages over LM-format-enforcer, llama.cpp grammars, Outlines, and XGrammar.

Maintenance & Community

  • Contributions are welcomed, requiring agreement with a Microsoft CLA.
  • Adheres to the Microsoft Open Source Code of Conduct.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The internal llguidance JSON-based format is being deprecated in favor of the Lark-like format, even though the internal format is currently more powerful.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 16
  • Issues (30d): 15

Star History

  • 140 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Andreas Jansson (Cofounder of Replicate), and 1 more.

lm-format-enforcer by noamgat

0.2%
2k
Format enforcer for language model outputs (JSON, regex, etc.)
created 1 year ago
updated 5 months ago