llguidance by guidance-ai

Fast constrained decoding for LLMs

Created 1 year ago
479 stars

Top 63.9% on SourcePulse

Project Summary

This library implements constrained decoding for Large Language Models (LLMs), enforcing arbitrary context-free grammars on model outputs. It targets developers building LLM applications that require structured, predictable outputs, and offers significant speed advantages over comparable constrained-decoding libraries.

How It Works

llguidance computes token masks on the fly, combining Earley's algorithm for context-free grammars with a lexer based on regular-expression derivatives. Masks are generated dynamically, so there is no significant startup cost, unlike methods that pre-compute all possible states. Mask computation walks an optimized prefix tree (trie) of the tokenizer's vocabulary, reaching speeds of approximately 50μs of CPU time per token.
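
To make the mechanism concrete, below is a toy, self-contained sketch of the regular-expression-derivative idea (this is not llguidance's code; the regex subset and helper names are invented for illustration). Taking Brzozowski derivatives of a pattern, character by character, answers exactly the question a token mask asks: can this token still lead to a valid match?

    from functools import reduce

    # Regex AST as tuples. The smart constructors simplify aggressively,
    # so within this subset a regex denotes the empty language iff it
    # normalizes to EMPTY -- which makes the mask check below sound.
    EMPTY = ("empty",)   # language = {}
    EPS   = ("eps",)     # language = {""}

    def lit(c):
        return ("lit", c)

    def seq(a, b):
        if a == EMPTY or b == EMPTY: return EMPTY
        if a == EPS: return b
        if b == EPS: return a
        return ("seq", a, b)

    def alt(a, b):
        if a == EMPTY: return b
        if b == EMPTY: return a
        if a == b: return a
        return ("alt", a, b)

    def star(a):
        if a in (EMPTY, EPS): return EPS
        return ("star", a)

    def nullable(r):
        # Does r match the empty string?
        tag = r[0]
        if tag == "eps": return True
        if tag in ("empty", "lit"): return False
        if tag == "seq": return nullable(r[1]) and nullable(r[2])
        if tag == "alt": return nullable(r[1]) or nullable(r[2])
        return True  # star

    def deriv(r, c):
        # Brzozowski derivative: the language of r after consuming c.
        tag = r[0]
        if tag in ("empty", "eps"): return EMPTY
        if tag == "lit": return EPS if r[1] == c else EMPTY
        if tag == "seq":
            d = seq(deriv(r[1], c), r[2])
            return alt(d, deriv(r[2], c)) if nullable(r[1]) else d
        if tag == "alt": return alt(deriv(r[1], c), deriv(r[2], c))
        return seq(deriv(r[1], c), r)  # star

    def token_mask(r, vocab):
        # A token is allowed iff consuming all of its characters leaves
        # a regex with a non-empty language.
        mask = []
        for tok in vocab:
            d = r
            for ch in tok:
                d = deriv(d, ch)
                if d == EMPTY: break
            mask.append(d != EMPTY)
        return mask

    digits = star(reduce(alt, [lit(c) for c in "0123456789"]))
    print(token_mask(digits, ["12", "7", "42x", "abc"]))
    # -> [True, True, False, False]

The real library avoids re-deriving every token from scratch: because it walks the tokenizer vocabulary as a prefix tree, tokens that share a prefix share the derivative work, which is what keeps per-token cost in the tens of microseconds.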

Quick Start & Requirements

  • Install Rust 1.75+ for the core library.
  • For the Python bindings: install Python 3.9+, then run ./scripts/install-deps.sh followed by ./scripts/test-guidance.sh to build and test (a sketch of the decoding loop these bindings feed appears after this list).
  • Integrations are available for Guidance, llama.cpp, Chromium, SGLang, vLLM, LLGTRT, mistral.rs, and onnxruntime-genai.
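
For orientation, here is a hedged sketch of the mask-apply-sample loop that an inference engine runs when embedding a constrained decoder. The Matcher object and its compute_mask/consume_token methods are hypothetical stand-ins for whatever the bindings expose, not llguidance's documented API:

    import numpy as np

    def constrained_decode(next_logits, matcher, eos_id, max_tokens=256):
        """next_logits(tokens) -> (vocab_size,) logits; matcher is a
        hypothetical grammar-state object (NOT llguidance's real API)."""
        tokens = []
        for _ in range(max_tokens):
            logits = next_logits(tokens)
            mask = matcher.compute_mask()             # bool, (vocab_size,)
            logits = np.where(mask, logits, -np.inf)  # forbid bad tokens
            tok = int(np.argmax(logits))              # greedy for brevity
            if tok == eos_id:
                break
            matcher.consume_token(tok)                # advance grammar state
            tokens.append(tok)
        return tokens

Greedy argmax keeps the sketch short; any sampling strategy works, since the mask only zeroes out tokens the grammar forbids.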

Highlighted Details

  • Achieves ~50μs of CPU time per token (with a 128k-token tokenizer) and negligible startup cost.
  • Supports JSON schemas, regular expressions, and Lark-like grammars (see the example after this list).
  • Integrates with major LLM inference frameworks, including llama.cpp, vLLM, and SGLang.
  • Benchmarks show significant speed advantages over LM-format-enforcer, llama.cpp grammars, Outlines, and XGrammar.
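
As a flavor of the grammar support, here is an illustrative grammar in the Lark-like style the README describes, constraining output to a fixed key-value shape; the dialect details are assumptions for illustration, not a verbatim llguidance example:

    start: "Name: " NAME ", Age: " AGE
    NAME: /[A-Z][a-z]+/
    AGE: /[0-9]{1,3}/

An output such as "Name: Ada, Age: 36" satisfies this grammar; anything else is masked out token by token during decoding.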

Maintenance & Community

  • Contributions are welcome; contributors must agree to a Microsoft CLA.
  • Adheres to the Microsoft Open Source Code of Conduct.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The internal llguidance JSON-based grammar format is being deprecated in favor of the Lark-like format, even though the internal format is currently more powerful. The README does not specify a license, which may complicate commercial adoption.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 5
  • Star History: 51 stars in the last 30 days
