Format enforcer for language model outputs (JSON, regex, etc.)
Top 23.8% on sourcepulse
This library enforces structured output formats (JSON Schema, Regex) for language models, targeting developers and researchers needing reliable, predictable LLM responses. It enhances LLM usability by filtering token generation at each step, ensuring adherence to specified formats while minimizing constraints on the model's creative freedom.
How It Works
The core mechanism combines a character-level parser with a tokenizer's prefix tree. The character-level parser defines the valid sequence of characters for a given format (e.g., JSON Schema, Regex). Simultaneously, a prefix tree represents all possible token sequences the LLM can generate. By intersecting these two structures, the library identifies valid tokens that advance both the format parsing and the LLM's generation, effectively filtering out invalid token choices at each generation step. This approach allows the LLM to control whitespace and field ordering, potentially improving output quality.
Quick Start & Requirements
pip install lm-format-enforcer
transformers
, torch
, huggingface_hub
, optimum
, and auto-gptq
(with CUDA 11.8 specified).Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
interegular
.5 months ago
1 day