Constrained decoding for LLMs against JSON schema
This repository provides "Clownfish," a method for constrained decoding in Large Language Models (LLMs) that enforces adherence to JSON schemas. It is aimed at developers and researchers building LLM-powered applications that need reliable, structured output, such as API calls, SQL queries, or configuration files. Because every generated token is checked against the schema, the output is structurally valid by construction, which rules out malformed or unexpected fields (though not factually wrong values).
How It Works
Clownfish implements "ControLogits," an approach that modifies the LLM's token generation process. It integrates custom "LogitProcessors" that, at each token prediction, evaluate candidate tokens against a streaming JSON schema parser. Tokens that would make the output invalid are masked out (their probability is effectively set to zero), so the LLM can only select continuations that the schema allows. The method uses immutable data structures so that parser state can be branched and rolled back cheaply while candidates are tested, and it offers a more robust alternative to fine-tuning or prompt engineering alone.
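The masking step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Clownfish's actual code: the vocabulary, the ALLOWED list standing in for a streaming schema parser, and the names is_valid_prefix, mask_logits, constrained_greedy_decode, and fake_model are all hypothetical.

```python
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of tokens.
VOCAB = ['"', 'red', 'green', 'blue', 'r', 'ed', '}', '<eos>']

# Stand-in for a streaming schema parser. The "schema" here is a JSON string
# restricted to an enum, so the complete valid outputs can simply be listed.
ALLOWED = ['"red"', '"green"']


def is_valid_prefix(text):
    """True if text can still be extended into a schema-valid output."""
    return any(full.startswith(text) for full in ALLOWED)


def is_complete(text):
    """True if text is already a complete schema-valid output."""
    return text in ALLOWED


def mask_logits(prefix, logits):
    """Set the logit of every token that would break the schema to -inf."""
    masked = []
    for token, logit in zip(VOCAB, logits):
        if token == '<eos>':
            ok = is_complete(prefix)  # stopping is only allowed on valid output
        else:
            ok = is_valid_prefix(prefix + token)
        masked.append(logit if ok else -math.inf)
    return masked


def constrained_greedy_decode(score_fn, max_steps=10):
    """Greedy decoding where only schema-valid tokens can be chosen."""
    prefix = ''
    for _ in range(max_steps):
        masked = mask_logits(prefix, score_fn(prefix))
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        if masked[best] == -math.inf:
            raise ValueError('no valid continuation for ' + repr(prefix))
        if VOCAB[best] == '<eos>':
            break
        prefix += VOCAB[best]
    return prefix


def fake_model(prefix):
    """Hypothetical model that prefers the invalid tokens 'blue' and '}'."""
    return [0.1, 0.5, 0.4, 2.0, 0.3, 0.2, 1.5, 0.0]


result = constrained_greedy_decode(fake_model)
print(result)
```

Even though the toy model scores 'blue' and '}' highest at every step, those tokens are masked, so decoding can only produce one of the enum values. Note that each step tests every vocabulary entry against the validator, which hints at why the cost grows with vocabulary size.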
Quick Start & Requirements
Install from source: pip install -e .
Functions: create or create_api.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README notes that OpenAI models do not expose all logits, limiting direct application to GPT-3.5/4 without workarounds. The current implementation might be inefficient, especially with minimal prompting, as the model may explore a large state space to find valid tokens.