syncode  by structuredllm

Grammar-guided LLM generation framework ensuring syntactically valid output

Created 2 years ago
293 stars

Top 90.1% on SourcePulse

Project Summary

SynCode is a framework for grammar-guided Large Language Model (LLM) generation that ensures syntactically valid output for programming languages and structured data. It targets developers and researchers who want more reliable and efficient LLM output in code generation and structured data tasks, and it reports speed improvements of up to 20% alongside high accuracy.

How It Works

SynCode combines incremental parsing with a precomputed DFA mask store. At each decoding step it parses the partial output to determine which sequences of grammar terminals may legally come next (accept sequences), then uses the DFA mask store, derived from the grammar rules, to filter invalid tokens out of the LLM's output distribution efficiently. This guarantees syntactic correctness with respect to a Context-Free Grammar (CFG) and supports complex language features such as Python's indentation.
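
The masking idea can be illustrated with a toy sketch. This is not SynCode's implementation: it assumes a single hypothetical terminal (INT, one or more digits), a hypothetical six-token vocabulary, and a mask store keyed only by DFA state, whereas the description above also involves an incremental parser and accept sequences. All names below are illustrative.

```python
from typing import Dict, List

DEAD = -1  # sink state: no valid continuation exists

def step(state: int, ch: str) -> int:
    """Toy DFA for the terminal INT (one or more digits).
    State 0 = start, state 1 = inside a number (accepting)."""
    if state in (0, 1) and ch in "0123456789":
        return 1
    return DEAD

def walk(state: int, token: str) -> int:
    """Feed every character of a vocabulary token through the DFA."""
    for ch in token:
        state = step(state, ch)
        if state == DEAD:
            return DEAD
    return state

def build_mask_store(vocab: List[str], states: List[int]) -> Dict[int, List[bool]]:
    """Precompute, for each DFA state, which tokens keep the DFA alive."""
    return {s: [walk(s, tok) != DEAD for tok in vocab] for s in states}

def apply_mask(logits: List[float], mask: List[bool]) -> List[float]:
    """Push the logits of disallowed tokens to -inf so they can never be chosen."""
    return [l if ok else float("-inf") for l, ok in zip(logits, mask)]

# Hypothetical vocabulary and logits, for illustration only.
VOCAB = ["1", "23", "4a", "x", "007", " "]
MASK_STORE = build_mask_store(VOCAB, [0, 1])

logits = [0.1, 0.5, 2.0, 3.0, 0.2, 1.0]
masked = apply_mask(logits, MASK_STORE[0])
best = VOCAB[max(range(len(VOCAB)), key=masked.__getitem__)]
print(best)
```

Because the store is built once per grammar and vocabulary, each decoding step reduces to a lookup plus an element-wise mask, which is where the efficiency of a precomputed store would come from in a design like this.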

Quick Start & Requirements

  • Install via pip: pip install syncode
  • Requires HuggingFace transformers v4.51.0+ and Python 3.6-3.12. Python 3.13 is not supported.
  • Usage involves initializing SyncodeLogitsProcessor or the Syncode class with a model and optional grammar.
  • See notebooks for examples.
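
A minimal sketch of the Syncode class route follows. It assumes the constructor accepts `model`, `grammar`, `parse_output_only`, and `max_new_tokens` keyword arguments and that `infer` returns a list of strings; none of these names are confirmed by this summary, so check them against the notebooks before relying on them. The model id and the helper name `generate_json` are hypothetical.

```python
def generate_json(prompt: str, model_name: str = "google/gemma-2-2b-it") -> str:
    """Generate grammar-constrained JSON for a prompt (sketch, assumed API)."""
    # Imported lazily so that defining this helper needs neither the
    # syncode package nor a model download.
    from syncode import Syncode

    # Assumed keyword arguments; verify against the project's notebooks.
    syn_llm = Syncode(
        model=model_name,
        grammar="json",
        parse_output_only=True,
        max_new_tokens=50,
    )
    # Assumed to return a list of completions; take the first one.
    return syn_llm.infer(prompt)[0]
```

Calling `generate_json("Describe a user named Ada as JSON.")` would download the model on first use, so it is best tried in an environment with enough memory for the chosen checkpoint.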

Highlighted Details

  • Achieves 99% accuracy in JSON generation with Gemma-2b.
  • Supports seamless integration with any HuggingFace model.
  • Offers built-in CFGs for Python, Go, SQL, Math, JSON, and more, with custom EBNF grammar support.
  • Compatible with standard decoding strategies like greedy search and nucleus sampling.
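
Compatibility with standard decoding strategies follows from the fact that a grammar mask only removes tokens before a token is picked. The sketch below is illustrative and uses no SynCode code: the mask and logits are hypothetical, and the greedy and nucleus routines are plain textbook versions.

```python
import math
import random
from typing import List

def masked_softmax(logits: List[float], mask: List[bool]) -> List[float]:
    """Softmax over allowed tokens only; disallowed tokens get probability 0."""
    peak = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(probs: List[float]) -> int:
    """Pick the single most probable token index."""
    return max(range(len(probs)), key=probs.__getitem__)

def nucleus(probs: List[float], top_p: float, rng: random.Random) -> int:
    """Sample from the smallest set of tokens whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, mass = [], 0.0
    for i in order:
        if probs[i] == 0.0:  # never admit a grammar-masked token
            break
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

# Hypothetical step: the grammar allows only tokens 1 and 2.
logits = [3.0, 2.0, 1.0, 0.5]
mask = [False, True, True, False]
probs = masked_softmax(logits, mask)

rng = random.Random(0)
print(greedy(probs), nucleus(probs, 0.9, rng))
```

Token 0 has the highest raw logit, yet neither strategy can select it once the mask is applied, so the choice of sampler stays independent of the grammar constraint.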

Maintenance & Community

  • Project initiated by Shubham Ugare, Tarun Suresh, Hangoo Kang, Sasa Misailovic, and Gagandeep Singh.
  • Contact information provided for Shubham Ugare.
  • No explicit community links (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • Licensed under GPL.
  • The GPL license may impose copyleft restrictions, potentially requiring derivative works to also be open-sourced if linked. Compatibility with closed-source commercial applications should be carefully reviewed.

Limitations & Caveats

  • Python 3.13 is not supported due to dependency constraints.
  • The GPL license may restrict commercial use in closed-source projects.
  • SynCode supports general-purpose languages, but non-context-free fragments such as Python indentation add complexity; the README describes these as something SynCode aims to handle.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 10
  • Issues (30d): 2
  • Star history: 9 stars in the last 30 days
