syncode by structuredllm

Grammar-guided LLM generation framework ensuring syntactically valid output

created 1 year ago
282 stars

Top 93.5% on sourcepulse

Project Summary

SynCode is a framework for grammar-guided Large Language Model (LLM) generation that ensures syntactically valid output for programming languages and structured data. It targets developers and researchers who want more reliable and efficient LLM output in code generation and structured-data tasks, and it reports speedups of up to 20% while maintaining high accuracy.

How It Works

SynCode combines incremental parsing with a precomputed DFA mask store. As the model generates, SynCode parses the partial output to compute accept sequences, which are the sequences of grammar terminals that can legally come next. It then uses the DFA mask store, built ahead of time from the grammar's terminals, to look up which vocabulary tokens are consistent with those sequences and masks out the rest of the LLM's output distribution. This guarantees syntactic correctness with respect to a Context-Free Grammar (CFG), and the approach also handles language features that go beyond a plain CFG, such as Python's indentation.
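
The filtering step can be illustrated with a deliberately tiny, hypothetical example: a single terminal (a number literal) with a hand-written DFA, and a mask store that records, for each DFA state, which vocabulary tokens can be consumed without reaching the dead state. This is a simplification under stated assumptions. SynCode's real mask store is indexed by DFA state together with accept sequences, and it also admits tokens that finish one terminal and start the next, which this sketch omits. None of the names below are SynCode APIs.

```python
import math

DIGITS = set("0123456789")

# Hypothetical toy DFA for one terminal, NUMBER ::= [0-9]+ ("." [0-9]+)?
# States: 0 = start, 1 = integer digits, 2 = just read ".", 3 = fraction digits.
STATES = (0, 1, 2, 3)


def step(state, ch):
    """Return the next DFA state, or None if `ch` leads to the dead state."""
    if ch in DIGITS:
        return {0: 1, 1: 1, 2: 3, 3: 3}[state]
    if ch == "." and state == 1:
        return 2
    return None


def token_allowed(state, token):
    """True if every character of `token` keeps the DFA out of the dead state."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return False
    return True


def build_mask_store(vocab):
    """Precompute, once per grammar and vocabulary, one boolean mask per DFA state."""
    return {q: [token_allowed(q, tok) for tok in vocab] for q in STATES}


def masked_argmax(logits, mask):
    """Greedy choice after setting the logits of disallowed tokens to -inf."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    return max(range(len(masked)), key=masked.__getitem__)


vocab = ["1", "23", ".", ".5", "a", "4b"]
store = build_mask_store(vocab)
logits = [0.1, 0.2, 0.3, 2.0, 5.0, 1.0]  # the model prefers "a", which is invalid
print(vocab[masked_argmax(logits, store[0])])  # nothing generated yet -> "23"
print(vocab[masked_argmax(logits, store[1])])  # after "12" the DFA is in state 1 -> ".5"
```

Because the masks depend only on the grammar and the tokenizer vocabulary, they can be built once in advance; at each decoding step only a lookup and an element-wise mask remain, which is what keeps this style of filtering cheap at inference time.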

Quick Start & Requirements

  • Install via pip: pip install syncode
  • Requires HuggingFace transformers v4.51.0+ and Python 3.6-3.12. Python 3.13 is not supported.
  • Usage involves initializing SyncodeLogitsProcessor or the Syncode class with a model and optional grammar.
  • See the project's notebooks for examples.
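
SynCode can be used through SyncodeLogitsProcessor. To show where a processor of that kind sits in generation, here is a self-contained sketch in plain Python. It mimics the (input_ids, scores) -> scores call shape of a HuggingFace LogitsProcessor, but AllowListProcessor, paren_allowed, toy_model, and greedy_generate are hypothetical stand-ins, not SynCode's API; the project's notebooks show real usage.

```python
import math


def paren_allowed(ids):
    """Hypothetical grammar: balanced parentheses, nesting depth at most 2.
    Token ids: 0 = "(", 1 = ")", 2 = EOS, 3 = "x"."""
    depth = ids.count(0) - ids.count(1)
    if depth == 0:
        return {2} if ids else {0}  # must open first; once balanced, must stop
    if depth < 2:
        return {0, 1}
    return {1}


class AllowListProcessor:
    """Callable with the (input_ids, scores) -> scores shape of a HuggingFace
    LogitsProcessor, written over plain lists so it runs without torch."""

    def __init__(self, allowed_fn):
        self.allowed_fn = allowed_fn

    def __call__(self, input_ids, scores):
        allowed = self.allowed_fn(input_ids)
        return [s if i in allowed else -math.inf for i, s in enumerate(scores)]


def greedy_generate(model, processor, prompt_ids, max_new_tokens, eos_id):
    """Greedy decoding that passes every step's scores through `processor`."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = processor(ids, model(ids))
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids


def toy_model(ids):
    # Stand-in for an LLM: always strongly prefers the invalid token "x".
    return [0.4, 0.5, 0.1, 9.0]


unconstrained = greedy_generate(toy_model, lambda ids, s: s, [], 3, eos_id=2)
constrained = greedy_generate(toy_model, AllowListProcessor(paren_allowed), [], 10, eos_id=2)
print(unconstrained)  # [3, 3, 3]
print(constrained)    # [0, 1, 2]
```

In a real grammar-guided processor the allowed set would come from the incremental parser and the mask store rather than a hand-written rule, but the control flow is the same: the model proposes scores, the processor masks them, and the decoding strategy picks from what remains.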

Highlighted Details

  • Achieves 99% accuracy in JSON generation with Gemma-2b.
  • Integrates with any HuggingFace model.
  • Offers built-in CFGs for Python, Go, SQL, Math, JSON, and more, with custom EBNF grammar support.
  • Compatible with standard decoding strategies like greedy search and nucleus sampling.
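
Because grammar guidance only removes invalid tokens from the distribution, it composes with ordinary decoding strategies: greedy search takes the argmax of the masked scores, and nucleus (top-p) sampling runs on the renormalized distribution. A minimal sketch, assuming the mask has already been computed (here it is hard-coded); masked_softmax and top_p_sample are illustrative helpers, not part of SynCode.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over allowed tokens only; assumes at least one token is allowed."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def top_p_sample(probs, p, rng):
    """Nucleus sampling: draw from the smallest high-probability set whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        if probs[i] == 0.0:
            break  # masked-out tokens never enter the nucleus
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [2.0, 1.0, 5.0, 0.5]
mask = [True, True, False, True]  # token 2 has the highest logit but is invalid
probs = masked_softmax(logits, mask)
rng = random.Random(0)
samples = [top_p_sample(probs, 0.9, rng) for _ in range(1000)]
print(2 in samples)  # False
```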

Maintenance & Community

  • Project initiated by Shubham Ugare, Tarun Suresh, Hangoo Kang, Sasa Misailovic, and Gagandeep Singh.
  • Contact information provided for Shubham Ugare.
  • No explicit community links (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • Licensed under GPL.
  • GPL is a copyleft license: distributed derivative works, which may include software that links against SynCode, generally must also be released under the GPL. Review compatibility carefully before using it in closed-source commercial applications.

Limitations & Caveats

  • Python 3.13 is not supported due to dependency constraints.
  • The GPL license may restrict commercial use in closed-source projects.
  • The README notes that general-purpose languages contain non-context-free fragments, such as Python's indentation rules, which complicate grammar-guided decoding; SynCode is designed to handle these cases.
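
Python's indentation is not context-free on its own, because the parser must remember how deep each open block is. A standard way to handle this, used by CPython's tokenizer, is to keep a stack of indentation widths and turn changes into explicit INDENT and DEDENT tokens that a CFG can then treat as ordinary terminals. The sketch below shows that general technique; it is an illustration, not a description of SynCode's implementation, and indent_tokens is a hypothetical helper that ignores tabs, comments, and line continuations.

```python
def indent_tokens(lines):
    """Turn leading-space changes into INDENT/DEDENT markers, in the style of
    Python's tokenizer, so a context-free parser can treat them as terminals."""
    stack = [0]  # indentation widths of the blocks currently open
    out = []
    for line in lines:
        if not line.strip():
            continue  # blank lines do not change indentation
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            out.append("INDENT")
        else:
            while width < stack[-1]:
                stack.pop()
                out.append("DEDENT")
            if width != stack[-1]:
                raise IndentationError(f"dedent to width {width} matches no open block")
        out.append(line.strip())
    out.extend("DEDENT" for _ in stack[1:])  # close blocks still open at end of input
    return out


src = ["if x:", "    y = 1", "    if y:", "        z = 2", "w = 3"]
print(indent_tokens(src))
# ['if x:', 'INDENT', 'y = 1', 'if y:', 'INDENT', 'z = 2', 'DEDENT', 'DEDENT', 'w = 3']
```

During generation the same stack is what a constrained decoder needs to consult: a new line may only return to a width already on the stack, or go deeper where a new block may open.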

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 8
  • Issues (30d): 2
  • Star history: 22 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Andreas Jansson (Cofounder of Replicate), and 1 more.

lm-format-enforcer by noamgat

0.2%
2k stars
Format enforcer for language model outputs (JSON, regex, etc.)
created 1 year ago
updated 5 months ago
Starred by Tobi Lutke (Cofounder of Shopify), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 21 more.

guidance by guidance-ai

0.1%
21k stars
Guidance is a programming paradigm for steering LLMs
created 2 years ago
updated 1 day ago