rellm by r2d4

Regex tool for language model completion

created 2 years ago
509 stars

Top 62.1% on sourcepulse

Project Summary

ReLLM enables language models to generate text that strictly adheres to specified regular expression patterns, ensuring structured and predictable output. This is particularly useful for extracting specific data formats like JSON, XML, dates, or numbers, or for filling in template sentences. The primary audience includes developers and researchers needing programmatic control over LLM outputs for data parsing, validation, and templating.

How It Works

ReLLM filters tokens before each generation step. For every candidate token, it appends the token to the text generated so far and tests the result against the pattern using a partial regex match. Tokens that could no longer lead to a match are masked in the logits, so the language model cannot generate them. The pattern is therefore enforced during decoding itself, without post-processing or elaborate prompt engineering.
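The masking step described above can be sketched in a few lines. This is an illustrative toy, not ReLLM's actual code: the vocabulary, logits, and the helper names is_viable_prefix and mask_logits are all hypothetical, and the partial match is approximated with the standard library by restricting the pattern to a fixed-width date (YYYY-MM-DD) described one character position at a time. A real implementation would more likely lean on the regex package's partial matching and operate on a model's logit tensor.

```python
import math
import re

# Hypothetical fixed-width pattern, one regex per character position: YYYY-MM-DD.
DATE_PIECES = [r"\d"] * 4 + ["-"] + [r"\d"] * 2 + ["-"] + [r"\d"] * 2


def is_viable_prefix(text, pieces):
    """Return True if `text` could still be extended into a full match."""
    if len(text) > len(pieces):
        return False
    # Match the text against only as many positions as it has characters.
    return re.fullmatch("".join(pieces[:len(text)]), text) is not None


def mask_logits(generated, vocab, logits, pieces):
    """Set the logit of every token that would break the pattern to -inf."""
    masked = []
    for token, logit in zip(vocab, logits):
        ok = is_viable_prefix(generated + token, pieces)
        masked.append(logit if ok else -math.inf)
    return masked


# Toy vocabulary and logits (assumed values, for illustration only).
vocab = ["1", "2", "-", "a", "20"]
logits = [0.1, 0.3, 0.9, 1.5, 0.2]

# The text so far is "20": only tokens that continue the four-digit year survive.
masked = mask_logits("20", vocab, logits, DATE_PIECES)
best = vocab[max(range(len(vocab)), key=masked.__getitem__)]
print(masked, best)
```

Note that the unconstrained model would have preferred "a" (the highest raw logit); after masking, the best remaining token is one that keeps the date pattern reachable.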

Quick Start & Requirements

  • Install via pip: pip install rellm
  • Requires the transformers and regex libraries.
  • Example usage with gpt2 is provided in the README.
  • Official quick-start and examples: r2d4/rellm

Highlighted Details

  • Filters non-matching tokens pre-generation to enforce regex patterns.
  • The README reports improved output quality and parseability, even with small models.
  • Supports generating specific syntactic structures (JSON, XML) and semantic structures (dates, numbers).
  • Can be used for template filling with specific constraints.

Maintenance & Community

  • Maintained by r2d4; the last commit was 2 years ago.
  • No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license; the license must be verified in the repository before use.
  • Suitability for commercial use or closed-source linking depends on the actual license.

Limitations & Caveats

Because the README does not specify a license, commercial use is uncertain. The performance impact on larger models or complex regex patterns is not documented. The README describes its results as "interesting" and preliminary, which suggests the project is still experimental.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 90 days
