rellm by r2d4

Regex tool for language model completion

Created 2 years ago
512 stars

Top 61.0% on SourcePulse

Project Summary

ReLLM enables language models to generate text that strictly adheres to specified regular expression patterns, ensuring structured and predictable output. This is particularly useful for extracting specific data formats like JSON, XML, dates, or numbers, or for filling in template sentences. The primary audience includes developers and researchers needing programmatic control over LLM outputs for data parsing, validation, and templating.

How It Works

ReLLM filters tokens before each generation step. For each candidate token, it appends the token to the completion so far and tests the result against the regex using partial matching, which asks whether the text could still be extended into a full match. Tokens that would make a match impossible are masked in the logits, so the language model cannot generate them. This enforces the pattern during decoding, without post-processing or elaborate prompt engineering.
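The masking step described above can be sketched in plain Python. This is a minimal illustration and not ReLLM's actual code: the pattern is a hypothetical fixed-width stand-in for \d{4}-\d{2}-\d{2} (one set of allowed characters per position), `fake_model` is a made-up scoring function standing in for a language model, and `VOCAB` is an invented toy vocabulary. A real implementation would rely on a regex engine that supports partial matching and on the model's actual logits.

```python
import math
import string

# Toy stand-in for a regex: one set of allowed characters per position.
# It models the fixed-width pattern \d{4}-\d{2}-\d{2} (assumption for the demo).
DIGITS = set(string.digits)
DATE_PATTERN = [DIGITS] * 4 + [{"-"}] + [DIGITS] * 2 + [{"-"}] + [DIGITS] * 2


def is_partial_match(text, pattern):
    """Return True if text is a prefix of some string the pattern accepts."""
    if len(text) > len(pattern):
        return False
    return all(ch in allowed for ch, allowed in zip(text, pattern))


def mask_logits(logits, vocab, generated, pattern):
    """Set the logit of every token that would break the pattern to -inf."""
    masked = []
    for token, logit in zip(vocab, logits):
        ok = is_partial_match(generated + token, pattern)
        masked.append(logit if ok else -math.inf)
    return masked


def constrained_greedy_decode(score_fn, vocab, pattern, max_steps=20):
    """Greedy decoding where disallowed tokens are masked before each pick."""
    generated = ""
    for _ in range(max_steps):
        if len(generated) == len(pattern):
            break  # the fixed-width pattern is fully matched
        masked = mask_logits(score_fn(generated), vocab, generated, pattern)
        best = max(range(len(vocab)), key=lambda i: masked[i])
        if masked[best] == -math.inf:
            break  # no token can extend the match
        generated += vocab[best]
    return generated


# Hypothetical vocabulary and "model": the unconstrained favourite is "ab",
# which can never appear in a date, so the mask has to steer decoding.
VOCAB = ["20", "2", "3", "-", "ab", "01", " ", "1"]


def fake_model(prefix):
    """Return the same made-up logits for every prefix."""
    return [1.0, 0.5, 0.4, 0.9, 5.0, 0.8, 3.0, 0.1]


result = constrained_greedy_decode(fake_model, VOCAB, DATE_PATTERN)
print(result)
```

Without the mask, greedy decoding would pick "ab" at every step; with it, every emitted token keeps the output a valid prefix of the date pattern. Note that multi-character tokens are checked as a whole, so a token is rejected if any of its characters falls outside the pattern.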

Quick Start & Requirements

  • Install via pip: pip install rellm
  • Requires the transformers and regex Python libraries.
  • Example usage with gpt2 is provided in the README.
  • Official quick-start and examples: r2d4/parserllm

Highlighted Details

  • Filters non-matching tokens pre-generation to enforce regex patterns.
  • Reported to improve output quality and parseability, even with small models.
  • Supports generating specific syntactic structures (JSON, XML) and semantic structures (dates, numbers).
  • Can be used for template filling with specific constraints.
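To make the kinds of structure listed above concrete, here are a few illustrative patterns for a date, a number, a flat JSON object, and a template sentence. The patterns and the `conforms` helper are hypothetical examples written for this summary, not taken from the ReLLM README, and they use Python's standard `re` module only to check finished strings rather than to constrain generation.

```python
import re

# Illustrative patterns (assumptions for this sketch, not from the README).
PATTERNS = {
    # Semantic structure: a US-style date such as 12/31/2023.
    "date": r"(0?[1-9]|1[0-2])/(0?[1-9]|[12]\d|3[01])/\d{4}",
    # Semantic structure: an integer or decimal number.
    "number": r"-?\d+(\.\d+)?",
    # Syntactic structure: a flat JSON object with two fixed keys.
    "flat_json": r'\{"name": "[A-Za-z ]+", "age": \d{1,3}\}',
    # Template filling: a fixed sentence with one constrained slot.
    "template": r"The capital of France is [A-Z][a-z]+\.",
}


def conforms(kind, text):
    """Return True if text fully matches the named pattern."""
    return re.fullmatch(PATTERNS[kind], text) is not None


print(conforms("date", "12/31/2023"))
print(conforms("flat_json", '{"name": "Ada Lovelace", "age": 36}'))
print(conforms("template", "The capital of France is Paris."))
```

Regular expressions cannot describe arbitrarily nested JSON or XML, so patterns like `flat_json` only cover a fixed shape known in advance.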

Maintenance & Community

  • Authored by r2d4; the last commit was 2 years ago.
  • No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. The repository structure suggests it might be MIT or Apache 2.0, but this requires verification.
  • Compatibility with commercial use or closed-source linking depends on the actual license.

Limitations & Caveats

Because the README does not state a license, commercial use is uncertain until the license is verified. The performance impact on larger models or complex regex patterns is not documented. The README presents its results as "interesting" and preliminary, which suggests the project may still be experimental.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days

Explore Similar Projects

  • lm-format-enforcer by noamgat: Format enforcer for language model outputs (JSON, regex, etc.). 2k stars. Created 2 years ago, updated 3 weeks ago.