rellm by r2d4

Regex tool for language model completion

Created 2 years ago

514 stars

Top 60.9% on SourcePulse

2 Experts Love This Project

Travis Fischer

Founder of Agentic

Jeff Hammerbacher

Cofounder of Cloudera

Project Summary

ReLLM constrains a language model so that its output strictly matches a specified regular expression, which makes the output structured and predictable. This is particularly useful for extracting data in formats such as JSON, XML, dates, or numbers, and for filling in template sentences. The primary audience is developers and researchers who need programmatic control over LLM output for data parsing, validation, and templating.

How It Works

ReLLM filters tokens before each one is generated. At every decoding step it appends each candidate token to the text produced so far and tests the result against the pattern as a partial match, that is, it checks whether the text could still be extended into a full match. Tokens that would lead to a non-matching sequence are masked in the logits, so the language model cannot select them. This enforces the pattern during decoding, without post-processing or elaborate prompt engineering.
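The masking step described above can be sketched with the standard library alone. This is a toy illustration, not ReLLM's actual code: the fixed-width DATE_SLOTS pattern, the VOCAB list, the toy_scores function, and all helper names are hypothetical. A real implementation would work on tokenizer IDs and model logits, and would need true partial matching for arbitrary patterns (the third-party regex package offers this through its partial=True argument).

```python
import math
import re

# Hypothetical fixed-width pattern: one regex fragment per output character.
# Together the slots are equivalent to \d{4}-\d{2}-\d{2}.
DATE_SLOTS = [r"\d"] * 4 + ["-"] + [r"\d"] * 2 + ["-"] + [r"\d"] * 2


def is_valid_prefix(text, slots):
    """Return True if `text` could still be extended into a full match."""
    if len(text) > len(slots):
        return False
    return re.fullmatch("".join(slots[: len(text)]), text) is not None


def mask_logits(logits, vocab, prefix, slots):
    """Set the logit of every token that would break the pattern to -inf."""
    return [
        score if token and is_valid_prefix(prefix + token, slots) else -math.inf
        for token, score in zip(vocab, logits)
    ]


def constrained_greedy_decode(score_fn, vocab, slots):
    """Greedy decoding that can only pick pattern-compatible tokens."""
    text = ""
    while len(text) < len(slots):
        masked = mask_logits(score_fn(text), vocab, text, slots)
        best = max(range(len(vocab)), key=masked.__getitem__)
        if masked[best] == -math.inf:
            raise ValueError("no token in the vocabulary can continue the pattern")
        text += vocab[best]
    return text


# Toy stand-ins for a tokenizer vocabulary and a model's next-token scores.
VOCAB = ["The", " date", "20", "23", "-", "1", "0", "05", " is"]


def toy_scores(_text):
    # A fixed preference order that favours ordinary words over digits.
    return [5.0, 4.0, 3.0, 2.5, 2.0, 1.0, 0.5, 0.2, 4.5]


if __name__ == "__main__":
    print(constrained_greedy_decode(toy_scores, VOCAB, DATE_SLOTS))
```

Without the mask, greedy decoding would start with "The", the highest-scoring token. With the mask, only digit tokens survive the first step, so the toy decoder emits a string of the required shape (here 2020-20-20). The pattern fixes the form of the output only; whether the value is sensible still depends on the model.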

Quick Start & Requirements

Install via pip: pip install rellm
Requires transformers and regex libraries.
Example usage with gpt2 is provided in the README.
Official quick-start and examples: r2d4/parserllm

Highlighted Details

Filters non-matching tokens pre-generation to enforce regex patterns.
Demonstrated success in improving output quality and parseability even with small models.
Supports generating specific syntactic structures (JSON, XML) and semantic structures (dates, numbers).
Can be used for template filling with specific constraints.
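To make the kinds of constraint listed above concrete, here is a small sketch of patterns one might write for each case. The patterns and the conforms helper are illustrative assumptions, not taken from the ReLLM README, and they use Python's built-in re module only to check finished strings rather than to steer generation.

```python
import re

# Illustrative patterns only; none of these come from the ReLLM README.
PATTERNS = {
    # Semantic structure: an ISO-style date and a signed decimal number.
    "date": r"\d{4}-\d{2}-\d{2}",
    "number": r"-?\d+(\.\d+)?",
    # Syntactic structure: a flat JSON object with two fixed keys.
    "json_record": r'\{"name": "[A-Za-z ]+", "age": \d{1,3}\}',
    # Template filling: fixed text with two constrained blanks.
    "template": r"My name is [A-Z][a-z]+ and I am \d{1,3} years old\.",
}


def conforms(kind, text):
    """Return True if `text` fully matches the pattern registered for `kind`."""
    return re.fullmatch(PATTERNS[kind], text) is not None


if __name__ == "__main__":
    print(conforms("date", "2023-05-17"))
    print(conforms("json_record", '{"name": "Ada Lovelace", "age": 36}'))
    print(conforms("template", "My name is Ada and I am 36 years old."))
```

Note that a regular expression can only describe flat or boundedly nested structure, so a pattern like json_record covers one fixed record shape rather than arbitrary JSON.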

Maintenance & Community

The project is owned by r2d4. The last commit was 2 years ago, so it does not appear to be actively maintained.
No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. The repository structure suggests it might be MIT or Apache 2.0, but this requires verification.
Whether commercial use or closed-source linking is permitted depends on the actual license.

Limitations & Caveats

Because the README does not state a license, commercial use is uncertain until the license is verified. The performance impact on larger models or with complex regex patterns is not documented. The project describes its preliminary results as "interesting", which suggests it is still experimental.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days