Gramformer by PrithivirajDamodaran

Grammar correction framework for NLP text

created 4 years ago
1,542 stars

Top 27.5% on sourcepulse

Project Summary

Gramformer is a Python library designed to detect, highlight, and correct grammatical errors in natural language text. It is particularly useful for post-processing machine-generated text from sources like NMT, ASR, and text summarization, as well as for assisting human writers and integrating into custom text editors or messaging platforms.

How It Works

Gramformer combines techniques from recent grammar error correction research. It operates at the sentence level, routing text through one of three modules: correction, highlighting, or detection. An integrated quality estimator filters candidate corrections so that only high-quality outputs are surfaced. The project's novel contribution is the way it generates a dataset for grammar error correction, though current fine-tuning uses smaller models due to compute constraints.
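The filtering step described above (generate candidate corrections, score them, keep only the good ones) can be sketched as below. The scorer here is a stand-in fed with hypothetical scores for illustration; Gramformer's actual quality estimator is a learned model, not a lookup table.

```python
def filter_candidates(candidates, scorer, threshold=0.5):
    """Drop low-quality candidates and return the rest, best first."""
    kept = [c for c in candidates if scorer(c) >= threshold]
    return sorted(kept, key=scorer, reverse=True)

# Hypothetical quality scores for illustration only.
scores = {"He is moving here": 0.92, "He are moving here": 0.31}
print(filter_candidates(list(scores), scores.get))
# ['He is moving here']
```

In the real pipeline the candidates would come from the correction model's beam, and the threshold would be tuned against the estimator's score distribution.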

Quick Start & Requirements

  • Install via pip: pip install -U git+https://github.com/PrithivirajDamodaran/Gramformer.git
  • Requires Python. GPU usage is optional but recommended for performance.
  • Hugging Face Hub login may be required for model access: use notebook_login from huggingface_hub in a notebook, or run huggingface-cli login from the command line.
  • Official documentation and examples are available in the README.

Highlighted Details

  • Offers distinct interfaces for correction, highlighting, and detection (detector is in development).
  • get_edits method provides detailed edit information (type, original, corrected).
  • highlight method visually marks errors within the original text.
  • Fine-tuned on datasets derived from WikiEdits, C4, and PIE synthetic pairs.
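To illustrate how edit information and highlighting relate, the sketch below applies a list of edit tuples to mark an error inline. The tuple layout (edit type, original token with its span, replacement token with its span) and the inline markup are assumptions modeled on typical GEC tooling, not the library's guaranteed format; in practice use the output of get_edits and highlight directly.

```python
def highlight(tokens, edits):
    """Wrap edited tokens in inline tags showing type and suggestion.

    Each edit is (etype, orig, o_start, o_end, repl, r_start, r_end),
    with spans indexed into the original token list. Assumed layout,
    modeled on get_edits-style output.
    """
    out = list(tokens)
    for etype, orig, o_start, o_end, repl, _, _ in edits:
        marked = f"<a type='{etype}' edit='{repl}'>{orig}</a>"
        # Same-length slice assignment keeps later spans valid.
        out[o_start:o_end] = [marked] + [None] * (o_end - o_start - 1)
    return " ".join(t for t in out if t is not None)

tokens = "He are moving here".split()
edits = [("VERB:SVA", "are", 1, 2, "is", 1, 2)]
print(highlight(tokens, edits))
# He <a type='VERB:SVA' edit='is'>are</a> moving here
```

The edit type code (here a subject-verb agreement error) is what the get_edits interface exposes alongside the original and corrected spans.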

Maintenance & Community

The project is open to pull requests and collaboration. Further community engagement details (Discord/Slack, roadmap) are not explicitly provided in the README.

Licensing & Compatibility

Releases <= v1.0 are explicitly not intended for commercial usage. Stable releases > v1.0 are implied to be suitable for commercial use, but this should be verified.

Limitations & Caveats

The model is trained on sentences capped at 64 tokens and is not yet suited to long prose or paragraphs. Fine-tuning was constrained by the available compute budget, so results should be treated as a proof of concept; a version built on larger models and more data is planned for production use. Support for the quality estimator (QE) has also been affected by dependency version conflicts.
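Because the model targets short, single sentences, long prose should be split before correction. A minimal pre-processing sketch follows; the regex splitter and the 64-token cap are assumptions based on the training setup described above, not part of the library's API.

```python
import re

MAX_TOKENS = 64  # assumed cap, matching the training sentence length

def split_for_correction(text: str) -> list[str]:
    """Split prose into sentences, dropping any too long for the model."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s and len(s.split()) <= MAX_TOKENS]

text = "He are moving here. What you has eat?"
print(split_for_correction(text))
# ['He are moving here.', 'What you has eat?']
```

Each item in the returned list can then be passed to the corrector individually, staying within the sentence-level scope the model was trained for.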

Health Check

Last commit: 2 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 19 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Travis Fischer (Founder of Agentic), and 5 more.

setfit by huggingface
0.3% · 3k stars
Few-shot learning framework for Sentence Transformers
created 3 years ago · updated 3 months ago