Gramformer by PrithivirajDamodaran

Grammar correction framework for NLP text

created 4 years ago
1,542 stars

Top 27.5% on sourcepulse

Project Summary

Gramformer is a Python library designed to detect, highlight, and correct grammatical errors in natural language text. It is particularly useful for post-processing machine-generated text from sources like NMT, ASR, and text summarization, as well as for assisting human writers and integrating into custom text editors or messaging platforms.

How It Works

Gramformer combines techniques from recent grammar error correction research. It operates at the sentence level, routing text through one of three modules: correction, highlighting, or detection. An integrated quality estimator filters candidate corrections so that only high-quality outputs are surfaced. The project's novel contribution is the way it generates a dataset for grammar error correction, though current fine-tuning uses smaller models due to compute constraints.
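The filtering step described above (generate candidate corrections, score them, keep only the good ones) can be sketched as below. The scorer here is a stand-in fed with hypothetical scores for illustration; Gramformer's actual quality estimator is a learned model, not a lookup table.

```python
def filter_candidates(candidates, scorer, threshold=0.5):
    """Drop low-quality candidates and return the rest, best first."""
    kept = [c for c in candidates if scorer(c) >= threshold]
    return sorted(kept, key=scorer, reverse=True)

# Hypothetical quality scores for illustration only.
scores = {"He is moving here": 0.92, "He are moving here": 0.31}
print(filter_candidates(list(scores), scores.get))
# ['He is moving here']
```

In the real pipeline the candidates would come from the correction model's beam, and the threshold would be tuned against the estimator's score distribution.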

Quick Start & Requirements

  • Install via pip: pip install -U git+https://github.com/PrithivirajDamodaran/Gramformer.git
  • Requires Python. GPU usage is optional but recommended for performance.
  • Hugging Face Hub login may be required for model access: use notebook_login from huggingface_hub in a notebook, or run huggingface-cli login from the command line.
  • Official documentation and examples are available in the README.

Highlighted Details

  • Offers distinct interfaces for correction, highlighting, and detection (detector is in development).
  • get_edits method provides detailed edit information (type, original, corrected).
  • highlight method visually marks errors within the original text.
  • Fine-tuned on datasets derived from WikiEdits, C4, and PIE synthetic pairs.
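To illustrate how edit information and highlighting relate, the sketch below applies a list of edit tuples to mark an error inline. The tuple layout (edit type, original token with its span, replacement token with its span) and the inline markup are assumptions modeled on typical GEC tooling, not the library's guaranteed format; in practice use the output of get_edits and highlight directly.

```python
def highlight(tokens, edits):
    """Wrap edited tokens in inline tags showing type and suggestion.

    Each edit is (etype, orig, o_start, o_end, repl, r_start, r_end),
    with spans indexed into the original token list. Assumed layout,
    modeled on get_edits-style output.
    """
    out = list(tokens)
    for etype, orig, o_start, o_end, repl, _, _ in edits:
        marked = f"<a type='{etype}' edit='{repl}'>{orig}</a>"
        # Same-length slice assignment keeps later spans valid.
        out[o_start:o_end] = [marked] + [None] * (o_end - o_start - 1)
    return " ".join(t for t in out if t is not None)

tokens = "He are moving here".split()
edits = [("VERB:SVA", "are", 1, 2, "is", 1, 2)]
print(highlight(tokens, edits))
# He <a type='VERB:SVA' edit='is'>are</a> moving here
```

The edit type code (here a subject-verb agreement error) is what the get_edits interface exposes alongside the original and corrected spans.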

Maintenance & Community

The project is open to pull requests and collaboration. Further community engagement details (Discord/Slack, roadmap) are not explicitly provided in the README.

Licensing & Compatibility

Releases <= v1.0 are explicitly not intended for commercial usage. Stable releases > v1.0 are implied to be suitable for commercial use, but this should be verified.

Limitations & Caveats

The model is trained on sentences capped at 64 tokens and is not yet suited to long prose or paragraphs. Fine-tuning was constrained by the available compute budget, so results should be treated as a proof of concept; a version built on larger models and more data is planned for production use. Support for the quality estimator (QE) has also been affected by dependency version conflicts.
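Because the model targets short, single sentences, long prose should be split before correction. A minimal pre-processing sketch follows; the regex splitter and the 64-token cap are assumptions based on the training setup described above, not part of the library's API.

```python
import re

MAX_TOKENS = 64  # assumed cap, matching the training sentence length

def split_for_correction(text: str) -> list[str]:
    """Split prose into sentences, dropping any too long for the model."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s and len(s.split()) <= MAX_TOKENS]

text = "He are moving here. What you has eat?"
print(split_for_correction(text))
# ['He are moving here.', 'What you has eat?']
```

Each item in the returned list can then be passed to the corrector individually, staying within the sentence-level scope the model was trained for.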

Health Check

Last commit: 2 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 19 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Travis Fischer (Founder of Agentic), and 5 more.

setfit by huggingface
0.3% · 3k stars
Few-shot learning framework for Sentence Transformers
created 3 years ago · updated 3 months ago