Gramformer by PrithivirajDamodaran

Grammar correction framework for NLP text

Created 4 years ago
1,549 stars

Top 27.0% on SourcePulse

Project Summary

Gramformer is a Python library for detecting, highlighting, and correcting grammatical errors in natural-language text. It is particularly useful for post-processing machine-generated text, such as the output of neural machine translation (NMT), automatic speech recognition (ASR), and text summarization systems. It can also assist human writers or be integrated into custom text editors and messaging platforms.

How It Works

Gramformer exposes separate correction, highlighting, and detection interfaces to a family of grammar-correction algorithms drawn from recent research. It operates at the sentence level: each sentence is processed independently by the selected module. To keep corrections and highlights high quality, it includes a quality estimator that filters candidate corrections. The project's main novelty lies in its approach to generating a dataset for grammatical error correction; because of compute constraints, the models released so far are smaller ones.
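The sentence-level, filter-by-quality flow described above can be sketched as a generic generate-then-rerank loop. This is not Gramformer's internal code: generate and score are hypothetical stand-ins for a candidate generator and a quality estimator, the 0.5 threshold is arbitrary, and the regex sentence splitter is a naive assumption for illustration.

```python
import re
from typing import Callable, Iterable, List

# Naive splitter (assumption): break on whitespace that follows ., ! or ?.
# Real text needs a proper sentence tokenizer.
_SENT_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    return [s for s in _SENT_END.split(text.strip()) if s]


def correct_text(
    text: str,
    generate: Callable[[str], Iterable[str]],  # hypothetical: sentence -> candidate corrections
    score: Callable[[str, str], float],        # hypothetical QE: (source, candidate) -> quality
    threshold: float = 0.5,
) -> str:
    """Correct text one sentence at a time, keeping only candidates the scorer accepts."""
    out = []
    for src in split_sentences(text):
        # Rank candidates by estimated quality, highest first, and drop low scorers.
        scored = sorted(((score(src, c), c) for c in generate(src)), reverse=True)
        accepted = [c for q, c in scored if q >= threshold]
        # Fall back to the original sentence when no candidate passes the threshold.
        out.append(accepted[0] if accepted else src)
    return " ".join(out)
```

Falling back to the untouched sentence is a deliberate choice in this sketch: when the estimator rejects every candidate, leaving the input alone is safer than emitting a low-confidence rewrite. To try it with the library, generate could wrap gf.correct(s, max_candidates=3); a scoring function would still have to be supplied separately.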

Quick Start & Requirements

  • Install via pip: pip install -U git+https://github.com/PrithivirajDamodaran/Gramformer.git
  • Requires Python. GPU usage is optional but recommended for performance.
  • Logging in to the Hugging Face Hub may be required for model access: in a notebook, run from huggingface_hub import notebook_login followed by notebook_login(); in a terminal, run huggingface-cli login.
  • Official documentation and examples are available in the README.
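A minimal usage sketch, assuming the calls shown in the project README: Gramformer(models=1, use_gpu=False) builds the corrector, gf.correct(sentence, max_candidates=...) yields candidate strings, and gf.get_edits(original, corrected) returns a list of edits. The review helper and its result dictionary are hypothetical conveniences, and the library calls stay in comments so the function can be exercised without downloading a model.

```python
from typing import Dict, Iterable, List


def review(gf, sentences: Iterable[str], max_candidates: int = 1) -> List[Dict]:
    """Correct each sentence and collect its edits (hypothetical helper).

    `gf` is expected to behave like a Gramformer corrector instance.
    """
    rows = []
    for src in sentences:
        candidates = list(gf.correct(src, max_candidates=max_candidates))
        # The candidates may come back unordered (e.g. as a set), so with
        # max_candidates > 1 "first" is not necessarily "best".
        corrected = candidates[0] if candidates else src
        # Only ask for edits when the sentence actually changed.
        edits = gf.get_edits(src, corrected) if corrected != src else []
        rows.append({"input": src, "correction": corrected, "edits": edits})
    return rows


# Usage (requires the package and a model download):
# from gramformer import Gramformer
# gf = Gramformer(models=1, use_gpu=False)  # models=1 selects the corrector; use_gpu=True if available
# for row in review(gf, ["He are moving here."]):
#     print(row["correction"], row["edits"])
```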

Highlighted Details

  • Offers distinct interfaces for correction, highlighting, and detection (the detector is still in development).
  • The get_edits method returns detailed edit information (error type, original text, corrected text).
  • The highlight method marks errors inline within the original text.
  • Fine-tuned on datasets derived from WikiEdits, C4, and PIE synthetic pairs.
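To show how edit records like those from get_edits can drive inline highlighting, here is a small sketch. It assumes each edit is a 7-tuple (error_type, original, orig_start, orig_end, corrected, corr_start, corr_end); the source only says edits carry the type, original, and corrected text, so the offset fields are an assumption. The annotate function and its bracket markup are hypothetical and are not the format produced by Gramformer's highlight method.

```python
from typing import List, Tuple

# Assumed edit layout (offset fields are an assumption, not confirmed by the source):
# (error_type, original, orig_start, orig_end, corrected, corr_start, corr_end)
Edit = Tuple[str, str, int, int, str, int, int]


def annotate(sentence: str, edits: List[Edit]) -> str:
    """Mark each edit inline using token offsets into the original sentence.

    Offsets are applied to whitespace-split tokens here; a linguistic
    tokenizer would give punctuation its own positions.
    """
    tokens = sentence.split()
    # Apply edits right-to-left so earlier offsets stay valid; for equal starts,
    # replace the wider span first so an insertion lands in front of it.
    for etype, orig, start, end, corr, _, _ in sorted(
        edits, key=lambda e: (e[2], e[3]), reverse=True
    ):
        if start == end:   # insertion: empty span in the original
            marker = f"[+{corr} ({etype})]"
        elif not corr:     # deletion: nothing replaces the span
            marker = f"[-{orig} ({etype})]"
        else:              # replacement
            marker = f"[{orig} -> {corr} ({etype})]"
        tokens[start:end] = [marker]
    return " ".join(tokens)
```

For example, annotate("Matt like fish", [("VERB:SVA", "like", 1, 2, "likes", 1, 2)]) produces "Matt [like -> likes (VERB:SVA)] fish".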

Maintenance & Community

The project welcomes pull requests and collaboration. The README does not list community channels (such as Discord or Slack) or a roadmap.

Licensing & Compatibility

Releases up to and including v1.0 are explicitly not intended for commercial use. Stable releases after v1.0 are implied to be suitable for commercial use, but verify the license terms before relying on this.

Limitations & Caveats

The model was trained on sentences with a maximum length of 64 and is not yet suitable for long prose or whole paragraphs. Because fine-tuning was limited by compute budget, results should be treated as a proof of concept; a version built on larger models and more data is planned for production use. Support for the quality estimator (QE) has been affected by version conflicts.
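Given the 64-length training limit, a caller may want to keep over-long sentences away from the model. The sketch below is a hypothetical guard: the source does not state the unit of the limit, so counting whitespace-separated words is an assumption, and because subword tokenizers usually produce more tokens than words, a lower max_words gives a safety margin. The correct argument is a hypothetical wrapper around a single model call.

```python
from typing import Callable, List, Tuple


def correct_within_budget(
    sentences: List[str],
    correct: Callable[[str], str],  # hypothetical: wraps the model call for one sentence
    max_words: int = 64,            # assumed unit: whitespace-separated words
) -> Tuple[List[str], List[int]]:
    """Correct sentences within the length budget; pass longer ones through unchanged.

    Returns the output sentences and the indices of the ones that were skipped.
    """
    out, skipped = [], []
    for i, s in enumerate(sentences):
        if len(s.split()) > max_words:
            out.append(s)      # leave over-long input untouched
            skipped.append(i)
        else:
            out.append(correct(s))
    return out, skipped
```

Passing long sentences through rather than truncating them is the design choice here: truncation would silently drop text, while the returned indices let the caller decide how to handle the skipped sentences.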

Health Check
Last Commit: 2 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Elvis Saravia (Founder of DAIR.AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 3 more.

nlp-library by mihail911

0.1%
1k
NLP papers for practitioners
Created 8 years ago
Updated 5 years ago
Starred by Luis Capelo (Cofounder of Lightning AI), Eugene Yan (AI Scientist at AWS), and 14 more.

text by pytorch

0.0%
4k
PyTorch library for NLP tasks
Created 8 years ago
Updated 1 week ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA).

pycorrector by shibing624

0.2%
6k
Toolkit for text error correction, supports multiple models for Chinese
Created 7 years ago
Updated 1 week ago