neuspell by neuspell

Neural spelling correction toolkit

created 5 years ago
696 stars

Top 49.9% on sourcepulse

Project Summary

NeuSpell is an open-source toolkit for context-sensitive English spelling correction, offering a suite of ten neural and non-neural models. It targets NLP practitioners and researchers seeking to improve text quality, with applications ranging from adversarial attack defense to enhancing OCR and grammar correction systems.

How It Works

NeuSpell trains neural models using synthetically generated spelling errors within context, reverse-engineered from isolated misspellings. It leverages rich contextual representations from models like BERT and ELMo, achieving higher correction rates than systems trained on random perturbations. The toolkit provides a unified interface for using these models.
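The idea of placing known isolated misspellings back into real sentences can be sketched as follows. This is a minimal illustration, not NeuSpell's actual noising code: the `MISSPELLINGS` table, the `noise_sentence` function, and the `replace_prob` parameter are all hypothetical names, and the table entries are invented for the example.

```python
import random

# Hypothetical lookup table: correct word -> misspellings seen in isolation.
# These entries are invented for illustration; they are not NeuSpell's data.
MISSPELLINGS = {
    "look": ["luk", "loook"],
    "forward": ["foward", "forwrd"],
    "receiving": ["receving", "recieving"],
}


def noise_sentence(sentence, replace_prob=0.3, rng=None):
    """Return (noisy_sentence, clean_sentence) as one synthetic training pair.

    Each whitespace-separated token that has an entry in MISSPELLINGS is
    replaced by one of its misspellings with probability `replace_prob`.
    All other tokens are left untouched, so the error appears in context.
    """
    if rng is None:
        rng = random.Random()
    noisy_tokens = []
    for token in sentence.split():
        candidates = MISSPELLINGS.get(token.lower())
        if candidates and rng.random() < replace_prob:
            noisy_tokens.append(rng.choice(candidates))
        else:
            noisy_tokens.append(token)
    return " ".join(noisy_tokens), sentence


if __name__ == "__main__":
    noisy, clean = noise_sentence(
        "I look forward to receiving your reply",
        replace_prob=0.5,
        rng=random.Random(13),
    )
    print(noisy)
    print(clean)
```

Because the token count is preserved, each noisy token stays aligned with its clean counterpart, which is convenient for training a sequence-labeling style corrector.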

Quick Start & Requirements

  • Installation: pip install neuspell (from PyPI), or pip install -e . from a clone of the repository (editable source install).
  • Dependencies: pip install -r extras-requirements.txt installs the optional extras (e.g., [elmo], [spacy]). The spaCy extra also requires a language model: python -m spacy download en_core_web_sm. The non-neural checkers (Aspell, Jamspell) have separate, manual installation steps.
  • Checkpoints: Pretrained models (450 MB to 1.23 GB) must be downloaded separately with neuspell.seq_modeling.downloads.download_pretrained_model("checkpoint_name"); pass "_all_" instead of a checkpoint name to download every model.
  • Resources: GPU recommended for neural models; CPU performance is significantly slower.
  • Docs: http://neuspell.github.io/
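Put together, the steps above might look like the sketch below. It assumes the `BertChecker` class with its `from_pretrained()` and `correct()` methods as shown in the project README; the wrapper function `correct_text` and its `checkpoint_name` parameter are hypothetical, and valid checkpoint names should be taken from the project documentation rather than from this example.

```python
def correct_text(text, checkpoint_name=None):
    """Correct one string with NeuSpell's BERT-based checker.

    The imports live inside the function so that this file can be imported
    on a machine where neuspell is not installed yet.
    """
    import neuspell
    from neuspell import BertChecker

    # Optional explicit download step. `checkpoint_name` is a placeholder:
    # pass a name listed in the project docs, or "_all_" for every model.
    if checkpoint_name is not None:
        neuspell.seq_modeling.downloads.download_pretrained_model(checkpoint_name)

    checker = BertChecker()
    checker.from_pretrained()  # loads the pretrained weights
    return checker.correct(text)


if __name__ == "__main__":
    # Requires neuspell and a downloaded checkpoint; a GPU is recommended.
    print(correct_text("I luk foward to receving your reply"))
```

Loading the checkpoint dominates the run time, so in real use the checker object would be created once and reused across many calls rather than rebuilt per string.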

Highlighted Details

  • Offers 10 spell checkers, including CNN-LSTM, SC-LSTM, BERT, and ELMo-enhanced variants.
  • Achieves up to 79.8% word correction rate on the BEA-60K dataset.
  • Supports fine-tuning on custom data and training new models using Hugging Face Transformers.
  • Includes utilities for generating synthetic training data via character and word-level noising strategies.
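To make the "word correction rate" figure concrete, here is a minimal sketch of how such a metric can be computed. It assumes the common definition (misspelled tokens that the system restores, divided by all misspelled tokens) and token-aligned inputs; the function name `word_correction_rate` is hypothetical and the project's own evaluation script may differ in detail.

```python
def word_correction_rate(clean, noisy, predicted):
    """Fraction of misspelled tokens that were restored to the clean token.

    All three arguments are token lists of equal length, aligned by position.
    A token counts as misspelled when the noisy token differs from the clean one.
    """
    if not (len(clean) == len(noisy) == len(predicted)):
        raise ValueError("token sequences must be aligned and of equal length")
    misspelled = [i for i, (c, n) in enumerate(zip(clean, noisy)) if c != n]
    if not misspelled:
        return 0.0  # nothing to correct; avoid division by zero
    fixed = sum(1 for i in misspelled if predicted[i] == clean[i])
    return fixed / len(misspelled)


if __name__ == "__main__":
    clean = "I look forward to receiving your reply".split()
    noisy = "I luk foward to receving your reply".split()
    predicted = "I look forward to receving your reply".split()
    # Three tokens are misspelled and two of them are fixed.
    print(f"{word_correction_rate(clean, noisy, predicted):.3f}")
```

Note that this measure ignores correct tokens that a system wrongly changes, so it is usually reported alongside overall word accuracy.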

Maintenance & Community

  • Last updated April 2021 with API additions and pip availability.
  • Contact: jsaimurali001 [at] gmail [dot] com.

Licensing & Compatibility

  • The primary license is not explicitly stated in the README, but the project is open-source.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • Support for languages other than English is marked as "Coming Soon."
  • The allennlp library required by the ELMo-based models is not installed automatically; using those models requires a source installation.
  • Some model checkpoints are large, requiring significant disk space.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull requests (30d): 1
  • Issues (30d): 1

Star History

  • 7 stars in the last 90 days
