ChineseErrorCorrector by TW-NLP

Chinese text error correction models

Created 1 year ago

490 stars

Top 63.0% on SourcePulse

Project Summary

This repository provides a suite of Chinese text error correction models, including spelling and grammar correction, targeting researchers and developers working with Chinese NLP. It offers state-of-the-art performance, evidenced by multiple championship wins in major NLP competitions, and provides tools for data augmentation and model training.

How It Works

The project leverages large language models, specifically Qwen variants, fine-tuned on extensive Chinese error correction datasets. It supports various inference backends like Transformers and VLLM for efficient deployment and offers a grammar error augmentation tool to generate synthetic training data for custom model training.

Quick Start & Requirements

Installation: pip install transformers or pip install vllm==0.8.5 and modelscope.
Prerequisites: Python 3.10+, PyTorch. GPU is recommended for optimal performance.
Setup: Clone the repository and install dependencies via requirements.txt. Configuration involves modifying config.py to specify model checkpoints and inference backend.
Resources: Models range from 1.5B to 32B parameters.
Documentation: Official Quick Start

Highlighted Details

Achieved 1st place in 2024 CCL, 2023 NLPCC-NaCGEC, and 2022 FCGEC error correction tasks.
Offers models trained on up to 2 million error correction data points.
Supports multiple inference methods: Transformers, VLLM, and ModelScope.
Includes a grammar error augmentation tool for 14 types of grammatical errors.

Maintenance & Community

The project is actively maintained by TW-NLP, with recent updates including new model releases and improved training procedures. Links to community resources are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. However, the models are available on Hugging Face and ModelScope, suggesting a permissive usage for research purposes. Commercial use compatibility is not specified.

Limitations & Caveats

The README does not detail specific limitations, unsupported platforms, or known bugs. The project appears to be in active development, with newer models frequently released, which may imply potential for breaking changes.

ChineseErrorCorrector by TW-NLP

Explore Similar Projects

mend by eric-mitchell

fancy-nlp by boat-group

gpt-oss-recipes by huggingface

open_lm by mlfoundations

neuspell by neuspell

textgen by shibing624

texar-pytorch by asyml

KoELECTRA by monologg

Gramformer by PrithivirajDamodaran

fast-bert by appvision-ai

zero_nlp by yuanzhoulvpi2017

pycorrector by shibing624