Chinese text error correction models
This repository provides a suite of Chinese text error correction models covering spelling and grammar correction, aimed at researchers and developers working on Chinese NLP. It reports state-of-the-art performance, citing multiple first-place finishes in major NLP competitions, and provides tools for data augmentation and model training.
How It Works
The project fine-tunes large language models, specifically Qwen variants, on extensive Chinese error correction datasets. It supports multiple inference backends, including Transformers and vLLM, for efficient deployment, and offers a grammar error augmentation tool that generates synthetic training data for custom model training.
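To make the idea of synthetic training data concrete, here is a minimal sketch of character-level error augmentation. It is not the repository's augmentation tool, whose interface is not described here. The names `CONFUSION`, `corrupt`, and `make_pairs` are hypothetical, the confusion table is a tiny illustrative sample, and the three noise operations (substitute, delete, duplicate) are an assumed simplification of what a real grammar error generator would do.

```python
import random

# Hypothetical confusion set of commonly confused characters.
# Illustrative only; a real tool would use a much larger table.
CONFUSION = {"的": ["地", "得"], "在": ["再"], "做": ["作"]}


def corrupt(sentence, rng, p=0.15):
    """Return a noisy copy of `sentence` with character-level errors.

    Each character is corrupted with probability `p` by one of three
    assumed operations: substitution, deletion, or duplication.
    """
    out = []
    for ch in sentence:
        if rng.random() >= p:
            out.append(ch)  # keep the character unchanged
            continue
        op = rng.choice(["substitute", "delete", "duplicate"])
        if op == "substitute" and ch in CONFUSION:
            out.append(rng.choice(CONFUSION[ch]))
        elif op == "delete":
            continue  # drop the character
        elif op == "duplicate":
            out.extend([ch, ch])
        else:
            out.append(ch)  # no confusable candidate: leave as-is
    return "".join(out)


def make_pairs(sentences, seed=0, p=0.15):
    """Build (noisy_source, clean_target) pairs for fine-tuning."""
    rng = random.Random(seed)  # seeded for reproducible data
    return [(corrupt(s, rng, p), s) for s in sentences]


pairs = make_pairs(["我在家做作业的时候很认真。"], seed=0)
```

Each pair keeps the clean sentence as the target and the corrupted copy as the input, which is the usual shape for supervised correction data. Seeding the generator keeps a generated dataset reproducible across runs.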
Quick Start & Requirements
Install with pip install transformers, or pip install vllm==0.8.5, and modelscope; see requirements.txt. Configuration involves modifying config.py to specify model checkpoints and the inference backend.
Highlighted Details
Maintenance & Community
The project is actively maintained by TW-NLP, with recent updates including new model releases and improved training procedures. Links to community resources are not explicitly provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. However, the models are available on Hugging Face and ModelScope, which suggests permissive use for research purposes. Compatibility with commercial use is not specified.
Limitations & Caveats
The README does not detail specific limitations, unsupported platforms, or known bugs. The project appears to be in active development, with new models released frequently, so breaking changes are possible.