ChineseErrorCorrector  by TW-NLP

Chinese text error correction models

created 1 year ago
337 stars

Top 82.8% on sourcepulse

GitHubView on GitHub
Project Summary

This repository provides a suite of Chinese text error correction models, including spelling and grammar correction, targeting researchers and developers working with Chinese NLP. It offers state-of-the-art performance, evidenced by multiple championship wins in major NLP competitions, and provides tools for data augmentation and model training.

How It Works

The project leverages large language models, specifically Qwen variants, fine-tuned on extensive Chinese error correction datasets. It supports various inference backends like Transformers and VLLM for efficient deployment and offers a grammar error augmentation tool to generate synthetic training data for custom model training.

Quick Start & Requirements

  • Installation: pip install transformers or pip install vllm==0.8.5 and modelscope.
  • Prerequisites: Python 3.10+, PyTorch. GPU is recommended for optimal performance.
  • Setup: Clone the repository and install dependencies via requirements.txt. Configuration involves modifying config.py to specify model checkpoints and inference backend.
  • Resources: Models range from 1.5B to 32B parameters.
  • Documentation: Official Quick Start

Highlighted Details

  • Achieved 1st place in 2024 CCL, 2023 NLPCC-NaCGEC, and 2022 FCGEC error correction tasks.
  • Offers models trained on up to 2 million error correction data points.
  • Supports multiple inference methods: Transformers, VLLM, and ModelScope.
  • Includes a grammar error augmentation tool for 14 types of grammatical errors.

Maintenance & Community

The project is actively maintained by TW-NLP, with recent updates including new model releases and improved training procedures. Links to community resources are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. However, the models are available on Hugging Face and ModelScope, suggesting a permissive usage for research purposes. Commercial use compatibility is not specified.

Limitations & Caveats

The README does not detail specific limitations, unsupported platforms, or known bugs. The project appears to be in active development, with newer models frequently released, which may imply potential for breaking changes.

Health Check
Last commit

1 week ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
4
Star History
209 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.