fastHan by fastnlp

NLP toolkit for Chinese, like spaCy

created 5 years ago
756 stars

Top 46.9% on sourcepulse

Project Summary

fastHan is a BERT-based toolkit for Chinese Natural Language Processing that offers functionality similar to spaCy. It targets researchers and developers who need word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing on modern and classical Chinese text. It also supports Chinese Abstract Meaning Representation (AMR) parsing.

How It Works

fastHan utilizes a BERT-based joint model trained on multiple corpora. This multi-task approach allows it to handle various NLP tasks simultaneously, improving efficiency and performance. The integration of prompt technology in version 2.0 further enhances its capabilities.

Quick Start & Requirements

  • Install: run pip install fastHan, or clone the repository from GitHub and run python setup.py install.
  • Prerequisites: torch>=1.8.0, fastNLP>=1.0.0, transformers>=4.0.0, datasets==2.7.0, pandas==1.5.1, numpy==1.22.2. GPU recommended for performance.
  • Setup: Initial model loading automatically downloads parameters.
  • Docs: English README
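
Because several of the prerequisites above are pinned to exact versions, it can help to check an environment before installing. The following is a minimal sketch, not part of fastHan: the helper names (parse_version, satisfies, check_requirements) are hypothetical, the version parsing is deliberately simplified (pre-release and local suffixes are ignored), and only the Python standard library is used.

```python
from importlib import metadata

# Requirements copied from the Prerequisites bullet: (package, operator, version).
REQUIREMENTS = [
    ("torch", ">=", "1.8.0"),
    ("fastNLP", ">=", "1.0.0"),
    ("transformers", ">=", "4.0.0"),
    ("datasets", "==", "2.7.0"),
    ("pandas", "==", "1.5.1"),
    ("numpy", "==", "1.22.2"),
]


def parse_version(text):
    """Turn '1.13.1+cu117' into (1, 13, 1); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings under '>=' or '=='."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.8' and '1.8.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b if op == ">=" else a == b


def check_requirements(requirements=REQUIREMENTS):
    """Return a {package: status} report for the current environment."""
    report = {}
    for name, op, required in requirements:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if satisfies(installed, op, required):
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, need {op}{required}"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

Running the script prints one status line per package. Anything reported as missing or mismatched would need to be installed or pinned before fastHan is likely to import cleanly.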

Highlighted Details

  • Supports modern and classical Chinese word segmentation and POS tagging.
  • Includes Named Entity Recognition (NER) and Dependency Parsing.
  • Offers Chinese Abstract Meaning Representation (AMR) processing via FastCAMR.
  • Allows fine-tuning on custom datasets for specific tasks.
  • Supports user-defined dictionaries for word segmentation.
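
The tasks listed above are typically selected through a single model object. The sketch below assumes the call pattern described in the project's README (a FastHan class, a target keyword naming the task, and a set_device method). Treat those names as assumptions to verify against the installed version. The annotate wrapper and the TARGETS tuple are hypothetical helpers, and the classical-Chinese targets are left out because their exact names are not given here. Constructing the model will fail while the hosted parameters remain unavailable.

```python
# Task names assumed from the project's README; classical-Chinese targets are omitted.
TARGETS = ("CWS", "POS", "NER", "Parsing")


def annotate(sentences, target="CWS", device=None):
    """Run one fastHan task on a sentence or list of sentences (hypothetical wrapper)."""
    if target not in TARGETS:
        raise ValueError(f"unknown target {target!r}; expected one of {TARGETS}")

    # Imported lazily so the module loads even when fastHan is not installed.
    from fastHan import FastHan

    # Assumption: the first construction downloads the model parameters.
    model = FastHan()
    if device is not None:
        # Assumption: set_device accepts a torch-style string such as "cuda:0".
        model.set_device(device)
    return model(sentences, target=target)


if __name__ == "__main__":
    print(annotate("郭靖是金庸笔下的男主角。", target="NER"))
```

Validating the target before importing the library keeps a typo from triggering a slow model load.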

Maintenance & Community

  • A server outage caused the loss of the hosted model parameters, which has temporarily halted service. The team states that it is working on the next iteration.
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. Hosting on GitHub does not by itself grant reuse rights, so the licensing terms should be clarified with the maintainers before any commercial use.

Limitations & Caveats

  • Model parameters are currently unavailable due to a server failure.
  • Classical Chinese POS tagging fine-tuning is not supported due to sample constraints.
  • Dependency parsing is noted as slower than the other tasks when run on CPU.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 90 days
