fastHan by fastnlp

NLP toolkit for Chinese, like spaCy

Created 6 years ago

760 stars

Top 45.8% on SourcePulse

Project Summary

fastHan is a BERT-based toolkit for Chinese Natural Language Processing with a spaCy-like interface. It targets researchers and developers who need word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing for Chinese text, with segmentation and POS tagging also covering classical Chinese, as well as Chinese Abstract Meaning Representation (AMR) parsing.

How It Works

fastHan uses a BERT-based joint model trained on multiple corpora. Because the tasks share one multi-task model, a single loaded model can serve several NLP tasks instead of one model per task, which the project presents as a gain in both efficiency and accuracy. Version 2.0 additionally integrates prompt-based techniques.

Quick Start & Requirements

Install: pip install fastHan or clone from GitHub and run python setup.py install.
Prerequisites: torch>=1.8.0, fastNLP>=1.0.0, transformers>=4.0.0, datasets==2.7.0, pandas==1.5.1, numpy==1.22.2. GPU recommended for performance.
Setup: Initial model loading automatically downloads parameters.
Docs: English README
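The pinned versions above are easy to get wrong in an existing environment, so a quick check before installing can save time. The following is a minimal sketch that compares installed package versions against the constraints listed under Prerequisites. It uses only the standard library; the helper names (`version_tuple`, `satisfies`, `check_environment`) are illustrative and not part of fastHan, and the version comparison is a simplified numeric one rather than a full PEP 440 implementation.

```python
import re
from importlib import metadata

# Constraints copied from the Prerequisites line above.
REQUIREMENTS = {
    "torch": (">=", "1.8.0"),
    "fastNLP": (">=", "1.0.0"),
    "transformers": (">=", "4.0.0"),
    "datasets": ("==", "2.7.0"),
    "pandas": ("==", "1.5.1"),
    "numpy": ("==", "1.22.2"),
}


def version_tuple(version):
    """Keep the leading digits of each dot-separated part, e.g. '1.13.1+cu117' -> (1, 13, 1).

    Parsing stops at the first part that has no leading digits.
    """
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, op, required):
    """Simplified check for the two operators used above: '>=' and '=='."""
    have, need = version_tuple(installed), version_tuple(required)
    # Pad with zeros so that (1, 8) compares equal to (1, 8, 0).
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need if op == ">=" else have == need


def check_environment(requirements=REQUIREMENTS):
    """Return a {package: status} report for every listed requirement."""
    report = {}
    for name, (op, required) in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = satisfies(installed, op, required)
        report[name] = "ok" if ok else f"found {installed}, need {op}{required}"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

Running the script prints one line per package, marking each as ok, missing, or mismatched. Because three of the dependencies are pinned with `==`, a fresh virtual environment is likely the least painful way to satisfy all of them at once.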

Highlighted Details

Supports modern and classical Chinese word segmentation and POS tagging.
Includes Named Entity Recognition (NER) and Dependency Parsing.
Offers Chinese Abstract Meaning Representation (AMR) processing via FastCAMR.
Allows fine-tuning on custom datasets for specific tasks.
Supports user-defined dictionaries for word segmentation.
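To make the "one model, several tasks" idea concrete, here is a small sketch of how a caller might request multiple analyses of the same sentence. It assumes the calling convention `model(sentence, target=...)` with target names such as "CWS", "POS", "NER", and "Parsing"; these are recalled from the project's README rather than stated on this page, so verify them before relying on them. `run_tasks` and `StubModel` are hypothetical helpers, and the stub exists only so the sketch runs without downloading model parameters.

```python
def run_tasks(model, sentence, targets=("CWS", "POS", "NER", "Parsing")):
    """Call model(sentence, target=...) once per task and collect the results."""
    return {target: model(sentence, target=target) for target in targets}


class StubModel:
    """Hypothetical stand-in for a loaded fastHan model.

    It splits the sentence into single characters for every target,
    which is NOT real segmentation, tagging, or parsing.
    """

    def __call__(self, sentence, target="CWS"):
        return [list(sentence)]


# With fastHan installed and its parameters downloadable, the stand-in would
# be swapped for the real model (assumed API, check the README):
#     from fastHan import FastHan
#     model = FastHan()
model = StubModel()

results = run_tasks(model, "郭靖是金庸笔下的男主角。", targets=("CWS", "NER"))
for target, output in results.items():
    print(target, output)
```

Keeping the model behind a plain callable like this also makes it easy to test downstream code while the real parameters are unavailable.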

Maintenance & Community

The project experienced a server outage causing parameter loss, temporarily halting service. The team is working on the next iteration.
No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Hosting on GitHub does not by itself grant open-source rights, so check the repository for a license file and seek clarification from the maintainers before any commercial use.

Limitations & Caveats

Model parameters are currently unavailable due to a server failure.
Classical Chinese POS tagging fine-tuning is not supported due to sample constraints.
Dependency parsing is noted as slower than the other tasks when run on CPU.

Health Check

Last Commit: 2 years ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0

Star History

0 stars in the last 30 days

Explore Similar Projects

Starred by Lysandre Debut (Chief Open-Source Officer at Hugging Face).

bert-japanese by cl-tohoku

Pretrained BERT models for Japanese text

Created 6 years ago

Updated 1 year ago

awesome-chinese-ner by taishan1994

Resource list for Chinese NER

Created 3 years ago

Updated 7 months ago

nlp-cheat-sheet-python by janlukasschroeder

A Python NLP cheat sheet covering core concepts and tools

Created 6 years ago

Updated 3 years ago

nlp-paper by changwookjun

Created 6 years ago

Updated 3 weeks ago

pororo by kakaobrain

NLP SDK for natural language and speech processing tasks

Created 5 years ago

Updated 3 years ago

Chinese-XLNet by ymcui

Chinese XLNet pre-trained models for NLP tasks

Created 6 years ago

Updated 7 months ago

KoBERT by SKTBrain

Korean BERT for language tasks

Created 6 years ago

Updated 8 months ago

zero_nlp by yuanzhoulvpi2017

NLP solution for Chinese language models, data, training, and inference

Created 3 years ago

Updated 6 months ago

Synonyms by chatopera

NLP tools for chatbot-like applications

Created 8 years ago

Updated 3 weeks ago

Chinese-BERT-wwm by ymcui

Pre-trained language models for Chinese NLP tasks

Created 6 years ago

Updated 7 months ago

Awesome-Chinese-NLP by crownpku

Chinese NLP resource list

Created 8 years ago

Updated 2 years ago

Starred by Lei Xu (Cofounder of LanceDB) and Binyuan Hui (Research Scientist at Alibaba Qwen).

HanLP by hankcs

Multilingual NLP library for research/industry, built on PyTorch and TensorFlow

Created 11 years ago

Updated 3 months ago
