ChineseNER by DSXiangLi

Advanced Chinese NER toolkit

Created 4 years ago

320 stars

Top 85.0% on SourcePulse

Project Summary

This repository offers a comprehensive toolkit for Chinese Named Entity Recognition (NER), featuring a wide array of models from traditional BiLSTM-CRF to advanced BERT and Transformer architectures. It caters to NLP researchers and practitioners seeking flexible, state-of-the-art solutions for Chinese text analysis, simplifying the implementation and experimentation of diverse NER techniques.

How It Works

The project implements various NER approaches, including character-based, lexicon-enhanced, multi-task learning (MTL), and Transformer models. It supports techniques like adversarial transfer learning and an MRC (Machine Reading Comprehension) paradigm for NER. Data is converted to TFRecords, using WordPiece tokenization for BERT-based models and pre-trained word-vector vocabularies for the other models, and the pipeline includes data augmentation strategies to improve robustness.
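To make character-level tagging and the MRC framing more concrete, here is a minimal Python sketch that is not taken from the repository. It assumes BIO tags, decodes them into entity spans, and then rewrites a sentence as one MRC example per entity type. The functions `bio_to_spans` and `to_mrc_examples` and the `QUERIES` text are hypothetical illustrations, not the project's code.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end_inclusive, entity_type)

def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode character-level BIO tags into (start, end, type) spans."""
    spans: List[Span] = []
    start, etype = -1, None  # type: int, Optional[str]
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if etype is not None:  # close the previous entity
                spans.append((start, i - 1, etype))
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and etype == tag[2:]:
            continue  # extend the currently open entity
        else:
            # "O", or an I- tag that does not continue the open entity
            # (this sketch simply drops such ill-formed I- tags)
            if etype is not None:
                spans.append((start, i - 1, etype))
            start, etype = -1, None
    if etype is not None:  # entity running to the end of the sentence
        spans.append((start, len(tags) - 1, etype))
    return spans

# Hypothetical natural-language queries, one per entity type (assumption).
QUERIES: Dict[str, str] = {
    "PER": "Find person names in the text",
    "LOC": "Find locations in the text",
}

def to_mrc_examples(text: str, tags: List[str]) -> List[Dict]:
    """Turn one BIO-tagged sentence into one MRC example per entity type."""
    spans = bio_to_spans(tags)
    examples = []
    for etype, query in QUERIES.items():
        examples.append({
            "query": query,
            "context": text,
            "start_positions": [s for s, _, t in spans if t == etype],
            "end_positions": [e for _, e, t in spans if t == etype],
        })
    return examples

if __name__ == "__main__":
    text = "张三在北京工作"  # "Zhang San works in Beijing"
    tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
    print(bio_to_spans(tags))
    for ex in to_mrc_examples(text, tags):
        print(ex)
```

In the MRC framing, the model predicts start and end positions conditioned on the query, so each sentence yields one example per entity type; this also lets overlapping entities of different types coexist, which a single flat BIO sequence cannot express.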

Quick Start & Requirements

Installation: Primarily Python-based; environment configuration is detailed in requirement.txt.
Prerequisites: TensorFlow (implied by the Docker image tensorflow/serving_model:1.14.0), pre-trained models (download links are in each folder's README), and datasets (preprocessing scripts are in the data folder). Specific models may also require pre-trained word vectors or BERT vocab files.
Setup: Requires downloading models and preprocessing data.
Links: Blog posts detailing model implementations and concepts are provided.
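Several models depend on a BERT vocab file. As a rough sketch of how Chinese text can be mapped to BERT input ids, the following assumes the standard `vocab.txt` layout (one token per line, with the id equal to the line index) and per-character tokenization, which is how BERT's tokenizer treats CJK characters. `load_vocab` and `encode_chars` are hypothetical helpers, not the repository's API, and real WordPiece would additionally split Latin-script words into subword pieces.

```python
from typing import Dict, Iterable, List

def load_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Build token -> id from vocab.txt lines (id = line index)."""
    vocab: Dict[str, int] = {}
    for idx, line in enumerate(lines):
        token = line.strip()
        if token:
            vocab[token] = idx
    return vocab

def encode_chars(text: str, vocab: Dict[str, int], max_len: int = 16) -> List[int]:
    """Encode text one character per token, BERT style.

    Adds [CLS]/[SEP], falls back to [UNK] for unknown characters,
    truncates to max_len and pads with [PAD]. Assumes these four
    special tokens exist in the vocab (true for standard BERT vocabs).
    """
    chars = list(text)[: max_len - 2]  # leave room for [CLS] and [SEP]
    tokens = ["[CLS]"] + chars + ["[SEP]"]
    unk = vocab["[UNK]"]
    ids = [vocab.get(tok, unk) for tok in tokens]
    ids += [vocab["[PAD]"]] * (max_len - len(ids))
    return ids

if __name__ == "__main__":
    # Tiny in-memory stand-in for a real vocab.txt (assumption).
    vocab = load_vocab(["[PAD]", "[UNK]", "[CLS]", "[SEP]", "张", "三", "北", "京"])
    print(encode_chars("张三在北京", vocab, max_len=10))
```

Keeping one token per character means character-level NER labels line up one-to-one with the inputs, apart from the positions taken by [CLS], [SEP] and padding, which need their own placeholder labels.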

Highlighted Details

Supports a broad spectrum of NER models: BiLSTM-CRF, BERT variants, Transformer, lexicon-enhanced, and multi-task learning.
Incorporates advanced techniques like adversarial transfer learning and MRC for NER.
Offers data augmentation methods for improving model robustness.
Provides TensorFlow Serving Docker images to streamline inference deployment.
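For the BiLSTM-CRF family, decoding selects the tag sequence that maximizes the sum of emission and transition scores. The following pure-Python Viterbi sketch illustrates that step under simplifying assumptions: scores are already computed, and there are no separate start/end transition scores. `viterbi_decode` here is a hypothetical helper, not the repository's code; TensorFlow 1.x offers a comparable routine in `tf.contrib.crf`.

```python
from typing import List, Tuple

def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]: score of tag j at position t (e.g. BiLSTM output).
    transitions[i][j]: score of moving from tag i to tag j (learned by a CRF).
    Simplification: no start/end transition scores.
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score ending in each tag at t = 0
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag i for landing on tag j at position t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Trace back from the best final tag
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[best_last]

if __name__ == "__main__":
    # Tags: 0 = O, 1 = B, 2 = I. Forbid O -> I with a large negative score.
    trans = [[0.0, 0.0, -100.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    emis = [[0.0, 2.0, 0.0], [0.0, 0.0, 1.0], [2.0, 0.0, 0.0]]
    print(viterbi_decode(emis, trans))
```

Assigning a large negative transition score to invalid moves such as O to I is a common way to keep the decoded output well-formed BIO, even when the emission scores alone would favor an invalid tag.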

Maintenance & Community

The repository encourages community contributions via Pull Requests. No specific details on active maintainers, community channels, or a roadmap are provided in the README.

Licensing & Compatibility

The license type is not explicitly stated in the provided README content.

Limitations & Caveats

The code is noted as "not rigorously tested," indicating potential instability or bugs. The README does not detail specific limitations regarding unsupported platforms or known issues beyond this general testing caveat.


Explore Similar Projects

mint by dpressel

universal-ner by universal-ner

fancy-nlp by boat-group

tner by asahi417

transformers-php by CodeWithKyrian

NER-BERT-pytorch by lemonhu

NER-Chinese by EOA-AILab

albert-chinese-ner by ProHiryu

transformers-tutorials by abhimishra91

mt-dnn by namisan

BERT-BiLSTM-CRF-NER by macanv

openvino by openvinotoolkit