ChineseNER by DSXiangLi

Advanced Chinese NER toolkit

Created 4 years ago
320 stars

Top 84.6% on SourcePulse

Project Summary

This repository offers a toolkit for Chinese Named Entity Recognition (NER), covering models that range from traditional BiLSTM-CRF to BERT and Transformer architectures. It targets NLP researchers and practitioners who need flexible solutions for Chinese text analysis, and it makes diverse NER techniques easier to implement and experiment with.

How It Works

The project implements several NER approaches: character-based, lexicon-enhanced, multi-task learning (MTL), and Transformer models. It also supports adversarial transfer learning and an MRC (Machine Reading Comprehension) formulation of NER. Data is converted to TFRecords using model-specific tokenization (WordPiece for BERT, pre-trained word vectors for the other models), and the toolkit includes data augmentation strategies to improve robustness.
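As a rough illustration of the character-based setup mentioned above, the sketch below converts entity spans into one BIO tag per character. The function name `spans_to_bio`, the sample sentence, and the `PER`/`LOC` label set are assumptions made for this example; the repository's own preprocessing scripts and tag scheme may differ.

```python
from typing import List, Tuple


def spans_to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, label) character spans (end exclusive) to BIO tags.

    Hypothetical helper for illustration; not taken from the repository.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters of the entity
    return tags


# Assumed sample: "李明" is a person, "北京" is a location.
sentence = "李明在北京工作"
entities = [(0, 2, "PER"), (3, 5, "LOC")]
bio_tags = spans_to_bio(sentence, entities)

for char, tag in zip(sentence, bio_tags):
    print(char, tag)
```

Tagging at the character level avoids depending on a Chinese word segmenter, which is one common reason character-based models are used as a baseline for Chinese NER.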

Quick Start & Requirements

  • Installation: Python-based; the environment configuration is given in requirement.txt.
  • Prerequisites: TensorFlow (implied by the Docker image tensorflow/serving_model:1.14.0), pre-trained models (download links are in the folder READMEs), and datasets (preprocessing scripts are in data). Some models also require pre-trained word vectors or BERT vocab files.
  • Setup: Download the pre-trained models and preprocess the data before training.
  • Links: Blog posts detailing model implementations and concepts are provided.
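Since a TensorFlow Serving image is among the prerequisites, a client would typically talk to the served model over TensorFlow Serving's REST predict endpoint. The sketch below only builds such a request and does not send it. The host and port `localhost:8501`, the model name `ner_model`, and the `tokens` input key are assumptions; the actual exported signature in this repository may expect different inputs (for example token ids).

```python
import json
import urllib.request


def build_predict_request(host: str, model_name: str, instances: list) -> urllib.request.Request:
    """Build a POST request for TensorFlow Serving's REST predict endpoint."""
    url = f"http://{host}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model name and input key, chosen only for illustration.
request = build_predict_request(
    "localhost:8501",
    "ner_model",
    [{"tokens": list("李明在北京工作")}],
)
print(request.full_url)

# With a server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())["predictions"]
```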

Highlighted Details

  • Supports a broad spectrum of NER models: BiLSTM-CRF, BERT variants, Transformer, lexicon-enhanced, and multi-task learning.
  • Incorporates advanced techniques like adversarial transfer learning and MRC for NER.
  • Offers data augmentation methods for improving model robustness.
  • Provides Docker images for TensorFlow Serving for streamlined inference deployment.
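To make the MRC formulation listed above more concrete: in that paradigm, each entity type is usually paired with a natural-language query, and the model predicts start and end positions of matching spans in the text. The sketch below builds such training examples. The queries, label names, and the `to_mrc_examples` helper are hypothetical and not taken from the repository.

```python
from typing import Dict, List, Tuple

# Hypothetical queries, one per entity type.
LABEL_QUERIES: Dict[str, str] = {
    "PER": "找出文本中的人名",  # "find the person names in the text"
    "LOC": "找出文本中的地名",  # "find the location names in the text"
}


def to_mrc_examples(
    text: str,
    spans: List[Tuple[int, int, str]],
    label_queries: Dict[str, str],
) -> List[dict]:
    """Turn one annotated sentence into one (query, context) example per label.

    Spans are (start, end, label) with end exclusive. Each example carries
    per-character 0/1 vectors marking where entities of that label start and end.
    """
    examples = []
    for label, query in label_queries.items():
        start_labels = [0] * len(text)
        end_labels = [0] * len(text)
        for start, end, span_label in spans:
            if span_label == label:
                start_labels[start] = 1
                end_labels[end - 1] = 1   # mark the last character of the entity
        examples.append({
            "label": label,
            "query": query,
            "context": text,
            "start_labels": start_labels,
            "end_labels": end_labels,
        })
    return examples


mrc_examples = to_mrc_examples(
    "李明在北京工作",
    [(0, 2, "PER"), (3, 5, "LOC")],
    LABEL_QUERIES,
)
for example in mrc_examples:
    print(example["label"], example["start_labels"], example["end_labels"])
```

One sentence yields as many examples as there are entity types, so this formulation trades extra computation for the ability to handle overlapping entities of different types.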

Maintenance & Community

The repository encourages community contributions via Pull Requests. No specific details on active maintainers, community channels, or a roadmap are provided in the README.

Licensing & Compatibility

The license type is not explicitly stated in the provided README content.

Limitations & Caveats

The code is noted as "not rigorously tested," indicating potential instability or bugs. The README does not detail specific limitations regarding unsupported platforms or known issues beyond this general testing caveat.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days
