Fine-tunes BERT for Chinese NER
This repository provides a fine-tuned BERT model for Chinese Named Entity Recognition (NER). It is intended for researchers and developers working on Chinese NLP tasks who need a robust NER solution, and offers a practical implementation for applying pre-trained BERT models to NER.
How It Works
The project fine-tunes a pre-trained BERT model on data annotated with the BIO tagging scheme. Given a pre-trained BERT checkpoint and a dataset formatted for NER, it trains the model to identify and classify named entities in Chinese text, leveraging BERT's contextual embeddings for strong performance.
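To make the BIO scheme concrete, the sketch below labels each Chinese character with `B-` (beginning of an entity), `I-` (inside an entity), or `O` (outside). The `bio_tags` helper and its span format are hypothetical illustrations, not part of this repository; the entity types (PER, LOC) are typical for Chinese NER datasets, but the actual label set depends on the data used.

```python
def bio_tags(chars, entities):
    """Assign BIO tags to a character sequence.

    `entities` is a list of (start, end, type) spans with `end` exclusive —
    a hypothetical helper format used only for this illustration.
    """
    tags = ["O"] * len(chars)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # remaining characters
    return tags

# "张三" (a person name) and "北京" (Beijing) tagged character by character:
chars = list("张三在北京工作")
print(list(zip(chars, bio_tags(chars, [(0, 2, "PER"), (3, 5, "LOC")]))))
# [('张', 'B-PER'), ('三', 'I-PER'), ('在', 'O'), ('北', 'B-LOC'),
#  ('京', 'I-LOC'), ('工', 'O'), ('作', 'O')]
```

The model learns to predict one such tag per character, so entity boundaries and types are recovered jointly at inference time.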
Quick Start & Requirements
python BERT_NER.py --data_dir=data/ --bert_config_file=checkpoint/bert_config.json --init_checkpoint=checkpoint/bert_model.ckpt --vocab_file=vocab.txt --output_dir=./output/result_dir/
Maintenance & Community
The project appears to be a personal exploration; a note in the repository suggests migrating to an ALBERT-based NER model instead. No community links or active-maintenance signals are present.
Licensing & Compatibility
The license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is undetermined.
Limitations & Caveats
The project is presented as an experimental attempt and does not appear to be actively maintained; the author recommends an ALBERT-based model instead. The README also does not specify the exact BERT release or the framework used (e.g., TensorFlow or PyTorch).