NER-BERT-pytorch by lemonhu

PyTorch solution for named entity recognition

Created 7 years ago

450 stars

Top 66.8% on SourcePulse

Project Summary

This repository provides a PyTorch implementation for Named Entity Recognition (NER) using Google's BERT model, specifically tailored for Chinese text. It targets researchers and practitioners in Natural Language Processing (NLP) who need to identify entities such as Person, Organization, and Location in text. The project offers a clear path to fine-tuning BERT on custom NER datasets and reports a 94.62% F1 score on the MSRA test set.

How It Works

The project fine-tunes BERT on a sequence labeling task, relying on its contextual embeddings. Text is labeled with a BIO tagging scheme: each token is marked as the Beginning of an entity, Inside an entity, or Outside any entity, and the B and I tags also carry the entity type. The implementation uses the pytorch-pretrained-bert library to load and manage the model, and fine-tunes it on the provided MSRA dataset.
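To make the BIO scheme concrete, here is a minimal sketch of how a tag sequence can be decoded into entity spans. It is not taken from the repository: the function name bio_to_spans, the tag spelling (B-PER, I-LOC, and so on), and the per-character tokenization of the example sentence are all assumptions made for illustration.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collect (entity_type, start, end_exclusive) spans from BIO tags.

    Hypothetical helper for illustration; the tag format "B-XXX" / "I-XXX" / "O"
    is an assumption, not something confirmed by the repository.
    """
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        # An open entity continues only on an I- tag of the same type.
        if tag.startswith("I-") and current == tag[2:]:
            continue
        # Any other tag closes the entity that was open, if there was one.
        if current is not None:
            spans.append((current, start, i))
            start, current = None, None
        # A B- tag opens a new entity; "O" and stray I- tags are ignored.
        if tag.startswith("B-"):
            start, current = i, tag[2:]
    # Close an entity that runs to the end of the sequence.
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


# Example sentence, one tag per character (assumed tokenization).
chars = list("李明在北京工作")
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]

for label, start, end in bio_to_spans(tags):
    print(label, "".join(chars[start:end]))
```

In this sketch an I- tag that does not follow a matching B- or I- tag is treated like O; other decoders choose to start a new entity there instead.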

Quick Start & Requirements

Install via pip: pip install "tensorflow>=1.11.0" "torch>=0.4.1" "pytorch-pretrained-bert==0.4.0" tqdm apex (the version specifiers are quoted so the shell does not treat > as a redirect).
Requires Python 3.5+ and PyTorch 0.4.1/1.0.0.
TensorFlow is only needed for converting pre-trained models.
apex is recommended for mixed-precision and distributed training.
Download the pre-trained BERT base Chinese model, or convert it from the TensorFlow checkpoint.
Run python build_msra_dataset_tags.py to prepare the dataset.
Train using python train.py or python train.py --data_dir <path> --bert_model_dir <path> --model_dir <path>.
Evaluate using python evaluate.py.
Official BERT Chinese model: https://github.com/google-research/bert
pytorch-pretrained-bert: https://github.com/huggingface/pytorch-pretrained-BERT
apex: https://github.com/NVIDIA/apex

Highlighted Details

Achieved 94.62% F1 score on the MSRA test set without extensive hyperparameter tuning.
Reports per-entity-type scores (PER: 96.39%, ORG: 90.84%, LOC: 95.52%).
Supports both Chinese and English NER tasks.
Provides clear instructions for converting TensorFlow BERT checkpoints to PyTorch.
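For readers unfamiliar with how NER F1 figures like those above are usually computed, the following is a minimal sketch of entity-level precision, recall, and F1 under exact span matching. It is an assumption that the repository scores this way; the function name entity_prf and the example spans are hypothetical and are not results from the project.

```python
from typing import Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity_type, start, end_exclusive)


def entity_prf(gold: Set[Span], pred: Set[Span],
               label: Optional[str] = None) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match entity spans.

    If label is given, only spans of that entity type are scored.
    """
    if label is not None:
        gold = {s for s in gold if s[0] == label}
        pred = {s for s in pred if s[0] == label}
    # A prediction counts only if type and both boundaries match a gold span.
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans: the LOC prediction has the wrong right boundary.
gold_spans = {("PER", 0, 2), ("LOC", 3, 5), ("ORG", 8, 12)}
pred_spans = {("PER", 0, 2), ("LOC", 3, 4), ("ORG", 8, 12)}

overall = entity_prf(gold_spans, pred_spans)
per_only = entity_prf(gold_spans, pred_spans, label="PER")
loc_only = entity_prf(gold_spans, pred_spans, label="LOC")
```

Under exact matching, a span with one wrong boundary earns no partial credit, which is why the LOC example scores zero while the overall precision and recall are both 2/3.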

Maintenance & Community

The project is a personal implementation by lemonhu. There are no explicit mentions of active maintenance, community channels (like Discord/Slack), or a public roadmap.

Licensing & Compatibility

The repository does not explicitly state a license. However, it relies on libraries with permissive licenses (PyTorch, TensorFlow, Hugging Face's pytorch-pretrained-bert). Commercial use would require careful verification of any implicit licensing or dependencies.

Limitations & Caveats

The project targets old versions of PyTorch (0.4.1/1.0.0), Python 3.5+, and a pinned pytorch-pretrained-bert 0.4.0, which may make it hard to integrate with modern ML stacks. The lack of explicit licensing information is a significant caveat for commercial adoption.

Health Check

Last Commit: 2 years ago

Responsiveness: Inactive

Star History

1 star in the last 30 days