BERT Fine-Tuning for Named Entity Recognition
This repository provides a fine-tuned BERT model for Named Entity Recognition (NER) on the CoNLL-2003 dataset. It's designed for researchers and practitioners looking to implement and experiment with BERT for sequence labeling tasks, offering a cleaner, updated version of an earlier implementation.
How It Works
The project fine-tunes Google's BERT on the CoNLL-2003 dataset for NER. The pipeline covers data preprocessing and the design of the output layer, and the author suggests modifying the CRF or softmax layer for potential performance gains. A cased BERT model is recommended for better accuracy, in line with findings from Google's research.
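As a rough illustration of the setup (not the repository's actual code; the function name `ner_logits` and the TF 1.x style are assumptions based on Google's BERT release), the encoder's per-token output is projected to the nine CoNLL-2003 BIO tag logits, which then feed a softmax or CRF loss:

```python
# Illustrative sketch only (TF 1.x, mirroring Google's BERT code style);
# the names here are assumptions, not the repo's API.
import tensorflow as tf

NUM_LABELS = 9  # CoNLL-2003 BIO tags: O + B/I x {PER, ORG, LOC, MISC}

def ner_logits(sequence_output, num_labels=NUM_LABELS):
    """Project per-token BERT output [batch, max_len, hidden] to tag logits."""
    hidden_size = sequence_output.shape[-1].value
    output_weights = tf.get_variable(
        "output_weights", [num_labels, hidden_size],
        initializer=tf.truncated_normal_initializer(stddev=0.02))
    output_bias = tf.get_variable(
        "output_bias", [num_labels], initializer=tf.zeros_initializer())
    flat = tf.reshape(sequence_output, [-1, hidden_size])
    logits = tf.matmul(flat, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    shape = tf.shape(sequence_output)  # dynamic [batch, max_len, hidden]
    return tf.reshape(logits, [shape[0], shape[1], num_labels])
```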
Quick Start & Requirements
After downloading the pretrained BERT weights and the CoNLL-2003 dataset, run:

```bash
bash run_ner.sh
```

The repository includes the `conlleval.pl` script for evaluation.
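The README does not show the evaluation call, but `conlleval.pl` is the standard CoNLL scorer: it reads whitespace-separated `token gold-tag predicted-tag` lines, one token per line, with a blank line between sentences. A minimal wrapper (the helper names here are hypothetical) could look like:

```python
# Hypothetical wrapper around the standard conlleval.pl scorer; the
# input format (token gold pred, blank line between sentences) is the
# script's documented convention, the helper names are illustrative.
import subprocess

def score_with_conlleval(sentences, path="predictions.txt"):
    """sentences: list of sentences, each a list of (token, gold, pred)."""
    with open(path, "w") as f:
        for sentence in sentences:
            for token, gold, pred in sentence:
                f.write(f"{token} {gold} {pred}\n")
            f.write("\n")  # blank line marks the sentence boundary
    with open(path) as f:
        result = subprocess.run(["perl", "conlleval.pl"],
                                stdin=f, capture_output=True, text=True)
    return result.stdout  # per-type precision/recall/F1 report

print(score_with_conlleval([[("EU", "B-ORG", "B-ORG"),
                             ("rejects", "O", "O"),
                             ("German", "B-MISC", "B-MISC")]]))
```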
Highlighted Details
The `crf_layer` and `softmax_layer` are the suggested extension points for further tuning (see the sketch below).
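For context on what such a modification might involve, here is a rough CRF output layer in TF 1.x using the stock `tf.contrib.crf` API; it is illustrative and not necessarily the repo's `crf_layer`:

```python
# Rough sketch of a CRF output layer (TF 1.x); uses the standard
# tf.contrib.crf API and is not necessarily the repo's crf_layer.
import tensorflow as tf

def crf_layer(logits, label_ids, seq_lengths):
    """logits: [batch, max_len, num_labels]; label_ids: [batch, max_len]."""
    log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
        inputs=logits, tag_indices=label_ids, sequence_lengths=seq_lengths)
    loss = tf.reduce_mean(-log_likelihood)  # batch-mean negative log-likelihood
    # Viterbi decoding returns the highest-scoring tag sequence per example.
    pred_ids, _ = tf.contrib.crf.crf_decode(logits, transition_params,
                                            seq_lengths)
    return loss, pred_ids
```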
Maintenance & Community
The repository appears to be a personal project with no explicit mention of active maintenance, community channels, or notable contributors beyond the author.
Licensing & Compatibility
The README does not state a license. The project builds on Google's BERT, whose reference implementation is released under Apache 2.0. Commercial use would additionally depend on the terms attached to the pretrained BERT weights and to the CoNLL-2003 dataset, whose English portion is derived from Reuters news data with its own usage restrictions.
Limitations & Caveats
Pretrained BERT weights and the CoNLL-2003 dataset must be downloaded separately. Matching reported results may require tuning beyond the default hyperparameters, as the author suggests. The README specifies neither a Python version nor dependency management beyond the core BERT components.