bert_ner  by Kyubyong

PyTorch code for Named Entity Recognition (NER) using pretrained BERT

created 6 years ago
282 stars

Top 93.5% on sourcepulse

Project Summary

This repository provides a PyTorch implementation for Named Entity Recognition (NER) using BERT, aiming to reproduce research findings that BERT models achieve strong NER performance without complex prediction-conditioned algorithms. It is targeted at researchers and practitioners interested in applying BERT to sequence labeling tasks.

How It Works

The project implements two approaches: a feature-based method that feeds BERT embeddings into additional RNN layers, and a fine-tuning approach that updates BERT's weights directly. This allows a direct comparison between using BERT as a fixed feature extractor and fine-tuning it for NER.
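
The difference between the two modes can be sketched in a few lines of PyTorch. This is a minimal illustration and not the repository's code: `ToyEncoder` is a hypothetical stand-in for pretrained BERT (a plain embedding layer, so the example runs without downloading weights), and the `Tagger` class, its argument names, and the sizes are assumptions chosen for the example.

```python
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for pretrained BERT: token ids -> hidden states."""

    def __init__(self, vocab_size=100, hidden=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)

    def forward(self, token_ids):
        return self.embed(token_ids)  # (batch, seq_len, hidden)


class Tagger(nn.Module):
    """Token classifier on top of an encoder (illustrative, not the repo's model)."""

    def __init__(self, encoder, hidden=16, n_tags=9, top_rnns=False, finetuning=False):
        super().__init__()
        self.encoder = encoder
        self.finetuning = finetuning
        # Feature-based mode: freeze the encoder so only the layers on top train.
        for param in self.encoder.parameters():
            param.requires_grad = finetuning
        # Optional bidirectional LSTM on top of the encoder output.
        self.rnn = (
            nn.LSTM(hidden, hidden // 2, batch_first=True, bidirectional=True)
            if top_rnns
            else None
        )
        self.classifier = nn.Linear(hidden, n_tags)

    def forward(self, token_ids):
        if self.finetuning:
            states = self.encoder(token_ids)
        else:
            # No gradient tracking through the frozen encoder.
            with torch.no_grad():
                states = self.encoder(token_ids)
        if self.rnn is not None:
            states, _ = self.rnn(states)
        return self.classifier(states)  # (batch, seq_len, n_tags)


def trainable_parameter_names(model):
    """Names of the parameters an optimizer would actually update."""
    return [name for name, p in model.named_parameters() if p.requires_grad]


feature_based = Tagger(ToyEncoder(), top_rnns=True, finetuning=False)
fine_tuned = Tagger(ToyEncoder(), finetuning=True)
```

In the feature-based model, `trainable_parameter_names` lists only the LSTM and classifier weights; in the fine-tuned model it also includes the encoder's parameters.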

Quick Start & Requirements

  • Install: pip install torch==1.0 pytorch_pretrained_bert==0.6.1 "numpy>=1.15.4" (PyTorch is published on PyPI as torch; the quotes keep the shell from treating >= as a redirect)
  • Prerequisites: Python >= 3.6.
  • Dataset: Download CoNLL 2003 NER dataset using bash download.sh.
  • Training:
    • Feature-based: python train.py --logdir checkpoints/feature --batch_size 128 --top_rnns --lr 1e-4 --n_epochs 30
    • Fine-tuning: python train.py --logdir checkpoints/finetuning --finetuning --batch_size 32 --lr 5e-5 --n_epochs 3
  • Links: GitHub Repo
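
To make the flags in the two training commands easier to compare, here is a rough sketch of how a script could parse them with `argparse`. The flag names are taken from the commands above. The types, default values, and the `build_parser` and `parse_command` helpers are assumptions made for illustration, and the repository's actual `train.py` may define them differently.

```python
import argparse
import shlex


def build_parser():
    """Hypothetical parser mirroring the flags seen in the training commands."""
    parser = argparse.ArgumentParser(description="Illustrative NER training CLI")
    parser.add_argument("--logdir", type=str, default="checkpoints/run")  # assumed default
    parser.add_argument("--batch_size", type=int, default=32)  # assumed default
    parser.add_argument("--lr", type=float, default=1e-4)  # assumed default
    parser.add_argument("--n_epochs", type=int, default=3)  # assumed default
    # Boolean switches: absent means False, present means True.
    parser.add_argument("--top_rnns", action="store_true")
    parser.add_argument("--finetuning", action="store_true")
    return parser


def parse_command(command):
    """Parse a full 'python train.py ...' command string into a namespace."""
    argv = shlex.split(command)[2:]  # drop 'python' and 'train.py'
    return build_parser().parse_args(argv)


feature_args = parse_command(
    "python train.py --logdir checkpoints/feature --batch_size 128 "
    "--top_rnns --lr 1e-4 --n_epochs 30"
)
finetune_args = parse_command(
    "python train.py --logdir checkpoints/finetuning --finetuning "
    "--batch_size 32 --lr 5e-5 --n_epochs 3"
)
```

Read side by side, the feature-based run uses a larger batch, a higher learning rate, and ten times as many epochs, while the fine-tuning run switches on `--finetuning` and omits `--top_rnns`.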

Highlighted Details

  • Achieves F1 scores up to 0.95 on the CoNLL 2003 validation dataset via fine-tuning.
  • Reports comparative results for both feature-based and fine-tuning BERT approaches.
  • Classification results are saved in the checkpoints directory.
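
As a reminder of what an F1 score measures, the sketch below computes precision, recall, and F1 over predicted tags. It scores individual tokens and ignores the `O` (outside) tag. That is an assumption for illustration, since this summary does not say whether the repository scores tokens or whole entity spans. The function name and the sample tags are likewise hypothetical.

```python
def precision_recall_f1(true_tags, pred_tags, outside_tag="O"):
    """Token-level precision, recall, and F1 over non-outside tags."""
    if len(true_tags) != len(pred_tags):
        raise ValueError("tag sequences must have the same length")
    # A prediction counts as correct when it matches the gold tag and is an entity tag.
    n_correct = sum(
        1 for t, p in zip(true_tags, pred_tags) if t == p and p != outside_tag
    )
    n_predicted = sum(1 for p in pred_tags if p != outside_tag)
    n_gold = sum(1 for t in true_tags if t != outside_tag)
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative CoNLL-style tags, not real model output.
gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "O", "B-ORG"]
precision, recall, f1 = precision_recall_f1(gold, pred)
```

Here two of the three predicted entity tokens are correct and two of the three gold entity tokens are found, so precision, recall, and F1 all equal 2/3.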

Maintenance & Community

The repository is maintained by Kyubyong. No specific community channels or roadmap are indicated in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project pins PyTorch 1.0 and pytorch_pretrained_bert 0.6.1, both old releases that may be incompatible with the current PyTorch ecosystem. The fine-tuning command in Quick Start runs for only 3 epochs (--n_epochs 3), which might be insufficient for optimal convergence.

Health Check

  • Last commit: 5 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 90 days
