mrc-for-flat-nested-ner by ShannonAI

Research paper code for named entity recognition via unified MRC framework

created 5 years ago
675 stars

Top 51.1% on sourcepulse

Project Summary

This repository provides the implementation for a unified MRC (Machine Reading Comprehension) framework for Named Entity Recognition (NER), addressing both flat and nested entity extraction. It is targeted at NLP researchers and practitioners seeking to leverage MRC for improved NER performance. The framework offers a novel approach to NER by reformulating it as a question-answering task.

How It Works

The framework treats NER as a reading comprehension problem: each entity type is expressed as a natural-language query, and the model answers the query by extracting the matching spans from the text. The model predicts candidate start positions, candidate end positions, and which start and end positions belong together. Because every entity type is queried separately and spans are matched pairwise, overlapping entities can be extracted independently, so the same formulation covers both flat and nested NER. Traditional sequence labeling assigns a single tag to each token and therefore cannot directly represent overlapping entities.
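The idea can be illustrated with a minimal Python sketch. It is not the repository's code: the query wording, the function names `build_inputs` and `decode_spans`, and the hand-written toy predictions are all assumptions made for illustration. The sketch pairs one query per entity type with the same context, then keeps every (start, end) pair that the toy predictions mark as a start, an end, and a match.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical queries; the wording used by the repository may differ.
QUERIES: Dict[str, str] = {
    "PER": "Find all person names in the text.",
    "ORG": "Find all organizations in the text.",
}


def build_inputs(tokens: List[str], queries: Dict[str, str]) -> List[dict]:
    """Create one (query, context) example per entity type."""
    return [
        {"label": label, "query": query, "context": tokens}
        for label, query in queries.items()
    ]


def decode_spans(
    start_flags: Sequence[int],
    end_flags: Sequence[int],
    match: Sequence[Sequence[int]],
) -> List[Tuple[int, int]]:
    """Keep (start, end) pairs flagged as start, end, and matching pair."""
    spans = []
    for i, is_start in enumerate(start_flags):
        if not is_start:
            continue
        for j in range(i, len(end_flags)):
            if end_flags[j] and match[i][j]:
                spans.append((i, j))
    return spans


tokens = ["Bank", "of", "China", "opened"]
examples = build_inputs(tokens, QUERIES)

# Toy predictions (assumed, not model output): two overlapping spans
# that both end at token 2.
start_flags = [1, 0, 1, 0]
end_flags = [0, 0, 1, 0]
match = [[0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
spans = decode_spans(start_flags, end_flags, match)
```

Here `spans` contains both the outer span (0, 2) and the inner span (2, 2), which a one-tag-per-token scheme could not represent at the same time.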

Quick Start & Requirements

  • Install: First install PyTorch (for CUDA 10.1: pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html), then run pip install -r requirements.txt.
  • Prerequisites: Python 3.6+, PyTorch with CUDA 10.1 (for GPU acceleration), BERT models.
  • Setup: Requires downloading preprocessed datasets and BERT models. Training scripts need DATA_DIR, BERT_DIR, and OUTPUT_DIR to be configured.
  • Docs: Refer to PyTorch Lightning documentation for argument details.
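Before launching a training script, it can help to confirm that the three directories are configured. The following is a minimal sketch, assuming the values are available as environment variables with the same names; the helper `missing_dirs` is hypothetical and not part of the repository, whose scripts may set these values differently.

```python
import os
from typing import List, Mapping

REQUIRED = ("DATA_DIR", "BERT_DIR", "OUTPUT_DIR")


def missing_dirs(env: Mapping[str, str]) -> List[str]:
    """Return the names of required settings that are unset or invalid."""
    missing = []
    for name in REQUIRED:
        path = env.get(name, "")
        if name == "OUTPUT_DIR":
            # The output directory may not exist yet; only require a value.
            if not path:
                missing.append(name)
        elif not os.path.isdir(path):
            # Data and BERT directories must already exist on disk.
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_dirs(os.environ)
    if problems:
        print("Not configured:", ", ".join(problems))
```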

Highlighted Details

  • Implements a unified framework for both flat and nested Named Entity Recognition.
  • Reformulates NER as a Machine Reading Comprehension task.
  • Provides scripts for data preprocessing, training, evaluation, and inference.
  • Based on PyTorch and PyTorch Lightning.

Maintenance & Community

  • Developed by ShannonAI, authors of the ACL 2020 paper "A Unified MRC Framework for Named Entity Recognition."
  • Questions can be posted as GitHub issues.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

  • The documented install command pins PyTorch 1.7.1 built for CUDA 10.1; for other CUDA versions, a matching PyTorch build must be installed manually.
  • Configuration of data and model directories is necessary before running training scripts.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 90 days
