Multiple-Relations-Extraction-Only-Look-Once  by yuanxiaosc

Joint entity and relation extraction for SPO triples

created 6 years ago
347 stars

Project Summary

This repository provides an end-to-end solution for extracting multiple entity-relation triples from Chinese text in a single pass. It targets researchers and practitioners in information extraction and knowledge graph construction. Its joint model frames entity extraction as a sequence labeling task and relation extraction as a multi-head selection task.

How It Works

The core approach uses a pre-trained transformer model (BERT) as a feature extractor. Entity extraction is framed as a sequence labeling task, while multi-relation extraction is treated as a multi-head selection problem. The model predicts output labels, predicate values, and predicate locations for each token, so entities and their associated relations are extracted together in one pass rather than in separate pipeline stages.
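
The decoding step implied by this description can be sketched in a few lines. This is a minimal illustration of generic multi-head selection decoding, not the repository's code: the function names, the BIO tag scheme, the sparse `scores` dictionary, and the 0.5 threshold are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start index, end index, entity type)


def bio_spans(tags: List[str]) -> Dict[int, Span]:
    """Map the index of each entity's last token to its span (assumes BIO tags)."""
    spans: Dict[int, Span] = {}
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if start is not None and not tag.startswith("I-"):
            spans[i - 1] = (start, i - 1, etype)
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def decode_triples(
    tokens: List[str],
    tags: List[str],
    scores: Dict[Tuple[int, int, int], float],
    relations: List[str],
    threshold: float = 0.5,
) -> List[Tuple[str, str, str]]:
    """Turn per-token selections into (subject, predicate, object) triples.

    scores[(i, j, r)] is a hypothetical probability that the entity ending at
    token i is the subject of relation r whose object ends at token j.
    Every (i, j, r) above the threshold is kept, so one token can take part
    in several relations at once.
    """
    spans = bio_spans(tags)
    triples = []
    for i, (s_start, s_end, _) in spans.items():
        for j, (o_start, o_end, _) in spans.items():
            if i == j:
                continue  # this sketch ignores self-relations
            for r, predicate in enumerate(relations):
                if scores.get((i, j, r), 0.0) > threshold:
                    subject = "".join(tokens[s_start : s_end + 1])
                    obj = "".join(tokens[o_start : o_end + 1])
                    triples.append((subject, predicate, obj))
    return triples


tokens = list("周杰伦出生于台北")
tags = ["B-PER", "I-PER", "I-PER", "O", "O", "O", "B-LOC", "I-LOC"]
print(decode_triples(tokens, tags, {(2, 7, 0): 0.9}, ["出生地"]))
# [('周杰伦', '出生地', '台北')]
```

Indexing each entity by its last token is a common convention in multi-head selection models; whether this repository uses the same convention is not stated here.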

Quick Start & Requirements

  • Installation: Requires Python 3.6+ and TensorFlow 1.12.0+.
  • Data: Download the SKE dataset files (train, dev, schema) and place them in the ./raw_data/ directory. A Chinese BERT model checkpoint is also required in ./pretrained_model/.
  • Preprocessing: Run python bin/data_manager.py.
  • Training: Execute python run_multiple_relations_extraction_mask_loss.py with specified arguments.
  • Prediction: Use python run_multiple_relations_extraction.py with --do_predict=true.
  • Output Generation: Run python produce_submit_json_file.py.
  • Documentation: Details on data preprocessing and model execution are provided in the README.
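
Because preprocessing and training depend on files the user must download, a small pre-flight check can catch a missing directory before a long run starts. The helper below is a hypothetical sketch and is not part of the repository; it only tests that ./raw_data/ and ./pretrained_model/ exist and are non-empty, since the exact file names inside them are not listed here.

```python
from pathlib import Path
from typing import List, Sequence


def missing_inputs(
    project_root: str,
    required_dirs: Sequence[str] = ("raw_data", "pretrained_model"),
) -> List[str]:
    """Return the required directories that are absent or empty."""
    root = Path(project_root)
    missing = []
    for name in required_dirs:
        path = root / name
        # A directory counts as missing if it does not exist or has no entries.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing
```

Calling `missing_inputs(".")` from the repository root returns an empty list when both directories are populated, at which point the preprocessing script can be run.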

Highlighted Details

  • Implements a joint entity recognition and relation extraction model.
  • Converts relation extraction into a multi-head selection problem.
  • Utilizes BERT as a feature extractor, with potential for replacement by other models.
  • Supports schema-constrained extraction for knowledge graph construction.
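
Schema-constrained extraction can be illustrated as a post-filter that keeps only triples whose subject type, predicate, and object type match a schema entry. The sketch below assumes a schema stored as one JSON object per line with the keys `subject_type`, `predicate`, and `object_type`. Both the key names and the function names are assumptions for illustration, and the repository may apply the constraint differently, for example inside the model.

```python
import json
from typing import Dict, Iterable, List, Set, Tuple

SchemaKey = Tuple[str, str, str]  # (subject_type, predicate, object_type)


def load_schema(lines: Iterable[str]) -> Set[SchemaKey]:
    """Parse JSON-lines schema entries into a set of allowed type combinations."""
    allowed: Set[SchemaKey] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        allowed.add((item["subject_type"], item["predicate"], item["object_type"]))
    return allowed


def filter_by_schema(
    candidates: List[Dict[str, str]], allowed: Set[SchemaKey]
) -> List[Dict[str, str]]:
    """Keep only candidate triples whose type signature appears in the schema."""
    return [
        c
        for c in candidates
        if (c["subject_type"], c["predicate"], c["object_type"]) in allowed
    ]


schema = load_schema(
    ['{"subject_type": "人物", "predicate": "出生地", "object_type": "地点"}']
)
candidates = [
    {"subject": "周杰伦", "subject_type": "人物", "predicate": "出生地",
     "object": "台北", "object_type": "地点"},
    {"subject": "台北", "subject_type": "地点", "predicate": "出生地",
     "object": "周杰伦", "object_type": "人物"},
]
kept = filter_by_schema(candidates, schema)
print([c["subject"] for c in kept])
# ['周杰伦']
```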

Maintenance & Community

The project appears to be an unofficial implementation of research papers. There are open issues requesting help with loss function design and improving the speed of output file generation. Contact information for data inquiries is provided.

Licensing & Compatibility

The README does not explicitly state a license. The project is based on research papers, and the data used is from the 2019 Language and Intelligence Technology Competition. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project depends on TensorFlow 1.12.0 or later and requires downloading an external dataset and a pre-trained BERT checkpoint. The README recommends run_multiple_relations_extraction_MSE_loss.py over the basic run_multiple_relations_extraction.py for both training and prediction, which suggests the basic script may be less accurate or less stable.

Health Check

  • Last commit: 6 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days
