Joint entity and relation extraction for SPO triples
This repository provides an end-to-end solution for extracting multiple entity-relation triples from Chinese text in a single pass. It targets researchers and practitioners in information extraction and knowledge graph construction, offering a joint modeling approach that casts entity extraction as a sequence labeling task and relation extraction as a multi-head selection task.
How It Works
The core approach leverages a pre-trained transformer model (BERT) as a feature extractor. Entity extraction is framed as a sequence labeling task, while multi-relation extraction is treated as a multi-head selection problem. The model predicts output labels, predicate values, and predicate locations for each token, enabling the simultaneous extraction of entities and their associated relations. This unified approach aims for efficiency and accuracy in complex extraction scenarios.
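The decoding step implied by this description can be sketched in plain Python. This is a minimal illustration, not the repository's code: it assumes BIO-style entity labels, assumes the predicate values and locations are stored on the last token of the subject entity, and assumes the location points at any token of the object entity. The names decode_entities, decode_triples, and selections are hypothetical, as is the example sentence and its predicate.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (first token index, last token index, entity type)


def decode_entities(labels: List[str]) -> List[Span]:
    """Collect entity spans from BIO labels (assumed labeling scheme)."""
    spans: List[Span] = []
    start, ent_type = None, None
    for i, label in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        if start is not None and label == "I-" + ent_type:
            continue  # still inside the current entity
        if start is not None:
            spans.append((start, i - 1, ent_type))
            start, ent_type = None, None
        if label.startswith("B-"):
            start, ent_type = i, label[2:]
    return spans


def decode_triples(
    tokens: List[str],
    labels: List[str],
    selections: Dict[int, List[Tuple[str, int]]],
) -> List[Tuple[str, str, str]]:
    """Turn per-token (predicate, location) pairs into (subject, predicate, object).

    selections maps a token index to the predicate values and predicate
    locations predicted for that token (hypothetical representation).
    """
    spans = decode_entities(labels)
    # Map every token index to the entity span that covers it.
    covering: Dict[int, Span] = {}
    for span in spans:
        for i in range(span[0], span[1] + 1):
            covering[i] = span

    def text(span: Span) -> str:
        return "".join(tokens[span[0]: span[1] + 1])

    triples = []
    for span in spans:
        # Assumption: relations are read from the last token of the subject.
        for predicate, location in selections.get(span[1], []):
            target = covering.get(location)
            if target is not None:  # ignore locations outside any entity
                triples.append((text(span), predicate, text(target)))
    return triples


# Illustrative input: one sentence, two entities, one relation.
tokens = list("周杰伦出生于台湾")
labels = ["B-PER", "I-PER", "I-PER", "O", "O", "O", "B-LOC", "I-LOC"]
selections = {2: [("出生地", 7)]}
triples = decode_triples(tokens, labels, selections)
```

Because a token may carry several (predicate, location) pairs, one subject can take part in multiple relations, which is what allows several triples to come out of a single pass.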
Quick Start & Requirements
Data: the dataset goes in the ./raw_data/ directory. A Chinese BERT model checkpoint is also required in ./pretrained_model/.
Data preparation: python bin/data_manager.py.
Training: python run_multiple_relations_extraction_mask_loss.py with specified arguments.
Prediction: python run_multiple_relations_extraction.py with --do_predict=true.
Submission file: python produce_submit_json_file.py.
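Since every step depends on the two input directories being populated, a small pre-flight check can save a failed run. The following is a sketch under stated assumptions, not part of the repository: the helper missing_inputs is hypothetical, and it only checks that ./raw_data/ and ./pretrained_model/ exist and are non-empty, not that their contents are valid.

```python
from pathlib import Path
from typing import List

# Directories named in the quick-start steps.
REQUIRED_DIRS = ["raw_data", "pretrained_model"]


def missing_inputs(root: str = ".") -> List[str]:
    """Return the required directories that are absent or empty under root."""
    missing = []
    for name in REQUIRED_DIRS:
        path = Path(root) / name
        # any(path.iterdir()) is False for an empty directory.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing
```

Calling missing_inputs() from the repository root before the data preparation step returns an empty list when both directories are in place.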
Highlighted Details
Maintenance & Community
The project appears to be an unofficial implementation of research papers. There are open issues requesting help with loss function design and improving the speed of output file generation. Contact information for data inquiries is provided.
Licensing & Compatibility
The README does not explicitly state a license. The project is based on research papers, and the data used is from the 2019 Language and Intelligence Technology Competition. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project relies on specific versions of TensorFlow and requires downloading external datasets and pre-trained models. The README recommends run_multiple_relations_extraction_MSE_loss.py over the basic run_multiple_relations_extraction.py for training and prediction, which suggests the basic script may have accuracy or stability issues.