Multiple-Relations-Extraction-Only-Look-Once by yuanxiaosc

Joint entity and relation extraction for SPO triples

Created 6 years ago

349 stars

Top 79.7% on SourcePulse

Project Summary

This repository provides an end-to-end solution for extracting multiple subject-predicate-object (SPO) triples from Chinese text in a single pass. It targets researchers and practitioners in information extraction and knowledge graph construction. Its joint model recasts entity extraction as sequence labeling and relation extraction as multi-head selection.

How It Works

The core approach uses a pre-trained transformer model (BERT) as a feature extractor. Entity extraction is framed as a sequence labeling task, while multi-relation extraction is treated as a multi-head selection problem. For each token, the model predicts an output label, predicate values, and predicate locations, so entities and their associated relations are extracted simultaneously from the same shared features rather than in separate pipeline stages.
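To make the per-token outputs described above concrete, here is a minimal decoding sketch. It is not the repository's code: the BIO label scheme, the `selections` layout (for each token, a list of `(predicate, location)` pairs), the subject-to-object direction, and the example sentence are all assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str, str]  # (start, end_exclusive, entity_type, text)


def bio_spans(tokens: List[str], labels: List[str]) -> List[Span]:
    """Group BIO sequence labels into entity spans (assumed label scheme)."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    # A trailing "O" sentinel flushes a span that runs to the end of the sentence.
    for i, label in enumerate(list(labels) + ["O"]):
        inside = start is not None and label.startswith("I-") and label[2:] == etype
        if start is not None and not inside:
            spans.append((start, i, etype, "".join(tokens[start:i])))
            start, etype = None, None
        if label.startswith("B-"):
            start, etype = i, label[2:]
    return spans


def decode_triples(
    tokens: List[str],
    labels: List[str],
    selections: List[List[Tuple[str, int]]],
) -> List[Tuple[str, str, str]]:
    """Turn per-token (predicate, location) selections into SPO triples.

    selections[i] holds every (predicate, j) pair chosen for token i, so a
    single token may take part in several relations (the "multi-head" part).
    Assumption: token i belongs to the subject and token j to the object.
    """
    spans = bio_spans(tokens, labels)

    def span_at(index: int) -> Optional[Span]:
        for span in spans:
            if span[0] <= index < span[1]:
                return span
        return None

    triples = set()
    for i, chosen in enumerate(selections):
        for predicate, j in chosen:
            subject, obj = span_at(i), span_at(j)
            # Drop selections that do not point from one entity to another.
            if subject is not None and obj is not None:
                triples.add((subject[3], predicate, obj[3]))
    return sorted(triples)


# Hypothetical model output for one sentence, one character per token.
tokens = list("周杰伦出生于台北")
labels = ["B-PER", "I-PER", "I-PER", "O", "O", "O", "B-LOC", "I-LOC"]
selections = [[] for _ in tokens]
selections[2] = [("出生地", 6)]  # last subject token selects the object's first token

triples = decode_triples(tokens, labels, selections)
```

Keeping the selections as a per-token list means a sentence with several relations needs no extra machinery: each additional relation is one more `(predicate, location)` pair on the relevant token.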

Quick Start & Requirements

Installation: Requires Python 3.6+ and TensorFlow 1.12.0+.
Data: Download the SKE dataset files (train, dev, schema) and place them in the ./raw_data/ directory. A Chinese BERT model checkpoint is also required in ./pretrained_model/.
Preprocessing: Run python bin/data_manager.py.
Training: Execute python run_multiple_relations_extraction_mask_loss.py with specified arguments.
Prediction: Use python run_multiple_relations_extraction.py with --do_predict=true.
Output Generation: Run python produce_submit_json_file.py.
Documentation: Details on data preprocessing and model execution are provided in the README.
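
The steps above can be chained in order with a small driver. This is a sketch under stated assumptions: the script names, the two directories, and the `--do_predict=true` flag come from the list above, while the function names are hypothetical and the training arguments are left for the caller to supply, since this summary does not list them. The prediction step may also need further arguments from the README.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence

# Directories the quick start says must exist before preprocessing.
REQUIRED_DIRS = ("raw_data", "pretrained_model")


def missing_inputs(repo_root: str) -> List[str]:
    """Return the required directories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


def pipeline_steps(train_args: Sequence[str] = ()) -> List[List[str]]:
    """Build the four commands in the order given by the quick start.

    train_args is a placeholder: pass the training arguments documented in
    the README, which are not reproduced here.
    """
    py = sys.executable
    return [
        [py, "bin/data_manager.py"],
        [py, "run_multiple_relations_extraction_mask_loss.py", *train_args],
        [py, "run_multiple_relations_extraction.py", "--do_predict=true"],
        [py, "produce_submit_json_file.py"],
    ]


def run_pipeline(repo_root: str, train_args: Sequence[str] = ()) -> None:
    """Run every step from repo_root, stopping at the first failure."""
    missing = missing_inputs(repo_root)
    if missing:
        raise FileNotFoundError(f"missing under {repo_root}: {', '.join(missing)}")
    for argv in pipeline_steps(train_args):
        subprocess.run(argv, cwd=repo_root, check=True)
```

Calling `run_pipeline(".")` from a checkout would execute the steps in sequence; checking the inputs first avoids a long preprocessing run that fails only when the checkpoint is found to be missing.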

Highlighted Details

Implements a joint entity recognition and relation extraction model.
Converts relation extraction into a multi-head selection problem.
Utilizes BERT as a feature extractor, with potential for replacement by other models.
Supports schema-constrained extraction for knowledge graph construction.
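
Schema-constrained extraction, as mentioned in the last point, can be illustrated with a small filter. This sketch assumes the schema file holds one JSON object per line with `subject_type`, `predicate`, and `object_type` keys; that layout, the function names, and the example values are assumptions for illustration and are not taken from the repository.

```python
import json
from typing import Iterable, List, Set, Tuple

SchemaKey = Tuple[str, str, str]  # (subject_type, predicate, object_type)
# (subject_text, subject_type, predicate, object_text, object_type)
Candidate = Tuple[str, str, str, str, str]


def load_schema(lines: Iterable[str]) -> Set[SchemaKey]:
    """Parse schema lines (assumed: one JSON object per line) into a lookup set."""
    allowed: Set[SchemaKey] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        allowed.add((entry["subject_type"], entry["predicate"], entry["object_type"]))
    return allowed


def filter_by_schema(
    candidates: Iterable[Candidate], allowed: Set[SchemaKey]
) -> List[Tuple[str, str, str]]:
    """Keep only candidates whose type signature appears in the schema."""
    kept = []
    for subj, subj_type, predicate, obj, obj_type in candidates:
        if (subj_type, predicate, obj_type) in allowed:
            kept.append((subj, predicate, obj))
    return kept


# Hypothetical schema line and model candidates.
schema_lines = ['{"object_type": "地点", "predicate": "出生地", "subject_type": "人物"}']
candidates = [
    ("周杰伦", "人物", "出生地", "台北", "地点"),  # matches the schema
    ("台北", "地点", "出生地", "周杰伦", "人物"),  # reversed types, rejected
]

kept = filter_by_schema(candidates, load_schema(schema_lines))
```

Filtering on the full type signature, not just the predicate name, rejects triples whose subject and object were swapped by the model.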

Maintenance & Community

The project appears to be an unofficial implementation of research papers. There are open issues requesting help with loss function design and improving the speed of output file generation. Contact information for data inquiries is provided.

Licensing & Compatibility

The README does not explicitly state a license. The project is based on research papers, and the data used is from the 2019 Language and Intelligence Technology Competition. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project depends on TensorFlow (1.12.0+ per the requirements above) and requires downloading external datasets and pre-trained models. The README recommends run_multiple_relations_extraction_MSE_loss.py over the basic run_multiple_relations_extraction.py for both training and prediction, which suggests the basic script has known accuracy or stability shortcomings.
