Entity-Relation-Extraction by yuanxiaosc

TensorFlow code for entity-relation extraction

Created 6 years ago

1,228 stars

Top 31.9% on SourcePulse

Project Summary

This repository provides a pipeline-based solution for entity and relation extraction, specifically tailored for schema-constrained knowledge extraction tasks. It is designed for researchers and practitioners working with Chinese text data, offering a practical implementation based on TensorFlow and BERT for the 2019 Language and Intelligence Technology Competition.

How It Works

The system employs a two-stage pipeline. First, a multi-label classification model identifies potential relationship types within a sentence. Subsequently, a sequence labeling model, taking the sentence and predicted relationship types as input, identifies and labels the entities (subject and object) corresponding to those relationships. This approach allows for a structured extraction of (Subject, Predicate, Object) triples that adhere to predefined schemas.
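The two stages described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names, the 0.5 threshold, the B-/I- tag scheme with SUB and OBJ roles, and the stub models are all assumptions made for the example. In the real project both stages are BERT-based TensorFlow models.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def select_relations(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Stage 1 (multi-label): keep every relation whose score clears the threshold."""
    return [relation for relation, score in scores.items() if score >= threshold]


def decode_spans(tokens: List[str], tags: List[str], role: str) -> List[str]:
    """Collect the text of each B-/I- tagged span for one role (assumed: SUB or OBJ)."""
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{role}":
            if current:  # a new span starts right after another one
                spans.append("".join(current))
            current = [token]
        elif tag == f"I-{role}" and current:
            current.append(token)
        else:
            if current:
                spans.append("".join(current))
            current = []
    if current:
        spans.append("".join(current))
    return spans


def extract_triples(
    tokens: List[str],
    classify: Callable[[List[str]], Dict[str, float]],
    label: Callable[[List[str], str], List[str]],
    threshold: float = 0.5,
) -> List[Triple]:
    """Stage 2: for each predicted relation, tag the sentence and pair subjects with objects."""
    triples: List[Triple] = []
    for relation in select_relations(classify(tokens), threshold):
        tags = label(tokens, relation)
        subjects = decode_spans(tokens, tags, "SUB")
        objects = decode_spans(tokens, tags, "OBJ")
        triples.extend((s, relation, o) for s in subjects for o in objects)
    return triples


# Hypothetical stand-ins for the two trained models.
def fake_classify(tokens: List[str]) -> Dict[str, float]:
    return {"作者": 0.9, "出生地": 0.1}


def fake_label(tokens: List[str], relation: str) -> List[str]:
    return ["B-OBJ", "I-OBJ", "O", "O", "B-SUB", "I-SUB"]


tokens = list("鲁迅写了呐喊")
print(extract_triples(tokens, fake_classify, fake_label))
```

Feeding the relation into the second stage is what lets one sentence yield several triples: the labeler is run once per predicted relation, so the same tokens can receive different subject and object tags each time.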

Quick Start & Requirements

Install: Requires Python 3.6+ and TensorFlow 1.12.0+. Download a Chinese BERT pre-trained model and place it in the pretrained_model directory. Download the competition data and place it in ./raw_data/.
Data: Requires the training, development, and schema files from the 2019 Language and Intelligence Technology Competition. The official data download links are no longer active; contact the author at the email address given in the repository for assistance.
Training: Separate commands are provided for training the relation classification model (run_predicate_classification.py) and the sequence labeling model (run_sequnce_labeling.py).
Prediction: Commands for inference using trained models are also available.
Resources: Requires a Chinese BERT model checkpoint.
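Before launching training, it can help to confirm that the two locations named above are populated. The following is a small pre-flight sketch under stated assumptions: the helper name `missing_setup_dirs` is hypothetical, and it only checks that `pretrained_model` and `raw_data` exist and are non-empty. It does not know which specific files the training scripts expect.

```python
from pathlib import Path
from typing import List

# Directory names taken from the setup notes; the file names inside them are not checked.
EXPECTED_DIRS = ("pretrained_model", "raw_data")


def missing_setup_dirs(project_root: str = ".") -> List[str]:
    """Return the expected directories that are absent or empty under project_root."""
    root = Path(project_root)
    missing = []
    for name in EXPECTED_DIRS:
        directory = root / name
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_setup_dirs(".")
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("Both directories are populated.")
```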

Highlighted Details

Achieved 87.1% F1 score on the test set in a competition setting.
Implements a pipeline approach combining relation classification and sequence labeling.
Utilizes a large-scale Chinese dataset (SKE) with over 430,000 triples and 210,000 sentences.
Provides detailed training and prediction scripts for both components.
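For context on the F1 figure above, triple extraction is commonly scored by exact match between predicted and gold (Subject, Predicate, Object) triples. The sketch below shows that standard calculation; it is an assumption that the competition scored this way, since the exact evaluation rules are not stated here, and the function name `triple_f1` is hypothetical.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of triples."""
    predicted_set, gold_set = set(predicted), set(gold)
    correct = len(predicted_set & gold_set)
    precision = correct / len(predicted_set) if predicted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [("呐喊", "作者", "鲁迅"), ("鲁迅", "出生地", "绍兴")]
predicted = [("呐喊", "作者", "鲁迅"), ("鲁迅", "出生地", "北京")]
print(triple_f1(predicted, gold))  # one of two predictions is correct: (0.5, 0.5, 0.5)
```

Under exact matching, a triple with the right relation but a wrong or partially matched entity counts as an error for both precision and recall, which is why mistakes in either pipeline stage lower the score.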

Maintenance & Community

The project is associated with the 2019 Language and Intelligence Technology Competition.
Contact information (email) is provided for data-related inquiries.
Links to competition forums and related reports are included.

Licensing & Compatibility

The repository does not explicitly state a license.
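TensorFlow 1.12.0+ is required. TensorFlow itself is released under the Apache 2.0 license, which permits commercial use, but without a repository license the reuse terms for this project's own code remain unclear.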

Limitations & Caveats

Official data download links are no longer active, potentially hindering setup.
The provided test data lacks labels, necessitating submission to official evaluation platforms for validation.
The project is based on TensorFlow 1.x, a legacy release line superseded by TensorFlow 2.x.

Health Check

Last Commit: 5 years ago

Responsiveness: Inactive

Star History: 1 star in the last 30 days