Entity-Relation-Extraction by yuanxiaosc

TensorFlow code for entity-relation extraction

created 6 years ago
1,229 stars

Top 32.6% on sourcepulse

Project Summary

This repository provides a pipeline-based solution for entity and relation extraction, specifically tailored for schema-constrained knowledge extraction tasks. It is designed for researchers and practitioners working with Chinese text data, offering a practical implementation based on TensorFlow and BERT for the 2019 Language and Intelligence Technology Competition.

How It Works

The system employs a two-stage pipeline. First, a multi-label classification model identifies potential relationship types within a sentence. Subsequently, a sequence labeling model, taking the sentence and predicted relationship types as input, identifies and labels the entities (subject and object) corresponding to those relationships. This approach allows for a structured extraction of (Subject, Predicate, Object) triples that adhere to predefined schemas.
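The two stages can be sketched in plain Python as follows. This is a minimal illustration, not the repository's code: the 0.5 threshold, the B-SUB/I-SUB/B-OBJ/I-OBJ/O tag names, the `birthplace` relation, and every function name here are assumptions made for the example, and the `tagger` argument stands in for the trained sequence labeling model.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def select_relations(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Stage 1 (multi-label): keep every relation whose score clears the threshold."""
    return [relation for relation, score in scores.items() if score >= threshold]


def spans(tokens: List[str], tags: List[str], role: str, sep: str = "") -> List[str]:
    """Collect the text of each B-/I- tagged span for one role ("SUB" or "OBJ")."""
    found: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{role}":
            if current:  # a new span starts right after another one
                found.append(sep.join(current))
            current = [token]
        elif tag == f"I-{role}" and current:
            current.append(token)
        else:
            if current:
                found.append(sep.join(current))
            current = []
    if current:
        found.append(sep.join(current))
    return found


def extract(
    tokens: List[str],
    relation_scores: Dict[str, float],
    tagger: Callable[[List[str], str], List[str]],
    threshold: float = 0.5,
    sep: str = "",
) -> List[Triple]:
    """Stage 2: tag the sentence once per predicted relation and pair up the spans."""
    triples: List[Triple] = []
    for relation in select_relations(relation_scores, threshold):
        tags = tagger(tokens, relation)
        for subject in spans(tokens, tags, "SUB", sep):
            for obj in spans(tokens, tags, "OBJ", sep):
                triples.append((subject, relation, obj))
    return triples


def toy_tagger(tokens: List[str], relation: str) -> List[str]:
    """Hypothetical stand-in for the sequence labeling model."""
    return ["B-SUB", "I-SUB", "O", "O", "O", "B-OBJ"]


sentence = ["Marie", "Curie", "was", "born", "in", "Warsaw"]
scores = {"birthplace": 0.93, "spouse": 0.08}  # made-up stage-1 outputs
result = extract(sentence, scores, toy_tagger, sep=" ")
print(result)
```

For character-level Chinese input the default `sep=""` would join the characters back into a string; the English example passes `sep=" "` only to keep words separated.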

Quick Start & Requirements

  • Install: Python 3.6+, TensorFlow 1.12.0+. Download and place a Chinese BERT pre-trained model in the pretrained_model directory. Download competition data and place it in ./raw_data/.
  • Data: Requires the training, development, and schema files from the 2019 Language and Intelligence Technology Competition. The official download links are no longer active; contact the email address given in the repository for assistance.
  • Training: Separate commands are provided for training the relation classification model (run_predicate_classification.py) and the sequence labeling model (run_sequnce_labeling.py).
  • Prediction: Commands for inference using trained models are also available.
  • Resources: Requires a Chinese BERT model checkpoint.
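Before training, it may help to confirm that the two directories named above are in place. The helper below is a hypothetical pre-flight check, not part of the repository; it only assumes that `pretrained_model` and `raw_data` sit directly under the project root, and it makes no claim about the file names inside them.

```python
from pathlib import Path
from typing import List, Sequence


def missing_paths(
    project_root: str,
    required: Sequence[str] = ("pretrained_model", "raw_data"),
) -> List[str]:
    """Return the required directory names that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in required if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Directory layout looks complete.")
```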

Highlighted Details

  • Achieved 87.1% F1 score on the test set in a competition setting.
  • Implements a pipeline approach combining relation classification and sequence labeling.
  • Utilizes a large-scale Chinese dataset (SKE) with over 430,000 triples and 210,000 sentences.
  • Provides detailed training and prediction scripts for both components.
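For readers unfamiliar with how an F1 score applies to extracted triples, the sketch below computes precision, recall, and F1 under exact-match scoring. That scoring rule is an assumption made for illustration; the competition's official evaluation may differ in details, and the example triples are invented.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over two collections of triples."""
    predicted_set, gold_set = set(predicted), set(gold)
    correct = len(predicted_set & gold_set)
    precision = correct / len(predicted_set) if predicted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [("Marie Curie", "birthplace", "Warsaw"), ("Marie Curie", "spouse", "Pierre Curie")]
predicted = [("Marie Curie", "birthplace", "Warsaw")]
precision, recall, f1 = triple_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here one of the two gold triples is recovered and nothing spurious is predicted, so precision is 1.0, recall is 0.5, and F1 is 2/3.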

Maintenance & Community

  • The project is associated with the 2019 Language and Intelligence Technology Competition.
  • Contact information (email) is provided for data-related inquiries.
  • Links to competition forums and related reports are included.

Licensing & Compatibility

  • The repository does not explicitly state a license.
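  • TensorFlow 1.12.0+ is required; TensorFlow itself is distributed under the Apache 2.0 license, which permits commercial use, but that does not extend to this repository's unlicensed code.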

Limitations & Caveats

  • Official data download links are no longer active, potentially hindering setup.
  • The provided test data lacks labels, necessitating submission to official evaluation platforms for validation.
  • The project is based on TensorFlow 1.x, which is legacy.

Health Check

  • Last commit: 5 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days
