Schema-based-Knowledge-Extraction by yuanxiaosc

SPO triples extraction for knowledge graphs

Created 7 years ago

288 stars

Top 91.4% on SourcePulse

Project Summary

This repository provides an end-to-end joint model for schema-based information extraction: given a sentence and a set of schema constraints, it extracts Subject-Predicate-Object (SPO) triples from Chinese text. It is aimed at Natural Language Processing (NLP) researchers and practitioners working on Chinese information extraction.

How It Works

The project uses a BERT-based model for joint entity and relation extraction. The model takes a sentence together with the schema constraints and directly outputs SPO triples whose subject and object types conform to the schema. Compared with pipeline methods, which run entity recognition and relation extraction as separate steps, this integrated design aims for greater efficiency and accuracy.
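
To make "conforming to the schema" concrete, here is a minimal sketch of the constraint check alone, not the repository's model code. The names Schema, Triple, conforms and filter_by_schema are hypothetical, and the two schema entries and two candidate triples are made-up illustrations rather than data from the project.

```python
from typing import Iterable, List, NamedTuple, Set


class Schema(NamedTuple):
    """One allowed (subject type, predicate, object type) combination."""
    subject_type: str
    predicate: str
    object_type: str


class Triple(NamedTuple):
    """A candidate SPO triple together with the entity types of its arguments."""
    subject: str
    subject_type: str
    predicate: str
    object: str
    object_type: str


def conforms(triple: Triple, schemas: Set[Schema]) -> bool:
    # A triple is valid only if its type signature is listed in the schema set.
    signature = Schema(triple.subject_type, triple.predicate, triple.object_type)
    return signature in schemas


def filter_by_schema(triples: Iterable[Triple], schemas: Iterable[Schema]) -> List[Triple]:
    schema_set = set(schemas)
    return [t for t in triples if conforms(t, schema_set)]


# Hypothetical schema entries and candidates, for illustration only.
schemas = [
    Schema("人物", "出生地", "地点"),
    Schema("影视作品", "导演", "人物"),
]
candidates = [
    Triple("鲁迅", "人物", "出生地", "绍兴", "地点"),  # matches the first schema entry
    Triple("鲁迅", "人物", "导演", "绍兴", "地点"),    # predicate/type combination not allowed
]
valid = filter_by_schema(candidates, schemas)
```

In a joint model this constraint is typically built into the prediction itself rather than applied as a filter afterwards; the sketch only shows what the constraint means for an individual triple.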

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python 3.x, PyTorch, Transformers library. Specific BERT model checkpoints are required.
Data: The SKE dataset (over 430k triples, 210k sentences) is available for download.
Details: https://github.com/yuanxiaosc/Schema-based-Knowledge-Extraction
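
As a rough sketch of how the data might be read once downloaded, the snippet below parses one JSON record per line and yields its triples. The field names (text, spo_list, subject, predicate, object) are an assumption based on the commonly described SKE layout and should be checked against the actual files; the sample record is made up, and read_triples is a hypothetical helper, not part of the repository.

```python
import io
import json
from typing import Iterable, Iterator, Tuple

# Stand-in for an opened dataset file: one made-up JSON record per line.
SAMPLE = io.StringIO(
    '{"text": "鲁迅出生于绍兴", "spo_list": [{"subject": "鲁迅", '
    '"predicate": "出生地", "object": "绍兴", '
    '"subject_type": "人物", "object_type": "地点"}]}\n'
)


def read_triples(lines: Iterable[str]) -> Iterator[Tuple[str, Tuple[str, str, str]]]:
    """Yield (sentence, (subject, predicate, object)) pairs from JSON-lines input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Assumed field names; a record without annotations yields nothing.
        for spo in record.get("spo_list", []):
            yield record["text"], (spo["subject"], spo["predicate"], spo["object"])


rows = list(read_triples(SAMPLE))
```

With a real file, the same function can be given an open file handle, for example open(path, encoding="utf-8"), in place of SAMPLE.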

Highlighted Details

Implements an end-to-end joint model for schema-based SPO extraction.
Leverages BERT for enhanced contextual understanding.
Targets SKE, a large-scale Chinese information extraction dataset.
Focuses on a specific task defined by schema constraints for SPO triples.

Maintenance & Community

The project appears to be associated with the CCF LIC 2019 competition. Further community or maintenance activity is not explicitly detailed in the README.

Licensing & Compatibility

The repository's license is not specified in the provided README.

Limitations & Caveats

The README focuses on the competition task and dataset, with no explicit mention of model performance benchmarks, limitations, or potential issues with the implementation. The primary focus is on Chinese text.

Health Check

Last Commit

6 years ago

Responsiveness

Inactive

Star History

1 star in the last 30 days