USC-DS-RelationExtraction by INK-USC

Relation extraction system using distant supervision

created 8 years ago
419 stars

Top 71.1% on sourcepulse

Project Summary

This repository provides a system for sentence-level relation extraction using distant supervision, targeting researchers and practitioners in Natural Language Processing. It offers implementations of recent models and processed datasets, enabling the identification of relationships between entity pairs within text.

How It Works

The system relies on distant supervision: entity pairs in a corpus are labeled automatically using facts from existing knowledge bases. It processes raw text, identifies entity mentions, maps them to knowledge base entities, and aligns facts to the sentences that mention both entities. The implemented models learn representations that capture sentence context and relation types; they include CoType (joint extraction of typed entities and relations) and several LSTM- and GRU-based architectures.
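
The labeling step described above can be illustrated with a small sketch. This is not the repository's pipeline: the knowledge base contents, the `label_sentence` helper, and the "None" label for unmatched pairs are all assumptions made for illustration, and entity mentions are supplied by hand rather than detected with Stanford CoreNLP.

```python
# Toy distant-supervision labeling. A sketch under stated assumptions,
# not the code used by USC-DS-RelationExtraction.
from typing import Dict, List, Tuple

# Hypothetical knowledge base: (head entity, tail entity) -> relation type.
KB: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("Google", "Mountain View"): "headquartered_in",
}

NO_RELATION = "None"  # assumed label for pairs with no matching KB fact


def label_sentence(sentence: str, mentions: List[str]) -> List[Tuple[str, str, str]]:
    """Label every ordered pair of distinct mentions that occur in the sentence.

    A pair found in the KB receives the KB relation; any other pair
    receives NO_RELATION.
    """
    present = [m for m in mentions if m in sentence]
    labeled = []
    for head in present:
        for tail in present:
            if head == tail:
                continue
            labeled.append((head, tail, KB.get((head, tail), NO_RELATION)))
    return labeled


examples = label_sentence(
    "Barack Obama was born in Honolulu.",
    ["Barack Obama", "Honolulu"],
)
noisy = label_sentence(
    "Barack Obama gave a speech in Honolulu.",
    ["Barack Obama", "Honolulu"],
)
```

Both sentences receive the `born_in` label for the pair (Barack Obama, Honolulu), although only the first one expresses that relation. This label noise is the usual cost of distant supervision and is the kind of error that models trained on such data have to tolerate.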

Quick Start & Requirements

  • Install: pip install pexpect ujson tqdm
  • Dependencies: Python 2.7, Stanford CoreNLP 3.7.0 (with Python wrapper), Eigen 3.2.5. Requires downloading and unzipping Stanford CoreNLP.
  • Setup: Requires downloading and processing datasets (PubMed-BioInfer, NYT-manual, Wiki-KBP). Stanford CoreNLP server setup is necessary.
  • Links: Quick Start, Blog Posts, Data, Benchmark

Highlighted Details

  • Implements CoType, a joint extraction model for typed entities and relations, achieving an F1 of 0.369 on Wiki-KBP.
  • Provides implementations of various baseline models including CNN, PCNN, LSTM, Bi-GRU, and SDP-LSTM.
  • Includes processed datasets (PubMed-BioInfer, NYT-manual, Wiki-KBP) formatted for sentence-level extraction.
  • Offers evaluation scripts for performance measurement and threshold tuning.
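
As a rough idea of what threshold tuning can look like, the sketch below sweeps a confidence cutoff on a development set and keeps the value with the best F1. The function names, the "None" label, and the scoring convention (predictions below the cutoff are treated as "no relation", and F1 is computed over non-"None" labels) are assumptions for illustration; the repository's own evaluation scripts may work differently.

```python
# Threshold tuning for relation predictions. A sketch under stated
# assumptions, not the repository's evaluation script.
from typing import List, Sequence, Tuple

NO_RELATION = "None"  # assumed label for "no relation"


def f1_at_threshold(
    preds: Sequence[Tuple[str, float]], gold: Sequence[str], threshold: float
) -> float:
    """F1 over non-None relations after discarding low-confidence predictions."""
    predicted = correct = 0
    for (label, score), gold_label in zip(preds, gold):
        if score < threshold:
            label = NO_RELATION  # too uncertain: fall back to "no relation"
        if label != NO_RELATION:
            predicted += 1
            if label == gold_label:
                correct += 1
    actual = sum(1 for g in gold if g != NO_RELATION)
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def tune_threshold(
    preds: Sequence[Tuple[str, float]], gold: Sequence[str], candidates: List[float]
) -> float:
    """Return the candidate threshold with the highest F1 on the given data."""
    return max(candidates, key=lambda t: f1_at_threshold(preds, gold, t))


# Hypothetical development-set predictions: (predicted relation, confidence).
dev_preds = [("born_in", 0.9), ("born_in", 0.4), ("works_for", 0.7), ("works_for", 0.2)]
dev_gold = ["born_in", "None", "works_for", "None"]
best = tune_threshold(dev_preds, dev_gold, [0.0, 0.3, 0.5, 0.8])
```

On this toy data the cutoff 0.5 is selected, because it removes both incorrect low-confidence predictions while keeping the two correct ones. In practice the threshold would be chosen on held-out development data and then applied unchanged to the test set.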

Maintenance & Community

  • Key contributors include Xiang Ren, Ellen Wu, Meng Qu, and Frank Xu.
  • The project is associated with the WWW 2017 paper "CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases".

Licensing & Compatibility

  • The repository does not explicitly state a license.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires Python 2.7, which reached end of life on January 1, 2020, and Stanford CoreNLP 3.7.0, which is likewise an old release. The lack of an explicit license may also hinder commercial adoption.

Health Check

  • Last commit: 5 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
