OpenNRE is an open-source toolkit for neural relation extraction (NRE) that extracts structured relational facts from plain text. It provides a unified framework for implementing and training NRE models, supports both supervised and distantly supervised settings, and integrates conventional neural networks with pre-trained language models such as BERT.
How It Works
OpenNRE provides a unified interface for diverse relation extraction models, including CNN-based architectures (CNN, CNN+ATT) and BERT-based approaches. The CNN+ATT model, a key feature, applies instance-level attention over the bag of sentences that mention a given entity pair, weighting and combining their evidence to improve on the plain CNN model.
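For a concrete sense of how the pieces compose, the sketch below wires a CNN sentence encoder into a bag-level attention head. This is a rough illustration rather than the canonical recipe: the class names (opennre.encoder.CNNEncoder, opennre.model.BagAttention) follow the library's module layout, but the file paths assume the NYT10 and GloVe download scripts have already been run, and the constructor arguments are abbreviated and may differ between versions.

```python
import json
import numpy as np
import opennre

# Relation labels and pre-trained word embeddings; paths are illustrative
# and assume the benchmark/pretrain download scripts have been run.
rel2id = json.load(open('benchmark/nyt10/nyt10_rel2id.json'))
token2id = json.load(open('pretrain/glove/glove.6B.50d_word2id.json'))
word2vec = np.load('pretrain/glove/glove.6B.50d_mat.npy')

# Sentence encoder: a CNN over word and position embeddings.
sentence_encoder = opennre.encoder.CNNEncoder(
    token2id=token2id,
    max_length=120,
    word_size=50,
    hidden_size=230,
    word2vec=word2vec,
    dropout=0.5,
)

# CNN+ATT: instance-level attention weights each sentence in the bag of
# mentions for an entity pair before the relation is classified.
cnn_att = opennre.model.BagAttention(sentence_encoder, len(rel2id), rel2id)
```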
Quick Start & Requirements
- Install by cloning the repository:
git clone https://github.com/thunlp/OpenNRE.git
- Install dependencies:
pip install -r requirements.txt
- Requires a PyTorch build that matches your CUDA version.
- Download data and pre-trained files separately using the provided scripts, e.g.:
bash benchmark/download_fewrel.sh
- Official documentation: https://github.com/thunlp/OpenNRE
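With the repository cloned, dependencies installed, and the pre-trained files downloaded, a released model can be loaded and queried through a single entry point. The snippet below is a minimal sketch: wiki80_cnn_softmax is one of the released model names, and the character offsets mark the head and tail entity spans in the example sentence.

```python
import opennre

# Loads the released Wiki80 CNN model (fetched on first use).
model = opennre.get_model('wiki80_cnn_softmax')

# 'h' and 't' hold the character spans of the head and tail entities.
text = ('He was the son of Máel Dúin mac Máele Fithrich, and grandson '
        'of the high king Áed Uaridnach (died 612).')
print(model.infer({
    'text': text,
    'h': {'pos': (18, 46)},   # Máel Dúin mac Máele Fithrich
    't': {'pos': (78, 91)},   # Áed Uaridnach
}))
# -> a (relation, confidence) pair, e.g. ('father', 0.51)
```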
Highlighted Details
- Supports both supervised and distantly supervised relation extraction.
- Implements the CNN+ATT model, which performs slightly better on the NYT10 dataset than reported in the original paper (Lin et al., 2016): AUC 0.333 vs. 0.318, F1 0.397 vs. 0.380.
- Offers pre-trained models for datasets like Wiki80 and TACRED, utilizing CNN and BERT encoders.
- Includes example training scripts for training custom models on user-provided or bundled datasets (a condensed sketch of that flow follows this list).
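The example scripts share a common shape: build a sentence encoder, wrap it in a classification model, and hand both to a training framework. The sketch below condenses that flow for a supervised CNN on Wiki80; the paths, hyperparameters, and keyword arguments mirror the repository's example scripts at the time of writing and may differ across versions, so treat it as a sketch rather than a drop-in script.

```python
import json
import numpy as np
import opennre

# Labels, vocabulary, and GloVe vectors; paths assume the Wiki80 and
# GloVe download scripts have been run.
rel2id = json.load(open('benchmark/wiki80/wiki80_rel2id.json'))
token2id = json.load(open('pretrain/glove/glove.6B.50d_word2id.json'))
word2vec = np.load('pretrain/glove/glove.6B.50d_mat.npy')

# CNN sentence encoder plus a softmax classification head.
sentence_encoder = opennre.encoder.CNNEncoder(
    token2id=token2id, max_length=40, word_size=50,
    hidden_size=230, word2vec=word2vec, dropout=0.5)
clf = opennre.model.SoftmaxNN(sentence_encoder, len(rel2id), rel2id)

# The framework handles data loading, optimization, and checkpointing.
trainer = opennre.framework.SentenceRE(
    train_path='benchmark/wiki80/wiki80_train.txt',
    val_path='benchmark/wiki80/wiki80_val.txt',
    test_path='benchmark/wiki80/wiki80_val.txt',
    model=clf,
    ckpt='ckpt/wiki80_cnn_softmax.pth.tar',
    batch_size=32, max_epoch=100, lr=0.1, opt='sgd')
trainer.train_model()  # trains and keeps the best checkpoint on the val split
```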
Maintenance & Community
- Primarily developed and maintained by the THUNLP group at Tsinghua University.
- Part of the larger OpenSKL project.
- Citation: Han et al., "OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction," EMNLP-IJCNLP 2019 (System Demonstrations).
Licensing & Compatibility
- The README does not state a license explicitly; related OpenSKL sub-projects (e.g., OpenKE) are MIT-licensed. Suitability for commercial use or closed-source integration should be verified before adoption.
Limitations & Caveats
- Deployment as a pip-installable Python package is still listed as "Coming soon!", so the toolkit currently must be installed from source and may not yet be fully packaged for easy distribution.
- Datasets and pre-trained files are not bundled with the repository and must be downloaded manually, adding an extra step to initial setup.