ERNIE by thunlp

Knowledge-graph-enhanced pre-trained language model (ACL 2019 research paper)

created 6 years ago
1,418 stars

Top 29.3% on sourcepulse

View on GitHub
Project Summary

ERNIE is an open-source toolkit for augmenting pre-trained language models with knowledge graph representations, targeting researchers and practitioners in Natural Language Processing. It enhances model performance on knowledge-intensive tasks by integrating entity information from knowledge graphs.

How It Works

ERNIE enhances pre-trained language models by incorporating knowledge graph embeddings. This approach aims to imbue models with a richer understanding of entities and their relationships, leading to improved performance on tasks like entity typing and relation classification. The toolkit provides pre-trained ERNIE models and detailed instructions for fine-tuning on specific downstream tasks.
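To make the idea concrete, here is a minimal conceptual sketch of fusing token representations with aligned knowledge-graph entity embeddings, in the spirit of ERNIE's information-fusion idea. This is not the toolkit's actual code: all class, parameter, and variable names are hypothetical, and it assumes a recent PyTorch.

    import torch
    import torch.nn as nn

    class TokenEntityFusion(nn.Module):
        """Sketch: fuse token hidden states with aligned KG entity embeddings."""
        def __init__(self, hidden_size: int, entity_dim: int):
            super().__init__()
            self.token_proj = nn.Linear(hidden_size, hidden_size)
            self.entity_proj = nn.Linear(entity_dim, hidden_size)
            self.act = nn.GELU()

        def forward(self, token_states, entity_embeds, entity_mask):
            # token_states: (batch, seq, hidden) from the language model
            # entity_embeds: (batch, seq, entity_dim) pre-trained KG embeddings
            #                (e.g. TransE), zeros where no entity is aligned
            # entity_mask:   (batch, seq, 1), 1.0 where a token maps to an entity
            fused = self.token_proj(token_states) + entity_mask * self.entity_proj(entity_embeds)
            return self.act(fused)

    # Toy usage with random tensors standing in for real model outputs
    fusion = TokenEntityFusion(hidden_size=768, entity_dim=100)
    out = fusion(torch.randn(2, 16, 768), torch.randn(2, 16, 100),
                 torch.randint(0, 2, (2, 16, 1)).float())  # -> (2, 16, 768)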

Quick Start & Requirements

  • Install: pip install tagme (for entity linking in new tasks; see the sketch after this list).
  • Prerequisites: PyTorch >= 0.4.1, Python 3, tqdm, boto3, requests, apex (for fp16).
  • Setup: Requires downloading large datasets (Wikidump, anchor2id, pre-trained models, annotated datasets) and potentially pre-training the model, which takes significant time and resources (e.g., about one day on 8 NVIDIA 2080 Ti GPUs).
  • Links: ACL 2019 Paper, Example Usage
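For entity linking on new tasks, a minimal sketch using the tagme Python client might look like the following; the service token and the rho threshold below are placeholders you would supply yourself, and the exact client API should be checked against the tagme package documentation.

    import tagme

    # TAGME requires a D4Science/GCUBE service token (free registration).
    tagme.GCUBE_TOKEN = "<your-gcube-token>"

    text = "Bob Dylan wrote Blowin' in the Wind in 1962."
    annotations = tagme.annotate(text)

    # Keep only annotations above a confidence (rho) threshold.
    for ann in annotations.get_annotations(0.2):
        print(ann.begin, ann.end, ann.entity_title, ann.score)

The linked Wikipedia titles would then need to be mapped to the toolkit's entity IDs (e.g. via the provided anchor2id data) before being fed to the model.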

Highlighted Details

  • Achieves improved accuracy and F1 scores over BERT on entity typing and relation classification tasks.
  • Supports fine-tuning for tasks like FewRel, TACRED, FIGER, and OpenEntity.
  • Includes utilities for data preprocessing, pre-training, and evaluation.
  • Offers a quick-start example for using ERNIE with masked language modeling (illustrated in the sketch after this list).
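As a rough illustration of what the quick-start input looks like, each token position is paired with an entity label, e.g. a Wikidata QID produced by entity linking, or a placeholder where no entity is aligned. The exact tokenizer and model API are not quoted in this summary, so treat this purely as a hypothetical sketch.

    # Hypothetical entity-aligned input for masked language modeling
    tokens   = ["[CLS]", "jim", "henson", "was", "a", "puppet", "##eer", "[SEP]"]
    entities = ["UNK", "Q191037", "Q191037", "UNK", "UNK", "UNK", "UNK", "UNK"]  # QID shown is illustrative

    assert len(tokens) == len(entities)  # one entity label per token position

    # The toolkit's tokenizer/model would then convert tokens to vocabulary IDs and
    # entity labels to indices into a pre-trained KG embedding table.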

Maintenance & Community

  • ERNIE is a sub-project of OpenSKL.
  • The project is associated with THU (Tsinghua University).
  • Citation details are provided for the ACL 2019 paper.

Licensing & Compatibility

  • The ERNIE toolkit itself is not explicitly licensed in the README.
  • OpenKE resources (knowledge graph embeddings) are provided under the MIT license.
  • Compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The setup process for pre-training is resource-intensive and time-consuming. The README does not explicitly state the license for the ERNIE toolkit code itself, which may impact commercial use. Entity linking for new tasks relies on external tools like TAGME.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 6 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Didier Lopes (founder of OpenBB), and 11 more.

sentence-transformers by UKPLab

Framework for text embeddings, retrieval, and reranking

Top 0.2% · 17k stars
created 6 years ago · updated 3 days ago