ERNIE by thunlp

Knowledge-graph-enhanced pre-trained language model (ACL 2019 research paper)

created 6 years ago
1,418 stars

Top 29.3% on sourcepulse

View on GitHub
Project Summary

ERNIE is an open-source toolkit for augmenting pre-trained language models with knowledge graph representations, targeting researchers and practitioners in Natural Language Processing. It enhances model performance on knowledge-intensive tasks by integrating entity information from knowledge graphs.

How It Works

ERNIE enhances pre-trained language models by incorporating knowledge graph embeddings. This approach aims to imbue models with a richer understanding of entities and their relationships, leading to improved performance on tasks like entity typing and relation classification. The toolkit provides pre-trained ERNIE models and detailed instructions for fine-tuning on specific downstream tasks.
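To make the idea concrete, here is a minimal conceptual sketch of fusing token representations with aligned knowledge-graph entity embeddings, in the spirit of ERNIE's information-fusion idea. This is not the toolkit's actual code: all class, parameter, and variable names are hypothetical, and it assumes a recent PyTorch.

    import torch
    import torch.nn as nn

    class TokenEntityFusion(nn.Module):
        """Sketch: fuse token hidden states with aligned KG entity embeddings."""
        def __init__(self, hidden_size: int, entity_dim: int):
            super().__init__()
            self.token_proj = nn.Linear(hidden_size, hidden_size)
            self.entity_proj = nn.Linear(entity_dim, hidden_size)
            self.act = nn.GELU()

        def forward(self, token_states, entity_embeds, entity_mask):
            # token_states: (batch, seq, hidden) from the language model
            # entity_embeds: (batch, seq, entity_dim) pre-trained KG embeddings
            #                (e.g. TransE), zeros where no entity is aligned
            # entity_mask:   (batch, seq, 1), 1.0 where a token maps to an entity
            fused = self.token_proj(token_states) + entity_mask * self.entity_proj(entity_embeds)
            return self.act(fused)

    # Toy usage with random tensors standing in for real model outputs
    fusion = TokenEntityFusion(hidden_size=768, entity_dim=100)
    out = fusion(torch.randn(2, 16, 768), torch.randn(2, 16, 100),
                 torch.randint(0, 2, (2, 16, 1)).float())  # -> (2, 16, 768)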

Quick Start & Requirements

  • Install: pip install tagme (for entity linking in new tasks; see the sketch after this list).
  • Prerequisites: PyTorch >= 0.4.1, Python 3, tqdm, boto3, requests, apex (for fp16).
  • Setup: Requires downloading large datasets (Wikidump, anchor2id, pre-trained models, annotated datasets) and potentially pre-training the model, which takes significant time and resources (e.g., about one day on 8 NVIDIA 2080 Ti GPUs).
  • Links: ACL 2019 Paper, Example Usage
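For entity linking on new tasks, a minimal sketch using the tagme Python client might look like the following; the service token and the rho threshold below are placeholders you would supply yourself, and the exact client API should be checked against the tagme package documentation.

    import tagme

    # TAGME requires a D4Science/GCUBE service token (free registration).
    tagme.GCUBE_TOKEN = "<your-gcube-token>"

    text = "Bob Dylan wrote Blowin' in the Wind in 1962."
    annotations = tagme.annotate(text)

    # Keep only annotations above a confidence (rho) threshold.
    for ann in annotations.get_annotations(0.2):
        print(ann.begin, ann.end, ann.entity_title, ann.score)

The linked Wikipedia titles would then need to be mapped to the toolkit's entity IDs (e.g. via the provided anchor2id data) before being fed to the model.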

Highlighted Details

  • Achieves improved accuracy and F1 scores over BERT on entity typing and relation classification tasks.
  • Supports fine-tuning for tasks like FewRel, TACRED, FIGER, and OpenEntity.
  • Includes utilities for data preprocessing, pre-training, and evaluation.
  • Offers a quick-start example for using ERNIE with masked language modeling (illustrated in the sketch after this list).
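As a rough illustration of what the quick-start input looks like, each token position is paired with an entity label, e.g. a Wikidata QID produced by entity linking, or a placeholder where no entity is aligned. The exact tokenizer and model API are not quoted in this summary, so treat this purely as a hypothetical sketch.

    # Hypothetical entity-aligned input for masked language modeling
    tokens   = ["[CLS]", "jim", "henson", "was", "a", "puppet", "##eer", "[SEP]"]
    entities = ["UNK", "Q191037", "Q191037", "UNK", "UNK", "UNK", "UNK", "UNK"]  # QID shown is illustrative

    assert len(tokens) == len(entities)  # one entity label per token position

    # The toolkit's tokenizer/model would then convert tokens to vocabulary IDs and
    # entity labels to indices into a pre-trained KG embedding table.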

Maintenance & Community

  • ERNIE is a sub-project of OpenSKL.
  • The project is associated with THU (Tsinghua University).
  • Citation details are provided for the ACL 2019 paper.

Licensing & Compatibility

  • The ERNIE toolkit itself is not explicitly licensed in the README.
  • OpenKE resources (knowledge graph embeddings) are provided under the MIT license.
  • Compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The setup process for pre-training is resource-intensive and time-consuming. The README does not explicitly state the license for the ERNIE toolkit code itself, which may impact commercial use. Entity linking for new tasks relies on external tools like TAGME.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 6 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Didier Lopes (founder of OpenBB), and 11 more.

sentence-transformers by UKPLab

Framework for text embeddings, retrieval, and reranking

Top 0.2% · 17k stars
created 6 years ago · updated 3 days ago