relik by SapienzaNLP

Research library for fast entity linking and relation extraction

created 1 year ago
441 stars

Top 68.9% on sourcepulse

Project Summary

ReLiK is a Python library for fast and accurate Entity Linking (EL) and Relation Extraction (RE), designed for researchers and practitioners. It offers a modular architecture with separate retriever and reader components, enabling flexible deployment and fine-tuning, and provides pre-trained models for various tasks and datasets.

How It Works

ReLiK follows a two-stage retriever-reader approach: a retriever selects candidate entities or relations for the input text, and a reader then extracts the mention spans in that text and links them to the retrieved candidates. Because the reader only sees a small candidate set, the system can work against large knowledge bases efficiently. The library is built on PyTorch Lightning, which handles training and inference at scale.
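The retrieve-then-read flow can be illustrated with a toy sketch in plain Python. This is not ReLiK's implementation: the real system uses neural models for both stages, whereas here the retriever scores candidates by token overlap and the reader matches aliases with a regular expression. The `INDEX` contents and the names `Candidate`, `Span`, `retrieve`, `read`, and `link` are all hypothetical, chosen only to show how the two stages hand data to each other.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    title: str
    description: str
    aliases: tuple[str, ...]


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    text: str
    label: str


# Hypothetical miniature index standing in for a real knowledge base.
INDEX = [
    Candidate("Michael Jordan", "American basketball player", ("Michael Jordan", "Jordan")),
    Candidate("National Basketball Association", "professional basketball league", ("NBA",)),
    Candidate("Rome", "Italian capital city", ("Rome",)),
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(text: str, index: list[Candidate], top_k: int = 2) -> list[Candidate]:
    """Stage 1: rank candidates by token overlap with the input (a stand-in for dense retrieval)."""
    query = tokenize(text)
    scored = []
    for cand in index:
        doc = tokenize(" ".join((cand.title, cand.description, *cand.aliases)))
        scored.append((len(query & doc), cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for score, cand in scored[:top_k] if score > 0]


def read(text: str, candidates: list[Candidate]) -> list[Span]:
    """Stage 2: find mentions of the retrieved candidates only and link them (a stand-in for the reader)."""
    spans = []
    for cand in candidates:
        for alias in cand.aliases:
            for match in re.finditer(rf"\b{re.escape(alias)}\b", text):
                spans.append(Span(match.start(), match.end(), match.group(), cand.title))
    # Resolve overlaps: at each position prefer the longest span, then skip anything it covers.
    spans.sort(key=lambda s: (s.start, -(s.end - s.start)))
    result, last_end = [], -1
    for span in spans:
        if span.start >= last_end:
            result.append(span)
            last_end = span.end
    return result


def link(text: str, index: list[Candidate] = INDEX, top_k: int = 2) -> list[Span]:
    """Run the retriever, then the reader on its output."""
    return read(text, retrieve(text, index, top_k))
```

Calling `link("Michael Jordan was one of the best players in the NBA.")` links "Michael Jordan" to the Michael Jordan entry and "NBA" to National Basketball Association. The "Rome" entry never reaches the reader because the retriever filters it out first, which is the point of the split: the more expensive second stage only considers a short candidate list.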

Quick Start & Requirements

  • Installation: pip install relik, or pip install relik[all] for the full set of dependencies. The project provides separate instructions for CUDA 12.1 and for installing FAISS.
  • Requirements: Python 3.10+, PyTorch, Hugging Face Transformers. A CUDA-capable GPU is strongly recommended for performance.
  • Resources: Pre-trained models are available on Hugging Face. Training requires significant computational resources.
  • Links: Hugging Face Collection, Colab Notebook
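Once the package is installed, loading a pre-trained model and running it on a sentence might look like the sketch below. It relies on `Relik.from_pretrained`, the model identifier `sapienzanlp/relik-entity-linking-large`, and an output object exposing `spans` with `start`, `end`, `label`, and `text` fields, as recalled from the project's README; treat these as assumptions and check them against the version you install. The wrapper `link_entities` and its `model` parameter are hypothetical, added so that a model loaded once can be reused across calls.

```python
# Model identifier assumed to match an entry in the project's Hugging Face collection.
DEFAULT_MODEL = "sapienzanlp/relik-entity-linking-large"


def link_entities(text, model=None, model_name=DEFAULT_MODEL):
    """Return (start, end, label, surface text) tuples for the spans linked in `text`.

    `link_entities` is a hypothetical helper for this sketch, not part of relik.
    """
    if model is None:
        # Imported lazily so this helper can be defined without relik installed.
        from relik import Relik

        # Fetches the pre-trained retriever and reader on first use.
        model = Relik.from_pretrained(model_name)
    output = model(text)
    return [(span.start, span.end, span.label, span.text) for span in output.spans]
```

A typical call would be `link_entities("Michael Jordan was one of the best players in the NBA.")`. Loading the model is the slow step, so for repeated calls it is better to create it once and pass it in through `model`.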

Highlighted Details

  • Reports state-of-the-art results on Entity Linking benchmarks such as AIDA, with gains over previous methods in both speed and accuracy.
  • Offers specialized models for Entity Linking, Relation Extraction, and Closed Information Extraction (CIE).
  • Provides a Command Line Interface (CLI) for serving models via FastAPI and performing batch inference.
  • Includes comprehensive scripts for data preprocessing, model training, and evaluation on standard datasets.

Maintenance & Community

The project is maintained by SapienzaNLP and accompanies an ACL 2024 publication. Pre-trained models are hosted on Hugging Face.

Licensing & Compatibility

The software and data are licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0. This license restricts commercial use and requires sharing modifications under the same terms.

Limitations & Caveats

The CC BY-NC-SA 4.0 license rules out commercial use. Some datasets, such as AIDA, are not publicly available and must be obtained separately. The preprocessing scripts for the NYT dataset note that duplicate triplets may appear because of legacy format handling.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 5
  • Star history: 27 stars in the last 90 days
