relik by SapienzaNLP

Academic research paper for fast entity linking and relation extraction

Created 1 year ago
450 stars

Top 66.9% on SourcePulse

Project Summary

ReLiK is a Python library for fast and accurate Entity Linking (EL) and Relation Extraction (RE), designed for researchers and practitioners. It offers a modular architecture with separate retriever and reader components, enabling flexible deployment and fine-tuning, and provides pre-trained models for various tasks and datasets.

How It Works

ReLiK follows a two-stage retrieve-then-read approach: a retriever selects candidate entities or relations from a knowledge base for the input text, and a reader then identifies spans in the text and links them to those candidates. Because the reader only has to consider the retrieved candidates, the design scales to large knowledge bases and complex texts. The library is built on PyTorch Lightning, which handles training and inference.
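The two-stage idea can be sketched in a few lines of plain Python. This is a toy illustration, not ReLiK's implementation: token overlap stands in for the retriever's scoring, exact string matching stands in for the neural reader, and every name here (Candidate, retrieve, read, INDEX, TEXT) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A knowledge-base entry: a title plus a short description."""
    title: str
    description: str


def tokenize(text):
    """Lowercase whitespace tokens with trailing punctuation stripped."""
    return {tok.strip(".,").lower() for tok in text.split()}


def retrieve(text, index, top_k=3):
    """Stage 1 (toy retriever): rank candidates by token overlap with the text."""
    query = tokenize(text)
    scored = [
        (len(query & tokenize(c.title + " " + c.description)), c) for c in index
    ]
    # Stable sort keeps the index order among equally scored candidates.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]


def read(text, candidates):
    """Stage 2 (toy reader): link spans of the text to the retrieved candidates."""
    spans = []
    for c in candidates:
        start = text.find(c.title)
        if start != -1:
            spans.append((start, start + len(c.title), c.title))
    return sorted(spans)


# Hypothetical miniature knowledge base and input sentence.
INDEX = [
    Candidate("Rome", "capital city of Italy"),
    Candidate("Italy", "country in southern Europe"),
    Candidate("Paris", "capital city of France"),
    Candidate("Tokyo", "largest metropolis in Japan"),
]
TEXT = "Rome is the capital of Italy."

candidates = retrieve(TEXT, INDEX, top_k=3)
spans = read(TEXT, candidates)
```

The point of the split is that the reader never looks at the full index: it only checks the handful of candidates the retriever kept, so its cost does not grow with the size of the knowledge base.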

Quick Start & Requirements

  • Installation: pip install relik for the base package, or pip install relik[all] for all optional dependencies. The project provides specific installation instructions for CUDA 12.1 and FAISS.
  • Requirements: Python 3.10+, PyTorch, and Hugging Face Transformers. A CUDA-capable GPU is strongly recommended for performance.
  • Resources: Pre-trained models are available on Hugging Face. Training requires significant computational resources.
  • Links: Hugging Face Collection, Colab Notebook
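Once the package is installed, loading a pre-trained model might look like the sketch below. It assumes the Relik class and its from_pretrained method as shown in the project README, and the checkpoint name sapienzanlp/relik-entity-linking-large from the Hugging Face collection; both should be checked against the installed version. The helper name link_entities is hypothetical.

```python
def link_entities(text, model_name="sapienzanlp/relik-entity-linking-large"):
    """Run a pre-trained ReLiK pipeline on one text and return its raw output.

    Assumption: `Relik.from_pretrained` and the default checkpoint name follow
    the project README; verify both against the version you have installed.
    """
    # Imported lazily so this module can be loaded without relik installed.
    from relik import Relik

    model = Relik.from_pretrained(model_name)
    # Calling the pipeline object on a string runs retrieval and reading.
    return model(text)
```

Calling link_entities("Michael Jordan was one of the best players in the NBA.") downloads the checkpoint on first use, so expect the first call to be slow. For repeated calls, load the model once and reuse it rather than going through this helper each time.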

Highlighted Details

  • Reports state-of-the-art results on Entity Linking benchmarks such as AIDA, improving on previous methods in both speed and accuracy.
  • Offers specialized models for Entity Linking, Relation Extraction, and Closed Information Extraction (CIE).
  • Provides a Command Line Interface (CLI) for serving models via FastAPI and performing batch inference.
  • Includes comprehensive scripts for data preprocessing, model training, and evaluation on standard datasets.

Maintenance & Community

The project is maintained by SapienzaNLP, is under active development, and accompanies a paper published at ACL 2024. Pre-trained models are hosted on Hugging Face.

Licensing & Compatibility

The software and data are licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0. This license restricts commercial use and requires sharing modifications under the same terms.

Limitations & Caveats

The CC BY-NC-SA 4.0 license rules out commercial use. Some datasets, such as AIDA, are not publicly available and must be obtained separately. The preprocessing scripts for the NYT dataset note that legacy format handling can produce duplicate triplets.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days
