deeptype  by openai

Entity linking via neural type system evolution (research paper code)

created 7 years ago
650 stars

Top 52.3% on sourcepulse

Project Summary

This repository provides the code for DeepType, a method for multilingual entity linking built on neural type systems. It discovers and evolves task-specific type constraints that guide a neural network's reading of a document, with reported state-of-the-art entity linking accuracy. The target audience is researchers and engineers working on natural language processing and information extraction.

How It Works

DeepType uses a type system as a strong signal for natural language understanding. A neural network predicts the types of each mention, and candidate entities whose types do not match are ruled out, which shrinks the search space for entity linking. The type system itself is learned from data rather than fixed by hand, so it can evolve and adapt to a specific task, improving accuracy on benchmarks such as CoNLL (YAGO) and TAC KBP 2010.
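The idea of letting predicted types filter candidate entities can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Candidate class, the rank_candidates function, the example entities, and the scoring rule (link prior multiplied by type probability) are hypothetical and simplified, and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """A candidate entity for one mention (hypothetical structure)."""
    name: str          # entity identifier
    link_prior: float  # how often the mention text links to this entity
    entity_type: str   # the entity's type under a toy, single-axis type system


def rank_candidates(
    candidates: List[Candidate], type_probs: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Rank candidates by link prior weighted by the predicted type probability.

    This scoring rule is an assumed simplification, not the paper's formula.
    """
    scored = [
        (c.name, c.link_prior * type_probs.get(c.entity_type, 0.0))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# The mention "jaguar" in a sentence about wildlife: the link prior alone
# favours the car maker, but the (assumed) type classifier output does not.
candidates = [
    Candidate("Jaguar Cars", link_prior=0.60, entity_type="organization"),
    Candidate("Jaguar (animal)", link_prior=0.30, entity_type="animal"),
]
type_probs = {"animal": 0.9, "organization": 0.1}

ranking = rank_candidates(candidates, type_probs)
best = ranking[0][0]
print(best)
```

With these toy numbers the type signal overrides the more frequent link target, which is the effect the paragraph above describes.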

Quick Start & Requirements

  • Installation: run pip3 install -r requirements.txt, then pip3 install wikidata_linker_utils_src/. On Fedora, the redhat-rpm-config and gcc-c++ packages may also be needed.
  • Prerequisites: Python 3, CUDA (for GPU training), and significant disk space for data extraction.
  • Data Extraction: Requires running ./extraction/full_preprocess.sh to obtain Wikipedia-to-Wikidata mappings and anchor tags for multiple languages.
  • Documentation: Blog post and paper are linked in the README.

Highlighted Details

  • Reaches 98.6-99% accuracy on CoNLL (YAGO) and TAC KBP 2010 when types are provided by an oracle, which is an upper bound rather than the accuracy of a trained type classifier.
  • Supports type system evolution using several search methods: the cross-entropy method (CEM), greedy search, beam search, and genetic algorithms.
  • Includes tools for creating trainable neural type systems from evolved type solutions.
  • Offers pre-written configurations for training on multiple languages.
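
To make the notion of type system evolution more concrete, the sketch below runs a greedy search over a toy problem. It is an illustration only: the entities, the type labels, the objective (number of gold/distractor pairs separated by the chosen types, minus a penalty per type), and the function names are all assumptions, not the repository's data, objective, or API.

```python
from typing import Dict, List, Set, Tuple

# Toy data (hypothetical): entity -> set of type labels it carries.
ENTITY_TYPES: Dict[str, Set[str]] = {
    "Jaguar (animal)": {"animal", "living_thing"},
    "Jaguar Cars": {"organization", "company"},
    "Paris (city)": {"location", "city"},
    "Paris Hilton": {"human", "living_thing"},
    "Apple Inc.": {"organization", "company"},
    "apple (fruit)": {"food"},
}

# Ambiguous mentions as (gold entity, distractor entity) pairs.
PAIRS: List[Tuple[str, str]] = [
    ("Jaguar (animal)", "Jaguar Cars"),
    ("Paris (city)", "Paris Hilton"),
    ("Apple Inc.", "apple (fruit)"),
]


def separated(pair: Tuple[str, str], selected: Set[str]) -> bool:
    """True if some selected type holds for exactly one entity of the pair."""
    gold, other = pair
    return any(
        (t in ENTITY_TYPES[gold]) != (t in ENTITY_TYPES[other]) for t in selected
    )


def objective(selected: Set[str], penalty: float = 0.5) -> float:
    """Assumed stand-in objective: separated pairs minus a size penalty."""
    return sum(separated(p, selected) for p in PAIRS) - penalty * len(selected)


def greedy_search(all_types: Set[str], penalty: float = 0.5) -> Tuple[Set[str], float]:
    """Add the single best type at each step until no addition improves the score."""
    selected: Set[str] = set()
    best_score = objective(selected, penalty)
    while True:
        # Sorting makes tie-breaking deterministic (first maximum wins).
        options = [
            (objective(selected | {t}, penalty), t)
            for t in sorted(all_types - selected)
        ]
        if not options:
            break
        score, chosen = max(options, key=lambda option: option[0])
        if score <= best_score:
            break
        selected.add(chosen)
        best_score = score
    return selected, best_score


all_types = set().union(*ENTITY_TYPES.values())
selected, best_score = greedy_search(all_types)
print(sorted(selected), best_score)
```

The size penalty is what keeps the search from simply selecting every type: a type is only worth adding if it disambiguates enough additional mentions. Beam search, CEM, and genetic algorithms explore the same kind of space with different search strategies.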

Maintenance & Community

The project is marked as "Archive" and no updates are expected. It was authored by Jonathan Raiman & Olivier Raiman.

Licensing & Compatibility

The repository does not explicitly state a license. Without one, commercial use and linking with closed-source projects are not recommended unless the authors give explicit permission.

Limitations & Caveats

The project is archived, indicating no ongoing development or support. The data extraction process is extensive and requires significant disk space. The setup and training procedures involve multiple complex steps and configuration files.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 90 days
