kb  by allenai

Knowledge enhanced contextual embeddings via BERT

created 6 years ago
375 stars

Top 76.8% on sourcepulse

Project Summary

KnowBert enhances BERT's contextual word representations by integrating knowledge from external knowledge bases such as Wikipedia and WordNet. The project provides pretrained models along with training and evaluation scripts, and is aimed at researchers and practitioners who want to improve performance on NLP tasks with knowledge-enhanced language models.

How It Works

KnowBert embeds knowledge bases into BERT through a knowledge-aware attention mechanism. The model attends to relevant entities and relations from the knowledge base and uses them to enrich word representations with structured semantic information. The goal is better performance on tasks that require world knowledge and semantic understanding.
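To make the idea of attending over knowledge base entries concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the project's implementation: the function name knowledge_attention, the scaled dot-product scoring, the absence of learned projections, and the toy two-dimensional vectors are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def knowledge_attention(word_vec: Vector, entity_vecs: List[Vector]) -> Vector:
    """Enrich one word vector with an attention-weighted mix of entity vectors.

    Hypothetical simplification: scaled dot-product scores, no learned
    projections, and a plain residual addition.
    """
    scale = math.sqrt(len(word_vec))
    weights = softmax([dot(word_vec, e) / scale for e in entity_vecs])
    # Weighted sum of the candidate entity embeddings.
    context = [
        sum(w * e[i] for w, e in zip(weights, entity_vecs))
        for i in range(len(word_vec))
    ]
    # The residual addition keeps the original word information.
    return [x + c for x, c in zip(word_vec, context)]


word = [1.0, 0.0]
entities = [[1.0, 0.0], [0.0, 1.0]]  # toy entity embeddings, not real KB vectors
enriched = knowledge_attention(word, entities)
print(enriched)
```

In this toy case the first entity is more similar to the word, so it receives the larger attention weight and the first component of the enriched vector grows the most.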

Quick Start & Requirements

  • Install: run pip install -r requirements.txt, then pip install --editable . from the repository root.
  • Prerequisites: Python 3.6.7, PyTorch 1.2.0, NLTK (wordnet), spaCy (en_core_web_sm).
  • Setup: Download the pretrained models and, if you plan to run the evaluations, the corresponding datasets.
  • Docs: Programmatic Usage, Evaluation
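Because the version pins are strict, it can help to check the environment before installing anything else. The following is a small sketch, not part of the repository: the PINNED table simply restates the versions listed above, and the helper names parse_version, matches_pin, and check_environment are hypothetical. The only third-party attribute it relies on is torch.__version__, and the import is guarded in case PyTorch is absent.

```python
import itertools
import sys
from typing import Dict, Optional, Tuple

# Versions copied from the prerequisites listed above.
PINNED = {"python": "3.6.7", "torch": "1.2.0"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.2.0' or '1.2.0+cu100' into a tuple of ints such as (1, 2, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def matches_pin(found: str, pinned: str) -> bool:
    """True when the numeric parts of both version strings are identical."""
    return parse_version(found) == parse_version(pinned)


def check_environment() -> Dict[str, Tuple[Optional[str], bool]]:
    """Report (found version, matches pin) for Python and PyTorch."""
    report = {}
    python_found = ".".join(str(n) for n in sys.version_info[:3])
    report["python"] = (python_found, matches_pin(python_found, PINNED["python"]))
    try:
        import torch  # optional: only inspected if it is installed
        found = str(torch.__version__)
        report["torch"] = (found, matches_pin(found, PINNED["torch"]))
    except ImportError:
        report["torch"] = (None, False)
    return report


for name, (found, ok) in check_environment().items():
    status = "ok" if ok else "differs from pin " + PINNED[name]
    print(name, found, status)
```

The check compares versions exactly rather than with a minimum bound, since the project names specific releases instead of a supported range.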

Highlighted Details

  • Offers pretrained models for WordNet, Wikipedia, and combined W+W knowledge bases.
  • Provides scripts for intrinsic evaluations (perplexity, KG probe, WSD) and downstream task fine-tuning (relation extraction, entity typing, classification).
  • Gives detailed instructions for reproducing results and fine-tuning KnowBert on custom datasets.
  • Documents pretraining KnowBert from scratch, covering knowledge base preparation and model training steps.
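As a reminder of what the perplexity evaluation measures, the sketch below computes perplexity from per-token log-probabilities as the exponential of the mean negative log-likelihood. It shows the general definition only. How the repository's own scripts select tokens and compute the score is not shown here, and the function name perplexity and the sample values are hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative natural-log probability."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy input: a model that assigns probability 0.25 to every evaluated token.
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))
```

A model that spreads its probability evenly over four choices at every position scores a perplexity of about 4, which is why lower values indicate a more confident and accurate model.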

Maintenance & Community

This project originates from the Allen Institute for AI (AI2). The primary contributors are listed in the citation. No community channels such as Discord or Slack are mentioned.

Licensing & Compatibility

The repository does not explicitly state a license. The use of allennlp suggests compatibility with its Apache 2.0 license, but the project's specific licensing requires verification for commercial use.

Limitations & Caveats

The project requires specific older versions of PyTorch (1.2.0) and Python (3.6.7), which may be difficult to install in modern environments. Pretraining is complex and resource-intensive.

Health Check

  • Last commit: 5 years ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 90 days
