Knowledgeable language model pretrained with document links
LinkBERT enhances transformer-based language models by incorporating knowledge from document links, such as hyperlinks and citations, into pre-training. The approach improves performance on knowledge-intensive and cross-document NLP tasks in both the general and biomedical domains.
How It Works
Whereas standard BERT pre-trains on segments from a single document, LinkBERT places linked documents in the same input context during pre-training. This lets the model capture knowledge that spans documents, improving performance on tasks that require broad contextual understanding and factual recall.
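Concretely, the paper pairs each anchor segment with a second segment drawn from the same document, a random document, or a hyperlinked document, and trains a Document Relation Prediction (DRP) head alongside masked language modeling. Below is a minimal sketch of how such pairs could be constructed (illustrative only; the `corpus` and `links` structures and the `make_example` helper are assumptions, not the project's API):

```python
import random

# Illustrative sketch only (not the released training code): build one
# LinkBERT-style pre-training pair. Segment A is paired with a segment
# that is contiguous (next segment, same doc), random (any doc), or
# linked (a document connected by hyperlink/citation), and the pair is
# labeled for the Document Relation Prediction (DRP) objective.
DRP_LABELS = {"contiguous": 0, "random": 1, "linked": 2}

def make_example(doc_id, corpus, links):
    """corpus: {doc_id: [text segments]}; links: {doc_id: [linked ids]}."""
    segments = corpus[doc_id]
    i = random.randrange(len(segments))
    seg_a = segments[i]
    relation = random.choice(list(DRP_LABELS))
    if relation == "contiguous" and i + 1 < len(segments):
        seg_b = segments[i + 1]
    elif relation == "linked" and links.get(doc_id):
        seg_b = random.choice(corpus[random.choice(links[doc_id])])
    else:
        # Fall back to (and relabel as) a random segment from any doc.
        relation = "random"
        seg_b = random.choice(corpus[random.choice(list(corpus))])
    # The pair is fed to the model as "[CLS] seg_a [SEP] seg_b [SEP]",
    # trained with masked language modeling plus DRP classification.
    return seg_a, seg_b, DRP_LABELS[relation]
```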
Quick Start & Requirements
Create a conda environment (`conda create -n linkbert python=3.8`), activate it (`source activate linkbert`), and install dependencies (`pip install torch==1.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html`, then `pip install transformers==4.9.1 datasets==1.11.0 fairscale==0.4.0 wandb sklearn seqeval`). Pretrained checkpoints are available on the Hugging Face Hub (`michiyasunaga/LinkBERT-base`, `michiyasunaga/LinkBERT-large`, `michiyasunaga/BioLinkBERT-base`, `michiyasunaga/BioLinkBERT-large`).
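Once dependencies are installed, the checkpoints load like any BERT model through the Transformers API (a minimal usage sketch):

```python
# The released checkpoints load through the standard Transformers API;
# the input sentence here is an arbitrary example.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("michiyasunaga/LinkBERT-base")
model = AutoModel.from_pretrained("michiyasunaga/LinkBERT-base")

inputs = tokenizer("LinkBERT is pretrained with document links.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
```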
Highlighted Details
Maintenance & Community
The project accompanies an ACL 2022 paper and provides a CodaLab worksheet for reproducibility. No specific community channels or active maintenance indicators are mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project requires specific older versions of PyTorch (1.10.1) and Transformers (4.9.1), which may pose compatibility challenges with current ecosystems. The license is not specified, potentially impacting commercial adoption.