Protein language model for protein-related tasks
Top 59.5% on sourcepulse
ProteinBERT is a deep learning model for protein sequence analysis, offering state-of-the-art performance on various benchmarks. It's designed for researchers and developers working with protein data, enabling rapid training of protein predictors and feature extraction for downstream tasks.
How It Works
ProteinBERT is inspired by BERT but incorporates innovations like global-attention layers with linear complexity, allowing it to process extremely long protein sequences efficiently. It uses a self-supervised pretraining scheme combining language modeling with Gene Ontology (GO) annotation prediction. The model can accept protein sequences and optional GO annotations as input, producing both local and global representations.
Quick Start & Requirements
python setup.py install
after cloning the repository and initializing submodules.Highlighted Details
Maintenance & Community
nadavbra/protein_bert
.lucidrains/protein-bert-pytorch
.Licensing & Compatibility
Limitations & Caveats
setup.py
script.4 months ago
Inactive