Python library for generating protein embeddings from sequences
Top 63.4% on sourcepulse
This project provides a unified interface and reproducible workflows for generating protein embeddings from sequences using various deep learning models. It targets researchers and developers needing to leverage these embeddings for downstream tasks like transfer learning, visualization, and property prediction, simplifying complex model integration and offering abstraction for resource management.
How It Works
The library offers a pipeline that converts protein sequences into per-amino-acid or per-sequence embeddings. It supports a wide array of pre-trained models (e.g., SeqVec, ProtTrans, UniRep, ESM) and provides tools for dimensionality reduction (UMAP, t-SNE) and for visualizing the resulting embeddings. The pipeline hides model-specific details and handles failures such as CUDA out-of-memory errors, so users do not have to manage them for each model.
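To make the distinction between per-amino-acid and per-sequence embeddings concrete, here is a minimal sketch in plain Python. It does not use the library's API: `toy_per_residue_embedding` and `reduce_per_sequence` are hypothetical names, the toy vectors stand in for the output of a real pre-trained model, and mean pooling over residues is assumed as the reduction step (the text above does not say which reduction the library applies).

```python
from statistics import fmean

# Hypothetical stand-in for a pre-trained model: each residue is mapped to a
# small deterministic vector. A real embedder would return learned,
# context-dependent vectors instead.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_per_residue_embedding(sequence, dim=4):
    """Return one `dim`-sized vector per amino acid (shape: length x dim)."""
    vectors = []
    for residue in sequence:
        index = AMINO_ACIDS.index(residue)  # raises ValueError on unknown residues
        vectors.append([((index + 1) * (d + 1)) % 7 / 7.0 for d in range(dim)])
    return vectors


def reduce_per_sequence(per_residue):
    """Mean-pool over the length axis to get one fixed-size vector (assumed reduction)."""
    return [fmean(column) for column in zip(*per_residue)]


per_residue = toy_per_residue_embedding("MKTAYIAK")
per_sequence = reduce_per_sequence(per_residue)

# One vector per residue versus one vector for the whole protein.
print(len(per_residue), len(per_residue[0]), len(per_sequence))  # 8 4 4
```

The per-residue form grows with sequence length and suits residue-level tasks, while the pooled form has a fixed size regardless of length, which is what makes it convenient for visualization and whole-protein property prediction.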
Quick Start & Requirements
Install with pip install bio-embeddings[all], or via Docker (ghcr.io/bioembeddings/bio_embeddings:v0.1.6).
mmseqs2 is required for the mmseqs_search protocol.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
2 years ago
Inactive