Benchmarking framework for protein representation learning
This repository provides a comprehensive benchmarking framework for protein representation learning, targeting researchers and practitioners in structural bioinformatics and machine learning. It offers a unified platform for evaluating various featurization schemes, datasets, and models, enabling reproducible research and facilitating the development of new protein representation learning methods.
How It Works
ProteinWorkshop employs a modular design, allowing users to combine different components for pre-training and downstream tasks. It supports both invariant and equivariant graph neural networks, along with a rich set of featurization schemes that capture varying levels of structural detail. The framework automates data downloading and processing, and its configuration-driven approach, powered by Hydra and Weights & Biases, simplifies experiment management and hyperparameter tuning.
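The configuration-driven composition described above can be illustrated with a small standard-library sketch. It does not use Hydra or the ProteinWorkshop API; it only imitates the general idea of starting from default component settings and overriding them with key=value strings. The group names (encoder, features, task), their values, and the apply_overrides helper are all hypothetical and chosen for illustration.

```python
from copy import deepcopy

# Hypothetical defaults: these group names and values are illustrative only
# and are not taken from the ProteinWorkshop configuration files.
DEFAULTS = {
    "encoder": {"name": "invariant_gnn", "hidden_dim": 128},
    "features": {"level": "ca_only"},
    "task": {"name": "pretraining"},
}


def apply_overrides(config, overrides):
    """Return a copy of config with 'a.b=value' style overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, sep, raw_value = item.partition("=")
        if not sep or not dotted_key:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Create intermediate groups on demand.
            node = node.setdefault(part, {})
        # Keep plain integers as integers; everything else stays a string.
        node[leaf] = int(raw_value) if raw_value.isdigit() else raw_value
    return result


config = apply_overrides(
    DEFAULTS,
    ["encoder.hidden_dim=256", "features.level=full_atom"],
)
print(config)
```

Because the defaults are copied before being modified, each experiment gets its own configuration while the shared defaults stay untouched, which is the property that makes this style of setup convenient for sweeps over encoders and featurization schemes.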
Quick Start & Requirements
Install with pip install proteinworkshop (for library usage), or clone the repository and run pip install -e . (for development).
Run workshop install pyg to install the PyG dependencies. Linux-like systems with NVIDIA CUDA are officially supported; Windows and macOS are not.
Download a dataset with workshop download <DATASET_NAME>. The PDB dataset requires ~23 GB.
Highlighted Details
Maintenance & Community
The project is associated with the ICLR 2024 paper "Evaluating Representation Learning on the Protein Structure Universe." Community interaction channels are not explicitly mentioned in the README.
Licensing & Compatibility
Licenses vary by dataset, including GPL-3.0, CC-BY 4.0, MIT, Apache 2.0, and CC0 1.0. The GPL-3.0 license for some datasets may impose copyleft restrictions on derivative works. Commercial use compatibility depends on the specific dataset licenses.
Limitations & Caveats
The framework officially supports only Linux-like systems with NVIDIA CUDA; Windows and macOS are not officially supported, which may hinder adoption on those platforms. Some datasets have large storage requirements (up to 1 TB).
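Given these constraints, a quick pre-flight check before downloading data may save time. The sketch below uses only the Python standard library and is not part of ProteinWorkshop. The helper names are hypothetical, the 23 GB threshold is the approximate PDB figure quoted in the Quick Start, and looking for nvidia-smi on the PATH is only a rough heuristic for an NVIDIA driver being present, not a guarantee of a working CUDA setup.

```python
import platform
import shutil

# Approximate size quoted for the PDB dataset; other datasets may need far more.
PDB_DATASET_GB = 23


def is_supported_platform(system_name=None):
    """Return True for Linux, the only officially supported OS family."""
    name = system_name or platform.system()
    return name == "Linux"


def has_nvidia_tooling():
    """Heuristic: nvidia-smi on PATH suggests an NVIDIA driver is installed."""
    return shutil.which("nvidia-smi") is not None


def has_free_space(path, required_gb):
    """Return True if the filesystem holding path has required_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


print("Supported OS:", is_supported_platform())
print("NVIDIA tooling found:", has_nvidia_tooling())
print("Room for PDB dataset:", has_free_space(".", PDB_DATASET_GB))
```

Pass the directory where datasets will actually be stored instead of "." so that the free-space check runs against the correct filesystem.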