DeepPurpose by kexinhuang12345

Deep learning toolkit for drug discovery & bioinformatics

Created 5 years ago

1,115 stars

Top 34.3% on SourcePulse

2 Experts Love This Project

Jeff Hammerbacher

Cofounder of Cloudera

Phil Wang

Prolific Research Paper Implementer

Project Summary

DeepPurpose is a PyTorch-based deep learning toolkit for bioinformatics and drug discovery. It lets researchers run Drug-Target Interaction (DTI), Drug Property, Drug-Drug Interaction (DDI), Protein-Protein Interaction (PPI), and Protein Function prediction with minimal code. The library offers a wide range of molecular encodings and pre-trained models, which supports applications such as drug repurposing and virtual screening.

How It Works

DeepPurpose leverages a flexible architecture that supports over 15 drug and protein encodings, including traditional cheminformatics fingerprints, CNNs, Transformers, and Graph Neural Networks (GNNs) via DGL. This allows users to combine various encoding strategies for diverse modeling tasks. The toolkit provides streamlined data loading, preprocessing, model training, and evaluation, abstracting away much of the boilerplate code typically associated with deep learning in bioinformatics.
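The idea of pairing any drug encoding with any protein encoding can be illustrated with a minimal, standard-library sketch. This is not DeepPurpose's implementation or API: the registries, the encoder names (`char_composition`, `aa_composition`, `length`), and the toy composition features are all hypothetical stand-ins for the fingerprint, CNN, Transformer, and GNN encoders the toolkit provides.

```python
from typing import Callable, Dict, List

# Hypothetical toy alphabets; real encoders are far richer than character counts.
SMILES_CHARS = "CNOSPFclno=#()[]"       # 16 distinct SMILES characters
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"    # the 20 standard amino acids

Encoder = Callable[[str], List[float]]


def composition(sequence: str, alphabet: str) -> List[float]:
    """Fraction of `sequence` made up of each symbol in `alphabet`."""
    n = max(len(sequence), 1)  # avoid division by zero on empty input
    return [sequence.count(ch) / n for ch in alphabet]


# Assumed registries: each name maps to a function from a string to a vector.
DRUG_ENCODERS: Dict[str, Encoder] = {
    "char_composition": lambda smiles: composition(smiles, SMILES_CHARS),
    "length": lambda smiles: [float(len(smiles))],
}
TARGET_ENCODERS: Dict[str, Encoder] = {
    "aa_composition": lambda seq: composition(seq, AMINO_ACIDS),
    "length": lambda seq: [float(len(seq))],
}


def encode_pair(smiles: str, protein: str,
                drug_encoding: str, target_encoding: str) -> List[float]:
    """Encode both inputs independently, then concatenate the two vectors."""
    drug_vec = DRUG_ENCODERS[drug_encoding](smiles)
    target_vec = TARGET_ENCODERS[target_encoding](protein)
    return drug_vec + target_vec


# Any drug encoder can be combined with any target encoder by name.
features = encode_pair("CC(=O)O", "MKTAYIAK", "char_composition", "aa_composition")
print(len(features))
```

Because the two encoders are selected independently by name, adding a new drug encoder makes it available in combination with every existing protein encoder, which is the property the paragraph above describes.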

Quick Start & Requirements

Installation: pip install DeepPurpose, or build from source by cloning the repository and running conda env create -f environment.yml.
Prerequisites: Python 3.6+ (recommended), Conda. GPU support is recommended for performance.
Resources: Datasets can be large; download scripts are provided.
Documentation: https://deeppurpose.readthedocs.io/
Demos: Extensive demos are available in the DEMO folder of the GitHub repository.

Highlighted Details

Supports 15+ drug and protein encodings, including novel combinations like GNNs with Transformers.
Offers simplified "10 lines of code" frameworks for DTI, Drug Property, DDI, PPI, and Protein Function prediction.
Includes data loaders for numerous public benchmarks (BindingDB, DAVIS, KIBA) and repurposing datasets.
Provides over 10 pre-trained models for various tasks and datasets.
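A short DTI training run might look like the sketch below. The function names (`load_process_DAVIS`, `data_process`, `generate_config`, `model_initialize`, `train`) follow the project's README as best recalled and should be checked against the documentation for the installed version. The wrapper functions `dti_settings` and `train_dti`, the `MPNN`/`CNN` encoder pairing, and the hyperparameter values are illustrative assumptions, not recommendations.

```python
def dti_settings() -> dict:
    """Illustrative settings for one run; the values are untuned assumptions."""
    return {
        "drug_encoding": "MPNN",       # message-passing network on the molecule
        "target_encoding": "CNN",      # 1D CNN on the amino-acid sequence
        "cls_hidden_dims": [1024, 1024, 512],
        "train_epoch": 5,
        "LR": 0.001,
        "batch_size": 128,
    }


def train_dti(data_path: str = "./data"):
    """Download DAVIS, split it, and train a DTI model (requires DeepPurpose)."""
    # Imported lazily so this file can be loaded without DeepPurpose installed.
    from DeepPurpose import DTI as models
    from DeepPurpose import dataset, utils

    settings = dti_settings()

    # Load a benchmark dataset: drug SMILES, target sequences, affinity labels.
    X_drugs, X_targets, y = dataset.load_process_DAVIS(path=data_path, binary=False)

    # Encode both inputs and create train/validation/test splits.
    train, val, test = utils.data_process(
        X_drugs, X_targets, y,
        settings["drug_encoding"], settings["target_encoding"],
        split_method="random", frac=[0.7, 0.1, 0.2], random_seed=1,
    )

    # Build the model configuration, initialize the model, and train it.
    config = utils.generate_config(**settings)
    model = models.model_initialize(**config)
    model.train(train, val, test)
    return model
```

Calling `train_dti()` downloads the dataset on first use and benefits from a GPU; swapping the two encoding names in `dti_settings` is the only change needed to try a different encoder pairing.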

Maintenance & Community

The README invites user feedback and contributions and lists contact information for the developers.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The README notes that pre-trained models cover limited datasets and may not generalize perfectly to new, unseen proteins. Outputs should be manually inspected by experts before wet-lab validation, as the work is still under active development.

Health Check

Last Commit: 1 year ago

Responsiveness: Inactive

Star History

9 stars in the last 30 days