DeepPurpose by kexinhuang12345

Deep learning toolkit for drug discovery & bioinformatics

created 5 years ago
1,068 stars

Top 36.0% on sourcepulse

Project Summary

DeepPurpose is a PyTorch-based deep learning toolkit designed for bioinformatics and drug discovery tasks. It simplifies complex molecular modeling and prediction, enabling researchers to perform Drug-Target Interaction (DTI), Drug Property Prediction, Protein-Protein Interaction (PPI), and Protein Function Prediction with minimal code. The library offers a wide range of molecular encodings and pre-trained models, facilitating applications like drug repurposing and virtual screening.

How It Works

DeepPurpose leverages a flexible architecture that supports over 15 drug and protein encodings, including traditional cheminformatics fingerprints, CNNs, Transformers, and Graph Neural Networks (GNNs) via DGL. This allows users to combine various encoding strategies for diverse modeling tasks. The toolkit provides streamlined data loading, preprocessing, model training, and evaluation, abstracting away much of the boilerplate code typically associated with deep learning in bioinformatics.
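The idea of pairing any drug encoding with any protein encoding can be sketched in plain Python. This is a minimal illustration, not DeepPurpose's code: the encoder names, vocabularies, and the encode_pair helper are all hypothetical, and the two toy encodings (padded character indices and symbol composition) stand in for the much richer fingerprints, CNNs, Transformers, and GNNs the toolkit offers.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical vocabularies for this sketch; index 0 is reserved for
# padding and for characters outside the vocabulary.
SMILES_VOCAB = sorted(set("CNOSPFHBIclnosr()[]=#+-@/\\123456789"))
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def index_sequence(text: str, vocab: Sequence[str], max_len: int) -> List[float]:
    """Map each character to its 1-based vocabulary index, then pad or truncate."""
    lookup = {ch: i + 1 for i, ch in enumerate(vocab)}
    codes = [float(lookup.get(ch, 0)) for ch in text[:max_len]]
    return codes + [0.0] * (max_len - len(codes))


def composition(text: str, vocab: Sequence[str]) -> List[float]:
    """Fraction of positions in text occupied by each vocabulary symbol."""
    total = max(len(text), 1)  # avoid division by zero on empty input
    return [text.count(ch) / total for ch in vocab]


# Registries keyed by encoding name, so encodings can be mixed freely.
DRUG_ENCODERS: Dict[str, Callable[[str], List[float]]] = {
    "char_index": lambda s: index_sequence(s, SMILES_VOCAB, max_len=8),
    "char_composition": lambda s: composition(s, SMILES_VOCAB),
}
TARGET_ENCODERS: Dict[str, Callable[[str], List[float]]] = {
    "aa_index": lambda p: index_sequence(p, AMINO_ACIDS, max_len=12),
    "aa_composition": lambda p: composition(p, AMINO_ACIDS),
}


def encode_pair(smiles: str, sequence: str,
                drug_encoding: str, target_encoding: str) -> List[float]:
    """Encode a drug and a protein separately, then concatenate the vectors."""
    drug_vec = DRUG_ENCODERS[drug_encoding](smiles)
    target_vec = TARGET_ENCODERS[target_encoding](sequence)
    return drug_vec + target_vec


if __name__ == "__main__":
    # Ethanol paired with a short made-up peptide.
    features = encode_pair("CCO", "MKVLA", "char_index", "aa_composition")
    print(len(features))  # 8 drug positions + 20 amino-acid fractions = 28
```

Keeping the encoders in name-keyed registries means a new encoding only needs one new entry, and every existing drug encoding can immediately be paired with it.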

Quick Start & Requirements

  • Installation: run pip install DeepPurpose, or build from source by creating the Conda environment with conda env create -f environment.yml.
  • Prerequisites: Python 3.6 or later (recommended) and Conda. A GPU is recommended for performance.
  • Resources: Datasets can be large; download scripts are provided.
  • Documentation: https://deeppurpose.readthedocs.io/
  • Demos: Extensive demos available in the DEMO folder and on GitHub.

Highlighted Details

  • Supports 15+ drug and protein encodings, including novel combinations like GNNs with Transformers.
  • Offers simplified "10 lines of code" frameworks for DTI, Drug Property, Drug-Drug Interaction (DDI), PPI, and Protein Function prediction.
  • Includes data loaders for numerous public benchmarks (BindingDB, DAVIS, KIBA) and repurposing datasets.
  • Provides over 10 pre-trained models for various tasks and datasets.
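As a rough illustration of the "10 lines of code" DTI workflow, the sketch below follows the usage pattern shown in the project's README: load a benchmark, pick encodings, split the data, build a config, and train. The function and parameter names are written as best recalled and should be checked against https://deeppurpose.readthedocs.io/ before use. The SETTINGS values and the train_dti wrapper are illustrative assumptions, not recommended defaults.

```python
# Illustrative settings for this sketch; the values are not tuned.
SETTINGS = {
    "drug_encoding": "CNN",
    "target_encoding": "Transformer",
    "split_method": "random",
    "frac": [0.7, 0.1, 0.2],  # train / validation / test fractions
    "train_epoch": 5,
    "LR": 1e-3,
    "batch_size": 128,
}


def train_dti(settings=None, data_path="./data"):
    """Train a DTI model on DAVIS (hypothetical wrapper around DeepPurpose calls)."""
    settings = settings or SETTINGS

    # Imported lazily so this file can be loaded without DeepPurpose installed.
    from DeepPurpose import DTI as models
    from DeepPurpose import dataset, utils

    # Downloads DAVIS if needed and returns SMILES strings,
    # amino-acid sequences, and binding affinity labels.
    X_drug, X_target, y = dataset.load_process_DAVIS(data_path, binary=False)

    # Encode both inputs and split into train / validation / test sets.
    train, val, test = utils.data_process(
        X_drug, X_target, y,
        settings["drug_encoding"], settings["target_encoding"],
        split_method=settings["split_method"],
        frac=settings["frac"],
    )

    # Build a model config for the chosen encodings and train.
    config = utils.generate_config(
        drug_encoding=settings["drug_encoding"],
        target_encoding=settings["target_encoding"],
        train_epoch=settings["train_epoch"],
        LR=settings["LR"],
        batch_size=settings["batch_size"],
    )
    model = models.model_initialize(**config)
    model.train(train, val, test)
    return model
```

Calling train_dti() requires DeepPurpose and its dependencies to be installed and will download the DAVIS dataset on first use. Swapping the two encoding names in SETTINGS is the only change needed to try a different drug and protein encoder pair.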

Maintenance & Community

The project is actively seeking user feedback and contributions. Contact information for developers is provided.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The README notes that pre-trained models cover limited datasets and may not generalize perfectly to new, unseen proteins. Outputs should be manually inspected by experts before wet-lab validation, as the work is still under active development.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 38 stars in the last 90 days
