DeepPurpose by kexinhuang12345

Deep learning toolkit for drug discovery & bioinformatics

Created 5 years ago
1,083 stars

Top 35.1% on SourcePulse

Project Summary

DeepPurpose is a PyTorch-based deep learning toolkit for bioinformatics and drug discovery. It simplifies molecular modeling and prediction, letting researchers run Drug-Target Interaction (DTI), Drug Property, Drug-Drug Interaction (DDI), Protein-Protein Interaction (PPI), and Protein Function prediction with minimal code. The library offers a wide range of molecular encodings and pre-trained models, supporting applications such as drug repurposing and virtual screening.

How It Works

DeepPurpose leverages a flexible architecture that supports over 15 drug and protein encodings, including traditional cheminformatics fingerprints, CNNs, Transformers, and Graph Neural Networks (GNNs) via DGL. This allows users to combine various encoding strategies for diverse modeling tasks. The toolkit provides streamlined data loading, preprocessing, model training, and evaluation, abstracting away much of the boilerplate code typically associated with deep learning in bioinformatics.
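The idea of pairing any drug encoding with any protein encoding can be illustrated with a toy, dependency-free sketch. This is not DeepPurpose's API: `encode_drug` (a hashed SMILES-substring bit vector standing in for a fingerprint), `encode_protein` (amino-acid composition), and the linear `score_pair` head are hypothetical stand-ins for the learned encoders and predictor the library actually trains.

```python
import hashlib
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def encode_drug(smiles: str, n_bits: int = 64, k: int = 3) -> List[float]:
    """Toy drug encoder (hypothetical): hash each length-k substring of a
    SMILES string into a fixed-size bit vector, a crude fingerprint stand-in."""
    bits = [0.0] * n_bits
    for i in range(max(len(smiles) - k + 1, 1)):
        fragment = smiles[i:i + k]
        # hashlib gives the same index on every run, unlike the built-in hash()
        index = int(hashlib.md5(fragment.encode()).hexdigest(), 16) % n_bits
        bits[index] = 1.0
    return bits


def encode_protein(sequence: str) -> List[float]:
    """Toy protein encoder (hypothetical): relative frequency of each
    standard amino acid in the sequence."""
    sequence = sequence.upper()
    total = max(len(sequence), 1)
    return [sequence.count(aa) / total for aa in AMINO_ACIDS]


def score_pair(drug_vec: List[float], protein_vec: List[float],
               weights: List[float], bias: float = 0.0) -> float:
    """Concatenate both encodings and apply a linear head, standing in for
    the learned predictor that sits on top of the two encoders."""
    features = drug_vec + protein_vec
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    return sum(w * x for w, x in zip(weights, features)) + bias


if __name__ == "__main__":
    drug = encode_drug("CC(=O)OC1=CC=CC=C1C(=O)O")  # aspirin
    protein = encode_protein("MKTAYIAKQRQISFVKSHFSRQ")  # arbitrary peptide
    weights = [0.01] * (len(drug) + len(protein))  # untrained, illustrative weights
    print(f"toy interaction score: {score_pair(drug, protein, weights):.4f}")
```

Swapping `encode_drug` for a different encoder changes nothing downstream as long as it returns a vector, which is the property that makes encodings interchangeable.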

Quick Start & Requirements

  • Installation: pip install DeepPurpose or build from source via conda env create -f environment.yml.
  • Prerequisites: Python 3.6+ (recommended) and Conda; a GPU is recommended for better performance.
  • Resources: Datasets can be large; download scripts are provided.
  • Documentation: https://deeppurpose.readthedocs.io/
  • Demos: Extensive demos available in the DEMO folder and on GitHub.
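A quick prerequisite check can save a failed install. The helper below is a hypothetical sketch (`check_environment` is not part of DeepPurpose): it uses only the standard library, plus `torch.cuda.is_available()` when PyTorch happens to be installed, and it assumes the package is importable as `DeepPurpose`, matching the pip package name.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 6)):
    """Report whether the interpreter meets the minimum Python version,
    whether DeepPurpose and PyTorch are importable, and whether PyTorch
    can see a CUDA GPU (checked only if torch is installed)."""
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "deeppurpose_installed": importlib.util.find_spec("DeepPurpose") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without PyTorch
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```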

Highlighted Details

  • Supports 15+ drug and protein encodings, including novel combinations like GNNs with Transformers.
  • Offers simplified "10 lines of code" frameworks for DTI, Drug Property, DDI, PPI, and Protein Function prediction.
  • Includes data loaders for numerous public benchmarks (BindingDB, DAVIS, KIBA) and repurposing datasets.
  • Provides over 10 pre-trained models for various tasks and datasets.
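Applying a model to a repurposing library ultimately comes down to scoring every candidate drug against a target and ranking the results. The sketch below covers only that ranking step; `rank_candidates` and the scores are hypothetical placeholders for predictions a trained or pre-trained model would produce. It assumes higher scores mean stronger predicted binding (as with pKd-style affinities) unless `higher_is_better=False` is passed (as with raw Kd values).

```python
import heapq
from typing import Dict, List, Tuple


def rank_candidates(scores: Dict[str, float], top_k: int = 5,
                    higher_is_better: bool = True) -> List[Tuple[str, float]]:
    """Return the top_k (drug_name, predicted_score) pairs, best first.

    Use higher_is_better=False when a smaller score means stronger binding,
    as with raw dissociation constants (Kd)."""
    if top_k <= 0:
        return []
    pick = heapq.nlargest if higher_is_better else heapq.nsmallest
    return pick(top_k, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up scores standing in for a model's predictions against one target.
    predicted = {"drug_A": 7.9, "drug_B": 5.2, "drug_C": 8.4, "drug_D": 6.1}
    print(rank_candidates(predicted, top_k=2))  # [('drug_C', 8.4), ('drug_A', 7.9)]
```

`heapq.nlargest` avoids sorting the full library when only a short list of candidates is needed.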

Maintenance & Community

The README invites user feedback and contributions and provides contact information for the developers.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The README notes that pre-trained models cover limited datasets and may not generalize perfectly to new, unseen proteins. Outputs should be manually inspected by experts before wet-lab validation, as the work is still under active development.
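One way to estimate how a model behaves on unseen proteins is to evaluate it on a split where no test protein appears in training. The helper below is a hypothetical, library-independent sketch (`cold_target_split` is not a DeepPurpose function) that partitions (drug, protein, label) records by protein identity.

```python
import random
from typing import List, Tuple

Record = Tuple[str, str, float]  # (drug SMILES, protein sequence, label)


def cold_target_split(records: List[Record], test_frac: float = 0.2,
                      seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Split records so that every protein appears in exactly one set."""
    if not 0.0 < test_frac < 1.0:
        raise ValueError("test_frac must be between 0 and 1")
    # Sorting before a seeded shuffle keeps the split reproducible.
    proteins = sorted({protein for _, protein, _ in records})
    random.Random(seed).shuffle(proteins)
    n_test = max(1, round(len(proteins) * test_frac)) if proteins else 0
    test_proteins = set(proteins[:n_test])
    train = [r for r in records if r[1] not in test_proteins]
    test = [r for r in records if r[1] in test_proteins]
    return train, test
```

A large gap between scores on a random split and on this kind of split is a hint that predictions for new proteins deserve extra scrutiny.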

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 11 stars in the last 30 days
