Chemformer by MolecularAI

Transformer model for chemistry tasks

Created 4 years ago · 269 stars · Top 95.5% on SourcePulse

Project Summary

Chemformer provides pre-trained BART transformer models for molecular tasks, specifically designed for chemists and researchers in drug discovery and chemical synthesis. It aims to improve generalization, performance, training speed, and validity on downstream tasks by pre-training on molecular SMILES strings using a denoising objective.

How It Works

Chemformer leverages a BART transformer architecture pre-trained on a large corpus of molecular SMILES strings with a denoising objective, which lets the model learn rich representations of molecular structures and chemical transformations. The project provides implementations for downstream tasks including reaction prediction, retrosynthetic prediction, molecular optimization, and molecular property prediction, using sequence-to-sequence modeling and a disconnection-aware approach to retrosynthesis.

Quick Start & Requirements

  • Installation: Clone the repository, create a Conda environment (conda env create -f env-dev.yml), activate it (conda activate chemformer), and install dependencies (poetry install); the commands are consolidated in the first sketch after this list.
  • Prerequisites: Linux, Windows, or macOS with Python 3.7 and Anaconda/Miniconda. A workaround for GLIBCXX_3.4.21 errors is to adjust LD_LIBRARY_PATH.
  • Usage: Fine-tune models by downloading pre-trained checkpoints and datasets, then updating and running the provided shell scripts (e.g., fine_tune.sh); configuration is managed via Hydra (see the second sketch after this list).
  • Resources: Links to public models and datasets are available.
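
A minimal setup sketch consolidating the steps above. The final export line is an assumed form of the documented LD_LIBRARY_PATH workaround (pointing at the active Conda environment's lib directory), not a command taken from the repository:

    # Clone the repository and enter it
    git clone https://github.com/MolecularAI/Chemformer.git
    cd Chemformer

    # Create and activate the Conda environment, then install dependencies
    conda env create -f env-dev.yml
    conda activate chemformer
    poetry install

    # Assumed workaround for GLIBCXX_3.4.21 errors: prepend the environment's
    # lib directory so its newer libstdc++ is found before the system one
    export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"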
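
Fine-tuning follows the pattern above: download a pre-trained checkpoint and dataset, point the script at them, and run it. A hedged sketch; the override keys shown are illustrative assumptions, not the repository's actual configuration names:

    # Edit the checkpoint/dataset paths inside the script, then run it
    bash fine_tune.sh

    # Since configuration is managed via Hydra, parameters can also be
    # overridden on the command line in key=value form, e.g. (entry point
    # and keys are illustrative assumptions):
    #   python <entry_point>.py data_path=<dataset> model_path=<checkpoint>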

Highlighted Details

  • Pre-trained BART transformer for molecular SMILES.
  • Supports reaction prediction, retrosynthesis, molecular optimization, and property prediction.
  • Implements seq2seq and disconnection-aware retrosynthesis.
  • Includes FastAPI services for predictions and integration with tools like AiZynthFinder.
  • Offers custom callbacks and scorers for flexible evaluation.

Maintenance & Community

The project welcomes contributions via issues and pull requests. Support is provided through the issue tracker, with limited time for direct support questions.

Licensing & Compatibility

The software is licensed under the MIT license, which permits free use, modification, and commercial use.

Limitations & Caveats

Pre-trained checkpoints may need updating when moving to new releases. The project pins specific dependency versions, and on some systems environment setup requires the LD_LIBRARY_PATH workaround noted above.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 13 stars in the last 30 days

Explore Similar Projects

dots.llm1 by rednote-hilab
  • MoE model for research
  • Top 0.2% · 462 stars · created 4 months ago · updated 4 weeks ago
  • Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI)

SwissArmyTransformer by THUDM
  • Transformer library for flexible model development
  • Top 0.3% · 1k stars · created 4 years ago · updated 8 months ago
  • Starred by Jeremy Howard (cofounder of fast.ai) and Stas Bekman (author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake)